1. I see operation/transaction fees priced per 10,000s or millions of operations for Backblaze B2 or Google Cloud, for example. How does CB handle those? How many "operations" does CB perform, and is there a place I can find that out?
2. A hypothetical: let's say my ex deletes my digitized record collection from the local folder I've selected for backups. The folder is nested in a complex way, and I don't listen to those files often, so how would I know they were deleted? I get that if I don't allow purging, I'd eventually find them gone and simply restore them from a previous point. But what if I do purge my backups? Would CB purge those older sets that contain my digitized record collection? Let's even say I delete them by accident and don't realize it. GFS wouldn't protect against this, would it? I don't know if this makes sense.
The best way to determine transaction fees for cloud service providers that charge them is to look at your monthly bills after using our product. There's no real way to estimate the number of operations our software will perform, since there are many variables involved. Those fees are usually quite low, so I wouldn't expect them to make up any significant part of your monthly cost. If you'd rather not deal with them at all, there are cloud storage providers that don't charge data egress or API fees that you can look into. But I think you'll find that even with those charges, Backblaze B2 pricing remains extremely reasonable.
Regarding your second question: if you're using the new backup format for your file backups with GFS retention, you'll maintain backup sets for the total duration of your GFS settings. So if you keep annual backups for 3 years, your oldest backup containing all of your files will not be removed for 3 years, and presumably you'd realize by then that something needs to be restored.

Having said that, you may be better off using the legacy backup format for your file backups. That format has version-based retention, with an option to always keep the last version of every file even if it's past the retention period, and an option to keep a file's backup if the original is deleted. Since that format is not backup-set based, we back up everything only the first time; after that we only back up new and changed files (incremental forever). With the retention settings mentioned above, old files that were removed locally on your computer without your knowledge will never disappear from backup storage. So my recommendation would be to move over to the legacy file backup format for your particular needs.
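If it helps to picture how GFS keeps those older sets around, here's a very simplified sketch of the idea (a conceptual illustration only, not our actual retention logic; the keep counts and the weekly schedule are assumptions):

```python
# Simplified illustration of GFS (grandfather-father-son) retention:
# keep the most recent N sets as "weeklies" (assuming a weekly schedule),
# the newest set from each of the last N months, and the newest set from
# each of the last N years. Anything in a kept set stays restorable.

from datetime import date, timedelta

def gfs_keep(backup_dates, keep_weekly=4, keep_monthly=12, keep_annual=3):
    """Return the set of backup dates retained under these simple GFS rules."""
    ordered = sorted(backup_dates, reverse=True)  # newest first
    weekly = ordered[:keep_weekly]                # most recent sets
    monthly, seen_months = [], set()
    annual, seen_years = [], set()
    for d in ordered:
        if (d.year, d.month) not in seen_months and len(monthly) < keep_monthly:
            seen_months.add((d.year, d.month))
            monthly.append(d)                     # newest set of that month
        if d.year not in seen_years and len(annual) < keep_annual:
            seen_years.add(d.year)
            annual.append(d)                      # newest set of that year
    return set(weekly) | set(monthly) | set(annual)

# Example: two years of weekly backups
backups = [date(2022, 1, 2) + timedelta(weeks=i) for i in range(104)]
kept = gfs_keep(backups)
```

The point is that everything inside a retained set, including files you've since deleted locally, stays restorable until that whole set ages out of the GFS schedule.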
Question 1:
That's what I was thinking. Could you give me a ballpark for a typical scenario? Let's say I have 500 GB and 100,000 files total, uploading to Google Cloud. Roughly how many transactions would CB perform? Fill in whatever other variables you like. Maybe it's too complex, I don't know.
Question 2:
Yes, that's what I thought too. The new backup format is the one that supports client-side deduplication, right? I definitely like that feature. Why doesn't the new format also support file version retention? Will it ever?
Sorry, but I can't estimate the various API calls and related charges across the different clouds. You may be able to get a rough figure by using the number of files and the chunk size (we call it "part size" in Options for the legacy format), and then plugging that into whatever pricing calculator the cloud storage provider offers. But as I said earlier, I've never seen API charges be expensive.
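If you want a back-of-the-envelope figure, something like the sketch below is about as close as you can get (the part size, per-file overhead, and the per-10,000 price are all placeholder assumptions, not our actual upload behavior or Google's current rates):

```python
# Rough estimate of upload operations for an initial backup.
# Assumptions (not CB's actual behavior): one PUT per part, where a file
# uses ceil(file_size / part_size) parts, plus one extra call per file
# for listing/metadata overhead. Uses the average file size.

import math

def estimate_operations(num_files, total_gb, part_size_mb=10, overhead_per_file=1):
    avg_file_mb = (total_gb * 1024) / num_files              # average file size in MB
    parts_per_file = math.ceil(avg_file_mb / part_size_mb)   # PUTs per average file
    return num_files * (parts_per_file + overhead_per_file)

ops = estimate_operations(num_files=100_000, total_gb=500)
# Example pricing assumption: ~$0.05 per 10,000 Class A operations
# (check the provider's current price list before relying on this).
print(f"~{ops:,} operations, roughly ${ops / 10_000 * 0.05:.2f} at $0.05 per 10,000")
```

With those assumptions, 100,000 files and 500 GB come out to around 200,000 operations, which is on the order of a dollar or two, not something that will dominate your bill.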
The new backup format uses backup generations, and those generations are kept as whole sets. It's not version-based, because GFS requires that an entire backup set be kept. So it just works differently. But both backup formats are staying around, and you should use the one that works best for you.
For file backups, use whichever one works best for you. For image and virtual machine backups, the new backup format is far superior. As far as client-side deduplication goes, you're not going to benefit much with files that rarely change, like digital music. Even so, the legacy file backup format supports block-level backups if you enable them via scheduling, and those back up only the changes within larger files, so the whole file doesn't have to be re-uploaded each time it changes. That's mainly helpful for very large files, like an Outlook PST file that can grow to many gigabytes in size. For music files that are generally in the megabytes, the feature won't provide much utility, and it's better to just upload the changed file in full.
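For what it's worth, block-level backup conceptually works along these lines (a simplified illustration of the general technique, not our actual implementation; the 1 MB block size and SHA-256 hashing are assumptions):

```python
# Conceptual illustration of block-level change detection:
# split a file into fixed-size blocks, hash each block, and
# upload only the blocks whose hashes differ from the last run.

import hashlib

BLOCK_SIZE = 1024 * 1024  # assumed 1 MB blocks

def block_hashes(path):
    """Return a list of SHA-256 digests, one per fixed-size block."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def changed_blocks(previous_hashes, current_hashes):
    """Indices of blocks that are new or different since the previous backup."""
    return [
        i for i, h in enumerate(current_hashes)
        if i >= len(previous_hashes) or previous_hashes[i] != h
    ]
```

You can see why this pays off for a multi-gigabyte PST that changes a little every day, but adds little for a 10 MB music file that rarely changes at all.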
Well, there is a definite difference in the number of files between the new format and the legacy format. The latter doesn't do any archiving at all from what I've seen; I mean taking all the files and merging them into one. One would think there would be far fewer transactions. Google Cloud's price is per 10,000 operations. Would each group of 10,000 files incur its own operation fee when using the legacy backup?
There are likely going to be multiple I/O operations per file, probably based on the chunk size you're uploading. If the chunk size is large enough and most of your files are under that size, you may get two I/O operations per file. But again, I wouldn't spend too much time thinking about I/O operation cost, as it's likely to be very low. Just find the Google pricing calculator and enter something like 10 times the number of files you're uploading to get a very rough, high-level estimate of what those operations might cost for the initial backup. After that, ongoing costs should be relatively low, since file changes should be minimal compared to the total number of files already backed up.
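To put rough numbers on that heuristic using your earlier example: 100,000 files × 10 = 1,000,000 operations, and at a hypothetical rate of about $0.05 per 10,000 Class A operations (check Google's current price list, since rates vary by storage class) that works out to roughly $5 for the entire initial backup, and that's almost certainly an overestimate.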
But you're correct that the legacy file backup format uses a one-to-one ratio of files to objects in the cloud, whereas the new format groups many files together into archives. It's not a single archive, though it can be if the number of files backed up is small or the sizes are small. But there is a lot of grouping going on, which minimizes I/O and the associated latency when backing up to the cloud.
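As a rough illustration of why that grouping matters for per-operation pricing (the 1 GB target archive size is an assumption for the example, not the new format's actual value, and this ignores multipart upload parts for simplicity):

```python
# Compare upload (PUT) counts: one object per file vs. files grouped
# into archives of an assumed target size before upload.

import math

num_files = 100_000
total_gb = 500
archive_size_gb = 1          # assumed target archive size for grouping

legacy_puts = num_files                               # one object per file
grouped_puts = math.ceil(total_gb / archive_size_gb)  # one object per archive

print(f"legacy (1:1): ~{legacy_puts:,} PUTs")
print(f"grouped into ~{archive_size_gb} GB archives: ~{grouped_puts:,} PUTs")
```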
The new format does generations. For 300 GB (compressed), that would fill storage space fast with each new generation, if you regularly do fulls as recommended. Synthetic fulls would be a must for me. I'm not paying for spikes in cloud space; I only want to pay for what I need and that's it.
BTW, thank you for the answers, I appreciate it. :smile:
That's correct. Each generation is a full copy, even if it's created using a synthetic full operation. That's needed for proper GFS management and for Immutability / Object Lock if the customer enables that feature. Note that we do not yet support synthetic full operations on Google Cloud.
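Just to put rough numbers on the storage side with your figures: keeping, for example, 4 weekly, 12 monthly, and 3 annual generations at about 300 GB each would mean (4 + 12 + 3) × 300 GB ≈ 5.7 TB sitting in storage, whereas an incremental-forever chain stores one full plus only the changed data on top. Those retention counts are just an example; your actual usage depends on your GFS settings and how much your data changes.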