This is probably more of a wish-list item than an actual request, but here goes.
Something I would love to see in Cloudberry is a better/expanded set of retention settings that allows for more control over backups across extended timeframes. For example, I currently have a client that is required to maintain a backup of their data for a period of 7 years, and there are files in this backup that change daily. Currently, the only setting available to me is to set "delete after" to 7 years, which means I end up with 2,555 versions of a single file backed up over that span. That will absolutely eat me alive in storage costs.
A better way to handle this would be the ability to set up retention rules that remove unnecessary versions from a backup while still keeping various waypoints available should they be needed. For the example above, I would set things up like this:
Rule 1: Keep one version of the file for each of the last 5 days.
Rule 2: After this, keep one version of the file for each of the following 3 weeks.
Rule 3: After this, keep one version of the file for each of the following 11 months.
Rule 4: After this, keep one version of the file for each of the following 6 years.
So now, instead of having over 2,500 backups, I have a grand total of 25, which, listed as dates, looks like this (a rough sketch of the pruning logic follows the list):
1/23/2019
1/22/2019
1/21/2019
1/20/2019
1/19/2019
12/28/2018
12/21/2018
12/14/2018
November 2018 (last remaining version of month)
October 2018
September 2018
August 2018
July 2018
June 2018
May 2018
April 2018
March 2018
February 2018
January 2018
2017 (last remaining version of year)
2016
2015
2014
2013
2012
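For what it's worth, here is a minimal sketch of how a pruning pass over the four rules above might work. This is Python, all names are hypothetical, and the bucket boundaries are deliberately simplified, so it won't reproduce the exact dates in the list, but the mechanism is the same: walk versions newest to oldest and keep only the first version seen in each day/week/month/year bucket until each tier's quota is used up.

```python
# Hypothetical sketch of tiered ("GFS-style") version pruning.
# Tier definitions mirror the four rules above:
# (number of buckets to keep, function mapping a date to its bucket key).
TIERS = [
    (5,  lambda d: (d.year, d.month, d.day)),  # Rule 1: one per day, 5 days
    (3,  lambda d: d.isocalendar()[:2]),       # Rule 2: one per ISO week, 3 weeks
    (11, lambda d: (d.year, d.month)),         # Rule 3: one per month, 11 months
    (6,  lambda d: (d.year,)),                 # Rule 4: one per year, 6 years
]

def versions_to_keep(version_dates):
    """Given every backup date for a file, return the ones worth keeping."""
    dates = sorted(set(version_dates), reverse=True)  # newest first
    keep = []
    i = 0
    for count, bucket in TIERS:
        seen = set()
        while i < len(dates) and len(seen) < count:
            key = bucket(dates[i])
            if key not in seen:    # first (newest) version in this bucket
                seen.add(key)
                keep.append(dates[i])
            i += 1                 # older versions in the same bucket drop out
    return keep

# With daily backups going back to 2012, this keeps roughly
# 5 + 3 + 11 + 6 = 25 versions instead of ~2,555.
```

Real GFS implementations differ on details like whether a period already covered by a finer tier also counts toward a coarser one; this sketch is just the simplest variant.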
A system like this would definitely help keep storage under control while also providing the capability for much longer backup retention periods. Again, I know this is probably more wishful thinking than anything, but it never hurts to throw ideas out there.
Well, string me silly, that's great news. Any idea on when it might be usable? I noticed the article mentioned it's currently AWS-only. Any likelihood of it being available for B2/Azure/etc. in the future?
In the absence of GFS-style long-term retention, we did something different.
We utilize Google Nearline for our operational cloud backup/recovery, where we keep 90 days of versions. For those customers who require longer-term retention, we utilize Amazon SIA and run monthly full backups on the first of the month, then use a lifecycle policy to move the monthly backups to Glacier after 30 days. This lowers the cost significantly: Glacier is only $0.004/GB/mo (~$0.05/GB/yr) compared to SIA at $0.0125/GB/mo (~$0.15/GB/yr).
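For anyone wanting to replicate the transition piece, the 30-day move to Glacier is just a standard S3 lifecycle rule. Here's a minimal boto3 sketch; the bucket name and prefix are placeholders for wherever your monthly fulls land:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under the monthly-fulls prefix to Glacier after 30 days.
# "example-backup-bucket" and "monthly-fulls/" are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "monthly-fulls-to-glacier",
                "Filter": {"Prefix": "monthly-fulls/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```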
The problem with using one back-end storage account for both operational and long-term retention is that you cannot transition files to Glacier, as doing so would result in elongated restore times. And what was hard to understand in the beginning is that a FULL backup is not like it was in the old days of tape, i.e., a fresh copy of everything. As the Novus Computer writer knows, a full in Cloudberry backup only re-copies files that have had a block-incremental backup since the last "FULL". So files that might never change, but are essential to daily operation, would wind up in Glacier, with a 4-5 hour restore delay.
So we would have needed to leave all long-term retention files in SIA or Nearline, meaning we would have had to keep all daily versions of the files for 7 years.
So we did the math, and it costs a LOT less money to use the Nearline-plus-Glacier model than to keep all daily incrementals in SIA. That calculation even included the "Migrate to Glacier" transition fees.
And even with a GFS model that mimics the "keep only monthly fulls" approach we are using, it is still less expensive to utilize Glacier and Nearline than Amazon SIA alone.
Our calculation shows that if you start with 200GB and add 5GB per month of version data (assuming a GFS once-per-month model), the 200GB grows to 615GB in 7 years. The total cost of that over 7 years in SIA comes out to only $427 (our approach was $356), an average of a little over $5.00 per month across the 7 years.
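For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope version of that SIA-only calculation (storage cost only; request and transition fees are excluded, which is probably why it lands a dollar or so off the figure above):

```python
# Back-of-the-envelope: 200GB start, +5GB/month of version data, 84 months,
# SIA at $0.0125/GB/mo as quoted above. Storage cost only.
MONTHS = 7 * 12
START_GB, GROWTH_GB = 200, 5
SIA_PER_GB_MONTH = 0.0125

total = sum((START_GB + m * GROWTH_GB) * SIA_PER_GB_MONTH for m in range(MONTHS))
final_size = START_GB + (MONTHS - 1) * GROWTH_GB

print(f"final size: {final_size} GB")           # 615 GB
print(f"7-year SIA cost: ${total:.2f}")         # ~$428
print(f"avg per month: ${total / MONTHS:.2f}")  # ~$5.09
```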
And with all that being said: how much would you charge your client for 200GB of data under protection? And how much more for the extended 7-year retention?
Even if we had to keep daily incrementals for 7 years in SIA or Nearline, the 7-year total cost (probably $600+) would represent a small percentage of what we charge customers for our backup service with a long-term retention option. Storage costs are "peanuts" compared to the cost of managing the backups: doing the upgrades/migrations/conversions, monitoring the daily results, dealing with overdues/failures, etc.
If anyone would like more details on this approach and/or the calculations I would be happy to share them.
- Cloud Steve
So the GFS model makes a lot of sense for customers who have a 7-year retention requirement, where we could eliminate eleven of the twelve monthly versions after a year or so. Right now, though, most of our clients are satisfied with 90 days of retention; a few want one year, and one (the client I was referring to above) wanted 7 years. The thing is, we have been doing our model for a few years now, so it will be interesting to see how much work it takes to migrate existing backups to a GFS model.