Uploading with CloudBerry Explorer Pro to AWS Glacier over a 50 MBps ISP connection in Singapore, but storing in Oregon. Average file size is 5000 MB (4K video files). No compression or encryption. The workstation is a Dell laptop with 32 GB RAM.
I am currently running 5 threads at 256MB chunk size, which yields 2-15MBps upload speed. That varies widely, but seems to be the sweet spot.
Are these good numbers? How might I be able to increase them?
Maybe my ISP is throttling? Or, of course, the SG-US pipe might be causing the slowdown.
Does AWS Glacier set an upload rate limit?
I read on the CloudBerry Backup forum about increasing the RAM allocation, but I'm not sure if I can do that in CloudBerry Explorer. It appears to ask for RAM = 2 x threads x chunk size, which at my settings works out to 2 x 5 x 256 MB ≈ 2.5 GB, but does Explorer cap RAM usage at some point?
Thanks for any advice or a general comment on what controls the upload rate.
I'm assuming from your post that you tried more than 5 threads and that didn't help. If that's the case, you're not hitting any single-stream connection upload limit. It's really hard to say what your expected performance should be going from Singapore to the US (Oregon). Sometimes the AWS Transfer Acceleration speed test can provide some guidance on the speed differences between two regions: https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
For me, that test reports that using S3 Transfer Acceleration (I'm in the US) to hit Singapore as opposed to hitting that region directly resulted in a 2,575% faster connection. But S3 Transfer Acceleration is not available for Glacier (as I recall) and it's expensive for your region anyway.
You could try uploading a test file of sufficient size to Glacier from the AWS Console to see if speeds are different - but you'd want to wait until CloudBerry Explorer's current transfer is complete so the two aren't competing for bandwidth.
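If you'd rather script that test than click through the console, a minimal boto3 sketch along these lines should work - the vault name and file path are placeholders:

```python
import os
import time

import boto3  # pip install boto3

# Oregon region; boto3 defaults the Glacier accountId to "-" (current account)
glacier = boto3.client("glacier", region_name="us-west-2")

path = "test-file.bin"  # placeholder: a test file of a few GB
size_mb = os.path.getsize(path) / (1024 * 1024)

with open(path, "rb") as f:
    start = time.time()
    # Single-part upload; boto3 computes the required tree-hash checksum itself
    resp = glacier.upload_archive(vaultName="my-test-vault", body=f)
elapsed = time.time() - start

print("archiveId:", resp["archiveId"])
print(f"~{size_mb / elapsed:.1f} MB/s effective")
```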
Not sure if you meant megabits instead of megabytes in your speeds. You wrote megabytes, but either way, 15 mega-anything is 30% of your rated internet speed, and given the distance you're traveling, that seems fair on the surface.
Maybe you can describe which variables you've adjusted in CloudBerry Explorer and what the results of those tests were.
Thanks David. Yeah, it's megabytes (MB) - I tried to keep all the numbers in the same format so the math wouldn't hurt my head. We have residential 1 Gbps fibre in Singapore, and I consistently get 400 Mbps up and down to a local server at any hour - that's why I called it a 50 MBps connection. No bottleneck there.
Aha - when I try speedtest.net against various Oregon servers (e.g. Comcast), I'm getting only 8 Mbps (1 MBps) up, and only slightly better to San Francisco. CloudBerry is currently showing 2-4 MBps - perhaps the AWS servers are better located.
So, no surprise that the SG-US connection is a bottleneck. But here's the weird part: while CloudBerry works on the same 6 or so files (6-10 GB each) for a long stretch, the upload speed is usually around 2-3 MBps but can climb to 10 MBps for a little while, then drop back down. Meanwhile, I keep running speedtest.net and my connection is consistently 1 MBps - so it's really odd that my measured connection speed is 1 MBps while my upload speed can reach 10 MBps.
Time of day doesn't appear to matter. The internet connection appears to be stable, but I don't have a good test for that (general web surfing doesn't cut out).
I went through the CloudBerry options, and it seems only the thread count and chunk size would make a difference? There's no proxy needed, I have speeds set to no-limit, etc.
Thread counts from 5 to 20 don't seem to make a difference, but the queue shows at most 9 active uploads, so maybe this option doesn't really take effect in real time?
For Glacier multipart upload chunks, I've tried 10, 128, 256, and 512 MB ("upload chunks in parallel threads" is checked). The larger sizes are a little better, but the variability mentioned above seems to trump everything.
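For what it's worth, if I wanted to test those same two knobs outside of CloudBerry, I think a rough boto3 sketch would look like the one below - the vault name and file path are placeholders, and Glacier itself wants power-of-two-MB part sizes:

```python
import concurrent.futures as cf
import os

import boto3  # pip install boto3
from botocore.utils import calculate_tree_hash

glacier = boto3.client("glacier", region_name="us-west-2")

PATH = "bigfile.mp4"           # placeholder 4K video file
PART_SIZE = 256 * 1024 * 1024  # 256 MB; Glacier requires a power-of-two MB part size
THREADS = 5                    # the "thread count" knob
VAULT = "my-test-vault"        # placeholder vault

size = os.path.getsize(PATH)
mpu = glacier.initiate_multipart_upload(vaultName=VAULT, partSize=str(PART_SIZE))

def send_part(offset):
    # Each in-flight part holds PART_SIZE bytes, so peak RAM here is roughly
    # THREADS x PART_SIZE (1.25 GB); CloudBerry's 2 x threads x chunk-size
    # rule of thumb presumably doubles that for buffering.
    with open(PATH, "rb") as f:  # separate file handle per thread
        f.seek(offset)
        body = f.read(PART_SIZE)
    end = offset + len(body) - 1
    glacier.upload_multipart_part(
        vaultName=VAULT, uploadId=mpu["uploadId"],
        range=f"bytes {offset}-{end}/*", body=body)  # boto3 adds the per-part checksum

with cf.ThreadPoolExecutor(max_workers=THREADS) as pool:
    list(pool.map(send_part, range(0, size, PART_SIZE)))

with open(PATH, "rb") as f:
    checksum = calculate_tree_hash(f)  # whole-archive tree hash

glacier.complete_multipart_upload(
    vaultName=VAULT, uploadId=mpu["uploadId"],
    archiveSize=str(size), checksum=checksum)
```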
AWS Transfer Acceleration SG-US shows a 2,000% increase, but as you mentioned, it's not available for Glacier and is a bit pricey. Alternatively, I could upload to Singapore and then transfer to the US within AWS, but that too incurs a fee (and I'm not sure it's available for Glacier anyway). I store in the US because I want the data there long term and the storage rate is a little lower than in Asia.
Once I get my current 2 TB uploaded, incremental uploads will only be 200 GB/week, so this won't be as much of a problem.
The other option you might try is to upload to S3 Standard (NOT S3-IA, S3 One Zone-IA, or S3-RRS) and then use an Object Lifecycle Policy to move the data to Glacier immediately (it will move after 1 day in most cases). I only suggest that as an option in case S3 Standard provides better transfer speeds; you can test it with a single file backup of sufficient size to gauge any speed differences. When you transition objects this way from S3 Standard to the Glacier storage class, some metadata remains in S3 Standard for each file for inventory purposes, and a small bit of metadata resides in Glacier per file. For large files, this shouldn't cause any real change in billing. The advantage is that you can inventory files immediately, even though the files are in the Glacier storage class. Might be worth a test.
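Here's a rough sketch of that transition rule in boto3 - the bucket name is a placeholder, and note that this call replaces any existing lifecycle configuration on the bucket:

```python
import boto3  # pip install boto3

s3 = boto3.client("s3", region_name="us-west-2")

# Transition every object to the Glacier storage class as soon as the
# daily lifecycle evaluation runs (hence "it will move after 1 day").
s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",        # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "to-glacier-immediately",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # match all objects
            "Transitions": [
                {"Days": 0, "StorageClass": "GLACIER"},
            ],
        }]
    },
)
```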
Another option is to upload the data to a bucket in a region much closer to you and then have that bucket replicate to the Oregon one. I think you can have the replication move the data to Glacier, but I'm not 100% sure about that. Then you can either blow away the local bucket after you seed the data and upload directly to the Oregon bucket from then on, or keep it as is.
It's an option, but there are a few gotchas with Cross-Region Replication (CRR). Its primary use case is when you need the data replicated so it can be accessed from both locations, or for DR. There's also the issue of cost: transfers run 4 cents per GB. In addition, the feature requires bucket versioning to be enabled (and I don't believe a versioned bucket can become a non-versioned one once the change is made - though you can suspend versioning). Deletes of local files then would not actually delete the replicated data - the object would just be marked as deleted, and you'd need a lifecycle policy to expire the noncurrent versions, I think (see the sketch below). You also have to pay for the local storage for as long as you keep the data there for replication purposes.
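If you did go the CRR route, I believe that cleanup rule would look something like this in boto3 (bucket name is a placeholder):

```python
import boto3  # pip install boto3

s3 = boto3.client("s3", region_name="us-west-2")

# Permanently expire noncurrent versions (which is what a "deleted" object
# becomes in a versioned bucket) one day after they stop being current.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-oregon-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-noncurrent-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
        }]
    },
)
```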
I think CRR is a great idea for an active bucket in your region that you absolutely need replicated to another one. But in this case, I think it adds a lot of complexity.
Cross-region replication does support moving the replicated data to Glacier.
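For reference, a replication rule that lands the replica in Glacier would look roughly like this in boto3 - the role ARN and bucket names are placeholders, and versioning has to be enabled on both buckets first:

```python
import boto3  # pip install boto3

s3 = boto3.client("s3", region_name="ap-southeast-1")  # Singapore source

s3.put_bucket_replication(
    Bucket="my-sg-seed-bucket",  # placeholder source bucket
    ReplicationConfiguration={
        # Placeholder IAM role: it needs replication permissions on the
        # source (e.g. s3:GetObjectVersionForReplication) and destination
        # (e.g. s3:ReplicateObject).
        "Role": "arn:aws:iam::123456789012:role/my-replication-role",
        "Rules": [{
            "ID": "seed-to-oregon-glacier",
            "Status": "Enabled",
            "Prefix": "",  # replicate everything
            "Destination": {
                "Bucket": "arn:aws:s3:::my-oregon-bucket",  # placeholder
                "StorageClass": "GLACIER",  # replica lands in Glacier directly
            },
        }],
    },
)
```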