Hi,

we plan to switch from per-zone NFS secondary storage to (region-wide) S3 
secondary storage. However, we currently have some concerns regarding the 
performance of this setup. 

The throughput we achieve for a single-stream download (e.g. wget 
http://our.s3.storage/some-file) is approx. 100 Mbps. Given the nature of an 
object store, this is expected; higher throughput is only achieved via 
multipart uploads/downloads, which most tools that natively “speak” the S3 
protocol take advantage of.
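
Just to illustrate what we mean by multipart transfers, this is roughly how a 
native client parallelizes a single large object (sketched with boto3; the 
endpoint, bucket, and file names are only placeholders, not our real setup):

  import boto3
  from boto3.s3.transfer import TransferConfig

  # placeholder endpoint/bucket names, not our real configuration
  s3 = boto3.client("s3", endpoint_url="http://our.s3.storage")

  # split large objects into 16 MB parts and move 10 parts in parallel;
  # this is how native S3 clients get past the single-stream limit
  cfg = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                       multipart_chunksize=16 * 1024 * 1024,
                       max_concurrency=10)

  s3.upload_file("volume-snapshot.ova", "sec-storage", "volume-snapshot.ova",
                 Config=cfg)
  s3.download_file("sec-storage", "some-file", "some-file", Config=cfg)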

At the same time, we have a couple of “power users” who frequently require 
data volumes of up to 1 TB and enable recurring daily snapshots of these 
volumes. As a result, snapshots alone produce quite a lot of data 
(approx. 6-10 TB per day). 
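
A rough back-of-the-envelope estimate, assuming the ~100 Mbps single-stream 
rate from above is the bottleneck: 6 TB is about 48,000,000 Mbit, which at 
100 Mbps takes roughly 480,000 seconds, i.e. about 5.5 days. A purely 
sequential transfer could therefore never keep up with one day's worth of 
snapshots.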

Given that, we are interested in understanding the following:
- Is anybody successfully using S3 secondary storage with a similar or even 
higher amount of data? If so, is there anything we have to consider in 
particular? 
- Is the CloudStack synchronization process single- or multi-threaded? If we 
produce 6-10 TB of data per day that has to be stored on the S3 object store, 
the copy could take longer than a day if CloudStack transfers the data 
sequentially, file by file (see the rough estimate above). 
- Does CloudStack support multipart uploads? As in the example above, 
uploading a 1 TB file to S3 will take forever if it does not. 
- Is there any advice on sizing the secondary staging storage per zone, 
e.g. depending on the primary storage volume, the number of VMs, or similar?

In case it matters, we are still on CloudStack 4.5.1 with VMware 5.5 as the hypervisor. 

Thanks in advance for any help. 

Best regards,
Christian 
