Andi abes asked:
> Doesn't that depend on the ratios of read vs write?
> In a read-tilted environment (e.g. CDNs, image stores, etc.), being able to
> dedup at the block level in the relatively rare write case seems a boon.
> The simplification this could allow - performing localized dedup
> (i.e. each object server deduping just its local storage) - seems worthwhile.

For the most part deduplication has no impact on read performance. The same chunks will be fetched whether they were deduplicated or not.

If you have a central metadata system (like GFS or HDFS), deduplication can impair optimizing the location of the chunks for streaming reads. But with hash-driven algorithms you either place the entire object on one server, which precludes parallelizing the fetch, or you distribute the object's chunks across multiple servers, which impairs the efficiency of a slow streaming read.

Because distributed deduplication relies on fingerprinted chunks, it has the advantage of allowing unrestricted chunk caching, which is the real solution to optimizing reads of extremely popular data.
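
To make that trade-off concrete, here is a minimal sketch (plain Python, not any particular project's code; the chunk size, server list, and function names are illustrative assumptions) of hash-driven, fingerprint-based chunk placement. Identical chunks produce identical fingerprints, so they dedup naturally, and because a fingerprint names immutable content it also shows why one object's chunks end up scattered across servers:

import hashlib

CHUNK_SIZE = 4 * 1024 * 1024          # 4 MiB fixed-size chunks (illustrative choice)
SERVERS = ["obj-0", "obj-1", "obj-2", "obj-3"]   # hypothetical object servers

def fingerprint(chunk: bytes) -> str:
    """Content hash of the chunk; identical chunks always map to the same
    fingerprint, which is what makes deduplication (and safe caching) work."""
    return hashlib.sha256(chunk).hexdigest()

def place(fp: str) -> str:
    """Hash-driven placement: the fingerprint alone picks the server, so the
    chunks of a single object are spread across many servers."""
    return SERVERS[int(fp[:8], 16) % len(SERVERS)]

def chunk_object(data: bytes):
    """Split an object into fixed-size chunks and yield (fingerprint, server)."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fp = fingerprint(chunk)
        yield fp, place(fp)

if __name__ == "__main__":
    obj = b"x" * (10 * 1024 * 1024)   # a 10 MiB object
    for fp, server in chunk_object(obj):
        print(fp[:12], "->", server)

The caching point follows from the same property: a chunk keyed by its fingerprint never changes, so a reader-side cache or CDN edge can hold popular chunks indefinitely without any invalidation protocol.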