On Wed, May 25, 2011 at 03:50:09PM -0700, Matthew Ahrens wrote:
> That said, for each block written (unique or not), the DDT must be updated,
> which means reading and then writing the block that contains that dedup
> table entry, and the indirect blocks to get to it. With a reasonably large
> DDT, I would expect about 1 write to the DDT for every block written to the
> pool (or "written" but actually dedup'd).
That, right there, illustrates exactly why some people are disappointed wrt
performance expectations from dedup.

To paraphrase, and in general:

 * for write, dedup may save bandwidth but will not save write iops.
 * dedup may amplify iops with more metadata reads.
 * dedup may turn larger sequential io into smaller random io patterns.
 * many systems will be iops bound before they are bandwidth or space bound
   (and l2arc only mitigates read iops).
 * any iops benefit will only come on later reads of dedup'd data, so it is
   heavily dependent on access pattern.

Assessing whether these amortised costs are worth it for you can be complex,
especially when the above is not clearly understood (a rough back-of-envelope
is appended below).

To me, the thing that makes dedup most expensive in iops is the DDT update
writes when a file (or snapshot) is deleted. These are additional iops that
dedup creates, not ones that it substitutes for others in roughly equal
number. This load is easily forgotten in a cursory analysis, and yet is
always there in a steady state with rolling auto-snapshots.

As I've written before, I've had some success managing this load using
deferred deletes and snapshot holds, either to spread the load or to shift
it to otherwise-quiet times, as the case demanded (sketched below as well).
I'd rather not have to. :-)

--
Dan.
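For scale, here's the sort of back-of-envelope I mean, just using python as a
calculator. The figures (128K recordsize, ~1 DDT leaf write per block written
as Matthew describes, and a guessed DDT read-miss ratio) are illustrative
assumptions, not measurements:

    # Rough dedup write-amplification estimate (illustrative assumptions only).
    gib_written    = 10                # logical data written, GiB
    recordsize     = 128 * 1024        # bytes per block (assumed 128K records)
    blocks         = gib_written * 2**30 // recordsize
    ddt_writes     = blocks            # ~1 DDT update write per block written
    ddt_miss_ratio = 0.2               # assumed fraction of DDT lookups missing ARC/L2ARC
    ddt_reads      = int(blocks * ddt_miss_ratio)

    print("data blocks written:  ", blocks)
    print("extra DDT write iops: ", ddt_writes)
    print("extra DDT read iops:  ", ddt_reads)
    print("write iops multiplier: ~%.1fx" % ((blocks + ddt_writes) / blocks))

Roughly double the write iops before you count the extra metadata reads,
which is the point of the list above.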
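And for concreteness, the kind of thing I mean by deferred deletes and
snapshot holds. This is a minimal sketch only, wrapping the stock zfs hold /
zfs destroy -d / zfs release commands; the hold tag, snapshot names and the
cron timing are made up for illustration:

    # Minimal sketch: hold snapshots up front, queue deferred destroys during
    # the day, and release the holds (letting the destroys, and their DDT
    # update load, actually run) from a quiet-hours cron job.
    import subprocess

    TAG = "defer-delete"               # example hold tag

    def zfs(*args):
        subprocess.run(["zfs", *args], check=True)

    def protect(snapshot):
        # Place a user hold so a later "zfs destroy -d" is deferred, not applied.
        zfs("hold", TAG, snapshot)

    def request_delete(snapshot):
        # Deferred destroy: queued now, carried out once the last hold is released.
        zfs("destroy", "-d", snapshot)

    def quiet_hours_release(snapshot):
        # Run from a night-time cron job; releasing the last hold lets the
        # deferred destroy happen off-peak.
        zfs("release", TAG, snapshot)

    # Example usage (names illustrative):
    #   protect("tank/home@auto-2011-05-25")
    #   request_delete("tank/home@auto-2011-05-25")        # during the day
    #   quiet_hours_release("tank/home@auto-2011-05-25")   # at, say, 03:00

The point being that the DDT writes only land when the last hold is released,
so that release is the part you get to schedule.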