On Thu, Jan 21, 2010 at 05:04:51PM +0100, erik.ableson wrote:
> What I'm trying to get a handle on is how to estimate the memory
> overhead required for dedup on that amount of storage.
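As a very rough order-of-magnitude answer, you can do the arithmetic
yourself. Every figure below is an assumption rather than a measurement
-- a guessed pool size, a guessed average block size, and the
oft-quoted ballpark of a few hundred bytes of core per DDT entry:

  # Back-of-the-envelope DDT core footprint.  Every number is a guess:
  # adjust for your own pool size, average block size and per-entry cost.
  pool_bytes      = 10 * 2**40      # e.g. 10 TB of dedup-candidate data (made up)
  avg_block_bytes = 64 * 2**10      # assumed average block size
  entry_bytes     = 320             # assumed in-core cost per DDT entry (ballpark)

  unique_blocks = pool_bytes // avg_block_bytes
  print("unique blocks: %d" % unique_blocks)
  print("DDT core size: ~%.0f GB" % (unique_blocks * entry_bytes / 2.0**30))

That only bounds the total table size, though; it says nothing about
how much of the table is hot at any one time, which is the part that
actually matters.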
We'd all appreciate better visibility of this. This requires:
 - time and observation and experience, and
 - better observability tools and (probably) data exposed for them

> So the question is how much memory or L2ARC would be necessary to
> ensure that I'm never going back to disk to read out the hash keys.

I think that's the wrong goal to optimise for.

For performance (rather than space) issues, I look at dedup as simply
increasing the size of the working set, with the goal of reducing the
amount of IO (avoided duplicate writes) in return.

If saving one large async write costs several small sync reads, you
fall off a very steep performance cliff, especially for IOPS-limited
seeking media. However, it doesn't matter whether those reads are for
DDT entries or for other filesystem metadata necessary to complete the
write. Nor does it even matter if those reads are data reads for other
processes whose data has been pushed out of ARC by the larger working
set. So I think it's right that the ARC doesn't treat DDT entries
specially.

The trouble is that the hash function produces (we can assume) random
hits across the DDT, so the working set depends on the amount of data
and the rate of potentially dedupable writes, as well as the actual
dedup hit ratio. A high rate of writes also means a large amount of
data sitting in ARC waiting to be written at the same time. This makes
analysis very hard (and pushes you very fast towards that very steep
cliff, as we've all seen).

Separately, what might help is something like "dedup=opportunistic",
which would keep the working set smaller (a toy sketch of what I mean
is at the end of this mail):
 - dedup the block IFF the DDT entry is already in (l2)arc
 - otherwise, just write another copy
 - maybe some future async dedup "cleaner", using bp-rewrite, to tidy
   up later.
I'm not sure what, in this scheme, would ever bring DDT entries into
cache, though. Reads for previously dedup'd data?

I also think a threshold on the size of blocks to try deduping would
help. If I only dedup blocks of (say) 64k and larger, I might well get
most of the space benefit for much less overhead.

-- 
Dan.
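P.S. Here is roughly the write-path policy I mean by
"dedup=opportunistic", sketched as toy Python rather than anything
resembling the real ZIO pipeline. All the names are made up, and the
64k threshold is just the example figure from above:

  DEDUP_MIN_BLOCK = 64 * 1024        # only try to dedup blocks this big or bigger

  def write_block(checksum, size, ddt_cached, write_new_copy, add_dedup_ref):
      """ddt_cached: set of checksums whose DDT entries are already in (l2)arc."""
      if size < DEDUP_MIN_BLOCK:
          return write_new_copy()          # small block: never worth the overhead
      if checksum in ddt_cached:
          return add_dedup_ref(checksum)   # entry already cached: dedup is cheap
      # Cache miss: don't go to disk for the DDT, just write another copy.
      # A future async "cleaner" (bp-rewrite) could fold duplicates together later.
      return write_new_copy()

The point is that the only DDT lookups ever issued are against entries
already in memory, so the extra working set stays bounded by whatever
reads happen to pull in.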