Bloom filters are a great fit for this :-) -- richard
On Jan 19, 2013, at 5:59 PM, Nico Williams <n...@cryptonector.com> wrote:

> I've wanted a system where dedup applies only to blocks being written
> that have a good chance of being dups of others.
>
> I think one way to do this would be to keep a scalable Bloom filter
> (on disk) into which one inserts block hashes.
>
> To decide if a block needs dedup, one would first check the Bloom
> filter; if the block is in it, use the dedup code path, else use the
> non-dedup code path and insert the block in the Bloom filter. This
> means that the filesystem would store *two* copies of any
> deduplicatious block, with one of those not being in the DDT.
>
> This would allow most writes of non-duplicate blocks to be faster than
> normal dedup writes, but still slower than normal non-dedup writes:
> the Bloom filter will add some cost.
>
> The nice thing about this is that Bloom filters can be sized to fit in
> main memory, and will be much smaller than the DDT.
>
> It's very likely that this is a bit too obvious to just work.
>
> Of course, it is easier to just use flash. It's also easier to just
> not dedup: the most highly deduplicatious data (VM images) is
> relatively easy to manage using clones and snapshots, to a point
> anyway.
>
> Nico
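To make the proposed write path concrete, here is a rough C sketch of the decision Nico describes. It is only an illustration: bloom_test(), bloom_insert(), write_dedup(), and write_nodedup() are hypothetical stand-ins, not real ZFS interfaces, the filter is a fixed-size in-memory one (the scalable/on-disk aspects are elided), and it assumes a 256-bit block checksum (e.g. SHA-256, as ZFS dedup uses) so the k bit positions can be sliced out of the checksum instead of re-hashing the data.

    /* Hypothetical sketch: gate the dedup write path behind a Bloom
     * filter of previously seen block checksums.  Not real ZFS code. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOOM_BITS   (1ULL << 30)   /* 2^30 bits = 128 MiB of filter */
    #define BLOOM_HASHES 7              /* k probes per block */

    static uint8_t bloom[BLOOM_BITS / 8];

    /* The two write paths; stand-ins for whatever the filesystem does. */
    extern void write_dedup(const void *data, size_t len);
    extern void write_nodedup(const void *data, size_t len);

    /* Derive the i-th bit position from overlapping 8-byte slices of a
     * 32-byte checksum (i*4 + 8 <= 32 for i < 7). */
    static uint64_t
    bloom_bit(const uint8_t *cksum, int i)
    {
            uint64_t h;
            memcpy(&h, cksum + i * 4, sizeof (h));
            return (h % BLOOM_BITS);
    }

    static bool
    bloom_test(const uint8_t *cksum)
    {
            for (int i = 0; i < BLOOM_HASHES; i++) {
                    uint64_t b = bloom_bit(cksum, i);
                    if (!(bloom[b / 8] & (1 << (b % 8))))
                            return (false);  /* definitely never seen */
            }
            return (true);                   /* probably seen before */
    }

    static void
    bloom_insert(const uint8_t *cksum)
    {
            for (int i = 0; i < BLOOM_HASHES; i++) {
                    uint64_t b = bloom_bit(cksum, i);
                    bloom[b / 8] |= (1 << (b % 8));
            }
    }

    /* First sighting of a block takes the fast non-dedup path and is
     * remembered in the filter; a probable repeat pays the DDT cost.
     * Note the consequence from the proposal: the first copy of a
     * duplicated block is written outside the DDT, so two copies of
     * any deduplicatious block end up on disk. */
    void
    write_block(const uint8_t *cksum, const void *data, size_t len)
    {
            if (bloom_test(cksum))
                    write_dedup(data, len);
            else {
                    bloom_insert(cksum);
                    write_nodedup(data, len);
            }
    }

A Bloom filter false positive here is harmless to correctness: it just sends a unique block through the dedup path unnecessarily, costing a DDT lookup.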
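On the sizing point: a Bloom filter holding n entries at false-positive rate p needs about m = -n*ln(p)/(ln 2)^2 bits. A rough worked example, assuming 10^9 unique blocks (an assumption for illustration) and the commonly cited figure of roughly 320 bytes per DDT entry:

    m = 10^9 * ln(100) / (ln 2)^2      (p = 1%)
      ~ 9.6 * 10^9 bits
      ~ 1.2 GB

versus roughly 10^9 * 320 bytes ~ 320 GB for a DDT covering the same blocks. The filter is more than two orders of magnitude smaller, at the cost of about 1% of non-duplicate writes taking the slow dedup path anyway.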