Bloom filters are a great fit for this :-)
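
A minimal sketch of the write-path routing Nico describes below (all names here are hypothetical, and a real implementation would want a scalable filter persisted on disk, not this fixed-size in-memory one):

```python
import hashlib

class BloomFilter:
    """Simple fixed-size Bloom filter over block hashes.
    (A production version would be scalable and on-disk,
    per Nico's proposal; this is just the core idea.)"""
    def __init__(self, m_bits=1 << 20, k=4):
        self.m = m_bits                      # number of bits in the filter
        self.k = k                           # number of hash probes per item
        self.bits = bytearray(m_bits // 8)

    def _indexes(self, digest):
        # Derive k bit positions from the block's hash digest.
        for i in range(self.k):
            h = hashlib.sha256(digest + bytes([i])).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, digest):
        for idx in self._indexes(digest):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, digest):
        # May return false positives, never false negatives.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(digest))


def write_block(block, bloom):
    """Route a write: dedup path only for blocks the filter has
    (probably) seen before; otherwise the fast non-dedup path."""
    digest = hashlib.sha256(block).digest()
    if bloom.might_contain(digest):
        return "dedup"       # hash (probably) seen before: use the DDT path
    bloom.add(digest)        # first sighting: remember it...
    return "non-dedup"       # ...and take the fast path

```

Note this reproduces the property Nico points out: the first copy of a duplicate block goes down the non-dedup path, so two copies of any deduplicatious block end up stored, with only later copies hitting the DDT.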

  -- richard



On Jan 19, 2013, at 5:59 PM, Nico Williams <n...@cryptonector.com> wrote:

> I've wanted a system where dedup applies only to blocks being written
> that have a good chance of being dups of others.
> 
> I think one way to do this would be to keep a scalable Bloom filter
> (on disk) into which one inserts block hashes.
> 
> To decide if a block needs dedup one would first check the Bloom
> filter, then if the block is in it, use the dedup code path, else the
> non-dedup codepath and insert the block in the Bloom filter.  This
> means that the filesystem would store *two* copies of any
> deduplicatious block, with one of those not being in the DDT.
> 
> This would allow most writes of non-duplicate blocks to be faster than
> normal dedup writes, but still slower than normal non-dedup writes:
> the Bloom filter will add some cost.
> 
> The nice thing about this is that Bloom filters can be sized to fit in
> main memory, and will be much smaller than the DDT.
> 
> It's very likely that this is a bit too obvious to just work.
> 
> Of course, it is easier to just use flash.  It's also easier to just
> not dedup: the most highly deduplicatious data (VM images) is
> relatively easy to manage using clones and snapshots, to a point
> anyway.
> 
> Nico
> --
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss