[EMAIL PROTECTED] wrote on 12/06/2007 09:58:00 AM:

> On Dec 6, 2007 1:13 AM, Bakul Shah <[EMAIL PROTECTED]> wrote:
>
> > Note that I don't wish to argue for/against zfs/billtodd but
> > the comment above about "no *real* opensource software
> > alternative zfs automating checksumming and simple
> > snapshotting" caught my eye.
> >
> > There is an open source alternative for archiving that works
> > quite well. venti has been available for a few years now.
> > It runs on *BSD, linux, macOS & plan9 (its native os). It
> > uses strong crypto checksums, stored separately from the data
> > (stored in the pointer blocks) so you get a similar guarantee
> > against silent data corruption as ZFS.
>
> Last time I looked into Venti, it used content hashing to
> locate storage blocks. Which was really cool, because (as
> you say) it magically consolidates blocks with the same checksum
> together.
>
> The 45 byte score is the checksum of the top of the tree, isn't that
> right?
>
> Good to hear it's still alive and been revamped somewhat.
>
> ZFS snapshots and clones save a lot of space, but the
> 'content-hash == address' trick means you could potentially save
> much more.
>
> Though I'm still not sure how well it scales up -
> Bigger working set means you need longer (more expensive) hashes
> to avoid a collision, and even then its not guaranteed.
>
> When i last looked they were still using SHA-160
> and I ran away screaming at that point :)

The choice of hash is close to inconsequential as long as you perform
collision checks and the collision rate is "low". Branching on hash key
collisions is pretty easy and has been used for decades (see Perl's
collision forking for hash variable key collisions for an example). The
process is: look up the key and verify that the data matches. If it
does, increment the ref count, store, and go; if it doesn't, split out
a sub key, store, and go.

There are "cost" curves for both the hashing and the data-matching
portions. As the number of hash matches goes up, so does the cost of
verifying the data. But no matter what hash you use (assuming it
carries at least one bit less information than the original data),
collisions _will_ be possible, so the verify step must exist.

-Wade
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
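
For illustration only, here is a rough Python sketch of the lookup / verify / sub-key process described in the reply above. It is not how Venti or ZFS implement anything; the class name DedupStore, the hash_bytes knob (which truncates a SHA-1 digest so that collisions are easy to provoke), and the (key, sub key) address format are all assumptions made up for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store that verifies data on every hash match."""

    def __init__(self, hash_bytes=20):
        # Hypothetical knob: keep only the first hash_bytes of the SHA-1
        # digest. Small values make collisions common, for demonstration.
        self.hash_bytes = hash_bytes
        # key -> list of [data, refcount]; the list index acts as the sub key.
        self.buckets = {}

    def _key(self, data):
        return hashlib.sha1(data).digest()[:self.hash_bytes]

    def put(self, data):
        """Store a block and return its (key, sub key) address."""
        key = self._key(data)
        chain = self.buckets.setdefault(key, [])
        for sub, entry in enumerate(chain):
            if entry[0] == data:   # hash matched: verify the actual data
                entry[1] += 1      # true duplicate, so bump the ref count
                return key, sub
        # Either a new key or a collision: split out a new sub key.
        chain.append([data, 1])
        return key, len(chain) - 1

    def get(self, addr):
        key, sub = addr
        return self.buckets[key][sub][0]

    def refcount(self, addr):
        key, sub = addr
        return self.buckets[key][sub][1]


if __name__ == "__main__":
    # A 1-byte key has only 256 values, so 300 distinct blocks must collide.
    store = DedupStore(hash_bytes=1)
    addrs = [store.put(b"block-%d" % i) for i in range(300)]
    collided = sum(1 for _, sub in addrs if sub > 0)
    print("blocks stored under a colliding key:", collided)
```

Each put() compares against every block already stored under the same key, so the verify cost grows with the number of hash matches. A wider hash shortens those chains, but the comparison itself cannot be dropped.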