On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
> Bady, Brant RBCM:EX wrote:
> >Part of the archiving process is to generate checksums (I happen to use
> >MD5), and store them with other metadata about the digital object in
> >order to verify data integrity and demonstrate the authenticity of the
> >digital object over time.
> 
> >Wouldn't it be helpful if there were a utility to access/read the
> >checksum data created by ZFS, and use it for those same purposes.
> 
> Doesn't ZFS use block-level checksums?

Yes, but each block's checksum is stored in the pointer to that block,
not in the block itself.

So for each file/directory there's a dnode; the dnode holds several
block pointers to data blocks or indirect blocks, and indirect blocks
in turn hold pointers to further blocks, and so on down the tree.
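
A rough sketch of the layout (types and names invented here for
illustration; the real structures live in the ZFS source):

    #include <stdint.h>

    typedef uint64_t dva_t;                    /* disk virtual address */
    typedef struct { uint64_t w[4]; } cksum_t; /* 256-bit checksum */

    /* A block's checksum is stored in the *pointer* to that block,
     * not in the block itself: */
    typedef struct blkptr {
            dva_t   dva;        /* where the child block is on disk */
            cksum_t checksum;   /* checksum of the child's contents */
    } blkptr_t;

    typedef struct dnode {
            /* ... object meta-data: size, times, etc. ... */
            blkptr_t blkptr[3]; /* to data blocks or indirect blocks */
    } dnode_t;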

If a bit of data in a file changes, a new data block is written, and
the pointer to the previous block is updated, either in the indirect
block that pointed to it or in the dnode itself if there was no
indirect block.  Each indirect block (and dnode) modified this way is
itself written out as a new block.  All in one transaction.  That's
how COW works.

And this will necessarily change any checksum of the dnode itself
(assuming there are no collisions in the checksum algorithm).
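
In pseudo-C (block_t and all the helpers below are invented for
illustration, building on the sketch above; they are not real ZFS
routines), the write path looks roughly like this:

    typedef struct block block_t;
    struct block {
            block_t *parent;  /* NULL if the dnode points here directly */
            int      slot;    /* index in the parent's blkptr array */
            dva_t    dva;     /* where this copy was written */
            /* indirect blocks also carry an array of blkptr_t */
    };

    block_t *write_new_block(const void *data);   /* never overwrites */
    block_t *copy_block(const block_t *b);
    cksum_t  checksum_of(const block_t *b);
    void     set_blkptr(block_t *parent, int slot, const block_t *child);
    void     commit_txg(void);                    /* one transaction */

    /* Changing one data block ripples new blocks, and new checksums,
     * all the way up to the dnode. */
    void
    cow_write(dnode_t *dn, block_t *leaf, const void *new_data)
    {
            block_t *child = write_new_block(new_data); /* old kept */
            block_t *b;

            for (b = leaf->parent; b != NULL; b = b->parent) {
                    block_t *copy = copy_block(b);  /* COW the parent */
                    set_blkptr(copy, child->slot, child);
                    child = copy;
            }
            /* The dnode's own block pointer (and checksum) change too. */
            dn->blkptr[child->slot].dva = child->dva;
            dn->blkptr[child->slot].checksum = checksum_of(child);
            commit_txg();                           /* all or nothing */
    }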

So a checksum of a dnode captures the entire file's contents and
meta-data.  Merely reading the file updates its atime, which in turn
changes that checksum.  ZFS could export one dnode checksum that
covers only the data, and another that covers both data and meta-data.

Of course, a filesystem "scrub" (if one is implemented, but I think it
will be necessary) would change all such checksums even though the
data itself is unchanged.  So these checksums may not have the
stability that archiving needs.

> >Hoping to see something like that in a future release, or a command line
> >utility that could do the same.
> 
> It might be possible to add a user-set property to a file with the
> md5sum and a timestamp when it was computed.

That would be slow.

> But what would this protect against?  If you need to avoid tampering, you
> need the checksums offline anyway - cf. tripwire.

ZFS can very quickly compute a checksum of a file's data by
checksumming the top-level block pointers in the file's dnode, or of
the data and meta-data together by checksumming the entire dnode.
That's O(1), no matter how large the file is.  That'd be nice indeed!
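
As code (again building on the sketch above; checksum_buf() and these
entry points are hypothetical, no such interface exists today):

    #include <stddef.h>

    cksum_t checksum_buf(const void *buf, size_t len);

    /* Hash the checksums already sitting in the dnode's top-level
     * block pointers: the cost is independent of the file's size. */
    cksum_t
    file_data_cksum(const dnode_t *dn)
    {
            return (checksum_buf(dn->blkptr, sizeof (dn->blkptr)));
    }

    /* Hash the whole dnode: covers meta-data (size, times) as well. */
    cksum_t
    file_full_cksum(const dnode_t *dn)
    {
            return (checksum_buf(dn, sizeof (*dn)));
    }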

But because of the semantics for when such checksums can change (see
above), ZFS checksums can only be used to detect the possibility of
change, and so there may be false positives, IMO.  Which means that
for tamper detection one would need to compute a checksum of the file
contents and then store it alongside the ZFS checksum, using the ZFS
checksum only as an optimization that avoids re-checksumming the
entire file most of the time.
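
Something like this (all names hypothetical, continuing the sketches
above), where the expensive full read happens only when the cheap
check fails:

    typedef struct { uint8_t d[16]; } md5_t;

    int      cksum_equal(cksum_t a, cksum_t b);
    int      md5_equal(md5_t a, md5_t b);
    md5_t    md5_of_file(const char *path);   /* reads every byte */
    dnode_t *dnode_of(const char *path);

    /* Returns nonzero if the file still matches the checksums we
     * stored at archive time. */
    int
    verify_file(const char *path, cksum_t stored_zfs, md5_t stored_md5)
    {
            /* Cheap O(1) test first: an unchanged ZFS checksum means
             * unchanged contents (barring collisions). */
            if (cksum_equal(file_data_cksum(dnode_of(path)), stored_zfs))
                    return (1);

            /* The ZFS checksum differs, but that may be a false
             * positive (e.g. after a scrub), so fall back to the
             * full-file MD5 before declaring tampering. */
            return (md5_equal(md5_of_file(path), stored_md5));
    }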

Nico