On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
> Bady, Brant RBCM:EX wrote:
> >Part of the archiving process is to generate checksums (I happen to
> >use MD5), and store them with other metadata about the digital object
> >in order to verify data integrity and demonstrate the authenticity of
> >the digital object over time.
>
> >Wouldn't it be helpful if there was a utility to access/read the
> >checksum data created by ZFS, and use it for those same purposes.
>
> Doesn't ZFS use block-level checksums?
Yes, but the checksum is stored with the pointer.

So then, for each file/directory there's a dnode, and that dnode has
several block pointers to data blocks or indirect blocks, and indirect
blocks have pointers to... and so on.

If a bit of data in a file changes, then a new block will be written,
and the pointer to the previous block will be changed in the indirect
block that pointed to it (or in the dnode itself, if there was no
indirect block), and so on, with a new block written for each indirect
block and dnode so modified.  All in one transaction.  That's how COW
works.

And this will necessarily change any checksum of the dnode itself
(assuming there are no collisions in the checksum algorithm).

So, a checksum of a dnode will capture the entire file's contents and
meta-data.  Read from the file, update the atime, and so change its
checksum.

ZFS could export a dnode checksum that covers only the data, and
another that covers both data and meta-data.

Of course, a filesystem "scrub" (if one is implemented, but I think it
will be necessary) would change all such checksums.  So these checksums
may not have the desired property.

> >Hoping to see something like that in a future release, or a command
> >line utility that could do the same.
>
> It might be possible to add a user-set property to a file with the
> md5sum and a timestamp when it was computed.

That would be slow.

> But what would this protect against? If you need to avoid tampering,
> you need the checksums offline anyway - cf. tripwire.

ZFS can very quickly compute a checksum of a file's data by
checksumming all the top-level block pointers in the file's dnode, or
of the data and meta-data by checksumming the entire dnode.  That's
O(1), no matter how large the file.

That'd be nice indeed!  But because of the semantics for when such
checksums can/could change (see above), ZFS checksums can only be used
to detect the possibility of change, and so there may be false
positives, IMO.

Which means that for tamper detection one would need to compute a
checksum of the file contents and then store it and the ZFS checksum
together, using the ZFS checksum only as a way to avoid checksumming
the entire file most of the time (see the sketch below).

Nico
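A minimal sketch in Python of that two-checksum scheme, assuming a
hypothetical get_zfs_dnode_checksum() call; ZFS exports no such
interface today, which is exactly what this thread is asking for:

    import hashlib

    def get_zfs_dnode_checksum(path):
        # HYPOTHETICAL: stands in for a future ZFS interface returning
        # the checksum over a file's dnode (or just its data block
        # pointers).  No such interface exists as of this writing.
        raise NotImplementedError

    def full_content_checksum(path):
        # The slow path: hash every byte of the file's contents.
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return h.hexdigest()

    def record(path):
        # Store both checksums together in the archive's metadata.
        return {'zfs': get_zfs_dnode_checksum(path),
                'md5': full_content_checksum(path)}

    def verify(path, stored):
        # Fast path: an unchanged ZFS checksum means the data cannot
        # have changed (barring collisions), so skip rehashing.
        if get_zfs_dnode_checksum(path) == stored['zfs']:
            return True
        # A changed ZFS checksum only signals the *possibility* of
        # change (e.g. after a scrub), so fall back to the full MD5
        # before declaring tampering.
        return full_content_checksum(path) == stored['md5']

The ZFS checksum is just a fast-path hint here; the MD5 (kept offline,
per Henk's tripwire point) remains the authority.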