Robert Milkowski <mi...@task.gda.pl> writes:

> On 13/12/2009 20:51, Steve Radich, BitShop, Inc. wrote:
>> Because if you can de-dup anyway why bother to compress THEN check?
>> This SEEMS to be the behaviour - i.e. I would suspect many of the
>> files I'm writing are dups - however I see high cpu use even though
>> on some of the copies I see almost no disk writes.
>
> First, the checksum is calculated after compression happens.
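In other words, the write path is roughly "compress, then checksum the
compressed block, then look that checksum up in the dedup table". A toy
sketch of that ordering, just to make it concrete -- this is not the
actual ZIO pipeline, and zlib/SHA-256 here are only stand-ins for lzjb
and the checksums ZFS really uses:

  import hashlib
  import zlib

  # checksum-of-compressed-block -> stored (compressed) block
  dedup_table = {}

  def write_block(data: bytes) -> bytes:
      compressed = zlib.compress(data)            # 1. compress first
      key = hashlib.sha256(compressed).digest()   # 2. checksum the *compressed* bytes
      if key in dedup_table:                      # 3. the dedup lookup uses that checksum,
          return dedup_table[key]                 #    so even a dup pays for a compression
      dedup_table[key] = compressed               # new block: "write" it
      return compressed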
For some reason I, like Steve, thought the checksum was calculated on the
uncompressed data, but a look in the source confirms you're right, of course.

Thinking about the consequences of changing it: RAID-Z recovery would be much
more CPU intensive if hashing were done on uncompressed data. Every possible
combination of the N-1 disks would have to be decompressed (and most
combinations would fail to decompress at all), and *then* the remaining
candidates would be hashed to see whether the data is correct. This search
happens per record (recordsize), not per stripe, which means reconstruction
would fail if two disk blocks (512 octets) on different disks and in
different stripes went bad; doing an exhaustive search over all possible
permutations to handle that case doesn't seem realistic. In addition, hashing
itself becomes slightly more expensive, since more data needs to be hashed.

Overall, my guess is that this choice (made before dedup!) will give worse
performance in the normal case in the future, when dedup+lzjb will be very
common: with the checksum computed after compression, every block has to be
compressed just to obtain the checksum for the dedup lookup, even when it
turns out to be a duplicate that never reaches disk. The gain is a faster and
more reliable resilver. In any case, there is not much to be done about it
now.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
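P.S. To make the cost difference during reconstruction concrete, here is
another toy sketch in the same spirit -- again not real ZFS code; zlib and
SHA-256 stand in for lzjb and the real checksums, and the "candidates" are
just the possible reconstructions of one record, one per guess about which
disk returned bad data:

  import hashlib
  import zlib

  def verify_compressed_checksum(candidate: bytes, expected: bytes) -> bool:
      # Current behaviour: the checksum covers the compressed record, so a
      # reconstruction candidate can be rejected with a single hash.
      return hashlib.sha256(candidate).digest() == expected

  def verify_uncompressed_checksum(candidate: bytes, expected: bytes) -> bool:
      # Hypothetical behaviour: the checksum covers the uncompressed record,
      # so every candidate has to go through the decompressor first (and most
      # candidates are not even a valid compressed stream).
      try:
          data = zlib.decompress(candidate)
      except zlib.error:
          return False
      return hashlib.sha256(data).digest() == expected

  def reconstruct(candidates, expected, verify):
      # Try each possible reconstruction of one record until one matches the
      # record's checksum; with per-record verification there is one pass
      # like this for every record being resilvered.
      for candidate in candidates:
          if verify(candidate, expected):
              return candidate
      return None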