On 13/12/2009 20:51, Steve Radich, BitShop, Inc. wrote:
I enabled compression on a zfs filesystem with compression=gzip-9, i.e. fairly
slow compression; this filesystem stores backups of databases (which compress
fairly well).
The next question is: Is the checksum on disk based on the uncompressed data
(which seems more likely to be recoverable) or on the compressed data (which
seems slightly less likely to be recoverable)?
Why?
Because if you can de-dup anyway, why bother to compress THEN check? This SEEMS
to be the behaviour: I suspect many of the files I'm writing are dups, yet I
see high CPU use even though some of the copies generate almost no disk
writes.
If the dup check logic happened first AND the block was a duplicate, I should
see hardly any CPU use (because the data would never need to be compressed).
First, the checksum is calculated after compression happens.
If both compression and dedup are enabled for a given dataset, then zfs will
first compress the data, calculate the checksum, and then dedup it.
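A quick way to see that ordering in practice (pool and dataset names below are
my own assumptions, not from this thread): since the dedup checksum is taken
over the compressed on-disk blocks, identical logical data stored under
different compression settings will not dedup against itself, while identical
data under the same setting will.

    zfs create -o dedup=on -o compression=gzip-9 tank/demo
    seq 1 100000 > /tmp/sample      # compressible test payload
    cp /tmp/sample /tank/demo/a     # stored gzip-9 compressed
    cp /tmp/sample /tank/demo/b     # same bytes, same setting: DDT hit,
                                    #  but gzip-9 still runs before the lookup
    zfs set compression=off tank/demo
    cp /tmp/sample /tank/demo/c     # same data stored raw: different on-disk
                                    #  bytes, different checksum, no DDT hit
    sync; zpool list tank           # watch the DEDUP column

Note the second copy: it dedups (almost no disk writes) yet still burns CPU
compressing the blocks before the DDT lookup, which is exactly the behaviour
you are seeing.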
This makes perfect sense: if you have data which is very compressible and the
unique set is big enough for compression to be worthwhile, it pays to use both
features together.
If you don't want the compression overhead while using dedup, just disable it.
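For example, on the dataset from the original post (name assumed here):

    zfs set compression=off tank/backups   # stop paying the gzip-9 CPU cost
    zfs set dedup=on tank/backups          # dedup alone, checksums over raw blocks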
--
Robert Milkowski
http://milek.blogspot.com