On 13/12/2009 20:51, Steve Radich, BitShop, Inc. wrote:
I enabled compression on a ZFS filesystem with compression=gzip-9 - i.e. fairly 
slow compression. This filesystem stores backups of databases (which compress 
fairly well).

The next question is: is the checksum on disk based on the uncompressed data 
(which seems more likely to be recoverable) or on the compressed data (which 
seems slightly less likely to be recoverable)?

Why?

Because if the block can be deduped anyway, why bother to compress it and THEN 
check? That SEEMS to be the behaviour - i.e. I suspect many of the files I'm 
writing are duplicates, yet I see high CPU use even though on some of the 
copies I see almost no disk writes.

If the dup-check logic happened first AND the block were a duplicate, I should 
see hardly any CPU use (because it wouldn't need to compress the data).



First, the checksum is calculated after compression happens.

If both compression and dedup are enabled for a given dataset, then ZFS will first compress the data, calculate the checksum, and then dedup it.
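The pipeline order described above can be sketched in a few lines of Python. This is not ZFS code - the block size, compressor, and hash are stand-ins - but it shows why a duplicate write still burns CPU: compression must finish before the checksum exists for the dedup lookup.

```python
import hashlib
import zlib

# Minimal sketch (NOT real ZFS internals) of the write-pipeline order:
# 1. compress the block, 2. checksum the *compressed* bytes,
# 3. dedup on that checksum.
store = {}       # checksum -> compressed block ("on-disk" unique blocks)
block_ptrs = []  # what a file's block pointers would reference

def write_block(data: bytes) -> str:
    compressed = zlib.compress(data, 9)                 # step 1: compress (CPU cost paid here)
    checksum = hashlib.sha256(compressed).hexdigest()   # step 2: checksum compressed data
    if checksum not in store:                           # step 3: dedup lookup
        store[checksum] = compressed                    # only unique blocks are stored
    block_ptrs.append(checksum)
    return checksum

a = write_block(b"database backup page" * 100)
b = write_block(b"database backup page" * 100)  # duplicate: compressed again, but not stored twice
assert a == b and len(store) == 1
```

Note that the second `write_block` call still runs `zlib.compress` even though nothing new is written - which matches the observation of high CPU use with almost no disk writes.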

It makes perfect sense: if your data is very compressible and the unique set is big enough that compression pays off, it makes sense to use them both.

If you don't want to pay for compression while using dedup, just disable it.
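For example (the pool/dataset name tank/backups is hypothetical), the two properties are set independently:

```shell
# Enable dedup but skip the compression cost entirely:
zfs set dedup=on tank/backups
zfs set compression=off tank/backups

# Or keep both; blocks are compressed first, then deduped:
zfs set compression=gzip-9 tank/backups
```

Both properties take effect only for newly written blocks; existing data is left as it was written.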


--
Robert Milkowski
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
