> From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
> > increases the probability of arc/ram cache hit. So dedup allows you
> > to stretch your disk, and also stretch your ram cache. Which also
> > benefits performance.
> 
> Theoretically, yes, but there will be an overhead in cpu/memory that
> can reduce this benefit to a penalty.

That's why a really fast compression algorithm is used in-line, in the hope that 
the time cost of compressing is smaller than the performance gained from it.  
Take, for example, v.42bis and v.44, which were used to accelerate 56K modems.  
(Probably still are, if you actually have a modem somewhere.  ;-)

Nowadays we have much faster communication channels; in fact, with dedup we're 
talking about local disk speed, which is already very fast.  But we also have 
fast processors, and the algorithm in question can be very fast too.

I recently benchmarked lzop, gzip, bzip2, and lzma on some important data on 
our fileserver that I would call "typical."  No matter what I did, lzop was so 
ridiculously lightweight that I could never drive it to 100% CPU.  Even 
reading data entirely from cache and filtering it through lzop to /dev/null, 
the kernel overhead of reading the RAM cache was higher than the CPU overhead 
of compressing.

For the data in question, lzop compressed to 70% of the original size, gzip to 
42%, bzip2 to 32%, and lzma to something like 16%.  bzip2 was the slowest (by 
a factor of 4).  lzma -1 and gzip --fast were closely matched in speed but not 
in compression.  So lzop's compression was really weak for this data, but it 
added no significant CPU overhead.  The point is: it's absolutely possible to 
compress quickly and gain performance, if you have a fast algorithm.  I'm 
boldly assuming dedup is similarly fast.  It would be nice to actually measure 
and prove it.
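
In case anyone wants to repeat this kind of measurement, here is a rough 
sketch of the benchmark idea in Python (standard library only, so lzop is 
missing, and the level choices only approximate the flags I used; the input 
file is just a placeholder, not our actual data):

    #!/usr/bin/env python3
    # Rough sketch: compare wall-clock compression time and ratio using
    # Python's built-in codecs.  lzop is not in the stdlib, so only
    # gzip/bzip2/lzma are shown.
    import bz2, gzip, lzma, sys, time

    def bench(name, compress, data):
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        print("%-6s %7.2f s  %5.1f%% of original" %
              (name, elapsed, 100.0 * len(out) / len(data)))

    data = open(sys.argv[1], "rb").read()   # a representative file
    bench("gzip",  lambda d: gzip.compress(d, compresslevel=1), data)  # ~ gzip --fast
    bench("bzip2", lambda d: bz2.compress(d, 9), data)
    bench("lzma",  lambda d: lzma.compress(d, preset=1), data)         # ~ lzma -1

My numbers above came from the command-line tools run against the real data, 
so treat this script only as a starting point.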
