> From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
>
> > increases the probability of arc/ram cache hit. So dedup allows you
> > to stretch your disk, and also stretch your ram cache. Which also
> > benefits performance.
>
> Theoretically, yes, but there will be an overhead in cpu/memory that
> can reduce this benefit to a penalty.
That's why a really fast compression algorithm is used in-line, in hopes that the time cost of compression is smaller than the performance gain of compression. Take, for example, v.42bis and v.44, which were used to accelerate 56K modems. (Probably still are, if you actually have a modem somewhere. ;-) Nowadays we have faster communication channels; in fact, when talking about dedup we're talking about local disk speed, which is really fast. But we also have fast processors, and the algorithm in question can be really fast.

I recently benchmarked lzop, gzip, bzip2, and lzma on some important data on our fileserver that I would call "typical." No matter what I did, lzop was so ridiculously lightweight that I could never get it up to 100% cpu. Even reading data 100% from cache and filtering through lzop to /dev/null, the kernel overhead of reading the ram cache was higher than the cpu overhead of compressing. For the data in question, lzop compressed to 70%, gzip to 42%, bzip2 to 32%, and lzma to something like 16%. bzip2 was the slowest (by a factor of 4). lzma -1 and gzip --fast were closely matched in speed but not in compression. So lzop's compression was really weak for the data in question, but it contributed no significant cpu overhead.

The point is: it's absolutely possible to compress quickly, if you have a fast algorithm, and gain performance. I'm boldly assuming dedup performs this fast. It would be nice to actually measure and prove it.
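For anyone who wants to repeat that comparison on their own data, here is a rough Python sketch of the kind of measurement I mean. The sample path and the exact compressor flags are placeholders I made up for illustration; point it at whatever you consider "typical" on your pool.

#!/usr/bin/env python3
# Sketch: compare compression ratio and throughput of a few compressors
# on one sample file.  Needs lzop, gzip, bzip2 and lzma in $PATH.
import os
import subprocess
import time

SAMPLE = "/tank/sample.dat"              # placeholder: pick a "typical" file
COMPRESSORS = [
    ("lzop",  ["lzop",  "-c"]),
    ("gzip",  ["gzip",  "-c", "--fast"]),
    ("bzip2", ["bzip2", "-c"]),
    ("lzma",  ["lzma",  "-c", "-1"]),
]

orig = os.path.getsize(SAMPLE)

# Read the file once first so it is hot in the ARC/page cache and we are
# timing compression, not disk reads.
with open(SAMPLE, "rb") as f:
    while f.read(1 << 20):
        pass

for name, cmd in COMPRESSORS:
    start = time.time()
    with open(SAMPLE, "rb") as src:
        out = subprocess.run(cmd, stdin=src, stdout=subprocess.PIPE).stdout
    elapsed = time.time() - start
    print("%-6s %5.1f%% of original, %7.1f MB/s" %
          (name, 100.0 * len(out) / orig, orig / (1 << 20) / elapsed))

To reproduce the /dev/null test above, you can also just time "lzop -c < sample > /dev/null" and watch cpu usage in prstat while it runs.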