David Magda wrote:
> On Tue, June 16, 2009 15:32, Kyle McDonald wrote:
>> So the cache saves not only the time to access the disk but also the CPU
>> time to decompress. Given this, I think it could be a big win.
>
> Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video
> editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are
> probably some of the largest files in most people's homedirs nowadays.
>
> 1 GB of e-mail is a lot (probably my entire personal mail collection for a
> decade) and will compress well; 1 GB of audio files is nothing, and won't
> compress at all.
>
> Perhaps compressing /usr could be handy, but why bother enabling
> compression if the majority (by volume) of user data won't do anything but
> burn CPU?
>
> So the correct answer on whether compression should be enabled by default
> is "it depends". (IMHO :) )
The performance tests I've found almost universally show LZJB as not
being CPU-bound on recent equipment. A few years from now, GZIP may
also stop being CPU-bound. Since performance tests on current hardware
show that enabling LZJB improves overall performance, it would make
sense to enable it by default. In the future, when GZIP is no longer
CPU-bound, it might become the default (or there could be another
algorithm). There is a long history of formidable tasks starting out
CPU-bound but quickly becoming 'easily handled in the background'
tasks. Decoding MP3, MPEG1, and MPEG2 (DVD resolutions), softmodems
(and other host signal processor devices), and RAID can all be handled
easily by recent equipment.
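
One rough way to check the CPU-bound question on a given machine is to
time a compressor and compare it with the disk's bandwidth. The sketch
below uses Python's zlib at levels 1 and 9 purely as stand-ins (LZJB is
not in the Python standard library, and zlib level 1 is not equivalent
to it). The 80 MB/s disk figure, the sample data, and the function names
are assumptions for illustration, not measurements.

```python
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int) -> Tuple[float, float]:
    """Return (compression ratio, compression throughput in MB/s)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
    ratio = len(data) / len(packed)
    throughput = len(data) / elapsed / 1e6
    return ratio, throughput


def is_cpu_bound(compress_mb_s: float, disk_mb_s: float) -> bool:
    """True if the compressor cannot keep up with the raw disk rate.

    In that case writes with compression enabled are slower than
    uncompressed writes, whatever the compression ratio is.
    """
    return compress_mb_s < disk_mb_s


if __name__ == "__main__":
    sample = b"zfs-discuss mailing list archive line\n" * 200_000
    disk_mb_s = 80.0  # assumed sequential disk bandwidth, not a measurement
    for level in (1, 9):
        ratio, mb_s = measure(sample, level)
        print(level, round(ratio, 1), round(mb_s),
              is_cpu_bound(mb_s, disk_mb_s))
```

Highly repetitive sample data like this flatters any compressor, so
real numbers should come from data that resembles the actual pool.
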
Another option/idea to consider is using LZJB as the default compression
method, and then performing a background scrub-recompress during
otherwise idle times. Technique ideas:
1.) A performance-neutral/performance-enhancing technique: use any
algorithm that is not CPU-bound on your hardware and that rarely, if
ever, performs worse than the uncompressed state.
2.) Adaptive technique 1: rarely used blocks could be given the
strongest compression (using an algorithm tuned for the data type
detected), while frequently used blocks would be compressed at
performance-neutral or performance-improving levels.
3.) Adaptive technique 2: rarely used blocks could be given the
strongest compression (using an algorithm tuned for the data type
detected), while frequently used blocks would be compressed at
performance-neutral or performance-improving levels. As the storage
device gets closer to its native capacity, start applying compression
both proactively (to new data) and retroactively (to old data),
progressively using more powerful compression techniques as the maximum
native capacity is approached. Compression could delay users from
reaching the 80-95% capacity point where system performance curves often
have their knees (performance degrades sharply with each additional
unit stored).
4.) Maximize space technique: detect the data type and use the best
available algorithm for the block.
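
The four techniques above can be sketched as a single selection
function. This is only an illustration of the idea, not how ZFS chooses
algorithms: the Block fields, the thresholds, the "jpeg-specific" name,
and the linear mapping from pool fullness to a gzip level are all
hypothetical. Only the names lzjb and gzip come from this thread.

```python
from dataclasses import dataclass

# "lzjb" and "gzip" are named in the thread; the type-specific entry is a
# hypothetical placeholder for a compressor tuned to a detected data type.
FAST = "lzjb"
STRONG_BY_TYPE = {"jpeg": "jpeg-specific"}
STRONG_DEFAULT = "gzip-9"


@dataclass
class Block:
    reads_per_day: float  # hypothetical access-frequency counter
    data_type: str        # hypothetical detected type, e.g. "text" or "jpeg"


def strongest_for(data_type: str) -> str:
    """Strongest known algorithm for the detected data type."""
    return STRONG_BY_TYPE.get(data_type, STRONG_DEFAULT)


def pressure_level(pool_used: float, start: float = 0.8) -> int:
    """Map pool fullness in [start, 1.0] to gzip level 1..9; 0 below start."""
    if pool_used < start:
        return 0
    frac = min((pool_used - start) / (1.0 - start), 1.0)
    return 1 + round(frac * 8)


def choose(block: Block, technique: int, pool_used: float = 0.0,
           hot_threshold: float = 1.0) -> str:
    """Pick an algorithm name for a block under techniques 1-4."""
    if technique == 1:                      # always performance-neutral
        return FAST
    if technique == 4:                      # always maximize space
        return strongest_for(block.data_type)
    if technique not in (2, 3):
        raise ValueError("technique must be 1, 2, 3 or 4")
    if block.reads_per_day < hot_threshold:  # rarely used block
        return strongest_for(block.data_type)
    if technique == 3:                      # hot block, capacity-aware
        level = pressure_level(pool_used)
        if level:
            return f"gzip-{level}"
    return FAST
```

For example, choose(Block(50.0, "text"), 3, pool_used=1.0) gives
"gzip-9", while the same block on a half-empty pool stays on "lzjb".
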
As a counterpoint, if drive capacities keep growing at their current
pace it seems they ultimately risk obviating the need to give much
thought to the compression algorithm, except to choose one that boosts
system performance. (I.e. in hard drives, compression may primarily be
used to improve performance rather than gain extra storage space, as
drive capacity has grown many times faster than drive performance.)
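
The performance argument can be put as a back-of-the-envelope model:
the disk delivers compressed bytes, so the logical read rate is the
disk rate multiplied by the compression ratio, capped by how fast the
CPU can decompress. The function name and the example figures below are
assumptions for illustration only.

```python
def effective_read_mb_s(disk_mb_s: float, ratio: float,
                        decompress_mb_s: float) -> float:
    """Logical (uncompressed) MB/s seen by the reader.

    disk_mb_s       -- raw disk bandwidth for compressed bytes
    ratio           -- uncompressed size / compressed size (1.0 = no gain)
    decompress_mb_s -- decompressor output rate in uncompressed MB/s
    """
    return min(disk_mb_s * ratio, decompress_mb_s)


if __name__ == "__main__":
    # Assumed figures: 80 MB/s disk, 400 MB/s decompressor.
    print(effective_read_mb_s(80.0, 2.0, 400.0))  # compressible data
    print(effective_read_mb_s(80.0, 1.0, 400.0))  # incompressible data
```

Under this model, data that does not compress (ratio 1.0) reads no
faster than the raw disk, which is the case raised for audio and video
files earlier in the thread.
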
JPEGs often CAN be /losslessly/ compressed further by useful amounts
(e.g. 25% space savings). There is more on this here:
Tests:
http://www.maximumcompression.com/data/jpg.php
http://compression.ca/act/act-jpeg.html
http://www.downloadsquad.com/2008/09/11/winzip-12-supports-lossless-jpg-compression/
http://download.cnet.com/8301-2007_4-10038172-12.html
http://www.online-tech-tips.com/software-reviews/winzip-vs-7-zip-best-compression-method/
These have source code available:
http://sylvana.net/jpeg-ari/
PAQ8R http://www.cs.fit.edu/~mmahoney/compression/ (general info
http://en.wikipedia.org/wiki/PAQ )
This one says source code is "not yet available" (implying it may become
available):
http://www.elektronik.htw-aalen.de/packjpg/packjpg_m.htm
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss