On 02/10/10 10:36, Sean M Clark wrote:
> xz/lzma is another consideration. At moderate compression levels, lzma
> seems to be about the same or slightly faster than bzip2 with a little
> better compression. At lower compression levels it seems like it's
> about as fast as gzip while compressing noticeably farther - at least
> in the small amount of testing I've done so far with the "xz"
> implementation of lzma compression.
I was going to mention xz myself. I just completed some rather more extensive tests. I'm using three example test files here. The first, a 590MB ISO of Windows XP Pro SP3, contains a large amount of already-compressed data, and can be expected to compress poorly. The second, an 8.5MB stripped ELF 32-bit LSB executable, can probably be expected to compress moderately well. The third, an ebook resaved in text format, is about 1.5MB of English ASCII text and should compress very well.

I'm compressing each with gzip default options, gzip -9, bzip2, xz default options, and xz -7. (The xz man page notes that compression settings above 7 are not recommended unless absolute maximum compression is necessary, due to time and memory usage.)

First, the WinXP ISO (whitespace adjusted for clarity):

babylon5:alaric:~:10 $ ls -l winxp.iso
-rw-r----- 1 alaric users 617754624 Feb 10 10:24 winxp.iso

babylon5:alaric:~:11 $ time gzip -c < winxp.iso | dd bs=64K >/dev/null
0+35022 records in
0+35022 records out
573799160 bytes (574 MB) copied, 78.782 s, 7.3 MB/s

real    1m18.935s
user    0m53.804s
sys     0m4.357s

compression: 7.12%
compression/time: 0.0901

babylon5:alaric:~:12 $ time gzip -9 -c < winxp.iso | dd bs=64K >/dev/null
0+35013 records in
0+35013 records out
573652786 bytes (574 MB) copied, 111.185 s, 5.2 MB/s

real    1m51.207s
user    1m11.860s
sys     0m4.905s

compression: 7.14%
compression/time: 0.0643

babylon5:alaric:~:13 $ time bzip2 -c < winxp.iso | dd bs=64K >/dev/null
0+140444 records in
0+140444 records out
575258513 bytes (575 MB) copied, 808.258 s, 712 kB/s

real    13m28.370s
user    10m11.257s
sys     0m6.221s

compression: 6.88%
compression/time: 0.0085

babylon5:alaric:~:14 $ time xz -c < winxp.iso | dd bs=64K >/dev/null
0+69111 records in
0+69111 records out
566328660 bytes (566 MB) copied, 1395.3 s, 406 kB/s

real    23m15.341s
user    17m39.189s
sys     0m9.664s

compression: 8.43%
compression/time: 0.0060

babylon5:alaric:~:15 $ time xz -7 -c < winxp.iso | dd bs=64K >/dev/null
0+69040 records in
0+69040 records out
565609576 bytes (566 MB) copied, 1512.2 s, 374 kB/s

real    25m12.247s
user    19m7.363s
sys     0m10.943s

compression: 8.45%
compression/time: 0.0055

With this poorly compressible data, both gzip and gzip -9 yield better compression than bzip2, with roughly an order of magnitude higher throughput and lower CPU usage. The best compression on this file, by a hair, is achieved by xz -7, with default xz only 0.02% behind but taking 8% less time. The worst compression, 6.88%, comes from bzip2, but bzip2 takes around half the time xz takes to do it, resulting in an actual compression/time score roughly 50% better than xz's. gzip achieves about 1.3% less compression than xz and about 0.25% better than bzip2, but does it 7 to 10 times faster than bzip2 and 12 to 20 times faster than xz. The best compression per unit time score is achieved by default gzip. The worst, xz -7, is an order of magnitude worse than gzip -9 in compression/time and achieves only about 1.3% additional compression.
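(In case anyone wants to repeat this on their own data: the runs above were done by hand, but they amount to something like the rough sketch below, scripted up for convenience. It is an illustration only, not the exact commands I ran; it assumes bash, bc, and GNU stat/date are available, and computes the same two figures I'm quoting - "compression" as the percentage of the original size saved, and the score as that percentage divided by wall-clock seconds.)

#!/bin/bash
# compbench.sh - rough illustration only, not the exact runs quoted above:
# time each compressor reading FILE, count the compressed bytes, then work
# out the compression percentage and the compression/time score.
FILE="$1"
ORIG=$(stat -c %s "$FILE")           # original size in bytes (GNU stat)

for CMD in "gzip -c" "gzip -9 -c" "bzip2 -c" "xz -c" "xz -7 -c"; do
    START=$(date +%s.%N)
    COMP=$($CMD < "$FILE" | wc -c)   # compressed size in bytes
    END=$(date +%s.%N)
    ELAPSED=$(echo "$END - $START" | bc)
    # compression = percentage of the original size saved
    PCT=$(echo "scale=4; (1 - $COMP / $ORIG) * 100" | bc)
    # score = compression percentage per second of wall-clock time
    SCORE=$(echo "scale=4; $PCT / $ELAPSED" | bc)
    echo "$CMD: $COMP bytes, ${ELAPSED}s, compression ${PCT}%, score $SCORE"
done

Run it as, say, ./compbench.sh winxp.iso and it prints one line per compressor.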
Next, the ELF executable.

babylon5:alaric:~:21 $ ls -l mplayer
-rwxr-x--- 1 alaric users 8485168 Feb 10 12:04 mplayer

babylon5:alaric:~:22 $ time gzip -c < mplayer | dd bs=64K >/dev/null
0+230 records in
0+230 records out
3752190 bytes (3.8 MB) copied, 1.26176 s, 3.0 MB/s

real    0m1.266s
user    0m1.032s
sys     0m0.055s

compression: 55.8%
compression/time: 44.075

babylon5:alaric:~:23 $ time gzip -9 -c < mplayer | dd bs=64K >/dev/null
0+228 records in
0+228 records out
3734027 bytes (3.7 MB) copied, 2.76918 s, 1.3 MB/s

real    0m2.779s
user    0m2.119s
sys     0m0.054s

compression: 56%
compression/time: 20.173

babylon5:alaric:~:24 $ time bzip2 -c < mplayer | dd bs=64K >/dev/null
0+880 records in
0+880 records out
3603587 bytes (3.6 MB) copied, 6.41314 s, 562 kB/s

real    0m6.426s
user    0m5.128s
sys     0m0.050s

compression: 57.5%
compression/time: 8.948

babylon5:alaric:~:25 $ time xz -c < mplayer | dd bs=64K >/dev/null
0+362 records in
0+362 records out
2964084 bytes (3.0 MB) copied, 21.0693 s, 141 kB/s

real    0m21.098s
user    0m15.434s
sys     0m0.316s

compression: 65%
compression/time: 3.081

babylon5:alaric:~:26 $ time xz -7 -c < mplayer | dd bs=64K >/dev/null
0+362 records in
0+362 records out
2964084 bytes (3.0 MB) copied, 19.8819 s, 149 kB/s

real    0m19.913s
user    0m15.347s
sys     0m0.301s

compression: 65%
compression/time: 3.264

This is not all that dissimilar a picture. Interestingly, here, default xz and xz -7 achieve identical compression, but xz -7 accomplishes it slightly over a second faster. Both lead bzip2 by about 7.5% in compression, but take around three times as long. gzip -9 achieves only 0.2% better compression than default gzip, but takes more than twice as long to do it. Even gzip -9 is still more than twice as fast as bzip2 and more than seven times faster than xz, and trails xz by only 9% in compression. Vanilla gzip, only a fraction behind gzip -9 in compression, is more than twice as fast as gzip -9 and five times faster than bzip2, and has more than 13 times the compression/time score of the best compressor here, xz -7.

Finally, the text file:

babylon5:alaric:~:31 $ ls -l 1634-The_Baltic_War.txt
-rw-rw---- 1 alaric users 1501227 Feb 10 12:23 1634-The_Baltic_War.txt

babylon5:alaric:~:32 $ time gzip -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+35 records in
0+35 records out
568436 bytes (568 kB) copied, 0.248751 s, 2.3 MB/s

real    0m0.256s
user    0m0.217s
sys     0m0.006s

compression: 62.135%
compression/time: 242.695

babylon5:alaric:~:33 $ time gzip -9 -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+35 records in
0+35 records out
566204 bytes (566 kB) copied, 0.311892 s, 1.8 MB/s

real    0m0.321s
user    0m0.269s
sys     0m0.009s

compression: 62.284%
compression/time: 194.018

babylon5:alaric:~:34 $ time bzip2 -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+101 records in
0+101 records out
412638 bytes (413 kB) copied, 1.12327 s, 367 kB/s

real    0m1.130s
user    0m0.949s
sys     0m0.023s

compression: 72.513%
compression/time: 64.168

babylon5:alaric:~:35 $ time xz -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+55 records in
0+55 records out
444852 bytes (445 kB) copied, 4.89832 s, 90.8 kB/s

real    0m4.917s
user    0m3.809s
sys     0m0.069s

compression: 70.367%
compression/time: 14.311

babylon5:alaric:~:36 $ time xz -7 -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+55 records in
0+55 records out
444852 bytes (445 kB) copied, 4.79776 s, 92.7 kB/s

real    0m4.815s
user    0m3.854s
sys     0m0.095s

compression: 70.367%
compression/time: 14.614

Now this gets interesting.
Default xz and xz -7 still achieve identical compression, and again, xz -7 is actually fractionally faster than default xz. However, on this data, bzip2 beats both in compression by just over 2%, and does it more than four times faster, for a compression/time score about 4.5 times better than xz -7's. This is the first time any of the non-gzip compressors has achieved a *significantly* smaller output than gzip; the bzip2 output file is roughly 28% smaller than gzip's. However, even gzip -9 is still over three times faster than bzip2, and default gzip is four times faster, with a compression/time score 17 times better than xz's.

So, the overall conclusion: Yes, you can achieve some savings in space by using bzip2 or even xz for your compression, if you can afford the additional CPU and memory utilization. But it will probably not be a significant space saving on most data, and it will come at a horrendous cost in terms of CPU usage and actual throughput. For on-the-fly compression of a high-volume stream of mixed data, you're probably best off staying with plain-vanilla gzip, unless you're already I/O bound on your backup writes *AND* you have a massive compute server to do your compression - and even then, you may not gain much unless you use one of the multi-threaded versions of bzip2 or xz that can make use of multiple CPU cores for a single compression task (a quick sketch of such invocations follows below my sig).

-- 
  Phil Stracchino, CDK#2     DoD#299792458   ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
         Renaissance Man, Unix ronin, Perl hacker, Free Stater
                 It's not the years, it's the mileage.
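(For anyone tempted by the multi-threaded route, the sort of invocations I have in mind look roughly like the following. pbzip2 and pigz are separate parallel implementations of bzip2 and gzip, and the -T option only exists in newer xz releases; none of these were benchmarked above, so treat this purely as a pointer, not a measurement.)

# Parallel bzip2 (pbzip2), if installed - splits the input into blocks
# and compresses them on all available cores:
pbzip2 -c < winxp.iso | dd bs=64K >/dev/null

# Parallel gzip (pigz), if installed:
pigz -c < winxp.iso | dd bs=64K >/dev/null

# Newer xz releases can thread internally; -T0 means "use all cores":
xz -T0 -c < winxp.iso | dd bs=64K >/dev/null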