Hi Wols, On Mon, 2011-02-28 at 18:04 +0000, Wols Lists wrote: > > So - personally, I find it amazing that ZIP is better than LZMA - ever, > > it is such an older compression algorithm, and this flies in the face of > > everything I've seen in practise with it [ eg. we use LZMA for our RPMs > > in openSUSE these days AFAIR, better than bzip2, which in turn was > > better than .tgz (essentially zip) ]. > > Why the surprise? How long has Huffman been around - 30 years? 40? And > you CANNOT do better than Huffman (there are other equally as good > algorithms, though).
Ho hum; glad to know Huffman compression is optimal ;-) (personally I'd go for arithmetic compression with a good "guess a libreoffice" model alogorithm to get it down to a handful of bits ;-). Notice - that ZIP is in fact two layered compression algorithm: Lempel Ziv, and then Huffman, it is not clear that this makes it obviously non-optimal but it is certainly somewhat weaker in that regard. #!/usr/bin/perl -w for (my $i = 0; $i < 10000; $i++) { print "$i\n"; } type size plain 48k zip 23k bzip2 11k lzma 3k That is not a very representative input (of course ;-), but perhaps not so far away from the rampant duplication we face. I would (personally) expect to see lzma do better than plain deflate. Of course, you can easily imagine a compression algorithm, that turns the file into just those two lines of perl above ;-) > Having tried to back up a hard disk over ethernet (dd if=/dev/sda > of=//samba/archive.hdi), it was direly slow over a 10Mb card. So I tried > to pipe it through bzip2 to speed up the transfer - it was *even* > *slower* ! bzip2 is not fast; that is certainly so, OTOH it was far better space wise than the deflate it replaced; this is why much source code is still packaged as .bz2 - if you want to bend your mind, read up on the Burrows-Wheeler transform [ FWIW ;-]. > Why do you think LZMA is better than zip - because it's faster? More efficient. Slowness of compression we can cope with easily, as long as decompression is fast: and for most algorithms this balance is easyish to achieve. > (I think we need to add compression time to KAMI's nice table :-) As long as it is within five or so minutes of CPU time on a fast machine, and/or preferably is multi-thread-able [ though our choice of tools and platforms strongly limits what we can use at the cab level anyway ] - compression time is fairly irrelevant. ATB, Michael. -- michael.me...@novell.com <><, Pseudo Engineer, itinerant idiot _______________________________________________ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice