On 28/02/11 21:51, Michael Meeks wrote:
>> Why the surprise? How long has Huffman been around - 30 years? 40? And
>> you CANNOT do better than Huffman (there are other equally as good
>> algorithms, though).
> Ho hum; glad to know Huffman compression is optimal ;-) (personally
> I'd go for arithmetic compression with a good "guess a libreoffice"
> model algorithm to get it down to a handful of bits ;-). Notice that
> ZIP is in fact a two-layered compression algorithm: Lempel-Ziv and
> then Huffman; it is not clear that this makes it obviously
> non-optimal, but it is certainly somewhat weaker in that regard.

Interesting ... I checked on Wikipedia after I posted, and Huffman coding is actually SIXTY years old! It was set as a class project and published in 1952. So zip should be tricky to beat for compression if it actually uses Huffman, although I would expect it to take ages. Your figures imply either (a) zip doesn't do Huffman, or (b) bzip2 and lzma use a much larger input block size.

Was the wink to say you knew it was optimal already? I thought that was common knowledge :-) although reading Wikipedia reminded me that there is a pathological case - lots of identical elements. Given that (I gather) Windows files contain large sections of nulls, Huffman won't like that! That's probably the main time you *do* want double compression - some form of RLE will smooth out any pathological input before it reaches Huffman.

The main reason I suggested adding times to KAMI's table was that it makes the time/compression tradeoff explicit.

Cheers,
Wol
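
P.S. Here is a rough sketch of the "runs of nulls" point, in case it is useful. It is only my own illustration, not code or figures from the thread or from KAMI's table: the 1 MiB buffer of NULs is an arbitrary stand-in for a zero-padded Windows binary, and I am using Python's stock zlib/bz2/lzma modules simply because they are handy. The numbers it prints will be whatever your machine gives; the shape of the result is the point - a symbol-by-symbol Huffman code can never get below one bit per input byte, while an RLE/LZ stage in front collapses the runs before the entropy coder ever sees them.

    # Sketch only: my own illustration of the "runs of nulls" point, not
    # code from the thread.  Buffer size and compression levels are
    # arbitrary choices for the demo.
    import bz2
    import lzma
    import time
    import zlib

    # A block that is nothing but NULs, standing in for the zero-padded
    # sections of a Windows binary.
    data = b"\x00" * (1024 * 1024)

    # A pure symbol-by-symbol Huffman code spends at least 1 bit per
    # input byte, so even on this degenerate input it cannot beat 8:1
    # (and that ignores the cost of shipping the code table).
    huffman_floor = len(data) // 8
    print("pure-Huffman floor : %d bytes (8:1 at best)" % huffman_floor)

    # A trivial run-length pass collapses the whole block into a handful
    # of (byte, run-length) pairs before Huffman would ever see it.
    def rle(buf):
        runs = []
        i = 0
        while i < len(buf):
            j = i
            while j < len(buf) and buf[j] == buf[i]:
                j += 1
            runs.append((buf[i], j - i))
            i = j
        return runs

    print("naive RLE          : %d run(s)" % len(rle(data)))

    # The real compressors all put an LZ/RLE-like stage in front of
    # their entropy coder, so none of them is stuck at the
    # 1-bit-per-symbol floor.
    for name, compress in [
        ("zip/deflate (zlib)", lambda d: zlib.compress(d, 9)),
        ("bzip2", lambda d: bz2.compress(d, 9)),
        ("lzma/xz", lambda d: lzma.compress(d, preset=9)),
    ]:
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        print("%-18s: %8d bytes in %.3fs" % (name, len(out), elapsed))

Timing the three library calls in the same loop is also, in miniature, the time/compression tradeoff I wanted KAMI's table to make explicit.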