"inhahe" <[EMAIL PROTECTED]> writes: > i don't see anybody mentioning huffman encoding. i think it just works per > byte, so it's not as tight as gzip or whatever. but it sounds like it would > be easy to implement and wouldn't require any corpus-wide compression > information. except a character frequency count if you wanted to be optimal.
In principle you could do it over digraphs, but I tried that once and it didn't help much. Basically, -because- it doesn't use any corpus-wide compression information, it doesn't compress anywhere near as well as LZ, DMC, or whatever.
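
For anyone who wants to check the size gap on their own data, here is a minimal sketch, assuming Python 3 and only the standard library. The names huffman_code_lengths and huffman_size_bits are made up for illustration, the Huffman figure ignores the cost of storing the code table, and zlib (an LZ77-plus-Huffman scheme) merely stands in for "LZ"; the sample text is arbitrary and deliberately repetitive.

```python
import heapq
import zlib
from collections import Counter


def huffman_code_lengths(data):
    """Return {byte value: code length in bits} for a per-byte Huffman code."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still costs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_size_bits(data):
    """Total payload size in bits if data were Huffman-coded byte by byte."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    return sum(freq[s] * lengths[s] for s in freq)


# Arbitrary, highly repetitive sample; real corpora will give different ratios.
sample = b"the quick brown fox jumps over the lazy dog " * 200
print("original bytes:", len(sample))
print("huffman bytes (no table):", huffman_size_bits(sample) / 8)
print("zlib bytes:", len(zlib.compress(sample)))
```

Per-byte Huffman only exploits how often each byte occurs, so it cannot take advantage of repeated words or phrases the way an LZ-style compressor does.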