Paul Rubin wrote:
I have a lot of short English strings I'd like to compress in order to
reduce the size of a database.  That is, I'd like a compression
function that takes a string like (for example) "George Washington"

[...]


Thanks.

I think your idea is good. You might want to build an LZ78 encoder in Python (LZ78 is pretty easy), feed it a long English text, and then pickle the resulting object. You could then unpickle it at program start and encode your short strings with it. I bet there's already a working implementation around that does this, but if you can't find one, LZ78 can be implemented in an hour or two. There was a rather good explanation of the algorithm in German; unfortunately it has recently vanished from the net (I have a backup if you're interested).
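
To make the idea concrete, here is a rough sketch of what I mean, not a finished implementation. The function names (train, encode, decode) are made up for illustration, and I'm assuming the dictionary is frozen after training so that every short string is encoded against the same phrase table. The output is a list of (phrase index, next character) pairs; packing those into bits to actually save space is a separate step that isn't shown.

```python
import pickle


def train(text):
    """Run an LZ78 parse over text and return the phrase -> index table."""
    phrases = {"": 0}          # index 0 is the empty phrase
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in phrases:
            current = candidate                 # keep extending the match
        else:
            phrases[candidate] = len(phrases)   # learn a new phrase
            current = ""
    return phrases


def encode(s, phrases):
    """Encode s as (phrase index, next char) pairs using a frozen table."""
    tokens = []
    current = ""
    for ch in s:
        candidate = current + ch
        if candidate in phrases:
            current = candidate
        else:
            tokens.append((phrases[current], ch))
            current = ""
    if current:                                 # string ended inside a phrase
        tokens.append((phrases[current], ""))
    return tokens


def decode(tokens, by_index):
    """Rebuild the string; by_index[i] is the phrase with index i."""
    return "".join(by_index[i] + ch for i, ch in tokens)


def index_table(phrases):
    """Invert the phrase table into a list ordered by index."""
    return sorted(phrases, key=phrases.get)


# Pickle the trained table once, then load it at program start.
blob = pickle.dumps(train("some long english training text " * 50))
table = pickle.loads(blob)
tokens = encode("George Washington", table)
assert decode(tokens, index_table(table)) == "George Washington"
```

How well this does depends entirely on how closely the training text resembles the strings you want to store, so it would be worth measuring on real data before committing to it.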

Cheers,
Thomas.
--
http://mail.python.org/mailman/listinfo/python-list
