[EMAIL PROTECTED] wrote:

> Essentially, they note that the NCD does not always behave like a
> metric and one reason they put forward is that this may be due to the
> size of the header portion (they were using the command line gzip and
> bzip2 programs) compared to the strings being compressed (which are on
> average 48 bytes long).
gzip datastreams have a real header, with a file type identifier,
optional filenames, comments, and a bunch of flags.  but even if you
strip that off (which is basically what happens if you use zlib.compress
instead of gzip), I doubt you'll get representative "compressibility"
metrics on strings that short.  like most other compression algorithms,
these are designed for much larger datasets.

</F>
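To make the header point concrete, here is a minimal sketch that computes the NCD of short byte strings using either gzip framing or raw zlib. It assumes the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length; the quoted paper may use a variant. The function names and the 48-byte sample string are illustrative only. Because a fixed header adds the same constant to every C(.), it cancels in the numerator but inflates the denominator, so it skews the result most when the inputs are tiny.

```python
import gzip
import zlib


def compressed_size(data: bytes, use_gzip: bool = False) -> int:
    """Return the compressed length of data, with gzip framing or raw zlib."""
    if use_gzip:
        # gzip.compress wraps the deflate stream in a 10-byte gzip header
        # and an 8-byte trailer (CRC-32 plus input size).
        return len(gzip.compress(data, compresslevel=9))
    # zlib.compress adds only a 2-byte header and a 4-byte Adler-32 trailer.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes, use_gzip: bool = False) -> float:
    """Normalized compression distance, assuming the standard formula:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x, use_gzip)
    cy = compressed_size(y, use_gzip)
    cxy = compressed_size(x + y, use_gzip)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical 48-byte sample, matching the average length quoted above.
    s = b"the quick brown fox jumps over the lazy dog....."
    print("raw length: ", len(s))
    print("zlib size:  ", compressed_size(s))
    print("gzip size:  ", compressed_size(s, use_gzip=True))
    # For a true metric NCD(s, s) would be 0; compare how far each variant drifts.
    print("NCD(s, s) zlib:", ncd(s, s))
    print("NCD(s, s) gzip:", ncd(s, s, use_gzip=True))
```

On strings this short the compressed output is typically no smaller than the input, which is exactly why the "compressibility" numbers are hard to trust.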