Bjoern Schliessmann wrote:
> Rajarshi wrote:
>
> > Does anybody know how I can remove the header portion of the
> > compressed bytes, such that I only have the compressed data
> > remaining? (Obviously I do not intend to perform the
> > decompression!)
>
> Just curious: What's your goal? :) A home made hash function?

Actually, I was implementing the normalized compression distance (NCD) to evaluate molecular similarity, as described in an article in J. Chem. Inf. Model. (http://dx.doi.org/10.1021/ci600384z, subscriber access only, unfortunately). Essentially, they note that the NCD does not always behave like a metric, and one reason they put forward is the size of the header portion (they were using the command-line gzip and bzip2 programs) relative to the strings being compressed (which are on average 48 bytes long).

So I was interested to see whether the NCD behaves like a metric if I remove everything that is not the compressed string. And since I only need to calculate the similarity between two strings, I do not need to do any decompression.
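
For what it's worth, here is a minimal sketch of one way this could be done with Python's zlib module rather than the command-line tools. It is not the article's procedure. Passing a negative wbits value to zlib.compressobj asks for a raw deflate stream, with no zlib header and no checksum trailer, so the length reflects only the compressed data (plus the few bits of block framing that deflate itself needs). The NCD formula used is the usual definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), which I am assuming matches the article. The function names and the example strings are made up for illustration, and this does not cover bzip2.

```python
import zlib


def raw_deflate_size(data: bytes, level: int = 9) -> int:
    """Length in bytes of the raw deflate stream for data.

    wbits=-15 selects raw deflate output: no 2-byte zlib header and
    no 4-byte Adler-32 trailer are written.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return len(comp.compress(data) + comp.flush())


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, using header-free sizes.

    Assumes the usual definition:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx = raw_deflate_size(x)
    cy = raw_deflate_size(y)
    cxy = raw_deflate_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Illustrative inputs only (SMILES-like strings), not data from the article.
    a = b"CC(=O)OC1=CC=CC=C1C(=O)O"
    b = b"CC(=O)NC1=CC=C(O)C=C1"
    print("header-free size of a:", raw_deflate_size(a))
    print("zlib-wrapped size of a:", len(zlib.compress(a, 9)))
    print("NCD(a, a):", ncd(a, a))
    print("NCD(a, b):", ncd(a, b))
```

Comparing the two printed sizes shows how much of the output is wrapper overhead for strings this short, which is the effect the article suspects. Whether the header-free NCD then satisfies the metric properties is exactly the open question, so the sketch makes no claim about that.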