Re: Middle matching - any Python library functions (besides re)?

bearophileHUGS Sun, 27 Aug 2006 17:51:52 -0700

If you want to avoid an O(n^2) algorithm, you may need to find a
signature for each file. Then you use such signatures to compute
hashes, and unique them with a dict (dict values may be the file
names). Later you can weed out the few wrong collisions produced by the
possibly approximated signature. If you can determine of a given a file
where its heading garbage stops, then you can compute the signature
just computing the python hash of some of the following bytes (30 or
200 byte may suffice).


Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Middle matching - any Python library functions (besides re)?

Reply via email to