I wonder which algorithm measures the similarity between two strings
better: Levenshtein distance or the matching that difflib does?
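One rough way to find out is to score the same pairs of strings with both measures and compare. What follows is only a sketch: levenshtein, levenshtein_similarity and difflib_similarity are helper names made up for illustration, and dividing the edit distance by the longer length is just one common way to turn it into a 0..1 score. Only difflib.SequenceMatcher and its ratio() method come from the standard library.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # Keep the shorter string as the row so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character of a
                current[j - 1] + 1,      # add a character of b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1.0 means identical, 0.0 means
    every position had to be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def difflib_similarity(a: str, b: str) -> float:
    """Score from difflib's matcher: 2*M/T, where M is the number of
    matched characters and T is the combined length of both strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Made-up file names, only to have something to compare.
    pairs = [
        ("kitten", "sitting"),
        ("song_01.mp3", "song_1.mp3"),
        ("artist - title.mp3", "title - artist.mp3"),
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: "
              f"levenshtein={levenshtein_similarity(a, b):.3f} "
              f"difflib={difflib_similarity(a, b):.3f}")
```

Which one is "better" probably depends on the data. Edit distance counts single-character edits, while difflib rewards long contiguous runs that match, so the two can disagree on things like swapped words in a file name.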

On 1/31/06, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Gregory Piñero wrote:
> > Ok, ok, I got it! The Pythonic way is to use an existing library ;-)
> >
> > import difflib
> > CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7)
> >
> > I wrote a script to delete duplicate mp3's by filename a few years
> > back with this. If anyone's interested in seeing it, I'll post a blog
> > entry on it. I'm betting it uses a similar algorithm to your functions.
>
> A quick trip to difflib.py produces this description of the matching
> algorithm:
>
> The basic algorithm predates, and is a little fancier than, an
> algorithm published in the late 1980's by Ratcliff and Obershelp
> under the hyperbolic name "gestalt pattern matching". The basic idea
> is to find the longest contiguous matching subsequence that contains
> no "junk" elements (R-O doesn't address junk). The same idea is then
> applied recursively to the pieces of the sequences to the left and to
> the right of the matching subsequence.
>
> So no, it doesn't seem to be using Levenshtein distance.
>
> Kent
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Gregory Piñero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)