On 29/06/2006 9:28 AM, BBands wrote:
> I'd like to see if a string exists, even approximately, in another. For
> example if "black" exists in "blakbird" or if "beatles" exists in
> "beatlemania". The application is to look through a long list of songs
> and return any approximate matches along with a confidence factor. I
> have looked at edit distance, but that isn't a good choice for finding
> a short string in a longer one.

There is a trivial difference between the traditional
distance-matrix-based Levenshtein algorithm for edit distance and the
corresponding one for approximate string searching. Ditto between
finite-state-machine approaches. Ditto between modern bit-bashing
approaches.

> I have also explored
> difflib.SequenceMatcher and .get_close_matches, but what I'd really
> like is something like:
>
> a = FindApprox("beatles", "beatlemania")
> print a
> 0.857
>
> Any ideas?

You got no ideas from googling "approximate string search python"???
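
To make the "trivial difference" concrete, here is a minimal sketch of the distance-matrix version of both algorithms. The function names (edit_distance, approx_search_distance, find_approx) are made up for illustration, and the confidence formula, 1 - distance / len(pattern), is only an assumption about what a "confidence factor" might mean; it is not taken from any library. The search variant differs from plain edit distance in two places: the first row starts as all zeros (a match may begin anywhere in the text), and the answer is the minimum of the last row (a match may end anywhere).

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between the whole strings a and b."""
    # Row 0: turning "" into b[:j] costs j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # turning a[:i] into "" costs i deletions
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]  # both strings must be consumed in full


def approx_search_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    # Difference 1: row 0 is all zeros, so a match may start anywhere.
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, 1):
        cur = [i]
        for j, ct in enumerate(text, 1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (cp != ct)))
        prev = cur
    # Difference 2: take the best cell in the last row, so a match may
    # end anywhere.
    return min(prev)


def find_approx(pattern, text):
    """Hypothetical confidence score: 1.0 means an exact substring match."""
    if not pattern:
        return 1.0  # assumption: an empty pattern matches trivially
    return 1.0 - approx_search_distance(pattern, text) / len(pattern)


print(round(find_approx("beatles", "beatlemania"), 3))
print(round(find_approx("black", "blakbird"), 3))
```

With this scoring, "beatles" is one edit away from a substring of "beatlemania", giving 1 - 1/7, which rounds to the 0.857 in the example above. The sketch costs O(len(pattern) * len(text)) time per song title, so for a very long list the finite-state or bit-parallel variants may be worth the extra effort.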