Dave Hughes wrote:
> Another algorithm that might interest isn't based on "sounds-like" but
> instead computes the number of transforms necessary to get from one
> word to another: the Levenshtein distance. A C based implementation
> (with Python interface) is available:
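
For anyone following along, here is a rough pure-Python sketch of the distance described above. It is not the C implementation mentioned in the quote, just the textbook dynamic-programming version, and the function name levenshtein is my own choice.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]
```

It keeps only one row of the table at a time, so memory use grows with the length of the second word rather than with the product of both lengths.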
I don't know what algorithm it uses, but the standard library's difflib module does something similar. I've had good results using its get_close_matches function to locate similarly named mp3 files.

However, I don't think "close enough" matching is well suited to this application. The sequences are short and not very distinct, and difference matching needs longer sequences to be effective. Phoneme matching seems overly complex and might also grab things like Tsu-zi.

I'd just use a list of alternate spellings, as Ben suggested.
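
To make the comparison concrete, here is a small sketch of both approaches. The word list and the ALTERNATES table are placeholders I made up for illustration, not data from this thread, and canonical is a hypothetical helper name.

```python
import difflib

# Fuzzy matching: n caps the number of results, cutoff (0..1) is the
# minimum similarity ratio a candidate needs to be returned.
words = ["ape", "apple", "peach", "puppy"]
matches = difflib.get_close_matches("appel", words, n=3, cutoff=0.6)

# With very short strings a single differing character moves the ratio
# a long way, which is why a fixed cutoff is hard to choose for them.
short_ratio = difflib.SequenceMatcher(None, "ab", "ac").ratio()

# Alternate-spellings approach: map every accepted variant to one
# canonical form and do an exact lookup (placeholder spellings).
ALTERNATES = {
    "color": "color",
    "colour": "color",
}

def canonical(word):
    """Return the canonical spelling, or None if the word is not listed."""
    return ALTERNATES.get(word.lower())
```

The lookup table never returns a near miss, which is the point: anything not listed is rejected, and a new variant is added by hand once someone decides it should count.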