Dave Hughes wrote:
> Another algorithm that might interest isn't based on "sounds-like" but
> instead computes the number of transforms necessary to get from one
> word to another: the Levenshtein distance. A C based implementation
> (with Python interface) is available:

I don't know what algorithm it uses, but the difflib module looks similar. 
I've had good results using the get_close_matches function to locate
similarly-named mp3 files.
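
A minimal sketch of that kind of lookup. The file names and the
similar_files helper are made up for illustration; n and cutoff are
get_close_matches' own parameters (defaults 3 and 0.6).

```python
import difflib

# Hypothetical file names, purely for illustration.
filenames = [
    "beatles_-_yesterday.mp3",
    "beatles_-_yesterdy.mp3",
    "beatles_-_let_it_be.mp3",
    "miles_davis_-_so_what.mp3",
]


def similar_files(name, candidates, cutoff=0.6):
    """Return up to three candidates that resemble name, best match first."""
    # Exclude the name itself so it doesn't show up as its own best match.
    others = [c for c in candidates if c != name]
    # get_close_matches scores each candidate with SequenceMatcher.ratio()
    # and drops anything that scores below cutoff.
    return difflib.get_close_matches(name, others, n=3, cutoff=cutoff)


matches = similar_files("beatles_-_yesterday.mp3", filenames)
print(matches)
```

Raising cutoff toward 1.0 keeps only near-duplicates; lowering it lets
looser matches through.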

However, I don't think "close enough" matching is well suited to this
application.  The sequences are short and non-distinct, and difference
matching needs longer sequences to be effective.  Phoneme matching seems
overly complex and might grab things like Tsu-zi.  I'd just use a list of
alternate spellings, as Ben suggested.
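
A rough sketch of the alternate-spellings approach, assuming a
hand-maintained table. The names in it and the canonical_spelling helper
are placeholders, not the actual data for this application.

```python
# Hypothetical table: canonical spelling -> accepted variants.
# Real entries would be filled in by hand.
ALTERNATE_SPELLINGS = {
    "Catherine": ["Catherine", "Katherine", "Kathryn", "Catharine"],
    "Stephen": ["Stephen", "Steven", "Stefan"],
}

# Invert the table once so each variant maps straight to its canonical form.
CANONICAL = {
    variant.lower(): canonical
    for canonical, variants in ALTERNATE_SPELLINGS.items()
    for variant in variants
}


def canonical_spelling(word):
    """Return the canonical form of word, or None if it is not listed."""
    # Exact, case-insensitive lookup: nothing is matched "by similarity".
    return CANONICAL.get(word.strip().lower())


print(canonical_spelling("kathryn"))
print(canonical_spelling("Tsu-zi"))
```

Because the lookup is exact, only spellings that were explicitly listed
can ever match; anything else comes back as None.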


-- 
http://mail.python.org/mailman/listinfo/python-list
