hawkesed wrote:
> If I have a list, say of names. And I want to count all the people
> named, say, Susie, but I don't care exactly how they spell it (ie,
> Susy, Susi, Susie all work.) how would I do this? Set up a regular
> expression inside the count? Is there a wildcard variable I can use?
> Here is the code for the non-fuzzy way:
> lstNames.count("Susie")
> Any ideas? Is this something you wouldn't expect count to do?
> Thanks y'all from a newbie.
> Ed
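To take the regular-expression question literally: list.count only tests each item for equality (==), so it cannot accept a pattern or a wildcard. A minimal sketch of the usual workaround, assuming Python 3 and the standard re module, counts the matching items with a generator expression instead. The contents of the name list and the exact pattern are made up for illustration.

```python
import re

# Illustrative data: the name lstNames comes from the question, its contents are made up.
lstNames = ["Bob", "Susie", "Sally", "Susi", "Dick", "Susy", "Jane"]

# list.count() compares items with ==, so it cannot take a pattern or wildcard.
# This pattern is an assumption: it accepts exactly the spellings Susie, Susy
# and Susi, ignoring case.
susie_pattern = re.compile(r"sus(?:ie|y|i)", re.IGNORECASE)

# Count the names that match the whole pattern (not just a prefix of it).
susie_count = sum(1 for name in lstNames if susie_pattern.fullmatch(name))
print(susie_count)  # 3
```

The catch is that every spelling has to be listed in advance; a "Suzy" would be missed, which is where "sounds-like" and edit-distance approaches come in.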
You might want to check out the SoundEx and MetaPhone algorithms, which approximate the "sound" of a word from its spelling (assuming English pronunciation). Apparently a soundex module used to be built into Python but was removed in 2.0. You can find several implementations on the 'net, for example:

http://orca.mojam.com/~skip/python/soundex.py
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52213

MetaPhone is generally considered better than SoundEx for "sounds-like" matching, although it's considerably more complex (IIRC; it's been a long time since I wrote an implementation of either in any language). A Python MetaPhone implementation (there must be more than this one?):

http://joelspeters.com/awesomecode/

Another algorithm that might interest you isn't based on "sounds-like" at all: the Levenshtein distance, which counts the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. A C-based implementation (with a Python interface) is available:

http://trific.ath.cx/resources/python/levenshtein/

Whichever algorithm you go with, you'll wind up with some sort of "similar" function, which could be applied much like Ben's example (I've just mocked up the following; it's not an actual session):

>>> import soundex
>>> import metaphone
>>> import levenshtein
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
>>> found_suzys = [s for s in my_strings if soundex.sounds_similar(s, 'Susy')]
>>> found_suzys = [s for s in my_strings if metaphone.sounds_similar(s, 'Susy')]
>>> found_suzys = [s for s in my_strings if levenshtein.distance(s, 'Susy') < 2]
>>> found_suzys
['Susi', 'Susy']

(one hopes anyway!)

HTH,
Dave.

--
http://mail.python.org/mailman/listinfo/python-list
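The mocked-up session in the reply relies on modules whose exact function names it doesn't confirm. As a self-contained alternative, here is a minimal pure-Python sketch of two of the techniques mentioned: the classic American Soundex code and the Levenshtein edit distance. The helper names soundex and levenshtein, and the matching rules (identical Soundex codes, or an edit distance of at most 1), are assumptions for illustration, not the API of any module linked above. MetaPhone is left out because, as noted, it is considerably more involved.

```python
import string

# Hypothetical helper names (soundex, levenshtein) chosen for this sketch;
# they are not the API of any module linked in the post.

# American Soundex digit for each consonant; vowels and y have no digit,
# and h/w are skipped entirely.
_SOUNDEX_DIGITS = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_DIGITS[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _SOUNDEX_DIGITS.get(first, "")  # the first letter's digit is not repeated
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same digit
        code = _SOUNDEX_DIGITS.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated digit after it counts again
    return (first.upper() + "".join(digits) + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']

# Assumed matching rules: identical Soundex codes, or at most one edit apart.
sounds_like = [s for s in my_strings if soundex(s) == soundex('Susy')]
close_spelling = [s for s in my_strings if levenshtein(s.lower(), 'susy') <= 1]
print(sounds_like)     # ['Susi', 'Susy']
print(close_spelling)  # ['Susi', 'Susy']
```

Either list can be passed to len() to get the count the original question asked for; the Levenshtein threshold in particular is something you would tune for your own data.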