> Behalf Of hawkesed
> If I have a list, say of names. And I want to count all the people
> named, say, Susie, but I don't care exactly how they spell it (ie,
> Susy, Susi, Susie all work.) how would I do this? Set up a regular
> expression inside the count? Is there a wildcard variable I can use?
> Here is the code for the non-fuzzy way:
> lstNames.count("Susie")
> Any ideas? Is this something you wouldn't expect count to do?
> Thanks y'all from a newbie.

If there are specific spellings you want to allow, you could just create a
list of them and see if your Suzy is in there:

    >>> possible_suzys = ['Susy', 'Susi', 'Susie']
    >>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Jane']
    >>> for line in my_strings:
    ...     if line in possible_suzys: print(line)
    ...
    Susi

I think a general solution to this problem is to use edit (also called
Levenshtein) distance. There is an implementation in Python at this Wiki:

http://en.wikisource.org/wiki/Levenshtein_distance

You could use this distance function, and normalize for string length using
the following score function:

    def score(a, b):
        "Calculates the similarity score of the two strings based on edit distance."
        high_len = max(len(a), len(b))
        return float(high_len - distance(a, b)) / float(high_len)

    >>> for line in my_strings:
    ...     if score(line, 'Susie') > .75: print(line)
    ...
    Susi

--
Regards,
Ryan Ginstrom
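
For anyone who wants to try the score approach without fetching the wiki code, here is a minimal self-contained sketch. The distance function below is an ordinary row-by-row dynamic-programming Levenshtein implementation written for illustration; it is an assumption, not necessarily the code on the wiki page. The helper fuzzy_count, its threshold parameter, and the guard for two empty strings are likewise hypothetical additions.

```python
def distance(a, b):
    """Levenshtein distance between a and b (illustrative implementation)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def score(a, b):
    """Similarity in [0, 1]: edit distance normalized by the longer length."""
    high_len = max(len(a), len(b))
    if high_len == 0:
        return 1.0  # assumption: two empty strings count as identical
    return float(high_len - distance(a, b)) / high_len


def fuzzy_count(names, target, threshold=0.75):
    """Hypothetical helper: count names scoring above threshold against target."""
    return sum(1 for name in names if score(name, target) > threshold)


names = ['Bob', 'Susy', 'Sally', 'Susi', 'Dick', 'Susie', 'Jane']
print(fuzzy_count(names, 'Susie'))
```

With the .75 cutoff this counts 'Susi' and 'Susie' but not 'Susy', which is two edits away from 'Susie' and scores only 0.6. Lowering the threshold (for example to 0.5) picks up 'Susy' as well for this list, so the cutoff is worth tuning against the spellings you actually expect.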