Wojciech Mula wrote: > Steve Bergman wrote: > > I'm looking for a module to do fuzzy comparison of strings. [...] > > Check module difflib, it returns difference between two sequences.
and it's intended for comparing text files, and is relatively slow. Google "python levenshtein". You'll probably find this a better fit for typoed keys in a database. What I would do for a quick start on this exercise (as described) is: To compare two strings, take copies, and: 1. strip out all spaces (including \xA0 i.e. if the data has been anywhere near the web) from each string; also all "-" (in general strip frequently occurring meaningless punctuation) 2. remove leading zeroes from each string 3. d = levenshtein_distance(string_a, string_b) # string_a etc is the reduced string, not the original 4. error_metric = float(d) / max(len(string_a), len(string_b)) The error_metric will be 0.0 if the strings are the same (after removing spaces, leading zeroes, etc) and 1.0 if they are completely different (no characters in common). ... and you don't want anything "kind of like soundex". That's a bit like saying you'd like to travel in an aeroplane "kind of like the Wright brothers' " ;-) Cheers, John -- http://mail.python.org/mailman/listinfo/python-list