"Steve Bergman" <[EMAIL PROTECTED]> writes: > I'm looking for a module to do fuzzy comparison of strings. I have 2 > item master files which are supposed to be identical, but they have > thousands of records where the item numbers don't match in various > ways. One might include a '-' or have leading zeros, or have a single > character missing, or a zero that is typed as a letter 'O'. That kind > of thing. These tables currently reside in a mysql database. I was > wondering if there is a good package to let me compare strings and > return a value that is a measure of their similarity. Kind of like > soundex but for strings that aren't words.
If you were using PostgreSQL, there's a contrib package (pg_trgm) that could help a lot with that. It can measure how similar two strings are, based on a trigram comparison. You can see how it works in the README (http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm) and maybe port it for your needs.

But it probably won't be a single-operation search: you'll have to post-process the results to decide what to do when an item has multiple matches.

-- 
Jorge Godoy <[EMAIL PROTECTED]>
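
As a rough illustration of the trigram comparison mentioned above, here is a minimal Python sketch. It is not a port of pg_trgm: the padding rule, the 0.3 threshold, and the names trigrams, similarity and rank are all assumptions made for this example, and real item numbers would probably need some normalisation first (stripping '-' and leading zeros, mapping 'O' to '0').

```python
def trigrams(text):
    """Return the set of 3-character substrings of a padded, lowercased string."""
    # Assumption: pad with two spaces in front and one behind, loosely
    # following pg_trgm, so the start of the string carries extra weight.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    # The padding guarantees each set has at least one element,
    # so the union is never empty.
    return len(ta & tb) / len(ta | tb)


def rank(item, candidates, threshold=0.3):
    """Hypothetical helper: candidates scoring at least threshold, best first."""
    scored = [(similarity(item, c), c) for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    # Made-up item numbers, only to show the shape of the result.
    for score, candidate in rank("AB-100", ["AB100", "ZZ-999", "AB-100"]):
        print(f"{score:.2f}  {candidate}")
```

The rank helper returns every candidate above the threshold rather than a single winner, which leaves room for the post-processing step when one item number has several plausible matches.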