On Fri, 13 Jun 2014 17:17:06 +0200, BrJohan wrote:

> Or to put the name variants in some sequence of sets having elements
> like: ("Kristina", "Christina", "Cristine", "Kristine")
> Matching is then just applying the 'in' operator.

That's definitely a better approach, for the reasons you mentioned.

> Comments?

A Soundex (or similar) algorithm will be better in the long run for the less common, but more often misspelled, names. It's fairly simple to guess at a number of common spellings for names that *you* think are common now, but what about names that run in families that aren't yours, or that aren't common outside of that family, or that were wildly popular a couple of hundred years ago but have since fallen out of favor?

My wife's ancestors (she's the genealogist; I just get to hear the horror stories) are notorious for being somewhat illiterate; for changing their names, on purpose, after a feud, in order to "distance" themselves from their relatives; and for using not-common-now (or even not-so-common-then) names. Add in somewhat illiterate record keepers and hospital workers (or midwives, or neighbors), not to mention bad copies of bad copies of smudged, centuries-old documents, and you have an instant soup of names that sound alike but are spelled differently in ways you cannot guess ahead of time.

Your users will appreciate *some* sort of fuzzy matching, or runtime extensibility, on top of the "obvious" spellings you take the time to include in your software. And that's *not* a comment on your abilities; it's a comment on the abilities and creativity of their ancestors.

Dan

-- 
https://mail.python.org/mailman/listinfo/python-list
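A minimal sketch of combining the two ideas from this thread: an explicit, extensible table of variant sets checked with the 'in' operator, with American Soundex as a fallback. The names `NAME_VARIANTS`, `soundex` and `names_match`, and the order of the checks, are hypothetical illustrations, not BrJohan's code; the encoder assumes plain ASCII letters.

```python
"""Sketch: explicit name-variant sets plus an American Soundex fallback."""

# Hypothetical variant table; in real use it would be loaded from
# user-editable data so users can extend it at runtime.
NAME_VARIANTS = [
    {"Kristina", "Christina", "Cristine", "Kristine"},
]

# Standard Soundex digit classes; vowels, y, h and w get no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of an ASCII name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    prev = _CODES.get(first, "")  # first letter's code suppresses a repeat
    digits = []
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first.upper() + "".join(digits) + "000")[:4]


def names_match(a: str, b: str) -> bool:
    """Exact match, then a shared variant set, then equal Soundex codes."""
    a_low, b_low = a.lower(), b.lower()
    if a_low == b_low:
        return True
    for group in NAME_VARIANTS:
        lowered = {n.lower() for n in group}
        if a_low in lowered and b_low in lowered:
            return True
    return soundex(a) == soundex(b)
```

Plain Soundex keeps the first letter, so "Kristina" (K623) and "Christina" (C623) get different codes; that is a case the variant sets catch and Soundex alone misses, while Soundex catches unlisted spellings such as "Christine" (also C623). Users could extend the table at runtime with something like `NAME_VARIANTS.append({...})`.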