Marc 'BlackJack' Rintsch wrote:
> What about names with letters not in the ASCII range?
Like Asian names? The names we encounter are spelled out in English, like Xu, Zu, Li-Cheng, Matsumoto, Watanabe, etc., so the ASCII approach would still work, I guess.

My first thought was to spell out names entirely, but that quickly seemed like a bad idea. Running a regular expression for "smith" with whitespace boundaries is more accurate than one for "smi" without them, but the sheer volume of names makes listing every one impossible. And the volume of false positives from matching only "smi" makes that approach somewhat worthless too.

It's tough when a problem needs an accurate yet broad solution. Too broad and the results are irrelevant because they include so many false positives; too accurate and the results will miss a few names. It's a no-win :(

Thanks for the advice.

Brad
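
For anyone following along, here is a rough sketch of the trade-off between matching the whole name and matching only a prefix. The sample text and the helper names (whole_word_matches, prefix_matches) are made up for illustration, and it uses the regex word boundary \b in place of literal whitespace boundaries, so that a name followed by punctuation still counts. That substitution is an assumption, not necessarily what the original code did.

```python
import re

# Hypothetical sample text, chosen only to show the two behaviours.
text = "Smith met Mr. Smithers and a blacksmith near the Smirnoff sign."


def whole_word_matches(name, text):
    """Return matches of the full name only where it stands alone as a word."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    return pattern.findall(text)


def prefix_matches(prefix, text):
    """Return every word that merely starts with the given prefix."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    return pattern.findall(text)


exact = whole_word_matches("smith", text)  # narrow: the standalone name only
broad = prefix_matches("smi", text)        # broad: picks up unrelated words too

print(exact)
print(broad)
```

The whole-word pattern skips "Smithers" and "blacksmith", while the prefix pattern also picks up "Smirnoff", which is exactly the false-positive problem with searching on "smi" alone.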