On 24/03/2006 11:44 PM, Peter Otten wrote:
> John Machin wrote:
>
>>0x00d0: ord('D'), # Ð
>>0x00f0: ord('o'), # ð
>>Icelandic capital eth becomes D, OK; but the small letter becomes o!!!
>
> I see information flow from Iceland is a bit better than from Armenia :-)

No information flow needed. Capital letter BLAH -> D and small letter BLAH -> o should trigger one's palpable nonsense detector for *any* BLAH.

>>Some of the transformations are a little unfortunate :-(
>
> The OP, as you pointed out in your first post in this thread, has more
> pressing problems with his normalization approach.
>
> Lastly, even if all went well, turning a list of French addresses into an
> ascii-uppercase graveyard would be a sad thing to do...

Oh indeed. Not only sad, but incredibly stupid. I fervently hope and trust that such a normalisation is intended only for fuzzy matching purposes. I can't imagine that anyone would contemplate writing the output to storage for any reason other than logging or for regression testing. Update it back to the database? Do you know anyone who would do that??
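
For what it's worth, the "capital -> D, small -> o" kind of nonsense can be caught mechanically, with no knowledge of Icelandic required. What follows is only a minimal sketch, assuming a translation table shaped like the entries quoted above (code point -> ord() of an ASCII letter). The function name case_mismatches and the sample table are hypothetical, made up for illustration, and are not the OP's code.

```python
def case_mismatches(table):
    """Return (upper_cp, lower_cp) pairs whose replacements differ by more than case.

    Assumes `table` maps code points to code points (ints), as accepted by
    str.translate. Entries with string or None values are not handled.
    """
    problems = []
    for cp, target in sorted(table.items()):
        ch = chr(cp)
        lower = ch.lower()
        # Skip characters with no distinct single-character lowercase form,
        # and capitals whose lowercase partner is not in the table at all.
        if lower == ch or len(lower) != 1 or ord(lower) not in table:
            continue
        low_target = table[ord(lower)]
        # The two replacements should be the same letter, ignoring case.
        if chr(target).lower() != chr(low_target).lower():
            problems.append((cp, ord(lower)))
    return problems


# Hypothetical sample: the two quoted entries plus one consistent pair.
sample = {
    0x00D0: ord('D'),  # Ð
    0x00F0: ord('o'),  # ð
    0x00C9: ord('E'),  # É
    0x00E9: ord('e'),  # é
}

for upper_cp, lower_cp in case_mismatches(sample):
    print("U+%04X -> %r but U+%04X -> %r" % (
        upper_cp, chr(sample[upper_cp]), lower_cp, chr(sample[lower_cp])))
```

This only flags pairs where both cases appear in the table, so it says nothing about whether D is a sensible replacement for eth in the first place. It just automates the nonsense detector.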