John Machin wrote:

> Another point: there are many non-latin1 characters that could be
> mapped to ASCII. For example:
> u"\u0141ukasziewicz".translate(unaccented_map())
> doesn't work unless an entry is added to the no-decomposition table:
> 0x0141: u"L", # LATIN CAPITAL LETTER L WITH STROKE
>
> It looks like generating extra entries like that could be done, with
> the aid of unicodedata.name():
>
> LATIN CAPITAL LETTER X WITH blahblah -> "X"
> LATIN SMALL LETTER X WITH blahblah -> "X".lower()
>
> This would require a fair bit of care -- obviously there are special
> cases like LATIN CAPITAL LETTER O WITH STROKE. Eyeballing by regional
> experts is probably required.

see the comments over at http://effbot.org/zone/unicode-convert.htm
for an extended table, eyeballed by a regional expert (and since he
makes the same point about OE vs Oe as you do, I'll probably have to
change the code ;-)

</F>
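
For illustration only, here is a rough sketch of the name-based idea
from the quoted message: derive candidate entries from
unicodedata.name() for characters that have no decomposition. It
assumes Python 3 (chr() and str.translate() with a dict), and the
helper names guess_base_letter and build_extra_entries are made up for
this example; they are not part of unaccented_map() or of the table on
the effbot page. The output is a list of candidates to be eyeballed,
not a finished table.

```python
import re
import unicodedata

# Matches names such as "LATIN CAPITAL LETTER L WITH STROKE".
# Only single-letter bases are accepted, so ligatures and digraphs
# (AE, OE, DZ, ...) are deliberately left out.
_NAME_RE = re.compile(r"^LATIN (CAPITAL|SMALL) LETTER ([A-Z]) WITH ")


def guess_base_letter(char):
    """Guess an ASCII base letter from the character name, or return None."""
    name = unicodedata.name(char, "")
    match = _NAME_RE.match(name)
    if match is None:
        return None
    case, letter = match.groups()
    return letter if case == "CAPITAL" else letter.lower()


def build_extra_entries(start=0x00C0, stop=0x0250):
    """Collect candidate {code point: letter} entries (hypothetical helper).

    Characters that already have a Unicode decomposition are skipped,
    since those do not need an entry in a no-decomposition table.
    """
    extra = {}
    for code_point in range(start, stop):
        char = chr(code_point)
        if unicodedata.decomposition(char):
            continue
        base = guess_base_letter(char)
        if base is not None:
            extra[code_point] = base
    return extra


if __name__ == "__main__":
    extra = build_extra_entries()
    print("\u0141ukasziewicz".translate(extra))
```

Run as a script, this prints Lukasziewicz. Note that it also maps
U+00D8 (LATIN CAPITAL LETTER O WITH STROKE) to plain "O", which is
exactly the kind of special case that still needs review by someone who
knows the language in question.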