On Wed, Nov 17, 2010 at 8:21 PM, Sorin Schwimmer <sx...@yahoo.com> wrote:
> Hi All,
>
> I have to eliminate diacritics in a fairly large file.
>
> Inspired by http://code.activestate.com/recipes/81330/, I came up with the
> following code:
>
> #! /usr/bin/env python
>
> import re
>
> nodia={chr(196)+chr(130):'A', # mamaliga
>        chr(195)+chr(130):'A', # A^
>        chr(195)+chr(142):'I', # I^
>        chr(195)+chr(150):'O', # OE
>        chr(195)+chr(156):'U', # UE
>        chr(195)+chr(139):'A', # AE
>        chr(197)+chr(158):'S',
>        chr(197)+chr(162):'T',
>        chr(196)+chr(131):'a', # mamaliga
>        chr(195)+chr(162):'a', # a^
>        chr(195)+chr(174):'i', # i^
>        chr(195)+chr(182):'o', # oe
>        chr(195)+chr(188):'u', # ue
>        chr(195)+chr(164):'a', # ae
>        chr(197)+chr(159):'s',
>        chr(197)+chr(163):'t'
>       }
> name="R\xc3\xa2\xc5\x9fca"
>
> regex = re.compile("(%s)" % "|".join(map(re.escape, nodia.keys())))
> print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)

Have you considered using string.maketrans() and str.translate() instead?
It's simpler and likely faster than generating and using regexes like that.
http://docs.python.org/library/string.html#string.maketrans

Cheers,
Chris
--
Cue someone quoting Zawinski.
http://blog.rebertia.com
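
For illustration, here is a minimal sketch of the translate-based approach suggested above. It is not code from the thread and rests on a few assumptions. It targets Python 3, where str.maketrans() accepts a dict keyed by single characters. It decodes the UTF-8 bytes to text first, because Python 2's string.maketrans() maps single bytes only and the keys in the quoted code are two-byte sequences. The table follows the comments in the quoted code, so the "AE" entry is taken to mean U+00C4. The names NODIA and strip_diacritics are hypothetical.

```python
# Sketch only: assumes Python 3 and input decoded from UTF-8 to text.
# NODIA and strip_diacritics are illustrative names, not from the thread.

# Translation table: one Unicode character -> its unaccented replacement.
NODIA = str.maketrans({
    "\u0102": "A",  # A with breve
    "\u00c2": "A",  # A with circumflex
    "\u00ce": "I",  # I with circumflex
    "\u00d6": "O",  # O with diaeresis
    "\u00dc": "U",  # U with diaeresis
    "\u00c4": "A",  # A with diaeresis (assumed from the "AE" comment)
    "\u015e": "S",  # S with cedilla
    "\u0162": "T",  # T with cedilla
    "\u0103": "a",  # a with breve
    "\u00e2": "a",  # a with circumflex
    "\u00ee": "i",  # i with circumflex
    "\u00f6": "o",  # o with diaeresis
    "\u00fc": "u",  # u with diaeresis
    "\u00e4": "a",  # a with diaeresis
    "\u015f": "s",  # s with cedilla
    "\u0163": "t",  # t with cedilla
})


def strip_diacritics(text):
    """Return text with the characters listed in NODIA replaced."""
    return text.translate(NODIA)


# Same sample bytes as in the quoted message, decoded before translating.
raw = b"R\xc3\xa2\xc5\x9fca"
name = raw.decode("utf-8")
print(strip_diacritics(name))  # prints: Rasca
```

Characters not listed in the table pass through unchanged, so the table only needs to cover the diacritics that actually occur in the file.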