David Siroky:
> output = ''

I suspect you really want "output = u''" here.

> for c in line:
>     if not unicodedata.combining(c):
>         output += c

This is creating as many as 50000 new string objects of increasing
size. To build large strings, some common faster techniques are to
either create a list of characters and then use join on the list, or
use a cStringIO to accumulate the characters.

This is about 10 times faster for me:

    import unicodedata

    def no_diacritics(line):
        # Decode byte strings so that we always work on unicode.
        if type(line) != unicode:
            line = unicode(line, 'utf-8')
        # NFKD splits accented characters into base + combining marks.
        line = unicodedata.normalize('NFKD', line)
        output = []
        for c in line:
            if not unicodedata.combining(c):
                output.append(c)
        return u''.join(output)

Neil
-- 
http://mail.python.org/mailman/listinfo/python-list
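
For anyone reading this on Python 3, here is a rough sketch of the two techniques mentioned above (join on a sequence of characters, and a string buffer). It assumes Python 3, where str is already unicode and io.StringIO stands in for cStringIO. The function names no_diacritics_join and no_diacritics_stringio are made up for illustration and are not from the original post. I have not timed these, so treat the speed claim above as applying only to the original code.

```python
import io
import unicodedata


def _to_text(line):
    # Assumption: byte input is UTF-8 encoded, as in the original post.
    if isinstance(line, bytes):
        return line.decode('utf-8')
    return line


def no_diacritics_join(line):
    # Technique 1: collect the kept characters and join them once.
    decomposed = unicodedata.normalize('NFKD', _to_text(line))
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


def no_diacritics_stringio(line):
    # Technique 2: accumulate the kept characters in an in-memory buffer.
    buf = io.StringIO()
    for c in unicodedata.normalize('NFKD', _to_text(line)):
        if not unicodedata.combining(c):
            buf.write(c)
    return buf.getvalue()
```

Both functions should give the same result for the same input; which one is faster probably depends on the interpreter version and the input size, so it is worth measuring with timeit on your own data.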