I have a regular expression that I use to extract the surname:

    surname = r'(?u).+ (\w+)'

However, when I apply it to this Unicode string, I get only the first three letters of the surname:

    name = 'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'
    surname_re = re.compile(surname)
    m = surname_re.search(name)
    m.groups()
    ('Dvo\xc5',)

I suppose that there is an encoding problem, but I don't understand Unicode well enough to know how to handle the Unicode characters in the surname properly.

--
Jeffrey Barish
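
One likely explanation, offered as a sketch rather than a definitive diagnosis: the string shown above is not decoded text but the UTF-8 bytes of "Antonín Dvořák", so \w is tested against individual bytes. The byte \xc5 happens to look like a letter, while \x99 does not, and the match stops there. The example below assumes Python 3 and assumes the input really is UTF-8; the names raw_name and surname_re are used only for illustration. It decodes the bytes to text before matching.

```python
import re

# The bytes from the post: assumed to be the UTF-8 encoding of the name.
raw_name = b'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

# Decode to text first, so that \w is tested against whole characters
# instead of the individual bytes of a multi-byte sequence.
name = raw_name.decode('utf-8')

# Same pattern as in the post. In Python 3, str patterns are Unicode-aware
# by default, so the (?u) flag is redundant but harmless.
surname_re = re.compile(r'(?u).+ (\w+)')

m = surname_re.search(name)
surname = m.group(1)
print(surname)  # Dvořák
```

In Python 2, the equivalent step would be to decode the byte string with name.decode('utf-8') (or to write the literal as a unicode object) before calling search.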