Jeffrey> However, when I apply it to this Unicode string, I get only the
Jeffrey> first 3 letters of the surname:
Jeffrey> name = 'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

Maybe

    name = unicode('Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k', "utf-8")

?  Yup, that works:

    >>> name = unicode('Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k', "utf-8")
    >>> name
    u'Anton\xedn Dvo\u0159\xe1k'
    >>> surname = r'(?u).+ (\w+)'
    >>> import re
    >>> surname_re = re.compile(surname)
    >>> m = surname_re.search(name)
    >>> m.groups()
    (u'Dvo\u0159\xe1k',)
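For what it's worth, the session above is Python 2. A rough sketch of the same comparison under Python 3 follows, where the undecoded literal becomes a bytes object and the decoded one a str. The variable names (raw, bytes_re, text_re and so on) are made up for illustration and are not from the original post. The point is only that \w applied to the raw UTF-8 bytes stops at the first non-ASCII byte, while \w applied to the decoded text matches the whole surname.

```python
import re

# The UTF-8 encoded bytes from the original post (hypothetical name: raw).
raw = b'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

# Decode the bytes into text; this plays the role of unicode(..., "utf-8").
name = raw.decode('utf-8')

# A bytes pattern only matches bytes input; its \w covers ASCII only.
bytes_re = re.compile(rb'.+ (\w+)')

# A str pattern matches text; its \w is Unicode-aware by default in Python 3,
# so the (?u) flag from the Python 2 session is not needed here.
text_re = re.compile(r'.+ (\w+)')

# Matching the undecoded bytes stops before the first non-ASCII byte.
bytes_surname = bytes_re.search(raw).group(1)

# Matching the decoded text captures the full surname.
text_surname = text_re.search(name).group(1)

print(repr(bytes_surname))
print(repr(text_surname))
```

So the regular expression was fine all along; the fix is to decode the input before matching.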