Jeffrey Barish wrote:
> I have a regular expression that I use to extract the surname:
>
> surname = r'(?u).+ (\w+)'
>
> However, when I apply it to this Unicode string, I get only the first 3
> letters of the surname:
>
> name = 'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'
That's a byte string. You
Jeffrey> However, when I apply it to this Unicode string, I get only the
Jeffrey> first 3 letters of the surname:
Jeffrey> name = 'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'
Maybe

    name = unicode('Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k', "utf-8")

? Yup, that works:
>>> name = unico
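This thread predates Python 3, where `unicode()` became `str` and byte strings carry a `b` prefix. The block below is a Python 3 translation of the same diagnosis. It is a sketch, not code from the original posters. It assumes `re.search` as the matching call, since the post does not say which function was used. The names `raw` and `truncated` are illustrative only. Python 3 rejects the `(?u)` flag on a bytes pattern, so the bytes match leaves it out.

```python
import re

# Pattern from the original post. For str patterns in Python 3, (?u) is
# redundant (Unicode matching is the default) but still accepted.
SURNAME = r'(?u).+ (\w+)'

# The UTF-8 encoded bytes from the original post (illustrative name).
raw = b'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

# Matching the raw bytes reproduces the truncation. For bytes patterns,
# \w matches only ASCII word characters, so the match stops at b'\xc5',
# the first byte of the UTF-8 sequence for the second letter of
# the surname. truncated == b'Dvo'
truncated = re.search(rb'.+ (\w+)', raw).group(1)

# Decoding to text first (the Python 3 spelling of unicode(raw, "utf-8"))
# lets \w see whole characters, so the full surname is captured.
name = raw.decode('utf-8')
surname = re.search(SURNAME, name).group(1)
```

Decoding at the boundary and matching on text, rather than trying to make the regex understand UTF-8 bytes, keeps `\w` meaningful for every accented letter.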