On Wed, 17 Sep 2008 09:27:47 -0400, christopher taylor wrote:

> the other day, i was trying to match unicode character sequences that
> looked like this:
>
> \\uAD0X...
>
> my issue, is that the pattern i used was returning:
>
> [ '\\uAD0X', '\\u1BF3', ... ]
>
> when i expected:
>
> [ '\\uAD0X\\u1BF3', ]
>
> the code looks something like this:
>
> pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
> #print pat.findall(txt_line)
> results = pat.finditer(txt_line)
>
> i ran the pattern through a couple of my colleagues and they were all in
> agreement that my pattern should have matched correctly.

Correctly for what input? The examples above do not match (no pun intended) the regular expression: `pat` doesn't match '\\uAD0X' because there is no 'X' in the character class.

BTW: Are you sure you need or want the `re.UNICODE` flag?

Ciao,
	Marc 'BlackJack' Rintsch
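
To make the point concrete, here is a small sketch. It assumes Python 3 and a made-up sample line (`txt_line` below is my guess at the input, with a literal backslash before each `u`, not the original data). It shows two things that may explain the surprise: a repeated capturing group only reports its last repetition in `findall()`, and an 'X' breaks the run because it is not in `[0-9A-F]`. The names `capturing` and `whole` are just illustrative.

```python
import re

# Hypothetical sample input: backslash, 'u', four hex digits, twice in a row.
txt_line = r"\uAD01\u1BF3"

# Repeated *capturing* group: findall() returns the group's content,
# and a repeated group only keeps its last repetition.
capturing = re.compile(r"(\\u[0-9A-F]{4})+")
print(capturing.findall(txt_line))

# Repeated *non-capturing* group: findall() returns each whole match.
whole = re.compile(r"(?:\\u[0-9A-F]{4})+")
print(whole.findall(txt_line))

# 'X' is not in the character class, so '\uAD0X' cannot be part of a match;
# only the second escape sequence is found.
print(whole.findall(r"\uAD0X\u1BF3"))
```

If the capturing group has to stay for some reason, iterating with `finditer()` and reading `m.group(0)` should give the whole run as well.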