Re: python regex character group matches

Fredrik Lundh Wed, 17 Sep 2008 06:58:48 -0700

christopher taylor wrote:

my issue, is that the pattern i used was returning:


[ '\\uAD0X', '\\u1BF3', ... ]

when i expected:

[ '\\uAD0X\\u1BF3', ]

the code looks something like this:

pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
#print pat.findall(txt_line)
results = pat.finditer(txt_line)

i ran the pattern through a couple of my colleagues and they were all
in agreement that my pattern should have matched correctly.

First, [0-9A-F] cannot match an "X". Assuming that's a typo, your next problem is a precedence issue: (X)+ means "one or more (X)", not "one or more X inside parens". In other words, that pattern matches one or more X's and captures the last one.
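To make the "captures the last one" behaviour concrete, here is a minimal sketch. The sample text in txt_line is made up for illustration (the original input was not posted), and the pattern is written as a raw string so that it reads the same on any Python version:

```python
import re

# Hypothetical sample input: one run of two escapes, then a single escape.
txt_line = r"\uAD0C\u1BF3 and \u00E9"

# A repeated capturing group: the group is overwritten on every repetition.
pat = re.compile(r"(\\u[0-9A-F]{4})+")

m = pat.search(txt_line)
whole_match = m.group(0)   # the full run the pattern matched
last_capture = m.group(1)  # only the final repetition of the group

# With exactly one group in the pattern, findall returns group 1 per match.
found = pat.findall(txt_line)

print(whole_match)
print(last_capture)
print(found)
```

The match itself spans the whole run; it is only group 1 (and therefore findall) that reports the last escape of each run.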

Assuming that you want to find runs of \uXXXX escapes, simply use non-capturing parentheses:


   pat = re.compile(r"(?:\\u[0-9A-F]{4})+")

and use group(0) instead of group(1) to get the match.
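As a rough end-to-end sketch of that advice, again using a made-up txt_line since the real input was not shown, a non-capturing group with a trailing + lets finditer (or findall) hand back each run as a whole:

```python
import re

# Hypothetical sample input: one run of two escapes, then a single escape.
txt_line = r"\uAD0C\u1BF3 and \u00E9"

# Non-capturing group, repeated: the match covers the whole run.
pat = re.compile(r"(?:\\u[0-9A-F]{4})+")

# group(0) is the entire match; there is no group(1) in this pattern.
runs = [m.group(0) for m in pat.finditer(txt_line)]

# With no capturing groups, findall returns the full matches as well.
runs_via_findall = pat.findall(txt_line)

print(runs)
```

Each element of runs is one complete run of escapes, which is the shape of result the question was expecting.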

</F>

--
http://mail.python.org/mailman/listinfo/python-list
