Re: regex help: splitting string gets weird groups

Tim Chase Thu, 08 Apr 2010 13:05:05 -0700

gry wrote:

[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters


e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:

re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()

('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
group?  Is my regexp illegal somehow and confusing the engine?

Well, I'm not sure what it thinks it's finding, but nested capture-groups
always produce somewhat weird results for me (I suspect that's what's
triggering the duplication). Additionally, you're only searching for
one match (.match() returns a single match-object or None; not all
possible matches within the repeated super-group).
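
As a hedged illustration of the duplication: the re documentation says that
a group which matches several times keeps only its last match. Read that
way, the outer group's final iteration was '1234', and each inner group
reports the last text it happened to match. The sketch below simply re-runs
the original pattern; the variable names are only for illustration.

```python
import re

s = '555tHe-rain.in#=1234'
pattern = r'^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$'

m = re.match(pattern, s)

# Each group was matched several times by the trailing "+",
# but only the last text captured by each group is retained.
groups = m.groups()
print(groups)  # ('1234', 'in', '1234', '=')

# Group 1 (the outer group) and group 3 (digits) both last matched '1234',
# which is why it shows up twice; 'tHe' was overwritten by 'rain', then 'in'.
outer, letters, digits, other = groups
```

So the regexp is legal; it just cannot return every repetition through
.groups().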

I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


Tweaking your original, I used

  >>> s='555tHe-rain.in#=1234'
  >>> import re
  >>> r=re.compile(r'([a-zA-Z]+|\d+|.)')
  >>> r.findall(s)
  ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

The only difference between my results and your results is that the 555
and 1234 come back as strings, not ints.
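
If the ints matter, one possible follow-up (a minimal sketch, not part of
the session above) is to name the alternatives and convert the digit runs
while iterating. The function name tokenize and the group names letters,
digits and other are made up for this example; [0-9] is used rather than \d
so that int() only ever sees ASCII digits.

```python
import re

# Same three alternatives as above, but named so we can tell which one matched.
TOKEN = re.compile(r'(?P<letters>[A-Za-z]+)|(?P<digits>[0-9]+)|(?P<other>.)')

def tokenize(s):
    """Split s into letter runs, digit runs (as ints) and single other chars."""
    result = []
    for m in TOKEN.finditer(s):
        text = m.group()
        # m.lastgroup is the name of the alternative that matched.
        result.append(int(text) if m.lastgroup == 'digits' else text)
    return result

tokens = tokenize('555tHe-rain.in#=1234')
print(tokens)  # [555, 'tHe', '-', 'rain', '.', 'in', '#', '=', 1234]
```

Note that "." does not match a newline unless re.DOTALL is given, so
newlines in the input would be skipped silently by this sketch.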


-tkc




--
http://mail.python.org/mailman/listinfo/python-list
