gry wrote:
[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters
e.g. 555tHe-rain.in#=1234 should give: [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:
re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()
('1234', 'in', '1234', '=')
Why is 1234 repeated in two groups? and why doesn't "tHe" appear as a
group? Is my regexp illegal somehow and confusing the engine?
Well, I'm not sure what it thinks it's finding, but nested capture groups
always produce somewhat weird results for me (I suspect that's what's
triggering the duplication: group 1 is the outer group, and it last
matched the same '1234' as the digits group). Additionally, you're only
getting one match (.match() returns a single match object or None, and
a repeated group keeps only the last thing it captured, not every
piece it matched along the way).
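To see where each value in your tuple comes from, here is a small sketch using your original pattern unchanged. The variable names are just mine for illustration; the point is that every group, including the outer repeated one, reports only its last capture.

```python
import re

s = '555tHe-rain.in#=1234'
pattern = r'^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$'

m = re.match(pattern, s)

# Group 1 is the outer group that the trailing "+" repeats;
# groups 2, 3 and 4 are the letters, digits and punctuation alternatives.
# Each group holds only the last substring it captured, so 'tHe' and
# 'rain' are overwritten by 'in', and '555' by '1234'.
last_captures = m.groups()
print(last_captures)
```

So the regexp is legal; it just cannot hand back every repetition through .groups().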
I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.
Tweaking your original, I used
>>> s='555tHe-rain.in#=1234'
>>> import re
>>> r=re.compile(r'([a-zA-Z]+|\d+|.)')
>>> r.findall(s)
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']
The only difference between my results and your results is that the 555
and 1234 come back as strings, not ints.
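If you do want ints for the digit runs, one possible follow-up (a sketch, not the only way) is to convert after the fact. I'm assuming here that str.isdecimal() is an acceptable test for "this piece is a run of digits"; the name pieces is just for illustration.

```python
import re

s = '555tHe-rain.in#=1234'
r = re.compile(r'[a-zA-Z]+|\d+|.')

# With no capture groups in the pattern, findall() returns the whole
# match for each piece; digit runs are then turned into ints.
pieces = [int(tok) if tok.isdecimal() else tok for tok in r.findall(s)]
print(pieces)
```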
-tkc