On 8 Apr, 19:49, gry <georgeryo...@gmail.com> wrote:
> [ python3.1.1, re.__version__='2.2.1' ]
> I'm trying to use re to split a string into (any number of) pieces of
> these kinds:
> 1) contiguous runs of letters
> 2) contiguous runs of digits
> 3) single other characters
>
> e.g. 555tHe-rain.in#=1234 should give: [555, 'tHe', '-', 'rain',
> '.', 'in', '#', '=', 1234]
> I tried:
> >>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$',
> '555tHe-rain.in#=1234').groups()
>
> ('1234', 'in', '1234', '=')
>
> Why is 1234 repeated in two groups? and why doesn't "tHe" appear as a
> group? Is my regexp illegal somehow and confusing the engine?
>
> I *would* like to understand what's wrong with this regex, though if
> someone has a neat other way to do the above task, I'm also interested
> in suggestions.

I would avoid .match and use .findall (if you walk through them both
together, it'll make sense what's happening with your match string).

>>> s = """555tHe-rain.in#=1234"""
>>> re.findall('[A-Za-z]+|[0-9]+|[-.#=]', s)
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

hth,

Jon.
-- 
http://mail.python.org/mailman/listinfo/python-list
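
To spell out the "walk through them both" suggestion, here is a minimal sketch. The regex is legal; the surprise comes from the fact that a capturing group inside a repetition only keeps the last thing it matched. The outer group ends on '1234', the letters group on 'in', the digits group on '1234' and the punctuation group on '='. The names TOKEN and GENERAL are made up for illustration, and the GENERAL pattern is only an assumption about what "single other characters" should cover beyond the four characters in the original pattern.

```python
import re

s = "555tHe-rain.in#=1234"

# A group that sits inside a repetition is overwritten on every pass,
# so .groups() reports only the final capture of each group.
m = re.match(r'^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', s)
print(m.groups())

# With no groups in the pattern, findall returns every non-overlapping
# match, left to right, which is the list of pieces asked for.
TOKEN = re.compile(r'[A-Za-z]+|[0-9]+|[-.#=]')
pieces = TOKEN.findall(s)
print(pieces)

# The question shows the digit runs as ints; findall only yields
# strings, so convert them afterwards if that is wanted.
tokens = [int(p) if p.isdigit() else p for p in pieces]
print(tokens)

# Assumption: "single other characters" means any one character that is
# neither an ASCII letter nor an ASCII digit, not just - . # =
GENERAL = re.compile(r'[A-Za-z]+|[0-9]+|[^A-Za-z0-9]')
print(GENERAL.findall("a1 b"))
```

If the position of each piece matters as well, re.finditer yields match objects for the same matches instead of plain strings.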