Re: Different number of matches from re.findall and re.split

Tim Chase Tue, 12 Jan 2010 18:55:18 -0800

Steve Holden wrote:

> Steve Holden wrote:
>> [...]
>>
>>> Can someone explain why these two commands are giving different
>>> results?  I thought I should have the same number of matches (or maybe
>>> different by 1, but not 6000!)
>>
>> re.MULTILINE is apparently 1, and you are providing it as the "maxsplit"
>> argument. Check the API in the documentation.
>
> Sorry, I presume re.MULTILINE must actually be zero for the result of
> re.split() to be of length 1 ...

Because it's not doing a multiline split, the '^' anchor only matches at the very beginning of the string, so it only returns one result (there's nothing before the start of the string to return as the left side of the split):


>>> import re
>>> re.MULTILINE
8
>>> s = """
... abc
... def
... abc
... def"""
>>> re.split('^', s,  re.MULTILINE)
['\nabc\ndef\nabc\ndef']
>>> re.split('b', s,  re.MULTILINE)
['\na', 'c\ndef\na', 'c\ndef']
>>> re.split('b', 'ab'*10,  re.MULTILINE)
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'abab']
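A minimal sketch (standard `re` module only, assuming Python 3) that reproduces the last transcript line with the flag's value passed explicitly as `maxsplit`, next to an unlimited split of the same string. It passes `maxsplit` by keyword because recent Python 3 releases deprecate passing it positionally; the variable names are illustrative.

```python
import re

text = "ab" * 10  # ten "ab" pairs, so ten possible split points

# What the transcript effectively did: re.MULTILINE (value 8) landed in
# the maxsplit slot, so splitting stops after 8 cuts.
accidental = re.split("b", text, maxsplit=int(re.MULTILINE))

# Without a limit, every "b" is a split point.
unlimited = re.split("b", text)

print(int(re.MULTILINE))   # 8
print(len(accidental))     # 9: eight "a" pieces plus the unsplit tail "abab"
print(len(unlimited))      # 11: ten "a" pieces plus a trailing ""
```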

But your original logic is sound...the 3rd argument to re.split() is "maxsplit", not "flags", and if you want to use flags with .split() you have to either specify them within the regexp or compile the regexp and use the resulting compiled object, as detailed elsewhere in the thread by MRAB and Duncan.
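A sketch of those approaches, assuming Python 3 and a hypothetical pattern `^def` applied to the same string `s` as above. A non-empty pattern is chosen deliberately: since Python 3.7, re.split() also splits on empty matches, so a bare '^' would no longer reproduce the 2010 transcript output. The `flags=` keyword shown third only exists from Python 2.7 / 3.1 onward.

```python
import re

s = "\nabc\ndef\nabc\ndef"
pattern = r"^def"  # hypothetical pattern; only matches mid-string with MULTILINE

# findall() takes flags as its third positional argument, so this is fine.
found = re.findall(pattern, s, re.MULTILINE)

# Three equivalent ways to give split() the MULTILINE flag:
inline = re.split(r"(?m)" + pattern, s)                # inline flag in the regexp
compiled = re.compile(pattern, re.MULTILINE).split(s)  # compiled pattern object
keyword = re.split(pattern, s, flags=re.MULTILINE)     # Python 2.7 / 3.1 and later

# The original mistake: the flag's value ends up in maxsplit instead.
mistaken = re.split(pattern, s, maxsplit=int(re.MULTILINE))

print(len(found))                     # 2
print(inline == compiled == keyword)  # True
print(len(keyword))                   # 3: one more piece than there are matches
print(len(mistaken))                  # 1: without MULTILINE, ^def never matches
```

With the flag actually applied, split() returns one more piece than findall() returns matches, which is the "different by 1" the original poster expected.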


-tkc



--
http://mail.python.org/mailman/listinfo/python-list
