Re: Regular expression bug?

MRAB Thu, 19 Feb 2009 12:04:18 -0800

Ron Garret wrote:

I'm trying to split a CamelCase string into its constituent components. This kind of works:
>>> re.split('[a-z][A-Z]', 'fooBarBaz')
['fo', 'a', 'az']
but it consumes the boundary characters. To fix this I tried using lookahead and lookbehind patterns instead, but it doesn't work:
>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
['fooBarBaz']

However, it does seem to work with findall:
>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
['', '']
So the regular expression seems to be doing the Right Thing. Is this a bug in re.split, or am I missing something?
(BTW, I tried looking at the source code for the re module, but I could not find the relevant code. re.split calls sre_compile.compile().split, but the string 'split' does not appear in sre_compile.py. So where does this method come from?)
I'm using Python 2.5.

I, amongst others, think it's a bug (or 'misfeature'); Guido thinks it
might be intentional, but changing it could break some existing code.
You could do this instead:

>>> re.sub('(?<=[a-z])(?=[A-Z])', '@', 'fooBarBaz').split('@')
['foo', 'Bar', 'Baz']
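
Note that the sub/split trick only works if the '@' marker never appears in the input. If that can't be guaranteed, one possible alternative is to walk the zero-width matches with finditer and slice the string at each position. This is only a sketch: the names BOUNDARY and split_camel_case are made up for illustration and are not part of the re module.

```python
import re

# Zero-width boundary: a lowercase letter behind, an uppercase letter ahead.
BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z])')

def split_camel_case(text):
    """Split text at each lowercase-to-uppercase boundary (hypothetical helper)."""
    parts = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # The match is empty, so match.start() is just the cut position.
        parts.append(text[start:match.start()])
        start = match.start()
    parts.append(text[start:])  # whatever follows the last boundary
    return parts

print(split_camel_case('fooBarBaz'))  # ['foo', 'Bar', 'Baz']
```

Because no placeholder character is inserted, a string such as 'foo@BarBaz' comes back as ['foo@Bar', 'Baz'] rather than being cut at the '@'. For what it's worth, re.split itself splits on empty matches from Python 3.7 onwards, so there the lookaround pattern can be passed to re.split directly.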
--
http://mail.python.org/mailman/listinfo/python-list
