2011/9/30 Ovidiu Deac <ovidiud...@gmail.com>:
> This is only part of a regex taken from an old perl application which
> we are trying to understand/port to our new Python implementation.
>
> The original regex was considerably more complex and it didn't compile
> in python so I removed all the parts I could in order to isolate the
> problem such that I can ask help here.
>
> So the problem is that this regex doesn't compile. On the other hand
> I'm not really sure it should. It's an anchor on which you apply *.
> I'm not sure if this is legal.
>
> On the other hand if I remove one of the * it compiles.
>
> >>> re.compile(r"""^(?: [^y]* )*""", re.X)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.6/re.py", line 190, in compile
>     return _compile(pattern, flags)
>   File "/usr/lib/python2.6/re.py", line 245, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
> >>> re.compile(r"""^(?: [^y] )*""", re.X)
> <_sre.SRE_Pattern object at 0x7f4069cc36b0>
> >>> re.compile(r"""^(?: [^y]* )""", re.X)
> <_sre.SRE_Pattern object at 0x7f4069cc3730>
>
> Is this a bug in python regex engine? Or maybe some incompatibility with Perl?
>
> On Fri, Sep 30, 2011 at 12:29 PM, Chris Angelico <ros...@gmail.com> wrote:
>> On Fri, Sep 30, 2011 at 7:26 PM, Ovidiu Deac <ovidiud...@gmail.com> wrote:
>>> $ python --version
>>> Python 2.6.6
>>
>> Ah, I think I was misinterpreting the traceback. You do actually have
>> a useful message there; it's the same error that my Py3.2 produced:
>>
>> sre_constants.error: nothing to repeat
>>
>> I'm not sure what your regex is trying to do, but the problem seems to
>> be connected with the * at the end of the pattern.
>>
>> ChrisA
>> --

I believe this is a limitation of the builtin re engine concerning
nested unbounded quantifiers, (...*)*, in your pattern.
You can try a more powerful recent regex implementation, which appears
to handle it:
http://pypi.python.org/pypi/regex

With the VERBOSE flag (re.X), all unescaped whitespace outside of
character classes is ignored:
http://docs.python.org/library/re.html#re.VERBOSE
so the pattern should be equivalent to:

r"^(?:[^y]*)*"

i.e. you are not actually gaining anything with the double quantifier,
as there isn't anything "real" in the pattern outside [^y]*.

It appears that you have oversimplified the pattern (assuming it worked
in the original app); however, you may simply try

import regex as re

and see if it helps.

Cf. (with both re and regex imported):

>>> regex.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
['a bcd e']
>>> re.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 244, in _compile
error: nothing to repeat
>>>
>>> re.findall(r"^(?:[^y]*)*", "a bcd e")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 244, in _compile
error: nothing to repeat
>>> regex.findall(r"^(?:[^y]*)*", "a bcd e")
['a bcd e']
>>> regex.findall(r"^[^y]*", "a bcd e")
['a bcd e']
>>>

hth,
vbr
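
P.S. If installing the regex module is not an option, here is a minimal sketch of a stdlib-only workaround. It assumes that the simplified pattern ^[^y]* really is an acceptable replacement for the reduced pattern in this thread; the full Perl regex may well need a different rewrite. The names NESTED, SIMPLE and compile_with_fallback are made up for illustration. The nested form is tried first, since other interpreter versions may accept it, and the simplified form is used only if compilation fails.

```python
import re

# Reduced pattern from this thread (verbose mode, nested quantifier).
NESTED = r"""^(?: [^y]* )*"""
# Assumed-equivalent form with the redundant outer quantifier removed.
SIMPLE = r"""^[^y]*"""


def compile_with_fallback(pattern, fallback, flags=0):
    """Compile `pattern`; if the engine rejects it, compile `fallback`."""
    try:
        return re.compile(pattern, flags)
    except re.error:
        # e.g. "nothing to repeat" on engines that reject (...*)*
        return re.compile(fallback, flags)


rx = compile_with_fallback(NESTED, SIMPLE, re.X)
print(rx.findall("a bcd e"))
```

Whichever branch is taken, the result for "a bcd e" should be the same as what the regex module returns above.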