Tim Peters <t...@python.org> added the comment:

Without re.IGNORECASE, the leading

    ^[_a-z0-9-]+

can't even match the first character (`val` starts with uppercase Z), so it 
fails instantly.
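
A minimal sketch of that first point. The string below is a hypothetical stand-in for `val` (only its uppercase first letter matters), not the value from the original report:

```python
import re

# Hypothetical stand-in for `val`; the real value is not reproduced here.
sample = "Zebra-01"
prefix = r"^[_a-z0-9-]+"

# Case-sensitive: 'Z' is not in the class, so the match fails at position 0.
case_sensitive = re.match(prefix, sample)

# Case-insensitive: 'Z' is now accepted, so the engine proceeds into the
# rest of the pattern (here there is no "rest", so it simply succeeds).
case_insensitive = re.match(prefix, sample, re.IGNORECASE)

print(case_sensitive)
print(case_insensitive.group())
```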

With re.IGNORECASE, it's not "stuck", but is taking a verrrrrry long time to 
try an enormous number of (ultimately doomed) possibilities due to the way the 
regexp is written.

This is due to using nested quantifiers for no apparent reason.  For any 
character class C,

    (C+)*

matches the same set of strings as

    C*

but the former way can _try_ to match in an exponential (in the length of the 
string) number of ways.  So replace

    ([\.'_a-z0-9-]+)*

and

    ([\.a-z0-9-]+)*

with

    [\.'_a-z0-9-]*

and

    [\.a-z0-9-]*

and it fails to match `val` quickly (even with re.IGNORECASE).
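
Here is a rough way to see the difference for yourself. It is only a sketch: the patterns are cut-down, anchored versions of the second character class above (not the full regexp from the report), the test strings are made up, and the names NESTED, FLAT, agree and time_match are purely illustrative. Actual timings depend on your machine and Python version.

```python
import re
import time

# Cut-down patterns: same character class, nested vs. flat quantifier.
# The trailing $ forces the engine to backtrack when the input ends in
# a character outside the class.
NESTED = re.compile(r"^([\.a-z0-9-]+)*$")
FLAT = re.compile(r"^[\.a-z0-9-]*$")


def agree(samples):
    """True if both patterns accept/reject every sample identically."""
    return all(
        (NESTED.match(s) is None) == (FLAT.match(s) is None)
        for s in samples
    )


def time_match(pattern, text):
    """Return (matched?, seconds taken) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


# Made-up inputs: some match, some do not.
samples = ["", "abc", "a.b-c", "ABC", "a!", "a" * 10 + "!"]
print("same accept/reject decisions:", agree(samples))

# Doomed inputs: a run of valid characters followed by one invalid one.
# Keep n small; the nested form's time grows very quickly with n.
for n in (10, 14, 18):
    text = "a" * n + "!"
    _, nested_secs = time_match(NESTED, text)
    _, flat_secs = time_match(FLAT, text)
    print(f"n={n:2d}  nested={nested_secs:.6f}s  flat={flat_secs:.6f}s")
```

On a backtracking engine the nested column should climb steeply as n grows while the flat column stays essentially constant; the accept/reject decisions are the same either way.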

For more on this (which applies to many regexp implementations, not just 
Python's), here's a start:

https://www.mathworks.com/matlabcentral/answers/95953-why-can-nested-quantifiers-in-regexp-can-cause-inefficient-failures-in-matlab-6-5-r13

The "Mastering Regular Expressions" book referenced in that answer is an 
excellent book-length treatment of this (and related) topic(s).

----------
nosy: +tim.peters
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35932>
_______________________________________