Gareth Rees added the comment:

This is a well-known gotcha with backtracking regexp implementations. The 
problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there are 
some characters (space, apostrophe, double quotes) that match multiple 
alternatives (for example a space matches both " +" and "[^>]+"). When the 
overall match fails, the regexp engine has to backtrack at each ambiguous 
character to try the other alternatives, so the runtime is exponential in the 
number of ambiguous characters.
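
As a rough illustration of that growth, the sketch below counts how many 
distinct ways a run of spaces can be split among the alternatives. It is a 
simplified model of the paths a backtracking engine may have to rule out, not 
a measurement of the re module, and count_parses is a hypothetical helper 
introduced only for this example.

```python
import re
from functools import lru_cache

# The four alternatives from the alternation, as separate patterns.
ALTERNATIVES = [r" +", r"'[^']*'", r'"[^"]*"', r"[^>]+"]


def count_parses(text, alternatives=ALTERNATIVES):
    """Count the ways `text` can be cut into consecutive pieces, each piece
    fully matched by one alternative (the choice of alternative counts)."""
    compiled = [re.compile(alt) for alt in alternatives]

    @lru_cache(maxsize=None)
    def ways(start):
        if start == len(text):
            return 1
        total = 0
        for pattern in compiled:
            for end in range(start + 1, len(text) + 1):
                # fullmatch(string, pos, endpos) restricts the match to text[start:end].
                if pattern.fullmatch(text, start, end):
                    total += ways(end)
        return total

    return ways(0)


for n in (1, 2, 3):
    print(n, count_parses(" " * n))  # prints 1 2, then 2 6, then 3 18
```

In this model each extra space triples the count, because the space can 
extend the previous piece or start a new one under either of two alternatives.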

Linear behaviour can be restored if you make the alternation unambiguous, like 
this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)
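
To see what the revised alternation changes, the sketch below counts how many 
alternatives can start matching at each of the three characters named above. 
The sample tokens and the helper matching_alternatives are assumptions made 
for this illustration; they do not come from the original report.

```python
import re

# Alternatives before and after the suggested change, as separate patterns.
ORIGINAL = [r" +", r"'[^']*'", r'"[^"]*"', r"[^>]+"]
REVISED = [r" +", r"'[^']*'", r'"[^"]*"', r"[^>'\"]+"]

# Hypothetical sample tokens, one per character discussed in the comment.
SAMPLES = {"space": "  ", "apostrophe": "'x'", "double quote": '"x"'}


def matching_alternatives(alternatives, token):
    """Return the alternatives that match starting at the token's first character."""
    return [alt for alt in alternatives if re.match(alt, token)]


for name, token in SAMPLES.items():
    before = len(matching_alternatives(ORIGINAL, token))
    after = len(matching_alternatives(REVISED, token))
    print(f"{name}: {before} alternatives before, {after} after")
```

Under this check the apostrophe and double quote each drop from two matching 
alternatives to one. A space is still accepted by both " +" and the final 
alternative, so whether that overlap matters will depend on the surrounding 
pattern and input.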

----------
nosy: +Gareth.Rees

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28690>
_______________________________________