On 2012-10-05 16:27, Evan Driscoll wrote:
> On 10/05/2012 04:23 AM, Duncan Booth wrote:
>> A regular expression element may be followed by a quantifier.
>> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
>> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
>> you can follow an element with two quantifiers.
>
> In fact, *you* did -- the first sentence of that paragraph! :-)
> \s is a regex, so you can follow it with a quantifier and get \s{6}.
> That's also a regex, so you should be able to follow it with a quantifier.
>
> I can understand that you can create a grammar that excludes it. I'm
> actually really interested to know if anyone knows whether this was a
> deliberate decision and, if so, what the reason is. (And if not --
> should it be considered a (low priority) bug?)
>
> Was it because such patterns often reveal a mistake? Because "\s{6}+"
> has other meanings in different regex syntaxes and the designers didn't
> want confusion? Because it was simpler to parse that way? Because the
> "hey you recognize regular expressions by converting it to a finite
> automaton" story is a lie in most real-world regex implementations (in
> part because they're not actually regular expressions) and repeated
> quantifiers cause problems with the parsing techniques that actually get
> used?

You rarely want to repeat a repeated element. It can also result in
catastrophic backtracking unless you're _very_ careful.
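
A minimal sketch of both points, using Python's standard re module. The
patterns and inputs here are illustrative choices, not anything from the
thread: an explicit group is the usual way to repeat a repeated element,
and a nested quantifier such as (a+)+ can make a failing match slow.

```python
import re
import time

# Repeating a repeated element explicitly: one or more runs of exactly six
# whitespace characters.
grouped = re.compile(r"(?:\s{6})+")
print(bool(grouped.fullmatch(" " * 12)))  # two runs of six
print(bool(grouped.fullmatch(" " * 7)))   # seven is not a multiple of six

# A nested quantifier that backtracks badly when the match fails: each extra
# "a" gives the engine more ways to split the run between the inner and
# outer "+".
nested = re.compile(r"(a+)+b")
timings = {}
for n in (12, 15, 18):
    text = "a" * n + "c"  # no "b" anywhere, so the overall match must fail
    start = time.perf_counter()
    result = nested.search(text)
    timings[n] = time.perf_counter() - start
    print(n, result, f"{timings[n]:.4f}s")
```

The absolute timings depend on the machine and the Python version, so only
the trend as n grows is of interest.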
In many other regex implementations (including mine), "*+", "++" and
"?+" are possessive quantifiers, much as "*?", "+?" and "??" are lazy
quantifiers.
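
A hedged sketch of what "possessive" means in practice. As far as I know,
the standard-library re module only accepts these quantifiers from Python
3.11 onwards and rejects them in earlier versions, so the code below
catches re.error rather than assuming either behaviour. The helper
try_compile is a hypothetical name introduced for this example.

```python
import re
import sys

# Greedy "a*" gives back one "a" so that the trailing "a" can still match.
greedy = re.fullmatch(r"a*a", "aaa")
print("greedy:", greedy)


def try_compile(pattern):
    """Return the compiled pattern, or the re.error raised while compiling."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return exc


# Possessive "a*+" never gives anything back, so the trailing "a" in the
# pattern has nothing left to match.
possessive = try_compile(r"a*+a")
# The pattern discussed in this thread.
thread_pattern = try_compile(r"\s{6}+")

for name, compiled in (("a*+a", possessive), (r"\s{6}+", thread_pattern)):
    if isinstance(compiled, re.error):
        print(name, "-> rejected:", compiled)
    else:
        print(name, "-> accepted on Python", sys.version_info[:2])

if not isinstance(possessive, re.error):
    print("possessive:", possessive.fullmatch("aaa"))
```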
You could, of course, ask why adding "?" after a quantifier doesn't
make it optional, e.g. why r"\s{6}?" doesn't mean the same as
r"(?:\s{6})?", or why r"\s{0,6}?" doesn't mean the same as
r"(?:\s{0,6})?".
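
To make the difference concrete, here is a small sketch with Python's re
module. The test strings are my own illustrative inputs.

```python
import re

six_spaces = " " * 6

# "?" after a quantifier makes it lazy: match as few repetitions as possible.
lazy = re.match(r"\s{0,6}?", six_spaces)
# "?" after a group makes the whole group optional, and that "?" is greedy.
optional = re.match(r"(?:\s{0,6})?", six_spaces)
print(repr(lazy.group()), repr(optional.group()))

# With an exact count, laziness has nothing to choose between: six whitespace
# characters are still required before the "x".
print(re.match(r"\s{6}?x", "x"))
# The optional group may match nothing at all, so a bare "x" is accepted.
print(re.match(r"(?:\s{6})?x", "x"))
```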