Author: larry
Date: Tue Jul 10 17:39:45 2007
New Revision: 14428
Modified:
doc/trunk/design/syn/S05.pod
Log:
The ** form is now syntactically independent of the following token.
This allows us to distinguish literal counts and ranges from indirect ones
specified via closure. It also allows a notational simplification for
infix repetition suggested by Morrie Siegel++. (As a consequence, the ?
character to specify minimal matching now attaches to the ** directly.)
Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Jul 10 17:39:45 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 9 Jul 2007
+ Last Modified: 10 Jul 2007
Number: 5
- Version: 60
+ Version: 61
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -676,28 +676,60 @@
=item *
-The repetition specifier is now C<**{...}> for maximal matching,
-with a corresponding C<**{...}?> for minimal matching. Space is
-allowed on either side of the asterisks. The curlies are taken to
-be a closure returning an Int or a Range object.
+The general repetition specifier is now C<**> for maximal matching,
+with a corresponding C<**?> for minimal matching. Space is
+allowed on either side. The next token will determine what kind of
+repetition is desired:
- / value was (\d ** {1..6}?) with ([\w]**{$m..$n}) /
+If the next thing is an integer, then it is parsed as either an exact
+count or a range:
+
+. ** 42 # match exactly 42 times
+<item> ** 3..* # match 3 or more times
+
+This form is considered declarational.
+
+If you supply a closure, it should return either an C<Int> or a C<Range>
+object.
+
+'x' ** {$m} # exact count returned from closure
+<foo> ** {$m..$n} # range returned from closure
+
+/ value was (\d **? {1..6}) with ([ \w* ]**{$m..$n}) /
It is illegal to return a list, so this easy mistake fails:
- / [foo]**{1,3} /
+/ [foo] ** {1,3} /
+
+The closure form is always considered procedural, so the item it is
+modifying is never considered part of the longest token.
+
+If you supply any other atom (which may not be quantified), it is
+interpreted as a separator (such as an infix operator), and the
+initial item is quantified by the number of times the separator is
+seen between items:
+
+<alt> ** '|' # repetition controlled by presence of separator
+<addend> ** <addop> # repetition controlled by presence of separator
+<item> ** [ \!?'==' ] # repetition controlled by presence of separator
+
+A successful match of such a quantifier always ends "in the middle",
+that is, after the initial item but before the next separator.
+(The separator never matches independently of the next item; if the
+separator matches but the next item fails, it backtracks all the way
+back through the separator.) Therefore
+
+/ <ident> ** ',' /
+
+can match
+
+foo
+foo,bar
+foo,bar,baz
-(At least, it fails in the absence of C,
-which is likely to be unimplemented in Perl 6.0.0 anyway.)
+but never
-The optimizer will likely optimize away things like C<**{1..*}>
-so that the closure is never actually run in that case. But it's
-a closure that must be run in the general case, so you can use
-it to generate a range on the fly based on the earlier matching.
-(Of course, bear in mind the closure must be run I<before> attempting to
-match whatever it quantifies.) A closure that must be run is considered
-procedural, but a closure that recognizably returns the same thing every
-time is considered declarative.
+foo,
+foo,bar,
=item *
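
Note for readers of this change: the separator form added above can be approximated outside Perl 6. The following is a minimal Python sketch, not the Perl 6 implementation. It assumes the quantified item must match at least once, and the names separated, IDENT and matched_prefix are hypothetical helpers introduced only for illustration. The separator is grouped with the item that follows it, so a separator can never match on its own, which is the "ends in the middle" behavior the new text describes.

```python
import re


def separated(item: str, sep: str) -> re.Pattern:
    """Emulate `item ** sep`: one item, then zero or more (sep, item) pairs."""
    # Grouping the separator with the following item means a trailing
    # separator is given back on backtracking instead of being consumed.
    return re.compile(rf"(?:{item})(?:(?:{sep})(?:{item}))*")


# Assumption: a simple identifier pattern stands in for the quantified item.
IDENT = r"[A-Za-z_]\w*"
idents = separated(IDENT, ",")


def matched_prefix(text: str):
    """Return the text matched at the start of `text`, or None on failure."""
    m = idents.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for s in ["foo", "foo,bar", "foo,bar,baz", "foo,", "foo,bar,"]:
        print(repr(s), "->", repr(matched_prefix(s)))
```

With this sketch, "foo,bar,baz" is matched in full, while for "foo," and "foo,bar," the match stops before the trailing comma and covers only "foo" and "foo,bar" respectively.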