Author: larry
Date: Fri Jun 30 15:31:34 2006
New Revision: 9728

Modified:
   doc/trunk/design/syn/S03.pod
   doc/trunk/design/syn/S05.pod

Log:
<( and )> no longer need to balance.
<< and >> are now directional word boundaries, along with « and ».
<?wb> is generic replacement for \b, <!wb> for \B
Clarified case semantics of array subrules.


Modified: doc/trunk/design/syn/S03.pod
==============================================================================
--- doc/trunk/design/syn/S03.pod        (original)
+++ doc/trunk/design/syn/S03.pod        Fri Jun 30 15:31:34 2006
@@ -949,6 +949,8 @@
     submethod foo
     multi foo
     proto foo
+    macro foo
+    quote qX
     regex foo
     rule foo
     token foo

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Fri Jun 30 15:31:34 2006
@@ -16,7 +16,7 @@
    Date: 24 Jun 2002
    Last Modified: 30 June 2006
    Number: 5
-   Version: 26
+   Version: 27
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> because they haven't been
@@ -557,7 +557,8 @@
 As with a scalar variable, each element is matched as a literal
 unless it happens to be a C<Regex> object, in which case it is matched
 as a subrule.  As with scalar subrules, a tainted subrule always fails.
-All values pay attention to the current C<:ignorecase> setting.
+All string values pay attention to the current C<:ignorecase> setting,
+while C<Regex> values use their own C<:ignorecase> settings.
 
 =item *
 
@@ -611,11 +612,15 @@
 
 =head1 Extensible metasyntax (C<< <...> >>)
 
-=over
+Both C<< < >> and C<< > >> are metacharacters, and are usually (but not
+always) used in matched pairs.  (Some combinations of metacharacters
+function as standalone tokens, and these may include angles.  These are
+described below.)
 
-=item *
+For matched pairs, the first character after C<< < >> determines the
+behavior of the assertion:
 
-The first character after C<< < >> determines the behavior of the assertion.
+=over
 
 =item *
 
@@ -799,27 +804,6 @@
 
 =item *
 
-A leading C<(> indicates the start of a result capture:
-
-    / foo <( \d+ )> bar /
-
-is equivalent to:
-
-    / <after foo> \d+ <before bar> /
-
-except that the scan for "C<foo>" can be done in the forward direction,
-while a lookbehind assertion would presumably scan for C<\d+> and then
-match "C<foo>" backwards.  The use of C<< <(...)> >> affects only the
-meaning of the I<result object> and the positions of the beginning and
-ending of the match.  That is, after the match above, C<$()> contains
-only the digits matched, and C<.pos> is pointing to after the digits.
-Other captures (named or numbered) are unaffected and may be accessed
-through C<$/>.
-
-It is a syntax error to use an unbalanced C<< <( >> or C<< )> >>.
-
-=item *
-
 A leading C<[> or C<+> indicates an enumerated character class.  Ranges
 in enumerated character classes are indicated with C<..>.
 
@@ -858,6 +842,24 @@
 
 =item *
 
+In general, any general quoting form such as C<q> or C<qq> will be
+recognized as if it had curlies around it.  This includes quotes
+declared with the C<quote> declarator:
+
+    quote qX = q:x:c;
+    /<qX[cat -n {$foo}]>/
+
+same as
+
+    /<{ qX[cat -n {$foo}] }>/
+
+This hides any qX rule that might be defined in the grammar.  Note that
+this means that the language parser has to pass the current list
+of quote forms into the regex parser since it needs to be known at
+compile time.
+
+=item *
+
 The special assertion C<< <.> >> matches any logical grapheme
 (including a Unicode combining character sequences):
 
@@ -876,13 +878,43 @@
 Note that C<< <!alpha> >> is different from C<< <-alpha> >> because the
 latter matches C</./> when it is not an alpha.
 
+=back
+
+The following tokens include angles but are not required to balance:
+
+=over
+
 =item *
 
-Conjecture: Multiple opening angles are matched by a corresponding
-number of closing angles, and otherwise function as single angles.
-This can be used to visually isolate unmatched angles inside:
+A C<< <( >> token indicates the start of a result capture, while the
+corresponding C<< )> >> token indicates its endpoint.  When matched,
+these behave as assertions that are always true, but have the side
+effect of setting the C<.from> and C<.to> attributes of the match
+object.  That is:
+
+    / foo <( \d+ )> bar /
+
+is equivalent to:
+
+    / <after foo> \d+ <before bar> /
+
+except that the scan for "C<foo>" can be done in the forward direction,
+while a lookbehind assertion would presumably scan for C<\d+> and then
+match "C<foo>" backwards.  The use of C<< <(...)> >> affects only the
+meaning of the I<result object> and the positions of the beginning and
+ending of the match.  That is, after the match above, C<$()> contains
+only the digits matched, and C<.pos> is pointing to after the digits.
+Other captures (named or numbered) are unaffected and may be accessed
+through C<$/>.
+
+=item *
 
-    <<<Ccode: a >> 1>>>
+A C<«> or C<<< << >>> token indicates a left word boundary.  A C<»> or
+C<<< >> >>> token indicates a right word boundary.  (As separate tokens,
+these need not be balanced.)  Perl 5's C<\b> is replaced by a C<< <?wb> >>
+"word boundary" assertion, while C<\B> becomes C<< <!wb> >>.  (None of
+these are dependent on the definition of C<< <ws> >>, but only on the C<\s>
+definition of whitespace.)
 
 =back
 
