On Fri, Feb 17, 2006 at 02:33:12PM +0100, H. Stelling wrote:
> Patrick R. Michaud wrote:
> >>In the following,
> >>
> >>/ (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /
> >>
> >>does the first alias have any effect on where the f's will go
> >>(probably not)?
> >
> >I'll defer to @Larry on this one, but my initial impression is
> >that the (f) capture would go into $6.
> 
> I think that sequences should behave exactly as single branch
> alternations (only that there is no such thing, although we
> can write "[foo|<fail>]"). So I would rather opt for $1.

In the current implementation, a capturing subpattern that follows
an alternation is numbered one past the largest index used in any
of the alternation's branches.  I'm not sure it makes sense to base
it only on the aliases of the last alternation branch.

Here are some examples we can chew on:

    / (a) [ (b) (c) | (d) ] (f) /         # (f) is $3 or $2?  (currently $3)

    / (a) [ (b) (c) | $1 := (d) ] (f) /   # (f) is $3 or $2?

Since the second example is essentially saying the same as the first,
the (f) capture ought to go to the same place in each case.  If we
say that the existence of the $1 causes the (f) to go into $2, it
also becomes the case that $2 is an array of match objects, which
isn't technically problematic but it might be a bit surprising for
many.

Some other examples to consider:

    / (a) [ (b) (c) | $0 := (d) ] (f) /   # (f) is $3 or $1?  

    / (a) [ (b) (c) | $0 := (d) (e) ] (f) /   # (f) is $3 or $2?

At any rate, I find that having a subpattern capture base its
index on the highest index of all of the previous alternation
branches is easy to understand and works well in practice.  It can
also be easily changed with another alias if needed.
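
To make the rule concrete, here is a rough model of it in Python.
This is only a sketch, not PGE's actual code: the tuple encoding and
the names number_captures and cap are made up for illustration, and
it assumes that numbering after a "$N :=" alias continues from N+1
within the same sequence.  Nested subpattern scopes are not modeled.

```python
# Toy model of the numbering rule described above (not PGE's actual code).
# A pattern is a list of items:
#   ("cap", name, alias)   a capture; alias is an int for "$N := (...)", else None
#   ("alt", [branch, ...]) an alternation; each branch is itself a list of items

def number_captures(seq, start=0):
    """Return ({capture name: index}, next free index) for one sequence."""
    labels = {}
    nxt = start
    for item in seq:
        if item[0] == "cap":
            _, name, alias = item
            idx = nxt if alias is None else alias
            labels[name] = idx
            nxt = idx + 1          # assumption: numbering continues after an alias
        elif item[0] == "alt":
            after = nxt
            for branch in item[1]:
                # every branch starts numbering from the same index
                branch_labels, _ = number_captures(branch, nxt)
                labels.update(branch_labels)
                if branch_labels:
                    # one past the largest index used anywhere in the branch
                    after = max(after, max(branch_labels.values()) + 1)
            nxt = after
    return labels, nxt

def cap(name, alias=None):
    """Hypothetical helper that builds a capture item."""
    return ("cap", name, alias)

# / (a) [ (b) (c) | (d) ] (f) /
plain = [cap("a"), ("alt", [[cap("b"), cap("c")], [cap("d")]]), cap("f")]
# / (a) [ (b) (c) | $0 := (d) ] (f) /
aliased = [cap("a"), ("alt", [[cap("b"), cap("c")], [cap("d", 0)]]), cap("f")]
# / (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /
mixed = [cap("a"), ("alt", [[cap("b"), cap("c")], [cap("d", 5), cap("e", 0)]]), cap("f")]

for pattern in (plain, aliased, mixed):
    print(number_captures(pattern)[0]["f"])   # prints 3, then 3, then 6
```

Under this model an alias inside a branch only moves (f) when it
raises the largest index used, which is why the $5 example lands
(f) in $6 while the $0 example leaves it in $3.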

> But wouldn't it be nice if the same rules applied to aliases and
> subrule invocations, that is, recursion put aside, to think of
> 
> / <foo> /
> 
> simply as a shorter way to say
> 
> / $<foo> := ([definition of foo]:) /?

First, is that colon following "[definition of foo]" intentional or
a typo?  Currently we can backtrack into subrules -- there's no "cut"
assumed after them.

But secondly, I'm not sure we can casually toss recursion
aside when thinking about this, since it's really a driving force 
behind having named subrules.  :-)  There's also a difference in
that subrules can take arguments, as in <foo('args')>, or can come
from another grammar, as in <Rule::foo>, which seems to argue that 
<foo> is really something other than an alias shorthand.

> The synopsis says:
> 
> * If a subrule appears two (or more) times in the same lexical scope
>   (i.e. twice within the same subpattern and alternation), or if the
>   subrule is quantified anywhere within the entire rule, then its
>   corresponding hash entry is always assigned a reference to an array
>   of Match objects, rather than a single Match object.
> 
> Maybe you're not the right person to ask, but is there a particular
> reason for the "entire rule" bit?
> 
> / (<foo>|None) <foo> (<foo>) /
> 
> Here we get three Matches $0<foo> (possibly undefined), $<foo>, and
> $1<foo>. At least, I think so.
> 
> / (<foo>?) <foo> (<foo>) /
> 
> Now, we suddenly get three more or less unrelated arrays with lengths
> 1..1, 1, and 1. Of course, I admit this example is a bit artificial.

Oh, I hadn't caught that particular clause (or hadn't read it as
you just did).  PGE certainly doesn't implement things that way.
I think the "entire rule" clause was intended to cover cases like

    / [ <foo> ]* /

where <foo> is indirectly quantified and therefore is an array of
match objects.  We should probably reword it, or get a clarification
of what is intended.  (Damian, @Larry:  can you confirm or clarify
this for us?)

> Furthermore, I think "within the same subpattern and alternation" is
> not quite correct, at least it wouldn't apply to something like
> 
> / (<foo> [ <foo> | ... ]) /
>
> unless we consider the (...) sequence as a kind of single branch
> alternation. And why are alternation branches considered to be
> lexical scopes, anyway? 

In the example you give, $0<foo> is indeed an array of match objects.
The "same alternation" in this case is the subpattern... compare to

   / (<foo> [ <foo> | ... ]) | <foo> /

$0<foo> is an array, $<foo> is a single match object.

Alternation branches don't create new lexical scopes, they just
affect quantification and subpattern numbering.  In both of the 
following examples

    / abc <foo> def <foo> /

    / ghi <foo> | jkl <foo> /

each <foo> has the same lexical scope ($<foo>), but in the "abc"
example $<foo> is an array of match objects, while in the "ghi"
example $<foo> is a single match object.
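
One way to picture the distinction is to count how many times a
subrule can match along any single path through the pattern.  The
Python below is only a sketch of that reading, not what PGE does
internally; the item encoding and the names occurrences and is_array
are invented for this example, and it looks at one capture scope at
a time.

```python
# Toy model: does $<name> hold an array of Match objects or a single one?
# A pattern (one capture scope) is a list of items:
#   ("lit", text)           literal text, ignored here
#   ("sub", name)           a subrule call such as <foo>
#   ("alt", [branch, ...])  an alternation; each branch is a list of items
#   ("quant", [item, ...])  a quantified group such as [ ... ]*

def occurrences(seq, name):
    """Largest number of times <name> can match along one path (capped)."""
    total = 0
    for item in seq:
        kind = item[0]
        if kind == "sub" and item[1] == name:
            total += 1
        elif kind == "alt":
            # only one branch matches, so take the worst case over branches
            total += max(occurrences(branch, name) for branch in item[1])
        elif kind == "quant":
            # any quantified occurrence is treated as "possibly many"
            if occurrences(item[1], name) > 0:
                total += 2
    return total

def is_array(seq, name):
    return occurrences(seq, name) > 1

# / abc <foo> def <foo> /
sequence = [("lit", "abc"), ("sub", "foo"), ("lit", "def"), ("sub", "foo")]
# / ghi <foo> | jkl <foo> /
branches = [("alt", [[("lit", "ghi"), ("sub", "foo")],
                     [("lit", "jkl"), ("sub", "foo")]])]
# / [ <foo> ]* /
quantified = [("quant", [("sub", "foo")])]

print(is_array(sequence, "foo"))    # True
print(is_array(branches, "foo"))    # False
print(is_array(quantified, "foo"))  # True
```

The quantified case corresponds to the / [ <foo> ]* / example
earlier in this message, where <foo> is only indirectly quantified.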

> My second question is why adding a "?" or "??" to an unquantified
> subrule which would otherwise result in a single Match object should
> result in an array, rather than a single (possibly undefined) Match.

The specification was originally this way but was later changed
to the current definition.  I think people found the idea of
"?" producing a single match object confusing, so for consistency
we ended up with "all quantifiers produce arrays of match objects".

(Note also that even if "?" produced a single Match object instead
of an array, it wouldn't be "undefined" -- it would be a failed Match.)

Pm
