On Fri, Feb 17, 2006 at 02:33:12PM +0100, H. Stelling wrote:
> Patrick R. Michaud wrote:
> >>In the following,
> >>
> >>/ (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /
> >>
> >>does the first alias have any effect on where the f's will go
> >>(probably not)?
> >
> >I'll defer to @Larry on this one, but my initial impression is
> >that the (f) capture would go into $6.
>
> I think that sequences should behave exactly as single branch
> alternations (only that there is no such thing, although we
> can write "[foo|<fail>]"). So I would rather opt for $1.

The current implementation is that a capturing subpattern is indexed
based on the largest index in all of the alternation branches.  I'm not
sure it makes sense to base it on aliases of the last alternation
branch.  Here are some examples we can chew on:

    / (a) [ (b) (c) | (d) ] (f) /         # (f) is $3 or $2? (currently $3)
    / (a) [ (b) (c) | $1 := (d) ] (f) /   # (f) is $3 or $2?

Since the second example is essentially saying the same as the first,
the (f) capture ought to go to the same place in each case.  If we say
that the existence of the $1 causes the (f) to go into $2, it also
becomes the case that $2 is an array of match objects, which isn't
technically problematic but it might be a bit surprising for many.

Some other examples to consider:

    / (a) [ (b) (c) | $0 := (d) ] (f) /       # (f) is $3 or $1?
    / (a) [ (b) (c) | $0 := (d) (3) ] (f) /   # (f) is $3 or $2?

At any rate, I find that having a subpattern capture base its index
on the highest index of all of the previous alternation branches is
easy to understand and works well in practice.  It can also be easily
changed with another alias if needed.

> But wouldn't it be nice if the same rules applied to aliases and
> subrule invocations, that is, recursion put aside, to think of
>
>     / <foo> /
>
> simply as a shorter way to say
>
>     / $<foo> := ([definition of foo]:) /?

First, is that colon following "[definition of foo]" intentional or
a typo?  Currently we can backtrack into subrules -- there's no "cut"
assumed after them.

But secondly, I'm not sure we can casually toss recursion aside when
thinking about this, since it's really a driving force behind having
named subrules.  :-)  There's also a difference in that subrules can
take arguments, as in <foo('args')>, or can come from another grammar,
as in <Rule::foo>, which seems to argue that <foo> is really something
other than an alias shorthand.

> The synopsis says:
>
> * If a subrule appears two (or more) times in the same lexical scope
>   (i.e.
>   twice within the same subpattern and alternation), or if the
>   subrule is quantified anywhere within the entire rule, then its
>   corresponding hash entry is always assigned a reference to an array
>   of Match objects, rather than a single Match object.
>
> Maybe you're not the right person to ask, but is there a particular
> reason for the "entire rule" bit?
>
>     / (<foo>|None) <foo> (<foo>) /
>
> Here we get three Matches $0<foo> (possibly undefined), $<foo>, and
> $1<foo>. At least, I think so.
>
>     / (<foo>?) <foo> (<foo>) /
>
> Now, we suddenly get three more or less unrelated arrays with lengths
> 1..1, 1, and 1. Of course, I admit this example is a bit artificial.

Oh, I hadn't caught that particular clause (or hadn't read it as you
just did).  PGE certainly doesn't implement things that way.

I think the "entire rule" clause was intended to cover cases like

    / [ <foo> ]* /

where <foo> is indirectly quantified and therefore is an array of
match objects.  We should probably reword it, or get a clarification
of what is intended.  (Damian, @Larry: can you confirm or clarify this
for us?)

> Furthermore, I think "within the same subpattern and alternation" is
> not quite correct, at least it wouldn't apply to something like
>
>     / (<foo> [ <foo> | ... ]) /
>
> unless we consider the (...) sequence as a kind of single branch
> alternation. And why are alternation branches considered to be
> lexical scopes, anyway?

In the example you give, $0<foo> is indeed an array of match objects.
The "same alternation" in this case is the subpattern... compare to

    / (<foo> [ <foo> | ... ]) | <foo> /

$0<foo> is an array, $<foo> is a single match object.

Alternation branches don't create new lexical scopes, they just
affect quantification and subpattern numbering.
In both of the following examples

    / abc <foo> def <foo> /
    / ghi <foo> | jkl <foo> /

each <foo> has the same lexical scope ($<foo>), but in the "abc"
example $<foo> is an array of match objects, while in the "ghi"
example $<foo> is a single match object.

> My second question is why adding a "?" or "??" to an unquantified
> subrule which would otherwise result in a single Match object should
> result in an array, rather than a single (possibly undefined) Match.

The specification was originally this way but was later changed to
the current definition.  I think people found the idea of "?" producing
a single match object confusing, so for consistency we ended up with
"all quantifiers produce arrays of match objects".

(Note also that even if "?" produced a single Match object instead
of an array, it wouldn't be "undefined" -- it would be a failed Match.)

Pm
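
For anyone who wants to experiment with the numbering rule discussed
above (a capture that follows an alternation takes the index after the
largest index used in any branch), here is a rough Python sketch.  It is
not PGE code.  The tuple-based pattern representation and the names
number_captures and capture_indexes are invented for this example, and
the alias handling (an alias sets the index, and later captures in the
same sequence continue from it) is an assumption about the intended
behavior, not something the synopsis is quoted as saying here.

```python
def number_captures(seq, start=0):
    """Assign indexes to captures in a sequence of pattern items.

    Items (a hypothetical representation, for illustration only):
      ("cap", label)           -- a plain capture such as (a)
      ("alias", n, label)      -- an aliased capture such as $n := (d)
      ("alt", [branch, ...])   -- a non-capturing alternation [ x | y ]

    Returns (assignments, next_index, high_water), where high_water is
    one more than the largest index reachable in this sequence.
    """
    assignments = []
    next_index = start
    high_water = start
    for item in seq:
        kind = item[0]
        if kind == "cap":
            assignments.append((item[1], next_index))
            next_index += 1
        elif kind == "alias":
            _, index, label = item
            assignments.append((label, index))
            # Assumption: numbering continues from the aliased index.
            next_index = index + 1
        elif kind == "alt":
            # Every branch starts from the same index; afterwards we
            # continue from the largest index seen in any branch.
            branch_highs = [next_index]
            for branch in item[1]:
                sub, _, sub_high = number_captures(branch, next_index)
                assignments.extend(sub)
                branch_highs.append(sub_high)
            next_index = max(branch_highs)
        else:
            raise ValueError("unknown item kind: %r" % (kind,))
        high_water = max(high_water, next_index)
    return assignments, next_index, high_water


def capture_indexes(seq):
    """Return a {label: index} dict for a pattern with unique labels."""
    assignments, _, _ = number_captures(seq)
    return dict(assignments)


# / (a) [ (b) (c) | (d) ] (f) /
plain = [("cap", "a"),
         ("alt", [[("cap", "b"), ("cap", "c")], [("cap", "d")]]),
         ("cap", "f")]

# / (a) [ (b) (c) | $1 := (d) ] (f) /
aliased = [("cap", "a"),
           ("alt", [[("cap", "b"), ("cap", "c")], [("alias", 1, "d")]]),
           ("cap", "f")]

# / (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /
first = [("cap", "a"),
         ("alt", [[("cap", "b"), ("cap", "c")],
                  [("alias", 5, "d"), ("alias", 0, "e")]]),
         ("cap", "f")]

for name, pattern in [("plain", plain), ("aliased", aliased), ("first", first)]:
    print(name, capture_indexes(pattern))
```

Under these assumptions (f) lands in $3 for the first two patterns and
in $6 for the pattern from the start of the thread.  Switching the "alt"
case to use the last branch's next index instead of the maximum gives
the alternative numbering being debated.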