Re: A rule by any other name...

Damian Conway Tue, 09 May 2006 18:26:53 -0700

Allison wrote:

I'm comfortable with the semantic distinction between 'rule' as "thingy
inside a grammar" and 'regex' as "thingy outside a grammar". But, I
think we can find a better name than 'regex'. The problem is both the
'regex' vs. 'regexp' battle,


Is that really an issue? I've never met anyone who *voluntarily* added
the 'p'. ;-)

 and the fact that everyone knows 'regex(p)'
means "regular expression" no matter how many times we say it doesn't.


Sure. But almost nobody knows what "regular" actually means, and of
those few only a tiny number of pedants actually *care* anymore. So
does it matter?

(I'm not fond of the idea of spending the next 20 years explaining that
over and over again.)


Then don't. I teach regexes all the time and I *never* explain what
"regular" means, or why it doesn't apply to Perl (or any other
commonly used) regexes any more.

Maybe 'match' is a better keyword.


I don't think so. "Match" is a better word for what comes back from
a regex match (what we currently refer to as a Capture, which is
okay too).

Then again, from a practical perspective, it seems likely that we'll
want something like ":ratchet is set by default in all rules" turned on
in some grammars and off in other grammars. In which case, the real
distinction is that rules inside a grammar pull default attributes from
their grammar class, while rules outside a grammar have no default
attributes. Which brings us back to a single keyword 'rule' making sense
for both.


That's pretty much the Perl 5 argument for using "sub" for both subroutines
and methods, which we've definitively rejected in Perl 6. If we use
"rule" for both kinds of regexes, we force the reader to constantly
check surrounding context in order to understand the behaviour of the
construct. :-(

I'm not comfortable with the semantic distinction between 'rule' and
'token'. Whitespace skipping is not the defining difference between a
rule and a token in general use of the terms, so the names are misleading.


True. "Token" is the wrong word for another reason: a token is a
segmented component of the input stream, *not* a rule for matching
segmented components of the input stream. The correct term for that is
"terminal". So a suitable keyword might well be "term".

However, terminals do differ from rules in that they do not attempt to
be smart about what they ignore.

More importantly, whitespace skipping isn't a very significant option in
grammars in general, so creating two keywords that distinguish between
skipping and no skipping is linguistically infelicitous. It's like
creating two different words for "shirts with horizontal stripes" and
"shirts with vertical stripes". Sure, they're different, but the
difference isn't particularly significant, so it's better expressed by a
modifier on "shirt" than by a different word.


I'd *strongly* disagree with that. Whitespace skipping (for suitable
values of "whitespace") is a critical feature of parsers. I'd go so far
as to say that it's *the* killer feature of Parse::RecDescent.

From a practical perspective, both the Perl 6 and Punie grammars have
ended up using 'token' in many places (for things that aren't tokens),
because :words isn't really the semantics you want for parsing computer
languages. (Though it is quite useful for parsing natural language and
other things.) What you want is comment skipping, which isn't the same
as :words.


What you want is *whitespace* skipping (where comments are a special
form of whitespace). What you *really* want is whitespace skipping
where you get to define what constitutes whitespace in each context
where whitespace might be skipped.

But the defining characteristic of a "terminal" is that you try to match
it exactly, without being smart about what to ignore. That's why I like the
fundamental rule/token distinction as it is currently specified.
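
To make the distinction concrete, here is a minimal sketch in Python (not Perl 6; the names `match_sequence` and `WS_AND_COMMENTS` are hypothetical, invented for this illustration). It assumes a "rule" discards whatever the skip pattern matches before each component, while a "terminal" must match exactly where it stands:

```python
import re

# Hypothetical skip pattern: runs of whitespace and '#' comments to end of line,
# i.e. comments treated as a special form of whitespace.
WS_AND_COMMENTS = re.compile(r"(?:\s+|#[^\n]*)*")

def match_sequence(parts, text, skip=None):
    """Match each regex in `parts`, in order, from the start of `text`.

    If `skip` is given, whatever it matches is discarded before each part
    (rule-like behaviour). With skip=None every part must follow the
    previous one exactly (terminal-like behaviour).
    Returns the list of matched strings, or None on failure.
    """
    pos = 0
    found = []
    for part in parts:
        if skip is not None:
            skipped = skip.match(text, pos)
            if skipped is not None:
                pos = skipped.end()
        m = re.compile(part).match(text, pos)
        if m is None:
            return None
        found.append(m.group())
        pos = m.end()
    return found
```

Passing a different `skip` pattern per call is the Python stand-in for "defining what constitutes whitespace in each context".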

I also suggest a new modifier for comment skipping (or skipping in
general) that's separate from :words, with semantics much closer to
Parse::RecDescent's 'skip'.


Note, however, that the recursive nature of Parse::RecDescent's <skip>
directive is a profound nuisance in practice, because you have to
remember to turn it off in every one of the terminals.


In light of all that, perhaps :words could become :skip, which defaults to
:skip(/<ws>/) but allows you to specify :skip(/whatever/).

As for the keywords and behaviour, I think the right set is:


                                   Default           Default
    Keyword        Where         Backtracking        Skipping

     regex         anywhere       :!ratchet          :!skip
      rule         grammars       :ratchet           :skip
      term         grammars       :ratchet           :!skip
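
For readers unsure what the "Default Backtracking" column buys you, here is a small Python analogy (not Perl 6). It assumes :ratchet means a quantifier never gives back what it has consumed. Python's `re` backtracks by default, so the ratchet-like pattern below uses the lookahead-plus-backreference idiom to emulate that; the pattern names and the `matches` helper are made up for this sketch:

```python
import re

# Backtracking (the proposed 'regex' default, :!ratchet): a* first grabs all
# three a's, then gives one back so the final 'a' can match.
backtracking = re.compile(r"a*a")

# Ratchet-like (the proposed 'rule'/'term' default): the lookahead captures the
# longest run of a's and \1 consumes exactly that run. Python does not
# backtrack into a completed lookahead, so nothing is given back and the
# final 'a' has nothing left to match.
ratchet_like = re.compile(r"(?=(a*))\1a")

def matches(pattern, text):
    """Return True if `pattern` matches the whole of `text`."""
    return pattern.fullmatch(text) is not None
```

On the input "aaa" the first pattern succeeds and the second fails, which is the behavioural difference the table's second column is describing.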

I do agree that a rule should inherit properties from its grammar, so
you can write:

   grammar Perl6 is skip(/[<ws>+ | \# <brackets> | \# \N]+/) {
       ...
   }

to allow your grammar to redefine in one place what its rules skip.
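
The same "redefine it in one place" idea can be sketched as a Python analogy using ordinary class inheritance. Everything here (`Grammar`, `CommentAware`, the `rule` and `term` methods) is hypothetical and only illustrates the proposed semantics; it is not how Perl 6 implements grammars:

```python
import re

class Grammar:
    # Default skip pattern shared by every rule in the grammar: plain whitespace.
    skip = r"\s+"

    def rule(self, pattern, text, pos=0):
        """Match `pattern` after discarding anything the grammar's skip matches."""
        skipped = re.compile(f"(?:{self.skip})*").match(text, pos)
        return re.compile(pattern).match(text, skipped.end())

    def term(self, pattern, text, pos=0):
        """Match `pattern` exactly at `pos`; nothing is skipped."""
        return re.compile(pattern).match(text, pos)

class CommentAware(Grammar):
    # One-place redefinition: whitespace or a '#' comment running to end of line.
    skip = r"\s+|#[^\n]*"
```

Every `rule` call on a `CommentAware` instance picks up the new skip pattern without being touched, while `term` ignores it in both classes.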

Damian
