Allison wrote:
I'm comfortable with the semantic distinction between 'rule' as "thingy inside a grammar" and 'regex' as "thingy outside a grammar". But, I think we can find a better name than 'regex'. The problem is both the 'regex' vs. 'regexp' battle,
Is that really an issue? I've never met anyone who *voluntarily* added the 'p'. ;-)
and the fact that everyone knows 'regex(p)' means "regular expression" no matter how many times we say it doesn't.
Sure. But almost nobody knows what "regular" actually means, and of those few only a tiny number of pedants actually *care* anymore. So does it matter?
(I'm not fond of the idea of spending the next 20 years explaining that over and over again.)
Then don't. I teach regexes all the time and I *never* explain what "regular" means, or why it doesn't apply to Perl (or any other commonly used) regexes any more.
Maybe 'match' is a better keyword.
I don't think so. "Match" is a better word for what comes back from a regex match (what we currently refer to as a Capture, which is okay too).
Then again, from a practical perspective, it seems likely that we'll want something like ":ratchet is set by default in all rules" turned on in some grammars and off in other grammars. In which case, the real distinction is that rules inside a grammar pull default attributes from their grammar class, while rules outside a grammar have no default attributes. Which brings us back to a single keyword 'rule' making sense for both.
That's pretty much the Perl 5 argument for using "sub" for both subroutines and methods, which we've definitively rejected in Perl 6. If we use "rule" for both kinds of regexes, we force the reader to constantly check surrounding context in order to understand the behaviour of the construct. :-(
I'm not comfortable with the semantic distinction between 'rule' and 'token'. Whitespace skipping is not the defining difference between a rule and a token in general use of the terms, so the names are misleading.
True. "Token" is the wrong word for another reason: a token is a segmented component of the input stream, *not* a rule for matching segmented components of the input stream. The correct term for that is "terminal". So a suitable keyword might well be "term". However, terminals do differ from rules in that they do not attempt to be smart about what they ignore.
More importantly, whitespace skipping isn't a very significant option in grammars in general, so creating two keywords that distinguish between skipping and no skipping is linguistically infelicitous. It's like creating two different words for "shirts with horizontal stripes" and "shirts with vertical stripes". Sure, they're different, but the difference isn't particularly significant, so it's better expressed by a modifier on "shirt" than by a different word.
I'd *strongly* disagree with that. Whitespace skipping (for suitable values of "whitespace") is a critical feature of parsers. I'd go so far as to say that it's *the* killer feature of Parse::RecDescent.
From a practical perspective, both the Perl 6 and Punie grammars have ended up using 'token' in many places (for things that aren't tokens), because :words isn't really the semantics you want for parsing computer languages. (Though it is quite useful for parsing natural language and other things.) What you want is comment skipping, which isn't the same as :words.
What you want is *whitespace* skipping (where comments are a special form of whitespace). What you *really* want is whitespace skipping where you get to define what constitutes whitespace in each context where whitespace might be skipped. But the defining characteristic of a "terminal" is that you try to match it exactly, without being smart about what to ignore. That's why I like the fundamental rule/token distinction as it is currently specified.
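To make the distinction concrete, here is a minimal sketch in Python (not Perl 6, and not how Perl 6 or Parse::RecDescent actually implement skipping). The helper `make_matcher` and the pattern `WS_WITH_COMMENTS` are hypothetical names introduced only for illustration. A matcher built with a skip pattern behaves like a "rule": it skips whatever the caller has defined as whitespace before each atom. A matcher built without one behaves like a "terminal": the atoms must match exactly, with nothing ignored. Unlike a real <ws>, the skip here is always optional.

```python
import re

def make_matcher(atoms, skip=None):
    """Return a function that matches the literal `atoms` in sequence.

    skip: optional regex source. When given, text matching it is skipped
    before each atom (rule-like). When None, atoms must be adjacent
    (terminal-like). Both names are illustrative assumptions.
    """
    skip_re = re.compile(skip) if skip else None

    def match(text, pos=0):
        for atom in atoms:
            if skip_re is not None:
                skipped = skip_re.match(text, pos)
                if skipped:
                    pos = skipped.end()
            if not text.startswith(atom, pos):
                return None          # no match
            pos += len(atom)
        return pos                   # position just past the last atom

    return match

# "Whitespace" where comments count as a special form of whitespace.
WS_WITH_COMMENTS = r"(?:\s|#[^\n]*)+"

atoms = ["my", "$x", "=", "1"]
rule_like = make_matcher(atoms, skip=WS_WITH_COMMENTS)
term_like = make_matcher(atoms)                    # no skipping at all
spaces_only = make_matcher(atoms, skip=r"[ \t]+")  # a narrower notion of whitespace

src = "my $x # a comment\n  = 1"
print(rule_like(src))    # matches through the comment and newline
print(term_like(src))    # None: a terminal ignores nothing
print(spaces_only(src))  # None: the comment is not whitespace here
```

The only thing that changes between the three matchers is the definition of what may be skipped, which is the point of letting each context say what constitutes whitespace.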
I also suggest a new modifier for comment skipping (or skipping in general) that's separate from :words, with semantics much closer to Parse::RecDescent's 'skip'.
Note, however, that the recursive nature of Parse::RecDescent's <skip> directive is a profound nuisance in practice, because you have to remember to turn it off in every one of the terminals.

In light of all that, perhaps :words could become :skip, which defaults to :skip(/<ws>/) but allows you to specify :skip(/whatever/).

As for the keywords and behaviour, I think the right set is:

                          Default        Default
    Keyword    Where      Backtracking   Skipping

    regex      anywhere   :!ratchet      :!skip
    rule       grammars   :ratchet       :skip
    term       grammars   :ratchet       :!skip

I do agree that a rule should inherit properties from its grammar, so you can write:

    grammar Perl6 is skip(/[<ws>+ | \# <brackets> | \# \N]+/) {
        ...
    }

to allow your grammar to redefine in one place what its rules skip.

Damian