On Apr 20, 2006, at 1:32 PM, Damian Conway wrote:

     Keyword    Implicit adverbs    Behaviour
      regex     (none)              Ignores whitespace, backtracks
      token     :ratchet            Ignores whitespace, no backtracking
      rule      :ratchet :words     Skips whitespace, no backtracking

[...and following threads...]


I'm comfortable with the semantic distinction between 'rule' as "thingy inside a grammar" and 'regex' as "thingy outside a grammar". But, I think we can find a better name than 'regex'. The problem is both the 'regex' vs. 'regexp' battle, and the fact that everyone knows 'regex(p)' means "regular expression" no matter how many times we say it doesn't. (I'm not fond of the idea of spending the next 20 years explaining that over and over again.) Maybe 'match' is a better keyword.

Then again, from a practical perspective, it seems likely that we'll want something like ":ratchet is set by default in all rules" turned on in some grammars and off in other grammars. In which case, the real distinction is that rules inside a grammar pull default attributes from their grammar class, while rules outside a grammar have no default attributes. Which brings us back to a single keyword 'rule' making sense for both.


I'm not comfortable with the semantic distinction between 'rule' and 'token'. Whitespace skipping is not the defining difference between a rule and a token in general use of the terms, so the names are misleading.

More importantly, whitespace skipping isn't a very significant option in grammars in general, so creating two keywords that distinguish between skipping and no skipping is linguistically infelicitous. It's like creating two different words for "shirts with horizontal stripes" and "shirts with vertical stripes". Sure, they're different, but the difference isn't particularly significant, so it's better expressed by a modifier on "shirt" than by a different word.

From a practical perspective, both the Perl 6 and Punie grammars have ended up using 'token' in many places (for things that aren't tokens), because :words isn't really the semantics you want for parsing computer languages. (Though it is quite useful for parsing natural language and other things.) What you want is comment skipping, which isn't the same as :words.

I suggest making whitespace skipping a default setting on the grammar class, so the grammars that need whitespace skipping most of the time can turn it on by default for their rules. That means 'token' and 'rule' collapse into just 'rule'.
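
As a rough illustration of a grammar-level default (a sketch in Python rather than Perl 6 syntax), the idea might look like the following. The names `Grammar`, `WordyGrammar`, `skip_whitespace`, and `rule` are hypothetical, and joining atoms with `\s*` is only a crude stand-in for :words, not its actual semantics.

```python
import re

class Grammar:
    # Hypothetical class-level default: do rules skip whitespace between atoms?
    skip_whitespace = False

    @classmethod
    def rule(cls, *atoms, skip_whitespace=None):
        """Compile a sequence of regex atoms into one pattern.

        A rule takes its whitespace behaviour from the grammar class
        unless the caller overrides it for that one rule.
        """
        skip = cls.skip_whitespace if skip_whitespace is None else skip_whitespace
        separator = r"\s*" if skip else ""
        return re.compile(separator.join(atoms))

class WordyGrammar(Grammar):
    # This grammar wants whitespace skipping most of the time.
    skip_whitespace = True

ATOMS = (r"\d+", r"\+", r"\d+")

tight = Grammar.rule(*ATOMS)                                # inherits "no skipping"
loose = WordyGrammar.rule(*ATOMS)                           # inherits "skipping"
override = WordyGrammar.rule(*ATOMS, skip_whitespace=False) # per-rule modifier
```

Here `loose` accepts "1 + 2" while `tight` and `override` accept only "1+2", so a single 'rule' construct covers both behaviours, with the difference expressed as a default plus a modifier.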

I also suggest a new modifier for comment skipping (or skipping in general) that's separate from :words, with semantics much closer to Parse::RecDescent's 'skip'.
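
To make the 'skip' idea concrete, here is a minimal sketch in Python (not Perl 6 or Parse::RecDescent syntax) of a matcher that consumes a configurable skip pattern before each terminal, which is roughly how Parse::RecDescent's skip prefix behaves. The names `match_sequence` and `COMMENT_SKIP` are hypothetical, and the `#`-to-end-of-line comment syntax is an assumption chosen for illustration.

```python
import re

# Assumed skip pattern: any run of whitespace and '#' line comments.
COMMENT_SKIP = r"(?:\s+|#[^\n]*)*"

def match_sequence(text, terminals, skip=r"\s*"):
    """Match each terminal in order, consuming `skip` before every one.

    Returns the end position of the last terminal, or None on failure.
    """
    skip_re = re.compile(skip)
    pos = 0
    for terminal in terminals:
        skipped = skip_re.match(text, pos)
        if skipped is not None:
            pos = skipped.end()
        found = re.compile(terminal).match(text, pos)
        if found is None:
            return None
        pos = found.end()
    return pos

SOURCE = "x # note\n = 1"
TERMINALS = [r"[a-z]+", r"=", r"\d+"]

plain = match_sequence(SOURCE, TERMINALS)                       # whitespace only
with_comments = match_sequence(SOURCE, TERMINALS, COMMENT_SKIP) # also skips comments
```

With the whitespace-only default the match fails at the comment, while the comment-aware skip pattern lets the same terminals match the whole string. That gap is the difference between :words-style skipping and what a programming-language grammar usually needs.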

Allison
