Re: A rule by any other name...

Damian Conway Wed, 10 May 2006 01:31:51 -0700

Allison wrote:

I've never met anyone who *voluntarily* added
the 'p'. ;-)


You've spent too much time in the U.S. ;)

And Australia. I don't know where the silent 'p' comes from but it sure ain't the New World.

Picking names that mean what they say is important in Perl. It's why we have
'given'/'when' instead of 'switch'/'case'. We don't have to use the same old
name for things just because everyone else is doing it (even if we started it).

There's nothing about 'regex' that says "backtracking enabled".

Sure there is. About 20 years of computing history. Nowadays "regex" has virtually nothing to do with "regular expressions"; it's now just the computing term for "compact set of instructions for a pattern matching machine".

But isn't it appealing to stop using an archaic word that has now become
meaningless?

No. For a start, "regex" isn't archaic. In fact it's a comparative neologism, having only recently broken away--both syntactically and semantically--from the older "regular expression". More importantly, the *concept* hasn't become meaningless at all; indeed it's grown significantly in meaning over the past decade. And the word "regex" is now far more strongly associated with that expanded concept than with the original idea of a "regular expression".

That's pretty much the Perl 5 argument for using "sub" for both subroutines
and methods, which we've definitively rejected in Perl 6.


Subs and methods have a number of distinguishing characteristics. If the only
distinction between them was one small characteristic change, I might argue
against using different keywords there too. (I think the choice of using only
'sub' made sense for Perl 5 with its simplistic OO semantics, but Perl 6
provides more intelligent defaults for methods so the separation makes sense
here.)

I think you're wrong. I think "sub" has proved not to be the right choice in Perl 5 either. As abstractions, methods and subs are very different. In usage, they're very different. It's only in implementation that they're similar. Using the same keyword for two constructs that are used--and which act--very differently was a rare misstep on Larry's part.

And it's those same enormous abstract and pragmatic differences that we need two keywords to distinguish when it comes to pattern matching. Think about the trouble we're going to have translating Perl 5 subs to Perl 6 subs or methods, precisely because of the lack of semantic marking. The designers of Perl 7 won't thank us if we repeat the mistake with regexes and rules.

Rules inside and outside grammars are the same class. They have the same
behaviour aside from :ratchet,


And skipping!

and :ratchet can be set without the keyword change.

But then you've no way of knowing from *local* context which way it defaults for a given instance.

More than that, the current 'rule' and 'regex' can both be used inside
and outside a grammar. If we were to take the 'sub'/'method' pattern, then
'rule' should never be allowed outside a grammar,


I entirely agree.

and 'regex' should either not be allowed inside a 'grammar', or should express
some distinctive feature inside the grammar (like "non-inherited" or "doesn't
operate on the match object",

The main distinction is that rules are "ratcheted and skippy" whereas regexes aren't. But yes, regexes ought not be inherited either.

but there are better words for those concepts than 'regex').

If you can come up with even one other word that means "backtrackable, non-skippy, and uninherited", in the same way that "rule" implies "ratcheted, whitespace-skipping, and heritable", then I'd be more than delighted to consider it.

Personally, I thought "regex" already fit the bill admirably, since backtracking, not skipping, and not inheriting is exactly what regexes do in most current languages (including Perl 5).

If we use "rule" for both kinds of regexes, we force the reader to constantly
check surrounding context in order to understand the behaviour of the
construct. :-(


Context is a Perlish concept. :)

*Local* context is. Having three fundamental behaviours change because of a namespace declaration 1000 lines earlier doesn't seem very Perlish to me.

Making different things different is an important design principle, but so is
making similar things similar.


I disagree. What we've been doing in Perl 6 is making different things
different, and identical things identical (or, more precisely, consolidating
things that turn out to be identical if you look closely enough).


But regexes and rules aren't identical; merely similar. And making
similar things identical is a *bad* idea in language. IANL(inguist) but
it seems to me that most languages evolve towards making similar things as
different as possible, so that they're not accidentally confused.

I do like 'term' better.


Me too. :-)

That really isn't "whitespace" skipping, though.

Sure it is. "Whitespace" is just the industry term for "anything we politely ignore". Comments are whitespace. Spaces, tabs, and newlines are whitespace. Pod is whitespace. Larry's tuxedos are whitespace. Just because some kinds of whitespace are neither white nor spacey doesn't mean they're not whitespace. ;-)

Can you give me some additional characteristics for 'term' beyond just "turn
off :skip"?


Yep. See below.

Grammars also need to turn off skipping in rules that aren't terminals,

Very rarely, in my experience. And generally only for that part of a rule that someone has been too lazy to factor out as a separate terminal.

And in the current form you have to remember to use 'token' for all the
terminals. Not really a significant difference in mental effort.


You see, I'd argue that it *is* a significant difference in mental effort.

Parser writers think in terms of rules and terminals, with the terminals doing the precise matching against the input, and the rules abstracting and orchestrating the terminals' collective matching and taking care of the skipping behaviour. So writing:


     rule sentence { <noun> <verb> <noun> }

     term noun { s?he | they | we | I | you | Larry | Audrey | Guido }

     term verb { hugged | helped | hit }

reflects the two distinctive roles far better than:

     rule sentence { <noun> <verb> <noun> }

     rule noun :!skip { s?he | they | we | I | you | Larry | Audrey | Guido }

     rule verb :!skip { hugged | helped | hit }

If nothing else, it's far easier to distinguish the terminals from the aggregations when the distinguishing keyword is hard on the left, rather than buried somewhere in the middle of the declaration.
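
For readers who want to see the rule/terminal split in something runnable today, here is a minimal sketch in Python rather than Perl 6. It is only an approximation under stated assumptions: the helper names `term`, `rule`, and `matches` are hypothetical, and "skipping" is modelled as plain whitespace between sub-patterns, which is narrower than the skipping behaviour discussed in this thread.

```python
import re

# Terminals: precise matching against the input, no skipping.
NOUN = r"(?:s?he|they|we|I|you|Larry|Audrey|Guido)"
VERB = r"(?:hugged|helped|hit)"


def term(pattern):
    """Hypothetical helper: a terminal matches exactly, with no skipping."""
    return re.compile(pattern)


def rule(*parts):
    """Hypothetical helper: a rule aggregates sub-patterns and skips
    whitespace around and between them (assumption: skipping means
    ordinary whitespace only)."""
    return re.compile(r"\s*" + r"\s+".join(parts) + r"\s*")


def matches(compiled, text):
    """True if the whole of `text` matches the compiled pattern."""
    return compiled.fullmatch(text) is not None


noun = term(NOUN)
verb = term(VERB)
sentence = rule(NOUN, VERB, NOUN)

print(matches(sentence, "  she   hit\tyou "))  # True: the rule skips whitespace
print(matches(verb, " hit"))                   # False: the terminal does not
```

The point of the sketch is only that the skipping lives in the aggregating construct, while each terminal stays an exact matcher.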

Including :skip(/<someotherrule>/). Yes, agreed, it's a huge improvement. I'd
be more comfortable if the default rule to use for skipping was named <skip>
instead of <ws>. (On IRC <sep> was also proposed, but the connection between
:skip and <skip> is more immediately obvious.)


Yes, I like <skip> too. I too keep mistakenly reading <ws> as "WhiteSpace".

As for the keywords and behaviour, I think the right set is:

                                    Default           Default
     Keyword        Where         Backtracking        Skipping

      regex         anywhere       :!ratchet          :!skip
       rule         grammars       :ratchet           :skip
       term         grammars       :ratchet           :!skip


And I think the right set is:

         rule         anywhere       :!ratchet          :!skip
         rule         grammars       :ratchet           :!skip

But that's *wrong*. Grammar rules absolutely need to skip by default, which would make your table:


           rule         anywhere       :!ratchet          :!skip
           rule         grammars       :ratchet           :skip

whereupon we have two fundamental differences between grammar rules and non-grammar rules. Which is why the external rules (which don't act like grammatical rules at all, but like standard backtracking non-skipping regexes) need a different keyword (like "regex").
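
The backtracking half of that difference can be illustrated with a small Python sketch (again, not Perl 6). It assumes that "ratcheted" can be approximated by an atomic match, emulated here with the usual lookahead-plus-backreference idiom; the names `BACKTRACKING`, `RATCHETED`, and `matches` are hypothetical.

```python
import re

# Ordinary regex: `a*` may give back characters so that `ab` can still match.
BACKTRACKING = re.compile(r"a*ab")

# Ratchet-like version (an approximation): the lookahead captures the
# longest run of a's and is never re-entered, so `\1` consumes that run
# and nothing is handed back to the following `ab`.
RATCHETED = re.compile(r"(?=(a*))\1ab")


def matches(compiled, text):
    """True if the whole of `text` matches the compiled pattern."""
    return compiled.fullmatch(text) is not None


print(matches(BACKTRACKING, "aaab"))  # True: a* backs off by one 'a'
print(matches(RATCHETED, "aaab"))     # False: a* keeps every 'a'
```

The same pattern text therefore succeeds or fails depending solely on the backtracking default, which is the kind of behaviour change a reader would want signalled by the keyword itself.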

And removing the "term" keyword (or "token" or whatever) removes the obvious syntactic marking of a fundamentally important semantic distinction, as I discussed above.

I'm still utterly convinced my original three-keyword list is the right one (and that the three keywords in it are the right ones too). Collapsing these three clearly distinguishable concepts into one keyword and then requiring that keyword be adverbially modified about 2/3 of the time seems like a false economy to me: a loss in readability *and* a significant increase in the amount of code required. :-(


Damian
