Allison wrote:

I've never met anyone who *voluntarily* added
the 'p'. ;-)

You've spent too much time in the U.S. ;)

And Australia. I don't know where the silent 'p' comes from but it sure ain't the New World.


Picking names that mean what they say is important in Perl. It's why we have
'given'/'when' instead of 'switch'/'case'. We don't have to use the same old
name for things just because everyone else is doing it (even if we started it).

There's nothing about 'regex' that says "backtracking enabled".

Sure there is. About 20 years of computing history. Nowadays "regex" has virtually nothing to do with "regular expressions"; it's now just the computing term for "compact set of instructions for a pattern matching machine".


But isn't it appealing to stop using an archaic word that has now become
meaningless?

No. For a start, "regex" isn't archaic. In fact it's a comparative neologism, having only recently broken away (both syntactically and semantically) from the older "regular expression". More importantly, the *concept* hasn't become meaningless at all; indeed it's grown significantly in meaning over the past decade. And the word "regex" is now far more strongly associated with that expanded concept than with the original idea of a "regular expression".


That's pretty much the Perl 5 argument for using "sub" for both subroutines
and methods, which we've definitively rejected in Perl 6.

Subs and methods have a number of distinguishing characteristics. If the only
distinction between them was one small characteristic change, I might argue
against using different keywords there too. (I think the choice of using only
'sub' made sense for Perl 5 with its simplistic OO semantics, but Perl 6
provides more intelligent defaults for methods so the separation makes sense
here.)

I think you're wrong. I think "sub" has proved not to be the right choice in Perl 5 either. As abstractions, methods and subs are very different. In usage, they're very different. It's only in implementation that they're similar. Using the same keyword for two constructs that are used--and which act--very differently was a rare misstep on Larry's part.

And it's those same enormous abstract and pragmatic differences that we need two keywords to distinguish when it comes to pattern matching. Think about the trouble we're going to have translating Perl 5 subs to Perl 6 subs or methods, precisely because of the lack of semantic marking. The designers of Perl 7 won't thank us if we repeat the mistake with regexes and rules.


Rules inside and outside grammars are the same class. They have the same
behaviour aside from :ratchet,

And skipping!

and :ratchet can be set without the keyword change.

But then you've no way of knowing from *local* context which way it defaults for a given instance.


More than that, the current 'rule' and 'regex' can both be used inside
and outside a grammar. If we were to take the 'sub'/'method' pattern, then
'rule' should never be allowed outside a grammar,

I entirely agree.


and 'regex' should either not be allowed inside a 'grammar',
or should express some distinctive feature
inside the grammar (like "non-inherited" or "doesn't operate on the match
object",

The main distinction is that rules are "ratcheted and skippy" whereas regexes aren't. But yes, regexes ought not to be inherited either.


but there are better words for those concepts than 'regex').

If you can come up with even one other word that means "backtrackable, non-skippy, and uninherited", in the same way that "rule" implies "ratcheted, whitespace-skipping, and heritable", then I'd be more than delighted to consider it.

Personally, I thought "regex" already fit the bill admirably, since backtracking, not skipping, and not inheriting is exactly what regexes do in most current languages (including Perl 5).
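For anyone who hasn't met the term, the difference between backtracking and ratcheting can be shown with a toy Python model. This is only a sketch under assumptions: Python's `re` has no `:ratchet` adverb, so the ratcheted version is simulated by committing to whatever `a*` first consumes, and the function names are hypothetical.

```python
import re

def backtracking_match(text: str) -> bool:
    # Python's re engine backtracks, as Perl 5 regexes do: after a* grabs
    # every 'a', it gives one back so the trailing 'a' can still match.
    return re.fullmatch(r"a*a", text) is not None

def ratcheted_match(text: str) -> bool:
    # Toy simulation of :ratchet: run a* once and commit to whatever it
    # consumed. The trailing 'a' must then match in what is left over,
    # with no going back to retry a shorter a*.
    committed = re.match(r"a*", text).end()
    return re.fullmatch(r"a", text[committed:]) is not None

print(backtracking_match("aaa"))  # True
print(ratcheted_match("aaa"))     # False: a* ate every 'a' and won't give one back
```

A ratcheted pattern like `a*a` can never succeed, which is why ratcheted grammar rules are written so that each piece consumes exactly what it should, and why a plain backtracking regex is still the right default outside grammars.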


If we use "rule" for both kinds of regexes, we force the reader to constantly
check surrounding context in order to understand the behaviour of the
construct. :-(

Context is a Perlish concept. :)

*Local* context is. Having three fundamental behaviours change because of a namespace declaration 1000 lines earlier doesn't seem very Perlish to me.


Making different things different is an important design principle, but so is
making similar things similar.

I disagree. What we've been doing in Perl 6 is making different things
different, and identical things identical (or, more precisely, consolidating things that turn out to be identical if you look closely enough).

But regexes and rules aren't identical; merely similar. And making
similar things identical is a *bad* idea in language. IANL(inguist) but
it seems to me that most languages evolve towards making similar things as
different as possible, so that they're not accidentally confused.


I do like 'term' better.

Me too. :-)


That really isn't "whitespace" skipping, though.

Sure it is. "Whitespace" is just the industry term for "anything we politely ignore". Comments are whitespace. Spaces, tabs, and newlines are whitespace. Pod is whitespace. Larry's tuxedos are whitespace. Just because some kinds of whitespace are neither white nor spacey doesn't mean they're not whitespace. ;-)
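To illustrate "anything we politely ignore", here is a toy Python skipper that treats `#` comments exactly like spaces, tabs, and newlines. It is a sketch, not Perl 6's actual `<ws>`: the `#`-to-end-of-line comment syntax and the names `SKIPPABLE` and `skip` are assumptions chosen for illustration.

```python
import re

# Hypothetical "skip" pattern: runs of ordinary whitespace and '#' comments
# (each comment running to the end of its line) are both politely ignored.
SKIPPABLE = re.compile(r"(?:\s+|#[^\n]*)*")

def skip(text: str, pos: int) -> int:
    """Return the position of the first character after any skippable material."""
    return SKIPPABLE.match(text, pos).end()

source = "   # a comment\n\tword"
print(source[skip(source, 0):])  # word
```

Whatever the skipper swallows, the rule calling it never sees, so widening "whitespace" to cover comments (or Pod) needs no change to the rules themselves.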


Can you give me some additional characteristics for 'term' beyond just "turn
off :skip"?

Yep. See below.


Grammars also need to turn off skipping in rules that aren't terminals,

Very rarely, in my experience. And generally only for that part of a rule that someone has been too lazy to factor out as a separate terminal.


And in the current form you have to remember to use 'token' for all the
terminals. Not really a significant difference in mental effort.

You see, I'd argue that it *is* a significant difference in mental effort.

Parser writers think in terms of rules and terminals, with the terminals doing the precise matching against the input, and the rules abstracting and orchestrating the terminals' collective matching and taking care of the skipping behaviour. So writing:

     rule sentence { <noun> <verb> <noun> }

     term noun { s?he | they | we | I | you | Larry | Audrey | Guido }

     term verb { hugged | helped | hit }

reflects the two distinctive roles far better than:

     rule sentence { <noun> <verb> <noun> }

     rule noun :!skip { s?he | they | we | I | you | Larry | Audrey | Guido }

     rule verb :!skip { hugged | helped | hit }

If nothing else, it's far easier to distinguish the terminals from the aggregations when the distinguishing keyword is hard on the left, rather than buried somewhere in the middle of the declaration.
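To make that division of labour concrete, here is a toy Python model of the `sentence`/`noun`/`verb` grammar above: the terminals match precisely, and only the rule does the skipping between them. It is a sketch under assumptions: Python's `re` stands in for the Perl 6 engine (so alternation is first-match rather than longest-token), the `skip` helper only approximates a default `<ws>`, and all names are hypothetical.

```python
import re
from typing import Optional

# Terminals: match precisely, with no skipping inside them.
NOUN = re.compile(r"s?he|they|we|I|you|Larry|Audrey|Guido")
VERB = re.compile(r"hugged|helped|hit")
SPACES = re.compile(r"\s*")

def _is_word(ch: str) -> bool:
    return ch.isalnum() or ch == "_"

def skip(text: str, pos: int) -> Optional[int]:
    # Refuse to skip "nothing" between two word characters, so that
    # "helpedAudrey" is not read as "helped Audrey".
    if 0 < pos < len(text) and _is_word(text[pos - 1]) and _is_word(text[pos]):
        return None
    return SPACES.match(text, pos).end()

def sentence(text: str) -> bool:
    # The rule: orchestrate the terminals and do all the skipping between them.
    pos = 0
    for terminal in (NOUN, VERB, NOUN):
        pos = skip(text, pos)
        if pos is None:
            return False
        m = terminal.match(text, pos)
        if m is None:
            return False
        pos = m.end()  # commit to this terminal's match (no backtracking into it)
    pos = skip(text, pos)
    return pos == len(text)

print(sentence("Larry helped Audrey"))  # True
print(sentence("Larry helpedAudrey"))   # False
```

Note that the terminals never need to know about skipping at all; only `sentence` does, which is the distinction the separate keyword makes visible.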


Including :skip(/<someotherrule>/). Yes, agreed, it's a huge improvement. I'd
be more comfortable if the default rule to use for skipping was named <skip>
instead of <ws>. (On IRC <sep> was also proposed, but the connection between
:skip and <skip> is more immediately obvious.)

Yes, I like <skip> too. I too keep mistakenly reading <ws> as "WhiteSpace".


As for the keywords and behaviour, I think the right set is:

                                    Default           Default
     Keyword        Where         Backtracking        Skipping

      regex         anywhere       :!ratchet          :!skip
       rule         grammars       :ratchet           :skip
       term         grammars       :ratchet           :!skip

And I think the right set is:

         rule         anywhere       :!ratchet          :!skip
         rule         grammars       :ratchet           :!skip

But that's *wrong*. Grammar rules absolutely need to skip by default, which would make your table:

           rule         anywhere       :!ratchet          :!skip
           rule         grammars       :ratchet           :skip

whereupon we have two fundamental differences between grammar rules and non-grammar rules. Which is why the external rules (which don't act like grammatical rules at all, but like standard backtracking non-skipping regexes) need a different keyword (like "regex").
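The "local context" point can be sketched as a small Python model of the two tables above. This is an illustration only: the `Defaults` type and the helper names are hypothetical, and the defaults are simply transcribed from the tables (three keywords versus one keyword whose defaults depend on the enclosing declaration).

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Defaults:
    ratchet: bool
    skip: bool

# Three keywords: the defaults follow from the keyword alone.
BY_KEYWORD = {
    "regex": Defaults(ratchet=False, skip=False),
    "rule": Defaults(ratchet=True, skip=True),
    "term": Defaults(ratchet=True, skip=False),
}

def one_keyword_defaults(in_grammar: bool) -> Defaults:
    # One keyword: the defaults depend on the enclosing declaration,
    # which may be far away from the rule being read.
    return Defaults(ratchet=in_grammar, skip=in_grammar)

def adverbs_needed(wanted: Defaults, defaults: Defaults) -> List[str]:
    # List the adverbs that must be written to turn `defaults` into `wanted`.
    adverbs = []
    if wanted.ratchet != defaults.ratchet:
        adverbs.append(":ratchet" if wanted.ratchet else ":!ratchet")
    if wanted.skip != defaults.skip:
        adverbs.append(":skip" if wanted.skip else ":!skip")
    return adverbs

# A terminal inside a grammar, under the one-keyword scheme:
print(adverbs_needed(BY_KEYWORD["term"], one_keyword_defaults(in_grammar=True)))  # [':!skip']
```

Under the three-keyword scheme the same lookup never needs `in_grammar`, which is the whole argument: the reader can tell the behaviour from the declaration itself.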

And removing the "term" keyword (or "token" or whatever) removes the obvious syntactic marking of a fundamentally important semantic distinction, as I discussed above.

I'm still utterly convinced my original three-keyword list is the right one (and that the three keywords in it are the right ones too). Collapsing these three clearly distinguishable concepts into one keyword, and then requiring that keyword to be adverbially modified about 2/3 of the time, seems like a false economy to me: a loss in readability *and* a significant increase in the amount of code required. :-(

Damian
