Allison wrote:
I've never met anyone who *voluntarily* added
the 'p'. ;-)
You've spent too much time in the U.S. ;)
And Australia. I don't know where the silent 'p' comes from but it sure ain't
the New World.
Picking names that mean what they say is important in Perl. It's why we have
'given'/'when' instead of 'switch'/'case'. We don't have to use the same old
name for things just because everyone else is doing it (even if we started it).
There's nothing about 'regex' that says "backtracking enabled".
Sure there is. About 20 years of computing history. Nowadays "regex" has
virtually nothing to do with "regular expressions"; it's now just the computing term
for "compact set of instructions for a pattern matching machine".
But isn't it appealing to stop using an archaic word that has now become
meaningless?
No. For a start, "regex" isn't archaic. In fact it's a comparative neologism,
having only recently broken away--both syntactically and semantically--from the
older "regular expression". More importantly, the *concept* hasn't become
meaningless at all; indeed it's grown significantly in meaning over the past
decade. And the word "regex" is now far more strongly associated with that
expanded concept than with the original idea of a "regular expression".
That's pretty much the Perl 5 argument for using "sub" for both subroutines
and methods, which we've definitively rejected in Perl 6.
Subs and methods have a number of distinguishing characteristics. If the only
distinction between them was one small characteristic change, I might argue
against using different keywords there too. (I think the choice of using only
'sub' made sense for Perl 5 with its simplistic OO semantics, but Perl 6
provides more intelligent defaults for methods so the separation makes sense
here.)
I think you're wrong. I think "sub" has proved not to be the right choice in
Perl 5 either. As abstractions, methods and subs are very different. In usage,
they're very different. It's only in implementation that they're similar.
Using the same keyword for two constructs that are used--and which act--very
differently was a rare misstep on Larry's part.
And it's those same enormous abstract and pragmatic differences that we need
two keywords to distinguish when it comes to pattern matching. Think about the
trouble we're going to have translating Perl 5 subs to Perl 6 subs or methods,
precisely because of the lack of semantic marking. The designers of Perl 7
won't thank us if we repeat the mistake with regexes and rules.
Rules inside and outside grammars are the same class. They have the same
behaviour aside from :ratchet,
And skipping!
and :ratchet can be set without the keyword change.
But then you've no way of knowing from *local* context which way it defaults
for a given instance.
More than that, the current 'rule' and 'regex' can both be used inside
and outside a grammar. If we were to take the 'sub'/'method' pattern, then
'rule' should never be allowed outside a grammar,
I entirely agree.
and 'regex' should either not be allowed inside a 'grammar',
or should express some distinctive feature
inside the grammar (like "non-inherited" or "doesn't operate on the match
object",
The main distinction is that rules are "ratcheted and skippy" whereas regexes
aren't. But yes, regexes ought not be inherited either.
but there are better words for those concepts than 'regex').
If you can come up with even one other word that means "backtrackable,
non-skippy, and uninherited", in the same way that "rule" implies "ratcheted,
whitespace-skipping, and heritable", then I'd be more than delighted to
consider it.
Personally, I thought "regex" already fit the bill admirably, since
backtracking, not skipping, and not inheriting is exactly what regexes do in
most current languages (including Perl 5).
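
A minimal sketch of the backtracking-versus-ratchet distinction, written in
Python rather than Perl 6 so that it runs on a stock interpreter. The
"ratcheted" pattern is only an emulation (a lookahead plus a backreference,
which Python never re-enters once it has succeeded); it is an assumption of
this sketch, not how :ratchet is specified or implemented.

```python
import re

# Ordinary backtracking pattern: \w+ first grabs the whole word, then gives
# characters back until the trailing "ing" can match.
backtracking = re.compile(r"\w+ing")

# Ratchet-like emulation (assumption of this sketch, not Perl 6 syntax):
# the lookahead captures the greedy run and the backreference consumes
# exactly that run, so the run can never be shortened afterwards.
ratcheted = re.compile(r"(?=(\w+))\1ing")

word = "hugging"
backtracks_ok = backtracking.fullmatch(word) is not None
ratchet_ok = ratcheted.fullmatch(word) is not None

print(backtracks_ok)  # True: \w+ gave back "ing"
print(ratchet_ok)     # False: \w+ kept everything, so "ing" had nothing left
```

The same source text, "one or more word characters followed by ing", gives two
different answers depending on which default applies.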
If we use "rule" for both kinds of regexes, we force the reader to constantly
check surrounding context in order to understand the behaviour of the
construct. :-(
Context is a Perlish concept. :)
*Local* context is. Having three fundamental behaviours change because of a
namespace declaration 1000 lines earlier doesn't seem very Perlish to me.
Making different things different is an important design principle, but so is
making similar things similar.
I disagree. What we've been doing in Perl 6 is making different things
different, and identical things identical (or, more precisely, consolidating
things that turn out to be identical if you look closely enough).
But regexes and rules aren't identical; merely similar. And making
similar things identical is a *bad* idea in language. IANL(inguist) but
it seems to me that most languages evolve towards making similar things as
different as possible, so that they're not accidentally confused.
I do like 'term' better.
Me too. :-)
That really isn't "whitespace" skipping, though.
Sure it is. "Whitespace" is just the industry term for "anything we politely
ignore". Comments are whitespace. Spaces, tabs, and newlines are whitespace.
Pod is whitespace. Larry's tuxedos are whitespace. Just because some kinds of
whitespace are neither white nor spacey doesn't mean they're not whitespace. ;-)
Can you give me some additional characteristics for 'term' beyond just "turn
off :skip"?
Yep. See below.
Grammars also need to turn off skipping in rules that aren't terminals,
Very rarely, in my experience. And generally only for that part of a rule that
someone has been too lazy to factor out as a separate terminal.
And in the current form you have to remember to use 'token' for all the
terminals. Not really a significant difference in mental effort.
You see, I'd argue that it *is* a significant difference in mental effort.
Parser writers think in terms of rules and terminals, with the terminals doing
the precise matching against the input, and the rules abstracting and
orchestrating the terminals' collective matching and taking care of the
skipping behaviour. So writing:
    rule sentence { <noun> <verb> <noun> }
    term noun     { s?he | they | we | I | you | Larry | Audrey | Guido }
    term verb     { hugged | helped | hit }
reflects the two distinctive roles far better than:
    rule sentence       { <noun> <verb> <noun> }
    rule noun    :!skip { s?he | they | we | I | you | Larry | Audrey | Guido }
    rule verb    :!skip { hugged | helped | hit }
If nothing else, it's far easier to distinguish the terminals from the
aggregations when the distinguishing keyword is hard on the left, rather than
buried somewhere in the middle of the declaration.
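
To make the two roles concrete, here is a rough Python analogy of the
sentence grammar above. The helpers `term` and `rule` are hypothetical names
invented for this sketch; they only mimic the proposed defaults (terminals
match exactly, rules skip whitespace between their parts) and say nothing
about how Perl 6 would implement them. Treating the skipped material as
`\s+` between parts is likewise an assumption.

```python
import re

def term(pattern):
    """Hypothetical terminal: matches its alternatives exactly, no skipping."""
    return "(?:" + pattern + ")"

def rule(*parts):
    """Hypothetical rule: joins its parts, skipping whitespace between them."""
    return re.compile(r"\s*" + r"\s+".join(parts) + r"\s*")

# Terminals do the precise matching against the input.
noun = term(r"s?he|they|we|I|you|Larry|Audrey|Guido")
verb = term(r"hugged|helped|hit")

# The rule orchestrates the terminals and takes care of the skipping.
sentence = rule(noun, verb, noun)

skips_between = sentence.fullmatch("Larry   hugged\n Audrey") is not None
no_skip_inside = sentence.fullmatch("Lar ry hugged Audrey") is not None
print(skips_between, no_skip_inside)
```

Whitespace is tolerated between the terminals but never inside one, and that
is visible from the helper name at the start of each definition.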
Including :skip(/<someotherrule>/). Yes, agreed, it's a huge improvement. I'd
be more comfortable if the default rule to use for skipping was named <skip>
instead of <ws>. (On IRC <sep> was also proposed, but the connection between
:skip and <skip> is more immediately obvious.)
Yes, I like <skip> too. I too keep mistakenly reading <ws> as "WhiteSpace".
As for the keywords and behaviour, I think the right set is:
                           Default        Default
    Keyword    Where       Backtracking   Skipping
    regex      anywhere    :!ratchet      :!skip
    rule       grammars    :ratchet       :skip
    term       grammars    :ratchet       :!skip
And I think the right set is:
    rule       anywhere    :!ratchet      :!skip
    rule       grammars    :ratchet       :!skip
But that's *wrong*. Grammar rules absolutely need to skip by default, which
would make your table:
    rule       anywhere    :!ratchet      :!skip
    rule       grammars    :ratchet       :skip
whereupon we have two fundamental differences between grammar rules and
non-grammar rules. Which is why the external rules (which don't act like
grammatical rules at all, but like standard backtracking non-skipping regexes)
need a different keyword (like "regex").
And removing the "term" keyword (or "token" or whatever) removes the obvious
syntactic marking of a fundamentally important semantic distinction, as I
discussed above.
I'm still utterly convinced my original three-keyword list is the right one
(and that the three keywords in it are the right ones too). Collapsing these
three clearly distinguishable concepts into one keyword and then requiring that
keyword be adverbially modified about 2/3 of the time seems like a false
economy to me: a loss in readability *and* a significant increase in the amount
of code required. :-(
Damian