Why C<map> needs work (was Re: L2R/R2L syntax)

Michael Lazzaro Mon, 20 Jan 2003 16:31:17 -0800

On Monday, January 20, 2003, at 12:30  PM, Smylers wrote:

> Ah.  It was only on reading that (and discovering that you hadn't
> previously known about the 'optional comma with closure argument' rule)
> that I understood why you had previously been so in favour of proposed
> new syntaxes: through a desire to banish the Perl 5 syntax.

Yes. Again, it's not that I never knew about the perl5 rule, it's that I _dislike_ the perl5 rule, as I dislike most special-case grammar rules. If we dropped C<map>, etc., we could drop the inferred-comma rule as well (unless we're really successful in making C<if> a simple function, in which case we need the rule anyway, so who cares?)

But the inferred-comma rule is tangential to my dislike of the current C<map> syntax. I'd dislike C<map>, and like pipelines, even if the inferred-comma rule wasn't there.

> Mike, now you've come to terms with the Perl 5 syntax, do you still find
> right-left pipelines as compelling as you've previously argued?

Yes, very much so. Here's why...

<The Problem>

Over the years, I've seen quite a few newbies struggle with the C<map> syntax; the most frequent mistake is to reverse the order of the arguments, saying

map @a, {...}

(Another common one is to not understand what C<map {...} @a, @b> would do.) There are a couple of reasons for that, IMO, but the wording I often get when I'm trying to get people to verbalize "what this statement does" is that "it takes @a, and does <blah> to it".
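To make that second case concrete, here is a small Perl 5 sketch. The arrays and their values are made up for illustration. The lists after the block are flattened into one, so the block runs over every element of both arrays:

```perl
use strict;
use warnings;

my @a = (1, 2, 3);
my @b = (10, 20);

# The block comes first, with no comma after it; @a and @b are
# flattened into a single list before map sees them.
my @doubled = map { $_ * 2 } @a, @b;

print "@doubled\n";   # 2 4 6 20 40
```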

In other words, when newcomers verbalize the statement, they tend to think of the C<map> as operating on @a. Which is true. You might also say "it does <blah> to @a", which is also true. For many people, the brain seems to lock around the C<@a> as the most important part of the statement, not the "blah" part.

Logically, that makes sense. C<map>, C<grep>, etc., are operations on @a. They're not operations on the {...} part -- the {...} is just an adverb describing the "how", which is of lesser importance.

<Is there an OO solution?>

So is there a fix for C<map> that wouldn't trip newbies up quite so much? The simple fix would seem to be to emphasize the "it operates on @a" part, so that people more easily remember it from the way they verbalize it. Obviously, an OO style fixes that easily:

@a.map {...};

That's very intuitive to people, esp. new converts from other languages. And, happily, it's very easy to implement and extend. But it works only in one direction, L2R. (And, as Damian stated, chaining:

@a.map {...} .sort {...};

probably wouldn't work at all, unless you tossed some parens around all your curlies. Yuck.)

So fine, we convert C<map>, etc., to be methods of collections, as is typical programming practice these days. And, in fact, we _are_ doing that elsewhere in the language. Things like C<length @a> will become C<@a.length> or similar. Even subscripts C<[$n]> can have dotted syntax, e.g. C<.[$n]>. And, presumably, there will be a C<.map>, C<.grep>, etc.
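As a rough Perl 5 approximation of the method form, wrapping an array in an object gives L2R chaining, at the cost of parens around every closure. The C<List> class below is hypothetical, invented only for this sketch, and is not the proposed Perl 6 design:

```perl
use strict;
use warnings;

package List;   # hypothetical wrapper class, for illustration only

sub new {
    my ($class, @items) = @_;
    return bless [@items], $class;
}

# Each method returns a new List, which is what makes chaining possible.
sub map {
    my ($self, $code) = @_;
    my @result;
    for my $item (@$self) { push @result, $code->($item) }
    return ref($self)->new(@result);
}

sub grep {
    my ($self, $code) = @_;
    my @result;
    for my $item (@$self) { push @result, $item if $code->($item) }
    return ref($self)->new(@result);
}

sub items { my ($self) = @_; return @$self }

package main;

# Reads left to right: take 1..6, keep the even values, multiply by 10.
my @out = List->new(1 .. 6)
              ->grep(sub { $_[0] % 2 == 0 })
              ->map(sub { $_[0] * 10 })
              ->items;

print "@out\n";   # 20 40 60
```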

But in doing that, we lose a lot of functionality -- mainly, the simple L2R and R2L chaining that we have now. So people want to keep C<map>, etc., as functions, not methods, and use the old syntax. Well, that's OK, but we're back to the beginning.

<Is there a pipelike solution?>

People like the current C<map> syntax because it allows R2L chaining. Saying C< @a.map {...} > isn't so onerous that it should send people reeling, since it's the exact same syntax as the entire rest of the language. But losing the chaining would hurt, because it's a very useful feature.
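For reference, this is the kind of R2L chaining Perl 5 allows today. It is a minimal sketch with made-up data. Each function takes the list produced by everything to its right:

```perl
use strict;
use warnings;

my @a = (5, 3, 8, 1, 4);

# Read right to left: sort @a numerically, keep the even values,
# then square what is left.
my @out = map { $_ * $_ } grep { $_ % 2 == 0 } sort { $a <=> $b } @a;

print "@out\n";   # 16 64
```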

Here's where the pipeline suggestions come in. Looking at a strawman syntax using '<-', JUST FOR THE SAKE OF IMPARTIAL EXAMPLE:

@out = map {...} <- @a;

That's two (OK, three) characters longer than the old syntax. It could be described to newcomers as "using @a, do <blah> with it".

Essentially, it's a "post-invocant", a reversed way of saying C< @a.map {...} >. Note that it doesn't rely on the existence of universal functions -- C<map> is just a method of @a.

Here's the important part: This same syntax will work on ANY object, and ANY method of that object. So you're removing the specific global functions C<map>, C<grep>, C<sort>, etc., and replacing them with a generic syntax for right-to-left chaining of operations on any object. You don't have to _build_ functions that can support chaining, because _any_ method can be chained.

And it provides a very visual way to define any pipe-like algorithm, in either direction:

$in -> lex -> parse -> codify -> optimize -> $out; # L2R

$out <- optimize <- codify <- parse <- lex <- $in; # R2L

It's clear, from looking at either of those, that they represent data pipes. Both are even clearer than the equiv:

$out = $in.lex.parse.codify.optimize;

Whether you would code a pipeline as L2R or R2L depends on the specific algorithm you're representing, and is purely a matter of taste... you'd probably use R2L in the same places as you use C<map>-like chaining in Perl5. I can't begin to justify why one should be superior to the other, and would certainly use them both.
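The two directions can be roughly emulated in Perl 5 with plain code references. This is only a sketch: C<pipe_l2r> and C<pipe_r2l> are hypothetical helpers invented here, and the stages are trivial stand-ins rather than a real lexer or parser:

```perl
use strict;
use warnings;

# Hypothetical helper: feed $value through each stage in order (L2R).
sub pipe_l2r {
    my ($value, @stages) = @_;
    $value = $_->($value) for @stages;
    return $value;
}

# Same idea with the stages listed R2L: the input comes last.
sub pipe_r2l {
    my @stages = @_;
    my $value  = pop @stages;
    return pipe_l2r($value, reverse @stages);
}

# Stand-in stages; real ones would do actual work.
my $lex   = sub { [ split ' ', $_[0] ] };          # string -> tokens
my $parse = sub { [ map { uc $_ } @{ $_[0] } ] };  # tokens -> upper-cased tokens
my $emit  = sub { join '-', @{ $_[0] } };          # tokens -> string

my $in = "a b c";

my $out_l2r = pipe_l2r($in, $lex, $parse, $emit);   # $in first, stages follow
my $out_r2l = pipe_r2l($emit, $parse, $lex, $in);   # stages first, $in last

print "$out_l2r $out_r2l\n";   # A-B-C A-B-C
```

Both calls run the same stages in the same order; only the written order differs, which is the whole point of having both spellings.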

So that's what it gains you. The ability to visually express pipe-like algorithms, in either direction, using OO techniques, without a lot of parens, without relying on global functions, and with only a couple of extra chars. That's why I'm gung-ho on it. Even if it's named <~ or <|.

<Fine, but really -- why kill the old C<map> function?>

Well, if we have both OO C<@a.map> and OO-based pipelines (post-invocants and, um, pre-invocants?), then we don't need the old C<map>. It's redundant. For two extra characters, we get a syntax that looks nearly the same, but is much more powerful.

To most people here, C<map> is ingrained. You learned it, now you know it. Newbies don't, but even newbies seem to understand that C<map> or C<grep> is an operation on an array -- they just can't always remember the Perl5 way of phrasing that. (Not a crisis, but Perl is currently seen as very difficult to learn, probably because it has a lot of little things that are a *little* hard to learn.)

But both the OO and pipeline syntaxes do more to point out the noun, verb, and adjective of the operation. As a bonus, perhaps, through exposure to both, newbies will begin to understand much earlier on how both OO abstractions and OO chaining can be used to best effect.

Killing C<map> outright, however, requires that we unlearn the old syntax. We may not be willing to do that, regardless of the advantages of the more generic piping.

Was that an understandable explanation? :-)

MikeL
