Re: [PHP-DEV] RFC Draft: Comprehensions

Larry Garfield Thu, 04 Apr 2019 18:55:29 -0700

On Wed, Mar 13, 2019, at 10:22 PM, Larry Garfield wrote:
> On Wed, Mar 13, 2019, at 6:30 PM, Rowan Collins wrote:
> > On 13/03/2019 21:10, Dik Takken wrote:


> If I can summarize the responses so far, they seem to fall into one of 
> two categories:
> 
> 1) Love the idea, but wouldn't short-closures be close enough?
> 
> 2) Love the idea, but hate the particular syntax proposed.
> 
> On the plus side, it seems almost everyone is on board in concept, so 
> yay.  That of course just leaves the syntax bikeshedding, which is 
> always the fun part.

Bumping this thread again.

Thinking on it further, I see two possible syntactic approaches, given that 
short lambdas as currently written would not give us a viable comprehension 
syntax.

1) [foreach ($list as $x => $y) if (condition) yield expression]

That is, essentially the same syntax you would write for the loop inside a 
generator function, but in a more compact form.  The above would be 
effectively identical to:

$gen = (function () use ($list) {
  foreach ($list as $x => $y)
    if (condition)
      yield expression;
})();


(But with auto-capture.)  I am personally not at all a fan of the extra 
verbosity (foreach, parens, etc.) but it seems most respondents in the thread 
want it for familiarity.
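
To make the equivalence concrete, here is a minimal sketch of what option 1 
would stand for, written as PHP that runs today.  The sample data, the 
condition ($y > 1) and the yielded expression are invented for illustration 
and are not part of the RFC; the explicit `use` clause is what auto-capture 
would make unnecessary.

```php
<?php
// Hypothetical sample data; not from the RFC.
$list = ['a' => 1, 'b' => 2, 'c' => 3];

// Proposed form (not valid PHP today):
//   $gen = [foreach ($list as $x => $y) if ($y > 1) yield "$x=$y"];

// Hand-written equivalent: an immediately invoked generator function.
$gen = (function () use ($list) {
    foreach ($list as $x => $y) {
        if ($y > 1) {
            yield "$x=$y";
        }
    }
})();

// Run the generator out, discarding its keys.
$result = iterator_to_array($gen, false);
```

Nothing is evaluated until the generator is iterated, which is the lazy 
behavior the comprehension is meant to keep.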

Advantages:

* Very compact.
* Works for both arrays and traversables
* Would play very nicely with the proposed spread operator for iterables 
(https://wiki.php.net/rfc/spread_operator_for_array).
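
As a rough illustration of that last point, the sketch below assumes a PHP 
version in which array spread accepts Traversables (7.4 or later).  The 
generator stands in for the proposed comprehension, and the input data is 
made up.

```php
<?php
// Hypothetical input; any iterable would do.
$list = [1, 2, 3, 4];

// Stand-in for the proposed comprehension: lazily yields squares of even numbers.
$evenSquares = (function () use ($list) {
    foreach ($list as $y) {
        if ($y % 2 === 0) {
            yield $y * $y;
        }
    }
})();

// Spreading the generator runs it to completion inside the array literal.
$array = [0, ...$evenSquares];
```

The generator yields without explicit keys here, so the spread only ever sees 
integer keys.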

Disadvantages:

* New syntax
* If you need to do multiple filter or map operations it gets potentially ugly 
and unwieldy.
* Not super extensible.
* Doesn't have a natural way to enforce the types produced.  (Although one 
could add it easily.)
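
On the type-enforcement point, the proposal does not say what such an addition 
would look like.  One way to picture it in today's PHP is a wrapping generator 
that checks each item as it passes through.  The helper name typed() and the 
sample data are hypothetical.

```php
<?php
// Hypothetical helper: re-yields items, throwing if one is not of the expected class.
function typed(iterable $items, string $class): Generator
{
    foreach ($items as $key => $item) {
        if (!$item instanceof $class) {
            throw new TypeError("Expected $class at key $key");
        }
        yield $key => $item;
    }
}

// Made-up sample data.
$dates = typed([new DateTime('2019-04-04'), new DateTime('2019-03-13')], DateTime::class);

$years = [];
foreach ($dates as $d) {
    $years[] = $d->format('Y');
}
```

Because the check runs lazily, a bad item is only reported when iteration 
reaches it, not when the comprehension is created.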

In short, this approach is compact and works for both arrays and 
traversables, at the cost of introducing new syntax.

2) Allow comprehensions to work only on traversable objects, which lets us 
chain methods.  Specifically:

$new = $anyTraversable->filter(fn($x) => $x < 0);

Would return a new traversable that filters $anyTraversable, using a callable.  
It would effectively be identical to 

$new = new CallbackFilterIterator($anyTraversable, fn($x) => $x < 0);
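
One detail worth noting: CallbackFilterIterator takes an Iterator, not any 
Traversable, so an IteratorAggregate such as ArrayObject has to be adapted 
first.  A minimal sketch of what a ->filter() method might do internally, 
using a hypothetical free function filter_traversable() in its place:

```php
<?php
// Hypothetical helper standing in for the proposed ->filter() method.
function filter_traversable(Traversable $source, callable $predicate): Iterator
{
    // CallbackFilterIterator needs an Iterator; IteratorIterator adapts
    // IteratorAggregate implementations such as ArrayObject.
    $iterator = $source instanceof Iterator ? $source : new IteratorIterator($source);

    // The callback is invoked with (current, key, iterator); a one-parameter
    // closure simply ignores the extra arguments.
    return new CallbackFilterIterator($iterator, $predicate);
}

// Made-up sample data.
$anyTraversable = new ArrayObject([3, -1, 0, -7]);
$new = filter_traversable($anyTraversable, fn($x) => $x < 0);

$negatives = iterator_to_array($new, false);
```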

Similarly:

$new = $anyTraversable->map(fn($x) => $x * 2);

Would produce a new traversable that lazily applies a function to the items 
as they're returned.   Equivalent to:

$new = (function () use ($anyTraversable) {
  foreach ($anyTraversable as $x)
      yield $x * 2;
})();

Both would also need to support key/value pairs.  Probably: if the callable 
takes 2 parameters it receives $key, $value; if it takes just one parameter 
it receives only $value.
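
A sketch of how that parameter-count rule could be detected in userland, using 
reflection.  The function name lazy_map() is hypothetical, and preserving the 
original keys in the output is an assumption of this sketch rather than 
something the proposal specifies.

```php
<?php
// Hypothetical helper standing in for the proposed ->map() method.
function lazy_map(iterable $source, callable $fn): Generator
{
    // Count the declared parameters to decide how to call the callable.
    $arity = (new ReflectionFunction(Closure::fromCallable($fn)))->getNumberOfParameters();

    foreach ($source as $key => $value) {
        // Two parameters: ($key, $value).  One parameter: ($value) only.
        $mapped = $arity >= 2 ? $fn($key, $value) : $fn($value);

        // Assumption: keys are carried through unchanged.
        yield $key => $mapped;
    }
}

// Made-up sample data.
$prices = ['tea' => 2, 'cake' => 5];

$doubled = iterator_to_array(lazy_map($prices, fn($v) => $v * 2));
$labels  = iterator_to_array(lazy_map($prices, fn($k, $v) => "$k: $v"));
```

An engine-level implementation could presumably check the parameter count 
without the reflection overhead shown here.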

This approach has a few advantages:

* It piggy-backs on existing traversable behavior; essentially, rather than 
short-syntax for generators it's short syntax for wrapping a bunch of iterator 
objects around each other.
* More elaborate cases (multiple filters, multiple maps) become somewhat nicer; 
you can easily call filter() or map() multiple times and it's still entirely 
obvious what's going on.
* Has a natural (if verbose) way to enforce types: filter(fn($x) => $x 
instanceof Foo || throw new \TypeError);
* Actually, since short-lambdas already would support return type declaration, 
there's another alternative: filter(fn($x) : Foo => $x);  (Although you'd 
probably just fit that into a filter function you're using for something else.)
* next() is already a useful method that works for the any() case discussed in 
the RFC, and it flows very naturally.  I don't see a nice equivalent of all(), 
however.
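
To give a feel for how chaining would read, here is a userland approximation.  
The Pipeline class and its filter(), map() and any() methods are all 
hypothetical; the proposal would put such methods on Traversable itself.  The 
last line shows one possible way to express all(), as "no counterexample 
exists", which is my assumption and not something from the RFC.

```php
<?php
// Hypothetical wrapper; the proposal would put these methods on every Traversable.
final class Pipeline implements IteratorAggregate
{
    private $source;

    public function __construct(iterable $source)
    {
        $this->source = $source;
    }

    // Lazily keeps only the items for which $fn returns a truthy value.
    public function filter(callable $fn): self
    {
        $source = $this->source;
        return new self((function () use ($source, $fn) {
            foreach ($source as $k => $v) {
                if ($fn($v)) {
                    yield $k => $v;
                }
            }
        })());
    }

    // Lazily applies $fn to each item, keeping keys.
    public function map(callable $fn): self
    {
        $source = $this->source;
        return new self((function () use ($source, $fn) {
            foreach ($source as $k => $v) {
                yield $k => $fn($v);
            }
        })());
    }

    // True as soon as the pipeline produces at least one item.
    public function any(): bool
    {
        foreach ($this->source as $_) {
            return true;
        }
        return false;
    }

    public function getIterator(): Iterator
    {
        yield from $this->source;
    }
}

// Made-up sample data.
$doubledNegatives = iterator_to_array(
    (new Pipeline([1, -2, 3, -4]))
        ->filter(fn($x) => $x < 0)
        ->map(fn($x) => $x * 2),
    false
);

$hasNegative = (new Pipeline([1, -2]))->filter(fn($x) => $x < 0)->any();

// all($x > 0) expressed as "there is no item that fails the test".
$allPositive = !(new Pipeline([1, -2]))->filter(fn($x) => !($x > 0))->any();
```

Each step wraps the previous one in a generator, so a pipeline built this way 
can only be iterated once.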

But also some disadvantages:

* It only works for traversable objects, not arrays.  (Workaround: new 
ArrayObject($arr).)
* It is more verbose than the other syntax option.
* Adding special-meaning methods to Traversable objects is weird, and I don't 
think we've done that anywhere before.  I have no idea if there are engine 
implications.
* The short lambda RFC becomes effectively a prerequisite, as it's way too 
verbose to do with an anon function as we have now.
* My gut feeling is it would be slower as it would likely mean more function 
calls internally, but I've zero data to back that up.

And before someone else mentions it, it also poses some interesting possible 
extensions that are not all that relevant to the current target, but would fit 
naturally:

* a ->limit(0, 3) method, that is functionally equivalent to \LimitIterator.
* Potentially RegexIterator() could also become a regex() method, that's a 
special case of filter()?
* Languages like Rust have a method to "run out" the comprehension ( .collect() 
in the case of Rust).  We could easily do the same to produce a resultant 
array, similar to the spread operator.  (That said, that should in no way 
detract from the spread operator proposal, which I also like on its own merits.)
* Possibly other stuff that slowly turns iterables into "collection objects" 
(sort of).
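
For reference, the existing SPL pieces those methods would map onto already 
compose like this today.  The data is made up, and iterator_to_array() is used 
here as a stand-in for a collect()-style method.

```php
<?php
// Made-up sample data.
$numbers = new ArrayIterator([10, 20, 30, 40, 50]);

// ->limit(0, 3) would correspond to LimitIterator with offset 0 and count 3.
$firstThree = new LimitIterator($numbers, 0, 3);

// A collect()-style step: run the iterator out into a plain array.
$collected = iterator_to_array($firstThree, false);

// A regex() method would correspond to RegexIterator, which in its default
// mode keeps only the entries whose value matches the pattern.
$words = new ArrayIterator(['apple', 'bob', 'avocado']);
$startsWithA = iterator_to_array(new RegexIterator($words, '/^a/'), false);
```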


Discussion:

For me, the inability to work with arrays is the big problem with the second 
approach.  I very often declare my return and parameter types as `iterable`, 
which means I may have an array and not know it.  Using approach 2 means I 
suddenly need to care which kind of iterable it is, which defeats the purpose 
of `iterable`.  Calling methods on arrays, though, I'm pretty sure is out of 
scope.
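
In practice that means every function accepting an `iterable` would need a 
normalizing step before it could chain anything.  A minimal sketch of that 
boilerplate, with hypothetical helper names to_iterator() and negatives(), and 
ArrayIterator used in place of the ArrayObject workaround mentioned above:

```php
<?php
// Hypothetical helper: turn any iterable into an Iterator.
function to_iterator(iterable $items): Iterator
{
    if (is_array($items)) {
        return new ArrayIterator($items);
    }
    // IteratorAggregate objects need adapting; real Iterators pass through.
    return $items instanceof Iterator ? $items : new IteratorIterator($items);
}

// Hypothetical consumer that must work for arrays and objects alike.
function negatives(iterable $items): Iterator
{
    return new CallbackFilterIterator(to_iterator($items), fn($x) => $x < 0);
}

// Made-up sample data covering the three kinds of iterable.
$fromArray     = iterator_to_array(negatives([1, -1, 2]), false);
$fromObject    = iterator_to_array(negatives(new ArrayObject([2, -3])), false);
$fromGenerator = iterator_to_array(negatives((function () {
    yield -5;
    yield 5;
})()), false);
```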

Frankly were it not for that limitation I'd say I favor the chained method 
style, as while it is more verbose it is also more self-documenting.  Given 
that limitation, I'm torn but would probably lean toward option 1.   And of 
course there's the "methods that apply to all traversable objects" thing which 
is its own can of worms I know nothing about.

(If someone has a suggestion for how to resolve that disadvantage, I'd love to 
hear it.)

Those seem like the potential options.  Any further thoughts?  Or volunteers? 
:-)

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php