Re: [PHP-DEV] Proposal: Expanded iterable helper functions and aliasing iterator_to_array in `iterable\` namespace

Larry Garfield Sun, 30 Oct 2022 09:22:47 -0700

On Fri, Oct 28, 2022, at 8:45 AM, tyson andre wrote:
> Hi internals,
>
> https://wiki.php.net/rfc/iterator_xyz_accept_array recently passed in 
> php 8.2,
> fixing a common inconvenience of those functions throwing a TypeError 
> for arrays.
>
> However, from the `iterator_` name 
> (https://www.php.net/manual/en/class.iterator.php),
> it's likely to become a source of confusion when writing or reviewing 
> code decades from now,
> when the name suggests it only accepts objects (Traversable 
> Iterator/IteratorAggregate).
>
> I'm planning on creating an RFC adding the following functions to the 
> `iterable\` namespace as aliases of iterator_count/iterator_to_array.
> Those accept iterables 
> (https://www.php.net/manual/en/language.types.iterable.php), i.e. both 
> Traversable objects and arrays.
>
> Namespaces were chosen after feedback on my previous RFC,
> and I believe `iterable\` follows the guidance from 
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions and
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions#core_standard_spl
>
> I plan to create an RFC with the following functionality in the 
> iterable\ namespace, and wanted to see what the preference on naming 
> was, or if there was other feedback.
> (Not having enough functionality and wanting a better idea of the 
> overall 
>
> - `iterable\count(...)` (alias of iterator_count)
> - `iterable\to_array(Traversable $iterator, bool $preserve_keys = 
> true): array` (alias of iterator_to_array, so that users can stop using 
> a misleading name)
>
> - `iterable\any(iterable $input, ?callable $callback = null): bool` - 
> Determines whether any value of the iterable satisfies the predicate.
>    and all() - Determines whether all values of the iterable satisfies 
> the predicate.
>
>   This is a different namespace from 
> https://wiki.php.net/rfc/any_all_on_iterable
> - `iterable\none(iterable $input, ?callable $callback = null): bool`
>
>    returns the opposite of any()
> - `iterable\find(iterable $iterable, callable $callback, mixed $default 
> = null): mixed`
>
>    Returns the first value for which $callback($value) is truthy. On 
> failure, returns default
> - `iterable\fold(iterable $iterable, callable $callback, mixed 
> $initial): mixed`
>
>   `fold` and requiring an initial value seems like better practice. See 
> https://externals.io/message/112558#112834
>   and 
> https://stackoverflow.com/questions/25149359/difference-between-reduce-and-fold
> - `iterable\unique_values(iterable $iterable): array {}`
>
>   Returns true if this iterable includes a value identical to $value (`===`).
> - `iterable\includes_value(iterable $iterable, mixed $value): bool {}`
>    Returns a list of unique values of $iterable
>
> There's other functionality that I was less certain about proposing, 
> such as `iterable\keys(iterable $iterable): array`,
> which would work similarly to array_keys but also work on Traversables 
> (e.g. to be used with userland/internal collections, generators, etc.)
> Or functions to get the iterable\first()/last() value in an iterable. 
> Any thoughts on those?
>
> I also wanted to know if more verbose names such as find_value(), 
> fold_values(), any_values(), all_values() were generally preferred 
> before proposing this,
> since I only had feedback from a small number of names. My assumption 
> was short names were generally preferred when possible.
>
> See https://github.com/TysonAndre/pecl-teds/blob/main/teds.stub.php for 
> documentation of the other functions mentioned here. The functionality 
> can be tried out by installing https://pecl.php.net/package/teds
>
> Background
> -----------
>
> In February 2021, I proposed expanded iterable functionality and 
> brought it to a vote,
> https://wiki.php.net/rfc/any_all_on_iterable , where feedback was 
> mainly about being too small in scope and the choice of naming.
>
> Later, after https://externals.io/message/112558#112780 , 
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions#proposal was 
> created and brought to a vote in April 2021 that passed,
> offering useful recommendations on how to standardize namespaces in 
> future proposals of new categories of functionality
> (e.g. `iterable\any()` and `iterable\all()`)
>
> Any comments?


Oh, a topic near and dear to me. :-)  I'm going to try and respond to both the 
OP and some other responses together here.

First off, I am generally in favor of improving PHP's iterable story, so 
consider me on board on the concept.

Second, I have similar user-space utilities that were intended for pipe usage 
available in a library (since Levi mentioned pipe compatibility).  I learned 
some very important things from that process.  Details here:

https://github.com/Crell/fp/blob/master/src/composition.php
https://github.com/Crell/fp/blob/master/src/array.php

Of particular note:

1. Because of PHP's inconsistent handling of excess arguments to functions 
(user-defined functions silently ignore extras, while internal functions throw 
an ArgumentCountError), there MUST be separate versions of every function that 
takes a callback, one that passes the key and one that does not.  It would be a 
fatal design flaw to do otherwise.  Yes, this balloons the number of such 
functions, which sucks, but that's PHP for you.

2. There are ample use cases for most operations to return an array or a lazy 
iterable.  Both totally exist.  I solved that by also having a separate version 
of each function, eg, amap() vs itmap().  The former returned an array, the 
latter returned a generator that lazily yields the same elements.  It would be a 
fatal design flaw to not account for this.  Yes, this balloons the number of 
such functions, which sucks, but that's PHP for you.

So, eg, I have *four* map functions: amap(), itmap(), amapWithKeys(), 
itmapWithKeys().  Same for filter.  Other operations only needed 2 variants, 
eg, first() and firstWithKeys(), any() and anyWithKeys(), etc.

I do not claim that naming pattern to be ideal; in fact I don't particularly 
like *WithKeys().  We should think carefully on the naming.  A possible 
alternative would be to always return a lazy iterable in all circumstances and 
assume someone can use to_array() or equivalent on the result if they want it 
as an array.  (That's effectively what Python 3 does with map(), filter(), and 
generator expressions.)  However, that could have a non-trivial performance 
impact, since generators are slower than plain arrays.

3. Feel free to borrow liberally, design-wise, from the above code.  There are a 
few more functions in there that could be of use, too.  Note, though, that all 
are designed to be used with a pipe(), so they mostly return a closure that has 
been manually partially applied with everything except the iterable, so you get 
a single-argument function, which is what a pipe() or compose() chain needs.
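
For what it's worth, points 1 and 2 can be sketched roughly as below.  None of 
these names come from the library or the proposal; they are made up for 
illustration.  The sketch assumes PHP 8.0+ behavior, where an internal function 
called with too many arguments throws an ArgumentCountError.

```php
<?php
// All function names below are hypothetical, for illustration only.

// A naive helper that ALWAYS passes both value and key to the callback.
function map_passing_key_sketch(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $k => $v) {
        $out[$k] = $fn($v, $k);
    }
    return $out;
}

// Value-only, eager variant: builds the whole array up front.
function amap_sketch(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $k => $v) {
        $out[$k] = $fn($v);
    }
    return $out;
}

// Value-only, lazy variant: a generator that yields elements on demand.
function itmap_sketch(iterable $it, callable $fn): \Generator
{
    foreach ($it as $k => $v) {
        yield $k => $fn($v);
    }
}

$words = ['b' => 'bob', 'x' => 'ada'];

// A user-defined closure silently ignores the extra key argument.
$upper = map_passing_key_sketch($words, fn(string $v): string => strtoupper($v));

// A fixed-arity internal function rejects the extra argument.
$threw = false;
try {
    map_passing_key_sketch($words, 'strtoupper');
} catch (\ArgumentCountError $e) {
    $threw = true;
}

// Worse: trim() has an optional second parameter, so the key is silently
// used as the set of characters to strip. trim('bob', 'b') gives 'o'.
$mangled = map_passing_key_sketch($words, 'trim');

// The value-only variants avoid both problems.
$eager = amap_sketch($words, 'trim');
$lazy  = itmap_sketch($words, 'trim');
```

The trim() case is the nastier one, since nothing fails; the result is just 
quietly wrong.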

Third, speaking of pipe, I disagree with Tim that putting the callback first 
would be easier for pipe/partials.  If we ever get partials similar to what was 
previously implemented, then the argument order won't matter.  If we get pipes 
as I've previously proposed, then none of these functions are directly usable 
because they're multi-argument.

The alternative I've considered is somewhat inspired by Elixir (assuming I 
understand the little Elixir I've read), in which a function after a |> is 
automatically assumed to be partially applying everything but the first 
argument.  So $list |> map($callable) translates to map($list, $callable).  
I've not decided yet if that's a good way to avoid needing full partial 
application or a good way to make horribly confusing code.  But if that were to 
happen, it would only work if all of these functions took the iterable, the 
"object to be operated on", as their first argument.

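The Elixir-ish rule above can be emulated in user space, more or less.  This is 
only a sketch: pipe_first_sketch(), filter_sketch(), and map_sketch() are 
invented names, and the array-per-stage form is a stand-in for the hypothetical 
|> syntax described above.  It assumes PHP 8.0+ for the mixed type.

```php
<?php
// Hypothetical stand-in for `$value |> f($a, $b)`: each stage is
// [callable, ...extraArgs], and the piped value becomes the FIRST argument.
function pipe_first_sketch(mixed $value, array ...$stages): mixed
{
    foreach ($stages as $stage) {
        $fn = array_shift($stage);          // the callable for this stage
        $value = $fn($value, ...$stage);    // piped value goes in first
    }
    return $value;
}

// Hypothetical iterable-first helpers; both preserve keys.
function filter_sketch(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $k => $v) {
        if ($fn($v)) {
            $out[$k] = $v;
        }
    }
    return $out;
}

function map_sketch(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $k => $v) {
        $out[$k] = $fn($v);
    }
    return $out;
}

// Reads like: [1, 2, 3, 4] |> filter_sketch(isEven) |> map_sketch(timesTen)
$result = pipe_first_sketch(
    [1, 2, 3, 4],
    ['filter_sketch', fn(int $n): bool => $n % 2 === 0],
    ['map_sketch', fn(int $n): int => $n * 10],
);
```

Note it only works because every helper takes the iterable first.
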
The callable, if inlined, is almost always the longest argument.  That means it 
is most readable when it is the last argument, so there is no need to look at 
the end of the closure to see if there are any other arguments.  (This is a 
problem with array_map() currently.)  So I would instead propose that *all* 
iterator functions follow the pattern:

name($iterable, other stuff, $callback_if_applicable);

That is easily learnable, most likely to result in clean-ish code, and most 
likely to be nice with any future pipe or partial implementations.  At worst, 
it would make pipe-ifying all such functions a trivially identical operation 
for all of them, making my library little more than a series of boring 
one-liners.  (Please make my library little more than a series of boring 
one-liners.)

That does also mean we cannot support variadics or optional arguments, since 
the callback always comes last.  I am OK with that.  And if someone really 
needs a different order, well, we have named arguments now.
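
As a rough illustration of the pattern and of the "boring one-liner" it would 
allow: fold_sketch() and pipeable_sketch() below are hypothetical names, and 
the parameter order follows the convention suggested here, not the order in the 
original proposal.  Named arguments need PHP 8.0+.

```php
<?php
// Hypothetical function following name($iterable, other stuff, $callback).
function fold_sketch(iterable $iterable, mixed $initial, callable $callback): mixed
{
    $carry = $initial;
    foreach ($iterable as $value) {
        $carry = $callback($carry, $value);
    }
    return $carry;
}

// The one-liner: fix every argument except the iterable, and return a
// single-argument closure suitable for a pipe() or compose() chain.
function pipeable_sketch(callable $fn, mixed ...$rest): \Closure
{
    return static fn(iterable $iterable): mixed => $fn($iterable, ...$rest);
}

$sum   = pipeable_sketch('fold_sketch', 0, fn(int $carry, int $n): int => $carry + $n);
$total = $sum([1, 2, 3]);

// Named arguments allow a different order at the call site.
$product = fold_sketch(
    callback: fn(int $carry, int $n): int => $carry * $n,
    initial: 1,
    iterable: [2, 3, 4],
);
```

Because the wrapper never looks at what the extra arguments are, the same 
one-liner would cover every function that follows the convention.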

Tim noted nesting these functions and what would make that cleanest.  What 
would make it cleanest is to not nest them and use proper chaining instead: 
my pipe() function, a native pipe operator, or similar.  Expecting these 
functions to nest and not be ugly is a fool's errand, especially when there 
are vastly better options readily available.

Fourth, I agree with Levi that figuring out the edge case handling around empty 
lists is crucial.  The more we can design the semantics such that they "fall 
out" naturally, the better.  Eg, first() may return null for not found, which 
dovetails nicely with the ?? operator to provide a default.  However, that 
means null cannot be used as a meaningful found-value.  I'd argue that is *the 
correct behavior*, but I'm sure some would disagree.
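
A small sketch of that trade-off, assuming a hypothetical first_sketch() that 
returns null when nothing matches (the name and signature are invented here, 
not taken from the proposal):

```php
<?php
// Hypothetical: returns the first value the callback accepts, else null.
function first_sketch(iterable $iterable, callable $callback): mixed
{
    foreach ($iterable as $value) {
        if ($callback($value)) {
            return $value;
        }
    }
    return null;
}

$users = ['alice', 'bob'];

// Not found: null falls through to the ?? default.
$fallback = first_sketch($users, fn(string $u): bool => $u === 'carol') ?? 'guest';

// Found: the matched value wins over the default.
$matched = first_sketch($users, fn(string $u): bool => $u === 'bob') ?? 'guest';

// The catch: a legitimately found null looks exactly like "not found",
// so the default is used here even though the search succeeded.
$ambiguous = first_sketch([null, 1], fn(mixed $v): bool => $v === null) ?? 'default';
```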

An Option type would be nice, but to get that we really need to get ADTs first, 
and I don't have a timeline on that.  Ilija is more interested in fixing core 
bugs right now than in adding new features, the silly man... :-P

(Technically an Option/Maybe object could be implemented with just classes as 
we have them now, especially if it's done in core, but it would be cleaner and 
more ergonomic if built on top of an Enum.)
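
For the class-based route, something like the following would do as a bare 
minimum.  Every name here is hypothetical and the API is deliberately tiny; it 
is a sketch of the idea, not a proposed design.  It assumes PHP 8.0+ for 
constructor promotion and the mixed type.

```php
<?php
// Hypothetical minimal Option type built from plain classes.
abstract class OptionSketch
{
    abstract public function isSome(): bool;
    abstract public function unwrapOr(mixed $default): mixed;
}

final class SomeSketch extends OptionSketch
{
    public function __construct(private mixed $value) {}

    public function isSome(): bool
    {
        return true;
    }

    public function unwrapOr(mixed $default): mixed
    {
        return $this->value;
    }
}

final class NoneSketch extends OptionSketch
{
    public function isSome(): bool
    {
        return false;
    }

    public function unwrapOr(mixed $default): mixed
    {
        return $default;
    }
}

// Hypothetical find that wraps its result instead of returning null.
function find_option_sketch(iterable $iterable, callable $callback): OptionSketch
{
    foreach ($iterable as $value) {
        if ($callback($value)) {
            return new SomeSketch($value);
        }
    }
    return new NoneSketch();
}

$isNull = fn(mixed $v): bool => $v === null;
$hit    = find_option_sketch([1, null], $isNull);   // found a null
$miss   = find_option_sketch([1, 2], $isNull);      // found nothing
```

Unlike the null-returning version, a found null and a miss are now 
distinguishable, at the cost of unwrapping at every call site.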

Also, Monads are clunky in a language without first-class support for them.  
I've written extensively on this topic recently: 
https://peakd.com/hive-168588/@crell/much-ado-about-null

I'm not sure of the best way forward here, other than it should be addressed 
very carefully and explicitly.

Fifth, I would absolutely include map and filter in the included operations.  
They are critical parts of list handling.  If we had pipe-compatible map and 
filter, that would basically give us a list comprehension tool for free.  
(That's exactly how many languages approach list comprehensions.)  In my own 
list-operation-centric work, I've used map and filter a lot more than any of 
the other operations in the list above.
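
To make the comprehension point concrete, here is a sketch with two 
hypothetical lazy helpers (lazy_filter_sketch() and lazy_map_sketch() are 
invented names).  Together they do roughly what Python's 
[n * n for n in numbers if n % 2 == 0] does.

```php
<?php
// Hypothetical lazy filter: yields only the values the callback accepts.
function lazy_filter_sketch(iterable $it, callable $fn): \Generator
{
    foreach ($it as $k => $v) {
        if ($fn($v)) {
            yield $k => $v;
        }
    }
}

// Hypothetical lazy map: yields each value transformed by the callback.
function lazy_map_sketch(iterable $it, callable $fn): \Generator
{
    foreach ($it as $k => $v) {
        yield $k => $fn($v);
    }
}

$numbers = [1, 2, 3, 4, 5, 6];

$evens   = lazy_filter_sketch($numbers, fn(int $n): bool => $n % 2 === 0);
$squares = lazy_map_sketch($evens, fn(int $n): int => $n * $n);

// Nothing has been computed yet; iterating drives both generators.
// Passing false discards the original keys and gives a plain list.
$result = iterator_to_array($squares, false);
```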

I'm on board with the direction, modulo implementation details.

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
