Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread mickmackusa
On Mon, Oct 21, 2024 at 3:24 PM Larry Garfield 
wrote:

> I am confused by this statement.  How is "adding more functions that do
> exactly what they say on the tin and act in a predictable fashion" bad?
> It's not like we're running out of space for functions.  Small,
> purpose-built, highly-predictable APIs that fully address a problem space
> are the ideal goal; the number of them is largely irrelevant.
>
> I wouldn't mind array_sorted() instead of sorted(), since it wouldn't work
> on iterables.  But array_sort() vs sort() is just horribly confusing for no
> good reason, and the API itself does nothing to tell you which is which.
>
> --Larry Garfield
>

Couldn't it be said that prefixing native array functions with "array_" and
not adding "ed"  to the middle (e.g. array_walked_recursive) or end of
native functions is moving toward naming convention normalcy?
Does PHP have any "ed" functions?
Having separate array_*() and *() functions offers a path to the possible
future decision to cull the *() functions and/or remove the
modify-by-reference behaviour of array_*() functions ...if that is ever
desirable.
I see this naming distinction as a stepping stone to a more consistent
naming convention rather than the final solution.
(Just like defining the function alias preg_escape() as a pathway to
removing preg_quote() -- which is a function that doesn't do what it says
on the tin.)

Mick


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Morgan

On 2024-10-21 06:42, Gina P. Banyard wrote:

Hello internals,

I would like to propose a short RFC to make the return value of the sort() and 
similar functions more useful:
https://wiki.php.net/rfc/array-sort-return-array

I intend for the discussion to last 2 weeks and then open the vote.

Best regards,

Gina P. Banyard

I've been working up something very much similar so I'd be well behind this.

My thinking ended up the same as Larry Garfield's:

> *sorted($arr) seems like it would be a lot less confusing, and 
consistent with what other languages (and hopefully future PHP) do.  (I 
don't know what that means for array_walk(), but I don't know what that 
would even return anyway.)


Making it a new function instead of changing the existing function would 
be less of a break it seems.



Excerpts from my drafts:

All of the functions [the thirteen array sorting functions, including 
shuffle()] have the less-than-informative return type of `true` (except 
`array_multisort()` with its more complex argument requirements and 
hence the ability to also return `false`). All of their behaviour is by 
reference. Oh, except when they flat-out fail, in which case they return 
null.


This means that if you want to sort an array you have to have an entire 
statement devoted to the task. As an expression, `sort()` just doesn’t 
compose: you can’t sensibly use it anywhere as part of a larger expression.

You have to write:
$escaped = $array->valued($expression);
sort($escaped);
$escaped = array_map(escape(...), $escaped);
because
$escaped = array_map(escape(...), sort($array->valued($expression));
simply won’t work.
It gets in the way when using it as a callback in array functions. You 
can’t use:

$sorted_datasets = array_map(sort(...), $datasets);
You want
$sorted_datasets = $datasets;
array_walk($sorted_datasets, sort(...));
`array_walk()` is the only built-in array function using callbacks that 
can meaningfully accept `sort()` (even the `array_walk_recursive()` 
variant will recurse into child arrays instead of sorting them). Because 
`array_walk()` iterates through the array by reference as well, it too 
interrupts functional composition, as just shown above. But at least it 
can often be replaced by `array_map()` (just look at the number of user 
notes on its manual page that suggest using references to make it behave 
more that way). `array_walk()` is basically for the case where there is 
no useful value to return from traversing the array (good thing, too, 
since its return type is `true`). But I digress.


...

The name `sorted()` has already been introduced. The `-ed` suffix can be 
grammatically applied to all of the sorting functions (technically for 
`shuffle()` the suffix would be `-d` because the `e` is already there). 
But that is just the one I came up with.
What we don’t want is Yet Another Prefix to go into the alphabet soup 
that is `([ak]?r?|u[ak]?|nat(case)?)`. “Hey, how about we use ‘f’ as a 
prefix meaning ‘functional’?” “Do you want to be responsible for 
prefixing ‘uksort’ with an ‘f’?”
At the same time we don’t want any modification that would make the name 
clunky to use. Not everything should be reliant on IDE autocompletion.


...

Most languages in widespread use do operate on their arrays in place, 
but in those cases the arrays are object types and sorting is an 
instance method performed on and modifying the object. That’s not how 
PHP models arrays. Functional languages meanwhile have – by definition – 
functional sort functions.
One notable example is Python, in which lists have an in-place `.sort()` 
method. The language however also offers a global `sorted()` function, 
which takes a list and returns a new, sorted one, pretty much like what 
is proposed here, and for pretty much the same reason: it is more easily 
composable with other operations.


...

It would impact anyone (such as myself) who already defines global 
functions named `sorted()`. In my case it’s precisely so that I can have 
such functional sort functions. It’s not unlikely that there are 
functions with that name that have other jobs (e.g. a `sorted` that 
tests whether a list is sorted) and would need to be renamed (e.g. to 
`is_sorted`). But adding any functions to core would mean renaming 
existing global user functions that currently have the same names 
regardless of what they are for. A quick github search turned up a 
number of `sorted` global functions, some of them exactly matching the 
userland implementation above; and all (from the sample I took) looking 
like the intent is to take an array (by value) or something that is 
queried for an array, sort it in some way, and return the result. In 
other words, passing the array to be sorted by value and returning the 
sorted array is the more common idiom.


[The userland implementation mentioned being

function sorted(array $array): array
{
sort($array);

Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Ilija Tovilo
Hi Gina

On Mon, Oct 21, 2024 at 3:21 PM Gina P. Banyard  wrote:
>
> On Sunday, 20 October 2024 at 18:42, Gina P. Banyard  
> wrote:
>
> > https://wiki.php.net/rfc/array-sort-return-array
>
> For an example, I'm going to pull out my solution to day 1 of the 2022 
> advents of code:
>
> https://github.com/Girgias/advent-of-code/blob/19283e1f5ef503c8a4478e58aaa57ff2fb7164c7/2022/01/puzzle.php#L25
>
>
> However, if the sort functions would return a useful value instead of `true` 
> I could chain it together completely as follows:
>
> $top3 = array_slice(
> rsort(
> array_map(
> 'array_sum',
> array_map(
> fn($v) => explode("\n", $v),
> explode(
> "\n\n",
> $input
> )
> )
> )
> ),
> 0,
> length: 3
> );

Note that your example would still warn after this RFC, which is
likely enough to deter people from writing code like this.

https://3v4l.org/mRuoK


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread mickmackusa
On Mon, 21 Oct 2024, 18:09 Morgan,  wrote:

> You can’t use:
> $sorted_datasets = array_map(sort(...), $datasets);
> You want
> $sorted_datasets = $datasets;
> array_walk($sorted_datasets, sort(...));
>

A warning: no one should ever use array_walk($sorted_datasets, sort(...));
as general-use script.

When sorting a 2d array in this fashion (only non-fatally executed with
numeric first level keys https://3v4l.org/HaU42), the first level keys will
be used as the sorting flag while sorting each row.  This means that
different rows may have different sorting flags applied -- effectively
corrupting the result.  https://3v4l.org/FeIpj -- notice how rows with keys
2, 5, and 10 are sorted as strings.

Mick

>


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Gina P. Banyard


On Sunday, 20 October 2024 at 18:42, Gina P. Banyard  wrote:

> Hello internals,
> 
> I would like to propose a short RFC to make the return value of the sort() 
> and similar functions more useful:
> https://wiki.php.net/rfc/array-sort-return-array

I am going to respond out of threads.

This RFC does *NOT* change the by-ref parameter passing *NOR* the in-place 
sorting of these functions.
I don't see why people are thinking this would be changed, as I only ever talk 
about the return value.
But I added a sentence in the Unaffected PHP Functionality section.

I really struggle to see the benefit of introducing new functions that would do 
the exact same thing as this RFC,
except the in-place modification, as seen by the already existing bikesheeding 
on function naming.

For an example, I'm going to pull out my solution to day 1 of the 2022 advents 
of code:

https://github.com/Girgias/advent-of-code/blob/19283e1f5ef503c8a4478e58aaa57ff2fb7164c7/2022/01/puzzle.php#L25

If we had a pipe operator, or if decided to write this less legible with nested 
calls, it would like this:

$elves = array_map(
'array_sum',
array_map(
fn($v) => explode("\n", $v),
explode(
"\n\n",
$input
)
)
);
rsort($elves);
$top3 = array_slice($elves, 0, length: 3);

However, if the sort functions would return a useful value instead of `true` I 
could chain it together completely as follows:

$top3 = array_slice(
rsort(
array_map(
'array_sum',
array_map(
fn($v) => explode("\n", $v),
explode(
"\n\n",
$input
)
)
)
),
0,
length: 3
);

In either case, the fact the array is sorted in place is something I don't 
care, as it is just a temporary step.
However, in the current state, I need to stop chaining function calls just to 
be able to get the sorted array that I am going to process and discard at the 
next function call.

This type of data processing pattern can be generalized to any collection of 
data that needs to be sorted/filtered and processed upon.

This RFC is "just" making the return value of these functions *useful*.

Would a "proper" functional variant of these functions which do not take the 
input argument by reference be useful? Yes.
Is this completely orthogonal to the point of this RFC? Also yes.

Best regards,

Gina P. Banyard


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Gina P. Banyard
On Monday, 21 October 2024 at 14:29, Ilija Tovilo  
wrote:

> Hi Gina
> 
> On Mon, Oct 21, 2024 at 3:21 PM Gina P. Banyard intern...@gpb.moe wrote:
> 
> > On Sunday, 20 October 2024 at 18:42, Gina P. Banyard intern...@gpb.moe 
> > wrote:
> > 
> > > https://wiki.php.net/rfc/array-sort-return-array
> > 
> > For an example, I'm going to pull out my solution to day 1 of the 2022 
> > advents of code:
> > 
> > https://github.com/Girgias/advent-of-code/blob/19283e1f5ef503c8a4478e58aaa57ff2fb7164c7/2022/01/puzzle.php#L25
> > 
> > However, if the sort functions would return a useful value instead of 
> > `true` I could chain it together completely as follows:
> > 
> > $top3 = array_slice(
> > rsort(
> > array_map(
> > 'array_sum',
> > array_map(
> > fn($v) => explode("\n", $v),
> > explode(
> > "\n\n",
> > $input
> > )
> > )
> > )
> > ),
> > 0,
> > length: 3
> > );
> 
> 
> Note that your example would still warn after this RFC, which is
> likely enough to deter people from writing code like this.
> 
> https://3v4l.org/mRuoK

There is a very simple fix, which is to add the /** @prefer-ref $array */ doc 
comment to the stub of the relevant functions.
Something that array_multisort() already does.
But thanks for reminding me how crap PHP references are :D

I'll amend the RFC soon to include this change as part of the proposal.

Best regards,

Gina P. Banyard


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Valentin Udaltsov
On Mon, 21 October 2024 г. at 18:13, Gina P. Banyard  wrote:
>
> On Monday, 21 October 2024 at 14:29, Ilija Tovilo  
> wrote:
>
> > Hi Gina
> >
> > On Mon, Oct 21, 2024 at 3:21 PM Gina P. Banyard intern...@gpb.moe wrote:
> >
> > > On Sunday, 20 October 2024 at 18:42, Gina P. Banyard intern...@gpb.moe 
> > > wrote:
> > >
> > > > https://wiki.php.net/rfc/array-sort-return-array
> > >
> > > For an example, I'm going to pull out my solution to day 1 of the 2022 
> > > advents of code:
> > >
> > > https://github.com/Girgias/advent-of-code/blob/19283e1f5ef503c8a4478e58aaa57ff2fb7164c7/2022/01/puzzle.php#L25
> > >
> > > However, if the sort functions would return a useful value instead of 
> > > `true` I could chain it together completely as follows:
> > >
> > > $top3 = array_slice(
> > > rsort(
> > > array_map(
> > > 'array_sum',
> > > array_map(
> > > fn($v) => explode("\n", $v),
> > > explode(
> > > "\n\n",
> > > $input
> > > )
> > > )
> > > )
> > > ),
> > > 0,
> > > length: 3
> > > );
> >
> >
> > Note that your example would still warn after this RFC, which is
> > likely enough to deter people from writing code like this.
> >
> > https://3v4l.org/mRuoK
>
> There is a very simple fix, which is to add the /** @prefer-ref $array */ doc 
> comment to the stub of the relevant functions.
> Something that array_multisort() already does.
> But thanks for reminding me how crap PHP references are :D
>
> I'll amend the RFC soon to include this change as part of the proposal.
>
> Best regards,
>
> Gina P. Banyard

Hi, Gina!

What if instead of this proposal we reimplement all of the array
functions in a different namespace and fix a lot of other problems and
inconsistencies?

Array\map(iterable $iterable, callable $mapper): array
Array\sort(iterable $iterable, int $flags = SORT_REGULAR): array
...

-- 
Valentin


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Morgan

On 2024-10-22 00:17, mickmackusa wrote:
On Mon, 21 Oct 2024, 18:09 Morgan, > wrote:


You can’t use:
         $sorted_datasets = array_map(sort(...), $datasets);
You want
         $sorted_datasets = $datasets;
         array_walk($sorted_datasets, sort(...));


A warning: no one should ever use array_walk($sorted_datasets, 
sort(...)); as general-use script.


When sorting a 2d array in this fashion (only non-fatally executed with 
numeric first level keys https://3v4l.org/HaU42 ), the first level keys will be used as the sorting flag while 
sorting each row.  This means that different rows may have different 
sorting flags applied -- effectively corrupting the result. 
https://3v4l.org/FeIpj  -- notice how rows with 
keys 2, 5, and 10 are sorted as strings.


Mick

Right, but the issue was that array_walk() is the only 
callback-parameterised array_* function which is even _compatible_ with 
sort(). As you point out, even then it's problematic.


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Rowan Tommins [IMSoP]

On 21/10/2024 14:20, Gina P. Banyard wrote:

This RFC does*NOT* change the by-ref parameter passing*NOR* the in-place 
sorting of these functions.
I don't see why people are thinking this would be changed, as I only ever talk 
about the return value.
But I added a sentence in the Unaffected PHP Functionality section.



I think maybe people are saying that the RFC *should* aim to do that, in 
some form.


I absolutely agree that the current design is unfortunate; but I worry 
that bugs will be introduced by people not realising that the function 
both modifies and returns its argument.


Take the classic example of why DateTimeImmutable exists:

$now = new DateTime;
$tomorrow = $now->modify('+1 day');
// $tomorrow has the expected value, but $now does not

And then compare to the proposed behaviour of sort():

$names_as_entered = explode(',', $_POST['names']);
$names_sorted = sort($names_as_entered);
// oops, we just lost our original order


If someone wants to write that code (i.e. make a sorted copy), they'd 
need to force a clone on the array somehow, e.g.


$names_as_entered = explode(',', $_POST['names']);
$names_sorted = sort(iterator_to_array($names_as_entered));

Which is probably less clear than the existing version:

$names_as_entered = explode(',', $_POST['names']);
$names_sorted = $names_as_entered;
sort($names_sorted);


The same bug can easily hide inside less obviously imperative code. This 
would remove the highest 10 numbers from a list, leaving others untouched:


$result = array_filter(
    array: $data,
    callback: fn($item) => !in_array($item, array_slice(rsort($data, 
SORT_NUMERIC), 0, 10))

);

This looks like a safe refactor for performance or readability, but 
would lose the original keys and order:


$top_10 = array_slice(rsort($data, SORT_NUMERIC), 0, 10);
$result = array_filter(
    array: $data,
    callback: fn($item) => !in_array($item, $top_10)
);

[Here's a demo: https://3v4l.org/9Nieo For anyone confused by this 
example, the first version works because the fn() closure captures $data 
by-value, effectively cloning it.]



Regards,

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread mickmackusa
On Tue, 22 Oct 2024, 02:30 Valentin Udaltsov, 
wrote:

>
> What if instead of this proposal we reimplement all of the array
> functions in a different namespace and fix a lot of other problems and
> inconsistencies?
>
> Array\map(iterable $iterable, callable $mapper): array
> Array\sort(iterable $iterable, int $flags = SORT_REGULAR): array
> ...
>
> --
> Valentin


Getting off-topic here, but array_map()'s parameter order matters.  Because
it can receive one or more arrays, the callback should be the first
parameter to afford, say, spreading the rows of a multidimensional array
during transposition.  The same "spreadability" logic should apply to
array_*diff*(), array_*intersect*(), and array_multisort().  If we were
going for consistency, always write callbacks as the first parameter for
any function that has one. In other words, parameters which can be repeated
more than once must be list as the last parameter.

It is clear that this RFC is seeking success by making the smallest
possible, very practical change. While it is healthy to explore deviations,
I like the RFC on it's own because it makes PHP more adaptable to writing
elegant code with fewer temporary variables.

Mick

>


Re: [PHP-DEV] [RFC] Change behaviour of array sort functions to return a copy of the sorted array

2024-10-21 Thread Bilge

On 21/10/2024 21:41, mickmackusa wrote:
On Tue, 22 Oct 2024, 02:30 Valentin Udaltsov, 
 wrote:



What if instead of this proposal we reimplement all of the array
functions in a different namespace and fix a lot of other problems and
inconsistencies?

Array\map(iterable $iterable, callable $mapper): array
Array\sort(iterable $iterable, int $flags = SORT_REGULAR): array
...

-- 
Valentin



Getting off-topic here
The whole discussion thread is "off-topic" in the context of the RFC, 
but I don't consider that a problem; it's a useful discussion to have 
whether the RFC sparked it or not.


Regarding the Array\map(iterable $iterable, ...) signature, if new 
functions are to accept iterables instead of arrays (as they absolutely 
should) then the `Array` namespace makes no sense. But at that point 
we're basically just putting https://github.com/nikic/iter in core (not 
that that's a bad thing).


Cheers,
Bilge