Hi Marco,

________________________________
From: Marco Pivetta <ocram...@gmail.com>
Sent: Wednesday, February 1, 2023 9:25 AM
To: Sergii Shymko <ser...@shymko.net>
Cc: internals@lists.php.net <internals@lists.php.net>
Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid 
subtle bugs/workarounds

Hey Sergii,


On Wed, 1 Feb 2023 at 18:22, Sergii Shymko 
<ser...@shymko.net<mailto:ser...@shymko.net>> wrote:
Hi,

After programming in PHP for two decades, my goal for 2023 is to try to 
contribute to the language.
The plan is to start small and, if successful, work my way up increasing 
complexity of proposals.
This topic has been chosen for starters, because IMO it strikes a good balance 
between simplicity and usefulness.
I should be able to implement the RFC myself, unless some deep OPcache/JIT 
nuances pop up.

Let me give you a brief overview of the problem and the proposed solution.
Function array_is_list() added in PHP 8.1 introduces the concept of a "list" – 
array having 0..count-1 indexes.
The function is awesome and array "lists" are completely compatible with all 
array_* functions!
However, function array_filter() exhibits a nuanced behavior when filtering 
lists.
For instance, it preserves array keys which may (or may not) create gaps in 
sequential indexes.
These gaps mean that a filtered list is not a list anymore as validated by 
array_is_list().

For example:
$originalList = ['first', '', 'last'];
$filteredList = array_filter($originalList);
var_export(filteredList);  // array(0 => 'first', 2 => 'last')
var_export(array_is_list($originalList));  // true
var_export(array_is_list($filteredList));  // false

The behavior is counterintuitive and can lead to subtle bugs, such as encoding 
issues:
echo json_encode($originalList);  // ["first", "", "last"]
echo json_encode($filteredList);  // {"0": "first", "2": "last"}

The workaround is to post-process the filtered array with array_values() to 
reset the indexes.
The proposal is to introduce a function array_filter_list() that would work 
solely on lists.
It will have the same signature as array_filter() and will always return a 
valid list.

See a draft RFC with more details here:
https://dev.to/sshymko/php-rfc-arrayfilterlist-function-35mb-temp-slug-7074000?preview=21d6760126a02464b0511498bbb95749150afb17a7ff6377c458ee54a8f57cfe00d4e258aa06bad3232c0dd9e73a2d62138fc990048987e9e2339a3d

I just registered a wiki account "sshymko" with the intention of submitting the 
RFC.
Could someone please approve the account and give it some karma?

Looking forward to collaborating with the internals team! 🙂


I don't want to shoot this down too early, but:

 1. why in the language, when a simple userland function suffices?
 2. what's wrong with writing `array_values(array_filter(...))`?

Marco Pivetta

https://twitter.com/Ocramius

https://ocramius.github.io/


I do understand the aversion to adding more functions to PHP core 🙂
Let me try to articulate the answers to your questions:

  1.  A userland function is enough, it's almost always possible to create a 
userland function.
However, pretty much every single PHP project would need it and use pretty 
often.
It would be great to standardize the function name, especially now that it 
compliments array_is_list().
The same userland argument can be made for array_is_list(), yet people are 
appreciating the function.
The reasoning for adding array_filter_list() to the core is similar to 
array_is_list().
  2.  The array_values(array_filter()) workaroud is usable and is indeed used 
on the projects I'm involved with.
It's a relatively short one-liner allowing to not go to the extent of wrapping 
it in a userland function.
Some of the disadvantages include having to always mentally translate it in 
your head to "filter list".
The worst thing is that 9/10 times developers (me included) would overlook this 
nuanced behavior.
Logically, people assume that a subset of a list is a list, which unfortunately 
isn't always the case.
They'd forget to implement array_values() and cause bugs that are not 
immediately detectable.
At least 3-4 of projects in my recent memory experienced this issue, it got 
overlooked at code review.
Even automated tests did not catch it as the issue remains hidden when 
filtering out the tail end of a list.
After brainstorming with developers, we concluded formalizing the concept of a 
"list" more would help.
Having functions array_is_list() and array_filter_list() would make you think 
in terms of lists more.
One would use array_filter_list() instead of array_filter() by default unless 
they work with assoc arrays.

Maybe there could be an even more advanced approach: formalizing the list data 
type.
But it would be an overkill to propose a major change like this at this time.

Regards,
Sergii

Reply via email to