From: Alexandru Pătrănescu <dreal...@gmail.com> Sent: Wednesday, February 1, 2023 10:53 AM To: Sergii Shymko <ser...@shymko.net> Cc: Marco Pivetta <ocram...@gmail.com>; PHP internals <internals@lists.php.net> Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid subtle bugs/workarounds
On Wed, Feb 1, 2023, 19:59 Sergii Shymko <ser...@shymko.net<mailto:ser...@shymko.net>> wrote: Hi Marco, ________________________________ From: Marco Pivetta <ocram...@gmail.com<mailto:ocram...@gmail.com>> Sent: Wednesday, February 1, 2023 9:25 AM To: Sergii Shymko <ser...@shymko.net<mailto:ser...@shymko.net>> Cc: internals@lists.php.net<mailto:internals@lists.php.net> <internals@lists.php.net<mailto:internals@lists.php.net>> Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid subtle bugs/workarounds Hey Sergii, On Wed, 1 Feb 2023 at 18:22, Sergii Shymko <ser...@shymko.net<mailto:ser...@shymko.net><mailto:ser...@shymko.net<mailto:ser...@shymko.net>>> wrote: Hi, After programming in PHP for two decades, my goal for 2023 is to try to contribute to the language. The plan is to start small and, if successful, work my way up increasing complexity of proposals. This topic has been chosen for starters, because IMO it strikes a good balance between simplicity and usefulness. I should be able to implement the RFC myself, unless some deep OPcache/JIT nuances pop up. Let me give you a brief overview of the problem and the proposed solution. Function array_is_list() added in PHP 8.1 introduces the concept of a "list" – array having 0..count-1 indexes. The function is awesome and array "lists" are completely compatible with all array_* functions! However, function array_filter() exhibits a nuanced behavior when filtering lists. For instance, it preserves array keys which may (or may not) create gaps in sequential indexes. These gaps mean that a filtered list is not a list anymore as validated by array_is_list(). For example: $originalList = ['first', '', 'last']; $filteredList = array_filter($originalList); var_export(filteredList); // array(0 => 'first', 2 => 'last') var_export(array_is_list($originalList)); // true var_export(array_is_list($filteredList)); // false The behavior is counterintuitive and can lead to subtle bugs, such as encoding issues: echo json_encode($originalList); // ["first", "", "last"] echo json_encode($filteredList); // {"0": "first", "2": "last"} The workaround is to post-process the filtered array with array_values() to reset the indexes. The proposal is to introduce a function array_filter_list() that would work solely on lists. It will have the same signature as array_filter() and will always return a valid list. See a draft RFC with more details here: https://dev.to/sshymko/php-rfc-arrayfilterlist-function-35mb-temp-slug-7074000?preview=21d6760126a02464b0511498bbb95749150afb17a7ff6377c458ee54a8f57cfe00d4e258aa06bad3232c0dd9e73a2d62138fc990048987e9e2339a3d I just registered a wiki account "sshymko" with the intention of submitting the RFC. Could someone please approve the account and give it some karma? Looking forward to collaborating with the internals team! 🙂 I don't want to shoot this down too early, but: 1. why in the language, when a simple userland function suffices? 2. what's wrong with writing `array_values(array_filter(...))`? Marco Pivetta https://twitter.com/Ocramius https://ocramius.github.io/ I do understand the aversion to adding more functions to PHP core 🙂 I think that's rather simple from the discussions I've seen here in the past years: C code is harder to write, maintain and understand while PHP code is simpler and is also not bound to php versioning so it can evolve faster. Of course, given that the performance is similar. When the performance is greatly better when having the function native, it makes more sense, like it was the case with array_is_list(), where the information exists as a flag internally almost always. I would suggest you check https://github.com/azjezz/psl/blob/next/src/Psl/Vec/filter.php#L34 and maybe use the library for other features as well. Or you can actually do the implementation and show also some numbers regarding the performance to have a better chance of having the rfc pass. That's what I observed it helps. Regards, Alex Let me try to articulate the answers to your questions: 1. A userland function is enough, it's almost always possible to create a userland function. However, pretty much every single PHP project would need it and use pretty often. It would be great to standardize the function name, especially now that it compliments array_is_list(). The same userland argument can be made for array_is_list(), yet people are appreciating the function. The reasoning for adding array_filter_list() to the core is similar to array_is_list(). 2. The array_values(array_filter()) workaroud is usable and is indeed used on the projects I'm involved with. It's a relatively short one-liner allowing to not go to the extent of wrapping it in a userland function. Some of the disadvantages include having to always mentally translate it in your head to "filter list". The worst thing is that 9/10 times developers (me included) would overlook this nuanced behavior. Logically, people assume that a subset of a list is a list, which unfortunately isn't always the case. They'd forget to implement array_values() and cause bugs that are not immediately detectable. At least 3-4 of projects in my recent memory experienced this issue, it got overlooked at code review. Even automated tests did not catch it as the issue remains hidden when filtering out the tail end of a list. After brainstorming with developers, we concluded formalizing the concept of a "list" more would help. Having functions array_is_list() and array_filter_list() would make you think in terms of lists more. One would use array_filter_list() instead of array_filter() by default unless they work with assoc arrays. Maybe there could be an even more advanced approach: formalizing the list data type. But it would be an overkill to propose a major change like this at this time. Regards, Sergii Thanks for your feedback, Alex. Performance of array_filter_list() will be identical to array_filter() without the overhead of array_values(). Not sure that would be significant enough of a difference to be comparable with array_is_list(). I may implement this just to practice C and will be able to provide exact measurements then.