On Sat, Oct 25, 2025, at 10:23, Rob Landers wrote:
> On Fri, Oct 24, 2025, at 21:34, Jason Marble wrote:
>> Hello everybody!
>> 
>> I'd like to open a discussion regarding the behavior of `array_unique()` 
>> with the `SORT_REGULAR` flag when used on arrays containing mixed types.
>> 
>> Currently, `SORT_REGULAR` uses non-strict comparisons, which can lead to 
>> unintentional data loss when values like `100` and `"100"` are treated as 
>> duplicates. This forces developers to implement user-land workarounds.
>> 
>> Here is a common scenario where this behavior is problematic:
>> 
>> ```php
>> $events = [
>>     ['id' => 100, 'type' => 'user.login'],        // User event (int)
>>     ['id' => "100", 'type' => 'system.migration'],  // System event (string)
>>     ['id' => 100, 'type' => 'user.login'],        // Duplicate user event
>> ];
>> 
>> $event_ids = array_column($events, 'id'); // [100, "100", 100]
>> 
>> // Current behavior with SORT_REGULAR
>> $unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
>> // The string "100" is lost due to type coercion.
>> ```
>> 
>> To address this, I propose adding a new flag, `SORT_STRICT`, which would use 
>> strict (`===`) comparisons to differentiate between values of different 
>> types.
>> 
>> With the new flag, the result would be:
>> 
>> ```php
>> // Proposed behavior with SORT_STRICT
>> $unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
>> // Both integer and string values are preserved.
>> ```
>> 
>> I've already submitted a PR to correct the bug I just highlighted:
>> PR: https://github.com/php/php-src/pull/20273
>> 
>> The potential for a `SORT_NATURAL` flag also came to mind as another useful 
>> addition, but I believe `SORT_STRICT` is the more critical feature to 
>> discuss first.
>> 
>> I look forward to your feedback.
>> 
>> Thanks,  
>> - Jason
> 
> Hi Jason,
> 
> Other than the bytes in memory and how they’re laid out, I fail to see how 
> 100 is different from "100". They’re conceptually identical, and array_* 
> functions generally behave by value, not by identity. I think it’s probably 
> wise to take a step back here and evaluate the knock-on effects of something 
> like this:
> 
> SORT_REGULAR has some warts; it isn’t perfect. Having a SORT_STRICT sounds 
> kinda nice until you start thinking about it a bit. This parameter has 
> traditionally been used to indicate a "comparison mode" that describes how to 
> compare values. Strict identity is on a completely different axis (they can’t 
> be less/greater than; objects aren’t *strictly* comparable, but they’re 
> loosely comparable, 1.0 is strictly comparable to 1 or "1"). Further, it raises 
> the question: "can I get a SORT_STRICT_NUMERIC" or "can I get a 
> SORT_STRICT_STRING", which further indicates this is a completely different 
> axis altogether than "just" a different comparison mode.
> 
> As to your example, it conflates two namespaces of Ids — user ids and system 
> ids — into a single untyped bag, then asks array_unique() to preserve that 
> boundary. This is a domain distinction, not a language problem. Simply 
> removing your array_column() step in your example arrives at your desired 
> solution.
> 
> — Rob

I mis-typed this:

> they can’t be less/greater than; objects aren’t *strictly* comparable, but 
> they’re loosely comparable, 1.0 is strictly comparable to 1 or "1"

It should have read:

> they can’t be less/greater than; objects aren’t *strictly* comparable, but 
> they’re loosely comparable, 1.0 is *not* strictly comparable to 1 or "1"
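
To illustrate the corrected statement, here is a minimal sketch using only the built-in `==` and `===` operators. The variable names are made up for the example:

```php
<?php
// Loose (==) versus strict (===) comparison across scalar types.
$looseIntFloat   = (1.0 == 1);    // true: compared by numeric value
$strictIntFloat  = (1.0 === 1);   // false: float versus int
$looseIntString  = (1 == "1");    // true: the numeric string is coerced
$strictIntString = (1 === "1");   // false: int versus string

// Two distinct objects of the same class with equal properties.
$a = new stdClass();
$a->id = 100;
$b = new stdClass();
$b->id = 100;

$looseObjects  = ($a == $b);      // true: same class, equal properties
$strictObjects = ($a === $b);     // false: not the same instance

var_dump($looseIntFloat, $strictIntFloat, $looseIntString, $strictIntString);
var_dump($looseObjects, $strictObjects);
```

Note that `===` only answers "identical or not"; it gives no less-than or greater-than result, which is why it sits on a different axis from the existing sort flags.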

PS. Speaking of "bytes in memory", it might be better to propose a SORT_BINARY. 
It has the same effect you’re looking for, but arrays of bytes have a 
lexicographical ordering.
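
As a rough user-land sketch of that idea (not the proposed flag itself, which does not exist): `unique_by_bytes()` below is a hypothetical helper that assumes `serialize()` is an acceptable stand-in for "the bytes" of a value. It would not work for values that cannot be serialized, such as closures.

```php
<?php
/**
 * Hypothetical helper: keep the first occurrence of each value, where two
 * values count as duplicates only if their serialized bytes are equal.
 * Original keys are preserved, as array_unique() does.
 */
function unique_by_bytes(array $values): array
{
    $seen = [];
    $result = [];
    foreach ($values as $key => $value) {
        $bytes = serialize($value);   // e.g. 'i:100;' for 100, 's:3:"100";' for "100"
        if (!isset($seen[$bytes])) {
            $seen[$bytes] = true;
            $result[$key] = $value;
        }
    }
    return $result;
}

$event_ids = [100, "100", 100];
$unique = unique_by_bytes($event_ids);   // [0 => 100, 1 => "100"]

// Byte strings also have a total lexicographical order, so they can be sorted.
$ordered = array_map('serialize', $unique);
usort($ordered, 'strcmp');               // ['i:100;', 's:3:"100";']

var_dump($unique, $ordered);
```

The integer and the string survive as separate entries, and the byte representations can still be ordered with `strcmp()`.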

— Rob
