On Fri, Oct 24, 2025, at 21:34, Jason Marble wrote:
> Hello everybody!
> 
> I'd like to open a discussion regarding the behavior of `array_unique()` with 
> the `SORT_REGULAR` flag when used on arrays containing mixed types.
> 
> Currently, `SORT_REGULAR` uses non-strict comparisons, which can lead to 
> unintentional data loss when values like `100` and `"100"` are treated as 
> duplicates. This forces developers to implement user-land workarounds.
> 
> Here is a common scenario where this behavior is problematic:
> 
> ```php
> $events = [
>     ['id' => 100, 'type' => 'user.login'],        // User event (int)
>     ['id' => "100", 'type' => 'system.migration'],  // System event (string)
>     ['id' => 100, 'type' => 'user.login'],        // Duplicate user event
> ];
> 
> $event_ids = array_column($events, 'id'); // [100, "100", 100]
> 
> // Current behavior with SORT_REGULAR
> $unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
> // The string "100" is lost due to type coercion.
> ```
> 
> To address this, I propose adding a new flag, `SORT_STRICT`, which would use 
> strict (`===`) comparisons to differentiate between values of different types.
> 
> With the new flag, the result would be:
> 
> ```php
> // Proposed behavior with SORT_STRICT
> $unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
> // Both integer and string values are preserved.
> ```
> 
> I've already submitted a PR to correct the bug I just highlighted:
> PR: https://github.com/php/php-src/pull/20273
> 
> The potential for a `SORT_NATURAL` flag also came to mind as another useful 
> addition, but I believe `SORT_STRICT` is the more critical feature to discuss 
> first.
> 
> I look forward to your feedback.
> 
> Thanks,  
> - Jason

Hi Jason,

Other than the bytes in memory and how they’re laid out, I fail to see how 100 
is different from "100". They’re conceptually identical, and array_* functions 
generally behave by value, not by identity. I think it’s probably wise to take 
a step back here and evaluate the knock-on effects of something like this:

SORT_REGULAR has some warts; it isn’t perfect. Having a SORT_STRICT sounds 
kinda nice until you start thinking about it a bit. This parameter has 
traditionally been used to indicate a "comparison mode" that describes how to 
compare values. Strict identity is on a completely different axis: it has no 
notion of less/greater than; objects aren’t *strictly* comparable, but they’re 
loosely comparable; and 1.0 is not strictly equal to 1 or "1", even though it 
is loosely equal to both. Further, it raises the question: "can I get a 
SORT_STRICT_NUMERIC" or "can I get a SORT_STRICT_STRING", which further 
indicates this is a completely different axis altogether than "just" a 
different comparison mode.
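
To make the "different axis" point concrete, here is a small sketch of my own 
(not taken from the PR, and assuming PHP 8 comparison semantics). Loose 
comparison gives you an ordering via `<=>`, while `===` only ever answers 
yes or no:

```php
<?php
// Loose comparison treats int 100 and string "100" as equal and orderable.
$loose_equal  = (100 == "100");   // true
$loose_order  = (100 <=> "100");  // 0, i.e. neither is less nor greater
// Strict identity only answers yes/no and has no ordering counterpart.
$strict_equal = (100 === "100");  // false

// Floats and ints: loosely equal, never strictly identical.
$float_loose  = (1.0 == 1);       // true
$float_strict = (1.0 === 1);      // false

// Two distinct objects with equal properties: equal by value, not by identity.
$a = new stdClass();
$a->id = 1;
$b = new stdClass();
$b->id = 1;
$objects_loose  = ($a == $b);     // true: same class, equal properties
$objects_strict = ($a === $b);    // false: different instances

var_dump($loose_equal, $loose_order, $strict_equal);
var_dump($float_loose, $float_strict, $objects_loose, $objects_strict);
```

A sort-based dedupe needs the ordering; `===` on its own has nothing to sort by.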

As to your example, it conflates two namespaces of IDs (user IDs and system 
IDs) into a single untyped bag, then asks array_unique() to preserve that 
boundary. This is a domain distinction, not a language problem. Simply dropping 
the array_column() step and running array_unique() on the full event arrays 
gives you the result you want.
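
As a rough sketch of what I mean (reusing the `$events` array from your 
message, and assuming current PHP 8 behavior), dedupe the whole rows first and 
only then pull out the ids:

```php
<?php
$events = [
    ['id' => 100, 'type' => 'user.login'],          // User event (int)
    ['id' => "100", 'type' => 'system.migration'],  // System event (string)
    ['id' => 100, 'type' => 'user.login'],          // Duplicate user event
];

// Compare full rows: the 'type' field keeps the two namespaces apart,
// so only the genuine duplicate (the third row) is dropped.
$unique_events = array_unique($events, SORT_REGULAR);

// Extract the ids afterwards; both the int and the string survive.
$unique_ids = array_column($unique_events, 'id');

var_dump($unique_ids);
```

Here the rows themselves carry the distinction, so no new flag is involved.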

— Rob
