On Fri, Oct 24, 2025, at 21:34, Jason Marble wrote:
> Hello everybody!
>
> I'd like to open a discussion regarding the behavior of `array_unique()` with
> the `SORT_REGULAR` flag when used on arrays containing mixed types.
>
> Currently, `SORT_REGULAR` uses non-strict comparisons, which can lead to
> unintentional data loss when values like `100` and `"100"` are treated as
> duplicates. This forces developers to implement user-land workarounds.
>
> Here is a common scenario where this behavior is problematic:
>
> ```php
> $events = [
>     ['id' => 100, 'type' => 'user.login'], // User event (int)
>     ['id' => "100", 'type' => 'system.migration'], // System event (string)
>     ['id' => 100, 'type' => 'user.login'], // Duplicate user event
> ];
>
> $event_ids = array_column($events, 'id'); // [100, "100", 100]
>
> // Current behavior with SORT_REGULAR
> $unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
> // The string "100" is lost due to type coercion.
> ```
>
> To address this, I propose adding a new flag, `SORT_STRICT`, which would use
> strict (`===`) comparisons to differentiate between values of different types.
>
> With the new flag, the result would be:
>
> ```php
> // Proposed behavior with SORT_STRICT
> $unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
> // Both integer and string values are preserved.
> ```
>
> I've already submitted a PR to correct the bug I just highlighted:
> PR: https://github.com/php/php-src/pull/20273
> The potential for a `SORT_NATURAL` flag also came to mind as another useful
> addition, but I believe `SORT_STRICT` is the more critical feature to discuss
> first.
>
> I look forward to your feedback.
>
> Thanks,
> - Jason
Hi Jason,

Other than the bytes in memory and how they’re laid out, I fail to see how 100 is different from "100". They’re conceptually identical, and array_* functions generally behave by value, not by identity.

I think it’s probably wise to take a step back here and evaluate the knock-on effects of something like this. SORT_REGULAR has some warts; it isn’t perfect. Having a SORT_STRICT sounds kinda nice until you start thinking about it a bit.

This parameter has traditionally been used to indicate a "comparison mode" that describes how to compare values. Strict identity is on a completely different axis. Identity only tells you whether two values are the same and has no notion of less-than or greater-than. Objects aren’t *strictly* comparable, but they are loosely comparable. Similarly, 1.0 is loosely equal to 1 or "1" but not identical to either. It also raises the question "can I get a SORT_STRICT_NUMERIC?" or "can I get a SORT_STRICT_STRING?", which suggests even more strongly that this is a separate axis altogether rather than "just" another comparison mode.

As to your example, it conflates two namespaces of IDs (user IDs and system IDs) into a single untyped bag, then asks array_unique() to preserve that boundary. This is a domain distinction, not a language problem. Simply removing the array_column() step from your example arrives at your desired solution.

- Rob
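The sketch below illustrates the suggestion above. It reuses the `$events` data from the quoted message and assumes PHP 8. It deduplicates the full event rows with `SORT_REGULAR` first and extracts the IDs afterward. Rows compare equal only when every element compares equal, so the differing `type` keeps the user and system events apart. For contrast, `array_unique_strict()` is a hypothetical user-land helper, not part of PHP, of the kind the original message alludes to. It relies on `in_array()` with its third `$strict` argument set to `true`, which compares with `===`.

```php
<?php
// Sample data copied from the quoted message.
$events = [
    ['id' => 100, 'type' => 'user.login'],         // User event (int)
    ['id' => "100", 'type' => 'system.migration'], // System event (string)
    ['id' => 100, 'type' => 'user.login'],         // Duplicate user event
];

// Deduplicate whole rows first. Row 2 loosely equals row 0 and is dropped;
// rows 0 and 1 differ in 'type', so both survive.
$unique_events = array_unique($events, SORT_REGULAR);

// Extract the IDs only after deduplication: $unique_ids is [100, "100"].
$unique_ids = array_column($unique_events, 'id');

// Hypothetical user-land helper: keeps the first occurrence of each value,
// comparing with === via in_array(..., true). Keys are preserved, as in
// array_unique(). This is O(n^2), which is fine for small arrays.
function array_unique_strict(array $values): array
{
    $result = [];
    foreach ($values as $key => $value) {
        if (!in_array($value, $result, true)) {
            $result[$key] = $value;
        }
    }
    return $result;
}

$strict_ids = array_unique_strict(array_column($events, 'id'));
```

The first approach keeps the type boundary in the data (the `type` field) rather than in the comparison mode. The helper shows what strict deduplication of a flat list looks like without a new flag.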
