On Sat, Oct 25, 2025, at 10:23, Rob Landers wrote:
> On Fri, Oct 24, 2025, at 21:34, Jason Marble wrote:
>> Hello everybody!
>>
>> I'd like to open a discussion regarding the behavior of `array_unique()`
>> with the `SORT_REGULAR` flag when used on arrays containing mixed types.
>>
>> Currently, `SORT_REGULAR` uses non-strict comparisons, which can lead to
>> unintentional data loss when values like `100` and `"100"` are treated as
>> duplicates. This forces developers to implement user-land workarounds.
>>
>> Here is a common scenario where this behavior is problematic:
>>
>> ```php
>> $events = [
>>     ['id' => 100, 'type' => 'user.login'],         // User event (int)
>>     ['id' => "100", 'type' => 'system.migration'], // System event (string)
>>     ['id' => 100, 'type' => 'user.login'],         // Duplicate user event
>> ];
>>
>> $event_ids = array_column($events, 'id'); // [100, "100", 100]
>>
>> // Current behavior with SORT_REGULAR
>> $unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
>> // The string "100" is lost due to type coercion.
>> ```
>>
>> To address this, I propose adding a new flag, `SORT_STRICT`, which would use
>> strict (`===`) comparisons to differentiate between values of different
>> types.
>>
>> With the new flag, the result would be:
>>
>> ```php
>> // Proposed behavior with SORT_STRICT
>> $unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
>> // Both integer and string values are preserved.
>> ```
>>
>> I've already submitted a PR to correct the bug I just highlighted:
>> PR: https://github.com/php/php-src/pull/20273
>> The potential for a `SORT_NATURAL` flag also came to mind as another useful
>> addition, but I believe `SORT_STRICT` is the more critical feature to
>> discuss first.
>>
>> I look forward to your feedback.
>>
>> Thanks,
>> - Jason
>
> Hi Jason,
>
> Other than the bytes in memory and how they’re laid out, I fail to see how
> 100 is different from 100. They’re conceptually identical, and array_*
> functions generally behave by value, not by identity. I think it’s probably
> wise to take a step back here and evaluate the knock-on effects of something
> like this:
>
> SORT_REGULAR has some warts, it isn’t perfect. Having a SORT_STRICT sounds
> kinda nice until you start thinking about it a bit. This parameter has
> traditionally been used to indicate a "comparison mode" that describes how to
> compare values. Strict identity is on a completely different axis (they can’t
> be less/greater than; objects aren’t *strictly* comparable, but they’re
> loosely comparable, 1.0 is strictly comparable to 1 or "1"). Further, it begs
> the question: "can I get a SORT_STRICT_NUMERIC" or "can I get a
> SORT_STRICT_STRING", which further indicates this is a completely different
> axis altogether than "just" a different comparison mode.
>
> As to your example, it conflates two namespaces of Ids (user ids and system
> ids) into a single untyped bag, then asks array_unique() to preserve that
> boundary. This is a domain distinction, not a language problem. Simply
> removing your array_column() step in your example arrives at your desired
> solution.
>
> - Rob

I mis-typed this:

> they can’t be less/greater than; objects aren’t *strictly* comparable, but
> they’re loosely comparable, 1.0 is strictly comparable to 1 or "1"

It should have read:

> they can’t be less/greater than; objects aren’t *strictly* comparable, but
> they’re loosely comparable, 1.0 is *not* strictly comparable to 1 or "1"

PS. Speaking of "bytes in memory", it might be better to propose a SORT_BINARY.
It has the same effect you’re looking for, but arrays of bytes have a
lexicographical ordering.

- Rob
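
For anyone following along, here is a minimal sketch of the three points made in this thread, using the `$events` data from the original mail: the loose vs. strict comparison of `1.0`, `1` and `"1"`, deduplicating the rows without the `array_column()` step, and the kind of user-land workaround the proposal mentions. The helper name `array_unique_strict()` is hypothetical and is not part of PHP or of the linked PR. It is a quadratic-time illustration, not a suggested implementation.

```php
<?php
declare(strict_types=1);

// Sample data copied from the original proposal.
$events = [
    ['id' => 100,   'type' => 'user.login'],       // User event (int)
    ['id' => "100", 'type' => 'system.migration'], // System event (string)
    ['id' => 100,   'type' => 'user.login'],       // Duplicate user event
];

// 1. Loose equality treats these values as the same; strict identity does not.
$looseFloatInt  = (1.0 == 1);   // true
$strictFloatInt = (1.0 === 1);  // false: float vs. int
$strictIntStr   = (1 === "1");  // false: int vs. string

// 2. Deduplicate whole rows instead of the extracted id column.
//    The 'type' field keeps the user row and the system row apart,
//    so only the repeated user.login row is dropped.
$uniqueRows = array_unique($events, SORT_REGULAR);

// 3. Hypothetical user-land helper (an assumption for illustration only):
//    keeps the first occurrence of each value under strict (===) comparison
//    and preserves the original keys, as array_unique() does.
function array_unique_strict(array $values): array
{
    $unique = [];
    foreach ($values as $key => $value) {
        // Third argument `true` makes in_array() compare with ===.
        if (!in_array($value, $unique, true)) {
            $unique[$key] = $value;
        }
    }
    return $unique;
}

$uniqueIds = array_unique_strict(array_column($events, 'id'));

var_dump($looseFloatInt, $strictFloatInt, $strictIntStr);
var_dump(array_keys($uniqueRows)); // keys of the rows that survive
var_dump($uniqueIds);              // int 100 and string "100" both survive
```

Note that the helper only answers "same or not the same"; it defines no ordering, which is the "different axis" concern raised above about fitting strict identity into a sort flag.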
