Hello everybody!
I'd like to open a discussion regarding the behavior of `array_unique()`
with the `SORT_REGULAR` flag when used on arrays containing mixed types.
Currently, `SORT_REGULAR` uses loose (`==`-style) comparisons, which can lead to
unintentional data loss when values like `100` and `"100"` are treated as
duplicates. This forces developers to implement user-land workarounds.
Here is a common scenario where this behavior is problematic:
```php
$events = [
    ['id' => 100, 'type' => 'user.login'],         // User event (int)
    ['id' => "100", 'type' => 'system.migration'], // System event (string)
    ['id' => 100, 'type' => 'user.login'],         // Duplicate user event
];
$event_ids = array_column($events, 'id'); // [100, "100", 100]
// Current behavior with SORT_REGULAR
$unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
// The string "100" is lost due to type coercion.
```
To address this, I propose adding a new flag, `SORT_STRICT`, which would
use strict (`===`) comparisons to differentiate between values of different
types.
With the new flag, the result would be:
```php
// Proposed behavior with SORT_STRICT
$unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
// Both integer and string values are preserved.
```
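For comparison, here is a minimal sketch of the kind of user-land workaround people write today. The helper name `array_unique_strict()` is hypothetical (it is not a built-in and not part of the PR); it relies only on `in_array()` with its strict argument set to `true`, which compares with `===`.

```php
// Hypothetical user-land helper, not a PHP built-in.
// Keeps the first occurrence of each value, compared with ===,
// and preserves the original keys like array_unique() does.
function array_unique_strict(array $values): array
{
    $unique = [];
    foreach ($values as $key => $value) {
        // Third argument true makes in_array() use strict comparison.
        if (!in_array($value, $unique, true)) {
            $unique[$key] = $value;
        }
    }
    return $unique;
}

$event_ids = [100, "100", 100];
var_dump(array_unique_strict($event_ids)); // keeps int 100 and string "100"
```

This works, but it is quadratic in the number of elements, which is part of why a native flag seems preferable to me.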
I've already submitted a PR to address the behavior highlighted above:
PR: https://github.com/php/php-src/pull/20273
The potential for a `SORT_NATURAL` flag also came to mind as another useful
addition, but I believe `SORT_STRICT` is the more critical feature to
discuss first.
I look forward to your feedback.
Thanks,
- Jason