ID: 48322
User updated by: netspy at me dot com
Reported By: netspy at me dot com
-Status: Closed
+Status: Open
Bug Type: *Unicode Issues
Operating System: Mac OS X
PHP Version: 5.2.9
New Comment:
On Linux strcoll() works fine; I get the wrong order only on Mac OS X
(BSD). I also tested it with an ISO 8859-1 string and the locale
de_DE.ISO8859-1. The result is the same: correct on Linux, wrong on
Mac OS X. So I think it's not a Unicode issue!
Here is another test script:
$string_utf = "abcdefghijklmnopqrstuvwxyzäöüß";
$string_iso = utf8_decode($string_utf);
$array_utf = array(); $array_iso = array();
for ($i = 0; $i < mb_strlen($string_utf, 'UTF-8'); $i++) {
    $array_utf[] = mb_substr($string_utf, $i, 1, 'UTF-8');
    $array_iso[] = substr($string_iso, $i, 1);
}
print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.UTF-8'));
usort($array_utf, 'strcoll');
print("\n" . implode('', $array_utf) . "\n");
print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.ISO8859-1'));
usort($array_iso, 'strcoll');
print("\n" . utf8_encode(implode('', $array_iso)) . "\n");
The result on Mac OS X:
Locale: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü
Locale: de_DE.ISO8859-1
abcdefghijklmnopqrstuvwxyzßäöü
And the Linux result:
Locale: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz
Locale: de_DE.ISO8859-1
aäbcdefghijklmnoöpqrsßtuüvwxyz
Previous Comments:
------------------------------------------------------------------------
[2009-05-19 10:50:59] [email protected]
It doesn't work on any system with a PHP version below 6. While
waiting for proper Unicode support, you can use the intl extension
from PECL:
http://pecl.php.net/intl
Using the Collator class (http://php.net/collator) you can sort
according to the rules of any locale.
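A minimal sketch of the Collator approach suggested above, assuming the
intl extension is installed. The helper name sortCharacters() is
hypothetical and only used for illustration; it is not part of the
original report or of the intl API.

```php
<?php
// Sketch only: assumes ext/intl (PECL intl) is available.
// sortCharacters() is a hypothetical helper, not part of PHP or intl.
function sortCharacters($string, $locale)
{
    // Split the UTF-8 string into single characters.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    // The Collator uses ICU's collation rules for the given locale
    // instead of the C library's strcoll().
    $collator = new Collator($locale);
    $collator->sort($chars);

    return implode('', $chars);
}

print(sortCharacters("abcdefghijklmnopqrstuvwxyzäöüß", 'de_DE') . "\n");
```

Because the ordering comes from ICU rather than from the operating
system's locale data, the result should not depend on whether the
script runs on Linux or Mac OS X. To keep the usort() style of the test
scripts, array($collator, 'compare') can be passed as the callback
instead of 'strcoll'.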
------------------------------------------------------------------------
[2009-05-18 22:37:22] netspy at me dot com
Description:
------------
strcoll() does not sort UTF-8 strings correctly on Mac OS X.
Reproduce code:
---------------
$locale = 'de_DE.UTF-8';
$string = "abcdefghijklmnopqrstuvwxyzäöüß";
$array = array();
for ($i = 0; $i < mb_strlen($string, 'UTF-8'); $i++) {
    $array[] = mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale = setlocale(LC_COLLATE, "0");
print("\nOld: $oldLocale New: ");
print(setlocale(LC_COLLATE, $locale));
usort($array, 'strcoll');
setlocale(LC_COLLATE, $oldLocale);
print("\n" . implode('', $array) . "\n");
Expected result:
----------------
Old: C New: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz
Actual result:
--------------
Old: C New: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü
------------------------------------------------------------------------