ID: 48322
Updated by: [email protected]
Reported By: netspy at me dot com
Status: Wont fix
Bug Type: *Unicode Issues
Operating System: Mac OS X
PHP Version: 5.2.9
New Comment:
The code for this function is just:
RETURN_LONG(strcoll((const char *) Z_STRVAL_PP(s1),
(const char *) Z_STRVAL_PP(s2)));
We use the underlying system strcoll function. There is nothing for us
to fix here. If your system's strcoll function is broken, you are out
of luck. OS X has a long history of buggy C99 functions, and it wouldn't
surprise me if its strcoll function doesn't handle UTF-8 locales
correctly. But that still isn't something we can fix, short of adding an
OS-specific hack here, which we try to avoid.
Previous Comments:
------------------------------------------------------------------------
[2009-05-19 14:03:13] netspy at me dot com
What is your result on Linux? Did you save the test file with UTF-8
encoding?
Because strcoll is basically a C function, I can't see why it is a PHP
Unicode issue and why you closed the bug as Wont fix.
------------------------------------------------------------------------
[2009-05-19 12:58:18] [email protected]
I get the wrong order on Linux. Did you mix up the results there?
Anyway, this really is a problem in Unicode support. To get _really_
working stuff, use the intl extension or wait for PHP 6. Wont fix.
------------------------------------------------------------------------
[2009-05-19 12:35:29] netspy at me dot com
On Linux strcoll works fine; I only get a wrong order on Mac OS X
(BSD). I also tested it with an ISO 8859-1 string and the locale
de_DE.ISO8859-1. The result is the same: correct on Linux, wrong on
Mac OS X. So I think it's not a Unicode issue!
Here is another test code:
$string_utf = "abcdefghijklmnopqrstuvwxyzäöüß";
$string_iso = utf8_decode($string_utf);
$array_utf = array();
$array_iso = array();
for ($i=0; $i<mb_strlen($string_utf, 'UTF-8'); $i++) {
    $array_utf[]=mb_substr($string_utf, $i, 1, 'UTF-8');
    $array_iso[]=substr($string_iso, $i, 1);
}
print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.UTF-8'));
usort($array_utf, 'strcoll');
print("\n" . implode('', $array_utf) . "\n");
print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.ISO8859-1'));
usort($array_iso, 'strcoll');
print("\n" . utf8_encode(implode('', $array_iso)) . "\n");
The result on Mac OS X:
Locale: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü
Locale: de_DE.ISO8859-1
abcdefghijklmnopqrstuvwxyzßäöü
And the Linux result:
Locale: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz
Locale: de_DE.ISO8859-1
aäbcdefghijklmnoöpqrsßtuüvwxyz
------------------------------------------------------------------------
[2009-05-19 10:50:59] [email protected]
It doesn't work on any system below PHP 6. You can always use the intl
extension from PECL while waiting for proper Unicode support:
http://pecl.php.net/intl
Using the collator (http://php.net/collator) you can achieve sorting
with any locale.
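For illustration, the collator suggestion might look roughly like the
sketch below. It assumes the intl and mbstring extensions are loaded and
that the script is saved as UTF-8. The helper name
sort_chars_with_collator() is made up for this example and is not part
of PHP or of the intl extension. The string is the one from the
reporter's test script, so the output can be compared directly with the
expected result given in the report.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits a UTF-8 string into
// single characters and sorts them with ICU collation from the intl
// extension instead of the C library's strcoll().
// Assumes intl and mbstring are loaded and this file is saved as UTF-8.
function sort_chars_with_collator($string, $locale)
{
    $chars = array();
    $length = mb_strlen($string, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        $chars[] = mb_substr($string, $i, 1, 'UTF-8');
    }

    // Collator takes an ICU locale ID such as "de_DE". No ".UTF-8"
    // suffix is used, and setlocale() is not involved at all.
    $collator = new Collator($locale);
    $collator->sort($chars); // sorts the array in place

    return implode('', $chars);
}

print(sort_chars_with_collator("abcdefghijklmnopqrstuvwxyzäöüß", 'de_DE') . "\n");
```

Because the comparison is done by ICU rather than by the operating
system's strcoll, the ordering should not depend on whether the script
runs on Linux or Mac OS X.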
------------------------------------------------------------------------
[2009-05-18 22:37:22] netspy at me dot com
Description:
------------
strcoll() does not sort UTF-8 strings correctly on Mac OS X.
Reproduce code:
---------------
$locale = 'de_DE.UTF-8';
$string = "abcdefghijklmnopqrstuvwxyzäöüß";
$array = array();
for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) {
    $array[]=mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale = setlocale(LC_COLLATE, "0");
print("\nOld: $oldLocale New: ");
print(setlocale(LC_COLLATE, $locale));
usort($array, 'strcoll');
setlocale(LC_COLLATE, $oldLocale);
print("\n" . implode('', $array) . "\n");
Expected result:
----------------
Old: C New: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz
Actual result:
--------------
Old: C New: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü
------------------------------------------------------------------------