Dmitry Stogov wrote on 21/10/2014 10:01:
> The "right" approach would be extending zend_string with "encoding" and
> then adapting nearly all functions working with zend_string to take
> "encoding" into account. But, of course, this is going to lead to a much
> more complicated solution (with some slowdown).

Isn't that kind of what ext/mbstring does?

I think that treating Unicode as nothing more than an encoding, and
trying to hide all its complexity from the user, is not particularly
wise. Unicode isn't just "ASCII, but bigger", so keeping the same API
but making the implementation "work" with more characters isn't really
"Unicode support".

For instance, what does "allowing Unicode strings as array keys"
actually mean? We already allow pretty much any sequence of bytes as an
array key, so what we're actually talking about is that array-handling
functions should somehow be "Unicode aware". For sorting functions, that
means providing a mechanism for selecting a collation, which is needed
even when you know how the strings are encoded.

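A rough sketch of the array-key and sorting point, assuming the strings
are UTF-8. The variable names are made up for illustration, and the last
part assumes ext/intl is loaded (it is skipped otherwise):

```php
<?php
// Two spellings of "é": precomposed U+00E9, and "e" followed by the
// combining acute accent U+0301. Unicode treats them as canonically
// equivalent, but their UTF-8 byte sequences differ.
$precomposed = "\xC3\xA9";
$decomposed  = "e\xCC\x81";

// Array keys are compared as byte sequences, so these are two keys.
$counts = [];
$counts[$precomposed] = 1;
$counts[$decomposed]  = 2;
var_dump(count($counts)); // int(2)

// A byte-wise sort places every non-ASCII letter after "z".
$words = ["zebra", "\xC3\xA9clair", "apple"];
$byteSorted = $words;
sort($byteSorted, SORT_STRING); // apple, zebra, éclair

// Assumption: ext/intl is available. Knowing the encoding is not
// enough here; a locale (and so a collation) still has to be chosen.
if (class_exists('Normalizer') && class_exists('Collator')) {
    $nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    var_dump($nfc === $precomposed); // bool(true)

    $collated = $words;
    $collator = new Collator('en_US');
    $collator->sort($collated); // apple, éclair, zebra
}
```

Whether array keys should be normalised, and which collation a sort
should use, are decisions the encoding alone cannot make.
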
There are a handful of operations which have an obvious meaning under
Unicode - strtoupper(), for instance. It might be nice if those worked
transparently with UStrings, but I don't think that really constitutes
"complete Unicode support" either.

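To make the strtoupper() case concrete, here is a minimal sketch
comparing the byte-oriented function with its ext/mbstring counterpart.
It assumes ext/mbstring is loaded and that the strings are UTF-8; the
variable names are illustrative only:

```php
<?php
// Assumption: ext/mbstring is loaded and the input is UTF-8.
$word = "caf\xC3\xA9"; // "café"

// strtoupper() works on bytes. Depending on PHP version and locale,
// the two bytes of "é" are either left alone or mapped using the
// locale's single-byte rules, so no result is claimed for it here.
$byteUpper = strtoupper($word);

// mb_strtoupper() is told the encoding and maps whole characters.
$charUpper = mb_strtoupper($word, 'UTF-8');
var_dump($charUpper === "CAF\xC3\x89"); // bool(true), i.e. "CAFÉ"

// Even this "obvious" operation has corners: "ß" has no one-character
// uppercase form in the simple case mappings. Whether the result is
// "STRASSE" or keeps "ß" depends on the mbstring version in use.
$street = mb_strtoupper("stra\xC3\x9Fe", 'UTF-8');
```

Making strtoupper() behave like the second call would cover this one
function, which is still a long way from the language as a whole
understanding Unicode.
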
I think we're going to keep going round in circles unless we can really
pin down what it means for a language to "support Unicode".
--
Rowan Collins
[IMSoP]