This looks good. When the unicode_semantics switch is turned on, it
provides a Unicode-everywhere development solution to PHP developers.
The output, input, and script encodings are all UTF-8 by default.
The internal encoding is UTF-16, which allows developers to handle
surrogate pairs correctly.
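
To make the surrogate-pair point concrete, here is a minimal sketch in present-day PHP (7.0 or later, for the "\u{...}" escape). It is not part of the proposal; utf8_counts() is a hypothetical helper name, and it uses only strlen() and PCRE. It counts the same string three ways: UTF-8 bytes, code points, and UTF-16 code units.

```php
<?php
// Hypothetical helper: count a UTF-8 string three ways.
function utf8_counts(string $s): array
{
    // The /u modifier makes PCRE treat the subject as UTF-8,
    // so the empty pattern splits between code points.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $units = 0;
    foreach ($chars as $ch) {
        // A 4-byte UTF-8 sequence encodes a code point above U+FFFF,
        // which UTF-16 stores as a surrogate pair (two code units).
        $units += (strlen($ch) === 4) ? 2 : 1;
    }

    return [
        'bytes'       => strlen($s),
        'code_points' => count($chars),
        'utf16_units' => $units,
    ];
}

// "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF), outside the BMP.
$counts = utf8_counts("a\u{1D11E}");
// bytes = 5, code_points = 2, utf16_units = 3
```

A length function that counts UTF-16 code units would report 3 here, while one that handles surrogate pairs would report 2.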

I have one issue and one question.

"HTTP Input Encoding
...
If the HTTP request contains the encoding specification in the headers,
then it will be used instead of this setting."

To the best of my knowledge, there is no HTTP request header that
specifies the encoding of the request. If the intent is to honor
Accept-Charset, it may cause a problem because browsers don't
guarantee that the encoding in Accept-Charset is the same as the
encoding used to escape characters in the URL query string. After all,
Accept-Charset specifies the character encodings acceptable for
the response.
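
A small sketch of the ambiguity, assuming present-day PHP (7.0 or later). The same word percent-encodes to different query strings depending on the encoding the browser used, and nothing in the query string itself says which one it was. The looks_like_utf8() helper is hypothetical and only a heuristic: it can reject bytes that are not valid UTF-8, but it cannot prove the sender intended UTF-8.

```php
<?php
$utf8   = "caf\u{E9}"; // "café" as UTF-8 bytes:      63 61 66 C3 A9
$latin1 = "caf\xE9";   // "café" as ISO-8859-1 bytes: 63 61 66 E9

// Percent-encoding works on bytes, so the two forms differ.
$queryUtf8   = 'q=' . rawurlencode($utf8);   // q=caf%C3%A9
$queryLatin1 = 'q=' . rawurlencode($latin1); // q=caf%E9

// Hypothetical heuristic: does the byte string validate as UTF-8?
// preg_match() with /u returns false when the subject is not valid UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$decodedUtf8   = rawurldecode('caf%C3%A9');
$decodedLatin1 = rawurldecode('caf%E9');

$a = looks_like_utf8($decodedUtf8);   // true
$b = looks_like_utf8($decodedLatin1); // false
```

The server sees only the escaped bytes, so any choice of input encoding is a guess unless the application controls the page that produced the request.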


"Upgrading Existing Functions"

It seems that all the existing functions need to be upgraded to work
properly when the unicode_semantics switch is turned on, because it changes
the semantics of fundamental functions. I'm assuming the existing functions
won't work properly if fundamental functions such as strlen() behave
differently.
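
As an illustration of the concern (a sketch in present-day PHP, not code from any existing extension), consider a routine that writes a length-prefixed message. frame_message() and code_point_count() are hypothetical names. The routine is correct only while strlen() returns a byte count.

```php
<?php
// Hypothetical byte-oriented routine: 4-byte big-endian length prefix,
// followed by the payload bytes. It relies on strlen() counting bytes.
function frame_message(string $payload): string
{
    return pack('N', strlen($payload)) . $payload;
}

// Hypothetical helper: the number a character-counting strlen() would
// report for a UTF-8 string.
function code_point_count(string $utf8): int
{
    $n = preg_match_all('/./us', $utf8);
    if ($n === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}

$payload = "caf\u{E9}"; // 5 bytes, 4 code points

$frame    = frame_message($payload);
$declared = unpack('N', substr($frame, 0, 4))[1]; // 5, matches the bytes sent
$chars    = code_point_count($payload);           // 4
```

If strlen() returned 4 here, the prefix would understate the payload by one byte and the receiver would read a truncated message.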

Is there any way to keep byte semantics (as opposed to Unicode semantics)
only for the existing functions? For example, the Oracle 8 functions can be
configured to use UTF-8 for the character encoding of strings. In order for
them to work properly, the fundamental functions that the Oracle 8 functions
call have to behave with byte semantics. And if they work properly when the
unicode_semantics switch is turned on, then by setting runtime_encoding to
UTF-8 they can be called by Unicode applications.


Makoto

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
