I think we should make the following assumptions:
a) Being able to create and manipulate IS_UNICODE zvals when
unicode_semantics=off will be very useful to people, including for exposing
the ICU extension.
b) Defining Unicode identifiers (classes/properties/functions) when
unicode_semantics=off does not seem useful and should be prohibited.
c) People can always find ways of misusing the language and APIs to reach a
state they shouldn't be reaching. For example, assuming (a) and (b), someone
could use create_function() to trick the engine into creating a function
with a Unicode name while unicode=off.
I don't believe we can, or should, guard against every possible misuse, or
we'll bloat the code and still never reach perfection. That said, we
probably can enforce the obvious places where people try to define Unicode
classes/functions/properties when unicode_semantics=off.
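A minimal sketch of the kind of check described above, assuming a hypothetical tagged identifier type. `ident_type`, `ident`, and `ident_allowed()` are illustrative names, not Zend engine API, and the `unicode_semantics` parameter stands in for the real INI/global setting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag mirroring the IS_STRING / IS_UNICODE distinction. */
typedef enum { IDENT_STRING, IDENT_UNICODE } ident_type;

/* Hypothetical identifier: a type tag plus an opaque buffer and length. */
typedef struct {
    ident_type type;
    const void *data;
    size_t len;
} ident;

/* Returns false when a Unicode identifier would be defined while
 * unicode_semantics is off; every other combination is accepted. */
static bool ident_allowed(const ident *id, bool unicode_semantics)
{
    assert(id != NULL);
    if (id->type == IDENT_UNICODE && !unicode_semantics) {
        return false;
    }
    return true;
}
```

Calling a check like this only at the obvious definition points (class, function, and property declarations) matches the "enforce the obvious places" approach; indirect paths such as the create_function() case in (c) would slip through unless they also call it.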
BTW, I'm only referring to identifiers. If unicode=off, I believe things
like arrays should support IS_UNICODE keys/values in addition to IS_STRING,
for the reasons in (a). As per the original design, those two key types
wouldn't match each other, although they would in full-blown Unicode mode.
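To make the key-matching rule concrete, here is a sketch of an array-key comparison under both modes. The `array_key` type and `keys_match()` are hypothetical, and the byte-to-UTF-16 widening is an assumed Latin-1 stand-in for real codepage conversion. A real engine in Unicode mode would more likely convert string keys on insertion rather than at comparison time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key tag mirroring IS_STRING vs IS_UNICODE. */
typedef enum { KEY_STRING, KEY_UNICODE } key_type;

typedef struct {
    key_type type;
    size_t len;             /* length in units: bytes or UTF-16 code units */
    const char *str;        /* used when type == KEY_STRING */
    const uint16_t *ustr;   /* used when type == KEY_UNICODE */
} array_key;

/* With unicode_semantics off, a string key never matches a Unicode key,
 * even if both spell the same text. With it on, the string key is widened
 * byte by byte (Latin-1 assumption) and compared against the Unicode key. */
static bool keys_match(const array_key *a, const array_key *b, bool unicode_semantics)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len) {
        return false;
    }
    if (a->type == b->type) {
        for (size_t i = 0; i < a->len; i++) {
            if (a->type == KEY_STRING) {
                if (a->str[i] != b->str[i]) return false;
            } else {
                if (a->ustr[i] != b->ustr[i]) return false;
            }
        }
        return true;
    }
    if (!unicode_semantics) {
        return false;       /* mixed types never match when unicode=off */
    }
    const array_key *s = (a->type == KEY_STRING) ? a : b;
    const array_key *u = (a->type == KEY_STRING) ? b : a;
    for (size_t i = 0; i < s->len; i++) {
        if ((uint16_t)(unsigned char)s->str[i] != u->ustr[i]) return false;
    }
    return true;
}
```

With unicode=off, `$a["abc"]` and a Unicode `"abc"` key would therefore land in two separate slots of the same array.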
Dmitry, do you think that disallowing Unicode identifiers when unicode=off
would be hard to accomplish? It would make life easier for the code that
sparked this discussion (and maybe harder in other cases).
Due to (c), I'm kind of worried about trying to simplify the model; we might
just need to provide easier-to-use APIs for extension writers, which would
save them the effort of checking the different options. A good API is key to
making sure that we get a consistent implementation and upgrade of PHP
functions.
Andi
At 03:13 PM 8/16/2005 -0700, Andrei Zmievski wrote:
It does make the engine more complicated, though, because we can't just
check UG(unicode) and expect all identifiers to be of the same type.
We would actually need to amend a lot of API functions to pass the
identifier type along; e.g., zend_get_active_function() would need to
return the identifier type along with the identifier itself.
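A sketch of how an accessor's signature changes once the identifier type can no longer be derived from UG(unicode) alone. `function_info` and `active_function_name()` are hypothetical names; this is not the real signature of zend_get_active_function() or of any other Zend API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag mirroring the IS_STRING / IS_UNICODE distinction. */
typedef enum { IDENT_STRING, IDENT_UNICODE } ident_type;

/* Hypothetical stand-in for the engine's record of the running function. */
typedef struct {
    ident_type name_type;   /* which representation 'name' uses */
    const void *name;       /* char * or UTF-16 buffer, depending on name_type */
} function_info;

/* Because a single global flag no longer tells the caller how to read the
 * name, the type has to travel with it, here via an out-parameter. */
static const void *active_function_name(const function_info *fn, ident_type *type_out)
{
    assert(fn != NULL && type_out != NULL);
    *type_out = fn->name_type;
    return fn->name;
}
```

Every caller then has to branch on the returned type, which is the extra complexity described above multiplied across each API that hands out an identifier.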
-Andrei
On Aug 16, 2005, at 1:36 PM, Andi Gutmans wrote:
IIRC, if unicode_semantics=on, we agreed to use Unicode for array offsets
and properties (and do auto-conversion). However, if unicode=off, we
should not do auto-conversion but should allow PHP users to manually create
Unicode data. When it comes to arrays, we agreed that in this case they
can use strings and Unicode as they wish (this makes sense for apps that
can't make the complete move but can Unicode-enable part of the app, for
example, a web service).
So, bottom line, I don't think we can expect class names and properties to
be in the same encoding unless we hard-code it, but I like the flexibility
of being able to use Unicode strings when unicode_semantics is off...
(this took me far too long to write :)
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php