I agree with all of this. Anyone want to update README.UNICODE to
reflect this change?
-Andrei
On Aug 16, 2005, at 4:52 PM, Andi Gutmans wrote:
I think we should make the following assumptions:
a) Being able to create and manipulate IS_UNICODE zvals when
unicode_semantics=off will be very useful to people, including
for exposing the ICU extension.
b) Defining Unicode identifiers like classes/properties/functions
if unicode_semantics=off does not seem useful and should be
prohibited.
c) People can always find ways of misusing the language & APIs to
reach a state which they shouldn't be reaching. For example,
assuming (a) & (b), using create_function to misuse the engine and
create a Unicode function name when unicode_semantics=off.
I don't believe we can or should enforce every possibility of
misuse or we'll bloat the code and will never reach perfection.
That said, we probably can enforce the obvious places where people
try to define unicode classes/functions/properties when
unicode_semantics=off.
btw, I'm only referring to identifiers. If unicode=off then I
believe things like arrays should support IS_UNICODE keys/values in
addition to IS_STRING for reasons as in (a). As per the original
design, those two wouldn't match each other, though, as they would
when we're in full-blown Unicode mode.
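
To make the key-matching point concrete, here is a minimal sketch of a type-tagged array key whose comparison never treats an IS_STRING key and an IS_UNICODE key as equal, which is the unicode=off behavior described above. All names here (key_kind, tagged_key, tagged_key_equals) are hypothetical and chosen for illustration; this is not the engine's actual hash implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tags mirroring the IS_STRING / IS_UNICODE zval types
 * named in the thread; not the engine's real definitions. */
typedef enum { KEY_STRING, KEY_UNICODE } key_kind;

/* A key carries its type tag next to the raw data. */
typedef struct {
    key_kind    kind;
    const void *data;  /* raw key bytes */
    size_t      len;   /* length in bytes */
} tagged_key;

/* Assumed unicode=off rule: keys of different kinds never match,
 * even if their bytes are identical. No conversion is attempted. */
static int tagged_key_equals(const tagged_key *a, const tagged_key *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) {
        return 0;
    }
    if (a->len != b->len) {
        return 0;
    }
    return memcmp(a->data, b->data, a->len) == 0;
}
```

In full Unicode mode the comparison would instead convert one side before comparing, so the two kinds could match; that conversion step is deliberately left out of this sketch.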
Dmitry, do you think that not allowing Unicode identifiers when
unicode=off would be hard to accomplish? It would make life easier
when it comes to code that sparked this discussion (and maybe
harder in other cases).
Due to (c) I'm kind of worried about trying to simplify the model,
and we might just need to provide easier-to-use APIs to extension
writers, which would save them effort in checking the different
options. A good API is key in making sure that we get a consistent
implementation and upgrade of PHP functions.
Andi
At 03:13 PM 8/16/2005 -0700, Andrei Zmievski wrote:
It does make the engine more complicated though, because we can't
just check for UG(unicode) and expect all identifiers to be of the
same type. We would actually need to amend a lot of API functions
to include passing the identifier type along, e.g.
zend_get_active_function() would need to return the identifier
type along with the identifier itself.
-Andrei
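
As a rough illustration of the API change described above, the following sketch shows a lookup that hands back the identifier type together with the identifier itself. Everything in it is hypothetical: ident_type, ident_ref, set_active_function, and get_active_function_ex are invented names, and the static slot merely stands in for engine state. It is not the signature of the real zend_get_active_function().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifier tags; not the engine's real definitions. */
typedef enum { IDENT_STRING, IDENT_UNICODE } ident_type;

typedef struct {
    ident_type  type;
    const void *name;  /* char data or 16-bit code units, per type */
    size_t      len;   /* length in units of that type */
} ident_ref;

/* Stand-in for the engine's notion of the currently executing function. */
static ident_ref current_function = { IDENT_STRING, "main", 4 };

static void set_active_function(ident_type type, const void *name, size_t len)
{
    assert(name != NULL);
    current_function.type = type;
    current_function.name = name;
    current_function.len  = len;
}

/* Returns the name and reports its type and length through out-parameters,
 * so callers cannot assume a single global identifier type. */
static const void *get_active_function_ex(ident_type *type_out, size_t *len_out)
{
    assert(type_out != NULL && len_out != NULL);
    *type_out = current_function.type;
    *len_out  = current_function.len;
    return current_function.name;
}
```

A caller would branch on the returned type before printing or comparing the name, which is the extra bookkeeping the message above is concerned about.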
On Aug 16, 2005, at 1:36 PM, Andi Gutmans wrote:
IIRC, if unicode_semantics=on, we agreed to use Unicode for array
offsets and properties (and do auto-conversion). However, if
unicode=off, we should not do auto-conversion but allow PHP
users to manually create Unicode data. When it comes to arrays, we
agreed that in this case they can use strings and Unicode as they
wish (makes sense for apps that can't make the complete move but
can Unicode-enable some of the app, for example, a web service).
So, bottom line, I don't think we can expect class name and
property to be in the same encoding unless we hard-code it, but I
like the flexibility of being able to use Unicode strings when
unicode_semantics is off....
(this took me far too long to write :)
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php