Hello all.
I'm currently working on Unicode support in serialize()/unserialize() and am
stuck on a few issues.
Here they are:
1) What should happen when unserializing serialized Unicode strings while
unicode_semantics is Off?
I presume it's safe to create and return IS_UNICODE in this case?
2) Class names are serialized without a U: or s: prefix, but I can detect a
Unicode string by its leading "\".
It looks kind of tricky, but on the other hand a backslash can't appear there
if the name is not Unicode.
Or should I change it to use U:/s: prefixes? (I haven't tried it yet, so I can't
say how difficult it would be.)
The other problem here is that we can't use Unicode class names when
unicode_semantics is Off, because in this case class_table stores them as
IS_STRING and we won't be able to find a class entry by its Unicode name (thanks
to Val for noticing this).
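
To make the detection in 2) concrete, here is a minimal sketch of the check.
The function name is made up for illustration and this is not the actual
unserializer code. It assumes Unicode class names are always written as escape
sequences, so they start with "\". It is written for a PHP where single-quoted
literals leave \u untouched:

```php
<?php
// Hypothetical helper, for illustration only: guess whether a serialized
// class name is a Unicode name by looking at its first byte.
function looks_like_unicode_class_name(string $serializedName): bool
{
    // Assumption: Unicode names are emitted as escape sequences and so begin
    // with a backslash, while plain (IS_STRING) names never do.
    return $serializedName !== '' && $serializedName[0] === '\\';
}

var_dump(looks_like_unicode_class_name('\u0046\u006f\u006f')); // bool(true)
var_dump(looks_like_unicode_class_name('Foo'));                // bool(false)
```

An explicit U:/s: prefix would replace this guess with a field the parser can
read directly.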
3) Currently serialize() produces valid \u0000 sequences, which can be
parsed/restored perfectly fine when they are read from a file or returned from
serialize().
But specifying them in a constant string in the source won't work, as these
sequences get parsed at compile time.
Short example:
<?php
var_dump(unserialize('U:2:"\u0061\u0061";')); // won't work
var_dump(unserialize(serialize("aa"))); // works
var_dump('U:2:"\u0061\u0061";'); // produces unicode(9) "U:2:"aa";"
?>
IMO the best way here is to change serialize() output to produce something else
(for example \pu0000 instead of \u0000) - in this case it works just fine.
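
A rough sketch of how such \pu sequences could be decoded at unserialize time
follows. The helper names are made up, it handles only 4-digit BMP values with
no surrogate pairs, and it returns UTF-8 bytes. It illustrates the idea and is
not the real implementation:

```php
<?php
// Hypothetical helpers, for illustration only.

// Encode a single BMP code point (0x0000..0xFFFF) as UTF-8 bytes.
function bmp_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Replace every \puXXXX sequence in the payload with the character it names.
// Assumption: the escape is exactly "\pu" followed by four hex digits.
function decode_pu(string $payload): string
{
    return preg_replace_callback(
        '/\\\\pu([0-9a-fA-F]{4})/',
        function (array $m): string {
            return bmp_to_utf8(hexdec($m[1]));
        },
        $payload
    );
}

var_dump(decode_pu('\pu0061\pu0061')); // string(2) "aa"
```

Since \pu is not an escape the compiler recognizes, the literal reaches the
decoder unchanged.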
Comments?
--
Wbr,
Antony Dovgal
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php