Bear with me, I don't know how PHP uses the serialized info, but if your goal is to minimize the overhead for Unicode data when unicode_semantics is off (or simply in general), and this is for a transfer or storage format, then you might consider using UTF-8. Since the majority of the info is ASCII, it will be efficient (ASCII in UTF-8 is ASCII). Other Unicode data will expand as needed. For file formats it is handy, as many editors will read UTF-8 (if the rest of the data is put into text format). You can continue to use this when unicode_semantics is on, so it is one format for all modes.
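To make the size argument concrete, here is a minimal sketch in Python (not PHP, and not how PHP's serializer works). The sample strings and the helper name encoded_sizes are my own, chosen only for illustration, and UTF-16 is measured without a BOM purely for comparison.

```python
# Illustrative only: sample strings and the helper name are assumptions,
# not anything taken from PHP's serialize() implementation.
samples = ["hello", "na\u00efve", "\u65e5\u672c\u8a9e"]


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text; UTF-16 is counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for s in samples:
    utf8_len, utf16_len = encoded_sizes(s)
    # !a prints non-ASCII characters as escapes, so output is safe on any terminal.
    print(f"{s!a}: utf-8={utf8_len} bytes, utf-16={utf16_len} bytes")

# ASCII text is byte-for-byte identical when encoded as UTF-8.
assert "hello".encode("utf-8") == "hello".encode("ascii")
```

Pure ASCII costs one byte per character in UTF-8 versus two in UTF-16, and accented Latin text still comes out smaller. Note that the CJK sample takes three bytes per character in UTF-8 against two in UTF-16, so the saving depends on the data really being mostly ASCII.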
Conversion between UTF-8 and UTF-16 is very fast and likely won't be noticed given the other tasks of I/O and packing or unpacking the remaining data. Also, UTF-8 doesn't have any endian issues, as UTF-16 does.

hth
Tex Texin
Internationalization Architect, Yahoo! Inc.

> -----Original Message-----
> From: Pierre Joye [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 13, 2005 3:44 AM
> To: Antony Dovgal
> Cc: Derick Rethans; val khokhlov; internals@lists.php.net
> Subject: Re: [PHP-DEV] unserialize() & unicode issues
>
> On 9/13/05, Antony Dovgal <[EMAIL PROTECTED]> wrote:
>
> > Yes, in this case there is no way to avoid converting (while doing
> > unserialize()), but I don't see any point in creating Unicode strings
> > when serializing with unicode_semantics is Off.
>
> If I use serialized data on different hosts with different php, I can
> see a need of having unicode strings in serialize even if
> unicode_semantics is off.
>
> --Pierre
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php