Bear with me, as I don't know how PHP uses the serialized data. But if
your goal is to minimize the overhead for Unicode data when
unicode_semantics is off (or simply in general), and this is for a transfer
or storage format, then you might consider using utf-8. Since the majority
of the data is ASCII, it will be efficient (ASCII in utf-8 is byte-for-byte
ASCII).
Other Unicode data will expand as needed, to between two and four bytes per
code point.
For file formats it is also handy, as many editors will read utf-8 (if the
rest of the data is put into text format).
You can continue to use this when unicode_semantics is on, so it is one
format for all modes.
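
To make the size argument concrete, here is a minimal sketch in Python (not
PHP, and not how serialize() works internally) that compares the encoded
length of a few strings. The helper name encoded_sizes and the sample
strings are my own, made up for illustration.

```python
def encoded_sizes(text):
    """Return the byte length of text in utf-8 and in utf-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# Illustrative samples: pure ASCII, one accented Latin letter, three CJK characters.
samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e"]

for s in samples:
    # ascii() keeps the output printable on any console encoding.
    print(ascii(s), encoded_sizes(s))
```

For mostly-ASCII data utf-8 comes out at half the size of utf-16; for text
that is mostly CJK, utf-16 is the smaller of the two, so the gain depends on
the data being mostly ASCII.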

Conversion between utf-8 and utf-16 is very fast and likely won't be noticed
given the other tasks of I/O and packing or unpacking the remaining data.

Also, utf-8 doesn’t have any endian issues, as utf-16 does.
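
A small Python sketch of the endian point, again only an illustration and
not anything taken from the PHP source: the same character has two possible
utf-16 byte orders but only one utf-8 byte sequence. The helper name
encodings is hypothetical.

```python
def encodings(text):
    """Encode text as utf-8 and as both utf-16 byte orders (no BOM)."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),
        "utf-16-be": text.encode("utf-16-be"),
    }

enc = encodings("A")
for name, data in enc.items():
    print(name, data.hex())

# Reading little-endian bytes as big-endian silently yields a different
# character, which is why utf-16 data needs a BOM or an agreed byte order.
misread = enc["utf-16-le"].decode("utf-16-be")
print(ascii(misread))
```

With utf-8 there is nothing to agree on between hosts, since the byte
sequence is the same regardless of the machine's byte order.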

hth

Tex Texin
Internationalization Architect,   Yahoo! Inc.
 
 


> -----Original Message-----
> From: Pierre Joye [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, September 13, 2005 3:44 AM
> To: Antony Dovgal
> Cc: Derick Rethans; val khokhlov; internals@lists.php.net
> Subject: Re: [PHP-DEV] unserialize() & unicode issues
> 
> 
> On 9/13/05, Antony Dovgal <[EMAIL PROTECTED]> wrote:
> 
> > Yes, in this case there is no way to avoid converting (while doing
> > unserialize()), but I don't see any point in creating Unicode strings
> > when serializing with unicode_semantics is Off.
> 
> If I use serialized data on different hosts with different 
> php, I can see a need of having unicode strings in serialize 
> even if unicode_semantics is off.
> 
> --Pierre
> 
> -- 
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
> 
> 
> 
