On Jul 25, 2006, at 3:03 PM, Nuno Lopes wrote:
Hello,
So Andrei asked me to upgrade the zlib extension, but I have a few
questions I would like to discuss with you:
* when receiving a Unicode string, what should we do? compress
it as-is, or prepend a BOM header (and skip it while
uncompressing)? (I'm unsure whether PHP/ICU uses UTF-16 in the
machine's endianness or not)
It does use UTF-16 in the machine's native byte order. I think there
are several approaches you can take when compressing a Unicode string:
1. Compress as-is
2. Convert the string to big endian, for example, and compress
3. Convert to UTF-8 and then compress
The problem with #2 and #3 is decompression. You need to know that it
was a Unicode string and do the appropriate conversion after decompressing.
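
To make the decompression concern concrete, here is a minimal Python
sketch of option 3 (it is not the PHP/C code the zlib extension would
use). It assumes a hypothetical one-byte tag placed in front of the
payload, so that the uncompressing side knows whether to convert back
to a Unicode string. The tag values and the names compress_value and
uncompress_value are illustrative only; they are not part of zlib or PHP.

```python
import zlib

# Hypothetical one-byte tags (an assumption for this sketch, not a zlib feature).
TAG_BINARY = b"\x00"
TAG_UNICODE = b"\x01"


def compress_value(value):
    """Compress a str (Unicode) or bytes (binary) value, recording which it was."""
    if isinstance(value, str):
        # Option 3: convert to UTF-8 first, so the output does not depend
        # on the byte order of the machine that produced it.
        payload = TAG_UNICODE + value.encode("utf-8")
    else:
        payload = TAG_BINARY + bytes(value)
    return zlib.compress(payload)


def uncompress_value(blob):
    """Decompress and convert back to str only if the tag says it was Unicode."""
    payload = zlib.decompress(blob)
    tag, body = payload[:1], payload[1:]
    if tag == TAG_UNICODE:
        return body.decode("utf-8")
    if tag == TAG_BINARY:
        return body
    raise ValueError("unknown type tag in decompressed payload")


text = "h\u00e9llo"
assert uncompress_value(compress_value(text)) == text
assert uncompress_value(compress_value(b"\x00\xff")) == b"\x00\xff"
```

Without some marker of this kind, the decompressed bytes alone do not
say whether they started out as a Unicode string or as binary data.
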
* when uncompressing, should we check for a BOM header and return a
Unicode string if it is present? or always return a binary string?
A BOM header is not present in internal UTF-16 strings. It is only
present if you convert them to UTF-16BE or UTF-16LE.
-Andrei
--
PHP Internals - PHP Runtime Development Mailing List