On Jul 25, 2006, at 3:03 PM, Nuno Lopes wrote:

Hello,

So Andrei asked me to upgrade the zlib extension, but I have a few questions I would like to discuss with you:

* When receiving a Unicode string, what should we do? Compress it as-is, or prepend a BOM header (and skip it while uncompressing)? (Now I'm unsure whether PHP/ICU uses UTF-16 in the machine's endianness or not.)

It does use UTF-16 in machine-specific byte order (endianness). I think you can take several approaches when compressing a Unicode string:

1. Compress as-is
2. Convert the string to big endian, for example, and compress
3. Convert to UTF-8 and then compress

The problem with #2 and #3 is decompression: you need to know that the original was a Unicode string and perform the appropriate conversion after decompressing.
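To make the trade-off concrete, here is a minimal Python sketch. Python stands in for the C extension code, and the approach names, the `APPROACHES` table and both helper functions are hypothetical. It encodes a string each of the three ways, deflates it with the standard `zlib` module, and shows that the compressed bytes carry no record of the encoding. In this sketch the decoder always has to be told which approach was used.

```python
import sys
import zlib

# Native-endian UTF-16 without a BOM, standing in for the internal
# representation described above (an assumption for this sketch).
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

# Hypothetical mapping of the three approaches to the encoding applied
# before compression.
APPROACHES = {
    "as-is": NATIVE_UTF16,       # 1. compress the raw internal bytes
    "big-endian": "utf-16-be",   # 2. normalize byte order first
    "utf-8": "utf-8",            # 3. convert to UTF-8 first
}


def compress_unicode(text: str, approach: str) -> bytes:
    """Encode text per the chosen approach, then deflate it with zlib."""
    return zlib.compress(text.encode(APPROACHES[approach]))


def decompress_unicode(blob: bytes, approach: str) -> str:
    """zlib gives back bytes only; the caller must supply the approach."""
    return zlib.decompress(blob).decode(APPROACHES[approach])


if __name__ == "__main__":
    sample = "Grüße, 世界"
    for name in APPROACHES:
        blob = compress_unicode(sample, name)
        assert decompress_unicode(blob, name) == sample
        print(name, len(blob), "bytes compressed")

    # Guessing the encoding wrong is the failure mode: the UTF-8 form of
    # the sample has an odd number of bytes, so it is not valid UTF-16.
    try:
        zlib.decompress(compress_unicode(sample, "utf-8")).decode("utf-16-be")
    except UnicodeDecodeError as exc:
        print("wrong guess:", exc.reason)
```

Approach #1 only round-trips on a machine with the same byte order, which is the usual argument for normalizing with #2 or #3 before compressing.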

* When uncompressing, should we check for a BOM header and return a Unicode string if one is present, or always return a binary string?

A BOM header is not present in internal UTF-16 strings. It is only present if you convert them to UTF-16BE or UTF-16LE.
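Nuno's BOM idea could look roughly like the following Python sketch. The helper names and the "string if a BOM is found, bytes otherwise" policy are hypothetical. The comments describe Python's codecs, not ICU's converters, so ICU's behavior should be checked separately. The check is also a heuristic: any binary payload that happens to begin with the bytes FF FE or FE FF will come back as text.

```python
import codecs
import zlib
from typing import Union


def compress_with_bom(text: str) -> bytes:
    # Python's generic "utf-16" codec prepends a BOM in native byte order;
    # the explicit "utf-16-be"/"utf-16-le" codecs do not write one.
    return zlib.compress(text.encode("utf-16"))


def uncompress_maybe_unicode(blob: bytes) -> Union[str, bytes]:
    """Hypothetical policy: return str if the payload starts with a UTF-16 BOM."""
    data = zlib.decompress(blob)
    if data.startswith(codecs.BOM_UTF16_BE):
        # Skip the BOM, then decode the rest as big-endian UTF-16.
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Skip the BOM, then decode the rest as little-endian UTF-16.
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: treat the result as a binary string.
    return data


if __name__ == "__main__":
    print(uncompress_maybe_unicode(compress_with_bom("héllo")))
    print(uncompress_maybe_unicode(zlib.compress(b"\x89PNG")))
```

Returning binary by default keeps ordinary compressed data unchanged; only data that was deliberately tagged with a BOM is promoted to a Unicode string.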

-Andrei

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
