Re: [PHP-DEV] upgrading the zlib extension to unicode

Andrei Zmievski Tue, 25 Jul 2006 16:49:17 -0700

On Jul 25, 2006, at 3:03 PM, Nuno Lopes wrote:

Hello,
So Andrei asked me to upgrade the zlib extension, but I have a few questions I would like to discuss with you:

* when receiving a unicode string, what should we do? compress it as-is, prepend a BOM header (and skip it while uncompressing)? (now I'm unsure if PHP/ICU uses utf16 in the machine endianness or not)

It does use UTF-16 in machine-specific endian format. I think you have several approaches you can take when compressing a Unicode string:


1. Compress as-is
2. Convert the string to big endian, for example, and compress
3. Convert to UTF-8 and then compress

The problem with #2 and #3 is decompression. You need to know that it was a Unicode string and do the appropriate conversion after decompressing.
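To make the trade-off concrete, here is a minimal sketch of the three approaches. It uses Python's zlib module purely for illustration; it is not the PHP zlib extension's API, and the sample text and the restore() helper are hypothetical names made up for this example.

```python
import sys
import zlib

# Sample text (an assumption for illustration only).
text = "héllo wörld " * 20

# 1. Compress as-is: the UTF-16 code units in machine byte order.
native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
as_is = zlib.compress(text.encode(native))

# 2. Convert to a fixed byte order (big endian here), then compress.
big_endian = zlib.compress(text.encode("utf-16-be"))

# 3. Convert to UTF-8, then compress.
utf8 = zlib.compress(text.encode("utf-8"))


def restore(blob: bytes, encoding: str) -> str:
    """Decompress and undo the conversion that was applied before compressing.

    Decompression alone only yields bytes; the caller has to know which
    encoding was used, because nothing in the compressed data records it.
    """
    return zlib.decompress(blob).decode(encoding)


# Each round trip works only when the matching encoding is supplied.
assert restore(as_is, native) == text
assert restore(big_endian, "utf-16-be") == text
assert restore(utf8, "utf-8") == text
```

Note that the output of approach 1 is only portable between machines of the same endianness, while approaches 2 and 3 are portable but need the extra knowledge on the decompressing side.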

* when uncompressing, check for a BOM header and return a unicode string if it is present? always return a binary string?

A BOM header is not present in internal UTF-16 strings. It is only present if you convert them to UTF-16BE or UTF-16LE.


-Andrei

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
