They are not documented, and I am deliberately testing configurations
that might break scripts. When I test things to make code portable, the
configuration is not supposed to be sensible. I can set an option with
ini_set() if I understand what the option does and it fixes the issue.

http://www.php.net/unicode

Do you have an updated version of the documentation that explains the
encoding settings and lists the available configuration values? Or am I
testing PHP6 too early, and you are still months or years away from the
6.0.0 betas and RCs? Could you implement a pseudo-encoding similar to the
'pass' encoding used in mbstring? The current implementation does not
give script writers the controls they need.

Have you looked at any of the talks I've given on this topic?

http://www.gravitonic.com/talks

That's the closest thing to documentation you'll find right now. Unfortunately, documentation always lags behind the actual development.

SquirrelMail scripts are not written in Unicode. They are in ASCII. If
an 8-bit value is used, it is always written in octal or hex notation.
These hex values do not all belong to one character set. In some cases
scripts use raw byte values, for example when locating the first UTF-8
byte or looking for 0x80-0xFF bytes in a string. In other cases the
values are written in the source or target character set. For example,
the iso-8859-2 decoding function contains an array that maps iso-8859-2
hex values to HTML codes. The code can't use raw 8-bit strings, because
they might be corrupted by a misconfigured editor used by a developer,
and such corruption is very hard to track down. 8-bit data can come only
from user input (composed emails and preferences, HTML forms, one common
charset) and from the IMAP server (received emails, lots of different
charsets and encodings).
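
A minimal sketch of the two styles described above, assuming byte-string
semantics as in PHP 5. The function names are hypothetical (they are not
the actual SquirrelMail functions), and the table shows only three
iso-8859-2 entries for illustration:

```php
<?php
// Hypothetical helpers for illustration only; not SquirrelMail code.
// Assumes strings are sequences of bytes (no unicode semantics).

/**
 * Return the offset of the first byte in the range 0x80-0xFF,
 * or -1 if the string contains only 7-bit ASCII.
 */
function first_8bit_offset($bytes)
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        if (ord($bytes[$i]) >= 0x80) {
            return $i;
        }
    }
    return -1;
}

/**
 * Decode iso-8859-2 bytes to HTML numeric entities.
 * The keys are written as hex escapes, so the source file stays pure
 * ASCII and cannot be damaged by an editor's charset settings.
 */
function decode_iso_8859_2_sample($bytes)
{
    $map = array(
        "\xA1" => '&#260;', // LATIN CAPITAL LETTER A WITH OGONEK
        "\xA3" => '&#321;', // LATIN CAPITAL LETTER L WITH STROKE
        "\xB1" => '&#261;', // LATIN SMALL LETTER A WITH OGONEK
    );
    return strtr($bytes, $map);
}
```

In the first function the escapes are plain byte values; in the second
they are iso-8859-2 code points. If "\xA1" were read as a Unicode
character instead of a byte, the second table would no longer match the
incoming data.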

Maybe you don't need to set unicode.semantics=on if you are working only with 8-bit strings.

-Andrei

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
