They are not documented, and I am testing configurations that might
break scripts. If I test things and want to make code portable, the
configuration is not supposed to be rational. I can set an option with
ini_set() if I understand what the option does and it fixes the issue.
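
As a rough illustration of the ini_set() approach, here is a minimal
sketch. The helper name try_ini_set() is hypothetical, and
default_charset is used only as an assumed example of an option that
can be changed at runtime; it is not one of the settings discussed in
this thread.

```php
<?php
// Hypothetical helper: try to change an ini option at runtime and
// report whether the new value is actually in effect afterwards.
function try_ini_set($name, $value)
{
    // ini_set() returns the old value on success and false on failure
    // (for example, when the option does not exist).
    $old = ini_set($name, $value);
    if ($old === false) {
        return false;
    }
    // Read the option back to confirm the change.
    return ini_get($name) === $value;
}

// Assumed example option; substitute the setting you are testing.
if (!try_ini_set('default_charset', 'ISO-8859-2')) {
    echo "Option could not be changed at runtime\n";
}
```

Checking the return value matters for portable code, because some
options can only be set in php.ini or per directory.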
http://www.php.net/unicode
Do you have an updated version of the documentation that explains the
encoding settings and lists the available configuration values? Or am I
testing PHP6 too early, and you are still months or years away from
6.0.0 betas and RCs? Could you implement a pseudo encoding similar to
the 'pass' encoding used in mbstring? The current implementation does
not give script writers the control they need.
Have you looked at any of the talks I've given on this topic?
http://www.gravitonic.com/talks
That's the closest thing to documentation you'll find right now.
Unfortunately, documentation always lags behind the actual development.
SquirrelMail scripts are not written in Unicode. They are in ASCII. If
an 8-bit value is used, it is always written in octal or hex notation.
These hex values are not written in one character set. In some cases
scripts use byte values, for example when locating the first UTF-8 byte
or looking for 0x80-0xFF bytes in a string. In other cases they are
written in the source or target character set. For example, the
iso-8859-2 decoding function contains an array of iso-8859-2 hex values
mapped to HTML codes. Code can't use raw 8-bit strings, because they
might be corrupted by a misconfigured editor used by a developer, and
such corruption is very hard to track.
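
To make the two usages concrete, here is a minimal sketch. The function
names are hypothetical and this is not the actual SquirrelMail code; the
map covers only four iso-8859-2 characters, and the code assumes strings
are handled as plain byte sequences.

```php
<?php
// Usage 1: hex escapes as raw byte values.
// True if the string contains any byte in the 0x80-0xFF range.
function has_8bit_bytes($s)
{
    return preg_match('/[\x80-\xFF]/', $s) === 1;
}

// Offset of the first byte that can start a multibyte UTF-8 sequence
// (0xC2-0xF4), or -1 if there is none.
function first_utf8_lead_byte($s)
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte >= 0xC2 && $byte <= 0xF4) {
            return $i;
        }
    }
    return -1;
}

// Usage 2: hex escapes as characters of the source character set.
// Partial iso-8859-2 table mapping bytes to HTML numeric entities.
function decode_iso8859_2_sketch($s)
{
    $map = array(
        "\xA1" => '&#260;', // A with ogonek
        "\xA3" => '&#321;', // L with stroke
        "\xB1" => '&#261;', // a with ogonek
        "\xB3" => '&#322;', // l with stroke
    );
    return strtr($s, $map);
}
```

In both cases the source file itself stays pure ASCII, so a
misconfigured editor has nothing to corrupt.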
8-bit data can come only from user input (composed emails and
preferences, HTML forms, one common charset) and from the IMAP server
(received emails, lots of different charsets and encodings).
Maybe you don't need to set unicode.semantics=on if you are working
only with 8-bit strings.
-Andrei
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php