On Fri, June 29, 2007 1:21 am, Tomas Kuliavas wrote:
>> If unicode semantics are "on" what exactly is borked in PHP 5?
>
> In Unicode mode \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to unicode
> code points and not to octal or hexadecimal byte values. Fix is not
> backwards compatible.

Gak.

You mean this will break:

<?php
  $mask = 0xf0;
  $value = $_POST['foo'] & $mask;
?>

because of Unicode?

That's nuts. That can't be right...

> Scripts can't match bytes. How they are supposed to check if string is
> in plain ascii or in 8bit? Do conversion to ASCII and check for errors
> instead of looking for 8bit byte values? How can scripts replace 8bit
> bytes with some other strings? ISO-8859-2 decoding table contains 95
> entries written and evaluated as binary strings. Same thing applies to
> other iso-8859 and windows-125x character sets. iso-8859-1 and utf-8
> decoding does not use mapping tables and performs complex calculations
> with byte values. multibyte character set decoding might actually
> benefit from unicode_encode(), if Table 325
> (http://www.php.net/unicode) provides more information about
> U_INVALID_SUBSTITUTE and other unicode settings.

I don't even understand this.

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...

> PHP6 does not provide backwards compatible functions to work with
> bytes. Provided constructs are not backwards compatible. If scripts
> want to do MIME Q encoding, they must work with bytes. Doing Q
> encoding with provided PHP extensions adds extra dependencies.

Another one I don't understand...

But since I believe MIME emails are a blight on the universe, I
suspect I just don't care either. :-)

> ICU does not support HTML target. Text conversion to iso-8859-x or
> windows-125x targets will be lossy.

Well, yeah, if you down-sample UTF-* to a character set that doesn't
have the characters you typed in UTF-*, then those characters won't
make it through the translation.

Output your HTML in UTF-* or accept the loss.

>> Can that be fixed to be BC without resorting to this toggle?
>
> Unicode and binary typecasting causes E_PARSE error in PHP 5.2.0 and
> older.

That's fine.
PHP 6 code that uses new PHP 6 features needs PHP 6. If that
surprises somebody, they have a fundamental misunderstanding of what
a major release version means.

> PHP6 could introduce new Unicode aware functions, but Unicode
> implementation choose to modify existing ones. All low level string
> operations ($string[1]) are Unicode aware by default and not when
> script actually asks for it. Such implementation is designed for
> developers, who don't care about Unicode support and want it out of
> the box without any changes in their Unicode unaware scripts. It is
> not designed for developers that actually need it and want to have
> code working in PHP6 and PHP4/5.

But an old script ought to just work...

> Unicode code points can be defined with \u, but PHP6 breaks existing
> octal and hex escape sequences.

If you're saying what I think you're saying, that's just daft...

Nobody [*] will switch to PHP 6 if I am interpreting these statements
correctly...

* Nobody == even a slower adoption rate than the glacial PHP 5.

> PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
> downcoded for binary stream runtime_encoding", "Warning:
> base64_encode() expects parameter 1 to be strictly a binary string,
> Unicode string given") about data stream and string operations, even
> when fwrite() or base64_encode() works only with plain ascii data.
> PHP script developers are not used to strict variable type checks in
> string functions. Which functions are modified to require binary
> typecasting? Do I have to make a list myself every time some function
> freaks out?

Hopefully these are going away as the Unicode stuff is finished?...

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php