On Fri, June 29, 2007 1:21 am, Tomas Kuliavas wrote:
>> If unicode semantics are "on" what exactly is borked in PHP 5?
>
> In Unicode mode \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode
> code points and not to octal or hexadecimal byte values. Fix is not
> backwards compatible.

Gak.

You mean this will break:

<?php
  $mask = 0xf0;
  $value = $_POST['foo'] & $mask;
?>

because of Unicode?

That's nuts.

That can't be right...
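
[A minimal sketch of the distinction, assuming PHP 5 byte-string
semantics. As quoted, the change concerns octal and hex escape sequences
in strings and patterns, not integer literals such as 0xf0. The string
'171' stands in for a hypothetical $_POST['foo'] value.]

```php
<?php
// Integer literal: 0xf0 is simply the number 240, independent of any
// string encoding. '171' is a made-up stand-in for $_POST['foo'].
$mask  = 0xf0;
$value = (int) '171' & $mask;   // 0xab & 0xf0, integer arithmetic

// String escape: under PHP 5 semantics, "\xf0" in a double-quoted
// literal is a single byte with value 0xF0.
$byte   = "\xf0";
$length = strlen($byte);        // number of bytes in the string
$code   = ord($byte);           // numeric value of the first byte

printf("%d %d %d\n", $value, $length, $code);
```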

> Scripts can't match bytes. How are they supposed to check if a string
> is in plain ASCII or in 8bit? Do conversion to ASCII and check for
> errors instead of looking for 8bit byte values? How can scripts
> replace 8bit bytes with some other strings? ISO-8859-2 decoding table
> contains 95 entries written and evaluated as binary strings. Same
> thing applies to other iso-8859 and windows-125x character sets.
> iso-8859-1 and utf-8 decoding does not use mapping tables and
> performs complex calculations with byte values. Multibyte character
> set decoding might actually benefit from unicode_encode(), if Table
> 325 (http://www.php.net/unicode) provides more information about
> U_INVALID_SUBSTITUTE and other unicode. settings.

I don't even understand this.

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...
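
[A rough sketch of the byte-level idioms the quoted paragraph refers to,
assuming PHP 5 byte strings. has_8bit() and the two-entry table are
hypothetical names made up for illustration; the real decoding table
mentioned in the quote has 95 entries.]

```php
<?php
// Hypothetical helper: true if $s contains any byte in 0x80-0xFF,
// i.e. the string is not plain 7-bit ASCII.
function has_8bit($s)
{
    // Single quotes pass \x80 and \xff through to PCRE, which (without
    // the /u modifier) treats them as byte values.
    return preg_match('/[\x80-\xff]/', $s) === 1;
}

// Two sample ISO-8859-2 entries, keyed by raw byte and mapped to HTML
// numeric entities. Illustration only, not a complete table.
$iso_8859_2_sample = array(
    "\xa1" => '&#260;',  // U+0104 LATIN CAPITAL LETTER A WITH OGONEK
    "\xa3" => '&#321;',  // U+0141 LATIN CAPITAL LETTER L WITH STROKE
);

// Replace each 8bit byte with its mapped string.
$decoded = strtr("\xa1\xa3", $iso_8859_2_sample);
```

Both idioms depend on "\xa1" meaning the byte 0xA1 rather than the code
point U+00A1.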

> PHP6 does not provide backwards compatible functions to work with
> bytes. Provided constructs are not backwards compatible. If scripts
> want to do MIME Q encoding, they must work with bytes. Doing Q
> encoding with provided PHP extensions adds extra dependencies.

Another one I don't understand...

But since I believe MIME emails are a blight on the universe, I
suspect I just don't care either. :-)
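
[For readers in the same boat: a simplified sketch of why Q encoding
(RFC 2047) needs byte access, assuming PHP 5 byte strings.
q_encode_sketch() is a hypothetical name, and the sketch ignores the
75-character limit on encoded-words.]

```php
<?php
// Hypothetical helper: wrap $bytes in a single RFC 2047 "Q" encoded-word.
// Only ASCII letters and digits pass through unchanged; a space becomes
// "_" and every other byte becomes "=XX" with uppercase hex digits.
function q_encode_sketch($bytes, $charset)
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $c = $bytes[$i];        // one byte under PHP 5 semantics
        $o = ord($c);
        $is_alnum = ($o >= 0x30 && $o <= 0x39)
                 || ($o >= 0x41 && $o <= 0x5a)
                 || ($o >= 0x61 && $o <= 0x7a);
        if ($is_alnum) {
            $out .= $c;
        } elseif ($c === ' ') {
            $out .= '_';
        } else {
            $out .= sprintf('=%02X', $o);
        }
    }
    return '=?' . $charset . '?Q?' . $out . '?=';
}

// "caf\xe9 au lait" is ISO-8859-1 for "cafe au lait" with an e-acute.
$header = q_encode_sketch("caf\xe9 au lait", 'ISO-8859-1');
```

The encoder has to see the byte 0xE9 as it appears in the declared
charset, which is why per-byte indexing and ord() matter here.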

> ICU does not support HTML target. Text conversion to iso-8859-x or
> windows-125x targets will be lossy.

Well, yeah, if you down-sample UTF-* to a character set that doesn't
have the characters you typed in UTF-*, then those characters won't
make it through the translation.

Output your HTML in UTF-* or accept the loss.

>> Can that be fixed to be BC without resorting to this toggle?
>
> Unicode and binary typecasting causes E_PARSE error in PHP 5.2.0 and
> older.

That's fine.

PHP 6 code that uses new PHP 6 features needs PHP 6.

If that surprises somebody, they have a fundamental misunderstanding
of what a major release version means.

> PHP6 could introduce new Unicode aware functions, but the Unicode
> implementation chose to modify existing ones. All low level string
> operations ($string[1]) are Unicode aware by default and not when the
> script actually asks for it. Such an implementation is designed for
> developers who don't care about Unicode support and want it out of
> the box without any changes in their Unicode unaware scripts. It is
> not designed for developers that actually need it and want to have
> code working in PHP6 and PHP4/5.

But an old script ought to just work...
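
[A small sketch of the byte semantics old scripts rely on, assuming
PHP 5: strlen() counts bytes and $string[1] returns the second byte, not
the second character. The sample string is made up for illustration.]

```php
<?php
// "ete" with two e-acutes, spelled out as UTF-8 bytes.
$utf8 = "\xc3\xa9t\xc3\xa9";

$byte_count  = strlen($utf8);     // counts bytes, not characters
$second_byte = $utf8[1];          // second byte of the first e-acute
$second_hex  = bin2hex($second_byte);

printf("%d %s\n", $byte_count, $second_hex);
```

If $string[1] silently became "second code point", the same script
would get "t" here instead of a continuation byte.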

> Unicode code points can be defined with \u, but PHP6 breaks existing
> octal and hex escape sequences.

If you're saying what I think you're saying, that's just daft...

Nobody [*] will switch to PHP 6 if I am interpreting these statements
correctly...

* Nobody == even a slower adoption rate than the glacial PHP 5.

> PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
> downcoded for binary stream runtime_encoding", "Warning:
> base64_encode() expects parameter 1 to be strictly a binary string,
> Unicode string given") about data stream and string operations, even
> when fwrite() or base64_encode() works only with plain ascii data.
> PHP script developers are not used to strict variable type checks in
> string functions. Which functions are modified to require binary
> typecasting? Do I have to make a list myself every time some function
> freaks out?

Hopefully these are going away as the Unicode stuff is finished?...

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?