Pierre Joye wrote:
> On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine<les...@lsces.co.uk>  wrote:
>> Pierre Joye wrote:
>>>>
>>>> It depended on ICU there, and I would be against making a core thing in
>>>> PHP 5.x depend on ICU.
>>>
>>> It can and should be done as part of intl, actually.
>>>
>>> But that's somehow unrelated to the proposal here, as it is about
>>> byte, not characters :)
>>
>> I believe this may be where some of the new niggles may be coming from? With
>> browsers returning unicode, it may be that some of the 'extra' characters
>> are being returned as multibyte rather than as single bytes? Such as the
>> problem reported on the general list currently. How do we ensure that we are
>> dealing with single byte character strings nowadays?
>
> As it has been stated numerous times in this thread and other, we do
> not do anything with multi bytes systems, unicode, etc. mbstring and
> intl do, but php's string as of now is all about bytes, array of bytes
> if I may describe them this way.
>
> And we can't change this behavior.

This mindset is fundamentally broken. You can call it a byte array all you
want, but the truth is that 99.999% of the time, when a developer is using a
string they need it for characters, not for bytes, and characters are not
always a single byte. Even English-speaking users submit characters outside
the ASCII range at an alarming rate. If you're using a WYSIWYG editor, Chrome
will submit non-breaking spaces as the actual UTF-8 encoded character, not as
an HTML entity. Whether developers like it, or even know it, supporting an
extended universal character set is not really optional.
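
To make the non-breaking space case concrete, here is a minimal sketch. It
assumes the mbstring extension is loaded and that the submitted text is UTF-8;
the variable names are only for illustration.

```php
<?php
// U+00A0 NO-BREAK SPACE is encoded in UTF-8 as the two bytes 0xC2 0xA0.
$text = "a\xC2\xA0b";

$byteCount = strlen($text);             // counts bytes
$charCount = mb_strlen($text, 'UTF-8'); // counts code points (needs mbstring)

echo $byteCount, " bytes, ", $charCount, " characters\n";
// prints "4 bytes, 3 characters"
```

The user typed three characters, but every bytewise function sees four units.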

PHP makes this bad enough with its whole collection of bytewise string
functions, including many with no appropriate multibyte-aware replacement, but
at least these can be avoided, quickly audited, and in the future can even be
fixed in any number of ways with only a nominal BC impact. Hard-coding this
single-byte idiocy into a language construct (foreach), though, would be an
incredibly awful idea. This would create a trap for new, naive PHP developers,
and create a character set problem that the language could NEVER recover from
without a massive BC break.

This proposal is really about adding a feature which, whenever it is used, is
almost guaranteed to be an error. It probably won't look to the developer like
an error during simple testing, but will almost certainly show up as an error
in production. Is it really worth all that for a bit of syntactic sugar that
the developer will have to strip out anyway to fix their bug?
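
A rough sketch of how that plays out. Since foreach over a string does not
exist, the bytewise_units() helper below is a hypothetical stand-in that walks
a string one byte at a time, which is what the proposal would do as I
understand it.

```php
<?php
// Hypothetical stand-in for bytewise string iteration: one element per byte.
function bytewise_units(string $s): array
{
    $units = [];
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $units[] = $s[$i]; // string offsets address single bytes
    }
    return $units;
}

// Simple testing with ASCII input looks correct: 4 characters, 4 units.
$ascii = bytewise_units("cafe");

// Production input "café" (é is 0xC3 0xA9 in UTF-8) yields 5 units,
// and the last two are halves of one character.
$utf8 = bytewise_units("caf\xC3\xA9");

echo count($ascii), "\n";                           // prints 4
echo count($utf8), "\n";                            // prints 5
echo implode(' ', array_map('bin2hex', $utf8)), "\n"; // prints 63 61 66 c3 a9
```

Neither 0xC3 nor 0xA9 is a valid UTF-8 sequence on its own, so anything that
echoes, compares, or stores those units individually produces garbage.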

If string iteration needs to be addressed in the core (and IMO it doesn't,
because it can be handled at the script level), why not use iterator classes?
This gives the same functionality and prevents the language from encouraging
hidden bugs.
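
As an illustration of the script-level approach, here is a minimal sketch of
such an iterator class. The class name Utf8CharIterator is made up for this
example, and it assumes UTF-8 input and a PCRE build with UTF-8 support. It
yields code points, not grapheme clusters, so combining sequences would still
be split.

```php
<?php
// Hypothetical iterator class: yields one UTF-8 code point per iteration.
final class Utf8CharIterator implements IteratorAggregate
{
    private $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function getIterator(): Iterator
    {
        // The /u modifier makes PCRE treat the subject as UTF-8, so an
        // empty pattern splits between code points rather than bytes.
        $chars = preg_split('//u', $this->text, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        return new ArrayIterator($chars);
    }
}

foreach (new Utf8CharIterator("a\xC2\xA0b") as $ch) {
    echo bin2hex($ch), "\n"; // prints 61, then c2a0, then 62
}
```

The caller still gets to write a plain foreach, but the choice of encoding is
explicit in the class name instead of being baked into the language.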

John Crenshaw
Priacta, Inc.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
