David Zülke wrote:
On 21.08.2008 at 18:08, Rasmus Lerdorf wrote:

David Zülke wrote:
On 21.08.2008 at 03:34, William A. Rowe, Jr. wrote:

Stanislav Malyshev wrote:
Hi!
Are there any objections to incorporating the bugfix for #43941 (the fix for
how json handles invalid UTF-8 sequences) into 5.2? I have had some
requests for it; right now it's only in 5.3+.

Is there an alternative of substituting the replacement character U+FFFD in
place of the invalid sequence? This is a reasonable alternative behavior
for some less stringent cases.

(Yes, the fix is better than the status quo, but I'm just taking this a
step further.)
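A minimal sketch of the two behaviors being discussed. It is written in Python rather than PHP because Python's standard library already provides both modes. The helper name `encode_json_string` and its `lenient` flag are hypothetical and do not come from PHP's json extension. Strict mode rejects invalid input outright. Lenient mode substitutes U+FFFD for each invalid sequence through `bytes.decode(..., errors="replace")`.

```python
import json


def encode_json_string(raw: bytes, lenient: bool = False) -> str:
    """Hypothetical helper: JSON-encode bytes that are supposed to be UTF-8.

    Strict mode (the default) raises UnicodeDecodeError on invalid input.
    Lenient mode substitutes U+FFFD for each invalid sequence instead.
    """
    errors = "replace" if lenient else "strict"
    text = raw.decode("utf-8", errors=errors)
    # json.dumps escapes non-ASCII characters as \uXXXX by default.
    return json.dumps(text)


if __name__ == "__main__":
    print(encode_json_string(b"caf\xc3\xa9"))            # "caf\u00e9"
    print(encode_json_string(b"bad\xff", lenient=True))  # "bad\ufffd"
```

Calling `encode_json_string(b"bad\xff")` without `lenient=True` raises `UnicodeDecodeError`. That is the stricter behavior the bugfix is about.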

I agree, that would be quite reasonable and also more consistent with
how other apps (browsers etc.) handle invalid UTF-8.

Well, using browsers as the benchmark here is a bad idea. IE is
absolutely braindead about dealing with illegal UTF-8 chars. It will
accept just about any sequence of bytes as a valid UTF-8 char, which
causes all sorts of problems.

I was talking about the common representation of an invalid sequence:
the question-mark symbol you usually see in a browser when the
encoding is wrong.

Yes, but it all comes down to how you do it. Say you have a 3-byte sequence that starts with 0xE0 (E0 indicates the start of a 3-byte UTF-8 char), but the 3 bytes together don't actually form a valid UTF-8 char. If you substitute all 3 of those bytes with a ? or some other character, you have just created a nasty XSS vector for web apps. One of the swallowed bytes may be an ASCII quote or angle bracket that the page's markup depends on.

And yes, that is exactly what IE does and it has caused us no end of headaches over the years.
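To make the XSS risk concrete, the Python sketch below contrasts two hypothetical lenient decoders. Neither one is taken from PHP or from any browser.

- `naive_decode` trusts the length the lead byte announces and replaces the whole run with `?`.
- `safe_decode` uses Python's built-in `errors="replace"`. It replaces only the invalid prefix with U+FFFD and then re-examines the next byte.

The HTML snippet is an illustrative assumption: an attacker controls the `title` value, ending it with 0xE0, and also controls the `href` value.

```python
"""Contrast a naive lenient UTF-8 decoder with a safe one (illustrative only)."""


def naive_decode(data: bytes) -> str:
    """Hypothetical decoder that trusts the lead byte's announced length.

    An invalid sequence is replaced by '?' together with every byte the lead
    byte claimed, even if those bytes are plain ASCII such as a closing quote.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            length = 1  # ASCII, or a byte that can never start a sequence
        chunk = data[i:i + length]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            out.append("?")  # the whole chunk is dropped, ASCII bytes included
        i += length
    return "".join(out)


def safe_decode(data: bytes) -> str:
    """Replace only the invalid prefix with U+FFFD, then resume at the next byte."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Attacker-controlled title ends with 0xE0; attacker-controlled href follows.
    page = b'<a title="x\xe0" href=" onmouseover=alert(1) x=">link</a>'
    print(naive_decode(page))
    # <a title="x?href=" onmouseover=alert(1) x=">link</a>
    print(ascii(safe_decode(page)))
    # '<a title="x\ufffd" href=" onmouseover=alert(1) x=">link</a>'
```

With the naive decoder, the quote that closed `title` is swallowed together with the following space. As a result `onmouseover=alert(1)` becomes a live attribute. With the safe decoder the quote survives, and the payload stays inside the `href` value.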

-Rasmus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
