Re: [PHP-DEV] bug #43941

William A. Rowe, Jr. Thu, 21 Aug 2008 16:22:04 -0700

David Zülke wrote:

Am 21.08.2008 um 18:50 schrieb Rasmus Lerdorf:

David Zülke wrote:

Am 21.08.2008 um 18:41 schrieb Rasmus Lerdorf:

David Zülke wrote:

Am 21.08.2008 um 18:08 schrieb Rasmus Lerdorf:

David Zülke wrote:

Am 21.08.2008 um 03:34 schrieb William A. Rowe, Jr.:

Stanislav Malyshev wrote:

Hi!
Are there any objections to incorporating the bugfix for #43941 (a fix
for how json handles invalid UTF-8 sequences) into 5.2? I have had some
requests for it; right now it's only in 5.3+.


Is there the alternative of substituting the unmappable character U+FFFD
in place of the invalid sequence? This is a reasonable alternative
behavior for some less stringent cases.

(Yes, the fix is better than the status quo, but just taking this a step
further).
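
As a rough illustration of the substitution being proposed here, the sketch
below decodes a byte string leniently and puts U+FFFD where an invalid
sequence was. It is written in Python only to keep it short and runnable;
`decode_lenient` is a hypothetical helper name, and this is not a claim
about how PHP's json extension behaves or is implemented.

```python
# Hypothetical helper, not part of PHP or its json extension.
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_lenient(raw: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for bytes that cannot be decoded."""
    return raw.decode("utf-8", errors="replace")


# 0xE9 announces a 3-byte sequence, but a space follows, so it is invalid.
text = decode_lenient(b"caf\xe9 ok")
print(text == "caf" + REPLACEMENT_CHAR + " ok")

# U+FFFD itself is three bytes in UTF-8: 0xEF 0xBF 0xBD.
print(REPLACEMENT_CHAR.encode("utf-8") == b"\xef\xbf\xbd")
```

Note that only the offending byte is replaced; the valid bytes after it
are kept as they are.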

I agree, that would be quite reasonable and also more consistent with
how UTF-8 works in other apps (browsers etc).


Well, using browsers as the benchmark here is a bad idea. IE is
absolutely braindead about dealing with illegal UTF-8 chars. It will
accept just about any sequence of bytes as a valid UTF-8 char which
causes all sorts of problems.


I was talking about the common representation of an invalid sequence.
That's the question mark sign you usually see in a browser when the
encoding is incorrect.


Yes, but it all comes down to how you do it. Say you have a 3 byte
sequence that starts with 0xE0 (E0 indicates the start of a 3-byte
utf-8 char) but the 3 bytes together don't actually make up a valid
utf-8 char. If you substitute those 3 bytes with a ? or some other
character you have just created a nasty XSS vector for web apps.


You don't substitute it with "a ? or some other character", you replace
it with U+FFFD (0xEF 0xBF 0xBD in UTF-8). I'd love to hear how that
causes an attack vector.


It doesn't matter what you replace it with.  If the byte sequence is:

0xE0 " >

And you replace those bytes with some other byte in this sort of context:

<input type=text name=foo value="0xE0">
<input type=text name=bar value="$data">

Now do your silly replacement:

<input type=text name=foo value="0xEF 0xBF 0xBD
<input type=text name=bar value="$data">

That now means that IE interprets the value attribute of the foo
element as:

value="0xEF 0xBF 0xBD <input type=text name=bar value="

And now $data is suddenly outside the quoted value attribute! Oops!
Major XSS. Google Groups and Yahoo were both hit by this last year.
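
The difference between the two replacement strategies argued over above
can be sketched as follows. `swallowing_decode` is a deliberately flawed,
hypothetical decoder that trusts the length announced by the lead byte and
discards that many bytes when the sequence turns out to be invalid; the
second path replaces only the offending byte. Python is used purely for
illustration, and neither path is claimed to match what IE or PHP
actually did.

```python
def swallowing_decode(raw: bytes) -> str:
    """Hypothetical flawed decoder: on an invalid sequence it replaces
    every byte the lead byte claimed, including innocent ASCII."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:                      # plain ASCII
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b < 0xE0:
            n = 2
        elif 0xE0 <= b < 0xF0:            # 0xE0 announces 3 bytes
            n = 3
        elif 0xF0 <= b < 0xF8:
            n = 4
        else:
            n = 1
        try:
            out.append(raw[i:i + n].decode("utf-8"))
        except UnicodeDecodeError:
            out.append("\ufffd")          # the whole claimed span is lost
        i += n
    return "".join(out)


markup = b'<input name=foo value="\xe0">\n<input name=bar value="x">'

swallowed = swallowing_decode(markup)
careful = markup.decode("utf-8", errors="replace")

# The flawed decoder eats the closing quote and bracket after 0xE0.
print(swallowed.count('"'), careful.count('"'))
```

With the careful path the `">` after the bad byte survives, so the value
attribute of foo stays closed; with the flawed path one quote disappears
and the following markup ends up inside the attribute.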

Interesting. I assume that was a weakness in the respective
implementation, right? Since


0xE0 " >

should never be regarded as a valid sequence, since neither " nor > is
in the range above 0x7F...


This is (obviously) open to multiple interpretations.

But when I suggested the feature, I said it was for "less stringent apps".
Rasmus' case, the URL, should be more stringent and reject input that
contains wholly invalid UTF-8 sequences: short sequences, overlong
sequences and outright unmappable bytes.
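
For the stringent case, a minimal sketch of rejecting input outright
instead of substituting anything. `is_valid_utf8` is a hypothetical helper
name, and Python's strict decoder stands in for whatever validator a real
application would use; the sample byte strings are illustrative only.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8 from start to end."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "valid text": "h\u00e9llo".encode("utf-8"),
    "short sequence": b"\xe0\xa0",        # 3-byte lead, one byte missing
    "overlong '/'": b"\xc0\xaf",          # '/' encoded in two bytes
    "overlong 3-byte": b"\xe0\x80\xaf",   # '/' encoded in three bytes
    "unmappable byte": b"\xff",           # never appears in UTF-8
}

for label, raw in samples.items():
    print(label, is_valid_utf8(raw))
```

Only the first sample is accepted; the others are refused as a whole, so
nothing downstream ever sees a partially repaired string.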


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
