On 21.08.2008 at 19:03, Rasmus Lerdorf wrote:
David Zülke wrote:
Interesting. I assume that was a weakness in the respective
implementation, right? Since
0xE0 " >
should never be regarded as a valid sequence, since neither " nor >
is in the range above 0x7F...
But that's what we are talking about. What to do with invalid
sequences. The E0 says that the following 2 bytes are part of the
UTF-8 character. So this is a 3-byte sequence. Together these 3
bytes are not valid, so Microsoft chose to replace those 3 with some
other character. And yes, Microsoft is notoriously bad at reading
specs. I don't think it is completely clear what to do here, but
we do know that we shouldn't do that.
Well, to me the invalid part would be just "0xE0", since it is
incomplete (bytes 0x7F and below are never part of a multi-byte
sequence, so they don't count towards the sequence here). So 0xE0
alone would be replaced by 0xEF 0xBF 0xBD (U+FFFD), and then you
don't have an XSS, unless I'm mistaken :)
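To make that concrete, here is a rough sketch in Python (not PHP, and
not what the patch does). The helper names are made up for
illustration, and the last step assumes that Python's built-in decoder
with errors="replace" substitutes only the offending lead byte:

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return 0x80 <= byte <= 0xBF


def expected_continuations(lead: int) -> int:
    """Number of continuation bytes a lead byte announces (0 if none)."""
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:
        return 3
    return 0


payload = b'\xe0">'  # 0xE0 followed by the ASCII bytes " and >
lead, rest = payload[0], payload[1:]

# Count how many of the announced continuation bytes really follow.
found = 0
for byte in rest[:expected_continuations(lead)]:
    if not is_continuation(byte):
        break
    found += 1

# Replace only the invalid part and re-encode as UTF-8.
cleaned = payload.decode("utf-8", errors="replace").encode("utf-8")
print(found, cleaned.hex(" "))
```

Here found stays at 0, because " (0x22) is not a continuation byte, so
the invalid sequence is the lone 0xE0. The " and > come out untouched
and can still be escaped as usual afterwards.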
If Microsoft regards 0x7F and below as valid sequence members, then
that is unfortunate, but that shouldn't stop PHP from doing it
properly, as we all know better now, don't we :)
I mean, what does the patch do in that case? Strip it all? Then it's
the same problem. Strip 0xE0 only? Then we could just as well insert
U+FFFD instead. No difference.
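The three options can be compared side by side. Again, this is only a
sketch in Python, not the patch itself: naive_consume is a hypothetical
decoder that trusts the length announced by the lead byte (my reading
of the behaviour described above), while the other two use Python's
built-in error handlers.

```python
def naive_consume(data: bytes) -> str:
    """Hypothetical decoder that trusts the lead byte's announced length
    and replaces the whole announced span if it turns out to be invalid."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF7:
            length = 4
        else:
            length = 1
        chunk = data[i:i + length]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            out.append("\ufffd")  # swallows every byte in the span
        i += length
    return "".join(out)


payload = b'\xe0">'

strip_all = naive_consume(payload)                         # " and > are gone
strip_lead = payload.decode("utf-8", errors="ignore")      # 0xE0 dropped
replace_lead = payload.decode("utf-8", errors="replace")   # 0xE0 -> U+FFFD

print(repr(strip_all), repr(strip_lead), repr(replace_lead))
```

Only the first variant eats the " and >. Stripping 0xE0 and replacing
it with U+FFFD both leave the two ASCII bytes in place, which is why I
see no difference between those two as far as the XSS goes.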
David
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php