On 08/24/2012 02:23 PM, Ángel González wrote:
> On 23/08/12 18:06, Rasmus Lerdorf wrote:
>> htmlspecialchars(), htmlentities(), html_entity_decode() and
>> get_html_translation_table() all take an encoding parameter that used to
>> default to iso-8859-1. We changed the default in PHP 5.4 to UTF-8. This
>> is a much more sensible default and in the case of the encoding
>> functions more secure as it prevents invalid UTF-8 from getting through.
>> If you use 8859-1 as the default but your app is actually in UTF-8 or,
>> worse, some encoding that isn't low-ascii compatible, then
>> htmlspecialchars()/htmlentities() aren't doing what you think they are
>> and you have a glaring security hole in your app.
> I don't see how passing utf-8 as latin1 opens a security hole. The
> characters you want to replace (&'"<>) are the same in utf-8 as in
> latin1, and it can't get tricked by synchronization issues. If it were
> passing latin1 to a function expecting utf-8 or "some encoding that
> isn't low-ascii compatible" then I see the hole, but not here.

In 8859-1 no chars are invalid so anything that doesn't get encoded will
get passed through as-is. For example the byte 0xE0 is a perfectly valid
8859-1 character (à), but if the page is actually UTF-8 then this
becomes the first byte of a 3-byte UTF-8 character. IE is famous for
having a really weak Unicode parser and at least IE6/7 would see the
0xE0 and combine it with the next 2 bytes to form the UTF-8 char, even
if those bytes are not valid continuation bytes.
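
To make the byte-level point concrete, here is a minimal sketch of how a
decoder can tell from the lead byte alone how long a UTF-8 sequence
claims to be. The helper name utf8_sequence_length() is hypothetical,
written only for this illustration; it is not a PHP built-in, and it
deliberately ignores finer rules such as overlong encodings.

```php
<?php
// Hypothetical helper (not part of PHP): returns how many bytes a UTF-8
// sequence claims to occupy, judging only by its first byte, or 0 if the
// byte cannot start a sequence. Simplified: it does not reject overlong
// lead bytes such as 0xC0/0xC1 or out-of-range ones such as 0xF5.
function utf8_sequence_length($byte)
{
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx: plain ASCII
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0;                              // continuation or invalid byte
}

echo utf8_sequence_length(0xE0), "\n";     // 3
echo utf8_sequence_length(ord('"')), "\n"; // 1
```

A strict decoder would go on to require that each following byte falls
in the range 0x80-0xBF. Neither " (0x22) nor > (0x3E) does, so a strict
decoder treats the 0xE0 as an error instead of consuming them.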

So, if you had code like this:

$str = htmlspecialchars($str);  // Assuming iso-8859-1
echo '<a href="'.$str.'">';

You now have a problem: if the last byte of $str is 0xE0, IE will
swallow the closing " and > characters in your output, leaving you in a
very weird state. IE still thinks you are inside an attribute in the <a>
tag, but you think you are outside in regular HTML mode, so whatever you
output next will be escaped for the wrong context and you have a
potential XSS.

When htmlspecialchars() is in UTF-8 mode it will not allow invalid UTF-8
byte sequences through and you are safe from this particular problem.
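
The difference between the two modes can be checked with a short
script. This is only a sketch: the sample input "abc\xE0" is made up
for the illustration, and the flags are passed explicitly because the
default flags differ between PHP versions.

```php
<?php
// Made-up input ending in a lone 0xE0 byte: "à" in ISO-8859-1, but a
// truncated 3-byte sequence if the page is served as UTF-8.
$str = "abc\xE0";

$latin1 = htmlspecialchars($str, ENT_QUOTES, 'ISO-8859-1');
$utf8   = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

// In ISO-8859-1 mode the stray byte passes through untouched, so the
// emitted tag ends with the bytes 0xE0, 0x22 ("), 0x3E (>).
$tag = '<a href="' . $latin1 . '">';
echo bin2hex(substr($tag, -3)), "\n"; // e0223e

// In UTF-8 mode the invalid sequence is rejected: without ENT_IGNORE or
// ENT_SUBSTITUTE, htmlspecialchars() returns an empty string.
var_dump($utf8 === ''); // bool(true)
```

If you would rather keep the rest of the string than get an empty
result, ENT_SUBSTITUTE replaces the invalid sequence with U+FFFD
instead of letting it through.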

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
