On 08/24/2012 02:23 PM, Ángel González wrote:
> On 23/08/12 18:06, Rasmus Lerdorf wrote:
>> htmlspecialchars(), htmlentities(), html_entity_decode() and
>> get_html_translation_table() all take an encoding parameter that used to
>> default to iso-8859-1. We changed the default in PHP 5.4 to UTF-8. This
>> is a much more sensible default and in the case of the encoding
>> functions more secure as it prevents invalid UTF-8 from getting through.
>> If you use 8859-1 as the default but your app is actually in UTF-8 or
>> worse, some encoding that isn't low-ascii compatible then
>> htmlspecialchars()/htmlentities() aren't doing what you think they are
>> and you have a glaring security hole in your app.
> I don't see how passing utf-8 as latin1 gets into a security hole. The
> characters you want to replace (&'"<>) in utf-8 are the same as in latin1,
> and it can't get tricked with synchronizations. If it was passing latin1
> to a function expecting utf-8 or "some encoding that isn't low-ascii
> compatible" then I see the hole, but not here.

In 8859-1 no byte is invalid, so anything that doesn't get encoded is
passed through as-is. For example, the byte 0xE0 is a perfectly valid
8859-1 character (à), but if the page is actually UTF-8 then it is the
lead byte of a 3-byte UTF-8 character. IE is famous for having a really
weak Unicode parser, and at least IE6/7 would see the 0xE0 and combine it
with the next 2 bytes to form the UTF-8 char. So, if you had code like
this:

    $str = htmlspecialchars($str); // Assuming iso-8859-1
    echo '<a href="'.$str.'">';

you now have a problem: if the last byte of $str is 0xE0, IE will swallow
the closing " and > characters in your output, leaving you in a very weird
state. IE still thinks you are inside an attribute in the <a> tag, but you
think you are outside in regular HTML mode, so whatever you output next is
escaped for the wrong context and you have a potential XSS.

When htmlspecialchars() is in UTF-8 mode it will not allow invalid UTF-8
byte sequences through and you are safe from this particular problem.

-Rasmus
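
To make the difference concrete, here is a minimal sketch, assuming PHP 5.4
or later. The sample URL and the variable names are made up for
illustration, and the flags and encoding are passed explicitly so the
result does not depend on any version's defaults. It feeds the same string,
ending in a lone 0xE0 byte, through htmlspecialchars() in both modes:

```php
<?php
// Hypothetical attacker-controlled value that ends in a lone 0xE0 byte.
$str = "http://example.com/" . "\xE0";

// ISO-8859-1 mode: every byte is a valid character, so 0xE0 is passed
// through untouched and ends up right before the closing quote.
$latin1 = htmlspecialchars($str, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 mode: a lone 0xE0 is an incomplete sequence, so the whole input
// is rejected and an empty string comes back.
$utf8 = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE the invalid byte is replaced by U+FFFD instead of
// the whole string being dropped.
$subst = htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump(bin2hex(substr($latin1, -1)));           // last byte, as hex
var_dump($utf8 === '');                           // was the input rejected?
var_dump(substr($subst, -3) === "\xEF\xBF\xBD");  // ends in U+FFFD?
```

Only the 8859-1 result still carries the raw 0xE0 into the attribute, which
is the case described above. Whether you prefer the empty string or the
ENT_SUBSTITUTE behaviour is a design choice; either way the stray lead byte
never reaches the browser.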