>> Should these functions really discard the whole string for a single
>> incomplete sequence?
>
> I think since it is not possible to recover true content of the string,
> it is ok to return failure value. Cutting it in random places or
> ignoring problems doesn't seem a good idea - it might lead to all kinds
> of nasty things, such as security filtering checking one data and
> database getting entirely different data.

Instead of getting a simple sanitizing function, users are forced to
check for errors. How good is that? It makes code either complex or
unreliable.

htmlspecialchars() and htmlentities() are not used to sanitize database
data. What kind of errors do you expect in htmlspecialchars()? I think
the supported charsets don't have alternative symbols at 0x22, 0x26,
0x27, 0x3C, 0x3E. Only CJK charsets and htmlentities() might have
issues. With any other charset you know the start and end byte of each
symbol.

If you think that broken UTF-8 can cause issues, strip or sanitize the
broken symbols. If users detect an error in htmlspecialchars(), they
will use str_replace() to provide some failsafe instead of losing the
whole text, and that won't solve the security issues.

--
Tomas

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
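
For readers following the thread, here is a minimal sketch of the two behaviours being argued about: failing the whole string on one incomplete sequence versus sanitizing only the broken symbol. It assumes PHP 7 or later and uses the ENT_SUBSTITUTE flag, which was added to htmlspecialchars() in PHP 5.4 and may postdate this discussion. The helper name escape_lossy() is hypothetical and not part of PHP or of the thread.

```php
<?php
// Hypothetical helper (not from the thread): escape text without
// discarding the whole string when it holds a broken UTF-8 sequence.
function escape_lossy(string $text): string
{
    // ENT_SUBSTITUTE replaces each invalid sequence with U+FFFD
    // instead of making htmlspecialchars() return an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The trailing \xC3 is the lead byte of a 2-byte sequence with no
// continuation byte, i.e. a single incomplete sequence.
$broken = "<b>caf\xC3";

// Strict behaviour: the whole string is lost.
var_dump(htmlspecialchars($broken, ENT_QUOTES, 'UTF-8'));

// Lossy behaviour: only the broken symbol is replaced; the five
// special bytes (0x22, 0x26, 0x27, 0x3C, 0x3E) are still escaped.
var_dump(escape_lossy($broken));
```

Which behaviour is preferable is exactly the disagreement above: the strict call forces callers to check for an empty result, while the lossy call keeps the text at the cost of silently altering it.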