Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class

Anthony Ferrara Wed, 19 Sep 2012 21:21:57 -0700

Bryan,

> “You hit the nail on the head here. You cannot black-list convert
> ISO-8859-1 to UTF-8. However, when we talk about escaping, we're talking
> about a context where the encoding is already correct, we're just
> preventing special characters from imparting special meaning. In that case,
> escaping is the correct way of handling it.”
>
> We can never safely assume that the encoding is correct. If the encoding
> of the original data is different than the assumed encoding, characters
> with “special meaning” may have different values and will be allowed
> through. For a simple proof-of-concept, see
> http://shiflett.org/blog/2005/dec/google-xss-example.  Now, that is a
> specific exploit for an underlying vulnerability. The vulnerability is the
> fact that htmlentities() doesn’t decode the input before trying to escape
> characters.
>
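To make the mismatch concrete, here is a minimal Python sketch, with Python's `html.escape` standing in for `htmlentities()`. It assumes a UTF-7 payload, the charset usually associated with that Google example; the payload string is illustrative. Escaping finds nothing to escape, yet a consumer that decodes the same bytes as UTF-7 sees real markup.

```python
import html

# Illustrative UTF-7 payload (assumption): every character is plain ASCII,
# so there is no literal '<', '>', '&' or quote for an escaper to touch.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Character-level escaping leaves the payload unchanged...
escaped = html.escape(payload)
print(escaped == payload)  # True

# ...but decoding the same bytes as UTF-7 yields live markup.
print(escaped.encode("ascii").decode("utf-7"))  # <script>alert(1)</script>
```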


Actually, in my mind, that's the role of filtering. You should filter input
into the proper charset. Everything inside the application should use a
consistent character set. If that's the case, these sorts of
vulnerabilities (not to mention a whole host of possible bugs) are no
longer possible...
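A minimal sketch of such a boundary filter in Python, assuming the application standardizes on UTF-8; `filter_utf8` and `InputCharsetError` are hypothetical names. Strict decoding rejects both foreign-charset bytes and non-shortest-form sequences such as `\xc0\xbc` (an overlong encoding of `<`).

```python
class InputCharsetError(ValueError):
    """Hypothetical error: raw input was not valid UTF-8."""


def filter_utf8(raw: bytes) -> str:
    # Hypothetical input filter: everything entering the application must
    # already be valid UTF-8; anything else is rejected at the boundary.
    try:
        # errors="strict" raises instead of silently substituting characters
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise InputCharsetError(f"rejected non-UTF-8 input at byte {exc.start}") from exc


print(filter_utf8("café".encode("utf-8")))  # café
try:
    filter_utf8(b"caf\xe9")  # ISO-8859-1 bytes for "café"
except InputCharsetError as err:
    print(err)  # rejected non-UTF-8 input at byte 3
```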


> What I’m trying to convey is that all context relevant to the operation
> matters. In this case, if characters are compared/replaced at the
> byte-level, we need to decode to the byte-level before performing those
> operations. To take that further, it’s important for everyone to realize
> that encoding doesn’t just apply to character sets; data is encoded for a
> specific layer. This is the same problem that the TCP/IP and OSI layer models solved
> decades ago; we’re just adding layers above the application layer. You
> wouldn’t expect an HTML parser to be able to parse JavaScript because they
> are different encodings. If you wanted to translate an HTML implementation
> cleanly to a JavaScript implementation, you would have to decode the HTML
> and then build a translator to create the same DOM elements in JavaScript. I
> know that’s sort of a blurry line, but I need to wrap this up. Hopefully,
> I’ve conveyed the idea.
>
> The sooner we all grasp this concept of encoding layers, the sooner this
> problem of injection/scripting at every layer goes away. The solution:
> Decode all inputs, halt execution on decoding errors, and then re-encode
> them. Yes, this is going to add overhead. But where security is concerned,
> we have to be willing to accept some overhead.
>
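The decode, halt, re-encode sequence can be sketched in Python as below. The helper name is hypothetical, and the sketch assumes the declared source charset is known and trustworthy and that output pages are served as UTF-8.

```python
import html


def reencode_for_html(raw: bytes, declared_charset: str) -> bytes:
    """Hypothetical helper: decode from the declared charset, then
    re-encode for an HTML text node in UTF-8."""
    # 1. Decode: errors="strict" makes any malformed byte raise
    #    UnicodeDecodeError, halting processing instead of guessing.
    text = raw.decode(declared_charset, errors="strict")
    # 2. Re-encode for the target layer: escape HTML-significant
    #    characters on the decoded text, not on the raw bytes.
    escaped = html.escape(text, quote=True)
    # 3. Emit in the single charset the page declares (assumed UTF-8).
    return escaped.encode("utf-8")


print(reencode_for_html(b"+ADw-b+AD4-", "utf-7"))        # b'&lt;b&gt;'
print(reencode_for_html(b"caf\xe9 & co", "iso-8859-1"))  # b'caf\xc3\xa9 &amp; co'
```

Because escaping runs on decoded text, the UTF-7 `<` is caught; a malformed byte such as `b"\xff"` declared as `"utf-8"` raises `UnicodeDecodeError` and stops processing rather than passing through.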

Again, that's the role of filtering. Inputs should never get to a
presentation layer unfiltered. That's a bigger problem that needs to be
addressed first. But I would concede that it's worth doing again at output
to catch any issues. But those issues it catches should be seen as
application bugs and not a caught attack vector...
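A minimal Python sketch of that output-time safety net, assuming input was already filtered to well-formed text; `render_text` and `UnfilteredDataBug` are hypothetical names. A failure is raised as a bug in the application rather than treated as a blocked attack.

```python
import html


class UnfilteredDataBug(AssertionError):
    """Hypothetical: unfiltered data reached the presentation layer."""


def render_text(value: str) -> str:
    # Input filtering should already have produced well-formed text, so a
    # failure here is reported as an application bug, not a caught attack.
    try:
        value.encode("utf-8")  # lone surrogates cannot be encoded
    except UnicodeEncodeError as exc:
        raise UnfilteredDataBug(f"malformed text reached output: {exc.reason}") from exc
    return html.escape(value)


print(render_text("a < b"))  # a &lt; b
```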


> Okay, with that out of the way, I’ll reiterate my agreement with your
> statement, “I think it strongly depends upon the exact behavior of the
> library. If we do wind up doing transcoding as well as escaping, then that
> may be valid. If we don't, then it wouldn't.”
>
> If the aim of this API is to really tackle the problem, we need to go
> beyond wrapping htmlentities() and htmlspecialchars() and change the names
> to “encode”. If it’s just to maintain the status quo and leave it to
> developers who barely understand encoding or escaping to ensure that their
> entire stack is using the same encoding, then we should leave the name
> as-is.
>

Just wrapping any library is often not a good idea. We'd need to add
meaningful logic in addition to the namespace name change. So yes, I'm in
favor of doing it right at that point...
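One hypothetical shape for an API that adds logic beyond a wrapper, sketched in Python: the escaper is bound to a source charset, transcodes byte input strictly, and then encodes per output layer (HTML text vs. a JavaScript string literal). The class and method names are illustrative assumptions, not the RFC's proposed API, and the JavaScript rule (a small allow-list plus `\uXXXX` for everything else) is one conservative choice among several.

```python
import codecs
import html
from typing import Union


class Escaper:
    """Hypothetical escaper bound to one source charset."""

    SAFE_JS = frozenset(
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,._"
    )

    def __init__(self, charset: str = "utf-8") -> None:
        codecs.lookup(charset)  # unknown charset: LookupError now, not later
        self.charset = charset

    def _decode(self, data: Union[bytes, str]) -> str:
        # Transcoding step: bytes are decoded strictly from the bound
        # charset; malformed input raises UnicodeDecodeError.
        if isinstance(data, bytes):
            return data.decode(self.charset, errors="strict")
        return data

    def escape_html(self, data: Union[bytes, str]) -> str:
        # HTML text nodes and quoted attribute values: & < > " '
        return html.escape(self._decode(data), quote=True)

    def escape_js(self, data: Union[bytes, str]) -> str:
        # Quoted JavaScript string: pass the allow-list through and emit all
        # else as \uXXXX UTF-16 code units, so quotes, backslashes and
        # "</script>" cannot terminate the string or the script element.
        out = []
        for ch in self._decode(data):
            if ch in self.SAFE_JS:
                out.append(ch)
            else:
                units = ch.encode("utf-16-be")
                for i in range(0, len(units), 2):
                    out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
        return "".join(out)


esc = Escaper("iso-8859-1")
print(esc.escape_html(b"caf\xe9 <b>"))  # café &lt;b&gt;
print(esc.escape_js("</script>"))       # \u003c\u002fscript\u003e
```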


> The official PHP documentation discourages the use of
> mysql_real_escape_string:
> http://php.net/manual/en/function.mysql-real-escape-string.php. The
> recommendation is to use a library that is character-set aware, like mysqli
> or PDO. But note that even using mysqli_real_escape_string or PDO::quote
> requires you to manually set the connection-level character-set. I’ve been
> operating on the assumption (there I go assuming) that PDO prepared
> statements were aware of the connection-level character set and mitigated
> this problem; however, I just reviewed PDO’s source code and I’m starting
> to question its implementation. As for your OWASP reference, keep in mind
> that OWASP makes many tiers of recommendations. Notice that manually
> escaping is the last option for mitigating injection problems.
>

In short, that's wrong (MRES is encouraged). But I've taken the reply
off-list as it's off topic here.


> In any case, I’m not here to carry on an endless flame war. I just want to
> make sure that we’re doing what’s necessary to mitigate the number one
> vulnerability in web applications.
>

I don't think this discussion is a flame war. I think it's a very good and
constructive point that needs to be made. It's at least a whole lot more
important and relevant than the last 40 posts on OOP vs Procedural names...

Anthony
