Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class

Anthony Ferrara Tue, 18 Sep 2012 12:13:10 -0700

Bryan,

On Tue, Sep 18, 2012 at 2:52 PM, Bryan C. Geraghty <br...@ravensight.org> wrote:


> Anthony,
>
> I’ll concede that the term “escaping” is improperly used in many places;
> even in the OWASP documentation.
>
> But I’ll point out that the CWE document is identifying a distinction in
> the two terms by saying,  “This overlapping usage extends to the Web,
> such as the "escape" JavaScript function whose purpose is stated to be
> encoding”.
>

There is a distinction between them. But in this case it's not particularly
relevant (as both work quite fine). I'll elaborate further in a second.


> But when you say, “With the end result being the exact same...”, I don’t
> think you’ve thought it through. I’ve read some of your stuff and I’m
> pretty confident that you understand the benefits of white-listing over
> black-listing. For the uninitiated, yes, a black-list can be configured to
> produce the same results at a given point-in-time, but the fundamental
> approach is different.  A white-list operates on an explicit specification
> and lets nothing else through. A black-list assumes that the input data is
> mostly correct and it filters out the bad. To add to that, how do you
> convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?
>

You hit the nail on the head here. You cannot black-list convert ISO-8859-1
to UTF-8. However, when we talk about escaping, we're talking about a
context where the encoding is already correct; we're just preventing
special characters from imparting special meaning. In that case, escaping
is the correct way of handling it.
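
A minimal sketch of that point, using Python's standard library as a stand-in (the thread itself is about PHP, and the function name escape_html_text is hypothetical). It assumes the input is already a correctly decoded string and only rewrites the characters that carry special meaning in HTML:

```python
import html


def escape_html_text(value: str) -> str:
    # Assumption: `value` is already correctly decoded text.
    # Only characters with special meaning in HTML (&, <, >, ", ') are rewritten;
    # everything else passes through unchanged.
    return html.escape(value, quote=True)


untrusted = '<script>alert("x")</script>'
escaped = escape_html_text(untrusted)

# The markup characters no longer have special meaning to an HTML parser,
# and the original text is fully recoverable by unescaping.
print(escaped)
print(html.unescape(escaped) == untrusted)
```

Nothing about the character set changes here; the data is the same text, just written so that a parser cannot mistake it for markup.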

But if you wanted to output arbitrary input into a UTF-8 document, you
would also need to ensure that it's encoded properly into UTF-8. So I can
see your distinction applying to that case, though from a different angle.

Escaping preserves the security context. Encoding preserves the semantic
context. You could escape away all invalid UTF-8 bytes, but you'd lose the
meaning of the original character set. So semantically, encoding is
necessary. But from a security perspective, the encoding doesn't really
matter much. What matters is the security context (not injecting harmful
code, etc).
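
The difference can be sketched as follows, again in Python as a stand-in and with a made-up sample string. It assumes "escaping away invalid bytes" means substituting U+FFFD for anything that is not valid UTF-8, which is only one possible strategy:

```python
# ISO-8859-1 bytes for "café"; the trailing 0xE9 is not valid UTF-8 on its own.
latin1_bytes = "café".encode("iso-8859-1")

# Transcoding: interpret the bytes in their real character set, then emit UTF-8.
# The meaning of the original text is preserved.
transcoded = latin1_bytes.decode("iso-8859-1").encode("utf-8")

# Scrubbing: treat the bytes as UTF-8 and replace anything invalid with U+FFFD.
# The output is valid UTF-8, but the original character is gone.
scrubbed = latin1_bytes.decode("utf-8", errors="replace").encode("utf-8")

print(transcoded.decode("utf-8"))
print(scrubbed.decode("utf-8"))
```

Both results are well-formed UTF-8, so both are equally safe to hand to a later escaping step; only the transcoded one still says what the author wrote.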

Now, both can be handled by the same routine. But that's not necessary to
preserve the security aspect. And that's why I objected to using the term
"encoding" here. If we want to go that route, that's fine. But you don't
need to encode for security. Escaping will handle that (possibly at the
expense of invalid semantic meaning).


> Your reference to mysql_real_escape_string is exactly the point I’m trying
> to make. The use of that function is “discouraged” because it DID escape;
> it looked for specific bad characters. It was fundamentally flawed. And
> that is the functionality PHP developers, as you just demonstrated, will
> refer to. The current recommendation is to use a library that properly
> encodes the entire data stream.
>

How is mres fundamentally flawed? And how is it discouraged? It's actually
listed as a valid defense by OWASP:
https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#Defense_Option_3:_Escaping_All_User_Supplied_Input

The only two ways of securely getting data to MySQL are escaping it or
binding it as a parameter on a prepared statement. Neither of them encodes
a data stream (the PS uses a binary format that puts the data in plain
binary form, as is, with a header to identify length).
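
The two approaches can be sketched side by side. This uses Python's sqlite3 module as a stand-in, not MySQL: SQLite's literal quoting rule (doubling the single quote) differs from what mysql_real_escape_string does, and quote_sqlite_literal is a hypothetical helper written only for this illustration:

```python
import sqlite3


def quote_sqlite_literal(value: str) -> str:
    # Hypothetical helper: inside a SQLite string literal, the only
    # special character is the single quote, escaped by doubling it.
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

hostile = "x'); DROP TABLE users; --"

# Option 1: escape the value and place it in the statement text.
conn.execute("INSERT INTO users (name) VALUES (" + quote_sqlite_literal(hostile) + ")")

# Option 2: bind the value as a parameter; it never becomes part of the SQL text.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

rows = conn.execute("SELECT name FROM users").fetchall()
print(rows)
```

In both cases the stored value is the original string, byte for byte; neither path re-encodes it.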

Black listing works fine for a specified format (like XML, like HTML, like
SQL, like JavaScript). Where you get in trouble with black lists is when
your data format isn't specified (hence edge-cases aren't well known) or
when you're not serializing to a format (generic input black lists). But
for escaping output, black lists are a very well known, well understood,
and easily implemented approach.


> I’ll also agree that consistency with the industry is not as important
> because there seem to be plenty of misuses. However, I do think that we
> should use terminology that sets the functionality apart. So, given the
> operating mode difference and the precedent set by mysql_escape_string,
> mysql_real_escape_string, etc., I think “encode” is the way to go.
>

I think it strongly depends upon the exact behavior of the library. If we
do wind up doing transcoding as well as escaping, then that may be valid.
If we don't, then it wouldn't.

But I think we can both agree on the need...

Anthony
