Anthony,

I'll concede that the term "escaping" is improperly used in many places, even in the OWASP documentation. But I'll point out that the CWE document identifies a distinction between the two terms by saying, "This overlapping usage extends to the Web, such as the "escape" JavaScript function whose purpose is stated to be encoding".

But when you say, "With the end result being the exact same...", I don't think you've thought it through. I've read some of your stuff and I'm pretty confident that you understand the benefits of white-listing over black-listing. For the uninitiated: yes, a black-list can be configured to produce the same results at a given point in time, but the fundamental approach is different. A white-list operates on an explicit specification and lets nothing else through. A black-list assumes that the input data is mostly correct and filters out the bad. To add to that, how do you convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?

Your reference to mysql_real_escape_string is exactly the point I'm trying to make. The use of that function is "discouraged" because it DID escape; it looked for specific bad characters. It was fundamentally flawed. And that is the functionality PHP developers, as you just demonstrated, will refer to. The current recommendation is to use a library that properly encodes the entire data stream.

I'll also agree that consistency with the industry is not as important because there seem to be plenty of misuses. However, I do think that we should use terminology that sets the functionality apart. So, given the operating mode difference and the precedent set by mysql_escape_string, mysql_real_escape_string, etc., I think "encode" is the way to go.

Thanks,
Bryan

From: Anthony Ferrara [mailto:ircmax...@gmail.com]
Sent: Tuesday, September 18, 2012 1:09 PM
To: Bryan C.
Geraghty
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class

Bryan et al,

On Tue, Sep 18, 2012 at 1:58 PM, Bryan C. Geraghty <br...@ravensight.org> wrote:

> Hello everyone,
>
> Paddy is correct here. The purpose of this API is output ENCODING which is a very good thing. This discussion provides a very good case for a point I made via Twitter this morning: In this RFC, all uses of the term "escape" should be replaced by the term "encode".
>
> This is not solely a problem with this RFC. The term "escape" is being used by developers in the industry when they mean "encoding". This is a bad thing because, from a security perspective, escaping is exactly the opposite of encoding.

It's a very common thing: http://cwe.mitre.org/data/definitions/116.html

> The usage of the "encoding" and "escaping" terms varies widely. For example, in some programming languages, the terms are used interchangeably, while other languages provide APIs that use both terms for different tasks. This overlapping usage extends to the Web, such as the "escape" JavaScript function whose purpose is stated to be encoding. Of course, the concepts of encoding and escaping predate the Web by decades. Given such a context, it is difficult for CWE to adopt a consistent vocabulary that will not be misinterpreted by some constituency.

I think that picking one, and sticking with it is fine. No matter which is chosen...

> - Escaping is done by setting up a black-list and replacing those elements with an approved variant.
> - Encoding is done by converting all of the input data into the target format. Some bytes may end up being exactly the same but they are all processed.

With the end result being the exact same...

> I understand why people on this list are associating the functionality defined in this RFC with filtering because the name is leading them astray.
> Besides the fundamental difference in the definitions of each item, the security industry is using the term "encoding"; take a look at the OWASP documentation for a quick example.

The OWASP documentation uses them interchangeably. However, specifically for this task, the ESAPI is described as follows: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

> The OWASP <https://www.owasp.org/index.php/ESAPI> ESAPI project has created an escaping library in a variety of languages including Java, PHP, Classic ASP, Cold Fusion, Python, and Haskell.

> If we want developers with little application security background to be able to understand these things, we need to be consistent.

In this case, I'm not sure consistency with the industry is as important (mainly because the industry is itself inconsistent). The important thing is to pick one and stick to it. I would suggest "escape" mainly because people in PHP are already familiar with it (via mysql_real_escape_string, etc)...

Anthony
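
For readers following the thread, the two definitions debated above (replacing only black-listed characters versus processing every character against a white-list) can be sketched roughly as follows. This is an illustrative sketch in Python, not the RFC's API: the function names, the black-list contents, and the choice of "safe" characters are all assumptions made for the example.

```python
import string

# Hypothetical black-list: only these characters are replaced.
# Anything not listed passes through untouched.
BLACKLIST = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

# Hypothetical white-list: only these characters are emitted unchanged.
SAFE = frozenset(string.ascii_letters + string.digits)


def escape_blacklist(text: str) -> str:
    """'Escaping' in the thread's sense: replace known-bad characters only."""
    return "".join(BLACKLIST.get(ch, ch) for ch in text)


def encode_whitelist(text: str) -> str:
    """'Encoding' in the thread's sense: every character is processed, and
    anything outside the safe set becomes a numeric character reference."""
    return "".join(ch if ch in SAFE else f"&#x{ord(ch):X};" for ch in text)


sample = "<a href='x'>"
print(escape_blacklist(sample))  # the single quotes pass through unchanged
print(encode_whitelist(sample))  # every non-alphanumeric character is converted
```

With this particular black-list the single quote is not covered, so it survives `escape_blacklist` but not `encode_whitelist`. Adding the quote to the black-list would make the two outputs equally safe for this input, which is the "same result at a given point in time" caveat raised in the reply.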