I'm looking at xmlescape in html.py: http://code.google.com/p/web2py/source/browse/gluon/html.py?r=#96
    cgi.escape(data, quote).replace("'","&#x27;")

This looks good. I need to do some performance analysis of replace() to see if I can speed up encoding by walking the string only once, but that is for another day.

I also tested my proof of concept from the original post. It doesn't work, and I can see that proper escaping is being done. Great work!

Craig

On Wed, Jul 14, 2010 at 5:26 PM, mdipierro <mdipie...@cs.depaul.edu> wrote:
> I did not know this would work in attributes. I tried it and yes, it
> works!
> The patch is now in trunk. Please check it.
>
> Massimo
>
> On 14 Jul, 12:01, Craig Younkins <cyounk...@gmail.com> wrote:
> > Yes, you can escape both a and b such that it works in either context.
> >
> > Reference rules #1 and #2 on
> > http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_...
> >
> > Rule #1 states that data inserted into HTML element content (variable b
> > in your example) should escape these 6 characters:
> >
> >   & --> &amp;
> >   < --> &lt;
> >   > --> &gt;
> >   " --> &quot;
> >   ' --> &#x27;    &apos; is not recommended
> >   / --> &#x2F;    forward slash is included as it helps end an HTML entity
> >
> > Many implementations leave off the forward slash, though. This encoding
> > would escape variable b enough to be put into the <div> body context.
> >
> > Variable a is being inserted into an HTML attribute. As per rule #2...
> >
> > "Except for alphanumeric characters, escape all characters with ASCII
> > values less than 256 with the &#xHH; format (or a named entity if
> > available) to prevent switching out of the attribute. The reason this
> > rule is so broad is that developers frequently leave attributes
> > unquoted. Properly quoted attributes can only be escaped with the
> > corresponding quote. Unquoted attributes can be broken out of with many
> > characters, including [space] % * + , - / ; < = > ^ and |."
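The two OWASP rules quoted above can be sketched in Python 3 roughly as follows. This is a minimal illustration under stated assumptions, not web2py's xmlescape: the names escape_html_content and escape_html_attribute are hypothetical, and str.translate is used so the rule #1 encoder walks the string once rather than once per replace() call (whether that is actually faster than chained replace() would still need measuring).

```python
# Minimal sketch of the two OWASP encoders quoted above (Python 3.7+).
# Function names are hypothetical; this is not web2py's xmlescape.

# Rule #1: the six characters to encode for HTML element content
# (and, per the thread, for *quoted* attribute values).
_RULE1_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",  # &apos; is not recommended
    "/": "&#x2F;",  # optional; many implementations leave it off
})


def escape_html_content(data: str) -> str:
    """Encode the six rule #1 characters in a single pass over data.

    translate() maps each input character exactly once, so unlike chained
    replace() calls there is no need to handle '&' first.
    """
    return data.translate(_RULE1_TABLE)


def escape_html_attribute(data: str) -> str:
    """Rule #2: encode every non-alphanumeric character below 256 as &#xHH;.

    ASCII letters and digits pass through; code points at or above 256
    are left unchanged, since the rule only covers values below 256.
    """
    out = []
    for ch in data:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ord(ch) < 256:
            out.append("&#x%02X;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


print(escape_html_content("<a href=\"x\">it's</a>"))
# &lt;a href=&quot;x&quot;&gt;it&#x27;s&lt;&#x2F;a&gt;
print(escape_html_attribute('a b"c'))
# a&#x20;b&#x22;c
```

The rule #2 sketch always emits the numeric &#xHH; form rather than named entities, which the quoted rule permits.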
> >
> > I have spoken to the author of this text, and he indicates this is
> > *overencoding.* The overencoding is necessary because developers often
> > leave attributes unquoted. If the attribute is quoted, the only way to
> > break out of the quoted context is with the corresponding quote. For a
> > few reasons dealing with the sequence of parsers, we highly advise
> > encoding all 6 of the characters above. This is the same routine as
> > above for rule #1. This makes data safe for inclusion in *quoted* HTML
> > attributes but NOT in *unquoted* HTML attributes. Thus, there are
> > warnings on the pythonsecurity.org wiki, which we would like to see in
> > template engine documentation, stating that developers should *always*
> > quote HTML attributes. Or, if you wish to be super-safe and do the best
> > thing possible, follow the quoted advice above and encode all
> > non-alphanumerics below 256.
> >
> > I hope this sufficiently answers your question. In practice, an
> > escaping routine that escapes all 6 of the characters above should be
> > created and used for any variables handled by the template engine
> > unless they are marked as safe.
> >
> > Best,
> > Craig
> >
> > On Wed, Jul 14, 2010 at 12:25 PM, mdipierro <mdipie...@cs.depaul.edu>
> > wrote:
> > > here is the problem as I see it
> > >
> > > #controller
> > > def index(): return dict(a=' x"y ', b=' x"y ')
> > >
> > > #view
> > > <div onclick="{{=a}}">{{=b}}</div>
> > >
> > > Notice that a and b have the same value. a should be escaped as x\"y
> > > while this escaping would be wrong for b.
> > > Are you telling me there is a way to escape both a and b that works
> > > both ways, whatever the context?
> > > If there is, I do not know about it.
> > >
> > > Massimo
> > >
> > > On 14 Jul, 09:52, Craig Younkins <cyounk...@gmail.com> wrote:
> > > > I want to re-raise this issue because I feel it is important.
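Craig's answer to Massimo's example, that one HTML escaping routine keeps the same value inside both the onclick attribute and the div body, can be checked with the standard library. This sketch assumes Python 3 and replaces the web2py view with plain string formatting (render_view is hypothetical); it uses html.escape, which encodes & < > " and ' but not the optional /. Parsing the result back shows the div keeps exactly one attribute and both values decode to the original text.

```python
# Sketch: render Massimo's view with one HTML escaping routine, then parse
# the result back to confirm neither value escaped its context.
import html
from html.parser import HTMLParser


def render_view(a: str, b: str) -> str:
    # Hypothetical stand-in for <div onclick="{{=a}}">{{=b}}</div>;
    # web2py's template engine itself is not reproduced here.
    return '<div onclick="%s">%s</div>' % (html.escape(a), html.escape(b))


class DivInspector(HTMLParser):
    """Records the div's attributes and the text found in the markup."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities in text
        self.attrs = None
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.attrs = attrs  # attribute values arrive already unescaped

    def handle_data(self, data):
        self.text += data


value = ' x"y '
markup = render_view(value, value)
print(markup)  # <div onclick=" x&quot;y "> x&quot;y </div>

inspector = DivInspector()
inspector.feed(markup)
inspector.close()
print(inspector.attrs)  # [('onclick', ' x"y ')]
print(repr(inspector.text))  # ' x"y '
```

This only covers the HTML layer. Whether the decoded onclick value is valid JavaScript is the separate JS-escaping question Massimo raises.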
> > > >
> > > > > > * Do not use cgi.escape for HTML escaping because it does not
> > > > > > escape single quotes and may lead to XSS - See
> > > > > > http://www.pythonsecurity.org/wiki/web2py/#cross-site-scripting-xss
> > > > > > and http://www.pythonsecurity.org/wiki/cgi/
> > > > >
> > > > > I assume you refer to attribute escaping. When using helpers like
> > > > > {{=A(link,_href=url)}}, link is escaped using cgi.escape but url
> > > > > is escaped differently (quotes are escaped). The problem is that the
> > > > > escape function does not know whether a variable is to be inserted
> > > > > in HTML, CSS, JS, an attribute, a string in JS, etc., and therefore,
> > > > > if the function does not know the context it is in, it can never
> > > > > always escape correctly. I do not believe there is a general
> > > > > solution to this problem. web2py assumes {{=....}} is escaping
> > > > > HTML/XML. If you need to escape attributes, we suggest using
> > > > > helpers. If you need to escape JS code or strings in JS code, you
> > > > > may have to do it manually.
> > > >
> > > > That's not quite what I was getting at. You're right about needing
> > > > the context in order to escape correctly, though. I think the
> > > > default escaping should include single and double quotes. cgi.escape
> > > > escapes double quotes but not single quotes.
> > > >
> > > > I thought that the default escaping was going through cgi.escape by
> > > > way of the xmlescape method, but given the example below, that
> > > > appears not to be the case. I'm a little confused.
> > > >
> > > > Here's an example of something I don't think I should be able to do:
> > > >
> > > > Controller: return dict(data='" onload="alert(1);" bad="')
> > > > View: <body class="{{=data}}"></body>
> > > > Output: <body class="" onload="alert(1);" bad=""></body>
> > > >
> > > > The same attack works with single-quoted attributes. While you're
> > > > right that we can't do full, proper escaping without knowing the
> > > > context, I don't think quotes should be permitted in any web context.
> > > > --
> > > > Craig Younkins
> >
> > --
> > Craig Younkins

--
Craig Younkins
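Craig's proof of concept, and the single-quoted variant he mentions, can be reproduced without a web2py install. Because cgi.escape was removed in Python 3.8, legacy_cgi_escape below is a hypothetical re-creation of its behavior (& < > always, " only when quote is true). patched_escape mirrors the xmlescape line quoted at the top of the thread. Only the attribute-quoting behavior is modelled here, not web2py's templating.

```python
# Sketch: why escaping double quotes alone does not stop attribute breakout.
# legacy_cgi_escape is a hypothetical re-creation of Python 2's
# cgi.escape(s, quote), which no longer exists in Python 3.8+.


def legacy_cgi_escape(s: str, quote: bool = False) -> str:
    s = s.replace("&", "&amp;")  # first, so later entities are not re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s  # single quotes are left untouched


def patched_escape(s: str) -> str:
    # Same pattern as the xmlescape line quoted at the top of the thread.
    return legacy_cgi_escape(s, True).replace("'", "&#x27;")


double_payload = '" onload="alert(1);" bad="'
single_payload = "' onload='alert(1);' bad='"

# Double-quoted attribute: quote=True already keeps the payload inside it.
safe_double = '<body class="%s"></body>' % legacy_cgi_escape(double_payload, True)

# Single-quoted attribute: the same routine lets the payload break out.
broken_single = "<body class='%s'></body>" % legacy_cgi_escape(single_payload, True)

# With the extra replace(), the single quotes become &#x27; and stay inside.
safe_single = "<body class='%s'></body>" % patched_escape(single_payload)

for line in (safe_double, broken_single, safe_single):
    print(line)
```

The second line printed still contains a live onload attribute, which is the gap the extra replace("'", "&#x27;") closes.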