I did not know this would work in attributes. I tried it and yes, it works.
The patch is now in trunk. Please check it.


> Yes, you can escape both a and b such that it works in either context.
> Reference rule #1 and #2 at
> http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_...
> Rule #1 states that data inserted into HTML element content (variable b in your
> example) should escape these 6 characters:
> & --> &amp;
> < --> &lt;
> > --> &gt;
> " --> &quot;
> ' --> &#x27; (&apos; is not recommended)
> / --> &#x2F; (the forward slash is included because it helps end an HTML entity,
> though many implementations leave it off)
> This encoding would escape variable b enough to be put into the <div> body context.
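
A minimal sketch of the six-character escaping described under rule #1 above. The helper name escape_html and its lookup table are hypothetical, chosen for illustration only; this is not web2py's actual implementation.

```python
# Hypothetical helper (not part of web2py): escapes the six characters
# listed under rule #1.
_HTML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
    "/": "&#x2F;",
}


def escape_html(text):
    """Replace each of the six special characters with its entity."""
    # Working character by character avoids double-escaping the "&"
    # that the replacements themselves introduce.
    return "".join(_HTML_ESCAPES.get(ch, ch) for ch in text)


print(escape_html(' x"y '))
```

Applied to the value from the example in this thread, the double quote becomes &quot; and everything else is left alone.
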
> Variable a is being inserted into an HTML attribute. As per rule #2...
> "Except for alphanumeric characters, escape all characters with ASCII values
> less than 256 with the &#xHH; format (or a named entity if available) to
> prevent switching out of the attribute. The reason this rule is so broad is
> that developers frequently leave attributes unquoted. Properly quoted
> attributes can only be escaped with the corresponding quote. Unquoted
> attributes can be broken out of with many characters, including [space] % *
> + , - / ; < = > ^ and |."
> I have spoken to the author of this text, and he indicates this is
> *overencoding*. The overencoding is necessary because developers often leave
> attributes unquoted. If the attribute is quoted, the only way to break out of
> the quoted context is with the corresponding quote. For a few reasons
> dealing with the sequence of parsers, we highly advise encoding all 6 of the
> characters above. This is the same routine as above for rule #1. This makes
> data safe for inclusion in *quoted* HTML attributes but NOT in *unquoted* HTML
> attributes. Thus, there are warnings on the pythonsecurity.org wiki that we
> would like to see in template engine documentation that indicate developers
> should *always* quote HTML attributes. Or, if you wish to be super-safe and
> do the best thing possible, follow the quoted advice above and encode all
> non-alphanumerics below 256.
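
For the "super-safe" option quoted from rule #2, a rough sketch could look like the following. The function name escape_html_attr is hypothetical, and the sketch assumes "alphanumeric" means ASCII letters and digits only; it always uses the &#xHH; form rather than named entities and leaves characters at or above code point 256 untouched.

```python
import string

# Assumption: "alphanumeric" is read as ASCII letters and digits only.
_SAFE_CHARS = set(string.ascii_letters + string.digits)


def escape_html_attr(text):
    """Encode every non-alphanumeric character below 256 as &#xHH;."""
    out = []
    for ch in text:
        if ch in _SAFE_CHARS or ord(ch) >= 256:
            out.append(ch)
        else:
            out.append("&#x%02X;" % ord(ch))
    return "".join(out)


print(escape_html_attr(' x"y '))
```

With this encoding the spaces and the quote are all rewritten, so the value cannot end the attribute even if the template author forgot the surrounding quotes.
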
> I hope this sufficiently answers your question. In practice, an escaping
> routine escaping all 6 of the characters above should be created and used
> for any variables handled by the template engine unless marked as safe.
> > here is the problem as I see it
> > #controller
> > def index(): return dict(a=' x"y ', b=' x"y ')
> > #view
> > <div onclick="{{=a}}">{{=b}}</div>
> > Notice that a and b have the same value. a should be escaped as x\"y
> > while this escaping would be wrong for b.
> > Are you telling me there is a way to escape both a and b that works in
> > both cases, whatever the context?
> > If there is I do not know about it.
> > Massimo
> > > I want to re-raise this issue because I feel it is important.
> > > > > * Do not use cgi.escape for HTML escaping because it does not escape
> > > > > single quotes and may lead to XSS - See
> > > > > http://www.pythonsecurity.org/wiki/web2py/#cross-site-scripting-xss
> > > > > http://www.pythonsecurity.org/wiki/cgi/
> > > > I assume you refer to attribute escaping. When using helpers like
> > > > {{=A(link,_href=url)}} then link is escaped using cgi.escape but url
> > > > is escaped differently (quotes are escaped). The problem is that the
> > > > escape function does not know whether a variable is to be inserted in
> > > > html, css, js, attribute, a string in js, etc. and therefore if
> > > > the function does not know the context it is in, it cannot always escape
> > > > correctly. I do not believe there is a general solution to this
> > > > problem. web2py assumes {{=....}} is escaping HTML/XML. If you need to
> > > > escape attributes we suggest using helpers.  If you need to escape js
> > > > code or strings in js code, you may have to do it manually.
> > > That's not quite what I was getting at. You're right about needing the
> > > context in order to escape correctly though. I think the default escaping
> > > should include single and double quotes. cgi.escape escapes double quotes
> > > but not single quotes.
> > > I thought that the default escaping was going through cgi.escape by way of
> > > the xmlescape method, but given the below, that appears to not be the case.
> > > I'm a little confused.
> > > Here's an example of something I don't think I should be able to do:
> > > Controller:         return dict(data='" onload="alert(1);" bad="')
> > > View:               <body class="{{=data}}"></body>
> > > Output:            <body class="" onload="alert(1);" bad=""></body>
> > > The same attack works with single quoted attributes. While you're right,
> > > we can't do full proper escaping without knowing the context, I don't think
> > > quotes should be permitted in any web context.
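
The attribute breakout in the example above can be reproduced outside web2py with a short sketch. It assumes Python 3's standard html.escape as a stand-in: quote=False mimics an escaper that leaves quotes alone, and quote=True one that also escapes both quote characters. The %-formatted template is only an illustration, not how web2py renders views.

```python
from html import escape

payload = '" onload="alert(1);" bad="'
template = '<body class="%s"></body>'

# Escaper that handles only &, < and >: the quotes pass through
# and close the class attribute early.
unsafe = template % escape(payload, quote=False)

# Escaper that also handles " and ': the value stays inside the attribute.
safe = template % escape(payload, quote=True)

print(unsafe)
print(safe)
```

The first line printed matches the output shown in the example; in the second, each double quote appears as &quot; and no new attribute is created.
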
