I'm looking at xmlescape in html.py:
http://code.google.com/p/web2py/source/browse/gluon/html.py?r=#96

cgi.escape(data, quote).replace("'","&#x27;")

This looks good. I need to do some performance analysis of replace() to see
if I can speed up encoding by only walking the string once, but that is for
another day.
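
A minimal sketch of the two approaches being compared, assuming Python 3 (where cgi.escape is no longer available, so the chained version spells out each replacement by hand). The names escape_chained and escape_single_pass are hypothetical and are not web2py functions:

```python
# Hypothetical helpers for illustration only; this is not web2py's xmlescape.

def escape_chained(data):
    """Escape &, <, >, " and ' with one replace() pass per character."""
    return (data.replace("&", "&amp;")   # must run first, or later entities get double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#x27;"))

# The table is built once; str.translate then walks the input a single time.
_ESCAPE_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
})

def escape_single_pass(data):
    """Escape the same five characters in one pass over the string."""
    return data.translate(_ESCAPE_TABLE)
```

Whether the single pass is actually faster is not established here; it would need to be measured (for example with timeit) on representative template output.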

I also tested my proof of concept from the original post. The attack no
longer works, and I can see the proper escaping being done. Great work!
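
The check can be approximated outside web2py with a sketch like the one below. It assumes Python 3 and uses html.escape from the standard library as a stand-in for the patched xmlescape (with quote=True it escapes both quote characters). render_body is a hypothetical helper, not web2py's template engine:

```python
from html import escape  # stand-in for the patched xmlescape (assumption)

def render_body(data):
    """Hypothetical stand-in for the view <body class="{{=data}}"></body>."""
    return '<body class="%s"></body>' % escape(data, quote=True)

# Payload from the proof of concept quoted further down in this thread.
payload = '" onload="alert(1);" bad="'
output = render_body(payload)
print(output)
# prints: <body class="&quot; onload=&quot;alert(1);&quot; bad=&quot;"></body>
```

Because every double quote in the payload becomes &quot;, the value can no longer close the class attribute and start a new one.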

Craig

On Wed, Jul 14, 2010 at 5:26 PM, mdipierro <mdipie...@cs.depaul.edu> wrote:

> I did not know this would work in attributes. I tried and yes, it
> works!
> The patch is now in trunk. Please check it.
>
> Massimo
>
> On 14 Jul, 12:01, Craig Younkins <cyounk...@gmail.com> wrote:
> > Yes, you can escape both a and b such that it works in either context.
> >
> > Reference rule #1 and #2 on
> > http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_...
> >
> > Rule #1 states that data inserted into HTML element content (variable b in
> > your example) should escape these 6 characters:
> >
> > & --> &amp;
> > < --> &lt;
> > > --> &gt;
> > " --> &quot;
> > ' --> &#x27;  (&apos; is not recommended)
> > / --> &#x2F;  (forward slash is included as it helps end an HTML entity)
> >
> > Many implementations leave off the forward slash, though. This encoding
> > escapes variable b enough for it to be put into the <div> body context.
> >
> > Variable a is being inserted into an HTML attribute. As per rule #2...
> >
> > "Except for alphanumeric characters, escape all characters with ASCII
> > values less than 256 with the &#xHH; format (or a named entity if
> > available) to prevent switching out of the attribute. The reason this rule
> > is so broad is that developers frequently leave attributes unquoted.
> > Properly quoted attributes can only be escaped with the corresponding
> > quote. Unquoted attributes can be broken out of with many characters,
> > including [space] % * + , - / ; < = > ^ and |."
> >
> > I have spoken to the author of this text, and he indicates this is
> > *overencoding*. The overencoding is necessary because developers often
> > leave attributes unquoted. If the attribute is quoted, the only way to
> > break out of the quoted context is with the corresponding quote. For a few
> > reasons dealing with the sequence of parsers, we highly advise encoding
> > all 6 of the characters above. This is the same routine as above for rule
> > #1. This makes data safe for inclusion in *quoted* HTML attributes but NOT
> > in *unquoted* HTML attributes. Thus, there are warnings on the
> > pythonsecurity.org wiki, which we would like to see in template engine
> > documentation, indicating that developers should *always* quote HTML
> > attributes. Or, if you wish to be super-safe and do the best thing
> > possible, follow the quoted advice above and encode all non-alphanumerics
> > below 256.
> >
> > I hope this sufficiently answers your question. In practice, an escaping
> > routine escaping all 6 of the characters above should be created and used
> > for any variables handled by the template engine unless marked as safe.
> >
> > Best,
> > Craig
> >
> >
> >
> > On Wed, Jul 14, 2010 at 12:25 PM, mdipierro <mdipie...@cs.depaul.edu> wrote:
> > > Here is the problem as I see it:
> >
> > > #controller
> > > def index(): return dict(a=' x"y ', b=' x"y ')
> >
> > > #view
> > > <div onclick="{{=a}}">{{=b}}</div>
> >
> > > Notice that a and b have the same value. a should be escaped as x\"y
> > > while this escaping would be wrong for b.
> > > Are you telling me there is a way to escape both a and b that works
> > > whatever the context?
> > > If there is, I do not know about it.
> >
> > > Massimo
> >
> > > On 14 Jul, 09:52, Craig Younkins <cyounk...@gmail.com> wrote:
> > > > I want to re-raise this issue because I feel it is important.
> >
> > > > > > * Do not use cgi.escape for HTML escaping because it does not
> > > > > > escape single quotes and may lead to XSS - See
> > > > > > http://www.pythonsecurity.org/wiki/web2py/#cross-site-scripting-xss
> > > > > > and http://www.pythonsecurity.org/wiki/cgi/
> >
> > > > > I assume you refer to attribute escaping. When using helpers like
> > > > > {{=A(link,_href=url)}} then link is escaped using cgi.escape but
> > > > > url is escaped differently (quotes are escaped). The problem is that
> > > > > the escape function does not know whether a variable is to be
> > > > > inserted in html, css, js, attribute, a string in js, etc. etc. and
> > > > > therefore if the function does not know the context it is in it can
> > > > > never always escape correctly. I do not believe there is a general
> > > > > solution to this problem. web2py assumes {{=....}} is escaping
> > > > > HTML/XML. If you need to escape attributes we suggest using helpers.
> > > > > If you need to escape js code or strings in js code, you may have to
> > > > > do it manually.
> >
> > > > That's not quite what I was getting at. You're right about needing
> > > > the context in order to escape correctly though. I think the default
> > > > escaping should include single and double quotes. cgi.escape escapes
> > > > double quotes (when called with quote=True) but not single quotes.
> >
> > > > I thought that the default escaping was going through cgi.escape by
> > > > way of the xmlescape method, but given the below, that appears not to
> > > > be the case. I'm a little confused.
> >
> > > > Here's an example of something I don't think I should be able to do:
> >
> > > > Controller:         return dict(data='" onload="alert(1);" bad="')
> > > > View:               <body class="{{=data}}"></body>
> > > > Output:            <body class="" onload="alert(1);" bad=""></body>
> >
> > > > The same attack works with single quoted attributes. While you're
> > > > right that we can't do full proper escaping without knowing the
> > > > context, I don't think unescaped quotes should be permitted in any web
> > > > context.
> > > > --
> > > > Craig Younkins
> >
> > --
> > Craig Younkins
>



-- 
Craig Younkins
