Hi Ángel,

The methods all apply to literal strings, values or digits. We can't
reasonably escape data while also allowing valid markup for the current
context; that is a contradiction by its very nature. If you need to let
user values drive CSS names, JavaScript function or variable names, or
HTML markup, you need something completely different. For example, HTML
markup can be sanitised against a whitelist using HTMLPurifier.

> I'm fine with the concept, but I'm not sold on the interface.
> It should be really clear when each of them should be used.
>
> escapeHtml()
> Ok, this is going to be used to show content inside a html document.
>
> escapeHtmlAttr()
> Use when using unquoted html attributes, otherwise use html escaping.
> When was the last time I saw an unquoted attribute with user-provided
> content?

Hopefully never, since that's the ideal ;). However, HTML5 allows
unquoted attributes, and they are perfectly valid. We don't make that
choice for the user, but we could provide the relevant escaping tool if
they are completely and irredeemably insane :P.
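For illustration only, here is a minimal Python sketch of attribute escaping (not the RFC's implementation). It assumes the OWASP XSS Prevention Cheat Sheet rule for attributes: every character except ASCII alphanumerics becomes a hexadecimal character reference. As an extra assumption, it also lets through a few harmless punctuation characters (`,.-_`). Because spaces, `=`, quotes and `>` are all encoded, the value cannot break out of the attribute even when it is unquoted. The function name `escape_html_attr` is hypothetical.

```python
import string

# Assumption: ASCII letters, digits and a few harmless punctuation marks pass through.
_SAFE_ATTR = frozenset(string.ascii_letters + string.digits + ",.-_")


def escape_html_attr(value: str) -> str:
    """Escape a value for an HTML attribute, quoted or unquoted (sketch)."""
    out = []
    for ch in value:
        if ch in _SAFE_ATTR:
            out.append(ch)
        else:
            # Hexadecimal numeric character reference, e.g. " " -> "&#x20;"
            out.append("&#x%X;" % ord(ch))
    return "".join(out)
```

Used as `'<input value=' + escape_html_attr(user_value) + '>'`, an input such as `x onmouseover=alert(1)` stays a single inert attribute value.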

> I think it should be replaced by a quoteHtmlAttr() function which properly
> escapes the content and adds the quotes for you (or it might skip them
> if it determines it's not needed in this case).

The RFC focuses on escaping - not sanitising or reformatting.

> escapeJs()
> Escape javascript... but inside <script> tags, I guess? So it's not to
> be used for dynamically generated javascript. Not so clear.

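JavaScript string literals (as defined by the ECMAScript standard), e.g.
a quoted value inside a <script> block. It escapes data placed inside a
literal; it is not for generating JavaScript code itself.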
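As a rough Python illustration (not the RFC's code), a string-literal escaper following the OWASP cheat sheet rule could encode every non-alphanumeric character below 256 as `\xHH`. As assumptions of this sketch, it also keeps `,._` as-is, uses `\uHHHH` for other code points, and uses a UTF-16 surrogate pair for characters outside the BMP. Escaping `<`, `/` and the quote characters keeps the data from closing the string or the surrounding `</script>` tag. The name `escape_js` is hypothetical.

```python
import string

# Assumption: ASCII letters, digits and ",._" are left unescaped.
_SAFE_JS = frozenset(string.ascii_letters + string.digits + ",._")


def escape_js(value: str) -> str:
    """Escape a value for use inside a quoted JavaScript string literal (sketch)."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch in _SAFE_JS:
            out.append(ch)
        elif cp < 0x100:
            out.append("\\x%02X" % cp)          # e.g. "<" -> \x3C
        elif cp < 0x10000:
            out.append("\\u%04X" % cp)          # e.g. U+2028 -> \u2028
        else:
            # Outside the BMP: JavaScript strings are UTF-16, so emit a surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04X\\u%04X" % (high, low))
    return "".join(out)
```

The result is meant to sit between quotes, e.g. `"var name = '" + escape_js(user_name) + "';"`. It never turns user data into code.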

> escapeCss()
> I'm not even sure in which cases would this be needed. Standalone CSS,
> inside a <style> tag, as style="" attribute?

CSS values like a font size or background color. If user data is
allowed to alter names or any other CSS markup, you would need
sanitisation rather than escaping.
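A minimal Python sketch of both halves of that distinction, under stated assumptions (it is not the RFC's implementation). The value is escaped per the OWASP cheat sheet rule: every non-alphanumeric character becomes a CSS `\HH` escape. A trailing space ends each escape so that a following hex-looking character is not absorbed into it. The property name cannot be escaped into safety, so it is checked against an allow-list instead. `escape_css`, `css_declaration` and `ALLOWED_PROPERTIES` are all hypothetical names, and the allow-list contents are only an example.

```python
import string

_SAFE_CSS = frozenset(string.ascii_letters + string.digits)

# Hypothetical allow-list: names are sanitised (validated), never escaped.
ALLOWED_PROPERTIES = frozenset({"color", "font-size", "background-color"})


def escape_css(value: str) -> str:
    """Escape a CSS property value (sketch): non-alphanumerics become "\\HH "."""
    return "".join(ch if ch in _SAFE_CSS else "\\%X " % ord(ch) for ch in value)


def css_declaration(prop: str, value: str) -> str:
    """Build a single declaration from an allow-listed name and an escaped value."""
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError("property not allowed: " + prop)
    return prop + ": " + escape_css(value) + ";"
```

With this, `css_declaration("font-size", "12px")` gives `font-size: 12px;`, while a value like `red;x:y` can no longer terminate the declaration because `;` and `:` are escaped.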

> escapeUrl()
> "It is included primarily for consistency". When do I need to use
> escapeUrl and when escapeHtml? What if it's an url inside a css tag
> inside a html document?

Basically any URL component inside any attribute. It encodes a part of a
URL (such as a query parameter value); the overall URL would still need
to be validated separately.
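To illustrate the split between encoding a part and validating the whole, here is a short Python sketch. It is an assumption-laden illustration, not the RFC's code. `urllib.parse.quote` with `safe=""` percent-encodes everything except the RFC 3986 unreserved characters, which is comparable to PHP's `rawurlencode()`. The whole-URL check is a hypothetical policy that only accepts absolute http(s) URLs. `escape_url_part` and `is_allowed_url` are hypothetical names.

```python
from urllib.parse import quote, urlsplit


def escape_url_part(value: str) -> str:
    """Percent-encode one URL component (sketch).

    Only RFC 3986 unreserved characters (letters, digits, "-._~") survive;
    everything else, including non-ASCII text as UTF-8 bytes, becomes %HH.
    """
    return quote(value, safe="")


def is_allowed_url(url: str) -> bool:
    """Hypothetical whole-URL policy: absolute http(s) URLs only."""
    parts = urlsplit(url)  # urlsplit lower-cases the scheme
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, `"https://example.com/search?q=" + escape_url_part('a&b "c"')` yields `https://example.com/search?q=a%26b%20%22c%22`. A user-supplied `javascript:alert(1)` would still fail the separate `is_allowed_url` check, which is the validation step escaping alone does not cover.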

> It makes things more confusing, so I'd remove it.

It needs to be included so that we have a consistent, complete set of
go-to escapers.

> It should be clear what you are passing to that function and in which
> context it expects you to leave the output.

It might not be obvious, but each of these maps straightforwardly onto a
specific context. Here's the clearest explanation of where all of this
fits into templating:
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

I should probably add that as a link to the RFC (Anthony will finally
get an ESAPI reference out of me ;)).

Paddy



-- 
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
