Shoot..I see that the href from the example was stripped once it is
displayed on https://mail.gnome.org/archives/xml/2018-January/msg00010.html.
Here is a gist that preserves formatting:
https://gist.github.com/ptoomey3/4f684c7386229658b39b69756e262050.



On Fri, Jan 12, 2018 at 10:01 AM Patrick Toomey <patrick.too...@github.com>
wrote:

> While triaging a reported cross site scripting bug we were analyzing the
> behavior of our HTML sanitization code and noticed that it was parsing an
> input in an unexpected way.  The sanitization library itself eventually
> wraps Nokogiri, which is a relatively thin wrapper around libxml2. We
> reached out to Kurt Seifriend and Daniel Veillard to let them know about
> the observed behavior.  There was discussion about whether the observed
> behavior was was expected or not and eventually it was suggested that we
> move the discussion to the libxml2 mailing list (the folks on that thread
> reserved CVE-2016-3709 in case it is decided that this is an issue).
>
> In short https://github.com/GNOME/libxml2/commit/960f0e2 introduced
> support for not URI escaping server side includes. However, this seems to
> lead to the below behavior.
>
> ->  libxml2 git:(bc5a5d65) ✗ cat test/HTML/ssiquote.html
> <html><body><p><a
> href='&lt;!--"&gt;&lt;script&gt;alert(1)&lt;/script&gt;-->'>test1</a></p></body></html>
> ->  libxml2 git:(bc5a5d65) ✗ make testHTML
> ->  libxml2 git:(bc5a5d65) ✗ ./testHTML test/HTML/ssiquote.html
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "
> http://www.w3.org/TR/REC-html40/loose.dtd";>
> <html><body><p><a
> href="<!--"><script>alert(1)</script>-->">test1</a></p></body></html>
>
> Notice that the output results in a script tag added to the resulting
> parsed output.  Here is a small bit of Java/Xerces code to compare:
>
> import java.io.IOException;
> import java.io.StringReader;
> import java.io.StringWriter;
>
> import javax.xml.parsers.*;
> import javax.xml.transform.OutputKeys;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerException;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.dom.DOMSource;
> import javax.xml.transform.stream.StreamResult;
>
> import org.w3c.dom.Document;
> import org.w3c.dom.NamedNodeMap;
> import org.w3c.dom.Node;
> import org.w3c.dom.NodeList;
> import org.xml.sax.InputSource;
> import org.xml.sax.SAXException;
>
> public class XmlTester {
>
> public static void main(String[] args) throws
> ParserConfigurationException, SAXException, IOException,
> TransformerException {
> String text = "<html><body><p><a
> href='&lt;!--\"&gt;&lt;script&gt;alert(1)&lt;/script&gt;-->'>test1</a></p></body></html>";
> DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
> DocumentBuilder builder = factory.newDocumentBuilder();
>
> // text contains the XML content
> Document doc = builder.parse(new InputSource(new StringReader(text)));
> System.out.println(getStringFromDocument(doc));
> }
>
> public static String getStringFromDocument(Document doc) throws
> TransformerException {
> DOMSource domSource = new DOMSource(doc);
> StringWriter writer = new StringWriter();
> StreamResult result = new StreamResult(writer);
> TransformerFactory tf = TransformerFactory.newInstance();
> Transformer transformer = tf.newTransformer();
> transformer.transform(domSource, result);
> return writer.toString();
> }
> }
>
> This Java code, with the same input, results in the following output:
>
> <html>
> <body>
> <p>
> <a href="<!--%22><script>alert(1)</script>-->">test1</a>
> </p>
> </body>
> </html>
>
> The attribute contents are quoted/escaped such that  they don’t break out
> of the attribute once it is parsed. This libxml2 behavior doesn’t apply to
> all attributes. If we change the href to a class attribute there is no
> issue. This likely makes sense since the above mentioned commit
> specifically references not URI escaping.
>
> ->  libxml2 git:(bc5a5d65) ✗ cat test/HTML/ssiquote.html
> <html><body><p><a
> class='&lt;!--"&gt;&lt;script&gt;alert(1)&lt;/script&gt;-->'>test1</a></p></body></html>
> ->  libxml2 git:(bc5a5d65) ✗ make testHTML
> ->  libxml2 git:(bc5a5d65) ✗ ./testHTML test/HTML/ssiquote.html
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "
> http://www.w3.org/TR/REC-html40/loose.dtd";>
> <html><body><p><a
> class='<!--"><script>alert(1)</script>-->'>test1</a></p></body></html>
>
> So, I guess the question is, what do people think? I believe the argument
> from Daniel was roughly that this would be expected behavior for server
> side includes. However, this functionality seems to be in conflict with the
> Xerces behavior and it also leads to a trivial way to cause new/unexpected
> nodes to be introduced into the tree simply by parsing the document.
>
>
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml

Reply via email to