On Tue, Sep 04, 2012 at 06:36:12PM +0200, rbondue....@orange.com wrote:
> Hello,
> I am working on a project where we are using either libxslt or xalan for xslt
> transformations.
> We have internally deprecated xalan because libxslt is considerably faster,
> and all other xml processing is performed by libxml2.
> We now would like to drop xalan completely, but there is one important case
> where both libraries are producing a different output, which prevents us from
> doing so.
>
> Consider whatever xml file and the following style sheet:
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version='1.0'>
> <xsl:output method="html"/>
> <xsl:variable name="apache">&lt;!--apache-stuff--&gt;</xsl:variable>
> <xsl:variable name="script">&amp;{My script};</xsl:variable>
>
> <xsl:template match="/">
> <a href="{$apache}/page.html" onMouseUp="{$script}">link</a>
> </xsl:template>
>
> </xsl:stylesheet>
>
> libxml2/libxslt currently produce the following file from the transformation:
>
> <a href="&lt;!--apache-stuff--&gt;/page.html" onMouseUp="&amp;{My script};">link</a>
>
> And Xerces/Xalan are producing:
>
> <a href="<!--apache-stuff-->/page.html" onMouseUp="&{My script};">link</a>
>
> The <!--apache-stuff--> part is supposed to be replaced by the web server for
> load balancing purpose, but this is not happening when using libxslt because
> of the escaping (&lt; &gt;),
> And that is the issue we're running into.
>
> I have tracked it down, and the problem lies within libxml2, not libxslt
> (hence why I am posting on this list!), when the node tree is serialized to
> text. The enclosed patches are fixing this, and are also implementing a TODO
> that you had in the code:
>
> The html output method should not escape a & character occurring in an
> attribute value immediately followed by a { character (see Section B.7.1 of
> the HTML 4.0 Recommendation).
>
> This is illustrated by the &{My script} part in the example above.
>
> To get back to my issue however, I am not completely sure which behavior is
> actually correct, as I could not find if '<' and '>' are allowed in attribute
> values in html (I know '<' is forbidden in xml).
> I run the regression tests, but they added to my confusion:
> Some html tests are now failing in the test suite (runtest), but if I run:
> ./testHTML test/HTML/lt.html
> Then the output is a lot closer to the input file test/HTML/lt.html, which
> was not the case before, so this may mean an improvement.
> If this is indeed correct, I'm of course open to any suggestion or comment
> you may have about the patches, they should apply cleanly to the git trunk.

  Your approach is way too heavy. Instead of changing the handling of
< and & in all cases, detecting the full construct first and then
special-casing only those cases is far less disruptive. With that
approach no other test case in libxml2 or libxslt fails.
  So I committed that restricted approach, which should still handle
the cases you raise.

http://git.gnome.org/browse/libxml2/commit/?id=7d4c529a334845621e2f805c8ed0e154b3350cec

thinkpad:~/XSLT -> xsltproc/xsltproc orange.xsl orange.xsl
<a href="<!--apache-stuff-->/page.html" onMouseUp="&{My script};">link</a>
thinkpad:~/XSLT ->

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
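
For readers following the thread, the idea of detecting the full construct
first and escaping everything else as usual can be sketched roughly as below.
This is only an illustration in Python, not the code from the libxml2 commit:
the names PASS_THROUGH, escape_text and escape_html_attr are hypothetical, and
the two patterns are assumptions about what counts as a "full construct" (a
complete <!--...--> comment, and a &{...}; script macro as in Section B.7.1 of
HTML 4).

```python
import re

# Hypothetical patterns, NOT taken from libxml2: the two constructs from the
# thread that should be copied through unescaped inside an attribute value.
PASS_THROUGH = re.compile(
    r"<!--.*?-->"       # a complete comment such as <!--apache-stuff-->
    r"|&\{[^}]*\};?",   # a script macro such as &{My script};
    re.DOTALL,
)

# Characters escaped everywhere else in the attribute value (an assumption
# made for this sketch; a real serializer may escape a different set).
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_text(text):
    """Escape every special character in a plain run of text."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def escape_html_attr(value):
    """Escape an attribute value, leaving complete constructs untouched."""
    out = []
    pos = 0
    for match in PASS_THROUGH.finditer(value):
        # Text before the construct is escaped as usual.
        out.append(escape_text(value[pos:match.start()]))
        # The construct itself is emitted verbatim.
        out.append(match.group(0))
        pos = match.end()
    # Whatever follows the last construct is escaped as usual.
    out.append(escape_text(value[pos:]))
    return "".join(out)


print(escape_html_attr("<!--apache-stuff-->/page.html"))
print(escape_html_attr("a < b & c"))
```

Only a complete construct is passed through, so a stray "<" or an
unterminated "<!--" is still escaped. That restriction is what keeps the
change from affecting unrelated output.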