Re: [xml] Relaxed entities encoding for html output

Daniel Veillard Tue, 04 Sep 2012 21:20:03 -0700

On Tue, Sep 04, 2012 at 06:36:12PM +0200, rbondue....@orange.com wrote:
> Hello,
> I am working on a project where we are using either libxslt or xalan for xslt 
> transformations.
> We have internally deprecated xalan because libxslt is considerably faster, 
> and all other xml processing is performed by libxml2.
> We now would like to drop xalan completely, but there is one important case 
> where both libraries are producing a different output, which prevents us from 
> doing so.
> 
> Consider any XML file and the following style sheet:
> 
> 
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
> version='1.0'>
> <xsl:output method="html"/>
> <xsl:variable name="apache">&lt;!--apache-stuff--></xsl:variable>
> <xsl:variable name="script">&amp;{My script};</xsl:variable>
> 
> <xsl:template match="/">
>     <a href="{$apache}/page.html" onMouseUp="{$script}">link</a>
> </xsl:template>
> 
> </xsl:stylesheet>
> 
> 
> 
> libxml2/libxslt currently produce the following file from the transformation:
> 
> <a href="&lt;!--apache-stuff--&gt;/page.html" onMouseUp="&amp;{My 
> script};">link</a>
> 
> 
> And Xerces/Xalan are producing:
> 
> <a href="<!--apache-stuff-->/page.html" onMouseUp="&{My script};">link</a>
> 
> 
> The <!--apache-stuff--> part is supposed to be replaced by the web server for 
> load balancing purposes, but this does not happen when using libxslt because 
> of the escaping (&lt; &gt;), and that is the issue we're running into.
> 
> I have tracked it down, and the problem lies within libxml2, not libxslt 
> (hence why I am posting on this list!), when the node tree is serialized to 
> text. The enclosed patches are fixing this, and are also implementing a TODO 
> that you had in the code:
> 
> The html output method should not escape a & character occurring in an 
> attribute value immediately followed by a { character (see Section B.7.1 of 
> the HTML 4.0 Recommendation).
> 
> This is illustrated by the &{My script} part in the example above.
> 
> To get back to my issue however, I am not completely sure which behavior is 
> actually correct, as I could not find if '<' and '>' are allowed in attribute 
> values in html (I know '<' is forbidden in xml).
> I ran the regression tests, but they added to my confusion:
> Some html tests are now failing in the test suite (runtest), but if I run:
> ./testHTML test/HTML/lt.html
> Then the  output is a lot closer to the input file test/HTML/lt.html, which 
> was not the case before, so this may mean an improvement.
> If this is indeed correct, I'm of course open to any suggestion or comment 
> you may have about the patches, they should apply cleanly to the git trunk.


  Your approach is way too heavy. Instead of changing the handling of
< and & in all cases, detecting the full construct first and then
special-casing only those cases is far less disruptive. With that
approach no other test case in libxml2 or libxslt fails. So I committed
that restricted approach, which should still handle the cases you raise.

  
http://git.gnome.org/browse/libxml2/commit/?id=7d4c529a334845621e2f805c8ed0e154b3350cec

thinkpad:~/XSLT -> xsltproc/xsltproc orange.xsl orange.xsl
<a href="<!--apache-stuff-->/page.html" onMouseUp="&{My script};">link</a>
thinkpad:~/XSLT -> 
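
For readers who want the idea without reading the commit, the restricted
approach can be sketched roughly as below. This is an illustration in
Python, not the committed C code: the function name escape_html_attr and
the ESCAPES table are hypothetical, and the rules assumed here are only
the two discussed in this thread (a complete <!-- ... --> construct and a
complete &{ ... } construct inside an attribute value are copied through
verbatim, everything else is escaped as before).

```python
# Hypothetical illustration of "detect the full construct first";
# names and structure are assumptions, not libxml2's implementation.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_html_attr(value: str) -> str:
    """Escape an HTML attribute value, keeping two complete constructs as-is."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]

        # Server-side comment: only special-cased when the closing "-->"
        # is actually present, otherwise "<" is escaped as usual.
        if ch == "<" and value.startswith("<!--", i):
            end = value.find("-->", i + 4)
            if end != -1:
                out.append(value[i:end + 3])
                i = end + 3
                continue

        # Script macro "&{...}": only special-cased when the closing "}"
        # is present, otherwise "&" is escaped as usual.
        if ch == "&" and value.startswith("&{", i):
            end = value.find("}", i + 2)
            if end != -1:
                out.append(value[i:end + 1])
                i = end + 1
                continue

        out.append(ESCAPES.get(ch, ch))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(escape_html_attr("<!--apache-stuff-->/page.html"))
    print(escape_html_attr("&{My script};"))
    print(escape_html_attr("a < b & c"))
```

Because an incomplete construct falls through to the normal escaping
path, ordinary attribute values containing a stray < or & are serialized
exactly as they were before, which is why the rest of the test suite is
unaffected.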

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml