[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090169#comment-13090169 ]

Michael McCandless commented on TIKA-651:
-----------------------------------------

bq. Yes. I've had numerous battles with XML processing gone haywire in systems 
that have accidentally pulled in the wrong versions of the XML processing 
libraries.

OK, that sounds bad -- let's not add the dependency.

bq. BTW, serializing SAX events to XML, XHTML or HTML4 streams shouldn't be 
that difficult to implement directly. We could even copy relevant parts of the 
code from Xalan.

That (poaching/cherry-picking what we need from Xalan) sounds like a reasonable 
approach here...
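Serializing SAX events directly, as suggested above, mostly comes down to writing tags and escaping text and attribute values on the way out. The following is a minimal sketch under that assumption. It is not Xalan's serializer and not the attached XHTMLSerializer.java; the class name `SimpleXmlSerializer` is hypothetical, and namespaces, comments, and the XHTML/HTML4 empty-element rules are deliberately left out.

```java
import java.io.IOException;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical minimal serializer that writes SAX events back out as XML text.
 * Sketch only: namespaces, comments, processing instructions and the
 * XHTML/HTML4 empty-element rules are not handled.
 */
public class SimpleXmlSerializer extends DefaultHandler {
    private final Writer out;

    public SimpleXmlSerializer(Writer out) {
        this.out = out;
    }

    /** Escapes characters that are unsafe in element content and in double-quoted attribute values. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Uses the qualified name if the event producer supplied one, else the local name. */
    private static String nameOf(String localName, String qName) {
        return (qName == null || qName.isEmpty()) ? localName : qName;
    }

    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        StringBuilder sb = new StringBuilder("<").append(nameOf(localName, qName));
        for (int i = 0; i < atts.getLength(); i++) {
            // SAX hands over decoded values (e.g. a bare '&'), so re-escape them here
            sb.append(' ').append(nameOf(atts.getLocalName(i), atts.getQName(i)))
              .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        write(sb.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        write("</" + nameOf(localName, qName) + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length)));
    }
}
```

With a `StringWriter` as the target, an `<a>` start event whose `href` value is `?a=1&b=2` is written as `<a href="?a=1&amp;b=2">`. A real XHTML/HTML4 serializer would also need the output-method-specific rules (empty elements, `<br>` vs `<br />`, and so on), which is where borrowing from Xalan would help.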

> Unescaped attribute value generated
> -----------------------------------
>
>                 Key: TIKA-651
>                 URL: https://issues.apache.org/jira/browse/TIKA-651
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Raimund Merkert
>            Assignee: Jukka Zitting
>         Attachments: XHTMLSerializer.java
>
>
> I've converted a Word document that contains hyperlinks with a complex query 
> component. The & character is not escaped, and Mozilla complains about that 
> when I write out the XHTML via a content handler that I wrote.
> It's not clear to me whether or not my ContentHandler should assume 
> attributes are properly escaped.
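On the question of whether a ContentHandler can assume attributes arrive escaped: under the general SAX contract, `Attributes.getValue` returns the value after entity references have been decoded, so a handler that writes markup has to escape `&`, `<` and `"` itself. The self-contained check below illustrates this with the JDK's built-in SAX parser rather than with Tika; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeValueDemo {
    /** Parses xml and returns the href of the first <a> element exactly as SAX reports it. */
    public static String firstHref(String xml) {
        final String[] href = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Non-namespace-aware parser (the factory default): the element name is in qName
                if (href[0] == null && "a".equals(qName)) {
                    href[0] = atts.getValue("href");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return href[0];
    }

    public static void main(String[] args) {
        // The markup carries the escaped form &amp; ...
        String value = firstHref("<a href=\"http://example.com/?a=1&amp;b=2\">link</a>");
        // ... but the handler receives the decoded value with a bare '&'
        System.out.println(value); // prints http://example.com/?a=1&b=2
    }
}
```

So the escaping belongs in whatever component turns SAX events back into text, not in the events themselves.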

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
