[ https://issues.apache.org/jira/browse/TIKA-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492342#comment-13492342 ]

Jukka Zitting commented on TIKA-1017:
-------------------------------------

The idea behind DefaultHtmlMapper is to normalize and simplify the incoming 
HTML as much as possible while still preserving the semantic structure of the 
document. We can add extra elements if there's a good use case that's not 
already covered by the IdentityHtmlMapper class.
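
To make the distinction above concrete, here is a minimal, self-contained 
sketch of the two mapping strategies. It is not Tika's actual code: the 
ElementMapper interface, the NormalizingMapper and IdentityMapper classes, and 
the short element table are all hypothetical stand-ins chosen for illustration. 
The only idea taken from the discussion is that a normalizing mapper keeps a 
fixed table of "safe" elements and drops everything else, while an identity 
mapper passes every element through.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HtmlMapperSketch {

    /** Hypothetical, simplified mapper contract (an assumption, not Tika's interface). */
    public interface ElementMapper {
        /** Returns the normalized element name, or null if the element is dropped. */
        String mapSafeElement(String name);
    }

    /** Keeps only elements listed in a fixed table; everything else maps to null. */
    public static final class NormalizingMapper implements ElementMapper {
        private final Map<String, String> safe = new HashMap<>();

        public NormalizingMapper() {
            // Illustrative subset only; the real safe list is larger.
            safe.put("H1", "h1");
            safe.put("P", "p");
            safe.put("UL", "ul");
            safe.put("LI", "li");
            safe.put("A", "a");
        }

        @Override
        public String mapSafeElement(String name) {
            // Case-insensitive lookup; unknown elements yield null.
            return safe.get(name.toUpperCase(Locale.ROOT));
        }
    }

    /** Passes every element through, only lower-casing its name. */
    public static final class IdentityMapper implements ElementMapper {
        @Override
        public String mapSafeElement(String name) {
            return name.toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        ElementMapper normalizing = new NormalizingMapper();
        ElementMapper identity = new IdentityMapper();
        for (String element : new String[] {"P", "sub", "I"}) {
            System.out.println(element
                    + " -> normalizing: " + normalizing.mapSafeElement(element)
                    + ", identity: " + identity.mapSafeElement(element));
        }
    }
}
```

In this sketch <sub> and <i> are dropped by the normalizing mapper but kept by 
the identity mapper, which mirrors the behaviour reported in the issue. In 
Tika itself the mapper is, as far as I know, selected by setting an HtmlMapper 
instance on the ParseContext passed to the parser.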
                
> DefaultHtmlMapper misses some safe elements
> -------------------------------------------
>
>                 Key: TIKA-1017
>                 URL: https://issues.apache.org/jira/browse/TIKA-1017
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Daniel Bonniot de Ruisselet
>
> The code of DefaultHtmlMapper says that the list of "safe" elements is based 
> on http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
> Elements like <sub> and <i> are not included in the safe list. Is this 
> intentional (a comment with the rationale would be useful) or should they be 
> added?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
