[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated TIKA-985:
-------------------------------

    Attachment: TIKA-985-1.3-1.patch

Here's a preliminary patch for 1.3. It adds some HTML5 elements to TagSoup's schema in our HtmlParser constructor, which allows those elements to be parsed. Ideally, support for all HTML5 elements should be added to TagSoup's own schema.

> Support for HTML5 elements
> --------------------------
>
>                 Key: TIKA-985
>                 URL: https://issues.apache.org/jira/browse/TIKA-985
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.2
>            Reporter: Markus Jelsma
>             Fix For: 1.3
>
>         Attachments: TIKA-985-1.3-1.patch
>
>
> TagSoup's schema.tssl does not include some HTML5 elements (e.g. article,
> section). This prevents some custom ContentHandlers from reading expected
> elements and/or attributes.
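For illustration, here is a minimal sketch of the general technique the comment describes: extending TagSoup's schema with HTML5 elements and handing it to the parser. It is not the contents of TIKA-985-1.3-1.patch. It assumes TagSoup 1.2.x on the classpath; the class name Html5SchemaSketch, the element list, the content model (modeled on div) and the default parent "body" are all illustrative assumptions.

```java
import java.io.StringReader;

import org.ccil.cowan.tagsoup.HTMLSchema;
import org.ccil.cowan.tagsoup.Parser;
import org.ccil.cowan.tagsoup.Schema;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; assumes TagSoup 1.2.x is on the classpath.
public class Html5SchemaSketch {

    // Illustrative subset of HTML5 sectioning elements (not necessarily
    // the list used in TIKA-985-1.3-1.patch).
    static final String[] HTML5_BLOCK_ELEMENTS = {
        "article", "aside", "footer", "header", "nav", "section"
    };

    /** Returns TagSoup's default HTML schema extended with a few HTML5 elements. */
    public static Schema createHtml5Schema() {
        HTMLSchema schema = new HTMLSchema();
        for (String name : HTML5_BLOCK_ELEMENTS) {
            // Assumed content model, modeled on <div>: may contain block,
            // inline and text content, and is itself a block-level element.
            schema.elementType(name,
                    HTMLSchema.M_BLOCK | HTMLSchema.M_INLINE | HTMLSchema.M_PCDATA,
                    HTMLSchema.M_BLOCK, 0);
            // Default parent TagSoup inserts if the element appears where it is not allowed.
            schema.parent(name, "body");
        }
        return schema;
    }

    public static void main(String[] args) throws Exception {
        Parser parser = new Parser();
        // Replace TagSoup's default schema with the extended one.
        parser.setProperty(Parser.schemaProperty, createHtml5Schema());
        // Stand-in for a custom ContentHandler: print each element name it receives.
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                System.out.println(localName);
            }
        });
        String html = "<html><body><article><p>Hello</p></article></body></html>";
        parser.parse(new InputSource(new StringReader(html)));
    }
}
```

Passing the schema through Parser.schemaProperty leaves TagSoup itself untouched, which suits a stopgap in HtmlParser; the longer-term fix suggested above is to add the elements to TagSoup's schema directly.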