[
https://issues.apache.org/jira/browse/TIKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Geoff Jarrad updated TIKA-506:
--
Attachment: sample.doc
The attached sample.doc Word document breaks the OfficeParser:
java.util.NoSuchEl
Nick Burch updated TIKA-506:
Attachment: tika-word12.patch
Updated patch (v12) which tidies up a few bits when building against the POI
3
Nick Burch updated TIKA-506:
Attachment: tika-word11.patch
New patch (v11) adds support for .doc images, and non-nested .doc tables
(nest
Nick Burch updated TIKA-506:
Attachment: tika-word9.patch
This updates yesterday's patch, and additionally includes hyperlinks and
bold/i
Nick Burch updated TIKA-506:
Attachment: tika-word6.patch
The attached patch improves the parsing of .docx to include headings,
hyperlink