[jira] Updated: (TIKA-506) Improve doc and docx parsing to include more things

2010-09-27 Thread Geoff Jarrad (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoff Jarrad updated TIKA-506: Attachment: sample.doc The attached sample.doc Word document breaks the OfficeParser: java.util.NoSuchEl

[jira] Updated: (TIKA-506) Improve doc and docx parsing to include more things

2010-09-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-506: Attachment: tika-word12.patch Updated patch (v12) which tidies up a few bits when building against the POI 3

[jira] Updated: (TIKA-506) Improve doc and docx parsing to include more things

2010-09-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-506: Attachment: tika-word11.patch New patch (v11) adds support for .doc images, and non-nested .doc tables (nest

[jira] Updated: (TIKA-506) Improve doc and docx parsing to include more things

2010-09-15 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-506: Attachment: tika-word9.patch This updates yesterday's patch, and additionally includes hyperlinks and bold/i

[jira] Updated: (TIKA-506) Improve doc and docx parsing to include more things

2010-09-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-506: Attachment: tika-word6.patch The attached patch improves the parsing of .docx to include headings, hyperlink