[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Pilato updated TIKA-2030:
-------------------------------
    Attachment: test.docx
                test.odt

ODT and DOCX files

> A space is suppressed when parsing Odt file
> -------------------------------------------
>
>                 Key: TIKA-2030
>                 URL: https://issues.apache.org/jira/browse/TIKA-2030
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>         Environment: MacOS X
>            Reporter: David Pilato
>            Priority: Minor
>         Attachments: test.docx, test.odt
>
>
> I have an ODT sample file which contains:
> {code}
> This is a sample text available in page 1
> {code}
> When I extract its content with Tika, I'm getting:
> {code}
> This isa sample text available in page 1
> {code}
> Note the missing space between {{is}} and {{a}}.
> I'll link to an example ODT file which reproduces this issue.
> Note that I generated this ODT file from MS Word. The original MS Word file
> is correctly parsed by Tika.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)