[ https://issues.apache.org/jira/browse/TIKA-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892440#comment-17892440 ]
ASF GitHub Bot commented on TIKA-4315:
--------------------------------------
tballison commented on PR #1970:
URL: https://github.com/apache/tika/pull/1970#issuecomment-2434987172
Wow. That's fantastic. Thank you!
> XPS file parser does not emit whitespace as expected
> ----------------------------------------------------
>
> Key: TIKA-4315
> URL: https://issues.apache.org/jira/browse/TIKA-4315
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.9.1, 2.9.2
> Reporter: Ruairidh Williamson
> Priority: Major
> Attachments: testXLSX.xps
>
>
> We are using Tika to extract text from XPS files and have hit an issue where
> whitespace is not emitted where we would expect it. In the attached example
> file, there is a large visual gap between "x" and "abcde1234f" when the file
> is opened, but when Tika extracts the text it calls `characters` with "x" and
> then `characters` with "abcde1234f". We would expect an `ignorableWhitespace`
> call between those two calls, but we don't get one.
> I have a pull request that fixes the issue, which I will submit.
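
The event sequence described in the quoted report can be sketched with a small SAX handler. This is a minimal illustration that uses only the JDK's `org.xml.sax` classes, not Tika's XPS parser. The class name `WhitespaceEventRecorder` and the `replay` helper are hypothetical, and the single-space gap is an assumption about what a fix might emit, not the content of the actual pull request.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records which SAX text callbacks a parser invokes.
public class WhitespaceEventRecorder extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters:" + new String(ch, start, length));
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        events.add("ignorableWhitespace:" + new String(ch, start, length));
    }

    public List<String> events() {
        return events;
    }

    // Replays the calls from the report by hand. With emitGap == false this
    // mirrors the reported behavior; with emitGap == true it mirrors the
    // expected behavior (the " " gap is an assumption for illustration).
    public static List<String> replay(boolean emitGap) {
        WhitespaceEventRecorder handler = new WhitespaceEventRecorder();
        char[] first = "x".toCharArray();
        handler.characters(first, 0, first.length);
        if (emitGap) {
            char[] gap = " ".toCharArray();
            handler.ignorableWhitespace(gap, 0, gap.length);
        }
        char[] second = "abcde1234f".toCharArray();
        handler.characters(second, 0, second.length);
        return handler.events();
    }

    public static void main(String[] args) {
        System.out.println("reported: " + replay(false));
        System.out.println("expected: " + replay(true));
    }
}
```

Running `main` prints both recorded sequences. Only the second contains an `ignorableWhitespace` entry between the two `characters` entries. A handler like this could also be passed to a real parser to check which callbacks it actually makes.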
--
This message was sent by Atlassian Jira
(v8.20.10#820010)