[ https://issues.apache.org/jira/browse/TIKA-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892459#comment-17892459 ]
Hudson commented on TIKA-4315:
------------------------------

FAILURE: Integrated in Jenkins build Tika » tika-branch_2x-jdk11 #518 (See [https://ci-builds.apache.org/job/Tika/job/tika-branch_2x-jdk11/518/])
[TIKA-4315] Fix XPS whitespace not being emitted (#1970) (tallison: [https://github.com/apache/tika/commit/f6fe845637f42b6c01bed388d610b0e9c9f83ea1])
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testXLSX.xps
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/test_text.xps
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSPageContentHandler.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java

> XPS file parser does not emit whitespace as expected
> ----------------------------------------------------
>
> Key: TIKA-4315
> URL: https://issues.apache.org/jira/browse/TIKA-4315
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.9.1, 2.9.2
> Reporter: Ruairidh Williamson
> Priority: Major
> Fix For: 2.9.3, 3.0.1, 4.0.0
>
> Attachments: testXLSX.xps
>
>
> We are using Tika to extract text from XPS files and have hit an issue where
> whitespace is not emitted where we would expect it. When the attached example
> file is opened, there is a large visual gap between "x" and "abcde1234f", but
> when Tika extracts it, it calls `characters` with "x" and then `characters`
> with "abcde1234f". We would expect an `ignorableWhitespace` call between those
> two calls, but we don't get one.
> I have a pull request that fixes the issue, which I will submit.
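To see which SAX text callbacks a parser actually makes, the calls can be recorded with a small `ContentHandler`. The sketch below uses only the JDK's `org.xml.sax.helpers.DefaultHandler`; the class name `SaxEventRecorder` is hypothetical and not part of Tika. It drives the handler by hand to contrast the sequence the reporter observed with the one they expected. The single space passed to `ignorableWhitespace` is an assumption for illustration, since the report does not say which whitespace characters should be emitted.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical helper (not part of Tika): records the SAX text callbacks
 * (characters and ignorableWhitespace) so their order can be inspected.
 */
public class SaxEventRecorder extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters(\"" + new String(ch, start, length) + "\")");
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        events.add("ignorableWhitespace(\"" + new String(ch, start, length) + "\")");
    }

    public List<String> getEvents() {
        return events;
    }

    public static void main(String[] args) {
        // Sequence described in the report: two text runs, no whitespace between them.
        SaxEventRecorder observed = new SaxEventRecorder();
        observed.characters("x".toCharArray(), 0, 1);
        observed.characters("abcde1234f".toCharArray(), 0, 10);

        // Sequence the reporter expected. The single space is an assumption.
        SaxEventRecorder expected = new SaxEventRecorder();
        expected.characters("x".toCharArray(), 0, 1);
        expected.ignorableWhitespace(" ".toCharArray(), 0, 1);
        expected.characters("abcde1234f".toCharArray(), 0, 10);

        System.out.println("observed: " + observed.getEvents());
        System.out.println("expected: " + expected.getEvents());
    }
}
```

With Tika on the classpath, the same handler could instead be passed to a real parser through `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)`, for example on an `AutoDetectParser`, to record what the XPS parser emits for a file such as `testXLSX.xps`. The recorder ignores all other SAX events (elements, attributes), so it only shows the relative order of text and whitespace callbacks.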