[ https://issues.apache.org/jira/browse/TIKA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3316.
-------------------------------
    Resolution: Fixed

> Illegal IOException processing XPS files
> ----------------------------------------
>
>                 Key: TIKA-3316
>                 URL: https://issues.apache.org/jira/browse/TIKA-3316
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.25
>            Reporter: Nick Harmer
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.26
>
>         Attachments: Screenshot from 2021-03-12 17-00-05.png, test1.xps,
>                      test2.xps, test3.xps, test4.xps
>
>
> I have a number of (relatively simple) XPS documents that Tika fails to
> process. The following exception appears:
> {code:java}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@4149c063
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
>     at com.mcms.Main.parseFile(Main.java:88)
>     at com.mcms.Main.main(Main.java:59)
> Caused by: org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature data descriptor used in entry Documents/1/Metadata/Page1_Thumbnail.JPG
>     at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:477)
>     at java.base/java.io.FilterInputStream.read(Unknown Source)
>     at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.read(ZipArchiveThresholdInputStream.java:80)
>     at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:182)
>     at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
>     at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:136)
>     at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
>     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
>     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:111)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 5 more
> {code}
>
> Apparently the generator for these files (the XPS printer driver, printing
> from Notepad) adds a per-page thumbnail image that Tika cannot handle: the
> exception shows that this thumbnail entry uses a ZIP data descriptor, which
> the streaming ZipArchiveInputStream rejects.
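XPS files are ZIP packages. In ZIP, general-purpose flag bit 3 (0x0008) means the entry's CRC and sizes are written in a data descriptor after the entry data rather than in the local header. A streaming reader can still find the end of a DEFLATED entry (the deflate stream marks its own end), but not of a STORED one, which is why ZipArchiveInputStream refuses STORED entries with a data descriptor by default (its constructor has an allowStoredEntriesWithDataDescriptor parameter to relax this). The central directory at the end of the archive still records every entry's flags and method, so reading it is a quick way to check a file such as test1.xps for the problem. The sketch below is plain JDK code, not part of Tika, POI or commons-compress; the class DataDescriptorScan and its members are hypothetical names, and it assumes a non-ZIP64 archive with UTF-8 or ASCII entry names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tika, POI or commons-compress): lists each
// entry's compression method and data-descriptor flag from the central directory.
final class DataDescriptorScan {

    static final int EOCD_SIG = 0x06054b50;          // end of central directory record
    static final int CEN_SIG = 0x02014b50;           // central directory file header
    static final int STORED = 0;                     // compression method 0 = no compression
    static final int FLAG_DATA_DESCRIPTOR = 0x0008;  // general purpose bit 3

    record EntryInfo(String name, int method, boolean hasDataDescriptor) {
        // The combination a streaming reader cannot delimit on its own.
        boolean storedWithDataDescriptor() {
            return method == STORED && hasDataDescriptor;
        }
    }

    static List<EntryInfo> scan(byte[] zip) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
        // so search backwards from the end of the file for its signature.
        int eocd = -1;
        for (int i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xFFFF); i--) {
            if (buf.getInt(i) == EOCD_SIG) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IOException("not a ZIP file: end of central directory not found");
        }

        int total = Short.toUnsignedInt(buf.getShort(eocd + 10)); // total entry count
        int pos = buf.getInt(eocd + 16); // central directory offset (ZIP64 not handled)

        List<EntryInfo> entries = new ArrayList<>();
        for (int n = 0; n < total; n++) {
            if (buf.getInt(pos) != CEN_SIG) {
                throw new IOException("bad central directory header at offset " + pos);
            }
            int flags = Short.toUnsignedInt(buf.getShort(pos + 8));
            int method = Short.toUnsignedInt(buf.getShort(pos + 10));
            int nameLen = Short.toUnsignedInt(buf.getShort(pos + 28));
            int extraLen = Short.toUnsignedInt(buf.getShort(pos + 30));
            int commentLen = Short.toUnsignedInt(buf.getShort(pos + 32));
            // Assumes UTF-8 (or plain ASCII) entry names.
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);
            entries.add(new EntryInfo(name, method, (flags & FLAG_DATA_DESCRIPTOR) != 0));
            pos += 46 + nameLen + extraLen + commentLen; // fixed header part is 46 bytes
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        for (EntryInfo e : scan(Files.readAllBytes(Path.of(args[0])))) {
            String status = e.storedWithDataDescriptor() ? "STORED+DD" : "ok";
            System.out.printf("%-10s %s%n", status, e.name());
        }
    }
}
```

With Java 16 or later it can be run as `java DataDescriptorScan.java test1.xps`; an entry printed as `STORED+DD` is one a streaming reader would reject with the exception above. Reading the central directory rather than walking the local headers is deliberate: when bit 3 is set, the local header's size fields are typically zero, so a forward scan cannot skip over the entry data.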