https://bz.apache.org/bugzilla/show_bug.cgi?id=62886

Bug ID: 62886
Summary: Regression extracting text from corrupted docx files
Product: POI
Version: 4.0.0-FINAL
Hardware: PC
Status: NEW
Severity: regression
Priority: P2
Component: OPC
Assignee: dev@poi.apache.org
Reporter: lfcnas...@gmail.com
Target Milestone: ---

Created attachment 36245
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36245&action=edit
Example file

While testing Tika-1.19.1, POI throws the following exception on some corrupt
docx files (MS Word complains about them but repairs them) that POI-3.17
previously handled without problems. See TIKA-2765 for more info.

Stack trace below:

org.apache.poi.openxml4j.exceptions.InvalidOperationException: Could not open the specified zip entry source stream
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:214)
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:196)
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:170)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:151)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:123)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:234)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:81)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 43 more
Caused by: java.io.EOFException
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:803)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:795)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.skipRemainderOfArchive(ZipArchiveInputStream.java:1014)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:257)
    at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.getNextEntry(ZipArchiveThresholdInputStream.java:139)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:47)
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:212)
    ... 51 more
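
The trace shows the EOFException being raised in skipRemainderOfArchive, that is, after the last entry while the rest of the archive is being consumed. One plausible reading (an assumption; the attached file was not inspected here) is that the affected files are cut off somewhere in the trailing central directory, while the entries themselves are intact. The following is a minimal sketch of that situation using only java.util.zip, not POI or commons-compress. The class name, method names, entry contents and the 10-byte cut are all made up for illustration. It builds a small archive in memory, drops its last bytes, and lists the entries that a plain streaming reader can still see.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of POI, Tika or commons-compress.
class TruncatedZipDemo {

    // Builds a tiny two-entry archive in memory (names mimic a docx layout).
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zos.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("word/document.xml"));
            zos.write("<document/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Drops the last `bytes` bytes, which here cuts into the
    // end-of-central-directory record (assumed form of corruption).
    static byte[] truncate(byte[] zip, int bytes) {
        return Arrays.copyOf(zip, zip.length - bytes);
    }

    // Streams through the local entries and collects their names.
    // getNextEntry() finishes reading the previous entry before moving on.
    static List<String> listEntries(byte[] data) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] damaged = truncate(buildZip(), 10);
        System.out.println(listEntries(damaged));
    }
}
```

java.util.zip.ZipInputStream stops at the first record that is not a local file header and never reads the central directory, so the truncated tail goes unnoticed in this sketch. Whether that matches how POI-3.17 read these files is an assumption, not something the report states.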