https://bz.apache.org/bugzilla/show_bug.cgi?id=62886

            Bug ID: 62886
           Summary: Regression extracting text from corrupted docx files
           Product: POI
           Version: 4.0.0-FINAL
          Hardware: PC
            Status: NEW
          Severity: regression
          Priority: P2
         Component: OPC
          Assignee: dev@poi.apache.org
          Reporter: lfcnas...@gmail.com
  Target Milestone: ---

Created attachment 36245
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36245&action=edit
Example file

While testing Tika-1.19.1, POI throws the following exception on some corrupt
docx files (MS Word complains about them but repairs them) that POI-3.17
handled without problems. See TIKA-2765 for more info. Stack trace below:

org.apache.poi.openxml4j.exceptions.InvalidOperationException: Could not open the specified zip entry source stream
at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:214)
at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:196)
at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:170)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:151)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:123)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:234)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:81)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 43 more
Caused by: java.io.EOFException
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:803)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:795)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.skipRemainderOfArchive(ZipArchiveInputStream.java:1014)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:257)
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.getNextEntry(ZipArchiveThresholdInputStream.java:139)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:47)
at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:212)
... 51 more
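
The EOFException above is raised in skipRemainderOfArchive, i.e. after the
last entry has been read, which suggests the file is damaged or cut short near
the end of the archive rather than inside an entry. The sketch below is only an
illustration of that kind of damage, not a reproduction with the attached file:
it builds a small zip in memory, chops off part of the trailing
end-of-central-directory record, and shows that the JDK's
java.util.zip.ZipInputStream still lists every entry. The class and method
names (TruncatedZipDemo, buildZip, truncate, listEntries) are made up for this
example, and the idea that POI-3.17 tolerated such files because it streamed
them in a similar way is an assumption, not something verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of POI, Tika or commons-compress.
class TruncatedZipDemo {

    /** Builds a tiny two-entry zip in memory as a stand-in for a docx package. */
    static byte[] buildZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Drops the last n bytes; a small n only damages the trailing record. */
    static byte[] truncate(byte[] data, int n) {
        return Arrays.copyOf(data, data.length - n);
    }

    /** Lists entry names by streaming the local file headers front to back. */
    static List<String> listEntries(byte[] data) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Cut 10 bytes off the end of the archive, then stream what is left.
        byte[] damaged = truncate(buildZip(), 10);
        System.out.println(listEntries(damaged));
    }
}
```

Feeding the same damaged bytes to OPCPackage.open(InputStream) under POI 4.0.0
would be the natural next check, but whether that yields exactly the trace
above has not been tested here.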

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org
