[ https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010390#comment-18010390 ]
Manish S N edited comment on TIKA-4459 at 7/28/25 1:04 PM:
-----------------------------------------------------------

-So would ditching Java's built-in ZipInputStream and moving to Apache Commons Compress's ZipArchiveInputStream solve the issue?-

Guess not, as the following code:
{code:java}
import java.io.InputStream;
import java.net.URI;
import java.util.zip.ZipEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

InputStream is = URI.create("https://issues.apache.org/jira/secure/attachment/13077746/protected.odt").toURL().openStream();
// new AutoDetectParser().parse(is, new DefaultHandler(), new Metadata(), new ParseContext());
ZipArchiveInputStream zis = new ZipArchiveInputStream(is);
do {
    // getNextEntry() returns a ZipArchiveEntry, which extends java.util.zip.ZipEntry
    ZipEntry entry = zis.getNextEntry();
    if (entry == null) {
        System.out.println("No more entries in the zip file.");
        break;
    }
    System.out.println(entry + " " + entry.getName() + " " + entry.getSize() + " " + entry.isDirectory());
} while (true);
{code}
gives:
{code:java}
mimetype mimetype 39 false
Configurations2/menubar/ Configurations2/menubar/ 0 true
Configurations2/progressbar/ Configurations2/progressbar/ 0 true
Configurations2/popupmenu/ Configurations2/popupmenu/ 0 true
Configurations2/floater/ Configurations2/floater/ 0 true
Configurations2/statusbar/ Configurations2/statusbar/ 0 true
Configurations2/toolbar/ Configurations2/toolbar/ 0 true
Configurations2/toolpanel/ Configurations2/toolpanel/ 0 true
Configurations2/images/Bitmaps/ Configurations2/images/Bitmaps/ 0 true
Configurations2/accelerator/ Configurations2/accelerator/ 0 true
styles.xml styles.xml -1 false
Exception in thread "main" org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature data descriptor used in entry styles.xml
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:919)
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.skip(ZipArchiveInputStream.java:1285)
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.closeEntry(ZipArchiveInputStream.java:480)
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:651)
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:632)
	at org.manish.AttachmentParser.tilmanTest(AttachmentParser.java:79)
	at org.manish.AttachmentParser.main(AttachmentParser.java:69)
{code}

was (Author: JIRAUSER306563):
So would ditching Java's built-in ZipInputStream and moving to Apache Commons Compress's ZipArchiveInputStream solve the issue?

> protected ODF encryption detection fail
> ---------------------------------------
>
>                 Key: TIKA-4459
>                 URL: https://issues.apache.org/jira/browse/TIKA-4459
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 3.2.1
>         Environment: Ubuntu 24.04.2 LTS x86_64
>            Reporter: Manish S N
>            Priority: Minor
>              Labels: encryption, odf, open-document-format, protected, regression, zip
>             Fix For: 4.0.0, 3.2.2
>
>         Attachments: protected.odt
>
>
> When passing an InputStream of a protected ODF file to Tika, we get a
> ZipException instead of an EncryptedDocumentException.
> This works correctly and throws EncryptedDocumentException if you create the
> TikaInputStream with a Path or call TikaInputStream.getPath(), as that writes
> the stream to a temporary file on disk.
> But when working with InputStreams, we get the following ZipException:
>
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
> 	at org.apache.tika.Tika.parseToString(Tika.java:525)
> 	at org.apache.tika.Tika.parseToString(Tika.java:495)
> 	at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
> 	at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
> 	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> 	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> 	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> 	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> 	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> 	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> 	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> 	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> 	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> 	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> 	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
> 	at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
> 	at org.manish.AttachmentParser.main(AttachmentParser.java:57)
> Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
> 	at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
> 	at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
> 	at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
> 	at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> 	... 19 more
>
> (We use Tika to detect encrypted documents.)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
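Both stack traces point at the same layout: `styles.xml` (listed with size -1) is a STORED entry that uses a data descriptor, which neither streaming reader accepts by default. The Path route presumably succeeds because it reads the ZIP central directory, where sizes are known up front. The following stdlib-only Java sketch illustrates that difference. It hand-builds a one-entry archive with that shape (an assumption: it reproduces only the STORED + data descriptor layout, not ODF encryption), shows `java.util.zip.ZipInputStream` rejecting it, and then reads it back after spooling to a temporary file and opening it with `java.util.zip.ZipFile`. The class and method names (`StoredDataDescriptorDemo`, `buildZip`, `streamingReaderRejects`, `readViaTempFile`) are hypothetical and not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical demo class: not Tika code, only an illustration of the ZIP layout.
class StoredDataDescriptorDemo {

    // Hand-builds a one-entry ZIP whose entry is STORED (method 0) but sets
    // general-purpose flag bit 3, so the LOC header carries zero CRC/sizes and the
    // real values appear only in the data descriptor and the central directory.
    static byte[] buildZip(String name, byte[] data) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        CRC32 crc32 = new CRC32();
        crc32.update(data);
        int crc = (int) crc32.getValue();
        short flagDataDescriptor = 0x0008;
        short dosDate = 0x0021; // 1980-01-01
        ByteBuffer b = ByteBuffer.allocate(30 + n.length + data.length + 16 + 46 + n.length + 22)
                .order(ByteOrder.LITTLE_ENDIAN);
        // Local file header (30 bytes + name): CRC and sizes deferred to the descriptor.
        b.putInt(0x04034b50).putShort((short) 20).putShort(flagDataDescriptor).putShort((short) 0)
                .putShort((short) 0).putShort(dosDate).putInt(0).putInt(0).putInt(0)
                .putShort((short) n.length).putShort((short) 0).put(n).put(data);
        // Data descriptor (16 bytes, including its optional signature).
        b.putInt(0x08074b50).putInt(crc).putInt(data.length).putInt(data.length);
        int cdOffset = b.position();
        // Central directory header (46 bytes + name) with the real CRC and sizes.
        b.putInt(0x02014b50).putShort((short) 20).putShort((short) 20).putShort(flagDataDescriptor)
                .putShort((short) 0).putShort((short) 0).putShort(dosDate)
                .putInt(crc).putInt(data.length).putInt(data.length)
                .putShort((short) n.length).putShort((short) 0).putShort((short) 0) // name, extra, comment
                .putShort((short) 0).putShort((short) 0).putInt(0).putInt(0)         // disk, attrs, LOC offset
                .put(n);
        int cdSize = b.position() - cdOffset;
        // End of central directory record (22 bytes).
        b.putInt(0x06054b50).putShort((short) 0).putShort((short) 0).putShort((short) 1)
                .putShort((short) 1).putInt(cdSize).putInt(cdOffset).putShort((short) 0);
        return b.array();
    }

    // True if ZipInputStream, which only sees LOC headers, refuses the archive.
    static boolean streamingReaderRejects(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            while (zis.getNextEntry() != null) {
                zis.readAllBytes(); // drain the entry before moving on
            }
            return false;
        } catch (ZipException e) {
            return true;
        }
    }

    // Spools the stream to a temp file and reads the entry via the central directory.
    static byte[] readViaTempFile(InputStream in, String entryName) throws IOException {
        Path tmp = Files.createTempFile("spooled", ".zip");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            try (ZipFile zf = new ZipFile(tmp.toFile())) {
                ZipEntry entry = zf.getEntry(entryName);
                if (entry == null) {
                    throw new IOException("missing entry " + entryName);
                }
                try (InputStream entryIn = zf.getInputStream(entry)) {
                    return entryIn.readAllBytes();
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "<office:document-styles/>".getBytes(StandardCharsets.UTF_8);
        byte[] zip = buildZip("styles.xml", payload);
        System.out.println("ZipInputStream rejects archive: " + streamingReaderRejects(zip));
        byte[] roundTrip = readViaTempFile(new ByteArrayInputStream(zip), "styles.xml");
        System.out.println("ZipFile returned payload: " + Arrays.equals(payload, roundTrip));
    }
}
```

Running `main` should report that `ZipInputStream` rejects the archive and that `ZipFile` returns the original payload. For comparison, Commons Compress's `ZipArchiveInputStream` rejects the same layout by default (the `UnsupportedZipFeatureException` above), and it has a constructor that takes an `allowStoredEntriesWithDataDescriptor` flag. Whether enabling that flag helps with protected.odt has not been tested here.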