[ https://issues.apache.org/jira/browse/TIKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956813#comment-17956813 ]
Tilman Hausherr edited comment on TIKA-4424 at 6/9/25 4:17 AM:
---------------------------------------------------------------

Found in DefaultZipContainerDetector.java:
{code:java}
int markLimit = -1;//16 * 1024 * 1024;
{code}
I wonder if this was intended. It fails with this code at the end of {{ZipDetectionTest.testStreaming()}}:
{code:java}
detector = new DefaultZipContainerDetector();
try (InputStream is = ZipDetectionTest.class.getResourceAsStream("/test-documents/tika-4424-example.zip")) {
    assertExpected(detector, is, "application/vnd.google-earth.kmz", expectedDigest);
}
{code}

was (Author: tilman):
Found in DefaultZipContainerDetector.java:
{code:java}
int markLimit = -1;//16 * 1024 * 1024;
{code}
I wonder if this was intended.

> Regression in zip-based detection with an InputStream in 3.2.0
> --------------------------------------------------------------
>
>                 Key: TIKA-4424
>                 URL: https://issues.apache.org/jira/browse/TIKA-4424
>             Project: Tika
>          Issue Type: Task
>          Components: detector
>    Affects Versions: 3.2.0
>            Reporter: Tim Allison
>            Priority: Major
>              Labels: regression
>         Attachments: tika-4424.zip
>
>
> On the user list, Craig Muchinsky and Pontus Amberg reported new problems with
> the detection of zip-based files.
> Craig noted that this affects InputStream detection, and Pontus noted that
> even after he switched to a TikaInputStream, his kmz file was still detected
> as a zip.
> This is Pontus' code:
> {noformat}
> Tika.detect(InputStream stream, String name)
> {noformat}
> {noformat}
> pp//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
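The question about {{markLimit}} depends on the java.io mark/reset contract: after {{mark(readlimit)}}, a stream only guarantees that {{reset()}} works if no more than {{readlimit}} bytes have been read since the mark. The sketch below illustrates the contract with the JDK's {{BufferedInputStream}}. It does not use Tika's {{BoundedInputStream}}, whose internals are not shown in this issue. It compares a mark limit of -1 with 16 * 1024 * 1024 for the same read-ahead. The class name {{MarkLimitDemo}}, the 1 KiB buffer, and the 100,000-byte read-ahead are hypothetical. It is also an assumption that the detector reads this far before it calls {{reset()}}.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: illustrates the JDK mark/reset contract, not Tika's BoundedInputStream.
public class MarkLimitDemo {

    // Marks the stream with markLimit, reads bytesToRead bytes ahead, then tries reset().
    // Returns true if reset() succeeds and the next byte is the first byte of the data.
    static boolean canReset(int markLimit, int bytesToRead) {
        byte[] data = new byte[bytesToRead];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        // A small 1 KiB internal buffer, so the read-ahead quickly exceeds it.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 1024)) {
            in.mark(markLimit);
            byte[] scratch = new byte[4096];
            int total = 0;
            int n;
            while (total < bytesToRead && (n = in.read(scratch)) != -1) {
                total += n;
            }
            try {
                in.reset();
            } catch (IOException e) {
                // BufferedInputStream drops the mark when its buffer is full and already
                // at least markLimit bytes long. With a negative markLimit, that happens
                // as soon as more than one buffer's worth of data has been read.
                return false;
            }
            return in.read() == (data[0] & 0xFF);
        } catch (IOException e) {
            // Defensive only: ByteArrayInputStream does not throw on read or close.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int readAhead = 100_000; // hypothetical read-ahead size
        System.out.println("markLimit = -1:     reset ok? " + canReset(-1, readAhead));
        System.out.println("markLimit = 16 MiB: reset ok? " + canReset(16 * 1024 * 1024, readAhead));
    }
}
```

On a standard JDK, the first call should report {{false}} and the second {{true}}. A negative limit lets {{BufferedInputStream}} discard the mark. A 16 MiB limit lets the buffer grow enough to keep the mark.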