On Tue, 7 Mar 2023 07:46:25 GMT, Eirik Bjorsnos <[email protected]> wrote:
> ZipOutputStream currently writes directory entries using the DEFLATED
> compression method. This does not strictly comply with the APPNOTE.TXT
> specification and is also about 10x slower than using the STORED compression
> method.
>
> Because of these concerns, `ZipOutputStream.putNextEntry` should be updated
> with an `@apiNote` recommending
> the use of the STORED compression method for directory entries.
>
> Suggested CSR in the first comment.
Suggested CSR:
### Compatibility kind
none
### Compatibility risk
minimal
### Compatibility description
This is a documentation-only change
### Summary
Add an `@apiNote`to `ZipOutputStream.putNextEntry` recommending that directory
entries should be added using the `STORED` compression method.
### Problem
ZipOutputStream currently writes directory entries using the default DEFLATE
method. This causes file data for a two-byte 'final empty' DEFLATE block to be
written, followed by a 16-byte data descriptor.
This is in violation of the APPNOTE.txt specification, which mandates that
directory entries `MUST NOT` have file data:
4.3.8 File data
Immediately following the local header for a file
SHOULD be placed the compressed or stored data for the file.
If the file is encrypted, the encryption header for the file
SHOULD be placed after the local header and before the file
data. The series of [local file header][encryption header]
[file data][data descriptor] repeats for each file in the
.ZIP archive.
Zero-byte files, directories, and other file types that
contain no content MUST NOT include file data.
```
Additionally, benchmarks show that the writing of these empty DEFLATED
directory entries are ~10X slower compared to an empty STORED entry.
While the `jar` command uses the STORED method for directory entries, the
DEFLATE method still seems to be prevalent: An analysis of the 109 dependency
jars of the Spring Petclinic project shows that 65 files had DEFLATE directories
while 34 files has STORED directories.
### Solution
Add an `@apiNote` to `ZipOutputStream.putNextEntry` recommending the use of theĀ
STORED compression method for directory entries. The note should include a
snippet which shows the recommended configuration of a directory `ZipEntry`.
(As an alternative solution, `putNextEntry` could be updated to change the
default compression method to STORED for directory entires. This was deemed as
having a too high risk, since users may be depending of the ability to attach
arbitrary data to directory entries.)
### Specification
- Add the following `@apiNote` to `ZipOutputStream.putNextEntry`
Index: src/java.base/share/classes/java/util/zip/ZipOutputStream.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
--- a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
(revision f2b03f9a2c0fca853211e41a1ddf46195dd56698)
+++ b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
(revision f57735cf134469b49cd19472680aee778c245771)
@@ -191,6 +191,22 @@
* <p>
* The current time will be used if the entry has no set modification time.
*
+ * @apiNote When writing a directory entry, the STORED compression method
+ * should be used and the size and CRC-32 values should be set to 0:
+ *
+ * {@snippet lang="java" :
+ * ZipEntry e = ...;
+ * if (e.isDirectory()) {
+ * e.setMethod(ZipEntry.STORED);
+ * e.setSize(0);
+ * e.setCrc(0);
+ * }
+ * stream.putNextEntry(e);
+ * }
+ *
+ * This ensures strict compliance with the ZIP specification and
+ * allows optimal performance when processing directory entries.
+ *
* @param e the ZIP entry to be written
* @throws ZipException if a ZIP format error has occurred
* @throws IOException if an I/O error has occurred
###
Here's what the generated API note looks like:
<img width="838" alt="image"
src="https://user-images.githubusercontent.com/300291/223421803-8541eafb-298f-4064-af67-2d93ee061114.png">
Looking for reviewers for this CSR which adds an `@apiNote` recommending the
use of the STORED compression method when writing directory entries in
ZipOutputStream:
https://bugs.openjdk.org/browse/JDK-8303925
The CSR was initially written by me, then edited by Lance.
-------------
PR: https://git.openjdk.org/jdk/pull/12899