On Wed, 27 Jul 2022 10:47:44 GMT, Volker Simonis <simo...@openjdk.org> wrote:
>> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` >> to highlight that it might write more bytes than the returned number of >> inflated bytes into the buffer `b`. >> >> The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, >> int len)` will leave the content beyond the last read byte in the read >> buffer `b` unaffected. However, the overridden `read` method in >> `InflaterInputStream` passes the read buffer `b` to >> `Inflater::inflate(byte[] b, int off, int len)` which doesn't provide this >> guarantee. Depending on implementation details, `Inflater::inflate` might >> write more than the returned number of inflated bytes into the buffer `b`. >> >> ### TL;DR >> >> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater >> functionality. `Inflater::inflate(byte[] output, int off, int len)` >> currently calls zlib's native `inflate(..)` function and passes the address >> of `output[off]` and `len` to it via JNI. >> >> The specification of zlib's `inflate(..)` function (i.e. the [API >> documentation in the original zlib >> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) >> doesn't give any guarantees with regard to usage of the output buffer. It >> only states that upon completion the function will return the number of >> bytes that have been written (i.e. "inflated") into the output buffer. >> >> The original zlib implementation only wrote as many bytes into the output >> buffer as it inflated. However, this is not a hard requirement and newer, >> more performant implementations of the zlib library like >> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) >> or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes >> of the output buffer than they actually inflate as a scratch buffer. See >> https://github.com/simonis/zlib-chromium for a more detailed description of >> their approach and its performance benefit. >> >> These new zlib versions can still be used transparently from Java (e.g. by >> putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because >> they still fully comply to specification of `Inflater::inflate(..)`. >> However, we might run into problems when using the `Inflater` functionality >> from the `InflaterInputStream` class. `InflaterInputStream` is derived from >> from `InputStream` and as such, its `read(byte[] b, int off, int len)` >> method is quite constrained. It specifically specifies that if *k* bytes >> have been read, then "these bytes will be stored in elements `b[off]` >> through `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through >> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] b, int >> off, int len)` (which is constrained by `InputStream::read(..)`'s >> specification) calls `Inflater::inflate(byte[] b, int off, int len)` and >> directly passes its output buffer down to the native zlib `inflate(..)` >> method which is free to change the bytes beyond `b[off+` *k*`]` (where *k* is the number of inflated bytes). >> >> From a practical point of view, I don't see this as a big problem, because >> callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never >> know how many bytes will be written into the output buffer `b` (and in fact >> its content can always be completely overwritten). It therefore makes no >> sense to depend on any data there being untouched after the call. Also, >> having used zlib-cloudflare productively for about two years, we haven't >> seen real-world issues because of this behavior yet. However, from a >> specification point of view it is easy to artificially construct a program >> which violates `InflaterInputStream::read(..)`'s postcondition if using one >> of the alterantive zlib implementations. A recently integrated JTreg test >> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails >> with zlib-chromium but can fixed easily to run with alternative >> implementations as well (see >> [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)). > > Volker Simonis has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains eight commits: > > - Add apiNote to methods which return InflaterInputStreams as InputStreams > - Updated wording based on @JoeDarcy's CSR review > - Updated wording based on @LanceAndersen's review > - Adapted wording based on @AlanBateman's review, removed implementation > note on ZipFile::getInputStream and aligned wording for all ::read methods > - Adapted wording and copyrights based on @jaikiran review > - Added API-refinement to GZIPInputStream::read()/ZipInputStream::read() and > an Implementation note to ZipFile::getInputStream() > - Extended API-doc based on reviewer feedback > - 8282648: Weaken the InflaterInputStream specification in order to allow > faster Zip implementations Please update the CSR once you’ve finalized the specification changes. src/java.base/share/classes/java/net/URLConnection.java line 848: > 846: * last inflated byte undefined after a read operation (see {@link > 847: * java.util.zip.InflaterInputStream#read(byte[], int, int) > 848: * InflaterInputStream.read(byte[], int, int)}). Consider this wording, which makes the danger clearer, here and elsewhere: * @apiNote The {@code InputStream} returned by this method can wrap an * {@link java.util.zip.InflaterInputStream InflaterInputStream}, whose * {@link java.util.zip.InflaterInputStream#read(byte[], int, int) * read(byte[], int, int)} method can modify any element of the output * buffer. src/java.base/share/classes/java/util/zip/InflaterInputStream.java line 136: > 134: * are undefined (an implementation is free to change them during > the inflate > 135: * operation). If the return value is -1 or an exception is thrown, > then the content of > 136: * {@code b[off]} to {@code b[off+}<i>len</i>{@code -1]} is > undefined. Consider this more precise wording, both here and in `GZIPInputStream`, `ZipInputStream`, and `JarInputStream`: * If this method returns a nonzero integer <i>n</i> then {@code buf[off]} * through {@code buf[off+}<i>n</i>{@code -1]} contain the uncompressed * data. The content of elements {@code buf[off+}<i>n</i>{@code ]} through * {@code buf[off+}<i>len</i>{@code -1]} is undefined, contrary to the * specification of the {@link java.io.InputStream InputStream} superclass, * so an implementation is free to modify these elements during the inflate * operation. If this method returns {@code -1} or throws an exception then * the content of {@code buf[off]} through {@code buf[off+}<i>len</i>{@code * -1]} is undefined. src/java.base/share/classes/java/util/zip/InflaterInputStream.java line 141: > 139: * @param off the start offset in the destination array {@code b} > 140: * @param len the maximum number of bytes read > 141: * @return the actual number of inflated bytes, or -1 if the end of > the s/inflated bytes/bytes inflated/ ------------- Changes requested by mr (Lead). PR: https://git.openjdk.org/jdk/pull/7986