On Fri, 15 Dec 2023 21:13:07 GMT, Archie Cobbs <aco...@openjdk.org> wrote:

>> `GZIPInputStream`, when looking for a concatenated stream, relies on what 
>> the underlying `InputStream` says is how many bytes are `available()`. But 
>> this is inappropriate because `InputStream.available()` is just an estimate 
>> and is allowed (for example) to always return zero.
>> 
>> The fix is to ignore what's `available()` and just proceed and see what 
>> happens. If fewer bytes are available than required, the attempt to extend 
>> to another stream is canceled just as it was before, e.g., when the next 
>> stream header couldn't be read.
>
> Archie Cobbs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Address third round of review comments.

The current behavior of allowing/ignoring trailing malformed data seems to have 
a complicated history:

- GZIPInputStream was updated to throw ZipException instead of IOException on 
malformed GZIP data in [JDK-4263582](https://bugs.openjdk.org/browse/JDK-4263582)
- The ability to read concatenated GZ files was added in 
[JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). This change 
interestingly also introduced the current behavior of ignoring any trailing 
malformed data in the stream. 
- [JDK-7021870](https://bugs.openjdk.org/browse/JDK-7021870) fixed a bug where 
GZIPInputStream closed the underlying input stream. The change also introduced 
the test GZIPInZip, which verified that reading from a wrapped ZipInputStream 
does not close the stream.
- Some months later, GZIPInZip was updated to fix a test failure, but the change 
also modified the test to verify that malformed trailing data was ignored. The 
JBS issue is not available to me: 
[JDK-8023431](https://bugs.openjdk.org/browse/JDK-8023431)
- Soon after this, GZIPInZip was again updated to fix a test failure, this time 
removing the use of piped streams and threads. The JBS issue is not available 
to me: [JDK-8026756](https://bugs.openjdk.org/browse/JDK-8026756)
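
For reference, here is a minimal sketch of the concatenated-stream reading mentioned in the list above. The class and helper names (`ConcatenatedGzipSketch`, `gzip`, `concat`, `gunzip`) are made up for illustration; only the `java.util.zip` and `java.io` calls are real JDK API. It assumes Java 9 or later for `readAllBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipSketch {

    // Compress a string into one complete GZIP member (header, data, trailer).
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return a new array holding the bytes of a followed by the bytes of b.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    // Decompress everything GZIPInputStream is willing to return from data.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two independent GZIP members placed back to back in one byte array.
        byte[] twoMembers = concat(gzip("Hello, "), gzip("world"));
        // A single GZIPInputStream reads through both members.
        System.out.println(gunzip(twoMembers));
    }
}
```

The sketch uses an in-memory `ByteArrayInputStream`, so it does not exercise the `available()` problem this PR addresses; it only shows what "concatenated" means here.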

The current behavior of ignoring trailing malformed data does not seem to be 
specified in the API. On the contrary, the read methods are specified to throw 
ZipException for corrupt input data:


     * @throws    ZipException if the compressed input data is corrupt.
     * @throws    IOException if an I/O error has occurred.
     *
     */
    public int read(byte[] buf, int off, int len) throws IOException {
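
To make the mismatch concrete, here is a rough sketch contrasting the two cases. It reflects the behavior described in this comment as I understand it on current JDK releases; since that behavior is unspecified, it should not be read as a guarantee. The class and helper names (`TrailingDataSketch`, `gzip`, `gunzip`, `isRejected`) and the sample bytes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

public class TrailingDataSketch {

    // Compress a string into one complete GZIP member.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompress data, wrapping any IOException (including ZipException).
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if decompressing data fails with a ZipException.
    static boolean isRejected(byte[] data) {
        try {
            gunzip(data);
            return false;
        } catch (UncheckedIOException e) {
            return e.getCause() instanceof ZipException;
        }
    }

    public static void main(String[] args) {
        byte[] member = gzip("payload");

        // Case 1: valid member followed by bytes that are not a GZIP header.
        byte[] junk = "junk!".getBytes(StandardCharsets.US_ASCII);
        byte[] withJunk = Arrays.copyOf(member, member.length + junk.length);
        System.arraycopy(junk, 0, withJunk, member.length, junk.length);
        System.out.println("trailing junk rejected: " + isRejected(withJunk));

        // Case 2: corruption inside the member (first CRC32 byte of the trailer).
        byte[] badCrc = member.clone();
        badCrc[badCrc.length - 8] ^= 0x01;
        System.out.println("corrupt trailer rejected: " + isRejected(badCrc));
    }
}
```

The first case is accepted silently and the second is reported as a `ZipException`, even though the `@throws` clause makes no distinction between them.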


Not sure whether it is worthwhile to change this long-standing behavior of 
GZIPInputStream. But it could perhaps be noted somehow in the API 
documentation? (To be clear, that would be for a different PR/issue/CSR.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1859177655
