Hi Louis,

At first glance this looks easy to fix. I've filed a draft PR (pending tests) here:
https://github.com/openjdk/jdk/pull/17113
-Archie

On Thu, Dec 14, 2023 at 1:10 PM Louis Bergelson <lou...@broadinstitute.org> wrote:

> Hello. This is my first time posting here, so I apologize if this is the
> wrong forum. I wanted to bring up an issue in GZIPInputStream that was
> first identified in 2011, confirmed as a bug, and then never resolved.
>
> When reading certain GZIP files from certain types of InputStreams, the
> GZIPInputStream can misidentify the end of the stream and close early,
> resulting in silently truncated data.
>
> You can see the bug report, which has a detailed description, here:
> https://bugs.openjdk.org/browse/JDK-7036144
>
> In short, it comes down to incorrect use of the (quite confusing)
> InputStream.available() method to detect the end of the stream. This
> typically works fine with local files, but with network streams that might
> not have bytes available at any given moment, it fails nondeterministically.
>
> How could I go about getting this fixed? I can contribute a patch or
> additional examples if necessary.
>
> Genomics data is typically encoded as block-gzipped files, so this comes
> up regularly and causes a lot of confusion. The workaround is to just not
> use GZIPInputStream. It seems like a core Java class, though, so it would
> be nice if it worked.
>
> Thank you,
> Louis

--
Archie L. Cobbs
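For reference, here is a minimal reproduction sketch of the scenario described in the quoted report. It assumes that a "network-like" stream can be modeled by an InputStream that delivers one byte per read and always reports available() == 0, which the InputStream contract allows. The class and method names (GzipTruncationRepro, TrickleInputStream, gzipMember, concatMembers, gunzip) are hypothetical, chosen only for illustration. The input is two complete gzip members concatenated back to back, roughly what block-gzipped files look like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTruncationRepro {

    // Compress one string as a single, complete gzip member.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Concatenate several independent gzip members into one byte stream.
    static byte[] concatMembers(String... texts) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String t : texts) {
            out.write(gzipMember(t));
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for a slow network stream: data trickles in one
    // byte per read() call, and available() always reports 0 ("nothing
    // buffered right now"), which does NOT mean end of stream.
    static final class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliver at most one byte per call.
            return super.read(b, off, Math.min(len, 1));
        }

        @Override
        public int available() {
            return 0;
        }
    }

    // Decompress everything GZIPInputStream is willing to return.
    static String gunzip(InputStream raw) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(raw)) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = concatMembers("first member\n", "second member\n");

        String fromLocal = gunzip(new ByteArrayInputStream(data));
        String fromTrickle =
            gunzip(new TrickleInputStream(new ByteArrayInputStream(data)));

        System.out.println("local:   " + fromLocal.length() + " chars");
        System.out.println("trickle: " + fromTrickle.length() + " chars");
        System.out.println(fromLocal.equals(fromTrickle) ? "OK" : "TRUNCATED");
    }
}
```

On a JDK affected by JDK-7036144, the trickle read would be expected to stop after the first member and print TRUNCATED, since the check for a following member consults available(); a JDK carrying a fix should print OK. The one-byte reads are a deliberate worst case; real network streams would hit the same path only some of the time, which matches the nondeterminism described above.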