On Fri, Feb 24, 2023 at 9:22 AM Alan Bateman <alan.bate...@oracle.com> wrote:
> As a general point, the ZIP format can have redundant metadata and there
> can be cases where the CRC-32 isn't available when writing a LOC header.

ZipInputStream throws exceptions in both of these cases.

If general purpose bit flag 3 is set, the CRC is set to zero in the LOC and the actual CRC is put in the data descriptor immediately following the compressed data. With this format, the exception is thrown in ZipInputStream.readEnd:

https://github.com/openjdk/jdk/blob/8f7c4969c28c58ae4b9adeed822707b28be16dd0/src/java.base/share/classes/java/util/zip/ZipInputStream.java#L624-L626

If the CRC-32 value is in the LOC, the exception is thrown when the read reaches the end of the data, in ZipInputStream.read:

https://github.com/openjdk/jdk/blob/8f7c4969c28c58ae4b9adeed822707b28be16dd0/src/java.base/share/classes/java/util/zip/ZipInputStream.java#L624-L626

(The test I linked to covers both of these cases.)

> At the same time, the APIs work differently in that ZipFile opens a ZIP
> file so it has access to the CEN whereas ZipInputStream is working on a
> stream of ZIP entries and does not read the CEN. So some inconsistencies
> in the handling is not too surprising.

Indeed, but I found it a bit amusing that ZipFile (and ZipFileSystem), which both see the "full picture", are actually the ones that do not enforce the CRC. That does not make complete sense to me from a purely technical point of view. Perhaps the CRC in the CEN is less trustworthy across implementations than the one found in the LOC/data descriptor.

Cheers,
Eirik.
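
For illustration, here is a minimal sketch of the second case above (CRC-32 stored in the LOC). It is not the test referred to in the thread: the class name CrcCheckDemo and its helper methods are hypothetical. It assumes that ZipOutputStream writes the LOC of the first entry at offset 0 of the archive and that the CRC-32 field sits 14 bytes into the LOC header, as laid out in the ZIP specification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class CrcCheckDemo {
    // Offset of the CRC-32 field inside a LOC header:
    // signature (4) + version (2) + flags (2) + method (2) + time (2) + date (2).
    static final int LOC_CRC_OFFSET = 14;

    // Builds a one-entry archive. The entry is STORED, so bit flag 3 is not set
    // and the CRC-32 is written directly into the LOC header.
    static byte[] buildStoredZip(String name, byte[] content) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(content);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(content.length);
        entry.setCompressedSize(content.length);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(entry);
            zos.write(content);
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    // Reads the first entry to the end with ZipInputStream and reports
    // whether a ZipException was raised along the way.
    static boolean failsCrcCheck(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zis.getNextEntry();
            zis.readAllBytes(); // the CRC is compared once the entry data is exhausted
            return false;
        } catch (ZipException e) {
            System.out.println("ZipInputStream rejected the entry: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildStoredZip("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact archive fails:    " + failsCrcCheck(zip));

        // Flip one bit of the CRC-32 stored in the LOC; the CEN copy is left alone.
        zip[LOC_CRC_OFFSET] ^= 0x01;
        System.out.println("corrupted archive fails: " + failsCrcCheck(zip));
    }
}
```

Only the LOC copy of the CRC is altered here, so the sketch says nothing about how ZipFile treats the CEN copy; checking that would need the archive written to a file and opened with ZipFile.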