On Wed, 8 Nov 2023 13:20:33 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:

>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the 
>> number of compressed or uncompressed bytes read from the inflater is larger 
>> than the Zip64 magic value.
>> 
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in 
>> ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it 
>> also states that `ZIP64 format MAY be used regardless of the size of a 
>> file`. For such small entries, the above assumption does not hold.
>> 
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the 
>> ZipEntry includes a Zip64 extra information field. This brings 
>> ZipInputStream into alignment with the APPNOTE format spec:
>> 
>> 
>> When extracting, if the zip64 extended information extra 
>> field is present for the file the compressed and 
>> uncompressed sizes will be 8 byte values.
>> 
>> 
>> While small Zip64 files with 8-byte data descriptors are not commonly found 
>> in the wild, it is possible to create one using the Info-ZIP command line 
>> `-fd` flag:
>> 
>> `echo hello | zip -fd > hello.zip`
>> 
>> The PR also adds a test verifying that such a small Zip64 file can be parsed 
>> by ZipInputStream.
>
> Eirik Bjorsnos has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 25 commits:
> 
>  - Convert test from testNG to JUnit
>  - Fix the check that the size of an extra field block size must not grow 
> past the total extra field length
>  - Move isZip64ExtBlockSizeValid back into ZipInputStream, since it is 
> different from the ZipFile implementation which reads the LOC
>  - Merge branch 'master' into data-descriptor
>    
>    # Conflicts:
>    #  src/java.base/share/classes/java/util/zip/ZipFile.java
>  - Remove excessive comment
>  - Move isZip64ExtBlockSizeValid to ZipUtils, use it from ZipInputStream and 
> ZipFile.Source
>  - Merge branch 'master' into data-descriptor
>  - Use block comments instead of javadoc comments to avoid doclint warnings
>  - Merge branch 'master' into data-descriptor
>  - Zip64 extra field of a LOC header has 1-3 long components
>  - ... and 15 more: https://git.openjdk.org/jdk/compare/1e687b45...657f961e

Observation:

Unfortunately, `readLOC` skips reading the LOC's `size`, `csize` and `crc` 
values when in streaming mode. The fields are also overwritten by 
`ZipEntry.setExtra0`.

This means we cannot compare the original LOC values to `ZIP64_MAGICVAL` when 
determining whether a data descriptor uses 4-byte or 8-byte values.
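
To illustrate the alternative the PR relies on, here is a minimal sketch of 
deciding the data descriptor width from the LOC extra field alone, rather than 
from the LOC sizes. The class name `Zip64ExtraSketch` and both helper methods 
are hypothetical and not the JDK's code. The accepted block sizes (8, 16 or 24 
bytes) are an assumption based on the commit note above that the Zip64 extra 
field of a LOC header has 1-3 long components.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Zip64ExtraSketch {
    // Header ID of the Zip64 extended information extra field
    static final int ZIP64_EXTID = 0x0001;

    // Hypothetical helper: true if the extra data holds a Zip64 block
    // whose size is plausible for a LOC header (assumed: 1-3 longs).
    static boolean hasZip64Extra(byte[] extra) {
        if (extra == null) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                // Block would grow past the total extra field length
                return false;
            }
            if (id == ZIP64_EXTID) {
                return size == 8 || size == 16 || size == 24;
            }
            // Skip the payload of an unrelated block
            buf.position(buf.position() + size);
        }
        return false;
    }

    // Hypothetical helper: data descriptor length in bytes.
    // crc is always 4 bytes; csize and size are 4 or 8 bytes each;
    // the signature, when present, adds another 4 bytes.
    static int dataDescriptorSize(byte[] extra, boolean hasSignature) {
        int sizes = hasZip64Extra(extra) ? 16 : 8;
        return (hasSignature ? 4 : 0) + 4 + sizes;
    }
}
```

With this approach the decision no longer depends on the `size` and `csize` 
values that `readLOC` skips in streaming mode.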

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1801900801
