Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Eirik Bjørsnøs Thu, 11 Jan 2024 09:01:41 -0800

On Wed, 10 Jan 2024 17:02:05 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:


>> Sounds like a CSR is needed for the behavioral change proposed here.
>
>> Sounds like a CSR is needed for the behavioral change proposed here.
> 
> Thanks for flagging this @jddarcy 
> 
> I'm personally not 100% convinced a CSR is warranted in this case, but I'm of 
> course happy to extend the following into a CSR if you or other reviewer 
> think that's the best way forward:  
> 
> Let's review the intended and unintended behavioral changes in this PR:  
> 
> The _intended_ behavioral change is to extend the set of ZIP entries parsable 
> by `ZipInputStream` to include "small Zip64 entries". Such entries meet the 
> following criteria:
>  
> 1. They are clearly marked as using the Zip64 format, meaning that the LOC's 
> 'compressed size' and 'uncompressed size' fields are set to the "Zip64 magic 
> value" 0xFFFFFFFF and that the LOC's extra field includes a valid _Zip64 
> Extended Information Field_.
> 2. They use _streaming mode_, meaning that the 'general purpose bit flag' 3 
> is set, that the Zip64 'Original Size' and 'Compressed Size' are both set to 
> zero and that file data is followed by a 'Data Descriptor' containing the 
> actual size values.
> 3. The Data Descriptor represents the 'compressed size' and 'uncompressed' 
> size fields using 8 byte fields, instead of the regular 4 byte fields.
> 
> These entires are currently not parsable by `ZipInputStream`. Trying to parse 
> them fails with an exception similar to the following:
> 
> 
> java.util.zip.ZipException: invalid entry size (expected 0 but got 6 bytes)
>       at 
> java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:616)
> 
> 
> The _unintended_ behavioral change in this PR is that ZIP entries marked as 
> Zip64, but still having a Data Descriptor using 4 bytes to represent the size 
> fields will now become unparsable. Such entries meet critera 1 and 2 above, 
> but not 3.
> 
> While I have not performed a corpus search, I believe such entries are rare, 
> if they exist at all:
> 
> - They would be in clear violation of the `APPNOTE.txt` spec, which in 
> 4.3.9.2 says  _ZIP64 format MAY be used regardless of the size of a file.  
> When extracting, if the zip64 extended information extra field is present for 
> the file the compressed and uncompressed sizes will be 8 byte values_
> - The `zipdetails` tool fails to parse such a file, instead reading the two 
> 4-byte numbers into one 8-byte 'compressed size'
> - The file cannot be parsed by external APIs similar to `ZipInputStream`, 
> such as the Python `stream-unzip` module.
> 
> To avoid unintended behavioral changes in the implementation, care has been 
> taken to make the Zip64 extra field validation code robust ...

> @eirbjo, either of the intended or unintended effects of the PR on their own 
> would merit a CSR and this is a case where the inclusive-OR applies so a CSR 
> is definitely needed; thanks.

Thanks Joe!

 A CSR has been proposed and is ready for the first round of feedback: 
https://bugs.openjdk.org/browse/JDK-8323583

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1887559379

Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Reply via email to