On Fri, 19 Jul 2024 15:18:26 GMT, Lance Andersen <lan...@openjdk.org> wrote:

>> src/java.base/share/native/libjli/parse_manifest.c line 507:
>> 
>>> 505:                   || censiz == ZIP64_MAGICVAL
>>> 506:                   || cenoff == ZIP64_MAGICVAL)
>>> 507:                 && cenext > 0) {
>> 
>> I went through these changes and here's my first round of review.
>> 
>> Before the changes proposed in this PR, this part of the code which is 
>> responsible for finding and returning a constructed zip entry instance would 
>> blindly use the local header offset value from the central directory of the 
>> entry being looked up. It would then "seek" to that position and read the 
>> metadata of that entry (details like uncompressed length, compressed length, 
>> the compression method) and return back that entry instance. Clearly this 
>> isn't right when zip64 entries are involved since various details of such an 
>> entry can reside in the zip64 extra field block of that entry instead of 
>> being in the central directory.
>> 
>> This change proposes that this part of the code first identify that a zip64 
>> extra block exists for a particular entry and then read that zip64 block, 
>> validate some parts of the zip64 block and if that validation succeeds, then 
>> use the entry metadata from the zip64 block for constructing and returning 
>> the entry instance.
>> 
>> The idea to identify and support zip64 entries in this part of the code I 
>> think is agreed as the right one.
>> Coming to the implementation, I think we need to come to an agreement on 
>> what we want to do here. Specifically:
>> 
>> - What logic do we use to decide when to look for zip64 extra block for an 
>> entry? The changes in this PR (at this line here) proposes that we look for 
>> zip64 extra block for an entry if any of the compressed size, uncompressed 
>> size or the local header offset value is `0xFFFFFFFFL` and the extra field 
>> size noted in this central directory entry is greater than 0. This however 
>> doesn't match with what we do in the implementation of 
>> `java.util.zip.ZipFile` which too does similar checks for zip64 entry 
>> presence when parsing the central directory. Very specifically, in the 
>> ZipFile where this logic is implemented is here 
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/zip/ZipFile.java#L1254.
>>  That code in the ZipFile has gone through the necessary vetting for dealing 
>> with various possibilities with the zip files. I think we should implement 
>> that same logic here while checking for zip64 entries.
>> - The next one to decide is what kind of validations do we want to do in 
>> this code fo...
>
> I have not had an opportunity for a deep dive of the changes in the PR, only 
> a quick skim., but I agree we should add some additional validation similar 
> to what was added to ZipFile

Thanks for the reviews! I have some initial notes on the questions raised:

I reviewed how `ZipFile` validates the extra fields in 
[`checkExtraFields`](https://github.com/openjdk/jdk/blob/c25c4896ad9ef031e3cddec493aef66ff87c48a7/src/java.base/share/classes/java/util/zip/ZipFile.java#L1254-L1255)
 if `jdk.util.zip.disableZip64ExtraFieldValidation` is enabled, and how the 
values are read and used by `ZipEntry` in 
[`setExtra0`](https://github.com/openjdk/jdk/blob/c25c4896ad9ef031e3cddec493aef66ff87c48a7/src/java.base/share/classes/java/util/zip/ZipEntry.java#L558-L585).

`zip_util.c` also handles zip64 extra fields 
[here](https://github.com/openjdk/jdk/blob/c25c4896ad9ef031e3cddec493aef66ff87c48a7/src/java.base/share/native/libzip/zip_util.c#L1048-L1061).
 It looks for a zip64 extra field if extra fields are present, and if any of 
the size/offset/length are set to the magic values. It does not appear to do 
any of the additional validation performed by `ZipFile`.

I think there are some questions to consider about the approach:

* Should all of the zip implementations being doing that validation, including 
`zip_util.c`? One of the last rounds of substantial changes to 
`parse_manifest`'s zip implementation in 
[JDK-8073158](https://bugs.openjdk.org/browse/JDK-8073158) the [review 
thread](https://mail.openjdk.org/pipermail/core-libs-dev/2015-March/032546.html)
 mentioned that it would be nice to have a single c zip implementation. At that 
point there were three including pack200, and now we're down to two, but 
consolidating `zip_util.c` and `parse_manifest.c` still seems like a big job.

* Should the validation be configurable, via 
`jdk.util.zip.disableZip64ExtraFieldValidation` or a separate option? There are 
uses of `jdk.util.zip.disableZip64ExtraFieldValidation` in the wild ([github 
search](https://github.com/search?type=code&q=jdk.util.zip.disableZip64ExtraFieldValidation)),
 adding that validation could cause the launcher to start rejecting jars that 
are currently accepted.

* Regarding what to do if the validation fails, `parse_manifest` currently 
takes the approach that if it sees something that looks like a zip64 end header 
but that fails checks, it could e.g. be seeing non-zip64 data from the last 
entry (see 
[here](https://github.com/openjdk/jdk/blob/c25c4896ad9ef031e3cddec493aef66ff87c48a7/src/java.base/share/native/libjli/parse_manifest.c#L172-L173)).
 Failing instead of ignoring ill-formed zip64 data is a departure from that 
approach. I wonder if the use-cases for `parse_manifest` and `ZipFile` are 
slightly different here, and it's more important for `ZipFile` to be hardened? 
If an untrusted jar is being launched by the launcher, hardening the launcher's 
zip implementation may not provide meaningful guarantees.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18479#discussion_r1684972451

Reply via email to