Re: RFR: 8336025: Improve ZipOutputSream validation of MAX CEN Header field limits [v2]

Lance Andersen Mon, 23 Sep 2024 04:04:43 -0700

On Mon, 16 Sep 2024 11:42:46 GMT, Lance Andersen <lan...@openjdk.org> wrote:


>> I'm curious why the combined header length validation is being placed so 
>> late. In general I would assume failing fast is better?
>> 
>> Also, since the combined header length clause applies to "any directory 
>> record", it also applies to LOC?
>> 
>> So why is this happening late in `writeCEN`, as opposed to early in 
>> `putNextEntry`?
>> 
>> Edit: I'm aware that moving it to `putNextEntry` means you probably need to 
>> repeat it in writeCEN, just in case the `ZipEntry` was updated meanwhile. 
>> 
>> Some comments inline.
>
>> I'm curious why the combined header length validation is being placed so 
>> late. In general I would assume failing fast is better?
>> 
>> Also, since the combined header length clause applies to "any directory 
>> record", it also applies to LOC?
>> 
>> So why is this happening late in `writeCEN`, as opposed to early in 
>> `putNextEntry`?
>> 
>> Edit: I'm aware that moving it to `putNextEntry` means you probably need to 
>> repeat it in writeCEN, just in case the `ZipEntry` was updated meanwhile.
>> 
>> Some comments inline.
> 
> As this is really a corner case at best, I decided to keep the changes to a 
> minimum and the validation in writeCEN given that is where the encoded 
> comment bytes are obtained and written out.

> @LanceAndersen
> 
> I had a look at your update of the `CenSizeTooLarge` test.
> 
> This test uses artificially large extra fields to produce a CEN size 
> exceeding the implementation limit. Since this otherwise would create a lot 
> of IO and a huge file, the test writes a sparse file until the last CEN 
> record is written, after which real bytes are written out for the "ZIP64 End 
> of Central Directory" and "End of Central Directory" records are written.
> 
> The current extra field size of 0xFFFF is too large to pass the combined 
> length validation introduced by this PR. Because of this, the field size is 
> reduced to account for the CENHDR length, the length of the entry name and 
> the length of the comment (which is added to the last entry only)
> 
> Since the comment is only added to the last entry (as a hint to 
> SparseOutputStream), the CEN_HEADER_SIZE no longer reflects the actual size 
> of the CEN headers written. While the test still passes (by increasing 
> NUM_ENTRIES as needed), this makes the test somewhat confusing for future 
> maintainers.
> 
> Could we perhaps skip the comment altogether, and instead use equal-sized CEN 
> records for all entries? Instead of using a comment as a signal to 
> `SparseOutputStream` we could use a separate extra byte array on the last 
> entry, as suggested in this diff on top of your PR:


Thanks for the suggestions to the test.  They look fine.  I have done a run 
across our internal Mach5 systems and no issues(as expected).

PR has been updated with these changes

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21003#issuecomment-2367883851

Re: RFR: 8336025: Improve ZipOutputSream validation of MAX CEN Header field limits [v2]

Reply via email to