On Sun, 22 Mar 2026 19:21:59 GMT, Alan Bateman <[email protected]> wrote:

> Do I read it correct that the entry name and comment will be encoded twice? 

Prior to this change, the entry name would be encoded twice, once in writeLOC, 
once in writeCEN. The comment is only found in the CEN record, hence it would 
only be encoded in writeCEN.

> I wonder if it should be pushed down to writeLOC.

This was discussed with Lance in an earlier comment in this PR. My response 
there was that this validation needs to happen in `putNextEntry`, and it must 
happen before the XEntry is added to the xentries list (a Vector, actually!). 
Otherwise, the finish / close of the ZipOutputStream will fail during writeCEN 
when an unmappable comment is encoded. Failing during close is bad usability.
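
To illustrate the kind of up-front check I mean, here is a minimal sketch using 
only public `java.nio.charset` APIs. The class and method names 
(`EarlyValidation`, `encodeStrict`, `isEncodable`) are made up for this example 
and are not the code in the PR:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class EarlyValidation {

    // Hypothetical helper: encode strictly, reporting unmappable or
    // malformed input as an exception instead of substituting bytes.
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Hypothetical check that a putNextEntry-style method could run
    // before recording the entry, so the failure surfaces early
    // rather than at finish / close time.
    static boolean isEncodable(String s, Charset cs) {
        try {
            encodeStrict(s, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With a check like this, a comment that cannot be mapped to the stream's charset 
would be rejected when the entry is added, not when the stream is closed.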

If we want to reduce the number of encodings I see the following options:

1: Capture the encoded byte array after validation in `putNextEntry`, then pass 
it as a parameter to writeLOC. This way, writeLOC does not have to re-encode. 
Note that this trick does not work for comments, since they are only output in 
the CEN, not in the LOC.

2: We could encode names and comments once during validation, then store the 
byte arrays in the XEntry, then output them in writeLOC and writeCEN. The 
advantage is that we only encode once; the disadvantage is that we increase 
retained memory, probably noticeably for a large number of entries with longish 
names. Entry comments are probably rare in practice.
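
The difference between the two options can be sketched with a toy writer. This 
is not the ZipOutputStream code: `EncodeOnceSketch`, `Entry`, `retainBytes` and 
the simplified `writeLoc` / `finish` methods are stand-ins invented for the 
example, and the "records" written are just the raw name bytes:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class EncodeOnceSketch {

    // Hypothetical stand-in for XEntry.
    static final class Entry {
        final String name;
        final byte[] nameBytes; // retained only under option 2, else null

        Entry(String name, byte[] nameBytes) {
            this.name = name;
            this.nameBytes = nameBytes;
        }
    }

    final List<Entry> entries = new ArrayList<>();
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    final boolean retainBytes; // false = option 1, true = option 2
    int encodeCalls = 0;       // counts encodings, for illustration only

    EncodeOnceSketch(boolean retainBytes) {
        this.retainBytes = retainBytes;
    }

    byte[] encode(String s) {
        encodeCalls++;
        return s.getBytes(StandardCharsets.UTF_8);
    }

    void putNextEntry(String name) {
        // Validation and encoding happen together, before the entry is recorded.
        byte[] nameBytes = encode(name);
        entries.add(new Entry(name, retainBytes ? nameBytes : null));
        // Option 1: hand the already encoded bytes to the LOC writer.
        writeLoc(nameBytes);
    }

    void writeLoc(byte[] nameBytes) {
        out.write(nameBytes, 0, nameBytes.length);
    }

    // Simplified stand-in for the CEN pass at finish / close.
    void finish() {
        for (Entry e : entries) {
            // Option 2 reuses the retained bytes; option 1 encodes again here.
            byte[] b = (e.nameBytes != null) ? e.nameBytes : encode(e.name);
            out.write(b, 0, b.length);
        }
    }
}
```

In this toy model, option 1 costs two encodings per entry and retains nothing 
extra, while option 2 costs one encoding but keeps a byte array alive per entry 
until the stream is finished.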

I think we could do 1 without any worries. 2 I'm more sceptical about. What do 
you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30319#issuecomment-4106852709
