On Sun, 12 Mar 2023 21:25:46 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:

>> Please review this PR which speeds up TestTooManyEntries and clarifies its 
>> purpose:
>> 
>> - The name 'TestTooManyEntries' does not clearly convey the purpose of the 
>> test. What is tested is the validation that the total CEN size fits in a 
>> Java byte array. Suggested rename: CenSizeTooLarge
>> - The test creates DEFLATED entries which incurs zlib costs and File Data / 
>> Data Descriptors for no additional benefit. We can use STORED instead.
>> - By creating a single LocalDateTime and setting it with 
>> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. 
>> - The name of each entry is generated by calling UUID.randomUUID; we could 
>> use a simple counter instead.
>> - The produced file is unnecessarily large. We know how large a CEN entry 
>> is, let's take advantage of that to create a file with the minimal size.
>> - By adding a maximally large extra field to the CEN entries, we get away 
>> with fewer CEN records and save memory
>> - The summary and comments of the test can be improved to help explain the 
>> purpose of the test and how we reach the limit being tested.
>> 
>> These speedups reduced the runtime from 4 min 17 sec to 4 seconds on my 
>> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 2 GB. Memory 
>> consumption is down from 8GB to something like 12MB.
>
> Eirik Bjorsnos has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - MAX_EXTRA_FIELD_SIZE can be better expressed as 0xFFFF
>  - Bring back '@requires sun.arch.data.model == 64' for now

The test now runs fast with much less memory, but it still consumes 2GB of disk 
space.

I brought back `@requires (sun.arch.data.model == "64")`. Is this required for 
files > 2GB?

We could bring the consumed disk space down to 131MB by using a sparse file. 
Whether this is worth pursuing depends on whether the 2GB file is considered 
problematic.

Here's the SparseOutputStream used to bring the size down to 131MB:


```java
/**
 * By writing mostly extra fields as sparse 'holes', we can save disk space
 * used by this test from ~2GB to ~131MB
 */
private static class SparseOutputStream extends FilterOutputStream {
    private final byte[] extra;
    private final FileChannel channel;

    public SparseOutputStream(byte[] extra, FileChannel channel) {
        super(new BufferedOutputStream(Channels.newOutputStream(channel)));
        this.extra = extra;
        this.channel = channel;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (b == extra && off == 0 && len == extra.length) {
            // Write extra field header
            out.write(b, off, EXTRA_HEADER_LENGTH);
            out.flush();
            // The data is all zeros, we can advance the position instead
            channel.position(channel.position() + len - EXTRA_HEADER_LENGTH);
        } else {
            out.write(b, off, len);
        }
    }
}
```
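For illustration, here is a self-contained sketch of how such a stream could be 
wired into a `ZipOutputStream` writing STORED entries. The `SparseZipSketch` 
class name, the `EXTRA_HEADER_LENGTH` value, and the single-entry setup are 
assumptions for the sketch, not taken from the test itself:

```java
import java.io.BufferedOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SparseZipSketch {
    // Assumed layout: a 2-byte tag plus a 2-byte length precede the payload
    static final int EXTRA_HEADER_LENGTH = 4;
    // Maximally large extra field; the payload is all zeros
    static final byte[] EXTRA = new byte[0xFFFF];

    static class SparseOutputStream extends FilterOutputStream {
        private final byte[] extra;
        private final FileChannel channel;

        SparseOutputStream(byte[] extra, FileChannel channel) {
            super(new BufferedOutputStream(Channels.newOutputStream(channel)));
            this.extra = extra;
            this.channel = channel;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            if (b == extra && off == 0 && len == extra.length) {
                // Keep the header, skip the zero payload by moving the position
                out.write(b, off, EXTRA_HEADER_LENGTH);
                out.flush();
                channel.position(channel.position() + len - EXTRA_HEADER_LENGTH);
            } else {
                out.write(b, off, len);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("sparse", ".zip");
        try (FileChannel channel = FileChannel.open(zip, StandardOpenOption.WRITE);
             ZipOutputStream zos = new ZipOutputStream(
                     new SparseOutputStream(EXTRA, channel))) {
            ZipEntry entry = new ZipEntry("entry-0");
            entry.setMethod(ZipEntry.STORED); // STORED avoids zlib costs
            entry.setSize(0);
            entry.setCrc(0);                  // CRC-32 of empty data is 0
            entry.setExtra(EXTRA);            // written sparsely by the stream
            zos.putNextEntry(entry);
            zos.closeEntry();
        }
        // The logical file size includes the skipped extra-field bytes
        System.out.println(Files.size(zip) > 0xFFFF);
        Files.delete(zip);
    }
}
```

Note that the hole is only an optimization: if the stream ever receives the 
extra array in smaller chunks, the identity check fails and the bytes are 
written out literally, so the produced ZIP stays valid either way.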

-------------

PR: https://git.openjdk.org/jdk/pull/12991
