TestTooManyEntries.java and clarify its purpose [v3]

Eirik Bjorsnos Mon, 30 Oct 2023 05:02:51 -0700

On Sun, 12 Mar 2023 14:12:08 GMT, Lance Andersen <lan...@openjdk.org> wrote:


>> Eirik Bjorsnos has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Use Integer.toString instead of Long.toString
>
>> Here's a wild idea:
>> 
>> We could inject large extra fields on the CEN entries to inflate the size of 
>> each CEN entry. This means we get away with much fewer CEN entries which 
>> again means much less memory. Also, writing mostly empty extra data seems to 
>> be really fast, at least on my Macbook Pro.
>> 
>> For me, the test now run without the `@requires' and completes in less than 
>> 5 seconds.
>> 
>> The cost is slightly more complicated code. But perhaps still worth the 
>> added complexity?
>> 
>> ```java
>> @BeforeTest
>> public void setup() throws IOException {
>>     hugeZipFile = new File("cen-too-large.zip");
>>     hugeZipFile.deleteOnExit();
>> 
>>     // Length of generated entry names
>>     int nameLength = 10;
>> 
>>     // We use a large extra field to get away with fewer CEN headers
>>     byte[] extra = makeLargeExtra();
>> 
>>     // The size of a single CEN header, including name and extra field
>>     int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length;
>> 
>>     // Determine the number of entries needed to exceed the MAX_CEN_SIZE
>>     int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1;
>> 
>>     ZipEntry[] entries = new ZipEntry[numEntries];
>>     try (ZipOutputStream zip = new ZipOutputStream(new 
>> BufferedOutputStream(new FileOutputStream(hugeZipFile)))) {
>>         // Creating the LocalDataTime once allows faster processing
>>         LocalDateTime time = LocalDateTime.now();
>>         // Add entries until MAX_CEN_SIZE is reached
>>         for (int i = 0; i < numEntries; i++) {
>>             String name = Integer.toString(i);
>>             name = "0".repeat(nameLength -name.length()) + name;
>>             ZipEntry entry = entries[i] = new ZipEntry(name);
>>             // Use STORED for faster processing
>>             entry.setMethod(ZipEntry.STORED);
>>             entry.setSize(0);
>>             entry.setCrc(0);
>>             entry.setTimeLocal(time);
>>             zip.putNextEntry(entry);
>>         }
>>         zip.closeEntry();
>>         for (ZipEntry entry : entries) {
>>             entry.setExtra(extra);
>>         }
>>     }
>> }
>> 
>> /**
>>  * Make an extra field with an 'unknown' tag and the maximum possible data 
>> size
>>  * @return
>>  */
>> private static byte[] makeLargeExtra() {
>>     byte[] extra = new byte[Short.MAX_VALUE];
>>     ByteBuffer buffer = 
>> ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
>>     buffer.putShort((short) 0x9902); // 'unknown' tag
>>     buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size
>>     return e...

This PR was reviewed by @LanceAndersen and @Martin-Buchholz back in May, but 
unfortunately did not get integrated before it was closed for inactivity.

I'm reopening it now, with a few changes after the last reviews:

- To save disk space required by the test, a SparseOutputStream writes 'holes' 
until it detects that the last CEN is written, after which it starts writing 
actual bytes for the END header. This reduces the required disk space from ~2GB 
to ~4K.
- Since the test no longer requires excessive memory or disk space, the 
'manual' tag is removed and the test is removed from the 
`jdk_core_manual_no_input` group in `TEST.groups`

With a blessing of these last changes on top of what's already reviewed, I'm 
hoping to integrate this PR soonish. 

And sorry for the long-lived PR, I can only blame it on the unusually great 
summer in the north of Norway this year :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12991#issuecomment-1785034602

Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v3]

Reply via email to