TestTooManyEntries.java and clarify its purpose [v3]

Lance Andersen Sun, 12 Mar 2023 07:15:50 -0700

On Sun, 12 Mar 2023 10:00:50 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:


>> Please review this PR which speeds up TestTooManyEntries and clarifies its 
>> purpose:
>> 
>> - The name 'TestTooManyEntries' does not clearly convey the purpose of the 
>> test. What is tested is the validation that the total CEN size fits in a 
>> Java byte array. Suggested rename: CenSizeTooLarge
>> - The test creates DEFLATED entries which incurs zlib costs and File Data / 
>> Data Descriptors for no additional benefit. We can use STORED instead.
>> - By creating a single LocalDateTime and setting it with 
>> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. 
>> - The name of entries is generated by calling UUID.randomUUID, we could use 
>> simple counter instead.
>> - The produced file is unnecessarily large. We know how large a CEN entry 
>> is, let's take advantage of that to create a file with the minimal size.
>> - The summary and comments of the test can be improved to help explain the 
>> purpose of the test and how we reach the limit being tested.
>> 
>> These speedups reduced the runtime from 4 min 17 sec to 1 min 54 sec on my 
>> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 3.5 GB.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Use Integer.toString instead of Long.toString

> Here's a wild idea:
> 
> We could inject large extra fields on the CEN entries to inflate the size of 
> each CEN entry. This means we get away with much fewer CEN entries which 
> again means much less memory. Also, writing mostly empty extra data seems to 
> be really fast, at least on my Macbook Pro.
> 
> For me, the test now run without the `@requires' and completes in less than 5 
> seconds.
> 
> The cost is slightly more complicated code. But perhaps still worth the added 
> complexity?
> 
> ```java
> @BeforeTest
> public void setup() throws IOException {
>     hugeZipFile = new File("cen-too-large.zip");
>     hugeZipFile.deleteOnExit();
> 
>     // Length of generated entry names
>     int nameLength = 10;
> 
>     // We use a large extra field to get away with fewer CEN headers
>     byte[] extra = makeLargeExtra();
> 
>     // The size of a single CEN header, including name and extra field
>     int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length;
> 
>     // Determine the number of entries needed to exceed the MAX_CEN_SIZE
>     int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1;
> 
>     ZipEntry[] entries = new ZipEntry[numEntries];
>     try (ZipOutputStream zip = new ZipOutputStream(new 
> BufferedOutputStream(new FileOutputStream(hugeZipFile)))) {
>         // Creating the LocalDataTime once allows faster processing
>         LocalDateTime time = LocalDateTime.now();
>         // Add entries until MAX_CEN_SIZE is reached
>         for (int i = 0; i < numEntries; i++) {
>             String name = Integer.toString(i);
>             name = "0".repeat(nameLength -name.length()) + name;
>             ZipEntry entry = entries[i] = new ZipEntry(name);
>             // Use STORED for faster processing
>             entry.setMethod(ZipEntry.STORED);
>             entry.setSize(0);
>             entry.setCrc(0);
>             entry.setTimeLocal(time);
>             zip.putNextEntry(entry);
>         }
>         zip.closeEntry();
>         for (ZipEntry entry : entries) {
>             entry.setExtra(extra);
>         }
>     }
> }
> 
> /**
>  * Make an extra field with an 'unknown' tag and the maximum possible data 
> size
>  * @return
>  */
> private static byte[] makeLargeExtra() {
>     byte[] extra = new byte[Short.MAX_VALUE];
>     ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
>     buffer.putShort((short) 0x9902); // 'unknown' tag
>     buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size
>     return extra;
> }
> ```

I think this is probably OK.  I would probably create a constant with the Data 
Size to make it clearer and add more of a detailed comment of this method 
(laying out the structure of the Extra field) along with pointing to  4.6.1 of 
the APP.note

-------------

PR: https://git.openjdk.org/jdk/pull/12991

Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v3]

Reply via email to