On Sun, 12 Mar 2023 10:00:50 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> Please review this PR which speeds up TestTooManyEntries and clarifies its >> purpose: >> >> - The name 'TestTooManyEntries' does not clearly convey the purpose of the >> test. What is tested is the validation that the total CEN size fits in a >> Java byte array. Suggested rename: CenSizeTooLarge >> - The test creates DEFLATED entries which incurs zlib costs and File Data / >> Data Descriptors for no additional benefit. We can use STORED instead. >> - By creating a single LocalDateTime and setting it with >> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. >> - The name of entries is generated by calling UUID.randomUUID, we could use >> simple counter instead. >> - The produced file is unnecessarily large. We know how large a CEN entry >> is, let's take advantage of that to create a file with the minimal size. >> - The summary and comments of the test can be improved to help explain the >> purpose of the test and how we reach the limit being tested. >> >> These speedups reduced the runtime from 4 min 17 sec to 1 min 54 sec on my >> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 3.5 GB. > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Use Integer.toString instead of Long.toString > Here's a wild idea: > > We could inject large extra fields on the CEN entries to inflate the size of > each CEN entry. This means we get away with much fewer CEN entries which > again means much less memory. Also, writing mostly empty extra data seems to > be really fast, at least on my Macbook Pro. > > For me, the test now run without the `@requires' and completes in less than 5 > seconds. > > The cost is slightly more complicated code. But perhaps still worth the added > complexity? > > ```java > @BeforeTest > public void setup() throws IOException { > hugeZipFile = new File("cen-too-large.zip"); > hugeZipFile.deleteOnExit(); > > // Length of generated entry names > int nameLength = 10; > > // We use a large extra field to get away with fewer CEN headers > byte[] extra = makeLargeExtra(); > > // The size of a single CEN header, including name and extra field > int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length; > > // Determine the number of entries needed to exceed the MAX_CEN_SIZE > int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1; > > ZipEntry[] entries = new ZipEntry[numEntries]; > try (ZipOutputStream zip = new ZipOutputStream(new > BufferedOutputStream(new FileOutputStream(hugeZipFile)))) { > // Creating the LocalDataTime once allows faster processing > LocalDateTime time = LocalDateTime.now(); > // Add entries until MAX_CEN_SIZE is reached > for (int i = 0; i < numEntries; i++) { > String name = Integer.toString(i); > name = "0".repeat(nameLength -name.length()) + name; > ZipEntry entry = entries[i] = new ZipEntry(name); > // Use STORED for faster processing > entry.setMethod(ZipEntry.STORED); > entry.setSize(0); > entry.setCrc(0); > entry.setTimeLocal(time); > zip.putNextEntry(entry); > } > zip.closeEntry(); > for (ZipEntry entry : entries) { > entry.setExtra(extra); > } > } > } > > /** > * Make an extra field with an 'unknown' tag and the maximum possible data > size > * @return > */ > private static byte[] makeLargeExtra() { > byte[] extra = new byte[Short.MAX_VALUE]; > ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN); > buffer.putShort((short) 0x9902); // 'unknown' tag > buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size > return extra; > } > ``` I think this is probably OK. I would probably create a constant with the Data Size to make it clearer and add more of a detailed comment of this method (laying out the structure of the Extra field) along with pointing to 4.6.1 of the APP.note ------------- PR: https://git.openjdk.org/jdk/pull/12991