On Thu, 27 Mar 2025 17:38:49 GMT, David M. Lloyd <d...@openjdk.org> wrote:
>> Provide method overloads to the ClassFile interface of the
>> java.lang.classfile API which allow parsing of classes found in memory
>> segments, as well as allowing built class files to be output to them.
>
> David M. Lloyd has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Add a benchmark for class file emission

Here are the raw benchmark results against `AbstractMap` and `TreeMap`:

```
Benchmark                                 Mode  Cnt       Score      Error  Units
MemorySegmentBenchmark.emitWithCopy0     thrpt    5  198061.082 ± 2300.146  ops/s
MemorySegmentBenchmark.emitWithCopy1     thrpt    5   35352.167 ±  320.823  ops/s
MemorySegmentBenchmark.emitWithoutCopy0  thrpt    5  265208.111 ± 1416.120  ops/s
MemorySegmentBenchmark.emitWithoutCopy1  thrpt    5   53215.327 ±  354.228  ops/s
```

`0` is the smaller `AbstractMap` class bytes and `1` is the larger `TreeMap` class bytes. For case 0 we see an improvement of around 34% overall, and case 1 shows an improvement of closer to 50% (which is expected, since larger classes mean copying more bytes as well as putting more pressure on the GC).

Here is the same benchmark with `-prof gc` enabled:

```
Benchmark                                                     Mode  Cnt       Score     Error   Units
MemorySegmentBenchmark.emitWithCopy0                         thrpt    5  197728.066 ± 3107.524   ops/s
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate           thrpt    5    3900.963 ±   61.292  MB/sec
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate.norm      thrpt    5   20688.004 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy0:gc.count                thrpt    5     680.000             counts
MemorySegmentBenchmark.emitWithCopy0:gc.time                 thrpt    5     415.000                 ms
MemorySegmentBenchmark.emitWithCopy1                         thrpt    5   35504.531 ±  260.423   ops/s
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate           thrpt    5    3512.621 ±   25.778  MB/sec
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate.norm      thrpt    5  103744.020 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy1:gc.count                thrpt    5     673.000             counts
MemorySegmentBenchmark.emitWithCopy1:gc.time                 thrpt    5     413.000                 ms
MemorySegmentBenchmark.emitWithoutCopy0                      thrpt    5  265533.600 ± 1707.914   ops/s
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate        thrpt    5    3547.167 ±   22.811  MB/sec
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate.norm   thrpt    5   14008.003 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy0:gc.count             thrpt    5     651.000             counts
MemorySegmentBenchmark.emitWithoutCopy0:gc.time              thrpt    5     392.000                 ms
MemorySegmentBenchmark.emitWithoutCopy1                      thrpt    5   52727.917 ±  624.059   ops/s
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate        thrpt    5    3531.104 ±   42.004  MB/sec
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate.norm   thrpt    5   70224.013 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy1:gc.count             thrpt    5     683.000             counts
MemorySegmentBenchmark.emitWithoutCopy1:gc.time              thrpt    5     412.000                 ms
```

You can see that in addition to the overhead of the copy itself, the copying variant also puts a bit more pressure on the GC. We allocate roughly the same *number* of objects in either case, but the extra large array per operation fills the allocation regions more quickly, so a little more time ends up being spent in GC on average.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24139#issuecomment-2758926428
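
For context on what the two variants measure, here is a rough sketch of the shape of such a benchmark. This is only an illustration, not the actual `MemorySegmentBenchmark` added by the commit: the direct-to-segment method `transformClassTo` below is a hypothetical stand-in for whatever overload the PR actually introduces, and the real benchmark runs against both `AbstractMap` and `TreeMap` rather than a single class.

```java
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.ClassTransform;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.AbstractMap;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Benchmark)
public class EmitToSegmentSketch {

    ClassFile cf;
    ClassModel model;
    Arena arena;
    MemorySegment target;

    @Setup
    public void setup() throws Exception {
        cf = ClassFile.of();
        // Parse an existing class (AbstractMap here) so that each benchmark
        // iteration re-emits the same class model.
        byte[] bytes = AbstractMap.class
                .getResourceAsStream("AbstractMap.class").readAllBytes();
        model = cf.parse(bytes);
        arena = Arena.ofConfined();
        target = arena.allocate(1 << 20); // plenty of room for the emitted class
    }

    @TearDown
    public void tearDown() {
        arena.close();
    }

    @Benchmark
    public MemorySegment emitWithCopy() {
        // Emit to an intermediate heap byte[] (the pre-existing API), then copy
        // the bytes into the target segment.
        byte[] out = cf.transformClass(model, ClassTransform.ACCEPT_ALL);
        MemorySegment.copy(out, 0, target, ValueLayout.JAVA_BYTE, 0, out.length);
        return target;
    }

    @Benchmark
    public MemorySegment emitWithoutCopy() {
        // Hypothetical overload that writes the emitted class directly into the
        // segment and returns the number of bytes written; the real method added
        // by the PR may be named and shaped differently.
        long len = cf.transformClassTo(target, model, ClassTransform.ACCEPT_ALL);
        return target.asSlice(0, len);
    }
}
```

Per operation, the only structural difference between the two paths is the intermediate heap `byte[]` plus the `MemorySegment.copy`, which lines up with the deltas in `gc.alloc.rate.norm` above (about 6.7 KB/op for case 0 and about 33.5 KB/op for case 1, i.e. roughly one emitted class file's worth of bytes).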