On Thu, 27 Mar 2025 17:38:49 GMT, David M. Lloyd <d...@openjdk.org> wrote:

>> Provide method overloads to the ClassFile interface of the 
>> java.lang.classfile API which allow parsing of classes found in memory 
>> segments, as well as allowing built class files to be output to them.
>
> David M. Lloyd has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add a benchmark for class file emission

Here are the raw benchmark results against `AbstractMap` and `TreeMap`:


Benchmark                                 Mode  Cnt       Score      Error  Units
MemorySegmentBenchmark.emitWithCopy0     thrpt    5  198061.082 ± 2300.146  ops/s
MemorySegmentBenchmark.emitWithCopy1     thrpt    5   35352.167 ±  320.823  ops/s
MemorySegmentBenchmark.emitWithoutCopy0  thrpt    5  265208.111 ± 1416.120  ops/s
MemorySegmentBenchmark.emitWithoutCopy1  thrpt    5   53215.327 ±  354.228  ops/s


`0` is the smaller `AbstractMap` class bytes and `1` is the larger `TreeMap` 
class bytes. For case 0 we see an improvement of around 34% overall, and case 1 
shows an improvement of closer to 50% (which is expected, since larger classes 
would mean copying more bytes as well as putting more pressure on the GC).
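
As a quick check of the percentages quoted above, here is a minimal sketch that recomputes the relative throughput gain from the scores in the table. The class and method names (`ThroughputGain`, `gainPercent`) are made up for illustration and are not part of the benchmark or the PR.

```java
public class ThroughputGain {
    /** Relative throughput gain of {@code after} over {@code before}, in percent. */
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ops/s) copied from the first benchmark table.
        double case0 = gainPercent(198061.082, 265208.111); // AbstractMap
        double case1 = gainPercent(35352.167, 53215.327);   // TreeMap
        System.out.println("case 0 gain (%): " + case0);
        System.out.println("case 1 gain (%): " + case1);
    }
}
```

This lands at roughly 34% for case 0 and roughly 50% for case 1, matching the figures given above.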

Here is the same benchmark with `-prof gc` enabled:


Benchmark                                                    Mode  Cnt       Score      Error   Units
MemorySegmentBenchmark.emitWithCopy0                        thrpt    5  197728.066 ± 3107.524   ops/s
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate          thrpt    5    3900.963 ±   61.292  MB/sec
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate.norm     thrpt    5   20688.004 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy0:gc.count               thrpt    5     680.000             counts
MemorySegmentBenchmark.emitWithCopy0:gc.time                thrpt    5     415.000                 ms
MemorySegmentBenchmark.emitWithCopy1                        thrpt    5   35504.531 ±  260.423   ops/s
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate          thrpt    5    3512.621 ±   25.778  MB/sec
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate.norm     thrpt    5  103744.020 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy1:gc.count               thrpt    5     673.000             counts
MemorySegmentBenchmark.emitWithCopy1:gc.time                thrpt    5     413.000                 ms
MemorySegmentBenchmark.emitWithoutCopy0                     thrpt    5  265533.600 ± 1707.914   ops/s
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate       thrpt    5    3547.167 ±   22.811  MB/sec
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate.norm  thrpt    5   14008.003 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy0:gc.count            thrpt    5     651.000             counts
MemorySegmentBenchmark.emitWithoutCopy0:gc.time             thrpt    5     392.000                 ms
MemorySegmentBenchmark.emitWithoutCopy1                     thrpt    5   52727.917 ±  624.059   ops/s
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate       thrpt    5    3531.104 ±   42.004  MB/sec
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate.norm  thrpt    5   70224.013 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy1:gc.count            thrpt    5     683.000             counts
MemorySegmentBenchmark.emitWithoutCopy1:gc.time             thrpt    5     412.000                 ms


You can see that, in addition to the overhead of copying, the copying variant 
puts a bit more pressure on the GC. Both variants allocate roughly the same 
*number* of objects, but the extra large array per operation fills up our 
allocation regions more quickly, which requires a little more time to be spent 
in GC on average.
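
To put a number on the extra array, here is a small sketch that subtracts the `gc.alloc.rate.norm` values (bytes allocated per operation) of the two variants. Reading the difference as the size of the extra copy is my interpretation of the figures rather than something the profiler reports directly, and the names `AllocationDelta` and `savedBytesPerOp` are hypothetical.

```java
public class AllocationDelta {
    /** Bytes per operation saved by skipping the copy, rounded to whole bytes. */
    static long savedBytesPerOp(double withCopy, double withoutCopy) {
        return Math.round(withCopy - withoutCopy);
    }

    public static void main(String[] args) {
        // gc.alloc.rate.norm values (B/op) copied from the -prof gc table.
        System.out.println(savedBytesPerOp(20688.004, 14008.003));  // prints 6680
        System.out.println(savedBytesPerOp(103744.020, 70224.013)); // prints 33520
    }
}
```

In both cases the saving is a little under a third of the bytes allocated per operation by the copying variant.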

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24139#issuecomment-2758926428
