On Wed, 9 Apr 2025 12:48:10 GMT, Thomas Schatzl <tscha...@openjdk.org> wrote:
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101:
>>
>>> 99: }
>>> 100:
>>> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators,
>>
>> Have you measured the performance impact of inlining this assembly code
>> instead of resorting to a runtime call as done before? Is it worth the
>> maintenance cost (for every platform), risk of introducing bugs, etc.?
>
> I remember significant impact in some microbenchmark. It's also inlined in
> Parallel GC. I do not consider it a big issue wrt to maintenance - these
> things never really change, and the method is small and contained.
> I will try to redo numbers.

From our microbenchmarks (higher is better for the thrpt rows in ops/ms; lower is better for the avgt rows in ns/op):

Current code:

Benchmark                                    (size)   Mode  Cnt       Score       Error  Units
ArrayCopyObject.conjoint_micro                   31  thrpt   15  166136.959 ±  5517.157  ops/ms
ArrayCopyObject.conjoint_micro                   63  thrpt   15  108880.108 ±  4331.112  ops/ms
ArrayCopyObject.conjoint_micro                  127  thrpt   15   93159.977 ±  5025.458  ops/ms
ArrayCopyObject.conjoint_micro                 2047  thrpt   15   17234.842 ±   831.344  ops/ms
ArrayCopyObject.conjoint_micro                 4095  thrpt   15    9202.216 ±   292.612  ops/ms
ArrayCopyObject.conjoint_micro                 8191  thrpt   15    3565.705 ±   121.116  ops/ms
ArrayCopyObject.disjoint_micro                   31  thrpt   15  159106.245 ±  5965.576  ops/ms
ArrayCopyObject.disjoint_micro                   63  thrpt   15   95475.658 ±  5415.267  ops/ms
ArrayCopyObject.disjoint_micro                  127  thrpt   15   84249.979 ±  6313.007  ops/ms
ArrayCopyObject.disjoint_micro                 2047  thrpt   15   10682.650 ±   381.832  ops/ms
ArrayCopyObject.disjoint_micro                 4095  thrpt   15    4471.940 ±   216.439  ops/ms
ArrayCopyObject.disjoint_micro                 8191  thrpt   15    1378.296 ±    33.421  ops/ms
ArrayCopy.arrayCopyObject                       N/A   avgt   15      13.880 ±     0.517  ns/op
ArrayCopy.arrayCopyObjectNonConst               N/A   avgt   15      14.844 ±     0.751  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward     N/A   avgt   15      11.080 ±     0.703  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward      N/A   avgt   15      11.003 ±     0.135  ns/op

Runtime call:

Benchmark                                    (size)   Mode  Cnt       Score       Error  Units
ArrayCopyObject.conjoint_micro                   31  thrpt   15   73100.230 ± 11079.381  ops/ms
ArrayCopyObject.conjoint_micro                   63  thrpt   15   65039.431 ±  1996.832  ops/ms
ArrayCopyObject.conjoint_micro                  127  thrpt   15   58336.711 ±  2260.660  ops/ms
ArrayCopyObject.conjoint_micro                 2047  thrpt   15   17035.419 ±   524.445  ops/ms
ArrayCopyObject.conjoint_micro                 4095  thrpt   15    9207.661 ±   286.526  ops/ms
ArrayCopyObject.conjoint_micro                 8191  thrpt   15    3264.491 ±    73.848  ops/ms
ArrayCopyObject.disjoint_micro                   31  thrpt   15   84587.219 ±  3007.310  ops/ms
ArrayCopyObject.disjoint_micro                   63  thrpt   15   62815.254 ±  1214.310  ops/ms
ArrayCopyObject.disjoint_micro                  127  thrpt   15   58423.470 ±   285.670  ops/ms
ArrayCopyObject.disjoint_micro                 2047  thrpt   15   10720.462 ±   617.173  ops/ms
ArrayCopyObject.disjoint_micro                 4095  thrpt   15    4178.195 ±   178.942  ops/ms
ArrayCopyObject.disjoint_micro                 8191  thrpt   15    1374.268 ±    44.290  ops/ms
ArrayCopy.arrayCopyObject                       N/A   avgt   15      19.667 ±     0.740  ns/op
ArrayCopy.arrayCopyObjectNonConst               N/A   avgt   15      21.243 ±     1.891  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward     N/A   avgt   15      16.645 ±     0.504  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward      N/A   avgt   15      17.409 ±     0.705  ns/op

Obviously with larger arrays, the impact diminishes, but it's always there.
I think the inlined code is worth the effort in this case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037086410