On Tue, 12 Nov 2024 16:00:39 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:
> See the bug for more discussion and reproducer. This PR replaces the linked
> list with an `ArrayList` wrapper that manages synchronization, search and
> replacements effectively. There are possible improvements here, most glaring
> is parallelism that is currently knee-capped by global synchronization. The
> synchronization scheme follows what we already have, and I think it is safer
> to continue with it right now.
>
> I'll put performance data in a separate comment.
>
> Additional testing:
>  - [x] Original reproducer improves drastically
>  - [x] New microbenchmark shows no regression on "churning" tests, which
>    covers insertion/removal perf
>  - [x] New microbenchmark shows improvement on Full GC times (crude, but
>    repeatable), serves as a proxy for reproducer
>  - [x] `java/lang/ref` tests in release
>  - [ ] `all` tests in fastdebug

Original reproducer on my M1:

# Before
...
[8.989s][info ][gc ] GC(50) Pause Young (Normal) (G1 Evacuation Pause) 608M->21M(1011M) 46.562ms
[9.187s][info ][gc ] GC(51) Pause Young (Normal) (G1 Evacuation Pause) 608M->22M(1011M) 45.286ms
[9.387s][info ][gc ] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 609M->21M(1011M) 45.636ms
[9.592s][info ][gc ] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 608M->22M(1015M) 47.514ms
[9.794s][info ][gc ] GC(54) Pause Young (Normal) (G1 Evacuation Pause) 612M->22M(1015M) 46.807ms
[9.993s][info ][gc ] GC(55) Pause Young (Normal) (G1 Evacuation Pause) 612M->21M(1015M) 45.964ms

# After
...
[6.964s][info ][gc ] GC(50) Pause Young (Normal) (G1 Evacuation Pause) 521M->36M(830M) 11.096ms
[7.108s][info ][gc ] GC(51) Pause Young (Normal) (G1 Evacuation Pause) 521M->36M(830M) 11.380ms
[7.252s][info ][gc ] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 521M->36M(830M) 11.293ms
[7.397s][info ][gc ] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 520M->35M(830M) 12.407ms
[7.540s][info ][gc ] GC(54) Pause Young (Normal) (G1 Evacuation Pause) 520M->37M(830M) 11.096ms

The closest reproducer, in the form of a JMH test, also improves:

Benchmark       (count)  (recipFreq)  Mode  Cnt   Score    Error  Units

# Before
CleanerGC.test    16384          N/A  avgt   15   2.170 ±  0.082  ms/op
CleanerGC.test    65536          N/A  avgt   15   2.281 ±  0.104  ms/op
CleanerGC.test   262144          N/A  avgt   15   6.176 ±  0.466  ms/op
CleanerGC.test  1048576          N/A  avgt   15  22.913 ±  5.171  ms/op
CleanerGC.test  4194304          N/A  avgt   15  77.781 ± 14.937  ms/op

# After
CleanerGC.test    16384          N/A  avgt   15   2.169 ±  0.061  ms/op
CleanerGC.test    65536          N/A  avgt   15   2.247 ±  0.083  ms/op
CleanerGC.test   262144          N/A  avgt   15   3.822 ±  0.191  ms/op
CleanerGC.test  1048576          N/A  avgt   15   9.750 ±  0.638  ms/op
CleanerGC.test  4194304          N/A  avgt   15  33.842 ±  5.382  ms/op

The churn benchmark, which covers insertion/removal perf, matches the original implementation closely:

Benchmark          (count)  (recipFreq)  Mode  Cnt  Score   Error  Units

# Before
CleanerChurn.test      N/A          128  avgt    9  7.063 ± 0.262  ns/op
CleanerChurn.test      N/A          256  avgt    9  5.669 ± 0.118  ns/op
CleanerChurn.test      N/A          512  avgt    9  5.025 ± 0.066  ns/op
CleanerChurn.test      N/A         1024  avgt    9  4.714 ± 0.086  ns/op
CleanerChurn.test      N/A         2048  avgt    9  4.595 ± 0.091  ns/op

# After
CleanerChurn.test      N/A          128  avgt    9  7.050 ± 0.847  ns/op
CleanerChurn.test      N/A          256  avgt    9  5.378 ± 0.186  ns/op
CleanerChurn.test      N/A          512  avgt    9  4.896 ± 0.112  ns/op
CleanerChurn.test      N/A         1024  avgt    9  4.712 ± 0.063  ns/op
CleanerChurn.test      N/A         2048  avgt    9  4.671 ± 0.071  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22043#issuecomment-2470928360
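
For readers who have not opened the PR: the quoted description mentions an `ArrayList` wrapper that handles synchronization, search and replacement. The sketch below is only an illustration of how such a wrapper could work, not the code in the PR. The names `IndexedList` and `Node` are hypothetical, and the "each element remembers its slot, removal moves the last element into the hole" scheme is an assumption about one reasonable design. It keeps a single global lock, in line with the synchronization scheme the description says is retained.

```java
import java.util.ArrayList;

// Hypothetical sketch, NOT the JDK implementation: all names and the
// swap-with-last removal scheme are assumptions made for illustration.
final class IndexedList {
    /** An element that remembers its own slot, so removal needs no search. */
    static class Node {
        int index = -1; // -1 means "not currently in a list"
    }

    private final ArrayList<Node> nodes = new ArrayList<>();

    /** Appends the node and records its slot; amortized O(1). */
    synchronized void insert(Node n) {
        if (n.index != -1) {
            throw new IllegalStateException("node already inserted");
        }
        n.index = nodes.size();
        nodes.add(n);
    }

    /** Removes the node by moving the last element into its slot; O(1). */
    synchronized boolean remove(Node n) {
        int i = n.index;
        // Reject nodes that are absent or whose recorded slot holds another node.
        if (i < 0 || i >= nodes.size() || nodes.get(i) != n) {
            return false;
        }
        Node last = nodes.remove(nodes.size() - 1);
        if (last != n) {
            nodes.set(i, last); // fill the hole with the former last element
            last.index = i;
        }
        n.index = -1;
        return true;
    }

    synchronized int size() {
        return nodes.size();
    }
}
```

With this layout, both `insert` and `remove` touch only the end of the backing array plus one slot, which is the kind of insertion/removal cost the churn benchmark above exercises. Element order is not preserved, which is acceptable only if callers never rely on it.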