On Mon, 7 Jul 2025 19:56:30 GMT, Xiaolong Peng wrote:
> - [x] I confirm that I make this contribution in accordance with the [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
> 
> Shenandoah always allocates memory while holding the heap lock. We have observed heavy heap lock contention on the memory allocation path while analyzing the performance of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, Shenandoah memory allocation gets a better object-oriented design (OOD) so that the majority of the code is reused:
> 
> * ShenandoahAllocator: base class of the allocators; most of the allocation code lives in this class.
> * ShenandoahMutatorAllocator: allocator for the mutator. It inherits from ShenandoahAllocator and only overrides `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator.
> * ShenandoahCollectorAllocator: allocator for collector allocations in the Collector partition. Like ShenandoahMutatorAllocator, it needs only a few lines of code to customize the allocator for the collector.
> * ShenandoahOldCollectorAllocator: allocator for collector allocations in the OldCollector partition. For now it does not inherit the allocation logic from ShenandoahAllocator. Its `allocate` method is overridden to delegate to `FreeSet::allocate_for_collector` because of the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`.
> 
> I don't expect a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause performance issues. In some cases, however, this change may improve latency and performance:
> 
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+ us to less than 150 us, and p99 from 1000+ us to ~200 us.
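The class split described above follows a template-method shape. The base class owns the shared allocation loop, and each subclass customizes a few hooks. The C++ sketch below illustrates that shape under stated assumptions. `Region`, `FreeSet`, `Allocator`, the `*Allocator` subclasses and the hook bodies are hypothetical simplifications, not the HotSpot classes from the PR. The heap lock is modelled as a plain `std::mutex`, and the sketch does not implement the contention optimization itself.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a heap region: a bump-pointer area sized in words.
struct Region {
  std::size_t capacity;
  std::size_t used = 0;
  explicit Region(std::size_t cap) : capacity(cap) {}
  std::size_t free_words() const { return capacity - used; }
};

// Hypothetical stand-in for the free set's collector path (first fit only;
// the PR mentions extra PLAB considerations for old gen that are omitted here).
struct FreeSet {
  std::vector<Region> old_regions;
  bool allocate_for_collector(std::size_t words) {
    for (Region& r : old_regions) {
      if (r.free_words() >= words) { r.used += words; return true; }
    }
    return false;
  }
};

// Hypothetical base class: the shared allocation loop lives here.
class Allocator {
 public:
  Allocator(std::vector<Region>& partition, std::mutex& heap_lock)
      : _partition(partition), _heap_lock(heap_lock) {}
  virtual ~Allocator() = default;

  // Allocates `words` from this allocator's partition; false if nothing fits.
  virtual bool allocate(std::size_t words) {
    std::lock_guard<std::mutex> guard(_heap_lock);  // serialized under the lock in this sketch
    const std::size_t n = _partition.size();
    if (n == 0) return false;
    const std::size_t start = alloc_start_index() % n;
    for (std::size_t i = 0; i < n; i++) {
      Region& r = _partition[(start + i) % n];  // wrap around the partition
      if (r.free_words() >= words) {
        r.used += words;
        verify();
        return true;
      }
    }
    return false;
  }

 protected:
  // Hooks customized by subclasses.
  virtual std::size_t alloc_start_index() const = 0;
  virtual void verify() const {
    for (const Region& r : _partition) assert(r.used <= r.capacity);  // no over-commit
  }

  std::vector<Region>& _partition;
  std::mutex& _heap_lock;
};

class MutatorAllocator : public Allocator {
 public:
  using Allocator::Allocator;
 protected:
  std::size_t alloc_start_index() const override { return 0; }  // scan from the first region
};

class CollectorAllocator : public Allocator {
 public:
  using Allocator::Allocator;
 protected:
  // Try the last region first, then wrap around to the start.
  std::size_t alloc_start_index() const override {
    return _partition.empty() ? 0 : _partition.size() - 1;
  }
};

// Old-gen allocator skips the shared loop and delegates to the free set.
class OldCollectorAllocator : public Allocator {
 public:
  OldCollectorAllocator(FreeSet& free_set, std::mutex& heap_lock)
      : Allocator(free_set.old_regions, heap_lock), _free_set(free_set) {}
  bool allocate(std::size_t words) override {
    std::lock_guard<std::mutex> guard(_heap_lock);
    return _free_set.allocate_for_collector(words);
  }
 protected:
  std::size_t alloc_start_index() const override { return 0; }  // unused by this allocator
 private:
  FreeSet& _free_set;
};
```

In this sketch, supporting another partition only takes a new subclass with its own `alloc_start_index`. `OldCollectorAllocator` shows how one allocator can opt out of the shared loop by overriding `allocate`, loosely mirroring the delegation to `FreeSet::allocate_for_collector` described above.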
> 
>     java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G \
>       -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions \
>       -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit \
>       -XX:ShenandoahGCMode=generational -XX:+UseTLAB \
>       -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing"
> 
> OpenJDK tip:
> 
>     ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events =====
>     ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 ...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/26171
