Withdrawn: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation

Xiaolong Peng Thu, 14 May 2026 12:18:40 -0700

On Mon, 7 Jul 2025 19:56:30 GMT, Xiaolong Peng <[email protected]> wrote:


> - [x] I confirm that I make this contribution in accordance with the [OpenJDK 
> Interim AI Policy](https://openjdk.org/legal/ai).
> 
> Shenandoah always allocates memory while holding the heap lock. In the 
> performance analysis of a service that we tried to move to Shenandoah, we 
> observed heavy heap lock contention on the memory allocation path. This 
> change proposes an optimization of the allocation code path to reduce heap 
> lock contention. Along with the optimization, it gives Shenandoah memory 
> allocation a better object-oriented design (OOD) so that most of the code is 
> reused:
> 
> * ShenandoahAllocator: base class of the allocators; most of the allocation 
> code is in this class.
> * ShenandoahMutatorAllocator: allocator for mutators; it inherits from 
> ShenandoahAllocator and overrides only `alloc_start_index`, `verify`, 
> `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator 
> for mutators.
> * ShenandoahCollectorAllocator: allocator for collector allocation in the 
> Collector partition; like ShenandoahMutatorAllocator, it needs only a few 
> lines of code to customize the allocator for the collector.
> * ShenandoahOldCollectorAllocator: allocator for mutator collector 
> allocation in the OldCollector partition; it does not inherit the logic from 
> ShenandoahAllocator for now. Its `allocate` method is overridden to 
> delegate to `FreeSet::allocate_for_collector` because of the special 
> allocation considerations for `plab` in old gen. We will rewrite this part 
> later and move the code out of `FreeSet::allocate_for_collector`.
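
For readers unfamiliar with the technique named in the subject line, here is a minimal sketch of CAS-based bump-pointer allocation, written with standard C++ atomics rather than HotSpot's own primitives. It is not the code from this PR: `Region`, `try_allocate`, and the address arithmetic are hypothetical names chosen for illustration, and a real allocator would also have to deal with alignment, region retirement, and refilling, none of which is shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a heap region (not Shenandoah's region class).
struct Region {
    std::uintptr_t end;                // one past the last usable byte
    std::atomic<std::uintptr_t> top;   // next free byte, advanced by CAS

    Region(std::uintptr_t start, std::size_t size)
        : end(start + size), top(start) {}

    // Tries to carve `bytes` out of the region without taking any lock.
    // Returns the start address of the block, or 0 if it does not fit.
    std::uintptr_t try_allocate(std::size_t bytes) {
        assert(bytes > 0);
        std::uintptr_t old_top = top.load(std::memory_order_relaxed);
        while (true) {
            if (bytes > end - old_top) {
                return 0;  // not enough space left; caller needs a slow path
            }
            std::uintptr_t new_top = old_top + bytes;
            // On failure, compare_exchange_weak reloads old_top with the
            // current value, so the loop retries against fresh state.
            if (top.compare_exchange_weak(old_top, new_top,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
                return old_top;
            }
        }
    }
};
```

In a sketch like this, threads that race on the same region retry the CAS instead of queuing on a lock, and a lock is only needed on the slower path where a full region has to be replaced.
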
> 
> I'm not expecting a significant performance impact in most cases, since 
> contention on the heap lock is usually not high enough to cause a 
> performance issue, but in some cases it may improve latency/performance:
> 
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 
> 500+us to less than 150us, and p99 from 1000+us to ~200us.
> 
> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G 
> -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
> -XX:+UnlockDiagnosticVMOptions  -XX:-ShenandoahUncommit 
> -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar 
> ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar  -n 10 lusearch  | grep "metered 
> full smoothing"
> 
> 
> OpenJDK TIP:
> 
> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 
> 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 
> 428584 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 
> usec, 99% 5898 usec, 99.9% 6488 ...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/26171
