On Fri, 6 Dec 2024 16:30:47 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
> Hi,
>
> This patch improves the performance of a typical `Arena::allocate` in several
> ways:
>
> - Delay the creation of the NativeMemorySegmentImpl. This avoids the merge of
> the instance with the one obtained from the call in the uncommon path,
> increasing the chance of the object being scalar replaced.
> - Split the allocation of over-aligned memory to a slow-path method.
> - Align the memory to 8 bytes, allowing faster zeroing.
> - Use a dedicated method to zero the just-allocated native memory, reducing
> code size and making it more straightforward.
> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that
> a `false` value can be constant folded.
>
> Please take a look and leave your reviews, thanks a lot.

The results with the modified `AllocTest`:

                                                 Before           After
Benchmark                 (size)  Mode  Cnt    Score   Error   Score   Error  Units
AllocTest.alloc_confined       5  avgt   30   24.188 ± 0.305  17.221 ± 1.299  ns/op
AllocTest.alloc_confined      20  avgt   30   24.690 ± 0.168  19.571 ± 3.108  ns/op
AllocTest.alloc_confined     100  avgt   30   26.714 ± 0.061  17.819 ± 0.095  ns/op
AllocTest.alloc_confined     500  avgt   30   38.907 ± 0.113  19.716 ± 0.060  ns/op
AllocTest.alloc_confined    2000  avgt   30   60.056 ± 3.087  43.373 ± 0.564  ns/op
AllocTest.alloc_confined    8000  avgt   30  141.535 ± 1.546  75.110 ± 3.482  ns/op

The overall `AllocTest` results:

Benchmark                     (size)  Mode  Cnt    Score   Error  Units
AllocTest.alloc_calloc_arena       5  avgt   30   19.604 ± 0.075  ns/op
AllocTest.alloc_calloc_arena      20  avgt   30   19.750 ± 0.105  ns/op
AllocTest.alloc_calloc_arena     100  avgt   30   20.335 ± 0.103  ns/op
AllocTest.alloc_calloc_arena     500  avgt   30   36.676 ± 0.403  ns/op
AllocTest.alloc_calloc_arena    2000  avgt   30   47.928 ± 2.754  ns/op
AllocTest.alloc_calloc_arena    8000  avgt   30   83.762 ± 1.829  ns/op
AllocTest.alloc_confined           5  avgt   30   17.221 ± 1.299  ns/op
AllocTest.alloc_confined          20  avgt   30   19.571 ± 3.108  ns/op
AllocTest.alloc_confined         100  avgt   30   17.819 ± 0.095  ns/op
AllocTest.alloc_confined         500  avgt   30   19.716 ± 0.060  ns/op
AllocTest.alloc_confined        2000  avgt   30   43.373 ± 0.564  ns/op
AllocTest.alloc_confined        8000  avgt   30   75.110 ± 3.482  ns/op
AllocTest.alloc_unsafe_arena       5  avgt   30   18.810 ± 0.074  ns/op
AllocTest.alloc_unsafe_arena      20  avgt   30   18.858 ± 0.068  ns/op
AllocTest.alloc_unsafe_arena     100  avgt   30   21.820 ± 0.077  ns/op
AllocTest.alloc_unsafe_arena     500  avgt   30   32.685 ± 0.062  ns/op
AllocTest.alloc_unsafe_arena    2000  avgt   30   61.172 ± 1.464  ns/op
AllocTest.alloc_unsafe_arena    8000  avgt   30  133.842 ± 0.337  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22610#issuecomment-2523693086
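
For readers unfamiliar with the two alignment-related items in the patch description (rounding allocations up to 8 bytes and sending over-aligned requests to a slow path), here is a minimal sketch of the arithmetic involved. It is not the JDK's code: the class `AlignSketch`, the constant `FAST_PATH_ALIGNMENT`, and the methods `alignUp` and `needsSlowPath` are hypothetical names chosen for illustration, and the assumption that the fast path covers alignments up to 8 bytes is taken only from the wording of the description above.

```java
public final class AlignSketch {
    // Assumption: the fast path guarantees 8-byte alignment, as the
    // patch description suggests. The name is hypothetical.
    static final long FAST_PATH_ALIGNMENT = 8;

    // Rounds size up to the next multiple of alignment.
    // alignment must be a power of two; -alignment is then a mask
    // that clears the low bits.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Hypothetical predicate: requests asking for more than the
    // fast-path alignment would be handed to a separate slow path.
    static boolean needsSlowPath(long byteAlignment) {
        return byteAlignment > FAST_PATH_ALIGNMENT;
    }

    public static void main(String[] args) {
        // The sizes used by the benchmark above.
        long[] sizes = {5, 20, 100, 500, 2000, 8000};
        for (long size : sizes) {
            System.out.println(size + " -> " + alignUp(size, FAST_PATH_ALIGNMENT));
        }
        System.out.println("16-byte alignment needs slow path: " + needsSlowPath(16));
    }
}
```

With a size that is always a multiple of 8, zeroing can in principle proceed in whole 8-byte words without a byte-wise tail loop, which is presumably what "allowing faster zeroing" refers to.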