On Fri, 6 Dec 2024 16:30:47 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

> Hi,
> 
> This patch improves the performance of a typical `Arena::allocate` in several 
> ways:
> 
> - Delay the creation of the NativeMemorySegmentImpl. This avoids the merge of 
> the instance with the one obtained from the call in the uncommon path, 
> increasing the chance that the object is scalar replaced.
> - Split the allocation of over-aligned memory to a slow-path method.
> - Align the memory to 8 bytes, allowing faster zeroing.
> - Use a dedicated method to zero the just-allocated native memory, reducing 
> code size and making the code more straightforward.
> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that 
> a `false` value can be constant folded.
> 
> Please take a look and leave your reviews, thanks a lot.
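
As a rough illustration of the "align the memory to 8 bytes" point, here is a minimal sketch of rounding a requested size up to a multiple of 8. This is not the code from the patch: the class and method names are hypothetical, and it assumes the alignment is applied to the allocation size so that zeroing can proceed in whole 8-byte words.

```java
// Hypothetical helper, not taken from the patch: rounds a byte size up to
// the next multiple of a power-of-two alignment.
public class AlignUp {
    static long alignUp(long size, long alignment) {
        // Only valid when alignment is a power of two (e.g. 8).
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        // Adding (alignment - 1) and masking off the low bits rounds up.
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        // Sizes used by the benchmark in this thread.
        long[] sizes = {5, 20, 100, 500, 2000, 8000};
        for (long size : sizes) {
            System.out.println(size + " -> " + alignUp(size, 8));
        }
    }
}
```

With a size that is a multiple of 8, a zeroing loop never has to handle a tail of 1 to 7 leftover bytes, which is presumably where the faster zeroing comes from.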

The results with the modified `AllocTest`:

                                                      Before            After
    Benchmark                 (size)  Mode  Cnt    Score   Error    Score   Error  Units
    AllocTest.alloc_confined       5  avgt   30   24.188 ± 0.305   17.221 ± 1.299  ns/op
    AllocTest.alloc_confined      20  avgt   30   24.690 ± 0.168   19.571 ± 3.108  ns/op
    AllocTest.alloc_confined     100  avgt   30   26.714 ± 0.061   17.819 ± 0.095  ns/op
    AllocTest.alloc_confined     500  avgt   30   38.907 ± 0.113   19.716 ± 0.060  ns/op
    AllocTest.alloc_confined    2000  avgt   30   60.056 ± 3.087   43.373 ± 0.564  ns/op
    AllocTest.alloc_confined    8000  avgt   30  141.535 ± 1.546   75.110 ± 3.482  ns/op
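
To read the before/after columns as ratios, a small sketch like the following can help. The class and method names are hypothetical; the scores are copied from the `alloc_confined` rows above, and the error margins are ignored, so the ratios are only indicative.

```java
// Hypothetical helper for reading the table: computes before/after ratios
// from the alloc_confined scores (ns/op), ignoring the error margins.
public class AllocSpeedup {
    static double speedup(double beforeNs, double afterNs) {
        // A ratio above 1.0 means the patched version is faster.
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        int[] sizes = {5, 20, 100, 500, 2000, 8000};
        double[] before = {24.188, 24.690, 26.714, 38.907, 60.056, 141.535};
        double[] after = {17.221, 19.571, 17.819, 19.716, 43.373, 75.110};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("size=%d speedup=%.2fx%n",
                    sizes[i], speedup(before[i], after[i]));
        }
    }
}
```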

The overall `AllocTest` results:

    Benchmark                     (size)  Mode  Cnt    Score   Error  Units
    AllocTest.alloc_calloc_arena       5  avgt   30   19.604 ± 0.075  ns/op
    AllocTest.alloc_calloc_arena      20  avgt   30   19.750 ± 0.105  ns/op
    AllocTest.alloc_calloc_arena     100  avgt   30   20.335 ± 0.103  ns/op
    AllocTest.alloc_calloc_arena     500  avgt   30   36.676 ± 0.403  ns/op
    AllocTest.alloc_calloc_arena    2000  avgt   30   47.928 ± 2.754  ns/op
    AllocTest.alloc_calloc_arena    8000  avgt   30   83.762 ± 1.829  ns/op
    AllocTest.alloc_confined           5  avgt   30   17.221 ± 1.299  ns/op
    AllocTest.alloc_confined          20  avgt   30   19.571 ± 3.108  ns/op
    AllocTest.alloc_confined         100  avgt   30   17.819 ± 0.095  ns/op
    AllocTest.alloc_confined         500  avgt   30   19.716 ± 0.060  ns/op
    AllocTest.alloc_confined        2000  avgt   30   43.373 ± 0.564  ns/op
    AllocTest.alloc_confined        8000  avgt   30   75.110 ± 3.482  ns/op
    AllocTest.alloc_unsafe_arena       5  avgt   30   18.810 ± 0.074  ns/op
    AllocTest.alloc_unsafe_arena      20  avgt   30   18.858 ± 0.068  ns/op
    AllocTest.alloc_unsafe_arena     100  avgt   30   21.820 ± 0.077  ns/op
    AllocTest.alloc_unsafe_arena     500  avgt   30   32.685 ± 0.062  ns/op
    AllocTest.alloc_unsafe_arena    2000  avgt   30   61.172 ± 1.464  ns/op
    AllocTest.alloc_unsafe_arena    8000  avgt   30  133.842 ± 0.337  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22610#issuecomment-2523693086
