On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley <a...@openjdk.org> wrote:

> This intrinsic is generally faster than the current implementation for Panama
> segment operations for all writes larger than about 8 bytes, rising to more
> than 2x the performance on larger memory blocks on Graviton 2. The benchmark
> below compares "panama" (C2-generated, what we use now) with "unsafe" (this
> intrinsic).
> 
> 
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
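
For reference, the "more than 2x" figure can be checked directly from the Graviton 2 scores quoted above. This is only a small arithmetic sketch: the class and method names (FillSpeedup, speedup) are made up for illustration and are not part of the PR or the benchmark.

```java
import java.util.Locale;

public class FillSpeedup {
    // Ratio of baseline time to candidate time; values above 1.0 mean
    // the candidate ("unsafe") is faster than the baseline ("panama").
    static double speedup(double baselineNsPerOp, double candidateNsPerOp) {
        return baselineNsPerOp / candidateNsPerOp;
    }

    public static void main(String[] args) {
        // Scores in ns/op copied from the Graviton 2 table, size = 262143.
        double alignedPanama = 7295.638, alignedUnsafe = 2930.594;
        double unalignedPanama = 8345.300, unalignedUnsafe = 3136.828;

        System.out.println(String.format(Locale.ROOT, "aligned:   %.2fx",
                speedup(alignedPanama, alignedUnsafe)));
        System.out.println(String.format(Locale.ROOT, "unaligned: %.2fx",
                speedup(unalignedPanama, unalignedUnsafe)));
    }
}
```

Both ratios come out at roughly 2.5x or better, which is consistent with the claim in the quoted text.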

Apple M1, small memory blocks:


Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   10  1.731 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   10  1.570 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   10  1.583 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   10  1.734 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   10  1.736 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   10  1.731 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   10  1.744 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   10  2.365 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   10  2.681 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   10  2.503 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   10  3.615 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   10  4.701 ± 0.056  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   10  4.848 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   10  5.003 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   10  1.729 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   10  1.571 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   10  1.579 ± 0.010  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   10  1.728 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   10  1.739 ± 0.019  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   10  1.731 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   10  1.744 ± 0.012  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   10  2.367 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   10  2.694 ± 0.030  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   10  2.517 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   10  3.619 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   10  4.708 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   10  5.018 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   10  5.038 ± 0.068  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   10  2.815 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   10  2.821 ± 0.022  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   10  2.502 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   10  2.815 ± 0.004  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   10  2.502 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   10  2.505 ± 0.022  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   10  2.193 ± 0.019  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   10  2.190 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   10  2.043 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   10  2.191 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   10  2.061 ± 0.040  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   10  2.196 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   10  3.756 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   10  3.752 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   10  2.813 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   10  2.817 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   10  2.502 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   10  2.816 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   10  2.507 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   10  2.507 ± 0.025  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   10  2.195 ± 0.025  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   10  2.192 ± 0.005  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   10  2.050 ± 0.025  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   10  2.188 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   10  2.051 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   10  2.196 ± 0.015  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   10  4.619 ± 0.029  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   10  4.618 ± 0.047  ns/op
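
To make the crossover in the M1 numbers easier to see, here is a small sketch that scans the aligned rows above and reports the smallest measured size from which "unsafe" stays faster than "panama". The class and method names (FillCrossover, crossover) are hypothetical and exist only for this illustration; the scores are copied from the table.

```java
public class FillCrossover {
    // Aligned rows of the Apple M1 table, in ns/op.
    static final int[] SIZES = {1, 2, 3, 4, 5, 6, 7, 8, 15, 16, 63, 64, 255, 256};
    static final double[] PANAMA = {1.731, 1.570, 1.583, 1.734, 1.736, 1.731, 1.744,
                                    2.365, 2.681, 2.503, 3.615, 4.701, 4.848, 5.003};
    static final double[] UNSAFE = {2.815, 2.821, 2.502, 2.815, 2.502, 2.505, 2.193,
                                    2.190, 2.043, 2.191, 2.061, 2.196, 3.756, 3.752};

    // Walks from the largest size downwards and returns the smallest size for
    // which the candidate is faster at that size and at every larger measured
    // size, or -1 if the candidate is not faster even at the largest size.
    static int crossover(int[] sizes, double[] baseline, double[] candidate) {
        int result = -1;
        for (int i = sizes.length - 1; i >= 0; i--) {
            if (candidate[i] < baseline[i]) {
                result = sizes[i];
            } else {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("unsafe is faster from size "
                + crossover(SIZES, PANAMA, UNSAFE) + " bytes upwards");
    }
}
```

For the aligned M1 rows this reports 8 bytes, in line with the "larger than about 8 bytes" estimate; it says nothing about sizes that were not measured.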

Graviton 4, small memory blocks:


Benchmark                       (aligned)  (size)  Mode  Cnt  Score    Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   10  1.970 ±  0.002  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   10  1.966 ±  0.020  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   10  1.963 ±  0.014  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   10  1.989 ±  0.004  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   10  2.030 ±  0.010  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   10  2.027 ±  0.010  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   10  2.077 ±  0.006  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   10  2.557 ±  0.004  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   10  3.176 ±  0.002  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   10  2.779 ±  0.001  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   10  4.302 ±  0.002  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   10  4.292 ±  0.007  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   10  6.311 ±  0.013  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   10  5.394 ±  0.003  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   10  1.970 ±  0.001  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   10  1.937 ±  0.017  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   10  1.954 ±  0.014  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   10  1.985 ±  0.005  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   10  2.006 ±  0.008  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   10  2.015 ±  0.008  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   10  2.138 ±  0.035  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   10  2.553 ±  0.005  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   10  3.178 ±  0.002  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   10  2.775 ±  0.005  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   10  4.296 ±  0.007  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   10  4.290 ±  0.001  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   10  6.334 ±  0.013  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   10  5.472 ±  0.009  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   10  3.218 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   10  2.503 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   10  2.145 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   10  2.886 ±  0.100  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   10  2.145 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   10  3.781 ±  0.013  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   10  2.735 ±  0.016  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   10  5.079 ±  0.014  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   10  4.007 ±  0.112  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   10  3.218 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   10  2.861 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   10  2.864 ±  0.016  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   10  2.503 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   10  2.860 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   10  2.145 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   10  2.571 ±  0.040  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   10  2.146 ±  0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   10  4.531 ±  0.021  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   10  5.134 ±  0.099  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   10  6.603 ±  0.031  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   10  7.148 ±  0.025  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866747058
PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866751204
