On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley <a...@openjdk.org> wrote:
> This intrinsic is generally faster than the current implementation for Panama
> segment operations for all writes larger than about 8 bytes in size,
> increasing to more than 2* the performance on larger memory blocks on
> Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe"
> (this intrinsic).
>
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op

Apple M1, small memory blocks:

Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   10  1.731 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   10  1.570 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   10  1.583 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   10  1.734 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   10  1.736 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   10  1.731 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   10  1.744 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   10  2.365 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   10  2.681 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   10  2.503 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   10  3.615 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   10  4.701 ± 0.056  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   10  4.848 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   10  5.003 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   10  1.729 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   10  1.571 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   10  1.579 ± 0.010  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   10  1.728 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   10  1.739 ± 0.019  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   10  1.731 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   10  1.744 ± 0.012  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   10  2.367 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   10  2.694 ± 0.030  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   10  2.517 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   10  3.619 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   10  4.708 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   10  5.018 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   10  5.038 ± 0.068  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   10  2.815 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   10  2.821 ± 0.022  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   10  2.502 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   10  2.815 ± 0.004  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   10  2.502 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   10  2.505 ± 0.022  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   10  2.193 ± 0.019  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   10  2.190 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   10  2.043 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   10  2.191 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   10  2.061 ± 0.040  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   10  2.196 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   10  3.756 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   10  3.752 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   10  2.813 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   10  2.817 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   10  2.502 ± 0.003  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   10  2.816 ± 0.002  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   10  2.507 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   10  2.507 ± 0.025  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   10  2.195 ± 0.025  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   10  2.192 ± 0.005  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   10  2.050 ± 0.025  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   10  2.188 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   10  2.051 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   10  2.196 ± 0.015  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   10  4.619 ± 0.029  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   10  4.618 ± 0.047  ns/op

Graviton 4, small memory blocks:

Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   10  1.970 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   10  1.966 ± 0.020  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   10  1.963 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   10  1.989 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   10  2.030 ± 0.010  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   10  2.027 ± 0.010  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   10  2.077 ± 0.006  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   10  2.557 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   10  3.176 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   10  2.779 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   10  4.302 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   10  4.292 ± 0.007  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   10  6.311 ± 0.013  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   10  5.394 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   10  1.970 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   10  1.937 ± 0.017  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   10  1.954 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   10  1.985 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   10  2.006 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   10  2.015 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   10  2.138 ± 0.035  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   10  2.553 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   10  3.178 ± 0.002  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   10  2.775 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   10  4.296 ± 0.007  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   10  4.290 ± 0.001  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   10  6.334 ± 0.013  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   10  5.472 ± 0.009  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   10  3.218 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   10  2.503 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   10  2.145 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   10  2.886 ± 0.100  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   10  2.145 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   10  3.781 ± 0.013  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   10  2.735 ± 0.016  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   10  5.079 ± 0.014  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   10  4.007 ± 0.112  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   10  3.218 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   10  2.861 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   10  2.864 ± 0.016  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   10  2.503 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   10  2.860 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   10  2.145 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   10  2.571 ± 0.040  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   10  2.146 ± 0.001  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   10  4.531 ± 0.021  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   10  5.134 ± 0.099  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   10  6.603 ± 0.031  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   10  7.148 ± 0.025  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866747058
PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866751204