On Mon, 22 Jun 2026 19:50:13 GMT, Andrew Haley <[email protected]> wrote:
> One more thought: it might just be unrolling and inlining.

Apparently, it is C2 generating optimal code: dup selector, ldr a, ldr b, bsl, str result in the NEON registers, unrolled once, plus code for lengths not divisible by 4. I haven't paid much attention to this because it is not critical for the performance of the elliptic curve computation, but it is definitely better if it is optimal.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30941#issuecomment-4773037600
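
For reference, the dup / ldr / ldr / bsl / str sequence mentioned above is what a branch-free bitwise select over two arrays looks like once vectorized. Below is a minimal Java sketch of such a loop. The class name `SelectSketch`, the method `select`, and its parameters are hypothetical and not taken from the PR; the sketch only illustrates the loop shape, and it does not show that C2 emits bsl for this exact source.

```java
import java.util.Arrays;

public class SelectSketch {
    // Hypothetical helper, not code from the PR.
    // For every bit position: take the bit from a[i] where mask has a 1,
    // and from b[i] where mask has a 0. No branch depends on the mask.
    static void select(long mask, long[] a, long[] b, long[] result) {
        for (int i = 0; i < result.length; i++) {
            result[i] = (a[i] & mask) | (b[i] & ~mask);
        }
    }

    public static void main(String[] args) {
        // Length 5 on purpose, so a vectorized version would also need
        // scalar code for the leftover elements.
        long[] a = {1, 2, 3, 4, 5};
        long[] b = {10, 20, 30, 40, 50};
        long[] r = new long[a.length];

        select(-1L, a, b, r);                 // all-ones mask selects a
        System.out.println(Arrays.toString(r));

        select(0L, a, b, r);                  // all-zeros mask selects b
        System.out.println(Arrays.toString(r));
    }
}
```

The mask is loop-invariant, which is why a single broadcast (dup) ahead of the loop is enough; each iteration then only needs the two loads, the select, and the store.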
