On Mon, 31 Oct 2022 12:25:43 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3484:
>>
>>> 3482:   decrementl(index);
>>> 3483:   jmpb(LONG_SCALAR_LOOP_BEGIN);
>>> 3484:   bind(LONG_SCALAR_LOOP_END);
>>
>> You can share this loop with the scalar ones above.
>
> This might be messier than it first looks, since the two loops use
> different temp registers (long scalar can scratch cnt1, short scalar
> scratches the coef register). I'll have to think about this for a bit.

As it happens, in the latest version the vector loop drops into the scalar loop after all 32-element chunks have been processed.

-------------

PR: https://git.openjdk.org/jdk/pull/10847
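
For readers following the thread, the control flow described above (a chunked loop that falls through into a single shared scalar loop) can be sketched in plain C++. This is only an illustration, not the intrinsic code from the PR: the function names, the 31-based polynomial hash, and the default initial value of 1 are assumptions chosen to mirror a typical Java-style array hash, and the inner chunk loop stands in for what would be vector instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference implementation: plain scalar polynomial hash, h = 31*h + a[i].
// (Hypothetical helper, used only to check the chunked version.)
uint32_t hash_scalar(const std::vector<int32_t>& a, uint32_t h = 1) {
    for (int32_t v : a) {
        h = 31u * h + static_cast<uint32_t>(v);
    }
    return h;
}

// Chunked version: processes 32 elements per iteration, then drops into
// the same scalar loop for whatever is left (0..31 elements).
uint32_t hash_chunked(const std::vector<int32_t>& a, uint32_t h = 1) {
    constexpr std::size_t kChunk = 32;

    // coef[k] = 31^(31-k), so coef[31] = 1 and coef[0] = 31^31.
    // After the loop, p holds 31^32. Unsigned arithmetic wraps mod 2^32.
    uint32_t coef[kChunk];
    uint32_t p = 1;
    for (std::size_t k = kChunk; k-- > 0;) {
        coef[k] = p;
        p *= 31u;
    }
    const uint32_t pow32 = p;

    std::size_t i = 0;

    // "Vector" loop: one step is equivalent to 32 scalar steps.
    for (; i + kChunk <= a.size(); i += kChunk) {
        uint32_t sum = 0;
        for (std::size_t k = 0; k < kChunk; ++k) {
            sum += coef[k] * static_cast<uint32_t>(a[i + k]);
        }
        h = h * pow32 + sum;
    }

    // Shared scalar loop: handles the tail, and the whole input when it
    // is shorter than one chunk.
    for (; i < a.size(); ++i) {
        h = 31u * h + static_cast<uint32_t>(a[i]);
    }
    return h;
}
```

With this shape there is only one scalar loop to maintain, which is the point of the review comment; the trade-off in real assembler code is that both paths then have to agree on which registers the scalar loop may scratch.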