On Mon, 31 Oct 2022 12:25:43 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3484:
>> 
>>> 3482:   decrementl(index);
>>> 3483:   jmpb(LONG_SCALAR_LOOP_BEGIN);
>>> 3484:   bind(LONG_SCALAR_LOOP_END);
>> 
>> You can share this loop with the scalar ones above.
>
> This might be messier than it first looks, since the two loops use different 
> temp registers (the long scalar loop can scratch cnt1, while the short scalar 
> loop scratches the coef register). I'll have to think about this for a bit.

As it happens, in the latest version the vector loop drops into the scalar loop 
after all 32-element chunks have been processed.
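
For readers who want the control flow spelled out, here is a rough Java sketch 
of that shape. It is only an illustration, not the intrinsic itself: the class 
and method names are made up, the inner loop merely stands in for the vector 
code, and the 31-based polynomial hash is assumed from the Arrays.hashCode 
contract.

```java
public class ChunkedHash {
    // Chunk width taken from the "32-element chunks" mentioned above.
    static final int CHUNK = 32;

    // Hypothetical helper: same result as java.util.Arrays.hashCode(int[])
    // for a non-null array, computed chunk-wise with a scalar tail.
    static int hash(int[] a) {
        int result = 1;
        int i = 0;

        // 31^CHUNK; int overflow wraps, which is what the hash relies on.
        int pow = 1;
        for (int k = 0; k < CHUNK; k++) {
            pow *= 31;
        }

        // Stand-in for the vector loop: consume whole 32-element chunks.
        for (; i + CHUNK <= a.length; i += CHUNK) {
            int chunk = 0;
            for (int k = 0; k < CHUNK; k++) {
                chunk = 31 * chunk + a[i + k];
            }
            result = result * pow + chunk;
        }

        // Fall through into the single scalar loop for the remaining
        // 0..31 elements; there is no separate "long scalar" loop.
        for (; i < a.length; i++) {
            result = 31 * result + a[i];
        }
        return result;
    }
}
```

The point is just that the chunked loop and the tail share one running result 
and one index, so a single scalar loop can serve both the short-array case and 
the leftover elements after the chunks.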

-------------

PR: https://git.openjdk.org/jdk/pull/10847
