On Mon, 31 Oct 2022 02:35:18 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Require UseSSE >= 3 due transitive use of sse3 instructions from ReduceI

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3493:
>
>> 3491: // vnext = IntVector.broadcast(I256, power_of_31_backwards[0]);
>> 3492: movdl(vnext, InternalAddress(power_of_31_backwards + (0 * sizeof(jint))));
>> 3493: vpbroadcastd(vnext, vnext, Assembler::AVX_256bit);
>
> `vpbroadcastd` can take an `Address` argument instead.

An `InternalAddress` isn't an `Address` but an `AddressLiteral`. You can, however, do `as_Address(InternalAddress(power_of_31_backwards + (0 * sizeof(jint))))`.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528:
>
>> 3526: vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit);
>> 3527: }
>> 3528: jmp(LONG_VECTOR_LOOP_BEGIN);
>
> Calculating backward forces you to do calculating the coefficients on each iteration, I think doing this normally would be better.

But doing it forward requires a `reduceLane` on each iteration. It's faster to do it backward.

-------------

PR: https://git.openjdk.org/jdk/pull/10847
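The forward/backward trade-off discussed above can be modelled in plain Java. This is a scalar sketch, not the C2 intrinsic. `LANES = 8` stands in for the eight `int` lanes of a 256-bit vector, and each inner `j` loop models one lane-wise vector operation. The class and method names (`PolyHashSketch`, `backward`, `forward`, `pow31`) are hypothetical. The coefficient layout (`31^7 .. 31^0` across lanes) is an assumption and is not necessarily how `power_of_31_backwards` is laid out in the patch.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two loop shapes; not HotSpot code.
class PolyHashSketch {
    static final int LANES = 8; // models 8 x 32-bit lanes of a 256-bit vector

    // Reference recurrence h = 31*h + a[i] (what String.hashCode computes over chars).
    static int scalar(int[] a) {
        int h = 0;
        for (int x : a) {
            h = 31 * h + x;
        }
        return h;
    }

    // 31^k with Java int (mod 2^32) wrap-around, like the hash itself.
    static int pow31(int k) {
        int p = 1;
        for (int i = 0; i < k; i++) {
            p *= 31;
        }
        return p;
    }

    // Backward: walk 8-element chunks from the end. Lane j of the last chunk has weight
    // 31^(7-j); each earlier chunk needs weights 31^8 larger, so the coefficient vector is
    // multiplied every iteration (cf. vpmulld on vcoef), but lanes are summed only once.
    static int backward(int[] a) {
        int n = a.length;
        int r = n % LANES;                 // leftover head elements a[0..r-1]
        int[] coef = new int[LANES];
        for (int j = 0; j < LANES; j++) {
            coef[j] = pow31(LANES - 1 - j);
        }
        int step = pow31(LANES);           // 31^8, "broadcast" to every lane
        int[] acc = new int[LANES];
        for (int i = n - LANES; i >= r; i -= LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] += a[i + j] * coef[j];
                coef[j] *= step;           // coefficient update on every iteration
            }
        }
        // coef[LANES - 1] is now 31^(n - r), the weight of the scalar head.
        int h = scalar(Arrays.copyOfRange(a, 0, r)) * coef[LANES - 1];
        for (int j = 0; j < LANES; j++) {
            h += acc[j];                   // single lane reduction after the loop
        }
        return h;
    }

    // Forward: walk chunks from the start with fixed coefficients, but each chunk's
    // lanes must be reduced (a reduceLanes-style sum) before being folded into h.
    static int forward(int[] a) {
        int n = a.length;
        int[] coef = new int[LANES];
        for (int j = 0; j < LANES; j++) {
            coef[j] = pow31(LANES - 1 - j);
        }
        int step = pow31(LANES);
        int h = 0;
        int i = 0;
        for (; i + LANES <= n; i += LANES) {
            int chunk = 0;
            for (int j = 0; j < LANES; j++) {
                chunk += a[i + j] * coef[j]; // per-iteration reduction
            }
            h = h * step + chunk;
        }
        for (; i < n; i++) {
            h = 31 * h + a[i];             // scalar tail
        }
        return h;
    }
}
```

For any `int[]`, all three methods should agree. For example, with `s.chars().toArray()` each should return `s.hashCode()`. The sketch only shows where the extra work sits. The backward loop pays one extra lane-wise multiply per iteration; the forward loop pays a horizontal reduction per iteration. It says nothing about which is faster on real hardware.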