On Mon, 31 Oct 2022 13:18:35 GMT, Ludovic Henry <luhe...@openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528:
>>
>>> 3526:     vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit);
>>> 3527:   }
>>> 3528:   jmp(LONG_VECTOR_LOOP_BEGIN);
>>
>> Calculating backward forces you to do calculating the coefficients on each
>> iteration, I think doing this normally would be better.
>
> But doing it forward requires a `reduceLane` on each iteration. It's faster
> to do it backward.

No, you don't need to. The vector loop can be calculated as:

    IntVector accumulation = IntVector.zero(INT_SPECIES);
    for (int i = 0; i < bound; i += INT_SPECIES.length()) {
        IntVector current = IntVector.fromArray(INT_SPECIES, array, i);
        accumulation = accumulation.mul(31**(INT_SPECIES.length())).add(current);
    }
    return accumulation.mul(IntVector.of(31**(INT_SPECIES.length() - 1), ..., 31**2, 31, 1)).reduceLanes(ADD);

Each iteration only requires one multiplication and one addition. The
per-lane weights only need to be applied once, just before the final
reduction.

-------------

PR: https://git.openjdk.org/jdk/pull/10847
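
For anyone who wants to check the arithmetic of the forward scheme without the incubating Vector API, here is a minimal scalar sketch that simulates the lanes with a plain `int[]`. The class name `ForwardPolyHash`, the lane count of 8, and the helper names are assumptions made for illustration; they are not taken from the PR. It also assumes the input length is a multiple of the lane count and that the hash starts from 0, as in the snippet above.

```java
// Scalar simulation of the forward, lane-wise polynomial hash.
// All names here are hypothetical; LANES stands in for INT_SPECIES.length().
final class ForwardPolyHash {
    static final int LANES = 8;

    // 31**e with the usual int wrap-around on overflow.
    static int pow31(int e) {
        int r = 1;
        for (int i = 0; i < e; i++) {
            r *= 31;
        }
        return r;
    }

    // Reference: plain h = 31 * h + a[i], starting from 0.
    static int scalarHash(int[] a) {
        int h = 0;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // Forward scheme: one multiply and one add per lane per iteration,
    // lane weights applied once before the final reduction.
    // Assumes a.length is a multiple of LANES.
    static int laneHash(int[] a) {
        int step = pow31(LANES);          // 31**LANES, shared by every lane
        int[] acc = new int[LANES];       // simulated vector accumulator
        for (int i = 0; i < a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] = acc[l] * step + a[i + l];
            }
        }
        int result = 0;
        for (int l = 0; l < LANES; l++) {
            // Weights 31**(LANES-1), ..., 31**2, 31, 1
            result += acc[l] * pow31(LANES - 1 - l);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = new int[4 * LANES];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 7919 - 13;
        }
        if (laneHash(data) != scalarHash(data)) {
            throw new AssertionError("lane-wise and scalar hashes differ");
        }
    }
}
```

Both methods compute the same polynomial modulo 2^32, because the element at index `k * LANES + l` ends up multiplied by `31**(LANES * (B - 1 - k))` in the loop (with `B` the number of blocks) and by `31**(LANES - 1 - l)` at the end, which together give `31**(n - 1 - index)`.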