Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v3]

Quan Anh Mai Mon, 31 Oct 2022 06:39:22 -0700

On Mon, 31 Oct 2022 13:18:35 GMT, Ludovic Henry <[email protected]> wrote:


>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528:
>> 
>>> 3526:     vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit);
>>> 3527:   }
>>> 3528:   jmp(LONG_VECTOR_LOOP_BEGIN);
>> 
>> Calculating backward forces you to recalculate the coefficients on each 
>> iteration; I think doing this in the normal forward order would be better.
>
> But doing it forward requires a `reduceLane` on each iteration. It's faster 
> to do it backward.

No, you don't need to. The vector loop can be written as:

    // Pseudocode: 31**k denotes 31 to the power k (wrapping int arithmetic).
    IntVector accumulation = IntVector.zero(INT_SPECIES);
    for (int i = 0; i < bound; i += INT_SPECIES.length()) {
        IntVector current = IntVector.load(INT_SPECIES, array, i);
        accumulation = accumulation.mul(31**(INT_SPECIES.length())).add(current);
    }
    return accumulation
        .mul(IntVector.of(31**(INT_SPECIES.length() - 1), ..., 31**2, 31, 1))
        .reduce(ADD);

Each iteration needs only one multiplication and one addition. The per-lane 
weights can be applied once, just before the reduction.
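
To make the equivalence concrete, here is a minimal sketch in plain Java that 
simulates the lanes with an int[] instead of using the Vector API. The class 
name ForwardLaneHash, the helper pow31, the LANES constant (8, as for a 
256-bit vector of ints) and the scalar tail loop are all assumptions made for 
illustration; none of them come from the PR.

```java
final class ForwardLaneHash {
    // Assumed lane count: 8 ints, as in a 256-bit vector. Hypothetical value.
    static final int LANES = 8;

    // Reference: the usual polynomial hash, h = 31 * h + a[i].
    static int scalarHash(int[] a) {
        int h = 0;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // 31 to the power k in wrapping int arithmetic.
    static int pow31(int k) {
        int p = 1;
        for (int j = 0; j < k; j++) {
            p *= 31;
        }
        return p;
    }

    // Forward scheme: every lane is multiplied by the same constant 31**LANES
    // on each iteration; the per-lane weights are applied once at the end.
    static int forwardHash(int[] a) {
        int bound = a.length - a.length % LANES;
        int step = pow31(LANES);
        int[] acc = new int[LANES]; // stands in for the vector accumulator

        for (int i = 0; i < bound; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] = acc[lane] * step + a[i + lane];
            }
        }

        // Weight lane l by 31**(LANES - 1 - l), then sum the lanes.
        int h = 0;
        for (int lane = 0; lane < LANES; lane++) {
            h += acc[lane] * pow31(LANES - 1 - lane);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (int i = bound; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

Because int arithmetic wraps consistently under addition and multiplication, 
forwardHash(a) should agree with scalarHash(a) for any input, for example 
when both are fed the chars of a String and compared against hashCode().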

-------------

PR: https://git.openjdk.org/jdk/pull/10847
