On Mon, 31 Oct 2022 02:15:35 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3387:
>> 
>>> 3385:   for (int idx = 0; idx < 4; idx++) {
>>> 3386:     // h = (31 * h) or (h << 5 - h);
>>> 3387:     movl(tmp, result);
>> 
>> If you are unrolling this, maybe break the dependency chain, `h = h * 31**4 
>> + x[i] * 31**3 + x[i + 1] * 31**2 + x[i + 2] * 31 + x[i + 3]`
>
> A 256-bit vector holds only 8 ints, so this loop seems redundant. Maybe run 
> with a stride of 2 instead, in which case the single scalar calculation 
> also does not need a loop.

Working on this.
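
For reference, the dependency-chain suggestion quoted above can be sketched in plain scalar C++. This is only an illustration of the arithmetic, not the macro-assembler code from the PR: the function names `hash_serial` and `hash_unrolled4` are hypothetical, and unsigned 32-bit arithmetic is assumed as a stand-in for Java's wrapping `int` math.

```cpp
#include <cstddef>
#include <cstdint>

// Baseline: one multiply-add per element. Every iteration has to wait for
// the previous value of h, so the loop is one long dependency chain.
uint32_t hash_serial(const int32_t* x, size_t n, uint32_t h = 0) {
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(x[i]);
  }
  return h;
}

// Unrolled by 4 with precomputed powers of 31 (hypothetical helper).
// Only the h * 31^4 term depends on the previous iteration; the four
// element products are independent of h and of each other.
uint32_t hash_unrolled4(const int32_t* x, size_t n, uint32_t h = 0) {
  const uint32_t p1 = 31u;
  const uint32_t p2 = p1 * 31u;  // 31^2
  const uint32_t p3 = p2 * 31u;  // 31^3
  const uint32_t p4 = p3 * 31u;  // 31^4
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = h * p4
      + static_cast<uint32_t>(x[i])     * p3
      + static_cast<uint32_t>(x[i + 1]) * p2
      + static_cast<uint32_t>(x[i + 2]) * p1
      + static_cast<uint32_t>(x[i + 3]);
  }
  // Scalar tail for the remaining 0..3 elements.
  for (; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(x[i]);
  }
  return h;
}
```

Both functions compute the same polynomial modulo 2^32, so they should agree on any input; the unrolled form just exposes more independent multiplies per iteration.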

-------------

PR: https://git.openjdk.org/jdk/pull/10847
