On Mon, 31 Oct 2022 02:21:44 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
>> Claes Redestad has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>>   Reorder loops and some other suggestions from @merykitty
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3358:
>
>> 3356:   movl(result, is_string_hashcode ? 0 : 1);
>> 3357:
>> 3358:   // if (cnt1 == 0) {
>
> You may want to reorder the execution of the loops, a short array suffers
> more from processing than a big array, so you should have minimum extra hops
> for those. For example, I think this could be:
>
>     if (cnt1 >= 4) {
>         if (cnt1 >= 16) {
>             UNROLLED VECTOR LOOP
>             SINGLE VECTOR LOOP
>         }
>         UNROLLED SCALAR LOOP
>     }
>     SINGLE SCALAR LOOP
>
> The thresholds are arbitrary and need to be measured carefully.

Fixed

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3374:
>
>> 3372:
>> 3373:   // int i = 0;
>> 3374:   movl(index, 0);
>
> `xorl(index, index)`

Fixed

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3418:
>
>> 3416:   // } else { // cnt1 >= 32
>> 3417:   address power_of_31_backwards = pc();
>> 3418:   emit_int32( 2111290369);
>
> Can this giant table be shared among compilations instead?

Probably, though I'm not entirely sure on how. Maybe the "long" cases should be
factored out into a set of stub routines so that it's not inlined in numerous
places anyway.

-------------

PR: https://git.openjdk.org/jdk/pull/10847
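
For illustration only, the loop ordering suggested in the first comment can be sketched in plain Java. This is not the intrinsic from the PR: the vector loops are left out, and the class name `HashLoopSketch`, the method `hash`, and the threshold constant are hypothetical. It only shows how short inputs fall straight through to the single scalar loop, while longer inputs first run a 4-way unrolled loop that uses small powers of 31 (31^2 = 961, 31^3 = 29791, 31^4 = 923521).

```java
import java.util.Arrays;

public class HashLoopSketch {
    // Hypothetical threshold; the review notes that real thresholds need measuring.
    static final int UNROLL_THRESHOLD = 4;

    // Computes h = 31*h + a[i] over all elements, starting from `initial`
    // (0 matches String.hashCode, 1 matches Arrays.hashCode).
    static int hash(int[] a, int initial) {
        int h = initial;
        int i = 0;
        int n = a.length;
        if (n >= UNROLL_THRESHOLD) {
            // Unrolled scalar loop: four elements per iteration.
            // Int overflow wraps, so this equals four sequential steps.
            for (; i + 4 <= n; i += 4) {
                h = h * 923521
                    + a[i] * 29791
                    + a[i + 1] * 961
                    + a[i + 2] * 31
                    + a[i + 3];
            }
        }
        // Single scalar loop: handles short inputs and the tail.
        for (; i < n; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5};
        System.out.println(hash(data, 1) == Arrays.hashCode(data));
        System.out.println(hash("hello".chars().toArray(), 0) == "hello".hashCode());
    }
}
```

With this shape an array of fewer than four elements pays for one comparison before reaching the scalar loop, which is the "minimum extra hops" property the comment asks for.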