On Wed, 1 Mar 2023 at 20:53, Vineet Gupta <vine...@rivosinc.com> wrote:
>
> This showed up as a dynamic icount regression in SPEC 531.deepsjeng with
> upstream gcc (vs. gcc 12.2). gcc was resorting to a synthetic multiply
> using shift+add(s) even when a real multiply had a clear cost benefit.
>
> |00000000000133b8 <see(state_t*, int, int, int, int) [clone .constprop.0]+0x382>:
> |   133b8:  srl    a3,a1,s6
> |   133bc:  and    a3,a3,s5
> |   133c0:  slli   a4,a3,0x9
> |   133c4:  add    a4,a4,a3
> |   133c6:  slli   a4,a4,0x9
> |   133c8:  add    a4,a4,a3
> |   133ca:  slli   a3,a4,0x1b
> |   133ce:  add    a4,a4,a3
>
> vs. gcc 12 doing something like below:
>
> |00000000000131c4 <see(state_t*, int, int, int, int) [clone .constprop.0]+0x35c>:
> |   131c4:  ld     s1,8(sp)
> |   131c6:  srl    a3,a1,s4
> |   131ca:  and    a3,a3,s11
> |   131ce:  mul    a3,a3,s1
>
> Bisected this to f90cb39235c4 ("RISC-V: costs: support shift-and-add in
> strength-reduction"). The intent was to lower the cost for
> shift-add-pow2-{1,2,3}, corresponding to the bitmanip SH*ADD insns, but
> it ended up doing so for all shift values, which favors synthesizing
> multiplies, among other things.
>
> The bug itself is trivial: IN_RANGE() is called on pow2p_hwi(), which
> returns a bool, instead of on exact_log2(), which returns the exponent.
>
> This fix also requires an update to the test introduced by the same
> commit, which now generates a MUL instead of synthesizing it.
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv.cc (riscv_rtx_costs): Fixed IN_RANGE() to
> 	use exact_log2().
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/zba-shNadd-07.c: f2(i*783) now generates MUL vs.
> 	5 insn sh1add+slli+add+slli+sub.
> 	* gcc.target/riscv/pr108987.c: New test.
>
> Signed-off-by: Vineet Gupta <vine...@rivosinc.com>
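
For anyone skimming: a small standalone sketch (not the actual riscv.cc
code; the helpers below only mirror the semantics of GCC's pow2p_hwi()
and exact_log2()) of why the original check misfires.  pow2p_hwi() only
yields a bool, so IN_RANGE (pow2p_hwi (c), 1, 3) accepts every power of
two, while IN_RANGE (exact_log2 (c), 1, 3) restricts the match to shift
amounts 1..3 as intended:

/* Standalone sketch; pow2p()/exact_log2_() approximate the GCC helpers.  */
#include <stdbool.h>
#include <stdio.h>

#define IN_RANGE(v, lo, hi)  ((v) >= (lo) && (v) <= (hi))

static bool pow2p (long x)       { return x > 0 && (x & (x - 1)) == 0; }
static int  exact_log2_ (long x) { return pow2p (x) ? __builtin_ctzl (x) : -1; }

int
main (void)
{
  /* Candidate multiplier constants of the form 1 << s.  */
  for (int s = 1; s <= 6; s++)
    {
      long c = 1L << s;
      /* Buggy form: pow2p() is 1 for every power of two, so the range
	 check degenerates to "is a power of two" and matches s = 4..6 too.  */
      bool buggy = IN_RANGE (pow2p (c), 1, 3);
      /* Fixed form: compare the exponent itself, so only the
	 shNadd-capable cases s = 1..3 pass.  */
      bool fixed = IN_RANGE (exact_log2_ (c), 1, 3);
      printf ("s=%d  buggy=%d  fixed=%d\n", s, buggy, fixed);
    }
  return 0;
}
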
Reviewed-by: Philipp Tomsich <philipp.toms...@vrull.eu>