Hi Roger,

You have a block of 8 spaces that needs to be replaced by tabs:

gcc/config/arc/arc.cc:5538:0:     if (n < 4)
Please fix the above, and proceed with your commit.

Thank you,
Claudiu

On Sun, Oct 29, 2023 at 11:16 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch overhauls the ARC backend's insn_cost target hook, and makes
> some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
> goal is to allow the backend to indicate that shifts and rotates are
> slow (discouraged) when the CPU doesn't have a barrel shifter.  I should
> also acknowledge Richard Sandiford for inspiring the use of set_cost
> in this rewrite of arc_insn_cost; this implementation borrows heavily
> from the target hooks for AArch64 and ARM.
>
> The motivating example is derived from PR rtl-optimization/110717.
>
> struct S { int a : 5; };
> unsigned int foo (struct S *p) {
>   return p->a;
> }
>
> With a barrel shifter, GCC -O2 generates the reasonable:
>
> foo:    ldb_s   r0,[r0]
>         asl_s   r0,r0,27
>         j_s.d   [blink]
>         asr_s   r0,r0,27
>
> What's interesting is that during combine, the middle-end actually
> has two shifts by three bits, and a sign extension from QI to SI:
>
> Trying 8, 9 -> 11:
>     8: r158:SI=r157:QI#0<<0x3
>       REG_DEAD r157:QI
>     9: r159:SI=sign_extend(r158:SI#0)
>       REG_DEAD r158:SI
>    11: r155:SI=r159:SI>>0x3
>       REG_DEAD r159:SI
>
> Whilst it's reasonable to simplify this to two shifts by 27 bits when
> the CPU has a barrel shifter, it's actually a significant pessimization
> when these shifts are implemented by loops.  This combination can be
> prevented if the backend provides accurate-ish estimates for insn_cost.
>
> Previously, without a barrel shifter, GCC -O2 -mcpu=em generated:
>
> foo:    ldb_s   r0,[r0]
>         mov     lp_count,27
>         lp      2f
>         add     r0,r0,r0
>         nop
> 2:      # end single insn loop
>         mov     lp_count,27
>         lp      2f
>         asr     r0,r0
>         nop
> 2:      # end single insn loop
>         j_s     [blink]
>
> which contains two loops and requires ~113 cycles to execute.
> With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
>
> foo:    ldb_s   r0,[r0]
>         mov_s   r2,0    ;3
>         add3    r0,r2,r0
>         sexb_s  r0,r0
>         asr_s   r0,r0
>         asr_s   r0,r0
>         j_s.d   [blink]
>         asr_s   r0,r0
>
> which requires only ~6 cycles, using the shorter shifts by 3 and a sign
> extension.
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
>
> 2023-10-29  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
>         Provide reasonable values for SHIFTS and ROTATES by constant
>         bit counts depending upon TARGET_BARREL_SHIFTER.
>         (arc_insn_cost): Use insn attributes if the instruction is
>         recognized.  Avoid calling get_attr_length for type "multi",
>         i.e. define_insn_and_split patterns without explicit type.
>         Fall back to set_rtx_cost for single_set and pattern_cost
>         otherwise.
>         * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
>         (BRANCH_COST): Improve/correct definition.
>         (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
>
>
> Thanks again,
> Roger
> --
>