Hi Roger,

You have a block of 8 spaces that needs to be replaced by tabs:
gcc/config/arc/arc.cc:5538:0:       if (n < 4)

Please fix the above, and proceed with your commit.

Thank you,
Claudiu

On Sun, Oct 29, 2023 at 11:16 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch overhauls the ARC backend's insn_cost target hook, and makes
> some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
> goal is to allow the backend to indicate that shifts and rotates are
> slow (discouraged) when the CPU doesn't have a barrel shifter. I should
> also acknowledge Richard Sandiford for inspiring the use of set_cost
> in this rewrite of arc_insn_cost; this implementation borrows heavily
> from the target hooks for AArch64 and ARM.
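>
> To give a feel for the shape of the new hook, here's a minimal sketch
> (illustrative only, not the committed code; the length-based scaling
> below is a placeholder):
>
>   /* Sketch: trust insn attributes once the insn is recognized, but
>      avoid get_attr_length for type "multi" (define_insn_and_split
>      patterns without an explicit type), whose length isn't meaningful.
>      Otherwise fall back to the generic cost estimates.  */
>   static int
>   arc_insn_cost (rtx_insn *insn, bool speed)
>   {
>     if (recog_memoized (insn) >= 0
>         && get_attr_type (insn) != TYPE_MULTI)
>       /* Placeholder scaling: one insn per 4 bytes of encoding.  */
>       return COSTS_N_INSNS (get_attr_length (insn) / 4);
>
>     rtx set = single_set (insn);
>     if (set)
>       return set_rtx_cost (set, speed);
>     return pattern_cost (PATTERN (insn), speed);
>   }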
>
> The motivating example is derived from PR rtl-optimization/110717.
>
> struct S { int a : 5; };
> unsigned int foo (struct S *p) {
>   return p->a;
> }
>
> With a barrel shifter, GCC -O2 generates the reasonable:
>
> foo:    ldb_s   r0,[r0]
>         asl_s   r0,r0,27
>         j_s.d   [blink]
>         asr_s   r0,r0,27
>
> What's interesting is that during combine, the middle-end actually
> sees two shifts by three bits and a sign extension from QI to SI:
>
> Trying 8, 9 -> 11:
>     8: r158:SI=r157:QI#0<<0x3
>       REG_DEAD r157:QI
>     9: r159:SI=sign_extend(r158:SI#0)
>       REG_DEAD r158:SI
>    11: r155:SI=r159:SI>>0x3
>       REG_DEAD r159:SI
>
> Whilst it's reasonable to simplify this to two shifts by 27 bits when
> the CPU has a barrel shifter, it's actually a significant pessimization
> when these shifts are implemented by loops.  This combination can be
> prevented if the backend provides accurate-ish estimates for insn_cost.
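>
> Concretely, the rtx_costs change boils down to a fragment along these
> lines (a sketch assuming one step per bit shifted; the patch's actual
> constants differ):
>
>   /* Sketch of the shift handling in arc_rtx_costs: with a barrel
>      shifter a constant shift is one insn; without one, a shift by a
>      constant N costs roughly N steps, so combine no longer merges
>      two cheap short shifts into one expensive long shift.  */
>   case ASHIFT:
>   case ASHIFTRT:
>   case LSHIFTRT:
>     if (TARGET_BARREL_SHIFTER)
>       *total = COSTS_N_INSNS (1);
>     else if (CONST_INT_P (XEXP (x, 1)))
>       *total = COSTS_N_INSNS (INTVAL (XEXP (x, 1)));
>     else
>       /* Variable count: assume the worst case for SImode.  */
>       *total = COSTS_N_INSNS (32);
>     return true;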
>
>
> Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:
>
> foo:    ldb_s   r0,[r0]
>         mov     lp_count,27
>         lp      2f
>         add     r0,r0,r0
>         nop
> 2:      # end single insn loop
>         mov     lp_count,27
>         lp      2f
>         asr     r0,r0
>         nop
> 2:      # end single insn loop
>         j_s     [blink]
>
> which contains two loops and requires ~113 cycles to execute.
> With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
>
> foo:    ldb_s   r0,[r0]
>         mov_s   r2,0    ;3
>         add3    r0,r2,r0
>         sexb_s  r0,r0
>         asr_s   r0,r0
>         asr_s   r0,r0
>         j_s.d   [blink]
>         asr_s   r0,r0
>
> which requires only ~6 cycles, thanks to the shorter shifts by 3 and
> the sign extension.
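>
> In plain C, the cheaper sequence is equivalent to the following
> (foo_equiv is a made-up name, shown purely for exposition):
>
>   /* What the post-patch code computes: shift the loaded byte left
>      by 3 (add3), sign-extend the low byte (sexb_s), then apply
>      three arithmetic right shifts (asr_s).  */
>   unsigned int foo_equiv (unsigned char b)
>   {
>     signed char t = (signed char) (b << 3);  /* add3 + sexb_s */
>     return (unsigned int) (t >> 3);          /* asr_s x 3 */
>   }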
>
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
>
> 2023-10-29  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
>         Provide reasonable values for SHIFTS and ROTATES by constant
>         bit counts depending upon TARGET_BARREL_SHIFTER.
>         (arc_insn_cost): Use insn attributes if the instruction is
>         recognized.  Avoid calling get_attr_length for type "multi",
>         i.e. define_insn_and_split patterns without explicit type.
>         Fall back to set_rtx_cost for single_set and pattern_cost
>         otherwise.
>         * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
>         (BRANCH_COST): Improve/correct definition.
>         (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
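>
> For context, a COSTS_N_BYTES helper conventionally looks along these
> lines (the i386 backend defines one similarly; the exact ARC definition
> and scaling are in the patch, not this mail):
>
>   /* Hypothetical definition: express size costs in bytes, so size
>      and speed costs can be compared on a common scale.  */
>   #define COSTS_N_BYTES(N) ((N) * 2)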
>
>
> Thanks again,
> Roger
> --
>
