Hi Roger,

Do you want to say bmsk_s instead of msk_s here:
+/* { dg-final { scan-assembler "msk_s\\s+r0,r0,0" } } */

Anyhow, the patch looks good. Proceed with your commit.

Thank you,
Claudiu

On Mon, Oct 30, 2023 at 5:05 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 10/28/23 10:47, Roger Sayle wrote:
> >
> > This patch optimizes PR middle-end/101955 for the ARC backend.  On ARC
> > CPUs with a barrel shifter, using two shifts is (probably) optimal as:
> >
> >          asl_s   r0,r0,31
> >          asr_s   r0,r0,31
> >
> > but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
> >
> >          and     r2,r0,1
> >          ror     r2,r2
> >          add.f   0,r2,r2
> >          sbc     r0,r0,r0
> >
> > with this patch, we now generate the smaller, faster and non-flags
> > clobbering:
> >
> >          bmsk_s  r0,r0,0
> >          neg_s   r0,r0
> >
> > Tested with a cross-compiler to arc-linux hosted on x86_64,
> > with no new (compile-only) regressions from make -k check.
> > Ok for mainline if this passes Claudiu's nightly testing?
> >
> >
> > 2023-10-28  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR middle-end/101955
> >          * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
> >          to convert sign extract of the least significant bit into an
> >          AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
> >
> > gcc/testsuite/ChangeLog
> >          PR middle-end/101955
> >          * gcc.target/arc/pr101955.c: New test case.
> Good catch.  Looking to do something very similar on the H8 based on
> your work here.
>
> One the H8 we can use bld to load a bit from an 8 bit register into the
> C flag.  Then we use subtract with carry to get an 8 bit 0/-1 which we
> can then sign extend to 16 or 32 bits.  That covers bit positions 0..15
> of an SImode input.
>
> For bits 16..31 we can move the high half into the low half, the use the
> bld sequence.
>
> For bit zero the and+neg is the same number of clocks and size as bld
> based sequence.  But it'll simulate faster, so it's special cased.
>
>
> Jeff
>

Reply via email to