On Mon, Oct 19, 2020 at 11:31 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Hongtao Liu <crazy...@gmail.com> writes: > > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > >> > + /* Simplify vec_select of a subreg of X to just a vec_select of X > >> > + when X has same component mode as vec_select. */ > >> > + int l2; > >> > + if (GET_CODE (trueop0) == SUBREG > >> > + && GET_MODE_INNER (mode) > >> > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > >> > >> Better to use SUBREG_REG here and below. > >> > > > > Yes and changed. > > > >> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > >> > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > >> > + .is_constant (&l2) > >> > + && known_le (l1, l2)) > >> > + { > >> > + unsigned HOST_WIDE_INT subreg_offset = 0; > >> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > >> > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb > >> > (trueop0), BITS_PER_UNIT), > >> > + GET_MODE_SIZE (GET_MODE_INNER > >> > (mode)), > >> > + &subreg_offset)); > >> > >> can_div_trunc_p discards the remainder, whereas it looks like here > >> you want an exact multiple. > >> > >> I don't think it's absolutely guaranteed that the “if” condition makes > >> the division by GET_MODE_SIZE exact. E.g. in principle you could have > >> a subreg of a vector of TIs in which the subreg offset is misaligned by > >> a DI offset. > >> > >> I'm not sure the subreg_lsb conversion is correct though. On big-endian > >> targets, lane numbering follows memory layout, just like subreg byte > >> offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) > >> was correct. > >> > >> In summary, I think the "if” condition should include something like: > >> > >> constant_mulitple_p (SUBREG_BYTE (trueop0), > >> GET_MODE_UNIT_BITSIZE (mode), > >> &subreg_offset) > >> > > > > Changed. > > > >> Thanks, > >> Richard > > > > > > Update patch. > > > > -- > > BR, > > Hongtao > > > > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001 > > From: liuhongt <hongtao....@intel.com> > > Date: Tue, 13 Oct 2020 15:35:29 +0800 > > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select > > of > > X. > > > > gcc/ChangeLog > > PR rtl-optimization/97249 > > * simplify-rtx.c (simplify_binary_operation_1): Simplify > > vec_select of a subreg of X to a vec_select of X. > > > > gcc/testsuite/ChangeLog > > > > * gcc.target/i386/pr97249-1.c: New test. > > --- > > gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ > > gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ > > 2 files changed, 74 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index 869f0d11b2e..b1009837b2b 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, > > machine_mode mode, > > return subop1; > > } > > } > > + > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + int l2; > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > Nothing really relies on this last line, and nothing uses l0, so better > to drop it. >
Changed, so there won't be any vector mode with variable number elts. > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > > + .is_constant (&l2) > > + && known_le (l1, l2) > > I'm not sure the last two &&s are really the important condition. > I think we should drop them for the suggestion below. > Changed, assume gcc also support something like (vec_select:v4di (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) (const_int 0)])) as long as the range of selection guaranteed by || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > + { > > + > > Excess blank line. > Changed. > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > This can just use ==. > Changed. > > + bool success = true; > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > Excess space. Changed. > > > + if (!CONST_INT_P (idx)) > > Here I think we should check: > > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > where: > > poly_uint64 nunits > = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). > Changed. > This makes sure that all indices are in range. In particular, it's > valid for the SUBREG_REG to be narrower than mode, for appropriate > vec_select indices > Yes, that's what paradoxical subreg means. > Thanks, > Richard -- BR, Hongtao
From 5dc3de7f3fabb932be2c097db00b75061228caaf Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao....@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of X. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 41 +++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..df751318237 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,47 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* Simplify vec_select of a subreg of X to just a vec_select of X + when X has same component mode as vec_select. */ + unsigned HOST_WIDE_INT subreg_offset = 0; + if (GET_CODE (trueop0) == SUBREG + && GET_MODE_INNER (mode) + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && constant_multiple_p (SUBREG_BYTE (trueop0), + GET_MODE_UNIT_BITSIZE (mode), + &subreg_offset)) + { + gcc_assert (XVECLEN (trueop1, 0) == l1); + bool success = true; + poly_uint64 nunits + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); + for (int i = 0; i != l1; i++) + { + rtx idx = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (idx) + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) + { + success = false; + break; + } + } + if (success) + { + rtx par = trueop1; + if (subreg_offset) + { + rtvec vec = rtvec_alloc (l1); + for (int i = 0; i < l1; i++) + RTVEC_ELT (vec, i) + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) + + subreg_offset)); + par = gen_rtx_PARALLEL (VOIDmode, vec); + } + return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par); + } + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..4478a34a9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1