On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > For rtx like
> >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> >                    (parallel [(const_int 0) (const_int 1)]))
> > it could be simplified as inner.
>
> You could even simplify any vec_select of a subreg of X to just a
> vec_select of X, by changing the selection vector a bit (well, only do

Yes. When SUBREG_BYTE of trueop0 is not 0, we need to add the offset to
the selection.

> this if that is a constant vector, I suppose).  Not just for paradoxical
> subregs either, just for *all* subregs.
>

Yes, and only when X has the same inner mode and more elements.

> > gcc/ChangeLog
> >         PR rtl-optimization/97249
> >         * simplify-rtx.c (simplify_binary_operation_1): Simplify
> >         vec_select of paradoxical subreg.
> >
> > gcc/testsuite/ChangeLog
> >
> >         * gcc.target/i386/pr97249-1.c: New test.
>
> > +      /* For cases like
> > +        (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > +                         (parallel [(const_int 0) (const_int 1)])).
> > +        return inner directly.  */
> > +      if (GET_CODE (trueop0) == SUBREG
> > +         && paradoxical_subreg_p (trueop0)
> > +         && mode == GET_MODE (XEXP (trueop0, 0))
> > +         && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> > +         && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +         && l0 % l1 == 0)
>
> Why this?  Why does the number of elements of the input have to divide
> that of the output?
>

Removed. I also added the conditions described in my comments above.

> > +       {
> > +         gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> > +         unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> > +         unsigned HOST_WIDE_INT sel = 0;
> > +         int i = 0;
> > +         for (;i != l1; i++)
>
>   for (int i = 0; i != l1; i++)
>
> > +           {
> > +             rtx j = XVECEXP (trueop1, 0, i);
> > +             if (!CONST_INT_P (j))
> > +               break;
> > +             sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> > +           }
> > +         /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> > +         if (sel == expect)
> > +           return XEXP (trueop0, 0);
> > +       }
> >      }
>
> If you just handle the much more generic case, all the other vec_select
> simplifications can be done as well, not just this one.
>

Yes, changed. The selection must also stay within the elements of X.

> > +/* PR target/97249 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "vpmovzxbw\[
> > \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxwd\[
> > \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxdq\[
> > \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
>
> I don't know enough about the x86 backend to know if this is exactly
> what you need in the testsuite.  I do know a case of backslashitis when
> I see one though -- you might want to use {} instead of "", and perhaps
> \m and \M and \s etc.  And to make sure things are on one line, don't do
> all that nastiness with [^\n], just start the RE with (?n) :-)
>

Yes, changed. It is much cleaner with (?n) and {}.

>
> Segher

Updated patch below.

-- 
BR,
Hongtao
From df71eb46e394e5b778c69e9e8f25b301997e365d Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao....@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
        PR rtl-optimization/97249
        * simplify-rtx.c (simplify_binary_operation_1): Simplify
        vec_select of a subreg of X to a vec_select of X when
        available.

gcc/testsuite/ChangeLog

        * gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..8a10b6cf4d5 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
                    return subop1;
                }
            }
+
+      /* Simplify vec_select of a subreg of X to just a vec_select of X
+         when available.  */
+      int l2;
+      if (GET_CODE (trueop0) == SUBREG
+          && (GET_MODE_INNER (mode)
+              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
+          && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+          && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+          && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
+          .is_constant (&l2)
+          && known_le (l1, l2))
+        {
+          unsigned HOST_WIDE_INT subreg_offset = 0;
+          gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+          gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
+                                       GET_MODE_SIZE (GET_MODE_INNER (mode)),
+                                       &subreg_offset));
+          bool success = true;
+          for (int i = 0;i != l1; i++)
+            {
+              rtx j = XVECEXP (trueop1, 0, i);
+              if (!CONST_INT_P (j)
+                  || known_ge (UINTVAL (j), l2 - subreg_offset))
+                {
+                  success = false;
+                  break;
+                }
+            }
+          if (success)
+            {
+              rtx par = trueop1;
+              if (subreg_offset)
+                {
+                  rtvec vec = rtvec_alloc (l1);
+                  for (int i = 0; i < l1; i++)
+                    RTVEC_ELT (vec, i)
+                      = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i))
+                                 + subreg_offset);
+                  par = gen_rtx_PARALLEL (VOIDmode, vec);
+                }
+              return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
+            }
+        }
     }

   if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249 */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+  for (int i = 0 ; i != 8; i++)
+    p3[i] = p1[i] + p2[i];
+  return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+  for (int i = 0 ; i != 4; i++)
+    p3[i] = p1[i] + p2[i];
+  return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+  for (int i = 0 ; i != 2; i++)
+    p3[i] = (long long)p1[i] + (long long)p2[i];
+  return;
+}
-- 
2.18.1
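
For illustration only (not part of the patch): a minimal standalone C
sketch of the index arithmetic the new hunk performs, assuming
fixed-size vectors and plain integer selection indices.  The function
remap_selection and all of its parameter names are hypothetical and do
not exist in GCC.  Where the patch asserts that SUBREG_BYTE is a
multiple of the element size, this model simply gives up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC code.  A subreg at byte offset SUBREG_BYTE
   of a vector X with ELT_SIZE-byte elements starts at element
   SUBREG_BYTE / ELT_SIZE of X.  Selecting element SEL[i] of the subreg
   is then the same as selecting element SEL[i] + offset of X, as long
   as that index still lies inside X's INNER_NUNITS elements.
   Returns true and fills OUT on success, false if no remap applies.  */
static bool
remap_selection (const unsigned *sel, size_t nsel,
                 unsigned subreg_byte, unsigned elt_size,
                 unsigned inner_nunits, unsigned *out)
{
  /* The patch asserts divisibility; this sketch rejects instead.  */
  if (elt_size == 0 || subreg_byte % elt_size != 0)
    return false;

  unsigned offset = subreg_byte / elt_size;

  /* Mirrors known_le (l1, l2): X must have at least as many elements
     as the result, and the offset must not run past the end of X.  */
  if (nsel > inner_nunits || offset > inner_nunits)
    return false;

  for (size_t i = 0; i < nsel; i++)
    {
      /* Mirrors known_ge (UINTVAL (j), l2 - subreg_offset).  */
      if (sel[i] >= inner_nunits - offset)
        return false;
      out[i] = sel[i] + offset;
    }
  return true;
}
```

Under these assumptions, a V2SI selection [0 1] from a V4SI subreg at
byte 16 of a V8SI value X (4-byte elements, so an offset of 4 elements)
would be rewritten as the selection [4 5] applied to X directly.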