Hi!

On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> For rtx like
>   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>                    (parallel [(const_int 0) (const_int 1)]))
> it could be simplified as inner.

You could even simplify any vec_select of a subreg of X to just a
vec_select of X, by changing the selection vector a bit (well, only do
this if that is a constant vector, I suppose).  Not just for paradoxical
subregs either, just for *all* subregs.

> gcc/ChangeLog
>         PR rtl-optimization/97249
>         * simplify-rtx.c (simplify_binary_operation_1): Simplify
>         vec_select of paradoxical subreg.
>
> gcc/testsuite/ChangeLog
>
>         * gcc.target/i386/pr97249-1.c: New test.

> +         /* For cases like
> +            (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> +                             (parallel [(const_int 0) (const_int 1)])).
> +            return inner directly.  */
> +         if (GET_CODE (trueop0) == SUBREG
> +             && paradoxical_subreg_p (trueop0)
> +             && mode == GET_MODE (XEXP (trueop0, 0))
> +             && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +             && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +             && l0 % l1 == 0)

Why this?  Why does the number of elements of the input have to divide
that of the output?

> +           {
> +             gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +             unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> +             unsigned HOST_WIDE_INT sel = 0;
> +             int i = 0;
> +             for (;i != l1; i++)

  for (int i = 0; i != l1; i++)

> +               {
> +                 rtx j = XVECEXP (trueop1, 0, i);
> +                 if (!CONST_INT_P (j))
> +                   break;
> +                 sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> +               }
> +             /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> +             if (sel == expect)
> +               return XEXP (trueop0, 0);
> +           }
>         }

If you just handle the much more generic case, all the other vec_select
simplifications can be done as well, not just this one.

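To make the "change the selection vector a bit" idea concrete, here is a
minimal standalone sketch.  It is a toy model and not GCC code: the
function names and the flat-array representation are invented for
illustration, and it assumes the subreg and X share the same element mode
and that subreg element i maps to element first_elt + i of X (the mapping
from a byte offset to first_elt, including endianness, is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical names, not GCC internals).
   X has X_NELTS elements.  The subreg's element I is assumed to be
   element SUBREG_FIRST_ELT + I of X; elements past the end of X (the
   paradoxical part) are undefined.
   Rewrite a constant selection SEL over the subreg into a selection OUT
   over X.  Return false if any selected element lies outside X, in
   which case the rewrite must not be done.  */
bool
rebase_selection (size_t x_nelts, size_t subreg_first_elt,
                  const size_t *sel, size_t sel_len, size_t *out)
{
  assert (sel != NULL && out != NULL);
  if (subreg_first_elt >= x_nelts)
    return false;
  for (size_t i = 0; i < sel_len; i++)
    {
      /* Written this way so the bounds check cannot overflow.  */
      if (sel[i] >= x_nelts - subreg_first_elt)
        return false;
      out[i] = subreg_first_elt + sel[i];
    }
  return true;
}

/* Return true if SEL picks elements 0, 1, ..., X_NELTS - 1 in order,
   i.e. the vec_select of X is just X itself.  */
bool
selection_is_identity (size_t x_nelts, const size_t *sel, size_t sel_len)
{
  if (sel_len != x_nelts)
    return false;
  for (size_t i = 0; i < sel_len; i++)
    if (sel[i] != i)
      return false;
  return true;
}
```

For the case in the patch, X has 2 elements, the subreg starts at
element 0, and the selection {0, 1} rebases to {0, 1}, which is the
identity, so the whole expression is X.  A selection that reaches into
the undefined upper half, such as {2, 3}, is rejected.  In this model the
same rebasing also covers non-paradoxical subregs, for example a V2SI
subreg starting at element 2 of a V4SI value.
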
> +/* PR target/97249 */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */

I don't know enough about the x86 backend to know if this is exactly
what you need in the testsuite.  I do know a case of backslashitis when
I see one though -- you might want to use {} instead of "", and perhaps
\m and \M and \s etc.  And to make sure things are on one line, don't do
all that nastiness with [^\n], just start the RE with (?n) :-)


Segher