Hi!

On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
>   For rtx like
>   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>                    (parallel [(const_int 0) (const_int 1)]))
>  it could be simplified as inner.

You could even simplify any vec_select of a subreg of X to just a
vec_select of X, by adjusting the selection vector (well, only do
this if that is a constant vector, I suppose).  Not just for paradoxical
subregs either, but for *all* subregs.
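
To make the selector adjustment concrete, here is a minimal standalone
sketch in plain C.  It is a model only, not GCC code: remap_selector and
selector_is_identity are hypothetical helpers, and it assumes the subreg
keeps the element mode of X, starts at a whole-lane offset, and uses
little-endian lane numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC function.  Models rewriting
   (vec_select (subreg X) SEL) as (vec_select X SEL').
   LANE_OFF is the subreg byte offset divided by the element size
   (0 for a paradoxical subreg); N_INNER is the lane count of X.
   Returns false if some selected lane lies outside X, e.g. in the
   undefined upper part of a paradoxical subreg.  */
bool
remap_selector (const unsigned *sel, size_t n_sel, unsigned lane_off,
                unsigned n_inner, unsigned *out)
{
  assert (sel != NULL && out != NULL);
  for (size_t i = 0; i < n_sel; i++)
    {
      unsigned lane = sel[i] + lane_off;
      if (lane >= n_inner)
        return false;
      out[i] = lane;
    }
  return true;
}

/* Hypothetical helper.  True if SEL selects every lane of X in order,
   in which case the vec_select of X is simply X itself.  */
bool
selector_is_identity (const unsigned *sel, size_t n_sel, unsigned n_inner)
{
  assert (sel != NULL);
  if (n_sel != n_inner)
    return false;
  for (size_t i = 0; i < n_sel; i++)
    if (sel[i] != i)
      return false;
  return true;
}
```

With the selector {0, 1}, lane_off 0 and n_inner 2 (the V2SI-in-V4SI
case from the patch) the remapped selector is the identity, so the
result is X.  With lane_off 2 and n_inner 4 (a lowpart-style subreg
starting at lane 2) the same selector becomes {2, 3} applied to X.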

> gcc/ChangeLog
>         PR rtl-optimization/97249
>         * simplify-rtx.c (simplify_binary_operation_1): Simplify
>         vec_select of paradoxical subreg.
> 
> gcc/testsuite/ChangeLog
> 
>         * gcc.target/i386/pr97249-1.c: New test.

> +       /* For cases like
> +          (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> +                           (parallel [(const_int 0) (const_int 1)])).
> +          return inner directly.  */
> +       if (GET_CODE (trueop0) == SUBREG
> +           && paradoxical_subreg_p (trueop0)
> +           && mode == GET_MODE (XEXP (trueop0, 0))
> +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +           && l0 % l1 == 0)

Why this?  Why does the number of elements of the output have to divide
that of the input?

> +         {
> +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +           unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> +           unsigned HOST_WIDE_INT sel = 0;
> +           int i = 0;
> +           for (;i != l1; i++)

  for (int i = 0; i != l1; i++)

> +             {
> +               rtx j = XVECEXP (trueop1, 0, i);
> +               if (!CONST_INT_P (j))
> +                 break;
> +               sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> +             }
> +           /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> +           if (sel == expect)
> +             return XEXP (trueop0, 0);
> +         }
>       }

If you just handle the much more generic case, all the other vec_select
simplifications can be done as well, not just this one.

> +/* PR target/97249  */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */

I don't know enough about the x86 backend to know if this is exactly
what you need in the testsuite.  I do know a case of backslashitis when
I see one though -- you might want to use {} instead of "", and perhaps
\m and \M and \s etc.  And to make sure things are on one line, don't do
all that nastiness with [^\n], just start the RE with (?n) :-)
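
Something along these lines, say, for the first directive.  This is an
untested sketch: it assumes the Tcl advanced-RE syntax that the
testsuite's scan-assembler-times uses, and it keeps the original's
single-digit %xmm check and its "end of line or comment" tail.

```c
/* { dg-final { scan-assembler-times {(?n)\mvpmovzxbw\s+\(.*%xmm[0-9](?:$|\s+#)} 2 } } */
```

Inside {} nothing needs a backslash for Tcl's sake, so only the literal
"(" is still escaped.  One caveat: as far as I know (?n) only stops "."
and negated bracket expressions from matching a newline, while \s can
still match one, so "[ \t]" is the stricter choice if that matters.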


Segher
