On 5/30/2022 6:23 AM, Roger Sayle wrote:
Whilst investigating PR 55278, I noticed that the tree-ssa optimizers
aren't eliminating the promotions of shifts to "int" as inserted by the
c-family front-ends, instead leaving this simplification to
the RTL optimizers.  This patch allows match.pd to do this itself earlier,
narrowing (T)(X << C) to (T)X << C when the constant C is known to be
valid for the (narrower) type T.

Hence for this simple test case:
short foo(short x) { return x << 5; }

the .optimized dump currently looks like:

short int foo (short int x)
{
   int _1;
   int _2;
   short int _4;

   <bb 2> [local count: 1073741824]:
   _1 = (int) x_3(D);
   _2 = _1 << 5;
   _4 = (short int) _2;
   return _4;
}

but with this patch it now becomes:

short int foo (short int x)
{
   short int _2;

   <bb 2> [local count: 1073741824]:
   _2 = x_1(D) << 5;
   return _2;
}

This is always reasonable as RTL expansion knows how to use
widening optabs if it makes sense at the RTL level to perform
this shift in a wider mode.
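
As a rough illustration of why the narrowing is safe, the sketch below checks exhaustively that truncating a wide shift gives the same 16-bit result as a shift whose every intermediate step is reduced modulo 2^16. It is not part of the patch: the function names are hypothetical, and unsigned types are used so the check does not depend on how C treats left shifts of negative values. Shift counts are limited to 0..15, mirroring the requirement that C be valid for the narrower type.

```c
#include <stdint.h>

/* Wide form, as the front end emits it: promote, shift, truncate. */
static uint16_t shl_wide(uint16_t x, unsigned c)
{
    return (uint16_t)((uint32_t)x << c);
}

/* Narrow form (hypothetical model): every intermediate value is reduced
   modulo 2^16, as if the shift were carried out in the 16-bit type. */
static uint16_t shl_narrow(uint16_t x, unsigned c)
{
    uint16_t r = x;
    for (unsigned i = 0; i < c; i++)
        r = (uint16_t)((uint32_t)r + r);   /* doubling modulo 2^16 */
    return r;
}

/* Count the (x, c) pairs on which the two forms disagree, over all
   16-bit x and all shift counts valid for a 16-bit type (0..15). */
static unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        for (unsigned c = 0; c < 16; c++)
            if (shl_wide((uint16_t)x, c) != shl_narrow((uint16_t)x, c))
                bad++;
    return bad;
}
```

Calling count_mismatches() should return 0, since the low 16 bits of a left shift depend only on the low 16 bits of the operand.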

Of course, there's often a catch.  The above simplification not only
reduces the number of statements in gimple, but also allows further
optimizations, such as the recognition of rotate idioms
and bswap16.  Alas, optimizing things earlier than anticipated
requires several testsuite changes [though all these tests have
been confirmed to generate identical assembly code on x86_64].
The only significant change is that the vectorization pass previously
wouldn't vectorize rotations if the backend doesn't explicitly provide
an optab for them.  This is curious as if the rotate is expressed as
ior(lshift,rshift) it will vectorize, and likewise RTL expansion will
generate the iorv(lshiftv,rshiftv) sequence if required for a vector
mode rotation.  Hence this patch includes a tweak to the optabs
test in tree-vect-stmts.cc's vectorizable_shift to better reflect
the functionality supported by RTL expansion.
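
For readers less familiar with the idiom, here is a minimal sketch of a 16-bit rotate written as ior(lshift,rshift), plus a lane-by-lane loop of the kind a vectorizer could turn into vector shifts and a vector IOR. The names rotl16 and rotl16_lanes are hypothetical and do not appear in the patch or its test cases; the count is assumed to be a constant in 1..15.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate left by c (assumed 1..15), expressed as ior(lshift, rshift). */
static uint16_t rotl16(uint16_t x, unsigned c)
{
    return (uint16_t)(((uint32_t)x << c) | ((uint32_t)x >> (16 - c)));
}

/* Apply the same rotate to every element; each iteration needs only a
   left shift, a logical right shift and an IOR on 16-bit lanes. */
static void rotl16_lanes(uint16_t *v, size_t n, unsigned c)
{
    for (size_t i = 0; i < n; i++)
        v[i] = rotl16(v[i], c);
}
```

With c equal to 8 this is the bswap16 pattern: rotl16(0x1234, 8) yields 0x3412.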

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         * match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
         left shifts by a constant when the result is truncated, and the
         shift constant is well-defined for the narrower mode.
         * tree-vect-stmts.cc (vectorizable_shift): Rotations by
         constants are vectorizable, if the backend supports logical
         shifts and IOR logical operations in the required vector mode.

gcc/testsuite/ChangeLog
         * gcc.dg/fold-convlshift-4.c: New test case.
         * gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
         * gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
         * gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
         * gcc.dg/vect/vect-over-widen-1.c: Likewise.
         * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
         * gcc.dg/vect/vect-over-widen-3.c: Likewise.
         * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
         * gcc.dg/vect/vect-over-widen-4.c: Likewise.

So the worry here would be stuff like narrowing the source operand leading to partial stalls.  But as you indicated, if the target really wants to do the shift in a wider mode, it can.  Furthermore, the place to make that decision is at the gimple->rtl border, IMHO.

OK.

jeff

ps.  There may still be another old BZ for the lack of narrowing inhibiting vectorization IIRC.  I don't recall the specifics enough to hazard a guess if this patch will help or not.
