On Wed, Sep 16, 2015 at 10:28 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> A recent patch proposal from Alan Hayward
> (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00690.html) uncovered
> that the PowerPC back end doesn't have expansions for
> reduc_{smax,smin,umax,umin}_<mode> and
> reduc_{smax,smin,umax,umin}_scal_<mode> for the integer modes. This
> prevents vectorization of reductions involving comparisons that can be
> transformed into REDUC_{MAX,MIN}_EXPR expressions. This patch adds
> these expansions.
>
> PowerPC does not have hardware reduction instructions for maximum and
> minimum. However, we can emulate this with varying degrees of
> efficiency for different modes. The size of the expansion is
> logarithmic in the number of vector elements E. The expansions for
> reduc_{smax,smin,umax,umin}_<mode> consist of log E stages, each
> comprising a rotate operation and a maximum or minimum operation. After
> stage N, the maximum value in the vector will appear in at least 2^N
> consecutive positions in the intermediate result.
>
> The ...scal_<mode> expansions just invoke the related non-scalar
> expansions, and then extract an arbitrary element from the result
> vector.
>
> The expansions for V16QI, V8HI, and V4SI require TARGET_ALTIVEC. The
> expansions for V2DI make use of vector instructions added for ISA 2.07,
> so they require TARGET_P8_VECTOR.
>
> I was able to use iterators for the sub-doubleword ...scal_<mode>
> expansions, but that's all. I experimented with trying to use
> code_iterators to generate the {smax,smin,umax,umin} expansions, but
> couldn't find a way to make that work, as the substitution wasn't being
> done into the UNSPEC constants. If there is a way to do this, please
> let me know and I'll try to reduce the code size.
>
> There are already a number of common reduction execution tests that
> exercise this logic. I've also added PowerPC-specific code generation
> tests to verify the patterns produce what's expected. These are based
> on the existing execution tests.
>
> Some future work will be required:
>
> (1) The vectorization cost model does not currently allow us to
> distinguish between reductions of additions and reductions of max/min.
> On PowerPC, these costs are very different, as the former is supported
> by hardware and the latter is not. After this patch is applied, we will
> possibly vectorize some code when it's not profitable to do so. I think
> it's probably best to go ahead with this patch now, and deal with the
> cost model as a separate issue after Alan's patch is complete and
> upstream.
>
> (2) The use of rs6000_expand_vector_extract to obtain a scalar from a
> vector is not optimal for sub-doubleword modes using the latest
> hardware. Currently this generates a vector store followed by a scalar
> load, which is Very Bad. We should instead use a mfvsrd and sign- or
> zero-extend the rightmost element in the result GPR. To accomplish
> this, we should update rs6000_expand_vector_extract to do the more
> general thing: mfvsrd, shift the selected element into the rightmost
> position, and extend it. At that time we should change the _scal_<mode>
> expansions to select the element number that avoids the shift (that
> number will differ for BE and LE).
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions. Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2015-09-16 Bill Schmidt <wschm...@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
> UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
> UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
> UNSPEC_REDUC_UMIN_SCAL): New enumerated constants.
> (reduc_smax_v2di): New define_expand.
> (reduc_smax_scal_v2di): Likewise.
> (reduc_smin_v2di): Likewise.
> (reduc_smin_scal_v2di): Likewise.
> (reduc_umax_v2di): Likewise.
> (reduc_umax_scal_v2di): Likewise.
> (reduc_umin_v2di): Likewise.
> (reduc_umin_scal_v2di): Likewise.
> (reduc_smax_v4si): Likewise.
> (reduc_smin_v4si): Likewise.
> (reduc_umax_v4si): Likewise.
> (reduc_umin_v4si): Likewise.
> (reduc_smax_v8hi): Likewise.
> (reduc_smin_v8hi): Likewise.
> (reduc_umax_v8hi): Likewise.
> (reduc_umin_v8hi): Likewise.
> (reduc_smax_v16qi): Likewise.
> (reduc_smin_v16qi): Likewise.
> (reduc_umax_v16qi): Likewise.
> (reduc_umin_v16qi): Likewise.
> (reduc_smax_scal_<mode>): Likewise.
> (reduc_smin_scal_<mode>): Likewise.
> (reduc_umax_scal_<mode>): Likewise.
> (reduc_umin_scal_<mode>): Likewise.
>
> [gcc/testsuite]
>
> 2015-09-16 Bill Schmidt <wschm...@linux.vnet.ibm.com>
>
> * gcc.target/powerpc/vect-reduc-minmax-char.c: New.
> * gcc.target/powerpc/vect-reduc-minmax-short.c: New.
> * gcc.target/powerpc/vect-reduc-minmax-int.c: New.
> * gcc.target/powerpc/vect-reduc-minmax-long.c: New.

This is okay. I don't think that I have seen iterators for UNSPECs, but
maybe someone else is aware of the right idiom.

Thanks, David
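
For reference, the rotate-and-max scheme described in the quoted patch
summary can be modeled in scalar C. This is only a sketch under stated
assumptions, not the code the patch generates: plain int arrays stand in
for vector registers, the function names are hypothetical, and the
rotation distances (1, 2, 4, ...) are one plausible choice that satisfies
the "2^N consecutive positions after stage N" property. The actual
expansions may rotate by different amounts.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ELEMS 16  /* V16QI has the most elements of the modes mentioned */

/* One stage (hypothetical helper): elementwise signed maximum of the
   vector with a copy of itself rotated by `shift` positions, i.e.
   v[i] = max(v[i], v[(i + shift) % n]).  */
void rotate_max_stage(int *v, size_t n, size_t shift)
{
    int rotated[MAX_ELEMS];

    /* Build the rotated copy first so every lane sees pre-stage values. */
    for (size_t i = 0; i < n; i++)
        rotated[i] = v[(i + shift) % n];

    for (size_t i = 0; i < n; i++)
        if (rotated[i] > v[i])
            v[i] = rotated[i];
}

/* Scalar model of a signed-max reduction over n elements, where n is a
   power of two.  Runs log2(n) stages; the rotation distance doubles each
   stage (an assumption of this sketch), so after stage N each lane holds
   the maximum of 2^N neighbouring input lanes.  */
void reduc_smax_model(int *v, size_t n)
{
    assert(n > 0 && n <= MAX_ELEMS && (n & (n - 1)) == 0);

    for (size_t shift = 1; shift < n; shift *= 2)
        rotate_max_stage(v, n, shift);
}
```

Calling reduc_smax_model on the eight elements { 3, -7, 12, 5, 0, 9, -1, 4 }
runs three stages and leaves 12 in every lane, which is why the
..._scal_<mode> variants are free to extract an arbitrary element.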