https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #10 from Andrew Stubbs <ams at gcc dot gnu.org> ---
On 26/06/2024 12:05, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #7)
>> I will have a look (and for run validation try to reproduce with gfx1036).
> 
> OK, so with gfx1036 we end up using 16 byte vectors and the testcase
> passes.  The difference with gfx908 is
> 
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   ==> examining statement: _14 = aa[_13];
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   vect_model_load_cost: aligned.
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   vect_model_load_cost: inside_cost = 2, prologue_cost = 0 .
> 
> vs.
> 
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   ==> examining statement: _14 = aa[_13];
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> missed:   unsupported vect permute { 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 }
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> missed:   unsupported load permutation
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:19:72:
> missed:   not vectorized: relevant stmt not supported: _14 = aa[_13];
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   removing SLP instance operations starting from: REALPART_EXPR <(*hadcur_24(D))[_2]> = _86;
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> missed:  unsupported SLP instances
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:  re-trying with SLP disabled
> 
> so gfx1036 cannot do such permutes but gfx908 can?

GFX10 has more limited permutation capabilities than GFX9 because it 
only has 32-lane vectors natively, even though we're using the 64-lane 
"compatibility" mode.

However, in theory, the permutation capabilities on V32 and below should 
be the same, and some permutations on V64 are allowed, so I don't know 
why the vectorizer doesn't use them here. It's possible I broke the 
logic in gcn_vectorize_vec_perm_const:

   /* RDNA devices can only do permutations within each group of 32-lanes.
      Reject permutations that cross the boundary.  */
   if (TARGET_RDNA2_PLUS)
     for (unsigned int i = 0; i < nelt; i++)
       if (i < 31 ? perm[i] > 31 : perm[i] < 32)
         return false;

It looks right to me though?
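
One way to check the condition is to model it outside the compiler and feed it the permutation from the log. The following is a minimal sketch in C, not the GCC code: perm_accepted and fill_duplicating_perm are hypothetical names, the boundary is made a parameter so that the quoted "i < 31" can be compared with "i < 32", and it assumes the logged permutation reaches the hook as a 32-element, single-operand selector.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RDNA check quoted above (not GCC code).
   BOUNDARY is the first lane index that is treated as belonging to the
   upper group of 32 lanes; the quoted code corresponds to BOUNDARY == 31.  */
static bool
perm_accepted (const unsigned char *perm, unsigned int nelt,
               unsigned int boundary)
{
  for (unsigned int i = 0; i < nelt; i++)
    if (i < boundary ? perm[i] > 31 : perm[i] < 32)
      return false;   /* Treated as crossing the 32-lane boundary.  */
  return true;
}

/* Build the selector from the log: { 0 0 1 1 2 2 ... }.  */
static void
fill_duplicating_perm (unsigned char *perm, unsigned int nelt)
{
  for (unsigned int i = 0; i < nelt; i++)
    perm[i] = i / 2;
}
```

Under these assumptions, filling a 32-element array with fill_duplicating_perm and calling perm_accepted (perm, 32, 31) returns false, because lane 31 takes the else branch and perm[31] is 15. Calling perm_accepted (perm, 32, 32) returns true. That would be consistent with the "unsupported vect permute" message in the log, although the sketch says nothing about how two-operand permutations are handled.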

The vec_extract patterns that also use permutations are likewise 
supposedly still enabled for V32 and below.

Andrew
