https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117556

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-11-13
             Status|UNCONFIRMED                 |ASSIGNED
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for aarch64 we use vector([2,2]) long int; with a group size of four this
means the division is pos = [0, 4] / nunits = [2, 2].  This is similar to
PR117558, where this kind of division is too naive for VLA.

The desired result is of course vec_entry == 0, vec_index == 0 in this case
(slp_index is zero, num_scalar == group_size == 4).  The VF is [1, 1] in
case this helps.

      int num_scalar = SLP_TREE_LANES (slp_node);
      int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
      poly_uint64 pos = (num_vec * nunits) - num_scalar + slp_index;

and num_vec is VF * SLP_TREE_LANES / nunits, so

  pos = VF * num_scalar - num_scalar + slp_index;

where we know slp_index < num_scalar.  For VF [1, 1] pos simplifies to
slp_index?
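
To make the arithmetic above concrete, here is a rough standalone sketch.
It assumes the usual reading of [a, b] as a + b*x for a runtime x >= 0.
Poly, compute_pos and vec_entry_at are hypothetical names for illustration
only; this is not GCC's poly_uint64 nor the vectorizer code.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a two-coefficient poly value: value(x) = c0 + c1 * x,
// where x >= 0 is only known at run time.  Hypothetical type, not GCC's.
struct Poly {
  int64_t c0;
  int64_t c1;
  int64_t at(int64_t x) const { return c0 + c1 * x; }
};

// pos = VF * num_scalar - num_scalar + slp_index, the simplified form
// obtained after substituting num_vec = VF * num_scalar / nunits.
Poly compute_pos(Poly vf, int64_t num_scalar, int64_t slp_index) {
  assert(slp_index >= 0 && slp_index < num_scalar);
  return Poly{vf.c0 * num_scalar - num_scalar + slp_index,
              vf.c1 * num_scalar};
}

// Truncating quotient pos / nunits for one concrete runtime x.
int64_t vec_entry_at(Poly pos, Poly nunits, int64_t x) {
  assert(nunits.at(x) > 0);
  return pos.at(x) / nunits.at(x);
}
```

With VF = [1, 1], num_scalar = 4 and slp_index = 0 this gives pos = [0, 4],
matching the numbers above.  Against nunits = [2, 2] the per-x quotient is
0 at x = 0 but 1 at x = 1, so in this toy model it is not a single
compile-time constant.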

With single-lane SLP we end up doing load-lanes where I think the code
will be confused anyway - we have to look at the permute node.  Fixing
that vectorizes the testcase as expected.

So mine.
