https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117556
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2024-11-13
             Status|UNCONFIRMED                   |ASSIGNED
                 CC|                              |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for aarch64 we use vector([2,2]) long int; with a group size of four this
means the division is pos = [0, 4] / nunits = [2, 2].  This is similar to
PR117558, where this kind of division is too naive for VLA.  The desired result
is of course vec_entry == 0, vec_index == 0 in this case (slp_index is zero,
num_scalar == group_size == 4).  The VF is [1, 1] in case this helps.

      int num_scalar = SLP_TREE_LANES (slp_node);
      int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
      poly_uint64 pos = (num_vec * nunits) - num_scalar + slp_index;

and num_vec is VF * SLP_TREE_LANES / nunits, so

      pos = VF * num_scalar - num_scalar + slp_index;

where we know slp_index < num_scalar.  For VF [1, 1] pos simplifies to
slp_index?

With single-lane SLP we end up doing load-lanes, where I think the code will
be confused anyway - we have to look at the permute node.  Fixing that
vectorizes the testcase as expected.

So mine.
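
For readers who want to check the arithmetic above, here is a minimal standalone sketch. It does not use GCC's poly_int API: `Poly`, `eval` and `live_lane_pos` are hypothetical names invented for illustration, and the sketch assumes that [a, b] denotes a + b*x with x the runtime vector-length multiplier. It only evaluates the formula pos = VF * num_scalar - num_scalar + slp_index for the values quoted in the comment (VF = [1, 1], num_scalar = 4, slp_index = 0).

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a two-coefficient poly value: c0 + c1 * x, where x >= 0
// is the runtime vector-length multiplier.  Hypothetical type, not GCC's
// poly_uint64.
struct Poly {
    std::int64_t c0;
    std::int64_t c1;
};

constexpr Poly operator*(std::int64_t k, Poly p) { return {k * p.c0, k * p.c1}; }
constexpr Poly operator+(Poly p, std::int64_t k) { return {p.c0 + k, p.c1}; }
constexpr Poly operator-(Poly p, std::int64_t k) { return {p.c0 - k, p.c1}; }
constexpr bool operator==(Poly a, Poly b) { return a.c0 == b.c0 && a.c1 == b.c1; }

// Value of the poly for one concrete vector-length multiplier x.
constexpr std::int64_t eval(Poly p, std::int64_t x) { return p.c0 + p.c1 * x; }

// pos = VF * num_scalar - num_scalar + slp_index, as written in the comment.
constexpr Poly live_lane_pos(Poly vf, std::int64_t num_scalar,
                             std::int64_t slp_index) {
    return num_scalar * vf - num_scalar + slp_index;
}

// Values quoted in the comment: VF = [1, 1], num_scalar = 4, slp_index = 0.
constexpr Poly vf{1, 1};
constexpr Poly pos = live_lane_pos(vf, 4, 0);

// The symbolic result is [0, 4], matching the "pos = [0, 4]" in the comment.
static_assert(pos == Poly{0, 4}, "pos should be [0, 4]");
// At the minimum vector length (x == 0) pos equals slp_index.
static_assert(eval(pos, 0) == 0, "pos at x == 0 should equal slp_index");
// For a longer vector (x == 1) the same poly evaluates to 4.
static_assert(eval(pos, 1) == 4, "pos at x == 1 should be 4");
```

Under these assumptions pos equals slp_index only at x == 0; symbolically it stays [0, 4], which is the numerator of the [0, 4] / [2, 2] division mentioned above.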