https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop in comment#1 isn't vectorized because we do not have interleaving
support for a group size of 5:

t.f:18:17: missed: the size of the group of accesses is not a power of 2 or not equal to 3
t.f:18:17: missed: not falling back to elementwise accesses
t.f:19:72: missed: not vectorized: relevant stmt not supported: t1_83 = (*q_82(D))[_21];
t.f:18:17: missed: bad operation or unsupported loop bound.

we don't try to SLP this because there's just a single lane reduction.
There's not really a loop vectorization opportunity and as comment#3 says
there's at most a BB reduction opportunity. We try to analyze that now:

  _58 = powmult_9 + powmult_107;
  t7_108 = _58 + powmult_88;
  t7_109 = __builtin_sqrt (t7_108);
  M.7_110 = MAX_EXPR <t7_109, t8_126>;

and

t.f:28:72: note: Starting SLP discovery for
t.f:28:72: note:   powmult_88 = _106 * _106;
t.f:28:72: note:   powmult_9 = _101 * _101;
t.f:28:72: note:   powmult_107 = _96 * _96;
t.f:28:72: note: starting SLP discovery for node 0x50ef8a0
t.f:28:72: note: Build SLP for powmult_88 = _106 * _106;
t.f:28:72: note: get vectype for scalar type (group size 3): real(kind=8)
t.f:28:72: note: vectype: vector(2) real(kind=8)
t.f:28:72: note: nunits = 2
t.f:28:72: missed: Build SLP failed: unrolling required in basic block SLP

we do not yet have code to limit a BB reduction vectorization to a subset
of lanes (in this case it's uniform so choosing any power-of-two elements
would work but ideally we'd let SLP discovery figure out the "best" lane
combination to vectorize - there's more missing support for BB reduction
vectorization).
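
To make the lane-subset idea concrete, here is a rough C sketch of the three-lane sum of squares from the dump above. It is an illustration only: the original testcase is Fortran and is not reproduced here, the function names `sum_sq3_scalar` and `sum_sq3_split` are made up, and the `__builtin_sqrt` and `MAX_EXPR` consumers are left out. The second function shows by hand what limiting the BB reduction to two of the three uniform lanes could look like, with the third lane staying scalar. It does not claim that GCC produces this form.

```c
/* Hypothetical C analogue of the reduction in the dump:
     _58    = powmult_9 + powmult_107;
     t7_108 = _58 + powmult_88;
   Three uniform lanes (x * x) feed one sum, but vector(2) double
   only holds two of them.  */
double sum_sq3_scalar(double a, double b, double c)
{
    double powmult_9   = a * a;
    double powmult_107 = b * b;
    double powmult_88  = c * c;
    return (powmult_9 + powmult_107) + powmult_88;
}

/* Hand-written "subset of lanes" form (an assumption about what such
   a transform might produce): two lanes are treated as one 2-wide
   multiply followed by a horizontal add, the third lane stays scalar.
   The association order matches the scalar version.  */
double sum_sq3_split(double a, double b, double c)
{
    double v[2] = { a, b };

    /* Stands in for a single vector(2) multiply.  */
    for (int lane = 0; lane < 2; ++lane)
        v[lane] = v[lane] * v[lane];

    double partial = v[0] + v[1];   /* horizontal reduction of the vector */
    double tail    = c * c;         /* leftover scalar lane */
    return partial + tail;
}
```

Since all three lanes perform the same operation, any two of them could form the vector part; the pair chosen here simply follows the order of the first addition in the dump.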