[Bug tree-optimization/49955] Fails to do partial basic-block SLP

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 07 Aug 2023 02:11:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop in comment #1 isn't vectorized because we do not have interleaving
support for a group size of 5:

t.f:18:17: missed:   the size of the group of accesses is not a power of 2 or not equal to 3
t.f:18:17: missed:   not falling back to elementwise accesses
t.f:19:72: missed:   not vectorized: relevant stmt not supported: t1_83 = (*q_82(D))[_21];
t.f:18:17: missed:  bad operation or unsupported loop bound.
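
Read literally, the first diagnostic says interleaved accesses are handled here only when the group size is a power of two or exactly 3, which is why a group of 5 is rejected. The following is a minimal C model of just that worded condition. The function name is hypothetical, and this is not GCC's implementation: the real support check also depends on what permutations the target can do.

```c
#include <stdbool.h>

/* Hypothetical model of the condition named in the diagnostic
   "the size of the group of accesses is not a power of 2 or not equal to 3".
   Not GCC's code: the real check also consults target permute support. */
static bool group_size_supported(unsigned group_size)
{
    /* A nonzero value is a power of two iff it has exactly one bit set. */
    bool power_of_two = group_size != 0
                        && (group_size & (group_size - 1)) == 0;
    return power_of_two || group_size == 3;
}
```

Under this model `group_size_supported(5)` is false, matching the group of 5 accesses from comment #1, while groups of 2, 3 and 4 would pass.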

We don't try to SLP this because there's just a single-lane reduction. There's
not really a loop vectorization opportunity and, as comment #3 says, there's at
most a BB reduction opportunity. We try to analyze that now:

  _58 = powmult_9 + powmult_107;
  t7_108 = _58 + powmult_88;
  t7_109 = __builtin_sqrt (t7_108);
  M.7_110 = MAX_EXPR <t7_109, t8_126>;

and

t.f:28:72: note:   Starting SLP discovery for
t.f:28:72: note:     powmult_88 = _106 * _106;
t.f:28:72: note:     powmult_9 = _101 * _101;
t.f:28:72: note:     powmult_107 = _96 * _96;
t.f:28:72: note:   starting SLP discovery for node 0x50ef8a0
t.f:28:72: note:   Build SLP for powmult_88 = _106 * _106;
t.f:28:72: note:   get vectype for scalar type (group size 3): real(kind=8)
t.f:28:72: note:   vectype: vector(2) real(kind=8)
t.f:28:72: note:   nunits = 2
t.f:28:72: missed:   Build SLP failed: unrolling required in basic block SLP
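
As we read the dump, the failure comes down to divisibility. The SLP group has 3 lanes (the three `powmult` statements), but `vector(2) real(kind=8)` holds nunits = 2. Covering 3 lanes with 2-lane vectors would need unrolling, and in a basic block there is no loop to unroll. The sketch below models that condition, plus the number of lanes a partial vectorization could cover. It is a sketch under that assumption: both helper names are hypothetical, and GCC's actual check works on its own polynomial lane counts rather than plain integers.

```c
#include <stdbool.h>

/* Hypothetical model: basic-block SLP can cover a group exactly only
   when the group size is a multiple of the vector lane count (nunits);
   otherwise filling the vectors would require unrolling. */
static bool bb_slp_fits(unsigned group_size, unsigned nunits)
{
    return nunits != 0 && group_size % nunits == 0;
}

/* Hypothetical helper: how many lanes a partial BB vectorization could
   cover, i.e. the largest multiple of nunits not exceeding group_size.
   Any remaining lanes would have to stay scalar. */
static unsigned vectorizable_lanes(unsigned group_size, unsigned nunits)
{
    return nunits == 0 ? 0 : group_size - group_size % nunits;
}
```

For the dump above, `bb_slp_fits(3, 2)` is false, while `vectorizable_lanes(3, 2)` is 2, leaving one lane scalar.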

We do not yet have code to limit a BB reduction vectorization to a subset
of its lanes. In this case the lanes are uniform, so choosing any power-of-two
number of lanes would work, but ideally we'd let SLP discovery figure out the
"best" lane combination to vectorize. There's more support missing for BB
reduction vectorization beyond this.
