https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are not vectorizing this optimally yet.  We use SLP to cover out[0],
out[1] and out[2], and single-element interleaving for out[3].  The
stores end up strided (aka scalar), which is not what the reporter
intended.  We also unroll the loop four times.
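
For illustration only, a loop of roughly the following shape has the
structure described above: three output lanes computed by the same
operation and a fourth computed differently.  This is a hypothetical
sketch, not the reporter's testcase; the function name convert and the
arithmetic are assumptions, and whether GCC splits the store group on
this exact code is not verified here.

```c
#include <stddef.h>

/* Hypothetical kernel (not the testcase from this PR).
   Lanes 0..2 of each 4-element output group are isomorphic
   (same subtraction), lane 3 stores a different value.  */
void convert(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char r = in[3 * i + 0];
        unsigned char g = in[3 * i + 1];
        unsigned char b = in[3 * i + 2];
        /* Minimum of the three inputs.  */
        unsigned char m = r < g ? (r < b ? r : b) : (g < b ? g : b);

        out[4 * i + 0] = r - m;  /* candidate SLP lane */
        out[4 * i + 1] = g - m;  /* candidate SLP lane */
        out[4 * i + 2] = b - m;  /* candidate SLP lane */
        out[4 * i + 3] = m;      /* the odd lane out */
    }
}
```

The four stores together write 4 contiguous elements per iteration, so
ideally they would become one full-width vector store.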

The SLP discovery code splits the store group (in the end we should avoid
throwing away such information).  The split leaves the group with a gap,
and stores with a gap are only supported as "strided" (we could at least
store two and one element, but ...).  We don't support "merging" the
group back from its SLP and non-SLP parts.  With SLP only we might
recover here.  Possibly we shouldn't allow half SLP / half non-SLP for a
store group, but SLP might still fail after discovery, so this might be
difficult to enforce.  Maybe a good case to "prime" single-lane SLP.
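
As a source-level picture of what "a gap" means here, the split can be
thought of as turning one 4-element store group into two partial groups
that each skip elements of the same stride-4 output.  The sketch below
is only a conceptual model with hypothetical names (store_lanes_0_to_2,
store_lane_3); it is not what the vectorizer emits or how it represents
groups internally.

```c
#include <stddef.h>

/* Conceptual model only: lanes 0..2 of each 4-element output group.
   Three elements are written, then one is skipped (gap of 1).  */
void store_lanes_0_to_2(const unsigned char *v, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[4 * i + 0] = v[3 * i + 0];
        out[4 * i + 1] = v[3 * i + 1];
        out[4 * i + 2] = v[3 * i + 2];
        /* out[4 * i + 3] is left untouched here: this is the gap.  */
    }
}

/* Conceptual model only: lane 3 of each group.
   One element is written, then three are skipped (gap of 3).  */
void store_lane_3(const unsigned char *m, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[4 * i + 3] = m[i];
}
```

Neither partial group covers a full contiguous vector on its own, which
is why a plain vector store no longer applies once the group is split.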
