https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
IMO this should be rtl-optimization or middle-end. The .optimized dump looks
fine to me. Expansion pulls many of the vec_duplicate operations to the
beginning of the loop (they were interleaved with their uses before), which
lengthens their live ranges considerably, and nothing moves them back closer
to their uses. I don't know whether doing the reads early, as GCC chooses to
do, can ever compensate for having to spill on this testcase, since the memory
access pattern seems quite cache-friendly.
