https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122152

--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
The vcompress code looks like a "costing" issue.  The loop is now costed cheaper
than it was in GCC 15, so we choose it in GCC 16 whereas we rejected it before.

I'll see if we can do something in the target here as a band-aid, like making
the permute more expensive if we know it takes two or more instructions.

Ah, of course, another improvement for (3) would be to let the vectorizer use
strided loads even for appropriate contiguous access patterns.
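
To illustrate the idea only (this is not the test case from this PR, and the
function names and the fixed VL of 4 are made up for the example): a loop that
reads in[2*i] and in[2*i+1] touches memory contiguously overall, yet each
operand can be fetched either by a contiguous load followed by a permute, or
directly by a load with a stride of two elements.  A rough scalar model of both
strategies, assuming n is a multiple of VL:

```c
#include <stddef.h>
#include <stdint.h>

#define VL 4 /* hypothetical number of elements per vector */

/* Scalar reference: the loop as a whole reads in[0 .. 2*n-1] contiguously. */
void sum_pairs(const int32_t *in, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[2 * i] + in[2 * i + 1];
}

/* Strategy A: contiguous load of 2*VL elements, then a permute that
   separates even and odd elements (the step that may cost 2+ instructions). */
void sum_pairs_permute(const int32_t *in, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += VL) {
        int32_t buf[2 * VL], even[VL], odd[VL];
        for (size_t j = 0; j < 2 * VL; j++)   /* contiguous load */
            buf[j] = in[2 * i + j];
        for (size_t j = 0; j < VL; j++) {     /* even/odd extraction */
            even[j] = buf[2 * j];
            odd[j] = buf[2 * j + 1];
        }
        for (size_t j = 0; j < VL; j++)
            out[i + j] = even[j] + odd[j];
    }
}

/* Models a strided vector load: VL elements, `stride` elements apart. */
static void strided_load(const int32_t *base, size_t stride, int32_t *dst)
{
    for (size_t j = 0; j < VL; j++)
        dst[j] = base[j * stride];
}

/* Strategy B: two strided loads with stride 2, no permute needed. */
void sum_pairs_strided(const int32_t *in, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += VL) {
        int32_t even[VL], odd[VL];
        strided_load(in + 2 * i, 2, even);
        strided_load(in + 2 * i + 1, 2, odd);
        for (size_t j = 0; j < VL; j++)
            out[i + j] = even[j] + odd[j];
    }
}
```

Both variants compute the same result as the scalar loop; which one is cheaper
depends on how the target costs the permute versus the strided loads.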
