https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019

--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> The problem is GCC-15 has performance regression compare to GCC-14 on both
> strict align and we should fix it, we can't specify use no strict align in
> GCC-15 to pretend that we don't have such performance regression.

The problem is that in GCC 15 we're now vectorizing the first loop and don't
cost it properly; most likely vec_init/vec_construct is costed too cheaply.
A solution could be two-fold:
 - Increase those costs (but we can never get them exactly right, since a
vec_init may be just a broadcast or similar)
 - Enhance our vec_init expander to also allow construction from subvectors so
the initialization becomes cheaper and we don't need to clumsily load every
uint8 element separately.
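
To illustrate the second point, here is a rough scalar sketch in plain C.  It
is not the test case from this PR and not what the expander emits; the function
names, the 4x4 shape and the stride are made up for illustration.  It only
shows the difference between filling a 16-byte vector one uint8 at a time and
filling it from four 4-byte subvectors:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: gather four rows of four uint8 values, with rows
   `stride` bytes apart, into one contiguous 16-byte buffer.  */

/* Element-wise construction: 16 separate single-byte loads.  */
static void build_by_element(uint8_t out[16], const uint8_t *src, size_t stride)
{
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            out[row * 4 + col] = src[row * stride + col];
}

/* Subvector construction: 4 copies of 4 contiguous bytes each.  */
static void build_by_subvector(uint8_t out[16], const uint8_t *src, size_t stride)
{
    for (int row = 0; row < 4; row++)
        memcpy(out + row * 4, src + row * stride, 4);
}

/* Small self-check: both variants must produce the same 16 bytes.  */
static void check_builds_agree(void)
{
    uint8_t src[32], a[16], b[16];
    for (int i = 0; i < 32; i++)
        src[i] = (uint8_t)i;

    build_by_element(a, src, 8);
    build_by_subvector(b, src, 8);

    assert(memcmp(a, b, sizeof a) == 0);
    assert(a[4] == 8);  /* row 1, column 0 comes from src[1 * 8 + 0] */
}
```

Both variants produce the same bytes; the second one just needs four wider
accesses instead of sixteen narrow ones, which is roughly the saving a
subvector-aware vec_init could offer.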

I can start with the second one which should also help other workloads.  There
are several follow-up items, though, including proper subreg handling for those
cases.
