https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121766
--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---

The original change happened because, with the cost model disabled, we started costing inductions again and stopped costing truncations. Not costing the truncations is just a missing feature, but I think the testcase has been reduced too far.

With -msve-vector-bits=128, the Adv. SIMD code in the example handles 16 bytes per iteration and uses one fewer store slot, while the SVE code handles only 8. Benchmarking that loop confirms it: the Adv. SIMD loop is much faster than the SVE loop on every SVE core. So while there is a costing gap with respect to the truncating stores, the codegen is correct for the given example loop. Is it perhaps reduced too much?
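As a hypothetical illustration of the kind of loop involved, a truncating store narrows each wider element before writing it out. The sketch below assumes 16-bit inputs narrowed to 8-bit outputs; the function name narrow_store is made up, and this is not the reduced testcase from this bug. The comments describe what one would typically expect from each ISA at a 128-bit vector length, not GCC's actual output for this PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical narrowing loop: each 16-bit value is incremented and then
   truncated to 8 bits before being stored.

   With 128-bit vectors one would typically expect:
   - Adv. SIMD: narrow two 8 x 16-bit vectors into one 16 x 8-bit vector
     (e.g. XTN/XTN2 or UZP1) and write 16 bytes with a single store.
   - SVE: a truncating ST1B from .h lanes writes only the low byte of each
     of the 8 halfword lanes, i.e. 8 bytes per store.  */
void narrow_store(uint8_t *restrict dst, const uint16_t *restrict src,
                  size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] + 1); /* keep only the low 8 bits */
}
```

Calling narrow_store on {0x00FF, 0x0100} yields {0x00, 0x01}, because only the low byte of each incremented value survives the truncation.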