https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so we fail single-lane SLP discovery where we succeeded with multi-lane SLP. This is because the loads appear permuted during discovery and we have a masked load feeding a masked store. We do not handle permuting masked operations, and with single-lane discovery the masked loads appear permuted, so discovery fails. For now I'll avoid the situation and leave the actual fix for later.
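For readers unfamiliar with the pattern, here is a hypothetical loop of the general shape being described. It is not the testcase attached to this PR, and whether this exact loop triggers the failure is not established here. Each load and store is guarded by a per-element condition. After if-conversion on a target with masked memory operations (for example AVX-512 or SVE), such guarded accesses can become a masked load feeding a masked store. The accesses form interleaved groups of two elements. With multi-lane SLP, both lanes of each group are covered in order. With single-lane SLP, each instance uses only one element of the group, and that is presumably where a load permutation appears. The function and parameter names are made up for illustration.

```c
/* Hypothetical illustration, not the PR's testcase.  Under if-conversion
   on targets with masked loads/stores, the guarded src[] loads and dst[]
   stores can become masked operations.  The two statements access an
   interleaved group of size 2: multi-lane SLP covers both lanes in order,
   while single-lane SLP uses one element of each group per instance. */
void copy_pairs_masked(int *restrict dst, const int *restrict src,
                       const int *restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* Even lane: load and store happen only when cond[2*i] is set. */
      if (cond[2 * i])
        dst[2 * i] = src[2 * i];
      /* Odd lane: guarded independently by cond[2*i + 1]. */
      if (cond[2 * i + 1])
        dst[2 * i + 1] = src[2 * i + 1];
    }
}
```

Semantically, the loop copies only the elements whose condition is set. For example, with `src = {1, 2, 3, 4}`, `cond = {1, 0, 0, 1}` and `n = 2`, a zero-initialized `dst` ends up as `{1, 0, 0, 4}`.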