https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707
--- Comment #13 from alalaw01 at gcc dot gnu.org ---
Hmmm, I realize a "definite" codegen improvement was maybe a bad choice of wording. A "substantial" (albeit uncertain!) improvement may have been more accurate...

However, yes, it looks like we want that patch (indeed, it still helps even when we up the cost of permute operations and drop the -fno-vect-cost-model), so thanks, Richard. We'll clean up the testisms in due course.

In the longer term, the issue here is that we aren't comparing the costs of SLP vs load-lanes, right? We merely compare the cost of whichever of those vectorization strategies we favour (permutes et al.) against leaving it in scalar code?