https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #20 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
We're currently working on it. The improvements come from architectures where the code was vectorized. The performance losses come from those where it wasn't vectorized, or where the vectorizer generated inefficient code. The GCC RTL unroller isn't very efficient, and that's where the LLVM gain comes from.
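
For illustration only, here is a minimal sketch of the kind of loop where the difference between "vectorized" and "left to the unroller" can matter. It is not the code from this report: the function name `find_first` and the unroll factor of 4 are assumptions chosen for the example. Whether GCC vectorizes an early-exit loop like this depends on the target and the compiler version. If it is not vectorized, any unrolling requested by the pragma is left to the unroller.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the bug report.
 * Linear search with an early exit. Returns the index of the first
 * element equal to `needle`, or `n` if there is no such element. */
size_t find_first(const int *data, size_t n, int needle)
{
    /* Assumed unroll factor; the pragma applies to the loop that follows. */
#pragma GCC unroll 4
    for (size_t i = 0; i < n; ++i) {
        if (data[i] == needle)
            return i; /* early exit on the first match */
    }
    return n; /* not found */
}
```

One way to see which case applies on a given target is to compile with `-O3 -fopt-info-vec-optimized -fopt-info-vec-missed`, which asks GCC to report the loops it vectorized and the ones it did not.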