https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|1                           |0
             Blocks|                            |53947
             Status|NEW                         |UNCONFIRMED

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note your `total+=values[index];` loop could be reduced down to just
`total += values.count();` and that will be over 10x faster.
I am not sure if this is a useful benchmark either, because count uses
popcount directly.

Maybe GCC could detect the popcount here, but I am not sure. LLVM does a
slightly better job at vectorizing the loop but still messes it up. Plus, once
you add other code around values[index], the vectorizer will no longer kick in,
so the slowdown is only for this bad micro-benchmark.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
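
For reference, the reduction suggested in comment #3 can be sketched as follows. This is only an illustration, not the reporter's test case: it assumes `values` is a `std::bitset`, and the size `kBits` and the function names `sum_by_index` and `sum_by_count` are made up for the example.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical size; the bitset size used in the original test case is not
// shown in this comment.
constexpr std::size_t kBits = 1024;

// Bit-by-bit summation, the shape of the `total+=values[index];` loop.
std::size_t sum_by_index(const std::bitset<kBits>& values) {
    std::size_t total = 0;
    for (std::size_t index = 0; index < values.size(); ++index)
        total += values[index];  // each bit is read as a bool and added as 0 or 1
    return total;
}

// Same result through the library's population count.
std::size_t sum_by_count(const std::bitset<kBits>& values) {
    return values.count();
}
```

Both functions return the number of set bits; the second leaves the counting to `std::bitset::count()` instead of relying on the vectorizer to recognize the loop.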