https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008
--- Comment #4 from Matt Bentley <mattreecebentley at gmail dot com> ---
Yeah, I know, I mentioned that in the report. It's not a bad benchmark: it's benchmarking access to individual consecutive bits, not summing. The counting is merely there to prevent the compiler from optimizing out the loop. I could equally make it benchmark random indices, and I imagine the problem would remain, though I haven't checked.

Still, your point is valid in that most non-benchmark code would likely have more code around the access. It could potentially lead to misleading benchmark results in other scenarios, though. I haven't tested whether vector/array indexing triggers the same bad vectorisation.
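
For readers without the report to hand, here is a rough sketch of the kind of loop being described. It is not the code from the report. It assumes `std::bitset` as the container and a size of 1024 bits, neither of which is stated in this comment, and the function names are hypothetical. The second function shows one way the untested "random indices" variant could be written, using a scrambled but repeatable index order.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical size; the container and size used in the report are assumptions here.
constexpr std::size_t kBits = 1024;

// Reads every bit in index order. The running total exists only so the
// compiler cannot discard the reads as dead code; the sum itself is not
// what is being measured.
std::size_t touch_consecutive(const std::bitset<kBits>& bits) {
    std::size_t total = 0;
    for (std::size_t i = 0; i != kBits; ++i) {
        total += bits[i];
    }
    return total;
}

// Reads every bit exactly once, but in a scrambled order. The index follows
// index = (5 * index + 1) mod kBits, which has full period when kBits is a
// power of two, so each position is visited once per kBits iterations.
std::size_t touch_scrambled(const std::bitset<kBits>& bits) {
    std::size_t total = 0;
    std::size_t index = 0;
    for (std::size_t i = 0; i != kBits; ++i) {
        total += bits[index];
        index = (index * 5 + 1) % kBits;
    }
    return total;
}
```

Both functions return the number of set bits, so their results can be compared against `bits.count()` as a sanity check before timing anything. Whether the scrambled variant is vectorised the same way is exactly the open question above; this sketch makes no claim about it.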