https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008
--- Comment #9 from Matt Bentley <mattreecebentley at gmail dot com> ---
(In reply to Jonathan Wakely from comment #7)
> I'm certain it won't be, because (apart from vector<bool>) they access
> memory locations that are at least one byte, not repeating very similar
> bitwise ops for every bit in a word.

Have tested 4 additional scenarios: random access, sequential iteration with an
if statement, the vector equivalent, and sequential iteration with an if
statement and a more complicated action. Can confirm that the issue only occurs
in the second of those, i.e.:

    {
        std::bitset<12800000> values;

        for (unsigned int counter = 0; counter != 500; ++counter)
        {
            for (unsigned int index = 0; index != 12800000; ++index)
            {
                values.set(index, plf::rand() & 1);
            }

            for (unsigned int index = 0; index != 12800000; ++index)
            {
                if (values[index])
                {
                    total += values[index] * 2;
                }
            }
        }
    }

While this statement is closer to what you might see in real-world code (i.e.
"if 1, do this"), I'm assuming it gets optimized into a conditional move or
something similar, as the same 25% slowdown occurs. Once you add in a more
complicated action, i.e.:

    for (unsigned int index = 0; index != 12800000; ++index)
    {
        if (values[index])
        {
            total += values[index] * 2;
            total ^= values[index];
        }
    }

the problem goes away. So essentially Andrew's correct, unless the user's
chosen action can be trivially optimized.

Will generate the -v output for the second of the scenarios now.
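
For anyone wanting to compare the two loop bodies outside the original benchmark, here is a minimal self-contained sketch. It is not the code that was actually timed. Assumptions: std::mt19937 stands in for plf::rand(), the bitset is a tenth of the size used above, and the names fill_random, sum_simple, sum_complex and seconds_for are hypothetical helpers introduced only for this illustration.

```cpp
#include <bitset>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Assumption: smaller than the 12800000 bits used in the comment, so the
// bitset fits comfortably on a default-sized stack.
constexpr std::size_t bit_count = 1280000;
using bits = std::bitset<bit_count>;

// Fill the bitset with pseudo-random bits.
// Assumption: std::mt19937 replaces plf::rand() from the original test.
void fill_random(bits &values, std::uint32_t seed)
{
    std::mt19937 generator(seed);
    for (std::size_t index = 0; index != bit_count; ++index)
    {
        values.set(index, generator() & 1);
    }
}

// Scenario 2 from the comment: an if statement with a trivial action.
std::uint64_t sum_simple(const bits &values)
{
    std::uint64_t total = 0;
    for (std::size_t index = 0; index != bit_count; ++index)
    {
        if (values[index])
        {
            total += values[index] * 2;
        }
    }
    return total;
}

// The variant with the more complicated action (add, then xor).
std::uint64_t sum_complex(const bits &values)
{
    std::uint64_t total = 0;
    for (std::size_t index = 0; index != bit_count; ++index)
    {
        if (values[index])
        {
            total += values[index] * 2;
            total ^= values[index];
        }
    }
    return total;
}

// Run `function` over the bitset `repeats` times and return elapsed seconds.
// The results are accumulated into `sink` so the loops cannot be discarded.
template <typename Function>
double seconds_for(Function function, const bits &values,
                   unsigned int repeats, std::uint64_t &sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int counter = 0; counter != repeats; ++counter)
    {
        sink += function(values);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

To use it, fill one bitset with fill_random, then call seconds_for once with sum_simple and once with sum_complex, and compare the timings across compiler versions and optimization levels. Note that this sketch refills the bitset only once rather than on every outer iteration, and passes the loop through a function pointer, so the generated code may differ from the original test; treat any numbers as indicative only.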