https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-10-08
             Target|                            |x86_64-*-*

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It looks like the reduction loop gets vectorized, and that is what causes the slowdown.

Semi-reduced testcase (not preprocessed, so it still uses #include):
```
#include <bitset>
void g(std::bitset<12800000> &);
int f()
{
    unsigned int total = 0;
    std::bitset<12800000> values;
    g(values);
    for (unsigned int index = 0; index != 12800000; ++index)
        total += values[index];
    return total ;
}
```

For Linux, you need `-m32 -O2 -mavx2`. The -m32 is needed because std::bitset stores its bits in `unsigned long` words: on mingw `long` is 32 bits, while on x86_64 Linux it is 64 bits, and the 64-bit version of the loop does not get vectorized.
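To time the loop locally, g() needs a definition. The sketch below assumes a hypothetical fill_pattern() in place of the external g() (it sets every 7th bit so the result is predictable) and wraps the reported loop in count_by_loop(). It also computes the same total with std::bitset::count() for comparison; whether count() avoids the slowdown on this target has not been measured here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <memory>

// Same bit count as the testcase.
constexpr std::size_t kBits = 12800000;
using Bits = std::bitset<kBits>;

// Hypothetical stand-in for the undefined g() from the report:
// sets every 7th bit so the expected total is easy to compute.
void fill_pattern(Bits &b) {
    for (std::size_t i = 0; i < kBits; i += 7)
        b.set(i);
}

// The reduction from the report: one operator[] read per bit, summed.
// This is the loop that gets vectorized with -m32 -O2 -mavx2.
unsigned int count_by_loop(const Bits &b) {
    unsigned int total = 0;
    for (unsigned int index = 0; index != kBits; ++index)
        total += b[index];
    return total;
}

// Same result via std::bitset::count(), which returns the number of set bits.
unsigned int count_by_member(const Bits &b) {
    return static_cast<unsigned int>(b.count());
}
```

For a usage example, allocate the bitset with `std::make_unique<Bits>()`, call `fill_pattern(*values)`, then compare `count_by_loop(*values)` against `count_by_member(*values)`. The bitset is allocated on the heap because its 1.6 MB of storage exceeds the default 1 MB main-thread stack on Windows.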