https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
--- Comment #2 from Pali Rohár <pali at kernel dot org> ---
Interesting... I was expecting that -O3, or better yet -Ofast, tells gcc to optimize the code as much as possible. I added that pragma before the for-loop in the first example, and gcc then really did optimize the code down to just a bswap instruction.
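For illustration only, here is a hypothetical byte-reversal loop of the kind discussed. It is not the original "first example" from this bug. It assumes the pragma in question is GCC's `#pragma GCC unroll N`, which asks the compiler to fully unroll the loop. After unrolling, the body becomes a plain OR of shifted bytes, which is the shape GCC's bswap detection looks for. Whether this exact form collapses to a single bswap depends on the GCC version and target, so check the output with `gcc -O2 -S`.

```c
#include <stdint.h>

/* Hypothetical example, not taken from the bug report.
 * Reverses the byte order of a 32-bit value one byte at a time.
 * The pragma (assumed to be the one suggested earlier in the thread)
 * requests full unrolling of the 4-iteration loop, so the later
 * bswap-recognition pass sees straight-line shift/mask/OR code. */
uint32_t reverse_bytes(uint32_t x)
{
    uint32_t r = 0;
#pragma GCC unroll 4
    for (int i = 0; i < 4; i++)
        /* take byte i of x and place it at byte position 3 - i */
        r |= ((x >> (8 * i)) & 0xffu) << (8 * (3 - i));
    return r;
}
```

Comparing the generated assembly with and without the pragma line shows whether the loop was turned into a single byte-swap instruction.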