https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249
--- Comment #2 from David Brown <david at westcontrol dot com> ---

My apologies - it does not optimise the code to a single aligned load at -O1 (at least, not with the combinations I have now tried). The code was originally C++, with a reference, which /does/ exhibit this behaviour of having better code at -O1 than -O2. I had translated the reference to a pointer to get more generic code that is valid as C and C++, but I don't see the same effect with the pointer.

For a reference, the issue is as I first described:

#include <stdint.h>
#include <string.h>

uint64_t read64r(const uint64_t &x) {
    if ((uint64_t) &x % 8 ) {
        __builtin_unreachable();
    }
    uint64_t value;
    memcpy( &value, &x, sizeof(uint64_t) );
    return value;
}

I tested with 64-bit RISC-V and 32-bit ARM using trunk (gcc 13.1, I think) on godbolt.org. The only relevant flag was the optimisation level.

<https://godbolt.org/z/TaqdqeGch>
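
For comparison, the pointer translation mentioned above might look roughly like the sketch below. This is a reconstruction and not necessarily the exact code from the original report: the name read64p is assumed here, and the body simply mirrors read64r with the reference replaced by a pointer. It relies on __builtin_unreachable, so it needs GCC or a compatible compiler.

```cpp
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer counterpart of read64r; the name read64p is an
   assumption made for this sketch. Valid as both C and C++. */
uint64_t read64p(const uint64_t *p) {
    /* Tell the compiler that a misaligned pointer cannot occur. */
    if ((uint64_t) p % 8 ) {
        __builtin_unreachable();
    }
    uint64_t value;
    memcpy( &value, p, sizeof(uint64_t) );
    return value;
}
```

Compiling both functions at -O1 and -O2 for the same target should make it straightforward to compare the generated loads side by side.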