http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54682
--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> ---
A related case, but the other way around:

#include <bitset>

std::bitset<32> make_bits (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (i < r.size ())
      r.set (i);
  return r;
}

results in the following code (-O2):

        mov.l   .L8,r1
        mov     #0,r0
        mov     #31,r7
        mov     #1,r6       // load constant '1' for '1 << x'
        mov     #4,r2
.L2:
        mov.l   @r1,r3
        cmp/hi  r7,r3
        bf/s    .L7
        mov     r6,r5       // copy constant '1' to r5
.L3:
        dt      r2
        bf/s    .L2
        add     #4,r1
        rts
        nop
        .align 1
.L7:
        shld    r3,r5       // r5 <<= r3
        bra     .L3
        or      r5,r0

In this case one register (r6) is used to hold an imm8 constant that could instead be loaded with a single insn ('mov #1,r5'). Even though 'mov Rm,Rn' is a zero-latency insn on SH4 and SH2A, freeing one register might result in better overall code.
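For illustration only, here is a small C++ sketch of what the assembly loop computes. The hypothetical helper make_bits_manual (not part of GCC or libstdc++) spells out the '1 << x' for which the compiler keeps the constant 1 live in r6. It assumes a 32-bit unsigned result, as std::bitset<32> implies.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// The function from the report.
std::bitset<32> make_bits (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (i < r.size ())
      r.set (i);
  return r;
}

// Hypothetical hand-written equivalent that mirrors the SH loop: skip
// indices above 31 (cmp/hi against r7 = 31), otherwise shift the constant 1
// left by the index (shld) and OR it into the result (or r5,r0).
std::uint32_t make_bits_manual ()
{
  const unsigned indices[] = { 4, 5, 6, 10 };
  std::uint32_t r = 0;
  for (unsigned i : indices)
    if (i < 32)
      r |= std::uint32_t (1) << i;  // the '1 << x' that needs the constant 1
  return r;
}
```

Both functions produce the mask 0x470 (bits 4, 5, 6 and 10). The constant 1 is consumed once per iteration, which is why it can either stay in a register or be rematerialized with 'mov #1,r5' each time.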