https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717
--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
So.  Before expand we have

  _6 = (__int128) x_3(D);
  x.0_1 = _6 << 59;
  _2 = x.0_1 >> 59;
  _4 = (__int128 unsigned) _2;
  return _4;

That should have been optimised better :-(

The RTL code it expands to sets the same pseudo multiple times.  Bad bad
bad.  This hampers many optimisations.  Like:

(insn 6 3 7 2 (set (reg:DI 124)
        (lshiftrt:DI (reg:DI 129 [ x+8 ])
            (const_int 5 [0x5]))) "110717.c":6:11 299 {lshrdi3}
     (nil))
(insn 7 6 8 2 (set (reg:DI 132)
        (ashift:DI (reg:DI 128 [ x ])
            (const_int 59 [0x3b]))) "110717.c":6:11 289 {ashldi3}
     (nil))
(insn 8 7 9 2 (set (reg:DI 132)
        (ior:DI (reg:DI 124)
            (reg:DI 132))) "110717.c":6:11 233 {*booldi3}
     (nil))

(They are subregs right after expand, totally unreadable; this is after
subreg1, slightly more readable, but essentially the same code still).

The web pass eventually gets rid of the double set in this case.

Because the shift-left-then-right survives all the way to combine, it
(being the greedy bastard that it is) will use the combiner patterns
rs6000 has for multi-precision shifts before it would notice that the two
(multiprecision!) shifts together are largely a no-op, so you get stuck
at a local optimum.  Par for the course for combine :-/
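
To make the "largely a no-op" point concrete, here is a minimal C sketch. It is not the PR's test case: shift_pair is a hypothetical reconstruction read off the GIMPLE above, and sign_extend_69 and check_equivalence are illustrative names. It assumes a 64-bit GCC target with __int128, and relies on GCC's documented behaviour for signed shifts (left shifts wrap, right shifts of negative values are arithmetic). The pair of 128-bit shifts keeps the low 69 bits and sign-extends from bit 68, so the low doubleword is untouched and only the high doubleword needs a 64-bit shift pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical reconstruction of the source, mirroring the GIMPLE:
   cast to signed, shift left by 59, shift right by 59, cast back.
   Relies on GCC's semantics for signed shifts. */
u128 shift_pair(u128 x)
{
    return (u128)(((__int128)x << 59) >> 59);
}

/* The same value computed per 64-bit half (illustrative only):
   the low doubleword passes through unchanged, and the high doubleword
   is the sign-extension of its own low 5 bits (bits 64..68 of x). */
u128 sign_extend_69(u128 x)
{
    uint64_t lo = (uint64_t)x;
    uint64_t hi = (uint64_t)(x >> 64);
    int64_t hi_ext = (int64_t)(hi << 59) >> 59;
    return ((u128)(uint64_t)hi_ext << 64) | lo;
}

/* Compare both forms on a few hand-picked inputs. */
void check_equivalence(void)
{
    const u128 samples[] = {
        0,
        1,
        (u128)1 << 68,                 /* sign bit of the 69-bit field */
        ((u128)0x0f << 64) | 0x1234,   /* positive 69-bit value */
        ((u128)0x1f << 64) | 0x1234,   /* negative 69-bit value */
        ~(u128)0,
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(shift_pair(samples[i]) == sign_extend_69(samples[i]));
}
```

Calling check_equivalence() runs the comparison. The per-half form is only meant to show how little work is actually required; it is not a claim about what any GCC pass does or should emit.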