https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
So.  Before expand we have

  _6 = (__int128) x_3(D);
  x.0_1 = _6 << 59;
  _2 = x.0_1 >> 59;
  _4 = (__int128 unsigned) _2;
  return _4;

That should have been optimised better :-(

The RTL code it expands to sets the same pseudo multiple times.  Bad bad bad.
This hampers many optimisations.  Like:
(insn 6 3 7 2 (set (reg:DI 124)
        (lshiftrt:DI (reg:DI 129 [ x+8 ])
            (const_int 5 [0x5]))) "110717.c":6:11 299 {lshrdi3}
     (nil))
(insn 7 6 8 2 (set (reg:DI 132)
        (ashift:DI (reg:DI 128 [ x ])
            (const_int 59 [0x3b]))) "110717.c":6:11 289 {ashldi3}
     (nil))
(insn 8 7 9 2 (set (reg:DI 132)
        (ior:DI (reg:DI 124)
            (reg:DI 132))) "110717.c":6:11 233 {*booldi3}
     (nil))
(They are subregs right after expand, totally unreadable; this is after
subreg1, slightly more readable, but essentially the same code still).
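
As a rough illustration of what those three insns compute, here is a minimal C
sketch of the high doubleword of a 128-bit left shift by 59, done on 64-bit
halves. It assumes `x` is the high doubleword and `x+8` the low one (a
big-endian layout, which the dump does not state), and the names
`u128_words`, `shl59_words` and `check_shl59` are made up for this example.
Note how `r.hi` is assigned twice, mirroring the double set of pseudo 132.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two 64-bit halves of a 128-bit value. */
typedef struct { uint64_t hi, lo; } u128_words;

/* 128-bit left shift by 59, written the way the expanded RTL does it. */
u128_words shl59_words(uint64_t hi, uint64_t lo)
{
    u128_words r;
    uint64_t carry = lo >> 5;   /* like insn 6: bits of the low word that move up */
    r.hi = hi << 59;            /* like insn 7: first set of the result */
    r.hi = carry | r.hi;        /* like insn 8: second set of the same variable */
    r.lo = lo << 59;            /* low doubleword of the shift (not in the excerpt) */
    return r;
}

/* Compare against the compiler's own 128-bit shift (GCC/Clang extension). */
void check_shl59(uint64_t hi, uint64_t lo)
{
    unsigned __int128 x = ((unsigned __int128)hi << 64) | lo;
    unsigned __int128 y = x << 59;
    u128_words r = shl59_words(hi, lo);
    assert(r.hi == (uint64_t)(y >> 64));
    assert(r.lo == (uint64_t)y);
}
```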

The web pass eventually gets rid of the double set in this case.

Because the shift-left-then-right survives all the way to combine, it (being
the greedy bastard that it is) will use the combiner patterns rs6000 has for
multi-precision shifts, before it would notice the two (multiprecision!)
shifts together are largely a no-op, so you get stuck at a local optimum.
Par for the course for combine :-/
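
To make "largely a no-op" concrete, here is a minimal C sketch reconstructed
from the GIMPLE above (it is not the testcase from the PR, and the function
names are invented for this example). The shift pair only sign-extends the
value from bit 68, so the low doubleword is untouched and the high doubleword
just needs a 64-bit shift pair. The left shift is done on the unsigned value
here to keep the C well defined; the result relies on GCC's documented
behaviour for converting to signed types and for right shifts of negative
values.

```c
#include <assert.h>
#include <stdint.h>

/* The shift pair on the full 128-bit value, as in the GIMPLE. */
unsigned __int128 sext_shifts(unsigned __int128 x)
{
    /* Conversion to signed and arithmetic >> are implementation-defined;
       GCC wraps and sign-extends, respectively. */
    return (unsigned __int128)((__int128)(x << 59) >> 59);
}

/* The same result computed per doubleword. */
unsigned __int128 sext_words(unsigned __int128 x)
{
    uint64_t lo = (uint64_t)x;            /* passes through unchanged */
    uint64_t hi = (uint64_t)(x >> 64);
    /* Sign-extend the high doubleword from its bit 4 (bit 68 overall). */
    uint64_t hi_ext = (uint64_t)((int64_t)(hi << 59) >> 59);
    return ((unsigned __int128)hi_ext << 64) | lo;
}

/* Hypothetical self-check: both forms must agree. */
void check_sext(unsigned __int128 x)
{
    assert(sext_shifts(x) == sext_words(x));
}
```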
