Roman Zippel <[EMAIL PROTECTED]> writes:

> The new subreg lowering pass seems to generate somewhat worse code on m68k
> than before. Let's take a simple example:
>
> unsigned long long f(unsigned long long a, unsigned long long b)
> {
>   return a + b;
> }
>
> where currently gcc generates code like this:
>
>         move.l 16(%sp),%d1
>         move.l 20(%sp),%d2
>         move.l 8(%sp),%d0
>         add.l 12(%sp),%d2
>         addx.l %d0,%d1
>         move.l %d1,%d0
>         move.l %d2,%d1
>
> whereas with -fno-split-wide-types it generates this:
>
>         move.l 16(%sp),%d0
>         move.l 20(%sp),%d1
>         move.l 8(%sp),%d2
>         add.l 12(%sp),%d1
>         addx.l %d2,%d0
>
> How can I get rid of these extra move instructions?

The standard answer would be to add a define_split for the adddi3 insn
which triggers before reload. But that is problematic on a CC0 system,
where you want to preserve the overflow flag. I'm not sure what to
suggest at the moment.

Note that there is still an extra move.l insn in the
-fno-split-wide-types version.

> Another more general question would be how wide registers should be
> handled in general. In the past I tried to avoid splitting instructions
> before reload, exactly because the extra subregs caused worse code. Has
> this changed? AFAICT this would mean splitting DI values in the back end
> as early as possible, which could have its advantages, but also its
> challenges, as m68k is still a cc0 target, and with instructions like
> addx.l above, so far I have avoided splitting these at all.

Yes, in general it is now better to split double-word operations before
reload. It's not necessarily better to split as early as possible, as
that will essentially disable the RTL-level loop optimizations. But it's
still problematic to split before reload on a CC0 system.

Ian
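
For readers less familiar with the m68k sequence: the add.l/addx.l pair in the listings above computes the 64-bit sum as two 32-bit adds linked by a carry (on m68k, addx.l adds the X extend flag set by the preceding add.l). The following is a hedged sketch in portable C, not code from this thread, of what splitting a DImode add into two SImode operations amounts to. The names `u64_pair`, `add_u64_pair`, `to_pair`, `from_pair` and `check_add_u64_pair` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit words,
   roughly as a split DImode register pair would hold it. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

/* Double-word add written as two single-word adds linked by a carry,
   the same shape as the m68k add.l / addx.l sequence. */
static u64_pair add_u64_pair(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;             /* low words, wraps modulo 2^32 (like add.l) */
    uint32_t carry = r.lo < a.lo;   /* carry out of the low-word add (the X flag) */
    r.hi = a.hi + b.hi + carry;     /* high words plus carry (like addx.l) */
    return r;
}

/* Hypothetical helpers converting between native 64-bit and split form. */
static u64_pair to_pair(uint64_t v)
{
    u64_pair p = { (uint32_t)(v >> 32), (uint32_t)v };
    return p;
}

static uint64_t from_pair(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Checks that the split add agrees with a native 64-bit add. */
static void check_add_u64_pair(uint64_t a, uint64_t b)
{
    u64_pair r = add_u64_pair(to_pair(a), to_pair(b));
    assert(from_pair(r) == a + b);
}
```

For instance, `check_add_u64_pair(0x00000001FFFFFFFFull, 0x0000000200000001ull)` exercises a carry out of the low word. In C the carry is an ordinary value that the compiler may keep anywhere; on a cc0 target it exists only in the condition codes between the two instructions, so nothing that clobbers them may be placed in between. That is why splitting adddi3 before reload is awkward there.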