On Mon, Apr 6, 2015 at 2:19 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> The prologue and epilogue code to save/restore Altivec registers uses
> the generic emit_move_insn logic.  This means that when VSX is available
> on a little-endian target, we will generate xxswapd/stxvd2x for prologue
> saves, and lxvd2x/xxswapd for epilogue restores.  This happens too late
> to be cleaned up by swap optimization.  Since the stack save slots are
> aligned, we should always use lvx and stvx for this purpose instead.
> This improves performance on LE targets, is performance-neutral for BE,
> and is always safe.
>
> This change causes the previously failing test
> gcc.target/powerpc/swaps-p8-2.c to succeed again.  This test started
> failing when an unrelated change raised register pressure on the vector
> registers, causing us to generate the above save/restore sequences.  The
> test was testing whether all xxswapd instructions had been removed.
> With this change, we no longer generate them for the save/restores, and
> the test succeeds.
>
> Because we no longer generate the two-instruction sequences, we no
> longer need the code in rs6000_frame_related to identify the true source
> register for the two-instruction store.  I've removed that along with
> the extra argument it required, and adjusted the callers.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions, and one fixed failure.  Is this ok for trunk after 5.1
> branches?  I'd also like to backport this as far as 4.8 for the
> performance improvement.
>
> Thanks,
> Bill
>
>
> 2015-04-06  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (*altivec_lvx_<mode>_internal): Remove
>         asterisk from name so this can be generated directly.
>         (*altivec_stvx_<mode>_internal): Likewise.
>         * config/rs6000/rs6000.c (rs6000_frame_related): Remove split_reg
>         argument and logic that references it.
>         (emit_frame_save): Remove last parameter from call to
>         rs6000_frame_related.
>         (rs6000_emit_prologue): Remove last parameter from eight calls to
>         rs6000_frame_related.  Force generation of stvx instruction for
>         Altivec register saves.  Remove split_reg handling, which is no
>         longer needed.
>         (rs6000_emit_epilogue): Force generation of lvx instruction for
>         Altivec register restores.

This is okay after GCC 5 branches.  But please insert a space between
(void) and emit_insn when casting the result.

> @@ -24817,7 +24805,10 @@ rs6000_emit_epilogue (int sibcall)
>               mem = gen_frame_mem (V4SImode, addr);
>
>               reg = gen_rtx_REG (V4SImode, i);
> -             emit_move_insn (reg, mem);
> +             /* Rather than emitting a generic move, force use of the
> +                lvx instruction, which we always want.  In particular
> +                we don't want lxvd2x/xxpermdi for little endian.  */
> +             (void)emit_insn (gen_altivec_lvx_v4si_internal (reg, mem));
>             }
>         }
>
> @@ -25020,7 +25011,10 @@ rs6000_emit_epilogue (int sibcall)
>               mem = gen_frame_mem (V4SImode, addr);
>
>               reg = gen_rtx_REG (V4SImode, i);
> -             emit_move_insn (reg, mem);
> +             /* Rather than emitting a generic move, force use of the
> +                lvx instruction, which we always want.  In particular
> +                we don't want lxvd2x/xxpermdi for little endian.  */
> +             (void)emit_insn (gen_altivec_lvx_v4si_internal (reg, mem));
>             }
>         }

Thanks, David
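
A side note on the alignment argument in the quoted description: lvx and
stvx ignore the low four bits of the effective address, so they always
access a 16-byte-aligned quadword.  That is why they are a safe choice
only when the save slot is known to be aligned.  The sketch below models
just that address rule in plain C.  Both function names are hypothetical
helpers made up for illustration; they are not part of GCC or of the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): models the effective
   address used by lvx/stvx, which clear the low four bits of
   base + index before accessing memory.  */
static uint64_t
lvx_effective_address (uint64_t base, uint64_t index)
{
  return (base + index) & ~(uint64_t) 0xF;
}

/* Hypothetical helper: true when lvx/stvx would access exactly the
   requested address, i.e. when the slot is 16-byte aligned.  */
static bool
slot_is_lvx_safe (uint64_t base, uint64_t index)
{
  return lvx_effective_address (base, index) == base + index;
}
```

With a misaligned slot the masked address differs from the requested
one, so the instruction would silently access the wrong bytes.  The
patch relies on the Altivec save slots in the frame being aligned, in
which case the two addresses coincide.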