> Hi!
>
> Honza recently changed the i?86 backend, so that it often doesn't
> do -maccumulate-outgoing-args by default on x86_64.
> Unfortunately, on some of the here included testcases this regressed
> quite a bit the generated code. As AVX vectors are used, the dynamic

Yep, the accidental change to disable register accumulation was apparently
on for too long. I still plan to switch -maccumulate-outgoing-args on for
generic. I benchmarked it on AMD targets and it seems to be good, but I still
need to analyze the code size implications. On some testers we seem to have
regressed in 32-bit; that may be related to the fact that frame-pointer-enabled
codegen is more compact, and Vladimir's patch, while restoring performance,
led to code size issues.

> realignment code needs to assume e.g. that some of them will need to be
> spilled, and for -mno-accumulate-outgoing-args the code needs to set
> need_drap early as well. But in when emitting the prologue/epilogue,
> if need_drap is set, we don't perform the optimization for leaf functions
> which have zero size stack frame, thus we end up with uselessly doing
> dynamic stack realignment, setting up DRAP that nothing uses and later on
> restore everything back.
>
> This patch improves it, if the DRAP register isn't live at the start of
> entry bb successor and we aren't going to realign the stack, we don't
> need DRAP at all, and even if we need DRAP register, that can't be the sole
> reason for doing stack realignment, the prologue code is able to set up DRAP
> even without dynamic stack realignment.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-12-20  Jakub Jelinek  <ja...@redhat.com>
>
>       PR target/59501
>       * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
>       if !crtl->stack_realign_needed.
>       (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
>       and stack_realign_needed will be false, clear drap_reg and need_drap.
>       Optimize leaf functions that don't need stack frame even if
>       crtl->need_drap.
>
>       * gcc.target/i386/pr59501-1.c: New test.
>       * gcc.target/i386/pr59501-1a.c: New test.
>       * gcc.target/i386/pr59501-2.c: New test.
>       * gcc.target/i386/pr59501-2a.c: New test.
>       * gcc.target/i386/pr59501-3.c: New test.
>       * gcc.target/i386/pr59501-3a.c: New test.
>       * gcc.target/i386/pr59501-4.c: New test.
>       * gcc.target/i386/pr59501-4a.c: New test.
>       * gcc.target/i386/pr59501-5.c: New test.
>       * gcc.target/i386/pr59501-6.c: New test.

This seems OK, thanks!

Honza
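
For readers following the thread, the decision described in the quoted patch summary can be modeled roughly as below. This is only a toy sketch, not the code in config/i386/i386.c: `struct frame_state`, its fields, and `finalize_realign_flags` are hypothetical stand-ins for the crtl flags and ix86_finalize_stack_realign_flags, and the real function checks considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the state the patch discusses.
   None of these names exist in GCC.  */
struct frame_state
{
  bool realign_needed;      /* early, conservative stack_realign_needed */
  bool need_drap;           /* set early, e.g. with -mno-accumulate-outgoing-args */
  bool drap_live_on_entry;  /* DRAP reg live at start of entry bb successor */
  bool is_leaf;             /* function makes no calls */
  bool needs_stack_frame;   /* something actually uses stack slots */
};

/* Toy version of the finalization step described in the ChangeLog.  */
static void
finalize_realign_flags (struct frame_state *s)
{
  assert (s);

  /* A leaf function with no stack frame does not need dynamic
     realignment; need_drap alone must not block this optimization.  */
  if (s->is_leaf && !s->needs_stack_frame)
    s->realign_needed = false;

  /* Without realignment, DRAP is only kept if something reads it.
     If it is live, the prologue can still set it up on its own.  */
  if (!s->realign_needed && !s->drap_live_on_entry)
    s->need_drap = false;
}
```

In this model, a leaf function with an empty frame and an unused DRAP register ends up with both flags cleared, which corresponds to the "uselessly doing dynamic stack realignment" case the patch is meant to avoid.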