On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > Hi all, > > This patch implements the new separate shrink-wrapping hooks for aarch64. > In separate shrink wrapping (as I understand it) we consider each register > save/restore as > a 'component' that can be performed independently of the other save/restores > in the prologue/epilogue > and can be moved outside the prologue/epilogue and instead performed only in > the basic blocks where it's > actually needed. This allows us to avoid saving and restoring registers on > execution paths where a register > might not be needed. > > In the most general form a 'component' can be any operation that the > prologue/epilogue performs, for example > stack adjustment. But in this patch we only consider callee-saved register > save/restores as components. > The code is in many ways similar to the powerpc implementation of the hooks. > > The hooks implemented are: > * TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS: Returns a bitmap containing a > bit for each register that should > be considered a 'component' i.e. its save/restore should be separated from > the prologue and epilogue and placed > at the basic block where it's needed. > > * TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB: Determine for a given basic block > which 'component' registers it needs. > This is determined through dataflow. If a component register is in the > IN,GEN or KILL sets for the basic block > it's considered as needed and marked as such in the bitmap. > > * TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS and > TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS: Given a bitmap > of component registers emits the save or restore code for them. > > * TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS: Given a bitmap of component > registers record in the backend that > the register is shrink-wrapped using this approach and that the normal > prologue and epilogue expansion code > should not emit code for them. This is done similarly to powerpc by defining > a bool array in machine_function > where we record whether each register is separately shrink-wrapped. The > prologue and epilogue expansion code > (through aarch64_save_callee_saves and aarch64_restore_callee_saves) is > updated to not emit save/restores for > these registers if they appear in that array. > > Our prologue and epilogue code has a lot of intricate logic to perform stack > adjustments using the writeback > forms of the load/store instructions. Separately shrink-wrapping those > registers marked for writeback > (cfun->machine->frame.wb_candidate1 and cfun->machine->frame.wb_candidate2) > broke that codegen and I had to > emit an explicit stack adjustment instruction that created ugly > prologue/epilogue sequences. So this patch > is conservative and doesn't allow shrink-wrapping of the registers marked > for writeback. Maybe in the future > we can relax it (for example allow wrapping of one of the two writeback > registers if the writeback amount > can be encoded in a single-register writeback store/load) but given the > development stage of GCC I thought > I'd play it safe. > > I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were > some interesting swings. > 458.sjeng +1.45% > 471.omnetpp +2.19% > 445.gobmk -2.01% > > On SPECFP: > 453.povray +7.00%
Wow, this looks really good. Thank you for implementing this. If I get some time I am going to try it out on other processors than A72 but I doubt I have time any time soon. Thanks, Andrew > > I'll be re-running the benchmarks with Segher's recent patch [1] to see if > they fix the regression > and if it does I think this can go in. > > [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00889.html > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Thanks, > Kyrill > > 2016-11-10 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * config/aarch64/aarch64.h (machine_function): Add > reg_is_wrapped_separately field. > * config/aarch64/aarch64.c (emit_set_insn): Change return type to > rtx_insn *. > (aarch64_save_callee_saves): Don't save registers that are wrapped > separately. > (aarch64_restore_callee_saves): Don't restore registers that are > wrapped separately. > (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p, > aarch64_offset_7bit_signed_scaled_p): Move earlier in the file. > (aarch64_get_separate_components): New function. > (aarch64_components_for_bb): Likewise. > (aarch64_disqualify_components): Likewise. > (aarch64_emit_prologue_components): Likewise. > (aarch64_emit_epilogue_components): Likewise. > (aarch64_set_handled_components): Likewise. > (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, > TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, > TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, > TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, > TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, > TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.