On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
> Hi all,
>
> This patch implements the new separate shrink-wrapping hooks for aarch64.
> In separate shrink wrapping (as I understand it) we consider each register
> save/restore as
> a 'component' that can be performed independently of the other save/restores
> in the prologue/epilogue
> and can be moved outside the prologue/epilogue and instead performed only in
> the basic blocks where it's
> actually needed. This allows us to avoid saving and restoring registers on
> execution paths where a register
> might not be needed.
>
> In the most general form a 'component' can be any operation that the
> prologue/epilogue performs, for example
> stack adjustment. But in this patch we only consider callee-saved register
> save/restores as components.
> The code is in many ways similar to the powerpc implementation of the hooks.
>
> The hooks implemented are:
> * TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS: Returns a bitmap containing a
> bit for each register that should
> be considered a 'component' i.e. its save/restore should be separated from
> the prologue and epilogue and placed
> at the basic block where it's needed.
>
> * TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB: Determine for a given basic block
> which 'component' registers it needs.
> This is determined through dataflow. If a component register is in the
> IN,GEN or KILL sets for the basic block
> it's considered as needed and marked as such in the bitmap.
>
> * TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS and
> TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS: Given a bitmap
> of component registers emits the save or restore code for them.
>
> * TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS: Given a bitmap of component
> registers record in the backend that
> the register is shrink-wrapped using this approach and that the normal
> prologue and epilogue expansion code
> should not emit code for them. This is done similarly to powerpc by defining
> a bool array in machine_function
> where we record whether each register is separately shrink-wrapped.  The
> prologue and epilogue expansion code
> (through aarch64_save_callee_saves and aarch64_restore_callee_saves) is
> updated to not emit save/restores for
> these registers if they appear in that array.
>
> Our prologue and epilogue code has a lot of intricate logic to perform stack
> adjustments using the writeback
> forms of the load/store instructions. Separately shrink-wrapping those
> registers marked for writeback
> (cfun->machine->frame.wb_candidate1 and cfun->machine->frame.wb_candidate2)
> broke that codegen and I had to
> emit an explicit stack adjustment instruction that created ugly
> prologue/epilogue sequences. So this patch
> is conservative and doesn't allow shrink-wrapping of the registers marked
> for writeback. Maybe in the future
> we can relax it (for example allow wrapping of one of the two writeback
> registers if the writeback amount
> can be encoded in a single-register writeback store/load) but given the
> development stage of GCC I thought
> I'd play it safe.
>
> I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
> some interesting swings.
> 458.sjeng     +1.45%
> 471.omnetpp   +2.19%
> 445.gobmk     -2.01%
>
> On SPECFP:
> 453.povray    +7.00%


Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on other processors than A72
but I doubt I have time any time soon.

Thanks,
Andrew

>
> I'll be re-running the benchmarks with Segher's recent patch [1] to see if
> they fix the regression
> and if it does I think this can go in.
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00889.html
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-11-10  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * config/aarch64/aarch64.h (machine_function): Add
>     reg_is_wrapped_separately field.
>     * config/aarch64/aarch64.c (emit_set_insn): Change return type to
>     rtx_insn *.
>     (aarch64_save_callee_saves): Don't save registers that are wrapped
>     separately.
>     (aarch64_restore_callee_saves): Don't restore registers that are
>     wrapped separately.
>     (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
>     aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
>     (aarch64_get_separate_components): New function.
>     (aarch64_components_for_bb): Likewise.
>     (aarch64_disqualify_components): Likewise.
>     (aarch64_emit_prologue_components): Likewise.
>     (aarch64_emit_epilogue_components): Likewise.
>     (aarch64_set_handled_components): Likewise.
>     (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
>     TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
>     TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
>     TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
>     TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
>     TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.

Reply via email to