Re: RFC: Patch to implement Aarch64 SIMD ABI

Richard Sandiford Thu, 19 Jul 2018 00:32:26 -0700

Hi,

Thanks for doing this.


Steve Ellcey <sell...@cavium.com> writes:
> This is a patch to support the Aarch64 SIMD ABI [1] in GCC.  I intend
> to eventually follow this up with two more patches; one to define the
> TARGET_SIMD_CLONE* macros and one to improve the GCC register
> allocation/usage when calling SIMD functions.
>
> The significant difference between the standard ARM ABI and the SIMD ABI
> is that in the normal ABI a callee saves only the lower 64 bits of registers
> V8-V15, in the SIMD ABI the callee must save all 128 bits of registers
> V8-V23.
>
> This patch checks for SIMD functions and saves the extra registers when
> needed.  It does not change the caller behavour, so with just this patch
> there may be values saved by both the caller and callee.  This is not
> efficient, but it is correct code.
>
> This patch bootstraps and passes the GCC testsuite but that only verifies
> I haven't broken anything, it doesn't validate the handling of SIMD functions.
> I tried to write some tests, but I could never get GCC to generate code
> that would save the FP callee-save registers in the prologue.  Complex code
> might generate spills and fills but it never triggered the prologue/epilogue
> code to save V8-V23.  If anyone has ideas on how to write a test that would
> cause GCC to generate this code I would appreciate some ideas.  Just doing
> lots of calculations with lots of intermediate values doesn't seem to be 
> enough.

Probably easiest to use asm clobbers, e.g.:

void __attribute__ ((aarch64_vector_pcs))
f (void)
{
  asm volatile ("" ::: "s8", "s13");
}

This also lets you control exactly which registers are saved.

> @@ -4105,7 +4128,8 @@ aarch64_layout_frame (void)
>        {
>       /* If there is an alignment gap between integer and fp callee-saves,
>          allocate the last fp register to it if possible.  */
> -     if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
> +     if (regno == last_fp_reg && has_align_gap
> +         && !simd_function && (offset & 8) == 0)
>         {
>           cfun->machine->frame.reg_offset[regno] = max_int_offset;
>           break;
> @@ -4117,7 +4141,7 @@ aarch64_layout_frame (void)
>       else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
>                && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
>         cfun->machine->frame.wb_candidate2 = regno;
> -     offset += UNITS_PER_WORD;
> +     offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
>        }
>  
>    offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
> @@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap components, bool 
> prologue_p)
>    while (regno != last_regno)
>      {
>        /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
> -      so DFmode for the vector registers is enough.  */
> -      machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
> +      so DFmode for the vector registers is enough.  For simd functions
> +         we want to save the entire register.  */
> +      machine_mode mode = GP_REGNUM_P (regno) ? E_DImode
> +     : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode);

This condition also occurs in aarch64_push_regs and aarch64_pop_regs.
It'd probably be worth splitting it out into a subfunction.

I think you also need to handle the writeback cases, which should work
for Q registers too.  This will mean extra loadwb_pair and storewb_pair
patterns.

LGTM otherwise FWIW.

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index f284e74..d11474e 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -500,6 +500,8 @@ extern unsigned aarch64_architecture_version;
>  #define PR_LO_REGNUM_P(REGNO)\
>    (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
>  
> +#define FP_SIMD_SAVED_REGNUM_P(REGNO)                        \
> +  (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM))

(We should probably rewrite these to use IN_RANGE at some point,
but I agree it's better to be consistent until then.)

Thanks,
Richard

Re: RFC: Patch to implement Aarch64 SIMD ABI

Reply via email to