Hi, Thanks for doing this.
Steve Ellcey <sell...@cavium.com> writes: > This is a patch to support the Aarch64 SIMD ABI [1] in GCC. I intend > to eventually follow this up with two more patches; one to define the > TARGET_SIMD_CLONE* macros and one to improve the GCC register > allocation/usage when calling SIMD functions. > > The significant difference between the standard ARM ABI and the SIMD ABI > is that in the normal ABI a callee saves only the lower 64 bits of registers > V8-V15, in the SIMD ABI the callee must save all 128 bits of registers > V8-V23. > > This patch checks for SIMD functions and saves the extra registers when > needed. It does not change the caller behavour, so with just this patch > there may be values saved by both the caller and callee. This is not > efficient, but it is correct code. > > This patch bootstraps and passes the GCC testsuite but that only verifies > I haven't broken anything, it doesn't validate the handling of SIMD functions. > I tried to write some tests, but I could never get GCC to generate code > that would save the FP callee-save registers in the prologue. Complex code > might generate spills and fills but it never triggered the prologue/epilogue > code to save V8-V23. If anyone has ideas on how to write a test that would > cause GCC to generate this code I would appreciate some ideas. Just doing > lots of calculations with lots of intermediate values doesn't seem to be > enough. Probably easiest to use asm clobbers, e.g.: void __attribute__ ((aarch64_vector_pcs)) f (void) { asm volatile ("" ::: "s8", "s13"); } This also lets you control exactly which registers are saved. > @@ -4105,7 +4128,8 @@ aarch64_layout_frame (void) > { > /* If there is an alignment gap between integer and fp callee-saves, > allocate the last fp register to it if possible. */ > - if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0) > + if (regno == last_fp_reg && has_align_gap > + && !simd_function && (offset & 8) == 0) > { > cfun->machine->frame.reg_offset[regno] = max_int_offset; > break; > @@ -4117,7 +4141,7 @@ aarch64_layout_frame (void) > else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM > && cfun->machine->frame.wb_candidate1 >= V0_REGNUM) > cfun->machine->frame.wb_candidate2 = regno; > - offset += UNITS_PER_WORD; > + offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD; > } > > offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT); > @@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap components, bool > prologue_p) > while (regno != last_regno) > { > /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved > - so DFmode for the vector registers is enough. */ > - machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode; > + so DFmode for the vector registers is enough. For simd functions > + we want to save the entire register. */ > + machine_mode mode = GP_REGNUM_P (regno) ? E_DImode > + : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode); This condition also occurs in aarch64_push_regs and aarch64_pop_regs. It'd probably be worth splitting it out into a subfunction. I think you also need to handle the writeback cases, which should work for Q registers too. This will mean extra loadwb_pair and storewb_pair patterns. LGTM otherwise FWIW. > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index f284e74..d11474e 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -500,6 +500,8 @@ extern unsigned aarch64_architecture_version; > #define PR_LO_REGNUM_P(REGNO)\ > (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM)) > > +#define FP_SIMD_SAVED_REGNUM_P(REGNO) \ > + (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM)) (We should probably rewrite these to use IN_RANGE at some point, but I agree it's better to be consistent until then.) Thanks, Richard