On Fri, Jan 7, 2022 at 7:20 AM Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi Richard, > > Thanks for the review. > > On 20/12/2021 13:19, Richard Sandiford wrote: > > Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > > Hi, > > > > > > This fixes PR103500 i.e. ensuring that stack slots for > > > passed-by-reference overaligned types are appropriately aligned. For the > > > testcase: > > > > > > typedef struct __attribute__((aligned(32))) { > > > long x,y; > > > } S; > > > S x; > > > void f(S); > > > void g(void) { f(x); } > > > > > > on AArch64, we currently generate (at -O2): > > > > > > g: > > > adrp x1, .LANCHOR0 > > > add x1, x1, :lo12:.LANCHOR0 > > > stp x29, x30, [sp, -48]! > > > mov x29, sp > > > ldp q0, q1, [x1] > > > add x0, sp, 16 > > > stp q0, q1, [sp, 16] > > > bl f > > > ldp x29, x30, [sp], 48 > > > ret > > > > > > so the stack slot for the passed-by-reference copy of the structure is > > > at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the > > > structure is only 16-byte aligned. The PCS requires the structure to be > > > 32-byte aligned. After this patch, we generate: > > > > > > g: > > > adrp x1, .LANCHOR0 > > > add x1, x1, :lo12:.LANCHOR0 > > > stp x29, x30, [sp, -64]! > > > mov x29, sp > > > add x0, sp, 47 > > > ldp q0, q1, [x1] > > > and x0, x0, -32 > > > stp q0, q1, [x0] > > > bl f > > > ldp x29, x30, [sp], 64 > > > ret > > > > > > i.e. we ensure 32-byte alignment for the struct. > > > > > > The approach taken here is similar to that in > > > function.c:assign_parm_setup_block where it handles the case for > > > DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is > > > similar to the approach taken in cfgexpand.c:expand_stack_vars (where > > > the function calls get_dynamic_stack_size) which is the code that > > > handles the alignment for overaligned structures as addressable local > > > variables (see the related case discussed in the PR). > > > > A difference with the latter is that cfgexpand (AFAICT) always > > honours the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly > > only increasing the alignment for efficiency reasons (rather than > > decreasing it). > > > > So… > > > > > This patch also updates the aapcs64 test mentioned in the PR to avoid > > > the frontend folding away the alignment check. I've confirmed that the > > > execution test actually fails on aarch64-linux-gnu prior to the patch > > > being applied and passes afterwards. > > > > > > Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and > > > arm-linux-gnueabihf: no regressions. > > > > > > I'd appreciate any feedback. Is it OK for trunk? > > > > > > Thanks, > > > Alex > > > > > > gcc/ChangeLog: > > > > > > PR middle-end/103500 > > > * function.c (get_stack_local_alignment): Align BLKmode overaligned > > > types to the alignment required by the type. > > > (assign_stack_temp_for_type): Handle BLKmode overaligned stack > > > slots by allocating a larger-than-necessary buffer and aligning > > > the address within appropriately. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR middle-end/103500 > > > * gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref): > > > Prevent the frontend from folding our alignment check away by > > > using snprintf to store the pointer into a string and recovering > > > it with sscanf. > > > > > > diff --git a/gcc/function.c b/gcc/function.c > > > index 61b3bd036b8..5ed722ab959 100644 > > > --- a/gcc/function.c > > > +++ b/gcc/function.c > > > @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode > > > mode) > > > unsigned int alignment; > > > > > > if (mode == BLKmode) > > > - alignment = BIGGEST_ALIGNMENT; > > > + alignment = (type && TYPE_ALIGN (type) > > > > MAX_SUPPORTED_STACK_ALIGNMENT) > > > + ? TYPE_ALIGN (type) > > > + : BIGGEST_ALIGNMENT; > > > > …I'm not sure about this calculation. Why do we only honour TYPE_ALIGN > > if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the > > way back to BIGGEST_ALIGNMENT otherwise? It looks like on nvptx > > this would have the effect of honouring (say) 2048-byte alignment, > > but not 32-byte alignment (which falls between BIGGEST_ALIGNMENT > > and MAX_SUPPORTED_STACK_ALIGNMENT). > > So, to be honest, this was a bit of a bodge to try and work around > issues on x86. My original attempt at solving the PR used the more > obvious calculation you suggest below, i.e. the max of BIGGEST_ALIGNMENT > and TYPE_ALIGN (type). The problem with that is, on x86, > MAX_SUPPORTED_STACK_ALIGNMENT has a huge value (2^31).
On x86, stack alignment limit is the limit of stack. You can align stack to any values on x86 as long as your stack allows it. > explow.c:get_dynamic_stack_size has: > > if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0) > { > size = round_push (size); > [...] > } > *psize = size; > > so inevitably we end up calling round_push on x86 which in turn ends up > going down the else branch: > > else > { > /* If crtl->preferred_stack_boundary might still grow, use > virtual_preferred_stack_boundary_rtx instead. This will be > substituted by the right value in vregs pass and optimized > during combine. */ > align_rtx = virtual_preferred_stack_boundary_rtx; > alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), > NULL_RTX); > } > > and we end up returning a size expression involving > virtual_preferred_stack_boundary_rtx, which clearly cannot be a > CONST_INT (required for the approach in the patch to work: we need a > constant to pass to assign_stack_local_1). > > So the idea with that calculation was to only use the new, larger > alignment, if it was larger than MAX_SUPPORTED_STACK_ALIGNMENT (which > should never be the case on x86 for any reasonable value of TYPE_ALIGN). > Otherwise, it would fall back to the old behaviour of just using > BIGGEST_ALIGNMENT. > > I realise that this is a rather inelegant hack, but I wasn't sure how > else to approach the problem. > > > > > Also, what about the non-BLKmode case? A type's mode cannot be more > > aligned than the type, but a type is allowed to be more aligned than > > its mode. E.g.: > > Yes, we should probably try and handle the non-BLKmode case as well. I > was just trying to avoid changing too much at once at the risk of > introducing regressions. > > > > > typedef unsigned int V __attribute__((vector_size(32))); > > typedef struct __attribute__((aligned(32))) { V x; } S; > > S x; > > void f(S); > > void g(void) { f(x); } > > > > doesn't seem to be handled by the patch. It's possible to create > > variations on this on aarch64 up to 2048 bits using SVE and > > -msve-vector-bits=N. > > > > Perhaps we should treat the old alignment calculation as a minimum > > and take the max of that and TYPE_ALIGN, where TYPE_ALIGN is given. > > As discussed above, this doesn't work on x86, unfortunately. If you have > any alternative suggestions as to how to approach the problem, it would > be great to hear them. > > Thanks, > Alex > > > > > > else > > > alignment = GET_MODE_ALIGNMENT (mode); > > > > > > @@ -872,21 +874,35 @@ assign_stack_temp_for_type (machine_mode mode, > > > poly_int64 size, tree type) > > > > > > p = ggc_alloc<temp_slot> (); > > > > > > - /* We are passing an explicit alignment request to > > > assign_stack_local. > > > - One side effect of that is assign_stack_local will not round SIZE > > > - to ensure the frame offset remains suitably aligned. > > > - > > > - So for requests which depended on the rounding of SIZE, we go ahead > > > - and round it now. We also make sure ALIGNMENT is at least > > > - BIGGEST_ALIGNMENT. */ > > > - gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT); > > > - p->slot = assign_stack_local_1 (mode, > > > - (mode == BLKmode > > > - ? aligned_upper_bound (size, > > > - (int) align > > > - / BITS_PER_UNIT) > > > - : size), > > > - align, 0); > > > + if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT) > > > + { > > > + rtx allocsize = gen_int_mode (size, Pmode); > > > + get_dynamic_stack_size (&allocsize, 0, align, NULL); > > > + gcc_assert (CONST_INT_P (allocsize)); > > > + size = UINTVAL (allocsize); > > > > Since the size coming in is a poly_int64, I think we should treat > > the result in the same way, using rtx_to_poly_int64 rather than > > requiring a CONST_INT. We might not handle that case properly yet, > > but rtx_to_poly_int64 would trap mistakes, and using it now would > > avoid fewer surprises later. > > > > Thanks, > > Richard > > > > > + p->slot = assign_stack_local_1 (mode, > > > + size, > > > + BIGGEST_ALIGNMENT, 0); > > > + rtx addr = align_dynamic_address (XEXP (p->slot, 0), align); > > > + mark_reg_pointer (addr, align); > > > + p->slot = gen_rtx_MEM (GET_MODE (p->slot), addr); > > > + MEM_NOTRAP_P (p->slot) = 1; > > > + } > > > + else > > > + /* We are passing an explicit alignment request to assign_stack_local. > > > + One side effect of that is assign_stack_local will not round SIZE > > > + to ensure the frame offset remains suitably aligned. > > > + > > > + So for requests which depended on the rounding of SIZE, we go ahead > > > + and round it now. We also make sure ALIGNMENT is at least > > > + BIGGEST_ALIGNMENT. */ > > > + p->slot = assign_stack_local_1 (mode, > > > + (mode == BLKmode > > > + ? aligned_upper_bound (size, > > > + (int) align > > > + / > > > BITS_PER_UNIT) > > > + : size), > > > + align, 0); > > > > > > p->align = align; > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > index c9351802191..0b0ac0b9b97 100644 > > > --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > @@ -21,10 +21,18 @@ test_pass_by_ref (int x0, overaligned x1, int x2) > > > abort (); > > > if (memcmp ((void *) &x1, (void *)&a, sizeof (overaligned))) > > > abort (); > > > - long addr = ((long) &x1) & 31; > > > + > > > + char buf[64]; > > > + void *ptr; > > > + > > > + __builtin_snprintf (buf, sizeof(buf), "%p", &x1); > > > + if (__builtin_sscanf(buf, "%p", &ptr) != 1) > > > + abort (); > > > + > > > + long addr = ((long) ptr) & 31; > > > if (addr != 0) > > > { > > > - __builtin_printf ("Alignment was %d\n", addr); > > > + __builtin_printf ("Alignment was %ld\n", addr); > > > abort (); > > > } > > > } -- H.J.