Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> This fixes PR103500 i.e. ensuring that stack slots for
> passed-by-reference overaligned types are appropriately aligned. For the
> testcase:
>
> typedef struct __attribute__((aligned(32))) {
>   long x,y;
> } S;
> S x;
> void f(S);
> void g(void) { f(x); }
>
> on AArch64, we currently generate (at -O2):
>
> g:
>         adrp    x1, .LANCHOR0
>         add     x1, x1, :lo12:.LANCHOR0
>         stp     x29, x30, [sp, -48]!
>         mov     x29, sp
>         ldp     q0, q1, [x1]
>         add     x0, sp, 16
>         stp     q0, q1, [sp, 16]
>         bl      f
>         ldp     x29, x30, [sp], 48
>         ret
>
> so the stack slot for the passed-by-reference copy of the structure is
> at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the
> structure is only 16-byte aligned. The PCS requires the structure to be
> 32-byte aligned. After this patch, we generate:
>
> g:
>         adrp    x1, .LANCHOR0
>         add     x1, x1, :lo12:.LANCHOR0
>         stp     x29, x30, [sp, -64]!
>         mov     x29, sp
>         add     x0, sp, 47
>         ldp     q0, q1, [x1]
>         and     x0, x0, -32
>         stp     q0, q1, [x0]
>         bl      f
>         ldp     x29, x30, [sp], 64
>         ret
>
> i.e. we ensure 32-byte alignment for the struct.
>
> The approach taken here is similar to that in
> function.c:assign_parm_setup_block where it handles the case for
> DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is
> similar to the approach taken in cfgexpand.c:expand_stack_vars (where
> the function calls get_dynamic_stack_size) which is the code that
> handles the alignment for overaligned structures as addressable local
> variables (see the related case discussed in the PR).
> A difference with the latter is that cfgexpand (AFAICT) always honours
> the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly only
> increasing the alignment for efficiency reasons (rather than decreasing
> it). So…
>
> This patch also updates the aapcs64 test mentioned in the PR to avoid
> the frontend folding away the alignment check. I've confirmed that the
> execution test actually fails on aarch64-linux-gnu prior to the patch
> being applied and passes afterwards.
>
> Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and
> arm-linux-gnueabihf: no regressions.
>
> I'd appreciate any feedback. Is it OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>         PR middle-end/103500
>         * function.c (get_stack_local_alignment): Align BLKmode overaligned
>         types to the alignment required by the type.
>         (assign_stack_temp_for_type): Handle BLKmode overaligned stack
>         slots by allocating a larger-than-necessary buffer and aligning
>         the address within appropriately.
>
> gcc/testsuite/ChangeLog:
>
>         PR middle-end/103500
>         * gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref):
>         Prevent the frontend from folding our alignment check away by
>         using snprintf to store the pointer into a string and recovering
>         it with sscanf.
>
> diff --git a/gcc/function.c b/gcc/function.c
> index 61b3bd036b8..5ed722ab959 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode)
>    unsigned int alignment;
>  
>    if (mode == BLKmode)
> -    alignment = BIGGEST_ALIGNMENT;
> +    alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT)
> +      ? TYPE_ALIGN (type)
> +      : BIGGEST_ALIGNMENT;

…I'm not sure about this calculation. Why do we only honour TYPE_ALIGN
if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the way
back to BIGGEST_ALIGNMENT otherwise?
It looks like on nvptx this would have the effect of honouring (say)
2048-byte alignment, but not 32-byte alignment (which falls between
BIGGEST_ALIGNMENT and MAX_SUPPORTED_STACK_ALIGNMENT).

Also, what about the non-BLKmode case? A type's mode cannot be more
aligned than the type, but a type is allowed to be more aligned than
its mode. E.g.:

typedef unsigned int V __attribute__((vector_size(32)));
typedef struct __attribute__((aligned(32))) { V x; } S;
S x;
void f(S);
void g(void) { f(x); }

doesn't seem to be handled by the patch. It's possible to create
variations on this on aarch64 up to 2048 bits using SVE and
-msve-vector-bits=N.

Perhaps we should treat the old alignment calculation as a minimum and
take the max of that and TYPE_ALIGN, where TYPE_ALIGN is given.

>    else
>      alignment = GET_MODE_ALIGNMENT (mode);
>  
> @@ -872,21 +874,35 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type)
>  
>    p = ggc_alloc<temp_slot> ();
>  
> -  /* We are passing an explicit alignment request to assign_stack_local.
> -     One side effect of that is assign_stack_local will not round SIZE
> -     to ensure the frame offset remains suitably aligned.
> -
> -     So for requests which depended on the rounding of SIZE, we go ahead
> -     and round it now.  We also make sure ALIGNMENT is at least
> -     BIGGEST_ALIGNMENT.  */
> -  gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
> -  p->slot = assign_stack_local_1 (mode,
> -                                  (mode == BLKmode
> -                                   ? aligned_upper_bound (size,
> -                                                          (int) align
> -                                                          / BITS_PER_UNIT)
> -                                   : size),
> -                                  align, 0);
> +  if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT)
> +    {
> +      rtx allocsize = gen_int_mode (size, Pmode);
> +      get_dynamic_stack_size (&allocsize, 0, align, NULL);
> +      gcc_assert (CONST_INT_P (allocsize));
> +      size = UINTVAL (allocsize);

Since the size coming in is a poly_int64, I think we should treat the
result in the same way, using rtx_to_poly_int64 rather than requiring
a CONST_INT.
We might not handle that case properly yet, but rtx_to_poly_int64 would
trap mistakes, and using it now would mean fewer surprises later.

Thanks,
Richard

> +      p->slot = assign_stack_local_1 (mode,
> +                                      size,
> +                                      BIGGEST_ALIGNMENT, 0);
> +      rtx addr = align_dynamic_address (XEXP (p->slot, 0), align);
> +      mark_reg_pointer (addr, align);
> +      p->slot = gen_rtx_MEM (GET_MODE (p->slot), addr);
> +      MEM_NOTRAP_P (p->slot) = 1;
> +    }
> +  else
> +    /* We are passing an explicit alignment request to assign_stack_local.
> +       One side effect of that is assign_stack_local will not round SIZE
> +       to ensure the frame offset remains suitably aligned.
> +
> +       So for requests which depended on the rounding of SIZE, we go ahead
> +       and round it now.  We also make sure ALIGNMENT is at least
> +       BIGGEST_ALIGNMENT.  */
> +    p->slot = assign_stack_local_1 (mode,
> +                                    (mode == BLKmode
> +                                     ? aligned_upper_bound (size,
> +                                                            (int) align
> +                                                            / BITS_PER_UNIT)
> +                                     : size),
> +                                    align, 0);
>  
>    p->align = align;
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
> index c9351802191..0b0ac0b9b97 100644
> --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
> @@ -21,10 +21,18 @@ test_pass_by_ref (int x0, overaligned x1, int x2)
>      abort ();
>    if (memcmp ((void *) &x1, (void *)&a, sizeof (overaligned)))
>      abort ();
> -  long addr = ((long) &x1) & 31;
> +
> +  char buf[64];
> +  void *ptr;
> +
> +  __builtin_snprintf (buf, sizeof(buf), "%p", &x1);
> +  if (__builtin_sscanf(buf, "%p", &ptr) != 1)
> +    abort ();
> +
> +  long addr = ((long) ptr) & 31;
>    if (addr != 0)
>      {
> -      __builtin_printf ("Alignment was %d\n", addr);
> +      __builtin_printf ("Alignment was %ld\n", addr);
>        abort ();
>      }
>  }