Hi! Apparently estimated_stack_frame_size -> expand_one_var -> add_stack_var -> align_local_variable changes DECL_ALIGN of stack variables from LOCAL_DECL_ALIGNMENT.
That is unfortunately very undesirable for offloading, because this happens early during IPA where we stream out LTO bytecode for the target offloading after it, so on the PR90811 testcase x86_64 backend says a local variable would be best 128-bit aligned, but in the PTX backend such alignment means runtime (virtual) stack adjustments because only 64-bit alignment is guaranteed. The following patch makes sure we don't update DECL_ALIGN during stack frame size estimation, just during actual expansion. Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix reverted to verify the overaligned variables are gone. Ok for trunk? Or, is there something in GIMPLE passes that would benefit from the updated DECL_ALIGN during post-IPA optimizations, rather than just in RTL passes? If yes, we should call those somewhere post-IPA on top of this patch. Nothing in the testsuite showed it though. 2019-06-12 Jakub Jelinek <ja...@redhat.com> PR target/90811 * cfgexpand.c (align_local_variable): Add really_expand argument, don't SET_DECL_ALIGN if it is false. (add_stack_var): Add really_expand argument, pass it through to align_local_variable. (expand_one_stack_var_1): Pass true as really_expand to align_local_variable. (expand_one_ssa_partition): Pass true as really_expand to add_stack_var. (expand_one_var): Pass really_expand through to add_stack_var. --- gcc/cfgexpand.c.jj 2019-05-20 11:39:34.059816432 +0200 +++ gcc/cfgexpand.c 2019-06-11 11:28:08.720932421 +0200 @@ -361,7 +361,7 @@ static bool has_short_buffer; we can't do with expected alignment of the stack boundary. */ static unsigned int -align_local_variable (tree decl) +align_local_variable (tree decl, bool really_expand) { unsigned int align; @@ -370,7 +370,8 @@ align_local_variable (tree decl) else { align = LOCAL_DECL_ALIGNMENT (decl); - SET_DECL_ALIGN (decl, align); + if (really_expand) + SET_DECL_ALIGN (decl, align); } return align / BITS_PER_UNIT; } @@ -418,7 +419,7 @@ alloc_stack_frame_space (poly_int64 size /* Accumulate DECL into STACK_VARS. */ static void -add_stack_var (tree decl) +add_stack_var (tree decl, bool really_expand) { struct stack_var *v; @@ -446,7 +447,7 @@ add_stack_var (tree decl) variables that are simultaneously live. */ if (known_eq (v->size, 0U)) v->size = 1; - v->alignb = align_local_variable (decl); + v->alignb = align_local_variable (decl, really_expand); /* An alignment of zero can mightily confuse us later. */ gcc_assert (v->alignb != 0); @@ -1323,7 +1324,7 @@ expand_one_stack_var_1 (tree var) else { size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var)); - byte_align = align_local_variable (var); + byte_align = align_local_variable (var, true); } /* We handle highly aligned variables in expand_stack_vars. */ @@ -1413,7 +1414,7 @@ expand_one_ssa_partition (tree var) if (!use_register_for_decl (var)) { if (defer_stack_allocation (var, true)) - add_stack_var (var); + add_stack_var (var, true); else expand_one_stack_var_1 (var); return; @@ -1695,7 +1696,7 @@ expand_one_var (tree var, bool toplevel, } } else if (defer_stack_allocation (var, toplevel)) - add_stack_var (origvar); + add_stack_var (origvar, really_expand); else { if (really_expand) Jakub