http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47735
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I wonder what the point is of even looking at the alignment of VAR_DECLs that are the SSA_NAME_VAR of SSA_NAMEs if we're not putting those on the stack.  So perhaps something like:

--- gcc/cfgexpand.c.jj  2013-12-16 09:08:17.000000000 +0100
+++ gcc/cfgexpand.c     2014-01-02 10:04:39.525480578 +0100
@@ -1215,8 +1215,11 @@ expand_one_var (tree var, bool toplevel,
      we conservatively assume it will be on stack even if VAR is
      eventually put into register after RA pass.  For non-automatic
      variables, which won't be on stack, we collect alignment of
-     type and ignore user specified alignment.  */
-  if (TREE_STATIC (var) || DECL_EXTERNAL (var))
+     type and ignore user specified alignment.  Similarly for
+     SSA_NAMEs for which use_register_for_decl returns true.  */
+  if (TREE_STATIC (var)
+      || DECL_EXTERNAL (var)
+      || (TREE_CODE (origvar) == SSA_NAME && use_register_for_decl (var)))
     align = MINIMUM_ALIGNMENT (TREE_TYPE (var),
                                TYPE_MODE (TREE_TYPE (var)),
                                TYPE_ALIGN (TREE_TYPE (var)));

That said, I really wonder whether, besides the estimated stack alignment, we shouldn't also track what we really need, i.e. record stack alignment requirements without any pessimistic assumptions, and only bump them when we actually allocate something on the stack that needs bigger alignment (when we create a MEM DECL_RTL, when, say, assign_stack_temp* creates a stack slot that needs bigger alignment, when RA spills something that needs bigger alignment, etc.).  RA etc. would work as is, but ix86_finalize_stack_realign_flags would look at the actual value instead.
Consider, say:

typedef double m256 __attribute__((vector_size (32)));
m256 bar (m256 x, m256 y);
m256
foo (m256 x, m256 y, m256 z)
{
  return bar (x + z, y - z) + (m256) { 1.0, 2.0, 3.0, 4.0 };
}

For foo we currently emit:

	vaddpd	%ymm2, %ymm0, %ymm0
	pushq	%rbp
	vsubpd	%ymm2, %ymm1, %ymm1
	movq	%rsp, %rbp
	andq	$-32, %rsp
	call	bar
	vaddpd	.LC0(%rip), %ymm0, %ymm0
	leave
	ret

The pushq %rbp; movq %rsp, %rbp; andq $-32, %rsp; leave sequence all seems completely unnecessary to me (well, some push/pop or rsp -= 8/+= 8 would be needed to maintain 128-bit stack alignment): bar doesn't take any arguments on the stack, there is no V4DFmode spilling, etc.  For leaf functions, ix86_finalize_stack_realign_flags already manages to avoid that if the stack pointer is never touched and the frame pointer isn't needed.  I guess by adding another integer to x_rtl and tracking this carefully we could get rid of the dynamic stack realignment here, though frame_pointer_needed would likely still be set.  I wonder if we couldn't optimize that away too in some cases (unless the user requested a frame pointer), if the frame pointer register is unused or is only used to look at arguments before the stack is first decremented.