Hello!

As shown in PR 66697 [1] and WineHQ bug [2], an application can misalign the incoming stack to less than the ABI-mandated 16 bytes. While it is possible to use -mincoming-stack-boundary=2 (= 4 bytes) for 32-bit targets to emit stack realignment code, this option is artificially limited to a minimum of 4 (= 16 bytes) for 64-bit targets.
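
For reference, here is a minimal sketch of how the option value maps to bytes and how the lower limit differs before and after the patch. The helper names are hypothetical and are not GCC code; only the numeric limits (2, 3, 4 and the upper bound of 12) are taken from the patch itself.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not part of GCC. */

/* -mincoming-stack-boundary=N requests an alignment of 2^N bytes. */
static inline unsigned boundary_bytes (int n)
{
  return 1u << n;
}

/* Lower limit before the patch: 4 for 64-bit SSE targets,
   3 for 64-bit targets without SSE, 2 for 32-bit targets.  */
static inline int min_boundary_old (bool is_64bit, bool has_sse)
{
  return is_64bit ? (has_sse ? 4 : 3) : 2;
}

/* Lower limit after the patch: 3 for all 64-bit targets, 2 otherwise.  */
static inline int min_boundary_new (bool is_64bit)
{
  return is_64bit ? 3 : 2;
}

/* Range check in the style of the patched code: min <= N <= 12.  */
static inline bool boundary_ok (int n, int min)
{
  return n >= min && n <= 12;
}
```

Under these assumptions, boundary_ok (3, min_boundary_old (true, true)) is false, while boundary_ok (3, min_boundary_new (true)) is true, which is the only behavioural change.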

The attached patch lowers this limitation to 3 (= 8 bytes, which is actually the minimum amount by which the stack can be misaligned) for 64-bit targets. The "outside" code is out of the user's control, and the last resort is -mincoming-stack-boundary=3, which emits realignment code for all functions.

So, for the following testcase:

--cut here--
typedef float v4sf __attribute__((vector_size(16)));

v4sf test (v4sf a, v4sf b)
{
  volatile v4sf z = a + b;

  return z;
}
--cut here--

gcc -O2 -mincoming-stack-boundary=3 generates:

0000000000000000 <test>:
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   0f 58 c8                addps  %xmm0,%xmm1
   8:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   c:   41 ff 72 f8             pushq  -0x8(%r10)
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   41 52                   push   %r10
  16:   0f 29 4d e0             movaps %xmm1,-0x20(%rbp)
  1a:   0f 28 45 e0             movaps -0x20(%rbp),%xmm0
  1e:   41 5a                   pop    %r10
  20:   5d                      pop    %rbp
  21:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  25:   c3                      retq

instead of:

0000000000000000 <test>:
   0:   0f 58 c8                addps  %xmm0,%xmm1
   3:   0f 29 4c 24 e8          movaps %xmm1,-0x18(%rsp)
   8:   0f 28 44 24 e8          movaps -0x18(%rsp),%xmm0
   d:   c3                      retq

IMO, the additional stack realignment code is also a good punishment for a rogue application :)

2015-10-04  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.c (ix86_option_override_internal): Lower minimum
	allowed incoming stack boundary to 3 also for 64bit SSE targets.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66697
[2] https://bugs.winehq.org/show_bug.cgi?id=27680

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 228460)
+++ config/i386/i386.c	(working copy)
@@ -5102,8 +5102,7 @@ ix86_option_override_internal (bool main_args_p,
   ix86_incoming_stack_boundary = ix86_default_incoming_stack_boundary;
   if (opts_set->x_ix86_incoming_stack_boundary_arg)
     {
-      int min = (TARGET_64BIT_P (opts->x_ix86_isa_flags)
-		 ? (TARGET_SSE_P (opts->x_ix86_isa_flags) ? 4 : 3) : 2);
+      int min = TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 3 : 2;
       if (opts->x_ix86_incoming_stack_boundary_arg < min
 	  || opts->x_ix86_incoming_stack_boundary_arg > 12)