------- Comment #14 from dave dot korn dot cygwin at gmail dot com 2009-01-25 06:05 -------

Adding "-mpreferred-stack-boundary=2" to the command line generates correct code.
Here are the diffs between code generated by that setting and the default (-mpreferred-stack-boundary=4) for the start of the function:

--- eh-sb2.s    2009-01-25 05:24:46.718750000 +0000
+++ eh-sb4.s    2009-01-25 05:26:19.187500000 +0000
@@ -10,19 +10,20 @@
 _main:
        pushl   %ebp
        movl    %esp, %ebp
+       andl    $-16, %esp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
-       subl    $68, %esp
-       movl    $___gxx_personality_sj0, -40(%ebp)
-       movl    $LLSDA0, -36(%ebp)
-       leal    -32(%ebp), %eax
-       leal    -12(%ebp), %edx
+       subl    $84, %esp
+       movl    $___gxx_personality_sj0, 52(%esp)
+       movl    $LLSDA0, 56(%esp)
+       leal    60(%esp), %eax
+       leal    80(%esp), %edx
        movl    %edx, (%eax)
        movl    $L5, %edx
        movl    %edx, 4(%eax)
        movl    %esp, 8(%eax)
-       leal    -64(%ebp), %eax
+       leal    28(%esp), %eax
        movl    %eax, (%esp)
        call    __Unwind_SjLj_Register

So.... I think I'm starting to grok what's happening here. Because of the larger stack alignment required, and because the incoming stack alignment is only 8, not 16, we have to use an AND to mask and align the incoming esp. That means we have a hole of unknown size in our stack frame, just below the frame pointer at the top end. Because the size of this gap is unknown, we can't index down from the frame pointer %ebp to the rest of the stack frame any more, which is why we have to turn the elimination basis upside down and calculate all the eliminations upward from esp instead.

(The gap is in fact composed of two components: the dynamic adjustment needed to align the incoming stack, which cannot be known at compile time, and the extra space allocated to the stack frame to ensure its size is a multiple of the alignment, so that the lower end of the frame is also aligned. Although this second part is known at compile time, as long as the first part is unpredictable we have to do the eliminations from the stack base, not the frame pointer.)
This is all fine for most stack frame contents, but it goes wrong in exactly the same-but-opposite way if we're trying to access items of the stack frame *above* the gap - and that's what's happening in my test case, because we're trying to get the address of HARD_FRAME_POINTER, aka the value in %ebp, which sits 4 or 8 bytes below the ARG_POINTER (a compile-time known constant offset).

So the one or two items above the gap - the frame pointer and the return pc value - would still have to be eliminated against HARD_FRAME_POINTER and denied elimination against STACK_POINTER in the case where there is going to be stack realignment in the prologue. (Does this bug affect __builtin_return_address(0) as well, by any chance? I haven't checked.) And that is presumably the intention of this if clause in ix86_can_eliminate:

  if (stack_realign_fp)
    return ((from == ARG_POINTER_REGNUM
             && to == HARD_FRAME_POINTER_REGNUM)
            || (from == FRAME_POINTER_REGNUM
                && to == STACK_POINTER_REGNUM));
  else
    [ ... ]

I'll look at why it's not doing what it's supposed to. One possibility is that stack_realign_fp isn't becoming true until after the elimination has already taken place.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38952