It indicated that sibling call optimization in main should be disabled for targets that need to increase the stack alignment, otherwise you get the stack alignment of a lower one than
While that may be true, I think the problem is broader.
I took out the main1() function and put it into a separate file, and compiled just that. So now there is no carnal knowledge of main or its stack alignment. The generated code for this stand-alone main1() makes no attempt to align the stack or the stack variables it is going to be passing to the movdqa instruction. Unless that's what you mean by:
that is required. You have to look to see what changed between 3.4.0 and 4.0.0 that caused this since it is a regression. I think the issue is that we are detecting them at the tree level but not rejecting them when expanding. So you have to look at the expand functions for that.
You're using internals verbiage that's beyond me :) I'm a simple porter, I have very little understanding of the actual internals of GCC.
The reason nobody noticed this before is that most x86 OSes nowadays align the stack to 16 bytes going into main, which is what my comment about fixing your OS was about; it was more of a joke than anything else.
OK, I apologise, Andrew. I took it as a SCO-bash. My bad. However, I don't think the stack being aligned on a 16-byte boundary going into main will help, unless GCC is assuming (and I don't see how it possibly could) that every function would likewise be aligned. The fact that a stand-alone version of main1() was not correctly aligned leads me to believe that the real error is that GCC is not making an attempt to align the stack variables for use by the alignment-sensitive vector insns.

Also, when you say "stack going into main is 16 byte aligned", what specifically do you mean? That it's 16-byte aligned before the call to main() itself? Or that at the first insn in main, most likely a push %ebp, it's 16-byte aligned (i.e. does the call to main from crt1.o have to take the push of the return address into account)?

Kean

PS, here is the generated assembly for main1() as a stand-alone function, nothing else defined in the .c file:

	.file	"foo.c"
	.version	"01.01"
	.section	.rodata
	.align 32
	.type	C.0.1458, @object
	.size	C.0.1458, 32
C.0.1458:
	.long	0
	.long	3
	.long	6
	.long	9
	.long	12
	.long	15
	.long	18
	.long	21
	.text
	.align 16
.globl main1
	.type	main1, @function
main1:
	pushl	%ebp
	movl	$8, %ecx
	movl	%esp, %ebp
	pushl	%edi
	cld
	pushl	%esi
	leal	-40(%ebp), %edi
	subl	$64, %esp
	movl	$C.0.1458, %esi
	rep
	movsl
	xorl	%edx, %edx
	leal	-40(%ebp), %esi
	leal	-72(%ebp), %ecx
	.align 16
.L2:
	leal	0(,%edx,4), %eax
	addl	$4, %edx
	cmpl	$8, %edx
	movdqa	(%esi,%eax), %xmm0
	movdqa	%xmm0, (%ecx,%eax)
	jne	.L2
	movb	$1, %dl
	.align 16
.L4:
	movl	-4(%ecx,%edx,4), %eax
	cmpl	-4(%esi,%edx,4), %eax
	jne	.L14
	incl	%edx
	cmpl	$9, %edx
	jne	.L4
	addl	$64, %esp
	xorl	%eax, %eax
	popl	%esi
	popl	%edi
	popl	%ebp
	ret
.L14:
	call	abort
	.size	main1, .-main1
	.ident	"GCC: (GNU) 4.0.3 20051013 (prerelease)"

# cat foo.c
#define N 8

int main1 ()
{
  int b[N] = {0,3,6,9,12,15,18,21};
  int a[N];
  int i;

  for (i = 0; i < N; i++)
    {
      a[i] = b[i];
    }

  /* check results: */
  for (i = 0; i < N; i++)
    {
      if (a[i] != b[i])
        abort ();
    }

  return 0;
}