It indicated that sibling calling optimization in main should
be disabled for targets that need to up the stack alignment,
otherwise you get the stack alignment of a lower one than

While that may be true, I think the problem is broader.

I took out the main1() function and put it into a separate
file, and compiled just that. So now there is no carnal
knowledge of main or its stack alignment. The generated
code for this stand-alone main1() makes no attempt to
align the stack or the stack variables it is going to be
passing to the movdqa instruction. Unless that's what you
mean by:
that is required.  You have to look to see what changed
between 3.4.0 and 4.0.0 that caused this since it is a
regression.  I think the issue is that we are detecting them
at the tree level but not rejecting them when expanding.  So you
have to look at the expand functions for that.

You're using internals verbiage that's beyond me :) I'm a
simple porter, I have very little understanding of the actual
internals of GCC.

The reason nobody noticed this before is that most x86 OSes
nowadays align the stack going into main to 16 bytes, which
is what my comment about fixing your OS was about; it was
more of a joke than anything else.

OK, I apologise, Andrew. I took it as a SCO-bash. My bad.

However, I don't think the stack being aligned on a 16-byte
boundary into main will help, unless GCC is assuming (and I
don't see how it possibly could) that every function would
likewise be aligned. The fact that a stand-alone version of
main1() was not correctly aligned leads me to believe that
the real error is that gcc is not making an attempt to
align the stack variables for use by the alignment-sensitive
vector insns.
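
One way to test that suspicion on a given compiler is to check the
addresses of the local arrays modulo 16. The sketch below is only
illustrative: is_aligned() and locals_are_16_aligned() are hypothetical
helper names, not part of GCC or of the testcase, and the result depends
on the compiler version, the flags used, and the alignment of the
incoming stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of `alignment` bytes. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Mirrors the shape of main1(): two plain int[8] locals with no explicit
   alignment request.  Returns 1 only if both arrays happen to land on a
   16-byte boundary, which is what movdqa requires of its memory operand. */
int locals_are_16_aligned(void)
{
    int b[8] = {0, 3, 6, 9, 12, 15, 18, 21};
    int a[8];

    a[0] = b[0];  /* touch both arrays so they are really used */
    return is_aligned(a, 16) && is_aligned(b, 16);
}
```

Calling locals_are_16_aligned() from functions at several call depths
would show whether the alignment holds in general or only by luck.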

Also, when you say "stack going into main is 16 byte aligned",
what specifically do you mean? That it's 16-byte aligned before
the call to main() itself? That at the first insn in main, most
likely a push %ebp, it's 16-byte aligned (i.e., does the call
to main from crt1.o have to take the push of the return address
into account)?
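
For reference, the arithmetic behind that question on i386 can be
sketched as follows. A call pushes a 4-byte return address and
pushl %ebp pushes another 4 bytes, so the answer shifts by 4 or 8 bytes
depending on where the alignment is measured. The function names are
made up for illustration, and the sketch only models the pointer
arithmetic; it says nothing about what any particular crt1.o does.

```c
#include <stdint.h>

/* %esp at the first insn of the callee: `call` has pushed the
   4-byte return address. */
uint32_t esp_at_entry(uint32_t esp_before_call)
{
    return esp_before_call - 4;
}

/* %ebp after the usual prologue `pushl %ebp; movl %esp, %ebp`:
   another 4 bytes below the entry value. */
uint32_t ebp_after_prologue(uint32_t esp_before_call)
{
    return esp_at_entry(esp_before_call) - 4;
}
```

So if %esp is a multiple of 16 just before the call, it is 12 mod 16 at
the first insn and %ebp ends up 8 mod 16 after the prologue.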

Kean

PS, here is the generated assembly for main1() as a stand-alone
function, nothing else defined in the .c file:

        .file   "foo.c"
        .version        "01.01"
        .section        .rodata
        .align 32
        .type   C.0.1458, @object
        .size   C.0.1458, 32
C.0.1458:
        .long   0
        .long   3
        .long   6
        .long   9
        .long   12
        .long   15
        .long   18
        .long   21
        .text
        .align 16
        .globl  main1
        .type   main1, @function
main1:
        pushl   %ebp
        movl    $8, %ecx
        movl    %esp, %ebp
        pushl   %edi
        cld
        pushl   %esi
        leal    -40(%ebp), %edi
        subl    $64, %esp
        movl    $C.0.1458, %esi
        rep
        movsl
        xorl    %edx, %edx
        leal    -40(%ebp), %esi
        leal    -72(%ebp), %ecx
        .align 16
.L2:
        leal    0(,%edx,4), %eax
        addl    $4, %edx
        cmpl    $8, %edx
        movdqa  (%esi,%eax), %xmm0
        movdqa  %xmm0, (%ecx,%eax)
        jne     .L2
        movb    $1, %dl
        .align 16
.L4:
        movl    -4(%ecx,%edx,4), %eax
        cmpl    -4(%esi,%edx,4), %eax
        jne     .L14
        incl    %edx
        cmpl    $9, %edx
        jne     .L4
        addl    $64, %esp
        xorl    %eax, %eax
        popl    %esi
        popl    %edi
        popl    %ebp
        ret
.L14:
        call    abort
        .size   main1, .-main1
        .ident  "GCC: (GNU) 4.0.3 20051013 (prerelease)"
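
The two frame offsets used by the movdqa loop in the listing above,
-40(%ebp) and -72(%ebp), can be plugged into a small model to see which
incoming stack alignments would give movdqa legal operands. This is a
sketch under one reading of the listing, with hypothetical function
names; the index in %eax only takes the values 0 and 16 in that loop,
so it does not change the alignment modulo 16.

```c
#include <stdint.h>

/* Frame offsets copied from the listing: b[] lives at -40(%ebp),
   a[] at -72(%ebp). */
enum { OFFSET_B = 40, OFFSET_A = 72 };

/* Hypothetical model: 1 if both movdqa memory operands are
   16-byte aligned for the given %ebp value, else 0. */
int movdqa_operands_aligned(uint32_t ebp)
{
    return ((ebp - OFFSET_B) % 16 == 0) && ((ebp - OFFSET_A) % 16 == 0);
}

/* %ebp inside main1 after `call main1; pushl %ebp; movl %esp, %ebp`,
   given the caller's %esp just before the call. */
uint32_t ebp_for_caller_esp(uint32_t esp_before_call)
{
    return esp_before_call - 8;
}
```

Under this model both operands are aligned only when the caller's %esp
was a multiple of 16 immediately before the call, that is, when %ebp is
8 mod 16 inside main1. Any other 4-byte-aligned incoming stack would
leave them misaligned.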

# cat foo.c
#define N 8

int main1 ()
{
  int b[N] = {0,3,6,9,12,15,18,21};
  int a[N];
  int i;

  for (i = 0; i < N; i++)
    {
      a[i] = b[i];
    }

  /* check results:  */
  for (i = 0; i < N; i++)
    {
      if (a[i] != b[i])
        abort ();
    }

  return 0;
}
