------- Comment #10 from steven at gcc dot gnu dot org 2006-02-19 13:41 -------
I modified the test case a bit to make it easier to understand what is going
on:
void
do_sort (int *lst, int cnt)
{
  int i, j, k;
  for (i = 0; i < cnt - 1; i++)
    {
      for (j = i + 1; j < cnt; j++)
        {
          int lsti = lst[i];
          int lstj = lst[j];
          if (lsti > lstj)
            {
              lst[i] = lstj;
              lst[j] = lsti;
            }
        }
    }
}
This gives two very different inner loops:
GCC 4.0:
.L6:
        movl    -4(%esi), %ecx
        movl    (%edx), %eax
        cmpl    %eax, %ecx
        jle     .L7
        movl    %eax, -4(%esi)
        movl    %ecx, (%edx)
.L7:
        addl    $1, %ebx
        addl    $4, %edx
        cmpl    %edi, %ebx
        jne     .L6
GCC 4.1:
.L6:
        movl    8(%ebp), %ebx
        movl    -4(%ebx,%eax,4), %ebx
        movl    %ebx, -20(%ebp)
        movl    4(%ecx), %esi
        movl    %esi, -24(%ebp)
        cmpl    %esi, %ebx
        jle     .L7
        movl    8(%ebp), %ebx
        movl    %esi, -4(%ebx,%eax,4)
        movl    -20(%ebp), %esi
        movl    %esi, 4(%ecx)
.L7:
        addl    $1, -28(%ebp)
        addl    $4, %ecx
        cmpl    -28(%ebp), %edi
        jg      .L6
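So there are two problems:
- The addressing modes are different: GCC 4.1 addresses lst[i] as
  base+index, -4(%ebx,%eax,4), where GCC 4.0 uses a plain pointer,
  -4(%esi). This is due to the TARGET_MEM_REF stuff that Zdenek added.
- GCC 4.1 apparently needs at least one more register, judging from the
  extra stack loads and stores (8(%ebp), -20(%ebp), -24(%ebp), -28(%ebp)).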
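To make the first point easier to see at the source level, here is a rough C rendering of the two inner-loop shapes. This is only a sketch read off the assembly above, not what the compiler holds internally; the names do_sort_ptr and do_sort_indexed are made up for illustration.

```c
/* Roughly the GCC 4.0 shape: lst[i] and lst[j] are both reached through
   plain pointers, and the inner pointer advances by one element (4 bytes
   on this target) per iteration.  */
void
do_sort_ptr (int *lst, int cnt)
{
  int i;
  for (i = 0; i < cnt - 1; i++)
    {
      int *pi = &lst[i];        /* loop-invariant address of lst[i] */
      int *pj = &lst[i + 1];    /* walks lst[i+1] .. lst[cnt-1] */
      int *end = lst + cnt;
      for (; pj != end; pj++)
        {
          int lsti = *pi;
          int lstj = *pj;
          if (lsti > lstj)
            {
              *pi = lstj;
              *pj = lsti;
            }
        }
    }
}

/* Roughly the GCC 4.1 shape: lst[j] is still a walking pointer, but
   lst[i] is re-addressed as base + index on every access, which keeps
   both the base and the index live across the inner loop.  */
void
do_sort_indexed (int *lst, int cnt)
{
  int i;
  for (i = 0; i < cnt - 1; i++)
    {
      int *pj = &lst[i + 1];
      int *end = lst + cnt;
      for (; pj != end; pj++)
        {
          int lsti = lst[i];    /* base + index form */
          int lstj = *pj;
          if (lsti > lstj)
            {
              lst[i] = lstj;
              *pj = lsti;
            }
        }
    }
}
```

Both functions should produce the same result as do_sort; whether a given compiler keeps these shapes in the generated code is not guaranteed.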
Interestingly, if I change the test case to:
void
do_sort (int *lst, int cnt)
{
  int i, j, k;
  for (i = 0; i < cnt - 1; i++)
    {
      for (j = 0/*i + 1*/; j < cnt; j++)
        {
          int lsti = lst[i];
          int lstj = lst[j];
          if (lsti > lstj)
            {
              lst[i] = lstj;
              lst[j] = lsti;
            }
        }
    }
}
then the code produced by GCC 4.1 is 20% faster than the code GCC 4.0
produces.
Zdenek, this really looks like one for you...
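For anyone who wants to repeat the timing comparison, a minimal harness could look like the sketch below. Everything in it is an assumption for illustration: the helper time_sort, the names do_sort_from_i and do_sort_from_0 for the two test-case variants, and the input data are not from the original measurements. Note that the j = 0 variant is only interesting for its generated code; it does not leave the array sorted in general.

```c
#include <stdlib.h>
#include <time.h>

typedef void (*sort_fn) (int *lst, int cnt);

/* Original test case: inner loop starts at i + 1.  */
void
do_sort_from_i (int *lst, int cnt)
{
  int i, j;
  for (i = 0; i < cnt - 1; i++)
    for (j = i + 1; j < cnt; j++)
      {
        int lsti = lst[i];
        int lstj = lst[j];
        if (lsti > lstj)
          {
            lst[i] = lstj;
            lst[j] = lsti;
          }
      }
}

/* Modified test case: inner loop starts at 0.  */
void
do_sort_from_0 (int *lst, int cnt)
{
  int i, j;
  for (i = 0; i < cnt - 1; i++)
    for (j = 0; j < cnt; j++)
      {
        int lsti = lst[i];
        int lstj = lst[j];
        if (lsti > lstj)
          {
            lst[i] = lstj;
            lst[j] = lsti;
          }
      }
}

/* Hypothetical helper: run FN REPS times on CNT pseudo-random ints and
   return the accumulated CPU time in seconds, or -1.0 on bad arguments
   or allocation failure.  The same data is regenerated for every run.  */
double
time_sort (sort_fn fn, int cnt, int reps)
{
  int *buf;
  unsigned int seed;
  clock_t start, total = 0;
  int r, n;

  if (cnt <= 0 || reps <= 0)
    return -1.0;
  buf = malloc ((size_t) cnt * sizeof (int));
  if (buf == NULL)
    return -1.0;
  for (r = 0; r < reps; r++)
    {
      seed = 12345u;
      for (n = 0; n < cnt; n++)
        {
          seed = seed * 1103515245u + 12345u;   /* simple LCG */
          buf[n] = (int) ((seed >> 16) & 0x7fff);
        }
      start = clock ();
      fn (buf, cnt);
      total += clock () - start;
    }
  free (buf);
  return (double) total / CLOCKS_PER_SEC;
}
```

Building the same file with each compiler, using the same options, and calling for example time_sort (do_sort_from_i, 10000, 10) and time_sort (do_sort_from_0, 10000, 10) gives numbers that can be compared between the two compilers.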
--
steven at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|steven at gcc dot gnu dot   |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26290