Take the following two functions. They should produce the same asm, but the
second is better on powerpc, at least for the inner loop (6 instructions vs 8,
not counting the bdnz):
void daxpy(int n, float da, float dx[], int incx, float dy[], int incy)
{
  int i,ix=0,iy=0,m,mp1;
  mp1 = 0;
  m = 0;
  for (i = 0;i < n; i++){
    dy[iy] = dy[iy] + dx[ix];
    ix = ix + incx;
    iy = iy + incy;
  }
}

void daxpy1(int n, float da, float dx[], int incx, float dy[], int incy)
{
  int i,ix=0,iy=0,m,mp1;
  mp1 = 0;
  m = 0;
  for (i = 0;i < n; i++){
    *(float*)(((char*)dy)+iy) = *(float*)(((char*)dy)+iy) +
                                *(float*)(((char*)dx)+ix);
    ix = ix + incx*4;
    iy = iy + incy*4;
  }
}
Inner loop for the first one:
L4:
        slwi r2,r9,2
        slwi r0,r11,2
        lfsx f13,r5,r0
        add r11,r11,r6
        lfsx f0,r7,r2
        add r9,r9,r8
        fadds f0,f0,f13
        stfsx f0,r7,r2
        bdnz L4
Inner loop for the second one:
L11:
        lfsx f0,r7,r0
        lfsx f13,r5,r2
        add r2,r2,r6
        fadds f0,f0,f13
        stfsx f0,r7,r0
        add r0,r0,r8
        bdnz L11
Yes, this shows up in real code.
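
For illustration only, here is a rough source-level sketch of the strength
reduction one would hope the IV optimizer performs on the first function:
the induction variables become the addresses themselves, so the two slwi
shifts drop out of the loop. The names add_indexed, add_pointer and
loops_agree are hypothetical and not part of the testcase, the unused da, m
and mp1 are left out, and the checker assumes non-negative strides that stay
inside its 64-element buffers.

```c
#include <assert.h>
#include <string.h>

/* Reference loop: same body as daxpy above, minus the unused variables. */
static void add_indexed(int n, const float dx[], int incx, float dy[], int incy)
{
    int ix = 0, iy = 0;
    for (int i = 0; i < n; i++) {
        dy[iy] = dy[iy] + dx[ix];
        ix += incx;   /* element index: must be scaled by 4 on each use */
        iy += incy;
    }
}

/* Hypothetical hand strength-reduced form: the induction variables are
   the addresses, so no scaling is needed inside the loop. */
static void add_pointer(int n, const float *dx, int incx, float *dy, int incy)
{
    for (int i = 0; i < n; i++) {
        *dy = *dy + *dx;
        dx += incx;   /* advances by incx * sizeof(float) bytes */
        dy += incy;
    }
}

/* Returns 1 when both loops leave identical results in dy.
   Assumption: strides are non-negative and fit the local buffers. */
static int loops_agree(int n, int incx, int incy)
{
    enum { N = 64 };
    float x[N], y1[N], y2[N];

    assert(n >= 0 && incx >= 0 && incy >= 0);
    assert(n * incx <= N && n * incy <= N);

    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y1[i] = y2[i] = (float)(2 * i);
    }
    add_indexed(n, x, incx, y1, incy);
    add_pointer(n, x, incx, y2, incy);
    return memcmp(y1, y2, sizeof y1) == 0;
}
```

Calling loops_agree(8, 1, 1) or loops_agree(4, 2, 3) compares the two forms on
small inputs. This only demonstrates that the rewrite preserves the results;
it says nothing about what any particular GCC version emits for either form.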
--
Summary: Missed IV optimization (redundant instruction in loop)
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pinskia at gcc dot gnu dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: powerpc-darwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19126