Using gcc 4.4.4 -Os on
void loop(long *to, long *from, long len)
{
	/* copy len longs, pre-incrementing both pointers */
	for (; len; --len)
		*++to = *++from;
}
I get
/* gcc 4.4.4 -Os
loop:
addi 5,5,1
li 9,0
mtctr 5
b .L2
.L3:
lwzx 0,4,9
stwx 0,3,9
.L2:
addi 9,9,4
bdnz .L3
blr
*/
gcc 3.4.6 has:
/* gcc 3.4.6 -Os
loop:
mr. 0,5
mtctr 0
beqlr- 0
.L8:
lwzu 0,4(4)
stwu 0,4(3)
bdnz .L8
blr
*/
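For anyone comparing the two listings, here is how I read them in C terms. This is only a sketch: loop_update and loop_indexed are names made up for illustration, and the indexed version is an interpretation of the 4.4.4 assembly (a separate offset in r9 stepped by 4 each pass), not something gcc printed. It assumes len is not negative.

```c
/* Pointer pre-increment form, same as the test case above.
 * This is the shape that maps onto lwzu/stwu (load/store with update). */
void loop_update(long *to, long *from, long len)
{
	for (; len; --len)
		*++to = *++from;
}

/* Indexed form: roughly what the 4.4.4 output computes.
 * The base pointers stay fixed and a separate index advances,
 * which matches lwzx/stwx plus an extra addi per iteration. */
void loop_indexed(long *to, long *from, long len)
{
	long i;

	for (i = 1; i <= len; ++i)
		to[i] = from[i];
}
```

Both functions copy elements 1..len and leave element 0 alone. Counting instructions in the listings, the 4.4.4 loop body is four instructions (lwzx, stwx, addi, bdnz) against three for 3.4.6 (lwzu, stwu, bdnz), and nine against seven for the whole function.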
It doesn't matter which CPU type I select. It seems impossible
to make the newer gcc produce smaller/faster code.
Perhaps lwzx/stwx is faster on bigger Power CPUs, but that
can't be true for all CPUs, can it?
That shouldn't matter though, because I asked gcc to produce smaller
code with -Os.
Jocke