Using gcc 4.4.4 -Os on

void loop(long *to, long *from, long len)
{
        for (; len; --len)
                *++to = *++from;
}

I get

/* gcc 4.4.4 -Os
loop:
        addi 5,5,1
        li 9,0
        mtctr 5
        b .L2
.L3:
        lwzx 0,4,9
        stwx 0,3,9
.L2:
        addi 9,9,4
        bdnz .L3
        blr
*/
gcc 3.4.6 has:

/* gcc 3.4.6 -Os
loop:
        mr. 0,5
        mtctr 0
        beqlr- 0
.L8:
        lwzu 0,4(4)
        stwu 0,4(3)
        bdnz .L8
        blr
*/

It doesn't matter which CPU type I select. It seems impossible to make newer gcc produce the smaller, faster code: gcc 4.4.4 emits 9 instructions with a 4-instruction loop body, while gcc 3.4.6 emits 7 instructions with a 3-instruction loop body. Perhaps lwzx/stwx is faster on bigger Power CPUs, but that can't be true for all CPUs, can it? And it shouldn't matter anyway, because I asked gcc for smaller code with -Os.

Jocke