On Fri, May 27, 2016 at 07:45:18AM +0200, Christophe Leroy wrote:
> >>Wouldn't it be better to add nops before the function entry in order to
> >>get the hot loop aligned, instead of adding nops in the middle of the
> >>function?
> >Why would that be better?  The nops are executed once per function call
> >in either case, there are the same number of nops in either case, and
> >on most CPUs nops aren't actually executed anyway (they are decoded and
> >then thrown away).
> >
> The idea was to not execute them:
>
>     .balign 16
>     nop
>     nop
> _GLOBAL(strcpy)
>     addi    r5,r3,-1
>     addi    r4,r4,-1
> 1:  lbzu    r0,1(r4)
>     cmpwi   0,r0,0
>     stbu    r0,1(r5)
>     bne     1b
>     blr

That performs _worse_ on most modern CPUs (the first decode will decode
less, so instructions are available for execution later).  That's why
functions are aligned in the first place!


Segher
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
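
The layout being discussed can be checked with a little arithmetic. The sketch below is only an illustration, not a model of any particular CPU: it relies on PowerPC instructions being 4 bytes each, and it assumes a hypothetical 16-byte fetch/decode block (chosen to match the `.balign 16` in the quoted code). The helper names `layout` and `insns_in_first_block` are made up for this example.

```python
# Sketch only: models instruction placement, not a real CPU front end.
INSN_BYTES = 4    # every PowerPC instruction is 4 bytes
BLOCK_BYTES = 16  # ASSUMPTION: fetch/decode block size, matching .balign 16


def layout(leading_nops, insns_before_loop):
    """Return (entry_offset, loop_offset) relative to the .balign boundary."""
    entry = leading_nops * INSN_BYTES
    loop = entry + insns_before_loop * INSN_BYTES
    return entry, loop


def insns_in_first_block(offset):
    """Instructions between `offset` and the end of its 16-byte block."""
    return (BLOCK_BYTES - offset % BLOCK_BYTES) // INSN_BYTES


# Quoted proposal: two nops, then _GLOBAL(strcpy), then two addi before "1:".
entry, loop = layout(leading_nops=2, insns_before_loop=2)
print(entry, loop)                  # 8 16
print(insns_in_first_block(entry))  # 2
print(insns_in_first_block(0))      # 4
```

Under these assumptions the loop label does land on a 16-byte boundary, but the function entry sits 8 bytes into its block, so only two instructions remain in the first block after entry, against four for an aligned entry. That is the trade-off the reply objects to.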