On Wed, 2009-06-03 at 08:51 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-06-02 at 20:45 +0200, Albrecht Dreß wrote:
> >
> > which drops the r1 accesses, but still produces the sub-optimal loop.
> > Is this a gcc regression, or did I miss something here? Probably the
> > only bullet-proof way is to write some core loops in assembly... :-/
>
> Well, gcc may be right here. What you call the "optimal" loop uses the
> lwzu instruction. An interesting thing about this instruction is that
> it updates two GPRs at completion (I'm ignoring the load multiple and
> string instructions on purpose here).
> I wouldn't be surprised thus if the loop variant with the separate add
> ends up more efficient on most implementations around.

On an e300 core, the loop using lwzu/stwu is about 20% faster, so at
least one core prefers that optimization.
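
For anyone who wants to compare the two loop shapes on their own core, here
is a minimal sketch in C. It is not the code from earlier in this thread;
the function names are made up for illustration, and whether gcc actually
emits lwzu/stwu for the first variant depends on the compiler version and
options, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: word copy written in pre-increment form.
 * On PowerPC, "*++dst = *++src" is the shape that can map to the
 * update-form instructions: lwzu loads a word and writes the effective
 * address back to the base register (two GPRs updated), and stwu does
 * the same for the store. The compiler is free to choose otherwise.
 */
void copy_words_update_form(uint32_t *dst, const uint32_t *src, size_t n)
{
    if (n == 0)
        return;

    *dst = *src;            /* first word: no pointer update needed */
    while (--n)             /* remaining n - 1 words */
        *++dst = *++src;
}

/*
 * Hypothetical helper: the same copy with indexed accesses, which
 * leaves the address arithmetic as a separate add (or an indexed
 * load/store) instead of folding it into the memory access.
 */
void copy_words_separate_add(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Both functions copy the same data, so timing them over a large buffer on
the target core (and inspecting the output of gcc -S) should show which
form a given implementation prefers.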