On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> Hi David,
>
> > The unrolled loop (deleted) looks excessive.
> > On a modern cpu with multiple execution units you can usually
> > manage to get the loop overhead to execute in parallel to the
> > actual 'work'.
> > So I suspect that a much simpler 'word at a time' loop will be almost as
> > fast - especially in the case where the code isn't
> > already in the cache and the compare is relatively short.
>
> I'm always keen to keep things as simple as possible, but your loop is over
> 50% slower. Once the loop hits a steady state you are going to run into front
> end issues with instruction fetch on POWER8.
>
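For reference, a minimal sketch of what a simple 'word at a time' compare loop could look like in C. The helper name cmp_words, and the assumption that both buffers are 8-byte aligned and a whole number of 64-bit words long, are illustrative only and not taken from the code under discussion.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): compare nwords 64-bit
 * words, one word per iteration. Returns the index of the first
 * differing word, or nwords if the two buffers are equal.
 */
static size_t cmp_words(const uint64_t *a, const uint64_t *b, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++) {
        if (a[i] != b[i])
            break;      /* first mismatch found */
    }
    return i;
}
```

A memcmp-style caller would still have to order the bytes within the differing word and handle any unaligned head or tail, which this sketch leaves out.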
Out of curiosity, does preincrement make any difference (or can gcc do that
for you nowadays)?

    a1 = *a;
    b1 = *b;
    while {
        a2 = *++a;
        b2 = *++b;
        if (a1 != a2)
            break;
        a1 = *++a;
        b1 = *++b;
    } while (a2 != a1);

 Jocke

> Anton
>
> > Try something based on:
> >     a1 = *a++;
> >     b1 = *b++;
> >     while {
> >         a2 = *a++;
> >         b2 = *b++;
> >         if (a1 != a2)
> >             break;
> >         a1 = *a++;
> >         b1 = *b++;
> >     } while (a2 != a1);
> >
> > David
> >
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
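To make the post-increment versus pre-increment question concrete, here is a hedged, runnable sketch of both styles. It compares one word per iteration rather than two, and it assumes the intent of the quoted pseudocode is to compare each word of a against the matching word of b. The function names and the explicit word count n are hypothetical additions for illustration. Both functions return the index of the first differing word, or n if there is none.

```c
#include <stddef.h>
#include <stdint.h>

/* Post-increment style: load, then advance the pointer. */
static size_t first_diff_postinc(const uint64_t *a, const uint64_t *b, size_t n)
{
    const uint64_t *start = a;
    const uint64_t *end = a + n;

    while (a < end) {
        uint64_t a1 = *a++;
        uint64_t b1 = *b++;

        if (a1 != b1)
            return (size_t)(a - start) - 1; /* a is already one past */
    }
    return n;
}

/* Pre-increment style: first load up front, then advance before each load. */
static size_t first_diff_preinc(const uint64_t *a, const uint64_t *b, size_t n)
{
    size_t i = 0;

    if (n == 0)
        return 0;

    uint64_t a1 = *a;
    uint64_t b1 = *b;

    while (a1 == b1) {
        if (++i == n)
            return n;   /* ran off the end without a mismatch */
        a1 = *++a;
        b1 = *++b;
    }
    return i;
}
```

The two are functionally equivalent. Whether they compile to different code depends on the compiler version and target, so comparing the assembly from something like gcc -O2 -S is probably the only way to settle it.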