Hi David,

> The unrolled loop (deleted) looks excessive.
> On a modern cpu with multiple execution units you can usually
> manage to get the loop overhead to execute in parallel to the
> actual 'work'.
> So I suspect that a much simpler 'word at a time' loop will be
> almost as fast - especially in the case where the code isn't
> already in the cache and the compare is relatively short.

I'm always keen to keep things as simple as possible, but your loop is
over 50% slower. Once the loop hits a steady state you are going to run
into front end issues with instruction fetch on POWER8.

Anton

> Try something based on:
> 	a1 = *a++;
> 	b1 = *b++;
> 	while {
> 		a2 = *a++;
> 		b2 = *b++;
> 		if (a1 != a2)
> 			break;
> 		a1 = *a++;
> 		b1 = *b++;
> 	} while (a2 != a1);
>
> 	David
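
For reference, the two loop shapes under discussion can be sketched in
portable C roughly as follows. This is only an illustration, not the code
from the patch or from either mail: the function names first_diff_simple
and first_diff_pipelined are made up here, the buffers are assumed to be
word-aligned and a whole number of 64-bit words long, and the functions
return the index of the first differing word rather than a memcmp-style
result. The second function issues the loads for the next pair of words
before comparing the current pair, which is the general idea behind the
quoted suggestion, written with each a-word compared against its b-word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Plain word-at-a-time loop: one pair of loads and one compare per
 * iteration. Returns the index of the first differing word, or n if
 * the first n words are all equal.
 */
size_t first_diff_simple(const uint64_t *a, const uint64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return n;
}

/*
 * Software-pipelined variant (hypothetical sketch): the loads for
 * word i + 1 are issued before the compare of word i, so the compare
 * and branch can overlap with the next loads on a core with several
 * execution units. Same return convention as first_diff_simple().
 */
size_t first_diff_pipelined(const uint64_t *a, const uint64_t *b, size_t n)
{
    if (n == 0)
        return 0;

    uint64_t a1 = a[0];
    uint64_t b1 = b[0];
    size_t i = 0;

    while (i + 1 < n) {
        /* Load the next pair before testing the current one. */
        uint64_t a2 = a[i + 1];
        uint64_t b2 = b[i + 1];

        if (a1 != b1)
            return i;

        a1 = a2;
        b1 = b2;
        i++;
    }

    /* The last loaded pair has not been compared yet. */
    return (a1 != b1) ? i : n;
}
```

Neither function says anything about which shape is faster; that depends
on the core and has to be measured, for example by timing both over a
range of lengths with cold and warm caches.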