Hi David,
> The unrolled loop (deleted) looks excessive.
> On a modern cpu with multiple execution units you can usually
> manage to get the loop overhead to execute in parallel to the
> actual 'work'.
> So I suspect that a much simpler 'word at a time' loop will be
> almost as fast - especially in the case where the code isn't
> already in the cache and the compare is relatively short.
I'm always keen to keep things as simple as possible, but your loop is
over 50% slower. Once the loop hits a steady state you are going to run
into front end issues with instruction fetch on POWER8.
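For anyone following along without the patch, the trade-off is roughly the one below. This is a portable C sketch, not the deleted POWER8 code. `first_diff_simple` and `first_diff_unrolled` are hypothetical names, and the sketch shows the structure only. It does not demonstrate the measured slowdown. The simple loop pays for a counter update and a loop branch on every word. The unrolled loop spreads that overhead over four words, so fewer instructions per word have to be fetched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not the kernel's memcmp: both compare n 64-bit
 * words and return the index of the first mismatching word, or n if the
 * buffers are equal. */

/* Word-at-a-time: one load pair, compare and branch per word, plus the
 * loop counter update and loop branch. */
size_t first_diff_simple(const uint64_t *a, const uint64_t *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (a[i] != b[i])
			break;
	return i;
}

/* 4x unrolled: the loop overhead is amortised over four words. */
size_t first_diff_unrolled(const uint64_t *a, const uint64_t *b, size_t n)
{
	size_t i = 0;

	for (; i + 4 <= n; i += 4) {
		/* OR the XORs so a single branch covers four words */
		if ((a[i] ^ b[i]) | (a[i + 1] ^ b[i + 1]) |
		    (a[i + 2] ^ b[i + 2]) | (a[i + 3] ^ b[i + 3]))
			break;
	}
	/* Handle the tail, or find the exact word inside the failing block */
	for (; i < n; i++)
		if (a[i] != b[i])
			break;
	return i;
}
```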
Anton
> Try something based on:
> a1 = *a++;
> b1 = *b++;
> do {
>         a2 = *a++;
>         b2 = *b++;
>         if (a1 != b1)
>                 break;
>         a1 = *a++;
>         b1 = *b++;
> } while (a2 == b2);
>
> David
>
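The software-pipelined loop quoted above can be written as a self-contained C function. It assumes the intended comparisons are `a1` against `b1` and `a2` against `b2`. The next pair of words is loaded before branching on the current pair. `words_differ_pipelined` is a hypothetical name. The even, non-zero length is an added assumption, since the quoted loop has no length bound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if the first n words of a and b
 * differ, 0 if they are equal. Assumes n is even and non-zero (added
 * here; the quoted loop has no termination on length). */
int words_differ_pipelined(const uint64_t *a, const uint64_t *b, size_t n)
{
	const uint64_t *end = a + n;
	uint64_t a1, b1, a2, b2;

	a1 = *a++;
	b1 = *b++;
	do {
		/* Load the next pair before branching on the current one */
		a2 = *a++;
		b2 = *b++;
		if (a1 != b1)
			return 1;
		if (a == end)		/* a2/b2 were the last pair */
			return a2 != b2;
		a1 = *a++;
		b1 = *b++;
	} while (a2 == b2);
	return 1;			/* a2 != b2 */
}
```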
_______________________________________________
Linuxppc-dev mailing list
linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev