Hi David,

> The unrolled loop (deleted) looks excessive.
> On a modern CPU with multiple execution units you can usually
> manage to get the loop overhead to execute in parallel with the
> actual 'work'.
> So I suspect that a much simpler 'word at a time' loop will be
> almost as fast - especially in the case where the code isn't
> already in the cache and the compare is relatively short.

I'm always keen to keep things as simple as possible, but your loop is
over 50% slower. Once the loop reaches a steady state, you are going to
run into front-end issues with instruction fetch on POWER8.

Anton

> Try something based on:
>       a1 = *a++;
>       b1 = *b++;
>       do {
>               a2 = *a++;
>               b2 = *b++;
>               if (a1 != b1)
>                       break;
>               a1 = *a++;
>               b1 = *b++;
>       } while (a2 == b2);
> 
>       David
> 
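
For readers following the thread, the word-at-a-time loop quoted above can be written out as compilable C. This is only a sketch under assumptions that are not in the original messages: the name first_diff_word, the 64-bit word size, the word count n, and the tail handling are all illustrative, and this is not the code that was benchmarked.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): returns the index of the
 * first 64-bit word at which a and b differ, or n if the first n words
 * are equal. The loads for the next word are issued before the compare
 * of the current one, as in the quoted suggestion.
 */
size_t first_diff_word(const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t a1, b1, a2, b2;
    size_t i = 0;

    if (n == 0)
        return 0;

    /* Prime the pipeline with word 0. */
    a1 = a[0];
    b1 = b[0];

    /* Invariant: a1/b1 hold word i and all words before i are equal. */
    while (i + 2 < n) {
        a2 = a[i + 1];
        b2 = b[i + 1];
        if (a1 != b1)
            return i;
        a1 = a[i + 2];
        b1 = b[i + 2];
        if (a2 != b2)
            return i + 1;
        i += 2;
    }

    /* Tail: at most two words remain, and word i is already loaded. */
    if (a1 != b1)
        return i;
    if (i + 1 < n && a[i + 1] != b[i + 1])
        return i + 1;
    return n;
}
```

Unlike the quoted fragment, this version bounds every load by n so it never reads past the end of either buffer. A real memcmp would also have to handle unaligned and sub-word tails and turn the differing word into a signed result.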

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
