> On both POWER6 and POWER7 this should be as fast as we can go since
> we are limited by the latency of the adde instructions.

Not really.  Do you know how many 16/32-bit words you can add before a
64-bit register can overflow? :-)
If you ever have to call this with more than 16GB of data to sum, that's
easy to handle as well of course (just break it into pieces).
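
To spell out the arithmetic: each 32-bit word is at most 2^32 - 1, so
2^32 of them sum to at most 2^64 - 2^32, which still fits in a 64-bit
register. 2^32 words of 4 bytes each is 16GB. A rough sketch of the idea
in C follows; the names sum_words32 and fold64 are made up for
illustration and this is not the kernel's csum_partial. It assumes the
buffer is a whole number of 32-bit words, loaded in native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add 32-bit words into a 64-bit accumulator with
 * plain adds. No carry has to be propagated between iterations, so the
 * loop is not serialized on an add-with-carry chain. Safe for up to
 * 2^32 words (16GB) per call. */
uint64_t sum_words32(const unsigned char *buf, size_t nwords)
{
    uint64_t acc = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint32_t w;

        memcpy(&w, buf + 4 * i, sizeof(w)); /* alignment-safe load */
        acc += w;
    }
    return acc;
}

/* Hypothetical helper: fold the 64-bit sum down to a 16-bit
 * ones'-complement sum, adding the carries back in at each step. */
uint16_t fold64(uint64_t acc)
{
    acc = (acc & 0xffffffffu) + (acc >> 32); /* at most 33 bits */
    acc = (acc & 0xffffffffu) + (acc >> 32); /* at most 32 bits */
    acc = (acc & 0xffff) + (acc >> 16);      /* at most 17 bits */
    acc = (acc & 0xffff) + (acc >> 16);      /* at most 16 bits */
    return (uint16_t)acc;
}
```

For more than 16GB, one option is to call sum_words32 on each piece, fold
each partial result, and add the folded values together before a final
fold.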


Segher

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
