> On both POWER6 and POWER7 this should be as fast as we can go since we are limited by the latency of the adde instructions.
Not really. Do you know how many 16/32-bit words you can add before a 64-bit register can overflow? :-)

If you ever have to call this with more than 16GB of data to sum, that's easy to handle as well, of course (just break it into pieces).

Segher

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
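The arithmetic behind the reply works like this. Each 32-bit word is at most 2^32 - 1, so 2^32 of them (2^32 words x 4 bytes = 16 GiB) sum to at most 2^64 - 2^32, which still fits in a 64-bit register. The loop can therefore use plain adds with no carry chain between iterations, and fold down to 16 bits once at the end. Below is a minimal portable C sketch of that idea. It assumes an Internet-style 16-bit ones' complement checksum, and the function names are hypothetical. It is not the kernel's `csum_partial` or the POWER assembly under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sum 32-bit words into a 64-bit accumulator using
 * plain adds only (no carry-in/carry-out chain between iterations).
 * Each word is at most 2^32 - 1, so up to 2^32 words (16 GiB) cannot
 * overflow the 64-bit sum. */
static uint64_t sum32_into64(const void *buf, size_t nwords32)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;
    for (size_t i = 0; i < nwords32; i++) {
        uint32_t w;
        memcpy(&w, p + 4 * i, sizeof w);  /* alignment- and aliasing-safe load */
        sum += w;
    }
    return sum;
}

/* Hypothetical helper: fold the wide sum to a 16-bit ones' complement
 * sum by adding the high half back into the low half (end-around carry). */
static uint16_t fold64_to16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* now fits in 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* now fits in 16 bits */
    return (uint16_t)sum;
}

/* Reference for comparison (also hypothetical): the classic 16-bit
 * ones' complement sum, folding the carry back in after every add.
 * Each step depends on the previous carry, much like an adde chain. */
static uint16_t csum16_ref(const void *buf, size_t nwords16)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords16; i++) {
        uint16_t w;
        memcpy(&w, p + 2 * i, sizeof w);
        sum += w;
        sum = (sum & 0xffffu) + (sum >> 16);  /* end-around carry */
    }
    return (uint16_t)sum;
}
```

For a 16-byte buffer, `fold64_to16(sum32_into64(buf, 4))` yields the same value as `csum16_ref(buf, 8)`. This holds because a 32-bit word is congruent, modulo 0xffff, to the sum of its two 16-bit halves in either byte order. For more than 2^32 words, one way to "break it into pieces" is to fold each piece separately and combine the folded results with another end-around carry. The independent plain adds also leave room for several accumulators in parallel, which is where any gain over a serial adde chain would come from. Whether a particular compiler actually exploits that is not guaranteed.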