On Fri, May 22, 2015 at 02:32:42PM -0500, Scott Wood wrote:
> > I'd also have thought that the 64bit C version above would be generally
> > 'good'.
>
> It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
> it does something like:
>
>	mr	tmp0, csum
>	li	tmp1, 0
>	li	tmp2, 0
>	addc	tmp3, addend, tmp0
>	adde	csum, tmp2, tmp1
>	add	csum, csum, tmp3

Right.  Don't expect older compilers to do sane things here.

All this begs a question...  If it is worth spending so much time
micro-optimising this, why not pick the low-hanging fruit first?
Having a 32-bit accumulator for ones' complement sums, on a 64-bit
system, is not such a great idea.


Segher
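
To make the accumulator point concrete, here is a minimal sketch in
portable C of a ones' complement sum kept in a 64-bit accumulator and
folded down to 16 bits only at the end.  It is not code from this
thread or from the kernel: the names add64_with_carry, fold64_to_16
and csum_sketch are hypothetical, the result is in host byte order, and
alignment and the kernel's own checksum conventions are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit add with end-around carry. */
static inline uint64_t add64_with_carry(uint64_t sum, uint64_t addend)
{
	sum += addend;
	/* If the add wrapped, sum < addend; feed the carry back in. */
	return sum + (sum < addend);
}

/* Hypothetical helper: fold a 64-bit ones' complement sum to 16 bits. */
static inline uint16_t fold64_to_16(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);	/* 64 -> 33 bits */
	sum = (sum & 0xffffffff) + (sum >> 32);	/* 33 -> 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);	/* 32 -> 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);	/* 17 -> 16 bits */
	return (uint16_t)sum;
}

/* Sketch: sum a buffer eight bytes at a time, one carry per add. */
static inline uint16_t csum_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0;
	uint64_t w;

	while (len >= 8) {
		memcpy(&w, p, 8);	/* avoids unaligned loads */
		sum = add64_with_carry(sum, w);
		p += 8;
		len -= 8;
	}
	if (len) {
		w = 0;			/* zero-pad the tail in memory order */
		memcpy(&w, p, len);
		sum = add64_with_carry(sum, w);
	}
	return fold64_to_16(sum);
}
```

With this shape, each loop iteration consumes eight bytes and needs a
single carry fix-up, where a 32-bit accumulator would need one per four
bytes.  Whether a given compiler turns the carry test into an
add-with-carry instruction is a separate question, as the quoted GCC
4.8.2 output shows.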