Hi all,

While digging into some GCC-generated code, I noticed a missed optimization opportunity in GCC that Clang and ICC seem to take advantage of. All versions of GCC (up to 4.9.0) seem to have the same trouble. The following source (for x86_64) demonstrates the problem:
-----
#include <cstdint>

#define add_carry32(sum, v) __asm__("addl %1, %0 ;" \
                                    "adcl $0, %0 ;" \
                                    : "=r" (sum) \
                                    : "g" ((uint32_t) v), "0" (sum))

unsigned sorta_checksum(const void* src, int n, unsigned sum)
{
  const uint32_t *s4 = (const uint32_t*) src;
  const uint32_t *es4 = s4 + (n >> 2);
  while( s4 != es4 ) {
    add_carry32(sum, *s4++);
  }
  add_carry32(sum, *(const uint16_t*) s4);
  return sum;
}
-----

(The example is a contrived version of the original code, which comes from Solarflare's OpenOnload project.)

GCC optimizes the loop but then recalculates the "s4" variable outside the loop, before the last add_carry32. ICC and Clang both realise that the value "s4" holds on loop exit can be reused as-is. GCC spends four extra instructions recomputing a value that is already known to be in a register when the loop exits.

Compiler Explorer links:
GCC 4.9.0: http://goo.gl/fi3p2J
ICC 13.0.1: http://goo.gl/PRTTc6
Clang 3.4.1: http://goo.gl/95JEQc

I'll happily file a bug if necessary, but I'm not sure in which optimization phase the opportunity is being missed.

Thanks all,
Matt
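For anyone who wants to reproduce the semantics without x86 inline asm, here is a minimal portable sketch. It assumes add_carry32 is meant as an end-around-carry add: add v, then fold the carry-out back in, which is what `addl` followed by `adcl $0` does. The names add_carry32_portable and sorta_checksum_reuse are hypothetical and do not come from OpenOnload. The second function writes by hand what Clang and ICC appear to do, taking the tail load from es4, which equals s4 on loop exit. Whether GCC 4.9.0 then drops the extra instructions has not been checked here.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical portable equivalent of add_carry32: add v to sum, then
// add the carry-out (0 or 1) back in, like "addl %1, %0; adcl $0, %0".
static inline uint32_t add_carry32_portable(uint32_t sum, uint32_t v) {
    uint64_t t = static_cast<uint64_t>(sum) + v;
    // t >> 32 is the carry. If it is 1, the low 32 bits are at most
    // 0xFFFFFFFE, so adding the carry back cannot overflow again.
    return static_cast<uint32_t>(t) + static_cast<uint32_t>(t >> 32);
}

// Hypothetical hand-written variant of sorta_checksum: the final 16-bit
// load reads from es4 directly, i.e. the pointer value s4 already holds
// when the loop exits, so nothing needs to be recomputed after the loop.
unsigned sorta_checksum_reuse(const void* src, int n, unsigned sum) {
    const unsigned char* s4 = static_cast<const unsigned char*>(src);
    const unsigned char* es4 = s4 + 4 * (n >> 2);  // same end as the original
    for (; s4 != es4; s4 += 4) {
        uint32_t w;
        std::memcpy(&w, s4, sizeof w);  // aliasing-safe 32-bit load
        sum = add_carry32_portable(sum, w);
    }
    uint16_t h;
    std::memcpy(&h, es4, sizeof h);     // same 2-byte tail read as the original
    return add_carry32_portable(sum, h);
}
```

Like the original, the tail always reads two bytes at the end pointer regardless of n & 3, so the caller must make sure those bytes are readable.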