Hi all,

While digging into some GCC-generated code, I noticed a missed
optimization in GCC that Clang and ICC both seem to perform. All
versions of GCC (up to 4.9.0) seem to have the same problem. The
following source (for x86_64) demonstrates it:

-----
#include <cstdint>

#define add_carry32(sum, v)  __asm__("addl %1, %0 ;"  \
"adcl $0, %0 ;"  \
: "=r" (sum)  \
: "g" ((uint32_t) v), "0" (sum))

unsigned sorta_checksum(const void* src, int n, unsigned sum)
{
  const uint32_t *s4 = (const uint32_t*) src;
  const uint32_t *es4 = s4 + (n >> 2);

  while( s4 != es4 ) {
    add_carry32(sum, *s4++);
  }

  add_carry32(sum, *(const uint16_t*) s4);
  return sum;
}
-----

(The example is a contrived version of the original code, which comes
from Solarflare's OpenOnload project.)

GCC optimizes the loop but then re-calculates the "s4" variable
outside of the loop, before the last add_carry32.  ICC and Clang both
realise that the "s4" value from the loop is fine to re-use. GCC emits
four extra instructions to recompute a value that is already known to
be in a register upon loop exit.
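
To make the difference concrete, here is a rough source-level sketch of
the two strategies. It is only an illustration, not the actual compiler
output (that is in the Compiler Explorer listings): the names
add_carry32_portable, load16, tail_reuse and tail_recompute are made up
for this sketch, the inline asm is replaced by portable arithmetic, and
loads go through memcpy so the sketch stays well-defined. It also
assumes n is non-negative.

```cpp
#include <cstdint>
#include <cstring>

// Portable stand-in (assumption) for the addl/adcl pair:
// add, then fold the carry-out back into the low 32 bits.
static inline uint32_t add_carry32_portable(uint32_t sum, uint32_t v)
{
  uint64_t wide = static_cast<uint64_t>(sum) + v;
  return static_cast<uint32_t>(wide) + static_cast<uint32_t>(wide >> 32);
}

// Hypothetical helpers: unaligned-safe loads via memcpy.
static inline uint32_t load32(const unsigned char* p)
{
  uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

static inline uint16_t load16(const unsigned char* p)
{
  uint16_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Strategy 1: the tail read re-uses the pointer the loop already
// advanced, so nothing needs recomputing after the loop.
uint32_t tail_reuse(const void* src, int n, uint32_t sum)
{
  const unsigned char* p = static_cast<const unsigned char*>(src);
  const unsigned char* end = p + (n >> 2) * 4;

  for( ; p != end; p += 4 )
    sum = add_carry32_portable(sum, load32(p));

  return add_carry32_portable(sum, load16(p));
}

// Strategy 2: the tail address is derived again from src and n after
// the loop, even though it equals the loop's final pointer.
uint32_t tail_recompute(const void* src, int n, uint32_t sum)
{
  const unsigned char* base = static_cast<const unsigned char*>(src);
  const unsigned char* end = base + (n >> 2) * 4;

  for( const unsigned char* p = base; p != end; p += 4 )
    sum = add_carry32_portable(sum, load32(p));

  const unsigned char* tail = base + (n >> 2) * 4;  // redundant work
  return add_carry32_portable(sum, load16(tail));
}
```

Both functions return the same result for any input; the second one
just does address arithmetic that the loop exit condition already
makes unnecessary.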

Compiler explorer links:
GCC 4.9.0: http://goo.gl/fi3p2J
ICC 13.0.1: http://goo.gl/PRTTc6
Clang 3.4.1: http://goo.gl/95JEQc

I'll happily file a bug if necessary, but it's not clear to me in
which phase the optimization opportunity is being missed.

Thanks all, Matt
