https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
And here is a version which uses two 64-bit numbers for calculation of he
sum of decimal digits as a benchmark for the division and modulo:

unsigned long digsum3 (myint x)
{
  unsigned long ret;
  __uint64_t high, low;
  const unsigned long int rem_high[10] = {0,6,2,8,4,0,6,2,8,4};
  const unsigned long int foo_high[10] =
    {0x0000000000000000, 0x1999999999999999, 0x3333333333333333,
0x4CCCCCCCCCCCCCCC,
     0x6666666666666666, 0x8000000000000000, 0x9999999999999999,
0xB333333333333333,
     0xCCCCCCCCCCCCCCCC, 0xE666666666666666 };

  high = x >> 64;
  low = x;
  ret = 0;
  while (low > 0 || high > 0)
    {
      unsigned long r_high, r_low, r_sum, r_carry;
      r_high = high % 10;
      r_carry = rem_high[r_high];
      high = high / 10;
      r_low = low % 10;
      low = low / 10;
      low = low + foo_high[r_high];
      r_sum = r_low + r_carry;
      if (r_sum >= 10)
        {
          r_sum = r_sum - 10;
          low ++;
        }
      ret = ret + r_sum;
    }
  return ret;
}

It is _much_ faster, taking around 250 to 260 cycles per
calculation, a speedup of a factor of around 8 versus the
original code.

Reply via email to