https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849

--- Comment #7 from Cassio Neri <cassio.neri at gmail dot com> ---
Thanks for implementing the modular inverse algorithm in gcc. However, the
implementation has an issue. In some cases, for no obvious reason, the compiler
falls back to the old algorithm. For instance,

    bool f1(unsigned n) { return n % 10 == 5; }

as expected, uses the modular inverse algorithm and translates to

    f1(unsigned int):
      imull $-858993459, %edi, %edi
      subl $1, %edi
      rorl %edi
      cmpl $429496729, %edi
      setbe %al
      ret

whereas

    bool f2(unsigned n) { return n % 10 == 6; }

doesn't use the modular inverse algorithm and is the same as in older versions
of gcc:

    f2(unsigned int):
      movl %edi, %eax
      movl $3435973837, %edx
      imulq %rdx, %rax
      shrq $35, %rax
      leal (%rax,%rax,4), %eax
      addl %eax, %eax
      subl %eax, %edi
      cmpl $6, %edi
      sete %al
      ret

See on godbolt: https://godbolt.org/z/u-C54I

I would like make another observation. For some divisors (e.g. 7, 19, 21) the
modular inverse algorithm seems to be faster than the traditional one even when
the remainder r (in n % d == r) is not a compile time constant. In general this
happens in cases where the "magic number" M used by the traditional algorithm
to replace the division "n / d" with "n * M >> k" is such that M doesn't fit in
a register and extra operations are required to overcome this problem. In other
words, these are the divisors for which '"Add" indicator' in
https://www.hackersdelight.org/magic.htm shows 1.

I made some measurements and I hope to make my results available for your
consideration soon.

Reply via email to