On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant.  Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret

You made me look up idiv, and I figured we're not optimally
handling

int foo (long x, int y)
{
  return x / y;
}

by using a 32:32 / 32-bit divide.  combine manages to
see enough to eventually do this, though.
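
For reference, a rough C sketch of what a 32:32 / 32-bit divide
means for the function above.  The helper names split_dividend and
narrow_divide_ok are hypothetical and are not taken from GCC or from
the patch.  The sketch assumes an LP64 target, where long is 64 bits.
idivl raises a divide error when the divisor is zero or the quotient
does not fit in 32 bits, so the second helper spells out the
condition under which the narrow divide yields the 64-bit result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: split a 64-bit dividend into the two 32-bit
   halves that idivl reads from EDX (high) and EAX (low).  */
static void split_dividend(int64_t x, int32_t *edx, uint32_t *eax)
{
    *eax = (uint32_t)((uint64_t)x & 0xffffffffu);
    /* x - low is an exact multiple of 2^32, so this division is exact
       and avoids right-shifting a negative value.  */
    *edx = (int32_t)((x - (int64_t)*eax) / 4294967296LL);
}

/* Hypothetical helper: true when x / y can be computed by a
   64-by-32-bit divide without faulting, i.e. the divisor is nonzero
   and the quotient fits in a signed 32-bit register.  */
static bool narrow_divide_ok(int64_t x, int32_t y)
{
    if (y == 0)
        return false;
    if (x == INT64_MIN && y == -1)
        return false;               /* the 64-bit quotient itself overflows */
    int64_t q = x / y;
    return q >= INT32_MIN && q <= INT32_MAX;
}
```

For example, narrow_divide_ok(100, 7) holds, while a dividend of
2^40 with a divisor of 1 does not, because the quotient needs more
than 32 bits.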

> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby Lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument.  The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
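
To illustrate why the sign extension can be folded when the numerator
is a constant, here is a small C model.  It is a sketch only, not
code from the patch; the names cltd_high_half and idivl_model are
made up for illustration.  cltd fills EDX with copies of EAX's sign
bit, so for a known numerator the high half is known at compile time:
0 for a non-negative constant (the xorl case) and -1 for a negative
one (the movl $-1 case).

```c
#include <stdint.h>

/* What cltd leaves in EDX: every bit is a copy of EAX's sign bit.  */
static int32_t cltd_high_half(int32_t eax)
{
    return eax < 0 ? -1 : 0;
}

/* Model of idivl: divide the 64-bit value EDX:EAX by a 32-bit
   divisor, returning the quotient and storing the remainder.
   Assumes divisor != 0 and that the quotient fits in 32 bits;
   the real instruction faults otherwise.  */
static int32_t idivl_model(int32_t edx, int32_t eax, int32_t divisor,
                           int32_t *rem)
{
    int64_t dividend = (int64_t)edx * 4294967296LL
                       + (int64_t)(uint32_t)eax;
    *rem = (int32_t)(dividend % divisor);
    return (int32_t)(dividend / divisor);
}
```

With these, idivl_model(cltd_high_half(100), 100, x, &r) and
idivl_model(0, 100, x, &r) compute the same quotient and remainder
for any valid x, and likewise for a negative numerator with -1 as
the high half.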
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*divmodsi4_const): Optimize SImode
>         divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/divmod-9.c: New test case.
>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
