On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant.  Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret
>
> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby Lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument.  The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*divmodsi4_const): Optimize SImode
>         divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/divmod-9.c: New test case.

+  if (INTVAL (operands[2]) < 0)
+    emit_move_insn (operands[1], constm1_rtx);
+  else
+    ix86_expand_clear (operands[1]);

There is no need to call ix86_expand_clear here;

    emit_move_insn (operands[1], const0_rtx);

will result in an xor, too.
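
For readers following the thread, here is a rough C model of why the high
half can be a constant when the numerator is constant. It is only an
illustrative sketch, not code from the patch: the helper names high_half
and idivl_model are hypothetical, and the model assumes the usual
two's-complement conversion behaviour (as on GCC). cltd fills %edx with
copies of the sign bit of %eax, so for a known numerator the result is
always 0 or -1, which is the choice the split makes above.

```c
#include <stdint.h>

/* Hypothetical helper: the value cltd would place in %edx for a given
   %eax.  For a compile-time constant numerator this is itself a
   constant: 0 when the numerator is non-negative, -1 otherwise.  */
static int32_t high_half(int32_t eax)
{
    return eax < 0 ? -1 : 0;
}

/* Hypothetical model of idivl: divide the 64-bit value edx:eax by a
   32-bit divisor and return the quotient.  Overflow and division by
   zero, which fault on real hardware, are not modelled.  */
static int32_t idivl_model(int32_t edx, int32_t eax, int32_t divisor)
{
    uint64_t bits = ((uint64_t)(uint32_t)edx << 32) | (uint32_t)eax;
    int64_t dividend = (int64_t)bits;   /* assumes two's complement */
    return (int32_t)(dividend / divisor);
}
```

With these helpers, idivl_model (high_half (100), 100, x) models the
100/x case from the original message, and a negative numerator such as
-100 takes the -1 (movl $-1,%edx) path instead.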

OK with the above change.

Thanks,
Uros.

>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
