On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant.  Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret
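
For reference, a rough C model of why the xor is a valid replacement for
cltd here. cltd fills %edx with copies of the sign bit of %eax, so for a
numerator known at compile time the value of %edx is itself a constant:
zero for 100, all ones for a negative numerator. The helper names
cltd_high_half and check_const_numerator are made up for this sketch and
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value cltd would leave in %edx for a given
   %eax, i.e. the upper 32 bits of the sign-extended 64-bit dividend. */
uint32_t cltd_high_half(int32_t eax)
{
    int64_t wide = (int64_t)eax;              /* sign-extend to 64 bits */
    return (uint32_t)((uint64_t)wide >> 32);  /* keep the upper half */
}

/* The function from the patch description. */
int foo(int x)
{
    return 100 / x;
}

/* Hypothetical self-check: for the constant numerator 100 the upper half
   is zero, which is what "xorl %edx, %edx" produces without reading %eax. */
void check_const_numerator(void)
{
    assert(cltd_high_half(100) == 0u);
}
```

With a negative constant such as -100 the same helper returns 0xFFFFFFFF,
which corresponds to the "movl $-1, %edx" form mentioned in the patch
description.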

You made me look up idiv and I figured we're not optimally
handling

int foo (long x, int y)
{
  return x / y;
}

by using a 32:32 / 32 bit divide.  combine manages to
see enough to eventually do this though.

> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument.  The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*divmodsi4_const): Optimize SImode
>         divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/divmod-9.c: New test case.
>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
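
Coming back to the foo (long x, int y) case above, here is a small sketch
of the semantics involved, under my own assumptions. idivl divides the
64-bit value in %edx:%eax by a 32-bit operand and raises a divide error
when the quotient does not fit in 32 bits, so a single idivl matches the
C expression only when the 64-bit quotient is in int range. The names
foo_wide and quotient_fits_32 are hypothetical, and int64_t stands in for
long on x86_64 Linux.

```c
#include <assert.h>
#include <stdint.h>

/* What the C source asks for: y is promoted to 64 bits, the divide is
   done in 64 bits, and the quotient is then converted to int.  An
   out-of-range conversion is implementation-defined. */
int foo_wide(int64_t x, int32_t y)
{
    return (int)(x / y);
}

/* Hypothetical predicate: nonzero when the 64-bit quotient fits in 32
   bits, which is the case where one idivl would yield the same result
   without raising a divide error. */
int quotient_fits_32(int64_t x, int32_t y)
{
    if (y == 0)
        return 0;                     /* division by zero always faults */
    if (x == INT64_MIN && y == -1)
        return 0;                     /* avoid overflow in the check itself */
    int64_t q = x / y;
    return q >= INT32_MIN && q <= INT32_MAX;
}

/* Hypothetical self-check: a small numerator is always safe. */
void check_small_case(void)
{
    assert(quotient_fits_32(100, 7));
    assert(foo_wide(100, 7) == 14);
}
```

For example, quotient_fits_32 ((int64_t) 1 << 40, 1) is zero, so that
input could not go through the narrower divide unchanged.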