On August 9, 2020 12:30:32 PM GMT+02:00, Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>This patch fixes a subtle bug in the depths of GCC's synth_mult,
>where the middle-end queries whether (how well) the target supports
>widening and highpart multiplications by calling targetm.rtx_costs.
>The code in init_expmed and init_expmed_one_mode iterates over various
>RTL patterns querying the cost of each.  To avoid generating & garbage
>collecting too much junk, it reuses the same RTL over and over, but
>adjusting the modes between each call.
>
>Alas this reuse of state is a little fragile, and at some point a
>change to init_expmed_one_conv has resulted in the state (mode of
>a register) being changed, but not reset before being used again.
>
>The fallout is that GCC currently queries the backend for the cost
>of nonsense RTL such as:
>
>(mult:HI (zero_extend:HI (reg:TI 82))
>         (zero_extend:HI (reg:TI 82)))
>
>and
>
>(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
>                      (zero_extend:HI (reg:TI 82)))
>             (const_int 8 [0x8]))
>
>The fix is to set the mode of the register back to its assumed
>state, as (reg:QI 82) in the above patterns makes much more sense.
>
>Using the old software engineering/defensive programming maxim of
>"why fix a bug just once, if it can be fixed in multiple places",
>this patch both restores the original value in init_expmed_one_conv,
>and also sets it to the expected value in init_expmed_one_mode.
>This should hopefully signal the need to be careful of invariants for
>anyone modifying this code in future.
>
>Alas things are rarely simple...  Fixing this obviously incorrect
>logic causes a failure of gcc.target/i386/pr71321.c that tests for
>a particular expansion from synth_mult.  The issue here is that this
>test is checking for a specific multiplication expansion, when it
>should really be checking that we don't generate the inefficient
>"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
>reported in PR target/71321.  Now that we use correct costs,
>GCC uses a multiply instruction matching icc, LLVM and GCC prior
>to v4.8.  I've even microbenchmarked this function (over several
>minutes) with (disappointingly) no difference in timings.  Three
>dependent leas have 3-cycle latency, exactly the same as a widening
>byte multiply (on Haswell), so the shorter code splits the tie.
>[I have a follow-up patch that may improve things further].
>
>Before:
>        movzbl  %dil, %eax
>        leal    (%rax,%rax,4), %edx
>        leal    (%rax,%rdx,8), %edx
>        leal    (%rdx,%rdx,4), %edx
>        shrw    $11, %dx
>        leal    (%rdx,%rdx,4), %eax
>        ...
>
>After:
>        movl    $-51, %edx
>        movl    %edx, %eax
>        mulb    %dil
>        movl    %eax, %edx
>        shrw    $11, %dx
>        leal    (%rdx,%rdx,4), %eax
>        ...
>
>
>This patch has been tested on x86_64-pc-linux-gnu with a
>"make bootstrap" and "make -k check" with no new failures.
>Ok for mainline?
OK.

Richard.

>
>2020-08-09  Roger Sayle  <ro...@nextmovesoftware.com>
>
>gcc/ChangeLog
>	* expmed.c (init_expmed_one_conv): Restore all->reg's mode.
>	(init_expmed_one_mode): Set all->reg to desired mode.
>
>gcc/testsuite/ChangeLog
>	PR target/71321
>	* gcc.target/i386/pr71321.c: Check that the code doesn't use
>	the 4B zero displacement lea, not that it uses lea.
>
>Thanks in advance,
>Roger
>--
>Roger Sayle
>NextMove Software
>Cambridge, UK
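
The patch itself is not attached to this message, so below is a minimal,
standalone sketch (not GCC's actual code) of the invariant being discussed:
init_expmed_one_conv temporarily changes the mode of the shared register to
cost a conversion and must restore it before returning, while
init_expmed_one_mode sets the register to the mode it expects before issuing
its own cost queries.  Only the names init_expmed_one_conv,
init_expmed_one_mode, all->reg and PUT_MODE come from the description and
ChangeLog above; the types, signatures and printed output are simplified
stand-ins.

#include <stdio.h>

/* Stand-ins for the GCC machine modes that appear in the description.  */
typedef enum { QImode, HImode, SImode, TImode } machine_mode;

/* Stand-in for an RTL register expression.  Real RTL stores the mode inside
   the rtx itself, which is why changing it in one cost query is visible to
   every later query that reuses the same object.  */
struct rtx_reg { machine_mode mode; int regno; };

#define PUT_MODE(r, m) ((r)->mode = (m))

/* Stand-in for the state that init_expmed threads through the helpers;
   only the shared register matters for this sketch.  */
struct init_expmed_rtl { struct rtx_reg *reg; };

static const char *const mode_names[] = { "QI", "HI", "SI", "TI" };

/* Models init_expmed_one_conv: it has to put the shared register into
   FROM_MODE to cost a conversion, and (the point of the fix) must put it
   back into TO_MODE before returning, because the caller's later queries
   assume that mode.  */
static void
init_expmed_one_conv (struct init_expmed_rtl *all,
                      machine_mode to_mode, machine_mode from_mode)
{
  PUT_MODE (all->reg, from_mode);
  printf ("  costing conversion from (reg:%s %d)\n",
          mode_names[all->reg->mode], all->reg->regno);
  /* Without this restore, a following widening-multiply query would be
     built around e.g. (zero_extend:HI (reg:TI 82)).  */
  PUT_MODE (all->reg, to_mode);
}

/* Models init_expmed_one_mode: set the shared register to the mode this
   iteration expects up front, then issue queries that rely on it.  */
static void
init_expmed_one_mode (struct init_expmed_rtl *all, machine_mode mode)
{
  PUT_MODE (all->reg, mode);
  init_expmed_one_conv (all, mode, TImode);
  printf ("  costing widening mult of (reg:%s %d)\n",
          mode_names[all->reg->mode], all->reg->regno);
}

int
main (void)
{
  struct rtx_reg reg = { QImode, 82 };
  struct init_expmed_rtl all = { &reg };
  init_expmed_one_mode (&all, QImode);   /* prints QI, not TI, for the mult */
  return 0;
}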
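
Similarly, the updated test is not quoted here.  The following is a
hypothetical reduction of gcc.target/i386/pr71321.c showing the kind of
negative assembler scan the ChangeLog describes: the function body, options
and regular expression are illustrative guesses, not the committed test;
only the zero-displacement "leal 0(...)" form being rejected comes from the
description above.

/* Hypothetical reduction of gcc.target/i386/pr71321.c: the point is to
   reject the zero-displacement "leal 0(,%reg,4)" form produced by GCC 6,
   rather than to demand one particular multiplication sequence.  */
/* { dg-do compile } */
/* { dg-options "-O2" } */

unsigned char
last_digit (unsigned char c)
{
  /* Division/modulo by a constant such as 10 is expanded via synth_mult.  */
  return '0' + c % 10;
}

/* Reject any lea with an explicit zero displacement.  */
/* { dg-final { scan-assembler-not "lea\[lq\]\[ \t\]+0\\(" } } */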