On August 9, 2020 12:30:32 PM GMT+02:00, Roger Sayle 
<ro...@nextmovesoftware.com> wrote:
>
>This patch fixes a subtle bug in the depths of GCC's synth_mult,
>where the middle-end queries whether (how well) the target supports
>widening and highpart multiplications by calling targetm.rtx_costs.
>The code in init_expmed and init_expmed_one_mode iterates over various
>RTL patterns querying the cost of each.  To avoid generating & garbage
>collecting too much junk, it reuses the same RTL over and over, but
>adjusting the modes between each call.
>
>Alas this reuse of state is a little fragile, and at some point a
>change to init_expmed_one_conv has resulted in the state (mode of
>a register) being changed, but not reset before being used again.
>
>The fallout is that GCC currently queries the backend for the cost
>of non-sense RTL such as:
>
>(mult:HI (zero_extend:HI (reg:TI 82))
>    (zero_extend:HI (reg:TI 82)))
>
>and
>
>(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
>        (zero_extend:HI (reg:TI 82)))
>    (const_int 8 [0x8]))
>
>The fix is to set the mode of the register back to its assumed
>state, as (reg:QI 82) in the above patterns makes much more sense.
>
>Using the old software engineering/defensive programming maxim of
>"why fix a bug just once, if it can be fixed in multiple places",
>this patch both restores the original value in init_expmed_one_conv,
>and also sets it to the expected value in init_expmed_one_mode.
>This should hopefully signal the need to be careful of invariants for
>anyone modifying this code in future.
>
>Alas things are rarely simple...  Fixing this obviously incorrect
>logic causes a failure of gcc.target/i386/pr71321.c that tests for
>a particular expansion from synth_mult.  The issue here is that this
>test is checking for a specific multiplication expansion, when it
>should really be checking that we don't generate the inefficient
>"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
>reported in the PR target/71321.  Now that we use correct costs,
>GCC uses a multiply instruction matching icc, LLVM and GCC prior
>to v4.8.  I've even microbenchmarked this function (over several
>minutes) with (disappointingly) no difference in timings.  Three
>dependent leas has 3-cycle latency, exactly the same as a widening
>byte multiply (on Haswell), so the shorter code splits the tie.
>[I have a follow-up patch that may improve things further].
>
>Before:
>        movzbl  %dil, %eax
>        leal    (%rax,%rax,4), %edx
>        leal    (%rax,%rdx,8), %edx
>        leal    (%rdx,%rdx,4), %edx
>        shrw    $11, %dx
>        leal    (%rdx,%rdx,4), %eax
>        ...
>
>After:
>        movl    $-51, %edx
>        movl    %edx, %eax
>        mulb    %dil
>        movl    %eax, %edx
>        shrw    $11, %dx
>        leal    (%rdx,%rdx,4), %eax
>        ...
>
>
>This patch has been tested on x86_64-pc-linux-gnu with a "make
>bootstrap" and "make -k check" with no new failures.
>Ok for mainline?

OK. 

Richard. 

>
>2020-08-09  Roger Sayle  <ro...@nextmovesoftware.com>
>
>gcc/ChangeLog
>       * expmed.c (init_expmed_one_conv): Restore all->reg's mode.
>       (init_expmed_one_mode): Set all->reg to desired mode.
>
>gcc/testsuite/ChangeLog
>       PR target/71321
>       * gcc.target/i386/pr71321.c: Check that the code doesn't use
>       the 4B zero displacement lea, not that it uses lea.
>
>Thanks in advance,
>Roger
>--
>Roger Sayle
>NextMove Software
>Cambridge, UK

Reply via email to