> -----Original Message----- > From: Uros Bizjak <ubiz...@gmail.com> > Sent: Thursday, November 2, 2023 3:23 AM > To: Roger Sayle <ro...@nextmovesoftware.com> > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation > using peephole2. > > On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle <ro...@nextmovesoftware.com> > wrote: > > > > > > Hi Uros, > > > > > From: Uros Bizjak <ubiz...@gmail.com> > > > Sent: 01 November 2023 10:05 > > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register > allocation > > > using peephole2. > > > > > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle > <ro...@nextmovesoftware.com> > > > wrote: > > > > > > > > > > > > This patch is a follow-up to my previous PR target/110551 patch, this > > > > time to address the additional move after mulx, seen on TARGET_BMI2 > > > > architectures (such as -march=haswell). The complication here is that > > > > the flexible multiple-set mulx instruction is introduced into RTL > > > > after reload, by split2, and therefore can't benefit from register > > > > preferencing. This results in RTL like the following: > > > > > > > > (insn 32 31 17 2 (parallel [ > > > > (set (reg:DI 4 si [orig:101 r ] [101]) > > > > (mult:DI (reg:DI 1 dx [109]) > > > > (reg:DI 5 di [109]))) > > > > (set (reg:DI 5 di [ r+8 ]) > > > > (umul_highpart:DI (reg:DI 1 dx [109]) > > > > (reg:DI 5 di [109]))) > > > > ]) "pr110551-2.c":8:17 -1 > > > > (nil)) > > > > > > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107]) > > > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal} > > > > (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ]) > > > > (nil))) > > > > > > > > Here insn 32, the mulx instruction, places its results in si and di, > > > > and then immediately after decides to move di to ax, with di now dead. > > > > This can be trivially cleaned up by a peephole2. I've added an > > > > additional constraint that the two SET_DESTs can't be the same > > > > register to avoid confusing the middle-end, but this has well-defined > > > > behaviour on x86_64/BMI2, encoding a umul_highpart. > > > > > > > > For the new test case, compiled on x86_64 with -O2 -march=haswell: > > > > > > > > Before: > > > > mulx64: movabsq $-7046029254386353131, %rdx > > > > mulx %rdi, %rsi, %rdi > > > > movq %rdi, %rax > > > > xorq %rsi, %rax > > > > ret > > > > > > > > After: > > > > mulx64: movabsq $-7046029254386353131, %rdx > > > > mulx %rdi, %rsi, %rax > > > > xorq %rsi, %rax > > > > ret > > > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > > and make -k check, both with and without --target_board=unix{-m32} > > > > with no new failures. Ok for mainline? > > > > > > It looks that your previous PR110551 patch regressed -march=cascadelake > > > [1].
Actually it is not only on -march=cascadelake, w/o -march=cascadelake will also fail. Thx, Haochen > > > Let's fix these regressions first. > > > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html > > > > > > Uros. > > > > This patch fixes that "regression". Originally, the test case in PR110551 > > contained > > one unnecessary mov on "default" x86_targets, but two extra movs on BMI2 > > targets, including -march=haswell and -march=cascadelake. The first patch > > eliminated one of these MOVs, this patch eliminates the second. I'm not > > sure > > that you can call it a regression, the added test failed when run with a > > non-standard > > -march setting. The good news is that test case doesn't have to be changed > > with > > this patch applied, i.e. the correct intended behaviour is no MOVs on all > architectures. > > I was not worried about the extra mov, but more about [2], also > referred from [1], but it looks that that regression was wrongly > attributed to your patch. > > [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html > > > I'll admit the timing is unusual; I had already written and was regression > > testing a > > patch for the BMI2 issue, when the -march=cascadelake regression tester let > > me > > know it was required for folks that helpfully run the regression suite with > > non > > standard settings. i.e. a long standing bug that wasn't previously tested > > for by > > the testsuite. > > > > > > 2023-10-30 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > > > gcc/ChangeLog > > > > PR target/110551 > > > > * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition > > > > as operands[2] with predicate register_operand must be !MEM_P. > > > > (peephole2): Optimize a mulx followed by a register-to-register > > > > move, to place result in the correct destination if possible. > > > > > > > > gcc/testsuite/ChangeLog > > > > PR target/110551 > > > > * gcc.target/i386/pr110551-2.c: New test case. > > The patch is OK. > > Thanks, > Uros.