> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: Thursday, November 2, 2023 3:23 AM
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> using peephole2.
> 
> On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle <ro...@nextmovesoftware.com>
> wrote:
> >
> >
> > Hi Uros,
> >
> > > From: Uros Bizjak <ubiz...@gmail.com>
> > > Sent: 01 November 2023 10:05
> > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register
> allocation
> > > using peephole2.
> > >
> > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle
> <ro...@nextmovesoftware.com>
> > > wrote:
> > > >
> > > >
> > > > This patch is a follow-up to my previous PR target/110551 patch, this
> > > > time to address the additional move after mulx, seen on TARGET_BMI2
> > > > architectures (such as -march=haswell).  The complication here is that
> > > > the flexible multiple-set mulx instruction is introduced into RTL
> > > > after reload, by split2, and therefore can't benefit from register
> > > > preferencing.  This results in RTL like the following:
> > > >
> > > > (insn 32 31 17 2 (parallel [
> > > >             (set (reg:DI 4 si [orig:101 r ] [101])
> > > >                 (mult:DI (reg:DI 1 dx [109])
> > > >                     (reg:DI 5 di [109])))
> > > >             (set (reg:DI 5 di [ r+8 ])
> > > >                 (umul_highpart:DI (reg:DI 1 dx [109])
> > > >                     (reg:DI 5 di [109])))
> > > >         ]) "pr110551-2.c":8:17 -1
> > > >      (nil))
> > > >
> > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > > >         (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> > > >      (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > > >         (nil)))
> > > >
> > > > Here insn 32, the mulx instruction, places its results in si and di,
> > > > and then immediately after decides to move di to ax, with di now dead.
> > > > This can be trivially cleaned up by a peephole2.  I've added an
> > > > additional constraint that the two SET_DESTs can't be the same
> > > > register to avoid confusing the middle-end, but this has well-defined
> > > > behaviour on x86_64/BMI2, encoding a umul_highpart.
> > > >
> > > > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> > > >
> > > > Before:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > >         mulx    %rdi, %rsi, %rdi
> > > >         movq    %rdi, %rax
> > > >         xorq    %rsi, %rax
> > > >         ret
> > > >
> > > > After:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > >         mulx    %rdi, %rsi, %rax
> > > >         xorq    %rsi, %rax
> > > >         ret
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > > and make -k check, both with and without --target_board=unix{-m32}
> > > > with no new failures.  Ok for mainline?
> > >
> > > It looks that your previous PR110551 patch regressed -march=cascadelake 
> > > [1].

Actually it is not only on -march=cascadelake, w/o -march=cascadelake will also
fail.

Thx,
Haochen

> > > Let's fix these regressions first.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> > >
> > > Uros.
> >
> > This patch fixes that "regression".  Originally, the test case in PR110551 
> > contained
> > one unnecessary mov on "default" x86_targets, but two extra movs on BMI2
> > targets, including -march=haswell and -march=cascadelake.  The first patch
> > eliminated one of these MOVs, this patch eliminates the second.  I'm not 
> > sure
> > that you can call it a regression, the added test failed when run with a 
> > non-standard
> > -march setting.  The good news is that test case doesn't have to be changed 
> > with
> > this patch applied, i.e. the correct intended behaviour is no MOVs on all
> architectures.
> 
> I was not worried about the extra mov, but more about [2], also
> referred from [1], but it looks that that regression was wrongly
> attributed to your patch.
> 
> [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html
> 
> > I'll admit the timing is unusual; I had already written and was regression 
> > testing a
> > patch for the BMI2 issue, when the -march=cascadelake regression tester let 
> > me
> > know it was required for folks that helpfully run the regression suite with 
> > non
> > standard settings.  i.e. a long standing bug that wasn't previously tested 
> > for by
> > the testsuite.
> >
> > > > 2023-10-30  Roger Sayle  <ro...@nextmovesoftware.com>
> > > >
> > > > gcc/ChangeLog
> > > >         PR target/110551
> > > >         * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
> > > >         as operands[2] with predicate register_operand must be !MEM_P.
> > > >         (peephole2): Optimize a mulx followed by a register-to-register
> > > >         move, to place result in the correct destination if possible.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > >         PR target/110551
> > > >         * gcc.target/i386/pr110551-2.c: New test case.
> 
> The patch is OK.
> 
> Thanks,
> Uros.

Reply via email to