On Sun, Aug 11, 2024 at 12:16 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch resolves PR target/116275, a recent ICE-on-valid regression on
> -m32 caused by my recent change to enable STV of DImode arithmetic right
> shift on non-AVX512VL targets.  The oversight is that the i386 backend
> contains an *extenddi2_doubleword_highpart instruction (whose pattern
> is an arithmetic right shift of a left shift) that optimizes the case where
> sign-extension need only update the highpart word of a DImode value when
> generating 32-bit code (!TARGET_64BIT).  STV accepts this pattern as a
> candidate, as there are patterns to handle this form of extension on SSE
> using AVX512VL instructions (and previously ASHIFTRT was only allowed on
> AVX512VL).  Now that ASHIFTRT is a candidate on non-AVX512VL targets, we
> either need to check that the first operand is a register or, as done
> below, provide a define_insn_and_split that supplies a non-AVX512VL
> implementation of *extendv2di_highpart_stv.
>
> The new testcase only ICEed with -m32, so this test could be limited to
> target ia32, but there's no harm also running this test on -m64 to
> provide a little extra test coverage.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-08-11  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/116275
>         * config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): New
>         define_insn_and_split to handle the STV conversion of the DImode
>         pattern *extenddi2_doubleword_highpart.
>
> gcc/testsuite/ChangeLog
>         PR target/116275
>         * g++.target/i386/pr116275.C: New test case.

+  [(set (match_dup 0)
+ (ashift:V2DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+ (ashiftrt:V2DI (match_dup 0) (match_dup 2)))])
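
For readers less familiar with the idiom, the two sets above are a left
shift followed by an arithmetic right shift by the same count, which
sign-extends the low part of each element into its high part.  A minimal
scalar sketch in C, assuming a count of 32 (the count is not visible in
the quoted hunk) and using a hypothetical helper name extend_low32:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: sign-extend the low 32 bits
   of X into a full 64-bit value with a shift pair.  The count of 32 is an
   assumption; the pattern takes it from operand 2.  */
static int64_t
extend_low32 (int64_t x)
{
  /* Do the left shift on the unsigned type to avoid signed overflow.  */
  uint64_t tmp = (uint64_t) x << 32;

  /* Right-shifting a negative signed value is implementation-defined in
     ISO C; GCC documents it as an arithmetic shift, which is what the
     ashiftrt in the pattern performs.  */
  return (int64_t) tmp >> 32;
}
```

Only the upper 32 bits of the result differ from the input, which is why
the DImode form only needs to update the highpart word on !TARGET_64BIT.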

Since this pattern is split before reload, you can perhaps introduce a
new V2DI temporary register and use it to output from the first RTX.
This will ease the job of RA a tiny bit.
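
A rough sketch of what that could look like, untested and only meant to
illustrate the idea.  It assumes operand number 3 is free in this pattern
and that the split is guarded so it only runs while new pseudos can still
be created:

```
  [(set (match_dup 3)
	(ashift:V2DI (match_dup 1) (match_dup 2)))
   (set (match_dup 0)
	(ashiftrt:V2DI (match_dup 3) (match_dup 2)))]
  ;; Assumption: operand 3 is otherwise unused here.
  "operands[3] = gen_reg_rtx (V2DImode);")
```

With a separate temporary the left-shift result no longer has to live in
the same register as the final destination.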

OK with or without the above suggestion.

Thanks,
Uros.
