On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
> result in surprising code.  Consider the example below (from PR 43644):
>
> __uint128 foo(__uint128 x, unsigned long long y) {
>   return x+y;
> }
>
> which currently results in 6 consecutive movq instructions:
>
> foo:    movq    %rsi, %rax
>         movq    %rdi, %rsi
>         movq    %rdx, %rcx
>         movq    %rax, %rdi
>         movq    %rsi, %rax
>         movq    %rdi, %rdx
>         addq    %rcx, %rax
>         adcq    $0, %rdx
>         ret
>
> The underlying issue is that during RTL expansion, we generate the
> following initial RTL for the x argument:
>
> (insn 4 3 5 2 (set (reg:TI 85)
>         (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
>      (nil))
> (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
>         (reg:DI 87)) "pr43644-2.c":5:1 -1
>      (nil))
> (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
>         (reg:TI 85)) "pr43644-2.c":5:1 -1
>      (nil))
>
> which by combine/reload becomes
>
> (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
>         (const_int 0 [0])) "pr43644-2.c":5:1 -1
>      (nil))
> (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
>         (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
>      (expr_list:REG_DEAD (reg:DI 93)
>         (nil)))
> (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
>         (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
>      (expr_list:REG_DEAD (reg:DI 94)
>         (nil)))
>
> where the heavy use of SUBREG SET_DESTs creates challenges for both
> combine and register allocation.
>
> The improvement proposed here is to avoid these problematic SUBREGs
> by adding (two) special cases to ix86_expand_move.  For insn 4, which
> sets a TImode destination from a paradoxical SUBREG, to assign the
> lowpart, we can use an explicit zero extension (zero_extendditi2 was
> added in July 2022), and for insn 5, which sets the highpart of a
> TImode register we can use the *insvti_highpart_1 instruction (that
> was added in May 2023, after being approved for stage1 in January).
> This allows combine to work its magic, merging these insns into a
> *concatditi3 and from there into other optimized forms.

How about we introduce *insvti_lowpart_1, similar to
*insvti_highpart_1, in the hope that combine is smart enough to also
combine these two instructions? IMO, faking insert to lowpart of the
register with zero_extend is a bit overkill, and could hinder some
other optimization opportunities (as perhaps hinted by failing
testcases).

Uros.

> So for the test case above, we now generate only a single movq:
>
> foo:    movq    %rdx, %rax
>         xorl    %edx, %edx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> But there is a little bad news.  This patch causes two (minor) missed
> optimization regressions on x86_64; gcc.target/i386/pr82580.c and
> gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
> no longer generating adcq $0, but instead using xorl.  For the other
> FAIL, register allocation now has more freedom and is (arbitrarily)
> choosing a register assignment that doesn't match what the test is
> expecting.  These issues are easier to explain and fix once this patch
> is in the tree.
>
> The good news is that this approach fixes a number of long standing
> issues, that need to checked in bugzilla, including PR target/110533
> which was just opened/reported earlier this week.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with only the two new FAILs described above.  Ok for mainline?
>
> 2023-07-06  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/43644
>         PR target/110533
>         * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
>         TImode destinations from paradoxical SUBREGs (setting the lowpart)
>         into explicit zero extensions.  Use *insvti_highpart_1 instruction
>         to set the highpart of a TImode destination.
>
> gcc/testsuite/ChangeLog
>         PR target/43644
>         PR target/110533
>         * gcc.target/i386/pr110533.c: New test case.
>         * gcc.target/i386/pr43644-2.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>

Reply via email to