On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Passing 128-bit integer (TImode) parameters on x86_64 can sometimes > result in surprising code. Consider the example below (from PR 43644): > > __uint128 foo(__uint128 x, unsigned long long y) { > return x+y; > } > > which currently results in 6 consecutive movq instructions: > > foo: movq %rsi, %rax > movq %rdi, %rsi > movq %rdx, %rcx > movq %rax, %rdi > movq %rsi, %rax > movq %rdi, %rdx > addq %rcx, %rax > adcq $0, %rdx > ret > > The underlying issue is that during RTL expansion, we generate the > following initial RTL for the x argument: > > (insn 4 3 5 2 (set (reg:TI 85) > (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1 > (nil)) > (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8) > (reg:DI 87)) "pr43644-2.c":5:1 -1 > (nil)) > (insn 6 5 7 2 (set (reg/v:TI 84 [ x ]) > (reg:TI 85)) "pr43644-2.c":5:1 -1 > (nil)) > > which by combine/reload becomes > > (insn 25 3 22 2 (set (reg/v:TI 84 [ x ]) > (const_int 0 [0])) "pr43644-2.c":5:1 -1 > (nil)) > (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0) > (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal} > (expr_list:REG_DEAD (reg:DI 93) > (nil))) > (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8) > (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal} > (expr_list:REG_DEAD (reg:DI 94) > (nil))) > > where the heavy use of SUBREG SET_DESTs creates challenges for both > combine and register allocation. > > The improvement proposed here is to avoid these problematic SUBREGs > by adding (two) special cases to ix86_expand_move. For insn 4, which > sets a TImode destination from a paradoxical SUBREG, to assign the > lowpart, we can use an explicit zero extension (zero_extendditi2 was > added in July 2022), and for insn 5, which sets the highpart of a > TImode register we can use the *insvti_highpart_1 instruction (that > was added in May 2023, after being approved for stage1 in January). > This allows combine to work its magic, merging these insns into a > *concatditi3 and from there into other optimized forms.
How about we introduce *insvti_lowpart_1, similar to *insvti_highpart_1, in the hope that combine is smart enough to also combine these two instructions? IMO, faking insert to lowpart of the register with zero_extend is a bit overkill, and could hinder some other optimization opportunities (as perhaps hinted by failing testcases). Uros. > So for the test case above, we now generate only a single movq: > > foo: movq %rdx, %rax > xorl %edx, %edx > addq %rdi, %rax > adcq %rsi, %rdx > ret > > But there is a little bad news. This patch causes two (minor) missed > optimization regressions on x86_64; gcc.target/i386/pr82580.c and > gcc.target/i386/pr91681-1.c. As shown in the test case above, we're > no longer generating adcq $0, but instead using xorl. For the other > FAIL, register allocation now has more freedom and is (arbitrarily) > choosing a register assignment that doesn't match what the test is > expecting. These issues are easier to explain and fix once this patch > is in the tree. > > The good news is that this approach fixes a number of long standing > issues, that need to checked in bugzilla, including PR target/110533 > which was just opened/reported earlier this week. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with only the two new FAILs described above. Ok for mainline? > > 2023-07-06 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR target/43644 > PR target/110533 > * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of > TImode destinations from paradoxical SUBREGs (setting the lowpart) > into explicit zero extensions. Use *insvti_highpart_1 instruction > to set the highpart of a TImode destination. > > gcc/testsuite/ChangeLog > PR target/43644 > PR target/110533 > * gcc.target/i386/pr110533.c: New test case. > * gcc.target/i386/pr43644-2.c: Likewise. > > > Thanks in advance, > Roger > -- >