On Fri, 23 Aug 2019, Uros Bizjak wrote:

> On Fri, Aug 23, 2019 at 1:52 PM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> >
> > > This is currently a heads-up patch that removes the minimum limit on
> > > the cost of moves to/from an XMM reg. The immediate benefit is the
> > > removal of mismatched spills caused by subreg usage.
> > >
> > > *If* the patch proves to be beneficial (as in "doesn't regress
> > > important benchmarks"), then we should be able to un-hide the
> > > inter-regset moves from the RA and allow it to collapse some moves.
> > > As an example, the patched compiler removes a movd in
> > > gcc.target/i386/minmax-6.c and still avoids a mismatched spill.
> > >
> > > 2019-08-23  Uroš Bizjak  <ubiz...@gmail.com>
> > >
> > >     * config/i386/i386.c (ix86_register_move_cost): Do not
> > >     limit the cost of moves to/from XMM register to minimum 8.
> > >     * config/i386/i386-features.c
> > >     (general_scalar_chain::make_vector_copies): Do not generate
> > >     zeroing move from GPR to XMM register, use gen_move_insn
> > >     instead of gen_gpr_to_xmm_move_src.
> > >     (general_scalar_chain::convert_op): Ditto.
> > >     (gen_gpr_to_xmm_move_src): Remove.
> > >
> > > The patch was bootstrapped and regression tested on x86_64-linux-gnu
> > > {,-m32}, configured w/ and w/o --with-arch=ivybridge.
> > >
> > > The patch regresses the PR80481 scan-asm-not test (where the compiler
> > > generates an unrelated XMM spill on register-starved x86_32). However,
> > > during the analysis, I found that the original issue is not fixed, and
> > > is still visible without -funroll-loops [1].
> > >
> > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481#c10
> > >
> > > So, I'd wait for HJ's benchmark results for the cost to/from XMM
> > > change before proceeding with the patch.
> >
> > Otherwise looks good to me, it might clash (whitespace-wise)
> > with my STV "rewrite" though.
> >
> > We might want to adjust ix86_register_move_cost separately from
> > the STV change to use regular moves though?  Just to make bisection
> > point to either of the two - STV "fallout" is probably minor
> > compared to fallout elsewhere...
>
> Yes, this is also my plan.
Btw, when testing w/ the costing disabled I ran into

(insn 31 7 8 2 (set (subreg:V4SI (reg:SI 107) 0)
        (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("peakbuf") [flags 0x2] <var_decl 0x7efe8dbfeab0 peakbuf>)
                            (const_int 400020000 [0x17d7d220]))) [1 peakbuf.PeaksInBuf+0 S4 A64]))
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (const_int 1 [0x1])))

being not recognized (for the large immediate I guess).

So when just doing

  (set (reg:SI 107) (mem:SI ...))

here we expect IRA to be able to allocate 107 into xmm, realizing
it needs to reload the RHS first?

For the current code, is testing x86_64_general_operand like in the
following the correct thing to do?

Thanks,
Richard.

Index: config/i386/i386-features.c
===================================================================
--- config/i386/i386-features.c (revision 274926)
+++ config/i386/i386-features.c (working copy)
@@ -812,9 +812,15 @@ general_scalar_chain::convert_op (rtx *o
   else if (MEM_P (*op))
     {
       rtx tmp = gen_reg_rtx (GET_MODE (*op));
+      rtx tmp2 = *op;
+      if (!x86_64_general_operand (*op, smode))
+        {
+          tmp2 = gen_reg_rtx (smode);
+          emit_insn_before (gen_move_insn (tmp2, *op), insn);
+        }
 
       emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
-                                     gen_gpr_to_xmm_move_src (vmode, *op)),
+                                     gen_gpr_to_xmm_move_src (vmode, tmp2)),
                         insn);
       *op = gen_rtx_SUBREG (vmode, tmp, 0);