https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #25)
> On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> >
> > --- Comment #24 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > (In reply to Richard Biener from comment #22)
> > > I guess the idea of this insn setup was exactly to get IRA/LRA to choose
> > > the optimal instruction sequence - otherwise exposing the reload so
> > > late is probably suboptimal.
> >
> > There is one more tool in the toolbox. A peephole2 pattern can be
> > conditionalized on an available XMM register. So, if an XMM register is
> > available, the GPR->XMM move can be emitted in front of the insn. If there
> > is XMM register pressure, pinsrd will be used, but if an XMM register is
> > available, it will be reused to emit punpcklqdq.
> >
> > The peephole2 pattern can also be conditionalized for targets where
> > GPR->XMM moves are fast.
>
> Note the trick is especially important when GPR->XMM moves are _slow_, but
> only in the case where we originally combine two GPR operands. Doing two
> GPR->XMM moves and then one punpcklqdq hides half of the latency of the
> slow moves, since they have no data dependence on each other. So for the
> peephole we should try to match this: a reloaded operand and a GPR
> operand. When the %xmm operand results from an SSE computation, there's
> no point in splitting out a GPR->XMM move.
>
> So in the end a peephole2 sounds like it could better match the condition
> under which the transform is profitable.
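For reference, a minimal C sketch (not from the thread) of source that combines two GPR operands into one 128-bit vector, i.e. the case the peephole is meant to catch. It assumes the GCC/Clang `vector_size` extension; `v2di` and `concat_gprs` are hypothetical names chosen for illustration. On x86-64 such a constructor is typically expanded to a vec_concat:V2DI of two DImode values, which the backend may then emit as movq + pinsrq (the 64-bit form of pinsrd, SSE4.1) or as two movq + punpcklqdq, depending on -march/tuning and register pressure.

```c
#include <stdint.h>

/* GCC/Clang vector extension: two 64-bit integers in one 16-byte vector,
   the C-level counterpart of the V2DI mode (hypothetical typedef name). */
typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Hypothetical helper: builds a vector from two scalar (GPR) values.
   Element 0 gets 'lo', element 1 gets 'hi'; which instruction sequence
   the compiler picks for this is exactly what the discussion is about. */
v2di concat_gprs (int64_t lo, int64_t hi)
{
  return (v2di) { lo, hi };
}
```

Compiling this with, e.g., `gcc -O2 -S -msse4.1` and inspecting the assembly should show which of the two sequences the current heuristics choose for a given target.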
I tried

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index db5be59f5b7..8d0d3077cf8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1419,6 +1419,23 @@
   DONE;
 })

+(define_peephole2
+  [(set (match_operand:DI 0 "sse_reg_operand")
+        (match_operand:DI 1 "general_gr_operand"))
+   (match_scratch:DI 2 "sse_reg_operand")
+   (set (match_operand:V2DI 2 "sse_reg_operand")
+        (vec_concat:V2DI (match_dup:DI 0)
+                         (match_operand:DI 3 "general_gr_operand")))]
+  "reload_completed"
+  [(set (match_dup 0)
+        (match_dup 1))
+   (set (match_dup 2)
+        (match_dup 3))
+   (set (match_dup 2)
+        (vec_concat:V2DI (match_dup 0)
+                         (match_dup 2)))]
+  "")
+
 ;; Merge movsd/movhpd to movupd for TARGET_SSE_UNALIGNED_LOAD_OPTIMAL targets.
 (define_peephole2
   [(set (match_operand:V2DF 0 "sse_reg_operand")

but that doesn't seem to match for some unknown reason.