https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #25)
> On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> > 
> > --- Comment #24 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > (In reply to Richard Biener from comment #22)
> > > I guess the idea of this insn setup was exactly to get IRA/LRA to choose
> > > the optimal instruction sequence - otherwise exposing the reload so
> > > late is probably suboptimal.
> > 
> > There is one more tool in the toolbox. A peephole2 pattern can be
> > conditionalized on an available XMM register. So, if an XMM reg is
> > available, the GPR->XMM move can be emitted in front of the insn. So, if
> > there is XMM register pressure, pinsrd will be used, but if an XMM
> > register is available, it will be reused to emit punpcklqdq.
> > 
> > The peephole2 pattern can also be conditionalized for targets where GPR->XMM
> > moves are fast.
> 
> Note the trick is esp. important when GPR->XMM moves are _slow_.  But only
> in the case we originally combine two GPR operands.  Doing two
> GPR->XMM moves and then one punpcklqdq hides half of the latency of the
> slow moves since they have no data dependence on each other.  So for the
> peephole we should try to match this - a reloaded operand and a GPR
> operand.  When the %xmm operand results from an SSE computation there's
> no point in splitting out a GPR->XMM move.
> 
> So in the end a peephole2 sounds like it could better match the condition
> the transform is profitable on.

I tried

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index db5be59f5b7..8d0d3077cf8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1419,6 +1419,23 @@
   DONE;
 })

+(define_peephole2
+  [(set (match_operand:DI 0 "sse_reg_operand")
+        (match_operand:DI 1 "general_gr_operand"))
+   (match_scratch:DI 2 "sse_reg_operand")
+   (set (match_operand:V2DI 2 "sse_reg_operand")
+       (vec_concat:V2DI (match_dup:DI 0)
+                        (match_operand:DI 3 "general_gr_operand")))]
+  "reload_completed"
+  [(set (match_dup 0)
+        (match_dup 1))
+   (set (match_dup 2)
+        (match_dup 3))
+   (set (match_dup 2)
+       (vec_concat:V2DI (match_dup 0)
+                        (match_dup 2)))]
+  "")
+
 ;; Merge movsd/movhpd to movupd for TARGET_SSE_UNALIGNED_LOAD_OPTIMAL targets.
 (define_peephole2
   [(set (match_operand:V2DF 0 "sse_reg_operand")

but that doesn't seem to match for some unknown reason.
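
For reference, a minimal C sketch of the two source-level shapes discussed
above, assuming GNU C vector extensions (GCC or Clang). The type and function
names are hypothetical and not taken from the PR's testcase. Which
instructions the compiler actually emits (pinsrd/pinsrq versus two GPR->XMM
moves plus punpcklqdq) depends on compiler version, options, and register
pressure, so this only illustrates the inputs, not the resulting code.

```c
/* Illustrative only: names are hypothetical, not from the PR testcase. */
typedef long long v2di __attribute__((vector_size(16)));

/* Both halves start out as 64-bit integers, typically in GPRs.
   This is the case where two independent GPR->XMM moves followed by
   one unpack could overlap part of the move latency. */
v2di concat_gprs(long long lo, long long hi)
{
    v2di v = { lo, hi };
    return v;
}

/* The low half already comes from a vector value; only the high half
   is a scalar integer.  Per the discussion above, splitting out an
   extra GPR->XMM move is not expected to help in this case. */
v2di concat_sse_lo(v2di x, long long hi)
{
    v2di v = { x[0], hi };
    return v;
}
```

Compiling both functions with -O2 -S on an x86-64 target is one way to see
which sequence a given compiler picks for each case.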
