https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Richard Biener from comment #26)
> > but that doesn't seem to match for some unknown reason.
> 
> Try this:
> 
> (define_peephole2
>   [(match_scratch:DI 5 "Yv")
>    (set (match_operand:DI 0 "sse_reg_operand")
>         (match_operand:DI 1 "general_reg_operand"))
>    (set (match_operand:V2DI 2 "sse_reg_operand")
>         (vec_concat:V2DI (match_operand:DI 3 "sse_reg_operand")
>                          (match_operand:DI 4 "nonimmediate_gr_operand")))]
>   ""
>   [(set (match_dup 0)
>         (match_dup 1))
>    (set (match_dup 5)
>         (match_dup 4))
>    (set (match_dup 2)
>        (vec_concat:V2DI (match_dup 3)
>                         (match_dup 5)))])

Ah, I messed up the operands.  The following works (with match_scratch in
the position above, it happily chooses a register matching operand 0):

;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
;; one already reloaded, to hide the latency of one of the GPR->XMM transitions.
(define_peephole2
  [(set (match_operand:DI 0 "sse_reg_operand")
        (match_operand:DI 1 "general_reg_operand"))
   (match_scratch:DI 2 "Yv")
   (set (match_operand:V2DI 3 "sse_reg_operand")
        (vec_concat:V2DI (match_dup 0)
                         (match_operand:DI 4 "nonimmediate_gr_operand")))]
  "reload_completed && optimize_insn_for_speed_p ()"
  [(set (match_dup 0)
        (match_dup 1))
   (set (match_dup 2)
        (match_dup 4))
   (set (match_dup 3)
        (vec_concat:V2DI (match_dup 0)
                         (match_dup 2)))])

but for some reason it again doesn't work for the important loop.  There
we have

  389: xmm0:DI=cx:DI
      REG_DEAD cx:DI
  390: dx:DI=[sp:DI+0x10]
   56: {dx:DI=dx:DI 0>>0x3f;clobber flags:CC;}
      REG_UNUSED flags:CC
   57: xmm0:V2DI=vec_concat(xmm0:DI,dx:DI)

I suppose the reason is that there are two unrelated insns between the
xmm0 = cx:DI move and the vec_concat, and peephole2 only matches
consecutive insns.  That would hint that we need to match this GPR->XMM
move not in the peephole pattern but somehow in the condition (can we
use DF there?).
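
To illustrate the suspected cause, here is a toy model in Python.  It is
not GCC code: the insn dictionaries, the predicate names and
match_contiguous are all made up for illustration, and match_scratch is
left out because it is not an insn.  It assumes only that a peephole
pattern has to match consecutive insns, and shows a two-insn pattern
missing the sequence from the dump while a one-insn pattern still hits it.

```python
# Toy model only; none of these names exist in GCC.

def is_gpr_to_xmm_move(insn):
    # Hypothetical predicate: plain move from a non-XMM register into an XMM register.
    return (insn["kind"] == "move"
            and insn["dest"].startswith("xmm")
            and not insn["src"].startswith("xmm"))

def is_vec_concat(insn):
    # Hypothetical predicate: the vec_concat insn we want to split.
    return insn["kind"] == "vec_concat"

def match_contiguous(insns, predicates):
    """Return every start index where the predicates match consecutive insns."""
    n = len(predicates)
    hits = []
    for i in range(len(insns) - n + 1):
        window = insns[i:i + n]
        if all(pred(insn) for pred, insn in zip(predicates, window)):
            hits.append(i)
    return hits

# The four insns from the dump above, reduced to what the model needs.
loop = [
    {"uid": 389, "kind": "move", "dest": "xmm0", "src": "cx"},
    {"uid": 390, "kind": "load", "dest": "dx", "src": "[sp+0x10]"},
    {"uid": 56, "kind": "shift", "dest": "dx", "src": "dx"},
    {"uid": 57, "kind": "vec_concat", "dest": "xmm0", "src": "xmm0,dx"},
]

# Two-insn pattern (move followed directly by vec_concat): no match here.
two_insn_hits = match_contiguous(loop, [is_gpr_to_xmm_move, is_vec_concat])

# One-insn pattern (vec_concat alone): matches insn 57.
one_insn_hits = match_contiguous(loop, [is_vec_concat])

# With the move directly before the vec_concat, the two-insn pattern matches.
adjacent_hits = match_contiguous([loop[0], loop[3]],
                                 [is_gpr_to_xmm_move, is_vec_concat])
```

In this model the one-insn pattern also fires when no GPR->XMM move
precedes the vec_concat at all, which is the over-matching concern with
the simplified variant.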

The simplified variant below works but IMHO matches cases we do not
want to transform.  I can't find any example of how to check for the
preceding GPR->XMM move in the condition, though.

;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
;; one already reloaded, to hide the latency of one of the GPR->XMM transitions.
(define_peephole2
  [(match_scratch:DI 3 "Yv")
   (set (match_operand:V2DI 0 "sse_reg_operand")
        (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
                         (match_operand:DI 2 "nonimmediate_gr_operand")))]
  "reload_completed && optimize_insn_for_speed_p ()"
  [(set (match_dup 3)
        (match_dup 2))
   (set (match_dup 0)
        (vec_concat:V2DI (match_dup 1)
                         (match_dup 3)))])
