https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #32 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
>
> --- Comment #31 from Uroš Bizjak <ubizjak at gmail dot com> ---
> (In reply to Richard Biener from comment #29)
> > The simplified variant below works but IMHO matches cases we do not
> > want to transform.  I can't find any example on how to achieve that
> > though.
>
> I think that pinsrd should be transformed to punpcklqdq irrespective of its
> first input operand. The insn scheduler should move insns around to mask their
> latencies.
>
> > ;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
> > ;; one already reloaded, to hide the latency of one GPR->XMM transitions.
> > (define_peephole2
> >   [(match_scratch:DI 3 "Yv")
> >    (set (match_operand:V2DI 0 "sse_reg_operand")
> >         (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
> >                          (match_operand:DI 2 "nonimmediate_gr_operand")))]
> >   "reload_completed && optimize_insn_for_speed_p ()"
>
> Please use
>
> "TARGET_64BIT && TARGET_SSE4_1
>    && !optimize_insn_for_size_p ()"
>
> here.

what about reload_completed?  We really only want to do this after RA.

Will test the patch then and add the reduced testcase.