XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

rguenther at suse dot de via Gcc-bugs Fri, 05 Mar 2021 04:55:34 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856


--- Comment #32 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> 
> --- Comment #31 from Uroš Bizjak <ubizjak at gmail dot com> ---
> (In reply to Richard Biener from comment #29)
> > The simplified variant below works but IMHO matches cases we do not
> > want to transform.  I can't find any example on how to achieve that
> > though.
> 
> I think that pinsrd should be transformed to punpcklqdq irrespective of its
> first input operand. The insn scheduler should move insns around to mask their
> latencies.
> 
> > ;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
> > ;; one already reloaded, to hide the latency of one GPR->XMM transition.
> > (define_peephole2
> >   [(match_scratch:DI 3 "Yv")
> >    (set (match_operand:V2DI 0 "sse_reg_operand")
> >         (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
> >                          (match_operand:DI 2 "nonimmediate_gr_operand")))]
> >   "reload_completed && optimize_insn_for_speed_p ()"
> 
> Please use
> 
>   "TARGET_64BIT && TARGET_SSE4_1
>    && !optimize_insn_for_size_p ()"
> 
> here.

What about reload_completed?  We really only want to do this after
register allocation (RA).

I will test the patch with that condition and add the reduced testcase.
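
For readers following along, the kind of source that reaches the
vec_concatv2di pattern with two GPR sources can be sketched as below.
This is a hypothetical illustration, not the reduced testcase from the
PR: the names v2di and concat_v2di are made up here, and which
instructions are emitted (for example movq plus pinsrq, or two movq
plus punpcklqdq) depends on the target flags and the GCC revision.

```c
/* Hypothetical sketch, NOT the reduced testcase from PR98856.
   Builds a two-lane 64-bit vector (V2DI) from two scalars that
   arrive in general-purpose registers.  */

/* GCC/Clang generic vector extension: 16 bytes, two long long lanes.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* noinline keeps both arguments in GPRs at the call boundary, so the
   function body is a plain vec_concat of two GPR values.  */
__attribute__ ((noinline)) v2di
concat_v2di (long long lo, long long hi)
{
  return (v2di) { lo, hi };
}
```

Compiling this for x86-64 with something like -O2 -msse4.1 -S and
reading the assembly for concat_v2di shows which GPR->XMM sequence a
given compiler picks.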

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
