On Mon, Jun 19, 2023 at 9:45 PM Jakub Jelinek via Gcc <gcc@gcc.gnu.org> wrote:
>
> On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote:
> > I noticed that a simple function like
> > auto relu( float x ) {
> >     return x > 0.f ? x : 0.f;
> > }
> > compiles to different assembly with GCC 11 (or lower) and GCC 12 (or
> > higher). With -O3 -mavx2, the former compiles the above function to
>
> Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list,
> if you are convinced that loading the constant from memory is faster.
> Another possibility is
>         vxorps xmm1, xmm1, xmm1
>         vmaxss xmm0, xmm0, xmm1
>         ret
> which doesn't need to wait for the memory.
> This changed with https://gcc.gnu.org/r12-7693

I guess we previously were able to see that one operand of
the comparison (the constant 0.f) was not NaN.  Maybe some REG_EQUAL
note can come to the rescue here?
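
To make the NaN point concrete, here is a small sketch (not from the
thread) of the scalar semantics involved. It assumes IEEE 754 floats and
no -ffast-math; the helper names is_pos_zero, relu_of_nan_is_pos_zero and
relu_of_neg_zero_is_pos_zero are made up for illustration. The ternary
yields 0.f whenever the comparison is false, which covers both a NaN
input and -0.f:

```cpp
#include <cmath>
#include <limits>

// The function from the original report, with an explicit return type.
float relu(float x) {
    return x > 0.f ? x : 0.f;
}

// Hypothetical helper: true if r is exactly +0.0f (not -0.0f, not NaN).
bool is_pos_zero(float r) {
    return r == 0.f && !std::signbit(r);
}

// NaN input: "NaN > 0.f" is false, so the constant 0.f is selected.
bool relu_of_nan_is_pos_zero() {
    return is_pos_zero(relu(std::numeric_limits<float>::quiet_NaN()));
}

// -0.f input: "-0.f > 0.f" is false, so the result is +0.f, not -0.f.
bool relu_of_neg_zero_is_pos_zero() {
    return is_pos_zero(relu(-0.f));
}
```

Both edge cases agree with what vmaxss returns when the zero is its
second source operand (it returns the second source if either input is
NaN or if both are zeros), so as far as I can tell the single-instruction
form should be valid here without any fast-math flags.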

> >
> > relu(float):
> >     vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
> >     ret
> > .LC0:
> >     .long 0
> >
> > which is what I would naively expect and what clang essentially does as
> > well (clang actually uses an xor before the maxss to get the zero). The
> > latter (GCC 12 or higher), however, compiles the function to
> >
> > relu(float):
> >     vxorps xmm1, xmm1, xmm1
> >     vcmpltss xmm2, xmm1, xmm0
> >     vblendvps xmm0, xmm1, xmm0, xmm2
> >     ret
> >
> > which looks like a missed optimisation. Does anyone know if there's a
> > reason for the changed behaviour?
>
>         Jakub
>
