On Mon, Jun 19, 2023 at 9:45 PM Jakub Jelinek via Gcc <gcc@gcc.gnu.org> wrote:
>
> On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote:
> > I noticed that a simple function like
> > auto relu( float x ) {
> >     return x > 0.f ? x : 0.f;
> > }
> > compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On
> > -O3 -mavx2 the former compiles above function to
>
> Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list,
> if you are convinced that loading the constant from memory is faster.
> Another possibility is
>         vxorps xmm1, xmm1, xmm1
>         vmaxss xmm0, xmm0, xmm1
>         ret
> which doesn't need to wait for the memory.
> This changed with https://gcc.gnu.org/r12-7693

I guess we previously were able to see that one operand of the comparison
was not NaN. Maybe some REG_EQUAL note can come to the rescue here?

> >
> > relu(float):
> >         vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
> >         ret
> > .LC0:
> >         .long 0
> >
> > which is what I would naively expect and what also clang essentially does
> > (clang actually uses an xor before the maxss to get the zero). The latter,
> > however, compiles the function to
> >
> > relu(float):
> >         vxorps xmm1, xmm1, xmm1
> >         vcmpltss xmm2, xmm1, xmm0
> >         vblendvps xmm0, xmm1, xmm0, xmm2
> >         ret
> >
> > which looks like a missed optimisation. Does anyone know if there's a
> > reason for the changed behaviour?
>
>         Jakub
>
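
For reference, a minimal C++ sketch of why NaN and operand order matter when
the ternary is turned into a single scalar max. max_model and check_examples
are hypothetical names used only for illustration, and max_model is just my
reading of the scalar max rule (the second source is returned unless the first
compares greater); none of this is taken from GCC or its testsuite:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The function from the original report.
float relu(float x) { return x > 0.f ? x : 0.f; }

// Hypothetical model of the scalar x86 max rule: the result is the first
// source only when it compares greater; otherwise (including NaN inputs and
// two zeros) the second source is returned.
float max_model(float src1, float src2) { return src1 > src2 ? src1 : src2; }

// Illustrative checks, assuming default IEEE semantics (no -ffast-math).
void check_examples() {
    const float nan = std::numeric_limits<float>::quiet_NaN();

    // NaN compares false against everything, so relu maps it to 0.
    assert(relu(nan) == 0.f);
    assert(max_model(nan, 0.f) == 0.f);

    // With the operands swapped, the NaN would be passed through instead.
    assert(std::isnan(max_model(0.f, nan)));

    // -0.0 is not greater than 0, so the result is +0.0.
    assert(!std::signbit(relu(-0.f)));
    assert(relu(2.5f) == 2.5f && relu(-1.f) == 0.f);
}
```

So with x as the first source and the zero as the second, the max form agrees
with the ternary for NaN and for -0.0; with the operands the other way round
it would not.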