On Tue, Jun 20, 2023 at 10:15:37AM +0200, Richard Biener wrote:
> On Mon, Jun 19, 2023 at 9:45 PM Jakub Jelinek via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote:
> > > I noticed that a simple function like
> > > auto relu( float x ) {
> > >     return x > 0.f ? x : 0.f;
> > > }
> > > compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On
> > > -O3 -mavx2 the former compiles above function to
> >
> > Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list,
> > if you are convinced that loading the constant from memory is faster.
> > Another possibility is
> >         vxorps  xmm1, xmm1, xmm1
> >         vmaxss  xmm0, xmm0, xmm1
> >         ret
> > which doesn't need to wait for the memory.
>
> This changed with https://gcc.gnu.org/r12-7693
>
> I guess we previously were able to see that one operand of
> the comparison was not NaN.  Maybe some REG_EQUAL
> note can come to the rescue here?
The ce1 pass results in emit_conditional_move being called with
(gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (reg:SF 84)
operands in the GCC 11 case, which ix86_expand_fp_movcc successfully
recognizes as a min/max and expands via ix86_expand_sse_fp_minmax.
In GCC 12+, emit_conditional_move is instead called with
(gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (const_double:SF 0.0 [0x0.0p+0])
(reg:SF 84 contains the (const_double:SF 0.0 [0x0.0p+0]) value in both
cases; in the GCC 11 case it is loaded from memory, in the GCC 12+ case
it is set directly by a move).

The difference arises precisely because a cheap SSE constant can be moved
directly into a register, so that is done instead of reusing a pseudo that
already contains the value.  In the latter case ix86_expand_fp_movcc is not
even called with the const_double, because the expander doesn't allow
immediates; it gets the constant forced into some other register instead,
so it can't really find out that this is actually a minmax.  Even if the
expander allowed the cheap SSE constants, it wouldn't know that r84 is also
zero (unless it checked that r84 is a pseudo with a single setter and
verified that setter, or something similar).

Jakub
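To illustrate why the match fails, here is a hypothetical, heavily simplified C++ model of the operand check. It is not GCC code: `Operand`, `same_operand`, `is_sse_max`, `resolve` and `is_sse_max_resolved` are invented for this sketch, and the real ix86_expand_sse_fp_minmax also deals with NaN and signed-zero rules that are omitted here. The point it shows is that a purely structural comparison of the condition operands against the move arms only succeeds when the "else" arm is the very same pseudo as the second comparison operand; a const_double, or a fresh pseudo holding the same constant, does not match unless something (such as the single-setter check or a REG_EQUAL-style note mentioned above) lets the check look through the register.

```cpp
#include <cassert>
#include <cmath>
#include <map>

// Hypothetical, heavily simplified stand-in for the operands handed to the
// conditional-move expander. This is NOT GCC's rtx representation.
struct Operand {
    enum class Kind { Reg, ConstDouble };
    Kind kind;
    int regno;     // pseudo register number, used when kind == Reg
    double value;  // constant value, used when kind == ConstDouble
};

Operand reg(int regno) { return {Operand::Kind::Reg, regno, 0.0}; }
Operand cst(double v) { return {Operand::Kind::ConstDouble, -1, v}; }

// Purely structural equality: a register never equals a constant, even if it
// holds that constant at run time. Signed zeros are kept distinct.
bool same_operand(const Operand& a, const Operand& b) {
    if (a.kind != b.kind) return false;
    if (a.kind == Operand::Kind::Reg) return a.regno == b.regno;
    return a.value == b.value && std::signbit(a.value) == std::signbit(b.value);
}

// (a > b) ? on_true : on_false is treated as max(a, b) only when the arms are
// literally the comparison operands, in the same order.
bool is_sse_max(const Operand& a, const Operand& b,
                const Operand& on_true, const Operand& on_false) {
    return same_operand(a, on_true) && same_operand(b, on_false);
}

// Hypothetical look-through: replace a pseudo by its constant if some
// analysis (single setter, REG_EQUAL-style note, ...) proved it holds one.
Operand resolve(const Operand& op, const std::map<int, double>& known) {
    if (op.kind == Operand::Kind::Reg) {
        auto it = known.find(op.regno);
        if (it != known.end()) return cst(it->second);
    }
    return op;
}

// Same check as is_sse_max, but after resolving known-constant pseudos.
bool is_sse_max_resolved(const Operand& a, const Operand& b,
                         const Operand& on_true, const Operand& on_false,
                         const std::map<int, double>& known) {
    return is_sse_max(resolve(a, known), resolve(b, known),
                      resolve(on_true, known), resolve(on_false, known));
}
```

With the shapes from the mail, `is_sse_max(reg(83), reg(84), reg(83), reg(84))` (GCC 11) is true, while `is_sse_max(reg(83), reg(84), reg(83), cst(0.0))` (GCC 12+) and the variant with the constant forced into a new pseudo, `reg(85)`, are false. Both become true under `is_sse_max_resolved` once pseudos 84 and 85 are known to hold 0.0.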