On Fri, 22 Nov 2024, liuhongt wrote: > It could cause weired spill in RA when register pressure is high. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > BTW, It's difficult to get a decent testcase for the issue since the spill > is not exposed in simple testcase.
I think it's a good patch independent on the spill issue given it avoids false dependences on the scratch reg contents. While it happens to avoid the problematic spill I don't think it does so by design, "sse_movhlps" still requires two registers and spilling the first input/output will cause a STLF fail for a later reload since the actual store is V2SFmode but the reload will be V4SFmode. That said, it's a good enough fix for the actual regression, just not a full solution for the then theoretical performance issue. Adding $ to the memory alternative didn't fix the regression btw (maybe LRA doesn't honor that). Richard. > gcc/ChangeLog: > > PR target/117562 > * config/i386/sse.md (vec_unpacks_hi_v4sf): Initialize > operands[2] with CONST0_RTX. > --- > gcc/config/i386/sse.md | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 72acd5bde5e..498a42d6e1e 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -10424,7 +10424,10 @@ (define_expand "vec_unpacks_hi_v4sf" > (match_dup 2) > (parallel [(const_int 0) (const_int 1)]))))] > "TARGET_SSE2" > - "operands[2] = gen_reg_rtx (V4SFmode);") > +{ > + operands[2] = gen_reg_rtx (V4SFmode); > + emit_move_insn (operands[2], CONST0_RTX (V4SFmode)); > +}) > > (define_expand "vec_unpacks_hi_v8sf" > [(set (match_dup 2) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)