On Fri, 22 Nov 2024, liuhongt wrote:

> It could cause weired spill in RA when register pressure is high.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> BTW, It's difficult to get a decent testcase for the issue since the spill
> is not exposed in simple testcase.

I think it's a good patch independent on the spill issue given it
avoids false dependences on the scratch reg contents.  While it happens
to avoid the problematic spill I don't think it does so by design,
"sse_movhlps" still requires two registers and spilling the
first input/output will cause a STLF fail for a later reload since
the actual store is V2SFmode but the reload will be V4SFmode.

That said, it's a good enough fix for the actual regression, just
not a full solution for the then theoretical performance issue.
Adding $ to the memory alternative didn't fix the regression btw
(maybe LRA doesn't honor that).

Richard.

> gcc/ChangeLog:
> 
>       PR target/117562
>       * config/i386/sse.md (vec_unpacks_hi_v4sf): Initialize
>       operands[2] with CONST0_RTX.
> ---
>  gcc/config/i386/sse.md | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 72acd5bde5e..498a42d6e1e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -10424,7 +10424,10 @@ (define_expand "vec_unpacks_hi_v4sf"
>         (match_dup 2)
>         (parallel [(const_int 0) (const_int 1)]))))]
>    "TARGET_SSE2"
> -  "operands[2] = gen_reg_rtx (V4SFmode);")
> +{
> +  operands[2] = gen_reg_rtx (V4SFmode);
> +  emit_move_insn (operands[2], CONST0_RTX (V4SFmode));
> +})
>  
>  (define_expand "vec_unpacks_hi_v8sf"
>    [(set (match_dup 2)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to