On Wed, Feb 10, 2021 at 4:12 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> In these patterns, we call simplify_gen_subreg on the input operand
> to create paradoxical subregs that have 2x, 4x or 8x elements as the input
> operand. That works fine if the input operand is a REG, but when it is a
> SUBREG, RTL doesn't allow SUBREG of SUBREG and so relies on simplify_subreg
> actually simplifying it. And e.g. if the input operand is a SUBREG that
> changes the element mode (floating vs. non-floating) and then combined with
> a paradoxical subreg (i.e. different size) this can easily fail, then
> simplify_gen_subreg returns NULL but we still use it in instructions.
>
> Fixed by forcing the operands into REG.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-02-10  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/99025
>         * config/i386/sse.md (fix<fixunssuffix>_truncv2sfv2di2,
>         <insn>v8qiv8hi2, <insn>v8qiv8si2, <insn>v4qiv4si2, <insn>v4hiv4si2,
>         <insn>v8qiv8di2, <insn>v4qiv4di2, <insn>v2qiv2di2, <insn>v4hiv4di2,
>         <insn>v2hiv2di2, <insn>v2siv2di2): Force operands[1] into REG before
>         calling simplify_gen_subreg on it.
>
>         * gcc.target/i386/pr99025.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/sse.md.jj 2021-02-10 07:52:32.673901634 +0100
> +++ gcc/config/i386/sse.md 2021-02-10 10:57:37.229665371 +0100
> @@ -6356,6 +6356,7 @@ (define_expand "fix<fixunssuffix>_truncv
>            (match_operand:V2SF 1 "register_operand")))]
>    "TARGET_AVX512DQ && TARGET_AVX512VL"
>  {
> +  operands[1] = force_reg (V2SFmode, operands[1]);
>    operands[1] = simplify_gen_subreg (V4SFmode, operands[1], V2SFmode, 0);
>    emit_insn (gen_avx512dq_fix<fixunssuffix>_truncv2sfv2di2
>               (operands[0], operands[1]));
> @@ -18013,6 +18014,7 @@ (define_expand "<insn>v8qiv8hi2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V8QImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
>        emit_insn (gen_sse4_1_<code>v8qiv8hi2 (operands[0], operands[1]));
>        DONE;
> @@ -18090,6 +18092,7 @@ (define_expand "<insn>v8qiv8si2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V8QImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
>        emit_insn (gen_avx2_<code>v8qiv8si2 (operands[0], operands[1]));
>        DONE;
> @@ -18153,6 +18156,7 @@ (define_expand "<insn>v4qiv4si2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V4QImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V16QImode, operands[1], V4QImode, 0);
>        emit_insn (gen_sse4_1_<code>v4qiv4si2 (operands[0], operands[1]));
>        DONE;
> @@ -18279,6 +18283,7 @@ (define_expand "<insn>v4hiv4si2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V4HImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V8HImode, operands[1], V4HImode, 0);
>        emit_insn (gen_sse4_1_<code>v4hiv4si2 (operands[0], operands[1]));
>        DONE;
> @@ -18366,6 +18371,7 @@ (define_expand "<insn>v8qiv8di2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V8QImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
>        emit_insn (gen_avx512f_<code>v8qiv8di2 (operands[0], operands[1]));
>        DONE;
> @@ -18427,6 +18433,7 @@ (define_expand "<insn>v4qiv4di2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V8QImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
>        emit_insn (gen_avx2_<code>v4qiv4di2 (operands[0], operands[1]));
>        DONE;
> @@ -18453,6 +18460,7 @@ (define_expand "<insn>v2qiv2di2"
>            (match_operand:V2QI 1 "register_operand")))]
>    "TARGET_SSE4_1"
>  {
> +  operands[1] = force_reg (V2QImode, operands[1]);
>    operands[1] = simplify_gen_subreg (V16QImode, operands[1], V2QImode, 0);
>    emit_insn (gen_sse4_1_<code>v2qiv2di2 (operands[0], operands[1]));
>    DONE;
> @@ -18525,6 +18533,7 @@ (define_expand "<insn>v4hiv4di2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V4HImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V8HImode, operands[1], V4HImode, 0);
>        emit_insn (gen_avx2_<code>v4hiv4di2 (operands[0], operands[1]));
>        DONE;
> @@ -18586,6 +18595,7 @@ (define_expand "<insn>v2hiv2di2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V2HImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V8HImode, operands[1], V2HImode, 0);
>        emit_insn (gen_sse4_1_<code>v2hiv2di2 (operands[0], operands[1]));
>        DONE;
> @@ -18737,6 +18747,7 @@ (define_expand "<insn>v2siv2di2"
>  {
>    if (!MEM_P (operands[1]))
>      {
> +      operands[1] = force_reg (V2SImode, operands[1]);
>        operands[1] = simplify_gen_subreg (V4SImode, operands[1], V2SImode, 0);
>        emit_insn (gen_sse4_1_<code>v2siv2di2 (operands[0], operands[1]));
>        DONE;
> --- gcc/testsuite/gcc.target/i386/pr99025.c.jj 2021-02-09 19:17:29.705924814 +0100
> +++ gcc/testsuite/gcc.target/i386/pr99025.c 2021-02-09 19:17:10.687137688 +0100
> @@ -0,0 +1,17 @@
> +/* PR target/99025 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -msse4" } */
> +
> +long v[16];
> +int w;
> +union U { float u; int r; } x;
> +
> +void
> +foo (float y)
> +{
> +  union U z;
> +  x.u = w;
> +  v[5] = x.r;
> +  z.u = y;
> +  v[6] = z.r;
> +}
>
>         Jakub
>
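
For anyone reading this in the archive, the failure mode described above can be sketched outside of GCC with a small toy model. Everything below is hypothetical: toy_rtx, toy_gen_subreg and toy_force_reg are stand-ins invented for illustration, not GCC's rtx, simplify_gen_subreg or force_reg, and the folding rule is an assumption chosen only to mimic the case where a SUBREG that changes the element class cannot be combined with a wider paradoxical SUBREG.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not GCC's real RTL API.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  int size;               /* mode size in bytes */
  int is_float;           /* 1 for a floating-point element mode */
  struct toy_rtx *inner;  /* operand of a TOY_SUBREG, NULL for a TOY_REG */
};

static struct toy_rtx toy_pool[16];
static size_t toy_used;

static struct toy_rtx *
toy_alloc (enum toy_code code, int size, int is_float, struct toy_rtx *inner)
{
  struct toy_rtx *x;
  assert (toy_used < sizeof toy_pool / sizeof toy_pool[0]);
  x = &toy_pool[toy_used++];
  x->code = code;
  x->size = size;
  x->is_float = is_float;
  x->inner = inner;
  return x;
}

/* May return NULL, like simplify_gen_subreg.  A SUBREG of a SUBREG is
   never built.  Assumed toy rule: the pair is folded only when the inner
   SUBREG keeps the element class of its register.  */
struct toy_rtx *
toy_gen_subreg (int outer_size, struct toy_rtx *op)
{
  if (op->code == TOY_REG)
    return toy_alloc (TOY_SUBREG, outer_size, op->is_float, op);
  if (op->inner->is_float == op->is_float)
    return toy_alloc (TOY_SUBREG, outer_size, op->is_float, op->inner);
  return NULL;
}

/* Models copying the value into a fresh register of the same mode.  */
struct toy_rtx *
toy_force_reg (struct toy_rtx *op)
{
  if (op->code == TOY_REG)
    return op;
  return toy_alloc (TOY_REG, op->size, op->is_float, NULL);
}

/* An 8-byte integer register viewed as float, then widened to 16 bytes
   without forcing it into a register first: the toy helper gives up.  */
int
toy_unforced_fails (void)
{
  struct toy_rtx *reg = toy_alloc (TOY_REG, 8, 0, NULL);
  struct toy_rtx *op = toy_alloc (TOY_SUBREG, 8, 1, reg);
  return toy_gen_subreg (16, op) == NULL;
}

/* Same operand, forced into a register before widening.  */
int
toy_forced_succeeds (void)
{
  struct toy_rtx *reg = toy_alloc (TOY_REG, 8, 0, NULL);
  struct toy_rtx *op = toy_alloc (TOY_SUBREG, 8, 1, reg);
  struct toy_rtx *wide = toy_gen_subreg (16, toy_force_reg (op));
  return wide != NULL && wide->code == TOY_SUBREG
         && wide->inner->code == TOY_REG;
}
```

In this sketch toy_unforced_fails () returns 1 because the helper returns NULL, while toy_forced_succeeds () returns 1 because the operand is a plain register by the time the wider subreg is requested. That is the same ordering the patch introduces in each expander.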