https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #6)
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577192.html

On current trunk x86_64 that gets

FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vmovd|movd)[ \\\\t]+[^{\\n]*%xmm[0-9] 3
FAIL: gcc.target/i386/extract-insert-combining.c scan-assembler-times (?:vpinsrd|pinsrd)[ \\\\t]+[^{\\n]*%xmm[0-9] 1
FAIL: gcc.target/i386/pr104441-1b.c execution test
FAIL: gcc.target/i386/pr98335.c scan-assembler movzbl
FAIL: gcc.target/i386/pr98335.c scan-assembler-not movb

FAIL: gnat.dg/sso8.adb execution test

FAIL: libgomp.c/loop-19.c execution test


The FAILs can be reproduced in an unpatched tree by specifying
-fdisable-rtl-init-regs.

The assembly difference for gcc.target/i386/pr104441-1b.c is (besides
register allocation):

-       vpxor   %xmm1, %xmm1, %xmm1
+       vpinsrd $1, (%rax,%r10), %xmm5, %xmm1
+       vpinsrd $1, (%rdx,%r9), %xmm4, %xmm3
        vmovd   (%rax), %xmm0
-       vpxor   %xmm2, %xmm2, %xmm2
        addl    $4, %ecx
-       vpinsrd $1, (%rax,%r10), %xmm1, %xmm1
-       vpinsrd $1, (%rdx,%r9), %xmm2, %xmm2

adding initialization in compute4x_m_sad_avx2_intrin of reg 109 at in block 4
for insn 33.
adding initialization in compute4x_m_sad_avx2_intrin of reg 99 at in block 4
for insn 48.

where we have for example

-(insn 97 31 98 4 (clobber (reg/v:V2DI 109 [ src23 ]))
"/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 -1
-     (nil))
-(insn 98 97 33 4 (set (reg/v:V2DI 109 [ src23 ])
-        (const_vector:V2DI [
-                (const_int 0 [0]) repeated x2
-            ])) "/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 -1
-     (nil))
-(insn 33 98 36 4 (set (reg:V4SI 138 [ src23 ])
+(insn 33 31 36 4 (set (reg:V4SI 138 [ src23 ])
         (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 137 [ MEM[(int32_t
*)src_62 + _41 * 1] ]))
             (subreg:V4SI (reg/v:V2DI 109 [ src23 ]) 0)
             (const_int 2 [0x2])))
"/home/rguenther/obj-gcc4-g/gcc/include/smmintrin.h":408:20 6925
{sse4_1_pinsrd}

Without init-regs, reg 109 is never set before insn 33, so this produces
{ undef, MEM, undef, undef }.
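
The lane selection done by the vec_merge in insn 33 can be modelled in plain
C. This is only an illustrative sketch, not GCC code: vec_merge_v4si and
model_pinsrd_lane1 are hypothetical helpers, and the model assumes the
documented vec_merge rule that a set bit in the mask selects the element from
the first operand and a clear bit selects it from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (vec_merge:V4SI a b mask): for each lane i,
   bit i of MASK set selects a[i], otherwise b[i].  */
static void
vec_merge_v4si (int32_t dst[4], const int32_t a[4], const int32_t b[4],
                unsigned mask)
{
  assert (mask < 16u);  /* Only four lanes exist in V4SI.  */
  for (int i = 0; i < 4; i++)
    dst[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* Model of insn 33: the first operand is a vec_duplicate of the loaded
   SImode value, the second is the previous contents of reg 109, and the
   mask is (const_int 2).  */
static void
model_pinsrd_lane1 (int32_t dst[4], int32_t mem, const int32_t old[4])
{
  int32_t dup[4] = { mem, mem, mem, mem };
  vec_merge_v4si (dst, dup, old, 2);
}
```

With mask 2 only lane 1 comes from MEM. Lanes 0, 2 and 3 are whatever reg 109
held before, so they are zero only if something zeroed the register first.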

But it looks like the testcase is broken:

__attribute__((always_inline, target("avx2")))
static __m256i
load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
{ 
  __m128i src01, src23;
  src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
  src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
  return _mm256_setr_m128i(src01, src23);
}

src23 is passed to _mm_insert_epi32 uninitialized, so the testcase seems to
expect that src23 is zero before the data is inserted?
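
If zero really is the intended starting value, a minimal sketch of a helper
that does not depend on init-regs could look like the following. It is not a
proposed patch for the testcase: the names load_i32, load8bit_4x4_avx2_zeroed
and load8bit_4x4_lanes are made up for illustration, always_inline is dropped,
and the loads go through memcpy rather than the pointer casts used in the
testcase.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Load one 32-bit value without relying on alignment or aliasing.  */
static inline int32_t
load_i32 (const uint8_t *p)
{
  int32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Variant of the testcase helper with src23 explicitly zeroed before the
   insert, so lanes 0, 2 and 3 of the upper half are well defined.  */
__attribute__((target("avx2")))
static __m256i
load8bit_4x4_avx2_zeroed (const uint8_t *const src, const uint32_t stride)
{
  __m128i src01, src23 = _mm_setzero_si128 ();
  src01 = _mm_cvtsi32_si128 (load_i32 (src + 0 * stride));
  src23 = _mm_insert_epi32 (src23, load_i32 (src + 3 * stride), 1);
  return _mm256_setr_m128i (src01, src23);
}

/* Hypothetical wrapper: store the eight 32-bit lanes to memory so that
   callers built without -mavx2 never pass a __m256i by value.  */
__attribute__((target("avx2")))
static void
load8bit_4x4_lanes (const uint8_t *src, uint32_t stride, int32_t out[8])
{
  assert (out != NULL);
  _mm256_storeu_si256 ((__m256i *) out,
                       load8bit_4x4_avx2_zeroed (src, stride));
}
```

With src23 zeroed, the expected upper half is { 0, MEM, 0, 0 } regardless of
whether init-regs runs. The wrapper should only be called on hardware that
supports AVX2, for example after checking __builtin_cpu_supports ("avx2").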
