This patch resolves the second part of PR target/112992, building upon Hongtao Liu's solution to the first part.
The issue addressed by this patch is that when initializing vectors by broadcasting integer constants, the compiler has the flexibility to select the most appropriate vector mode to perform the broadcast, as long as the resulting vector has an identical bit pattern. For example, the following constants are all equivalent: V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 } V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 } V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 } So instruction sequences that construct any of these can be used to construct the others (with a suitable cast/SUBREG). On x86_64, it turns out that broadcasts of SImode constants are preferred, as DImode constants often require a longer movabs instruction, and HImode and QImode broadcasts require multiple uops on some architectures. Hence, SImode is always the equal shortest/fastest implementation. Examples of this improvement, can be seen in the testsuite. gcc.target/i386/pr102021.c Before: 0: 48 b8 0c 00 0c 00 0c movabs $0xc000c000c000c,%rax 7: 00 0c 00 a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0 10: c3 retq After: 0: b8 0c 00 0c 00 mov $0xc000c,%eax 5: 62 f2 7d 28 7c c0 vpbroadcastd %eax,%ymm0 b: c3 retq and gcc.target/i386/pr90773-17.c: Before: 0: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 7 <foo+0x7> 7: b8 0c 00 00 00 mov $0xc,%eax c: 62 f2 7d 08 7a c0 vpbroadcastb %eax,%xmm0 12: 62 f1 7f 08 7f 02 vmovdqu8 %xmm0,(%rdx) 18: c7 42 0f 0c 0c 0c 0c movl $0xc0c0c0c,0xf(%rdx) 1f: c3 retq After: 0: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 7 <foo+0x7> 7: b8 0c 0c 0c 0c mov $0xc0c0c0c,%eax c: 62 f2 7d 08 7c c0 vpbroadcastd %eax,%xmm0 12: 62 f1 7f 08 7f 02 vmovdqu8 %xmm0,(%rdx) 18: c7 42 0f 0c 0c 0c 0c movl $0xc0c0c0c,0xf(%rdx) 1f: c3 retq where according to Agner Fog's instruction tables broadcastd is slightly faster on some microarchitectures, for example Knight's Landing. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-12-21 Roger Sayle <ro...@nextmovesoftware.com> gcc/ChangeLog PR target/112992 * config/i386/i386-expand.cc (ix86_convert_const_wide_int_to_broadcast): Allow call to ix86_expand_vector_init_duplicate to fail, and return NULL_RTX. (ix86_broadcast_from_constant): Revert recent change; Return a suitable MEMREF independently of mode/target combinations. (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate to decide whether expansion is possible/preferrable. Only try forcing DImode constants to memory (and trying again) if calling ix86_expand_vector_init_duplicate fails with an DImode immediate constant. (ix86_expand_vector_init_duplicate) <case E_V2DImode>: Try using V4SImode for suitable immediate constants. <case E_V4DImode>: Try using V8SImode for suitable constants. <case E_V4SImode>: Use constant pool for AVX without AVX2. <case E_V4HImode>: Fail for CONST_INT_P, i.e. use constant pool. <case E_V2HImode>: Likewise. <case E_V8HImode>: For CONST_INT_P try using V4SImode via widen. <case E_V16QImode>: For CONT_INT_P try using V8HImode via widen. <label widen>: Handle CONT_INTs via simplify_binary_operation. Allow recursive calls to ix86_expand_vector_init_duplicate to fail. <case E_V16HImode>: For CONST_INT_P try V8SImode via widen. <case E_V32QImode>: For CONST_INT_P try V16HImode via widen. (ix86_expand_vector_init): Move try using a broadcast for all_same with ix86_expand_vector_init_duplicate before using constant pool. gcc/testsuite/ChangeLog * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Update test case. * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise. * gcc.target/i386/avx512fp16-13.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise. * gcc.target/i386/pr100865-10a.c: Likewise. * gcc.target/i386/pr100865-10b.c: Likewise. * gcc.target/i386/pr100865-11c.c: Likewise. * gcc.target/i386/pr100865-12c.c: Likewise. * gcc.target/i386/pr100865-2.c: Likewise. * gcc.target/i386/pr100865-3.c: Likewise. * gcc.target/i386/pr100865-4a.c: Likewise. * gcc.target/i386/pr100865-4b.c: Likewise. * gcc.target/i386/pr100865-5a.c: Likewise. * gcc.target/i386/pr100865-5b.c: Likewise. * gcc.target/i386/pr100865-9a.c: Likewise. * gcc.target/i386/pr100865-9b.c: Likewise. * gcc.target/i386/pr102021.c: Likewise. * gcc.target/i386/pr90773-17.c: Likewise. Thanks in advance, Roger --
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 57a108a..b0113b5 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -352,7 +352,8 @@ ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op) bool ok = ix86_expand_vector_init_duplicate (false, vector_mode, target, GEN_INT (val_broadcast)); - gcc_assert (ok); + if (!ok) + return nullptr; target = lowpart_subreg (mode, target, vector_mode); return target; } @@ -599,19 +600,11 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op) && INTEGRAL_MODE_P (mode)) return nullptr; - unsigned int msize = GET_MODE_SIZE (mode); - unsigned int inner_size = GET_MODE_SIZE (GET_MODE_INNER ((mode))); - /* Convert CONST_VECTOR to a non-standard SSE constant integer broadcast only if vector broadcast is available. */ if (standard_sse_constant_p (op, mode)) return nullptr; - /* vpbroadcast[b,w] is available under TARGET_AVX2. - or TARGET_AVX512BW for zmm. */ - if (inner_size < 4 && !(msize == 64 ? TARGET_AVX512BW : TARGET_AVX2)) - return nullptr; - if (GET_MODE_INNER (mode) == TImode) return nullptr; @@ -705,22 +698,22 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) { /* Broadcast to XMM/YMM/ZMM register from an integer constant or scalar mem. */ - op1 = gen_reg_rtx (mode); - if (FLOAT_MODE_P (mode) - || (!TARGET_64BIT && GET_MODE_INNER (mode) == DImode) - /* vbroadcastss/vbroadcastsd only supports memory operand - w/o AVX2, force them into memory to avoid spill to - memory. */ - || (GET_MODE_SIZE (mode) == 32 - && (GET_MODE_INNER (mode) == DImode - || GET_MODE_INNER (mode) == SImode) - && !TARGET_AVX2)) + rtx tmp = gen_reg_rtx (mode); + if (FLOAT_MODE_P (mode)) first = force_const_mem (GET_MODE_INNER (mode), first); bool ok = ix86_expand_vector_init_duplicate (false, mode, - op1, first); - gcc_assert (ok); - emit_move_insn (op0, op1); - return; + tmp, first); + if (!ok && !TARGET_64BIT && GET_MODE_INNER (mode) == DImode) + { + first = force_const_mem (GET_MODE_INNER (mode), first); + ok = ix86_expand_vector_init_duplicate (false, mode, + tmp, first); + } + if (ok) + { + emit_move_insn (op0, tmp); + return; + } } } @@ -15697,6 +15690,56 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, switch (mode) { + case E_V2DImode: + if (CONST_INT_P (val)) + { + int tmp = (int)INTVAL (val); + if (tmp == (int)(INTVAL (val) >> 32)) + { + rtx reg = gen_reg_rtx (V4SImode); + ok = ix86_vector_duplicate_value (V4SImode, reg, + GEN_INT (tmp)); + if (ok) + { + emit_move_insn (target, gen_lowpart (V2DImode, reg)); + return true; + } + } + if (!TARGET_AVX) + return false; + if (!TARGET_AVX2) + val = force_const_mem (DImode, val); + } + return ix86_vector_duplicate_value (mode, target, val); + + case E_V4DImode: + if (CONST_INT_P (val)) + { + int tmp = (int)INTVAL (val); + if (tmp == (int)(INTVAL (val) >> 32)) + { + rtx reg = gen_reg_rtx (V8SImode); + ok = ix86_vector_duplicate_value (V8SImode, reg, + GEN_INT (tmp)); + if (ok) + { + emit_move_insn (target, gen_lowpart (V4DImode, reg)); + return true; + } + } + } + return ix86_vector_duplicate_value (mode, target, val); + + case E_V4SImode: + if (CONST_INT_P (val)) + { + if (!TARGET_AVX) + return false; + if (!TARGET_AVX2) + val = force_const_mem (SImode, val); + } + return ix86_vector_duplicate_value (mode, target, val); + case E_V2SImode: case E_V2SFmode: if (!mmx_ok) @@ -15704,13 +15747,10 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, /* FALLTHRU */ case E_V4DFmode: - case E_V4DImode: case E_V8SFmode: case E_V8SImode: case E_V2DFmode: - case E_V2DImode: case E_V4SFmode: - case E_V4SImode: case E_V16SImode: case E_V8DImode: case E_V16SFmode: @@ -15725,6 +15765,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, rtx x; val = gen_lowpart (SImode, val); + if (CONST_INT_P (val)) + return false; x = gen_rtx_TRUNCATE (HImode, val); x = gen_rtx_VEC_DUPLICATE (mode, x); emit_insn (gen_rtx_SET (target, x)); @@ -15749,6 +15791,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, rtx x; val = gen_lowpart (SImode, val); + if (CONST_INT_P (val)) + return false; x = gen_rtx_TRUNCATE (HImode, val); x = gen_rtx_VEC_DUPLICATE (mode, x); emit_insn (gen_rtx_SET (target, x)); @@ -15774,6 +15818,10 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, goto widen; case E_V8HImode: + if (CONST_INT_P (val)) + goto widen; + /* FALLTHRU */ + case E_V8HFmode: case E_V8BFmode: if (TARGET_AVX2) @@ -15821,6 +15869,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, goto widen; case E_V16QImode: + if (CONST_INT_P (val)) + goto widen; if (TARGET_AVX2) return ix86_vector_duplicate_value (mode, target, val); @@ -15840,7 +15890,13 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, val = convert_modes (wsmode, smode, val, true); - if (smode == QImode && !TARGET_PARTIAL_REG_STALL) + if (CONST_INT_P (val)) + { + x = simplify_binary_operation (ASHIFT, wsmode, val, + GEN_INT (GET_MODE_BITSIZE (smode))); + val = simplify_binary_operation (IOR, wsmode, val, x); + } + else if (smode == QImode && !TARGET_PARTIAL_REG_STALL) emit_insn (gen_insv_1 (wsmode, val, val)); else { @@ -15853,15 +15909,20 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, x = gen_reg_rtx (wvmode); ok = ix86_expand_vector_init_duplicate (mmx_ok, wvmode, x, val); - gcc_assert (ok); + if (!ok) + return false; emit_move_insn (target, gen_lowpart (GET_MODE (target), x)); - return ok; + return true; } case E_V16HImode: + case E_V32QImode: + if (CONST_INT_P (val)) + goto widen; + /* FALLTHRU */ + case E_V16HFmode: case E_V16BFmode: - case E_V32QImode: if (TARGET_AVX2) return ix86_vector_duplicate_value (mode, target, val); else @@ -15887,7 +15948,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, rtx x = gen_reg_rtx (hvmode); ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val); - gcc_assert (ok); + if (!ok) + return false; x = gen_rtx_VEC_CONCAT (mode, x, x); emit_insn (gen_rtx_SET (target, x)); @@ -15924,7 +15986,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, rtx x = gen_reg_rtx (hvmode); ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val); - gcc_assert (ok); + if (!ok) + return false; x = gen_rtx_VEC_CONCAT (mode, x, x); emit_insn (gen_rtx_SET (target, x)); @@ -16896,6 +16959,12 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals) all_same = false; } + /* If all values are identical, broadcast the value. */ + if (all_same + && ix86_expand_vector_init_duplicate (mmx_ok, mode, target, + XVECEXP (vals, 0, 0))) + return; + /* Constants are best loaded from the constant pool. */ if (n_var == 0) { @@ -16903,12 +16972,6 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals) return; } - /* If all values are identical, broadcast the value. */ - if (all_same - && ix86_expand_vector_init_duplicate (mmx_ok, mode, target, - XVECEXP (vals, 0, 0))) - return; - /* Values where only one field is non-constant are best loaded from the pool and overwritten via move later. */ if (n_var == 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c index 0fa93e0..138dbb4 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c @@ -3,8 +3,7 @@ /* { dg-options "-O2 -mavx512f -mavx512dq" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! ia32 } } } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */ /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */ /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c index f1b672a..d22251b 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c @@ -3,7 +3,7 @@ /* { dg-options "-O2 -mavx512f" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to8\\\}" { target ia32 } } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */ /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c index b73a8f4..f431b8a 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c @@ -126,7 +126,7 @@ abs256_ph (__m256h a) return _mm256_abs_ph (a); } -/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vpand\[^\n\]*%ymm\[0-9\]+" 1 } } */ __m128h @@ -136,5 +136,5 @@ abs_ph (__m128h a) return _mm_abs_ph (a); } -/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%xmm\[0-9\]+" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%xmm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vpand\[^\n\]*%xmm\[0-9\]+" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c index 0304b9d..e6df4d2 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c @@ -3,10 +3,8 @@ /* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! ia32 } } } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 { target ia32 } } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 7 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 } } */ /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c index 0ba0cd9..ebdc361 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c @@ -3,8 +3,8 @@ /* { dg-options "-O2 -mavx512f -mavx512vl" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 { target ia32 } } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to2\\\}" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to4\\\}" { target ia32 } } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */ /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c index 1d849a3..3bc0f1a 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-10a.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c @@ -29,5 +29,5 @@ foo (void) array[i] = MK_CONST128_BROADCAST (0x1f); } -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c index e5616d8..f60d1bf 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-10b.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c @@ -3,5 +3,5 @@ #include "pr100865-10a.c" -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-11c.c b/gcc/testsuite/gcc.target/i386/pr100865-11c.c index de56c84..94a2c43 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-11c.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-11c.c @@ -3,6 +3,5 @@ #include "pr100865-11a.c" -/* { dg-final { scan-assembler-times "movabsq" 1 } } */ -/* { dg-final { scan-assembler-times "vpunpcklqdq\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovddup\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-12c.c b/gcc/testsuite/gcc.target/i386/pr100865-12c.c index 77415f2..424c916 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-12c.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-12c.c @@ -3,6 +3,5 @@ #include "pr100865-12a.c" -/* { dg-final { scan-assembler-times "movabsq" 1 } } */ -/* { dg-final { scan-assembler-times "vpunpcklqdq\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovddup\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c index 090a010..e0265d2 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-2.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c @@ -10,6 +10,6 @@ foo (void) __builtin_memset (dst, 3, 16); } -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c index cde4b1c..433fd81 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-3.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c @@ -10,7 +10,7 @@ foo (void) __builtin_memset (dst, 3, 16); } -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ -/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c index bd99945..8009e5c 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-4a.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c @@ -12,6 +12,6 @@ foo (void) array[i] = -45; } -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c index 1814306..6fd703e 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-4b.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c @@ -4,8 +4,8 @@ #include "pr100865-4a.c" -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */ /* { dg-final { scan-assembler-times "vzeroupper" 1 } } */ -/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5a.c b/gcc/testsuite/gcc.target/i386/pr100865-5a.c index b023fca..d6fb79e 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-5a.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-5a.c @@ -12,6 +12,6 @@ foo (void) array[i] = -45; } -/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 4 } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5b.c b/gcc/testsuite/gcc.target/i386/pr100865-5b.c index 5bccfd0..6c2b33d 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-5b.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-5b.c @@ -4,7 +4,7 @@ #include "pr100865-5a.c" -/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu16\[\\t \]%ymm\[0-9\]+, " 4 } } */ -/* { dg-final { scan-assembler-not "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9a.c b/gcc/testsuite/gcc.target/i386/pr100865-9a.c index 45d0e0d..f2ac1bd 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-9a.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-9a.c @@ -21,5 +21,5 @@ foo (void) array[i] = MK_CONST128_BROADCAST (0x1fff); } -/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9b.c b/gcc/testsuite/gcc.target/i386/pr100865-9b.c index 1469624..e2a3f92 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-9b.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-9b.c @@ -3,5 +3,5 @@ #include "pr100865-9a.c" -/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102021.c b/gcc/testsuite/gcc.target/i386/pr102021.c index a5012a4..8ff898d 100644 --- a/gcc/testsuite/gcc.target/i386/pr102021.c +++ b/gcc/testsuite/gcc.target/i386/pr102021.c @@ -10,7 +10,7 @@ foo () return _mm256_set1_epi16 (12); } -/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ -/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target ia32 } } } */ /* { dg-final { scan-assembler-not "vmovdqa" } } */ /* { dg-final { scan-assembler-not "vzeroupper" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c index 3036085..61b2bfd 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-17.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c @@ -10,6 +10,6 @@ foo (void) __builtin_memset (dst, 12, 19); } -/* { dg-final { scan-assembler-times "vpbroadcastb" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ /* { dg-final { scan-assembler-times "vmovd\[\\t \]+%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 { xfail *-*-* } } } */