On Fri, Dec 22, 2023 at 6:25 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch resolves the second part of PR target/112992, building upon
> Hongtao Liu's solution to the first part.
>
> The issue addressed by this patch is that when initializing vectors by
> broadcasting integer constants, the compiler has the flexibility to
> select the most appropriate vector mode to perform the broadcast, as
> long as the resulting vector has an identical bit pattern.  For
> example, the following constants are all equivalent:
> V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
> V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
> V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }
> So instruction sequences that construct any of these can be used to
> construct the others (with a suitable cast/SUBREG).
>
> On x86_64, it turns out that broadcasts of SImode constants are preferred,
> as DImode constants often require a longer movabs instruction, and
> HImode and QImode broadcasts require multiple uops on some architectures.
> Hence, SImode is always the equal shortest/fastest implementation.
>
> Examples of this improvement, can be seen in the testsuite.
>
> gcc.target/i386/pr102021.c
> Before:
>    0:   48 b8 0c 00 0c 00 0c    movabs $0xc000c000c000c,%rax
>    7:   00 0c 00
>    a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
>   10:   c3                      retq
>
> After:
>    0:   b8 0c 00 0c 00          mov    $0xc000c,%eax
>    5:   62 f2 7d 28 7c c0       vpbroadcastd %eax,%ymm0
>    b:   c3                      retq
>
> and
> gcc.target/i386/pr90773-17.c:
> Before:
>    0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7 <foo+0x7>
>    7:   b8 0c 00 00 00          mov    $0xc,%eax
>    c:   62 f2 7d 08 7a c0       vpbroadcastb %eax,%xmm0
>   12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
>   18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
>   1f:   c3                      retq
>
> After:
>    0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7 <foo+0x7>
>    7:   b8 0c 0c 0c 0c          mov    $0xc0c0c0c,%eax
>    c:   62 f2 7d 08 7c c0       vpbroadcastd %eax,%xmm0
>   12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
>   18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
>   1f:   c3                      retq
>
> where according to Agner Fog's instruction tables broadcastd is slightly
> faster on some microarchitectures, for example Knight's Landing.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-12-21  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/112992
>         * config/i386/i386-expand.cc
>         (ix86_convert_const_wide_int_to_broadcast): Allow call to
>         ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
>         (ix86_broadcast_from_constant): Revert recent change; Return a
>         suitable MEMREF independently of mode/target combinations.
>         (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
>         to decide whether expansion is possible/preferrable.  Only try
>         forcing DImode constants to memory (and trying again) if calling
>         ix86_expand_vector_init_duplicate fails with an DImode immediate
>         constant.
>         (ix86_expand_vector_init_duplicate) <case E_V2DImode>: Try using
>         V4SImode for suitable immediate constants.
>         <case E_V4DImode>: Try using V8SImode for suitable constants.
>         <case E_V4SImode>: Use constant pool for AVX without AVX2.
>         <case E_V4HImode>: Fail for CONST_INT_P, i.e. use constant pool.
>         <case E_V2HImode>: Likewise.
>         <case E_V8HImode>: For CONST_INT_P try using V4SImode via widen.
>         <case E_V16QImode>: For CONT_INT_P try using V8HImode via widen.
>         <label widen>: Handle CONT_INTs via simplify_binary_operation.
>         Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
>         <case E_V16HImode>: For CONST_INT_P try V8SImode via widen.
>         <case E_V32QImode>: For CONST_INT_P try V16HImode via widen.
>         (ix86_expand_vector_init): Move try using a broadcast for all_same
>         with ix86_expand_vector_init_duplicate before using constant pool.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Update test case.
>         * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
>         * gcc.target/i386/avx512fp16-13.c: Likewise.
>         * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
>         * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
>         * gcc.target/i386/pr100865-10a.c: Likewise.
>         * gcc.target/i386/pr100865-10b.c: Likewise.
>         * gcc.target/i386/pr100865-11c.c: Likewise.
>         * gcc.target/i386/pr100865-12c.c: Likewise.
>         * gcc.target/i386/pr100865-2.c: Likewise.
>         * gcc.target/i386/pr100865-3.c: Likewise.
>         * gcc.target/i386/pr100865-4a.c: Likewise.
>         * gcc.target/i386/pr100865-4b.c: Likewise.
>         * gcc.target/i386/pr100865-5a.c: Likewise.
>         * gcc.target/i386/pr100865-5b.c: Likewise.
>         * gcc.target/i386/pr100865-9a.c: Likewise.
>         * gcc.target/i386/pr100865-9b.c: Likewise.
>         * gcc.target/i386/pr102021.c: Likewise.
>         * gcc.target/i386/pr90773-17.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>

>+    case E_V2DImode:
>+      if (CONST_INT_P (val))
>+        {
>+          int tmp = (int)INTVAL (val);
>+          if (tmp == (int)(INTVAL (val) >> 32))
>+            {
>+              rtx reg = gen_reg_rtx (V4SImode);
>+              ok = ix86_vector_duplicate_value (V4SImode, reg,
>+                                                GEN_INT (tmp));
>+              if (ok)
>+                {
>+                  emit_move_insn (target, gen_lowpart (V2DImode, reg));
>+                  return true;
>+                }
>+            }
>+          if (!TARGET_AVX)
>+            return false;
>+          if (!TARGET_AVX2)
>+            val = force_const_mem (DImode, val);

We can use pshufd/pshufss to broadcast for v4si and punpcklqdq for
v2di, so I think there is no need to return false or force_const_mem
here.

>+    case E_V4SImode:
>+      if (CONST_INT_P (val))
>+        {
>+          if (!TARGET_AVX)
>+            return false;
>+          if (!TARGET_AVX2)
>+            val = force_const_mem (SImode, val);

Ditto.

>+        }
>+      return ix86_vector_duplicate_value (mode, target, val);



>         x = gen_reg_rtx (wvmode);
>         ok = ix86_expand_vector_init_duplicate (mmx_ok, wvmode, x, val);
>-        gcc_assert (ok);
>+        if (!ok)
>+          return false;
Then we can still gcc_assert (ok) here.
>         emit_move_insn (target, gen_lowpart (GET_MODE (target), x));
>-        return ok;
>+        return true;


>+    case E_V32QImode:
>+      if (CONST_INT_P (val))
>+        goto widen;
>+      /* FALLTHRU */
>+
>     case E_V16HFmode:
>    case E_V16BFmode:
>-    case E_V32QImode:
>       if (TARGET_AVX2)
>         return ix86_vector_duplicate_value (mode, target, val);
>       else
>@@ -15904,7 +15965,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
>machine_mode mode,
>           rtx x = gen_reg_rtx (hvmode);
>
>           ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
>-          gcc_assert (ok);
>+          if (!ok)
>+            return false;
>
>           x = gen_rtx_VEC_CONCAT (mode, x, x);
>           emit_insn (gen_rtx_SET (target, x));
>@@ -15941,7 +16003,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
>machine_mode mode,
>           rtx x = gen_reg_rtx (hvmode);
>
>           ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
>-          gcc_assert (ok);
>+          if (!ok)
Assume it's always true for vec_dupv8si?(vec_dupv32qi widen to
vec_dupv16hi widen to vec_dupv8si.
>+            return false;

-- 
BR,
Hongtao

Reply via email to