On 7/7/23 08:32, Juzhe-Zhong wrote:
This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support an indexed element width that is the same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 for both VLA and VLS.
5. Fix bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix bug in vec_perm, which is used by gather/scatter SLP.

All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
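
As a rough illustration of how a length and a mask combine in the four cases above, here is a scalar reference model in C. It is a sketch only: the function name and signature are hypothetical, indices are element indices rather than byte offsets, and leaving inactive lanes untouched is an assumption of this model, not a statement about what GCC or the hardware guarantees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical, not GCC code): lane i is loaded only when
   i < len and mask[i] is set.  Inactive lanes keep their old value here,
   which is a simplification chosen for this sketch.  */
static void
len_mask_gather_load_model (int32_t *dest, const int32_t *base,
                            const size_t *index, const bool *mask,
                            size_t len, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; ++i)
    if (i < len && mask[i])
      dest[i] = base[index[i]];
}
```

In this model a "dummy length" is simply len == nlanes and a "dummy mask" is an all-true mask, so the four kinds differ only in which of the two conditions can actually disable a lane.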

We use vluxei/vsuxei (the unordered indexed loads/stores of RVV) to generate
code for gather/scatter.

Also, we support strided loads/stores with vlse.v/vsse.v. Consider the
following case:
#define TEST_LOOP(DATA_TYPE, BITS)                                             \
   void __attribute__ ((noinline, noclone))                                     \
   f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
                           INDEX##BITS stride, INDEX##BITS n)                   \
   {                                                                            \
     for (INDEX##BITS i = 0; i < n; ++i)                                        \
       dest[i] += src[i * stride];                                              \
   }

Codegen:
f_int8_t_8:
        ble     a3,zero,.L10
        li      a5,1
        mv      a4,a0
        bne     a2,a5,.L4
        li      a2,1
.L6:
        vsetvli a5,a3,e8,m2,ta,ma
        vle8.v  v2,0(a0)
        vlse8.v v4,0(a1),a2
        vsetvli a6,zero,e8,m2,ta,ma
        sub     a3,a3,a5
        vadd.vv v2,v2,v4
        vsetvli zero,a5,e8,m2,ta,ma
        vse8.v  v2,0(a4)
        add     a0,a0,a5
        add     a1,a1,a5
        add     a4,a4,a5
        bne     a3,zero,.L6
.L10:
        ret

We use vlse.v instead of vluxei.
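
The reason a strided load can stand in for an indexed one is that a strided access is a gather whose index vector is the series 0, stride, 2 * stride, and so on. The two scalar models below sketch that equivalence. The function names are hypothetical and the models work on element indices, ignoring the byte scaling the real instructions apply.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed form: one explicit element index per lane, roughly what an
   indexed load consumes (byte scaling ignored in this sketch).  */
static void
gather_model (int8_t *dest, const int8_t *src, const size_t *index, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] = src[index[i]];
}

/* Strided form: a single scalar stride, roughly what a strided load
   consumes (byte scaling ignored in this sketch).  */
static void
strided_model (int8_t *dest, const int8_t *src, size_t stride, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] = src[i * stride];
}
```

Calling gather_model with index = {0, 3, 6, 9} and strided_model with stride = 3 reads the same four elements, which is the case the codegen above exploits.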

This patch has been tested on both RV32 and RV64.

gcc/ChangeLog:

         * config/riscv/autovec.md (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
         (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
         (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
         (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
         (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
         (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
         (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
         (len_mask_gather_load<mode><mode>): Ditto.
         (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
         (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
         (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
         (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
         (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
         (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
         (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
         (len_mask_scatter_store<mode><mode>): Ditto.
         * config/riscv/predicates.md (const_1_operand): New predicate.
         (vector_gs_offset_operand): Ditto.
         (vector_gs_scale_operand_16): Ditto.
         (vector_gs_scale_operand_32): Ditto.
         (vector_gs_scale_operand_64): Ditto.
         (vector_gs_extension_operand): Ditto.
         (vector_gs_scale_operand_16_rv32): Ditto.
         (vector_gs_scale_operand_32_rv32): Ditto.
         * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
         (expand_gather_scatter): New function.
         * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
         (emit_vlmax_masked_store_insn): New function.
         (emit_nonvlmax_masked_store_insn): Ditto.
         (modulo_sel_indices): Ditto.
         (expand_vec_perm): Fix SLP for gather/scatter.
         (prepare_gather_scatter): New function.
         (strided_load_store_p): Ditto.
         (expand_gather_scatter): Ditto.
         * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI (DI CONST_POLY_INT)).
         * config/riscv/vector-iterators.md: Add gather/scatter.
         * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
         (@vec_duplicate<mode>): Ditto.
         (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix name.
         (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.

gcc/testsuite/ChangeLog:

         * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: 
New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: 
New test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New 
test.
         * 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New 
test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: 
New test.
         * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: 
New test.

---






+
+/* Return true if it is the strided load/store.  */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+  if (const_vec_series_p (vec_offset, base, step))
+    return true;
+
+  /* For strided load/store, vectorizer always generates
+     VEC_SERIES_EXPR for vec_offset.  */
+  tree expr = REG_EXPR (vec_offset);
+  if (!expr || TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+    return false;
Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is complete. While you might be able to get REG_EXPR, I would not really expect SSA_NAME_DEF_STMT to be correct. At the least it'll need some way to make sure it's not called at an inappropriate time.
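
For readers less familiar with the first check in strided_load_store_p: it asks whether the offset vector is a linear series, base + i * step. The stand-alone C model below sketches that property on a plain array. The function name is hypothetical and nothing here touches GCC's RTL or GIMPLE. It only covers the constant case, not the REG_EXPR/SSA_NAME_DEF_STMT path questioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if offs[i] == base + i * step for every lane, and report
   base and step through the out-parameters.  Stand-alone model only.  */
static bool
offsets_form_series_p (const long *offs, size_t n, long *base, long *step)
{
  if (n < 2)
    return false;

  long s = offs[1] - offs[0];
  for (size_t i = 2; i < n; ++i)
    if (offs[i] - offs[i - 1] != s)
      return false;

  *base = offs[0];
  *step = s;
  return true;
}
```

With offsets {0, 4, 8, 12} the model reports base 0 and step 4. With {0, 4, 9, 12} it reports that the access is not strided, so an indexed load/store would be needed.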


+
+/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
+void
+expand_gather_scatter (rtx *ops, bool is_load)
+{
+
+  /* We use vlse.v/vsse.v instead of indexed load/store by default
+     if it is strided load/store.
+
+     FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
+     We may need COST MODE to adjust it.  */
I'd be surprised if we encounter a case where vector strided will be worse than the equivalent vector indexed. In the unlikely event that happens, we'll have to implement a suitable cost model and splat the stride into a vector index register. But I wouldn't worry too much about it at this stage.


+  rtx base, step;
+  if (strided_load_store_p (vec_offset, &base, &step))
+    {
+      if (GET_MODE (step) != Pmode)
+       {
+         if (CONSTANT_P (step))
+           step = force_reg (Pmode, step);
+         else
+           {
+             rtx extend_step = gen_reg_rtx (Pmode);
+             emit_insn (gen_extend_insn (extend_step, step, Pmode,
+                                         GET_MODE (step),
+                                         zero_extend_p ? true : false));
+             step = extend_step;
+           }
What happens for a non-constant step in a mode the same size as Pmode, particularly in a non-optimizing compilation? Wouldn't that abort with an unrecognized extension insn?

I'd have similar concerns with the code that handles the case inner_offsize < inner_vsize.
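
To spell out the cases behind that question, here is a small arithmetic model in C that dispatches on the width of the step rather than on mode inequality: narrower values are extended, equal-width values pass through unchanged, and wider values keep their low part. The names are hypothetical and this is not GCC code. GCC's convert_to_mode is one existing helper that covers same-mode and narrowing conversions, though whether it fits here is an assumption on my part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting the low BITS bits (BITS in 1..64).  */
static uint64_t
low_bits_mask (unsigned bits)
{
  return bits >= 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* Convert a FROM_BITS-wide value to TO_BITS bits (both in 1..64).
   Narrower: zero- or sign-extend.  Same width: unchanged.
   Wider: keep the low part.  Hypothetical model, not GCC code.  */
static uint64_t
convert_step_model (uint64_t value, unsigned from_bits, unsigned to_bits,
                    bool zero_extend_p)
{
  uint64_t from_mask = low_bits_mask (from_bits);

  value &= from_mask;
  /* Only a genuinely narrower value needs an extension.  */
  if (from_bits < to_bits && !zero_extend_p
      && ((value >> (from_bits - 1)) & 1))
    value |= ~from_mask;
  return value & low_bits_mask (to_bits);
}
```

The point of the sketch is that the equal-width case never reaches the extension step, which is the situation the question above is probing.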




diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5b7a17b9d34..19740c89132 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1357,8 +1357,16 @@
        }
      }
    else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
-           && immediate_operand (operands[3], Pmode))
-    operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, operands[3]));
+           && (immediate_operand (operands[3], Pmode)
+              || (CONST_POLY_INT_P (operands[3])
+                  && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
+                  && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
Should this have been known_lt rather than known_le?
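
The difference between the two predicates only shows at the boundary. The sketch below is a simplified two-coefficient mathematical model of a poly_int, value = c0 + c1 * x with x >= 0 unknown at compile time. The struct and function names are hypothetical and this is not GCC's poly-int.h implementation.

```c
#include <stdbool.h>

/* Two-coefficient model: value = c0 + c1 * x, for an unknown x >= 0.  */
struct poly2
{
  long c0;
  long c1;
};

/* True if a <= b holds for every x >= 0.  */
static bool
known_le_model (struct poly2 a, struct poly2 b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* True if a < b holds for every x >= 0: the constant term must be
   strictly smaller, and a must not grow faster than b.  */
static bool
known_lt_model (struct poly2 a, struct poly2 b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}
```

With a size of {16, 16}, an operand equal to {16, 16} passes known_le_model but fails known_lt_model, so the choice decides whether a value exactly equal to the mode size is accepted.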


@@ -1397,6 +1406,12 @@
          (match_dup 2)))]
    {
      gcc_assert (can_create_pseudo_p ());
+    if (CONST_POLY_INT_P (operands[3]))
+      {
+        rtx tmp = gen_reg_rtx (<VEL>mode);
+       emit_move_insn (tmp, operands[3]);
+       operands[3] = tmp;
+      }
Something's off in your formatting here.  I'd guess spaces vs tabs.


In a few places you're using expand_binop. Those interfaces are really more for gimple->RTL. But code like expand_gather_scatter is really RTL, not gimple/tree. Is there a reason why you're not using pure RTL interfaces?

Anyway this is mostly good, but I do think there are a few outstanding questions/concerns to work through.

Jeff
