Hi, Jeff. >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is >> complete. While you might be able to get REG_EXPR, I would not really >> expect SSA_NAME_DEF_STMT to be correct. At the least it'll need some >> way to make sure it's not called at an inappropriate time. I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
>> Should this have been known_lt rather than known_le? It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE for SLP. >> Something's off in your formatting here. I'd guess spaces vs tabs Ok. >>In a few places you're using expand_binop. Those interfaces are really >>more for gimple->RTL. BUt code like expand_gather_scatter is really >>RTL, not gimple/tree. Is there a reason why you're not using pure RTL >>interfaces? I saw ARM SVE is using them in many places for expanding patterns. And I think it's convenient so that's why I use them. Thanks. juzhe.zh...@rivai.ai From: Jeff Law Date: 2023-07-12 10:01 To: Juzhe-Zhong; gcc-patches CC: kito.cheng; rdapp.gcc Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization On 7/7/23 08:32, Juzhe-Zhong wrote: > This patch fully support gather_load/scatter_store: > 1. Support single-rgroup on both RV32/RV64. > 2. Support indexed element width can be same as or smaller than Pmode. > 3. Support VLA SLP with gather/scatter. > 4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 both VLA and VLS. > 5. Fix bug of handling (subreg:SI (const_poly_int:DI)) > 6. Fix bug on vec_perm which is used by gather/scatter SLP. > > All kinds of GATHER/SCATTER are normalized into LEN_MASK_*. > We fully supported these 4 kinds of gather/scatter: > 1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy > mask (Full vector). > 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real > mask. > 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy > mask. > 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask. > > We use vluxei/vsuxei (un-ordered indexed loads/stores of RVV to code generate > gather/scatter). > > Also, we support strided loads/stores with vlse.v/vsse.v. Consider this > following case: > #define TEST_LOOP(DATA_TYPE, BITS) > \ > void __attribute__ ((noinline, noclone)) > \ > f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, > \ > INDEX##BITS stride, INDEX##BITS n) \ > { > \ > for (INDEX##BITS i = 0; i < n; ++i) > \ > dest[i] += src[i * stride]; > \ > } > > Codegen: > f_int8_t_8: > ble a3,zero,.L10 > li a5,1 > mv a4,a0 > bne a2,a5,.L4 > li a2,1 > .L6: > vsetvli a5,a3,e8,m2,ta,ma > vle8.v v2,0(a0) > vlse8.v v4,0(a1),a2 > vsetvli a6,zero,e8,m2,ta,ma > sub a3,a3,a5 > vadd.vv v2,v2,v4 > vsetvli zero,a5,e8,m2,ta,ma > vse8.v v2,0(a4) > add a0,a0,a5 > add a1,a1,a5 > add a4,a4,a5 > bne a3,zero,.L6 > .L10: > ret > > We use vlse.v instead of vluxei. > > This patch has been tested on both RV32 and RV64. > > gcc/ChangeLog: > > * config/riscv/autovec.md > (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern. > (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto. > (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto. > (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto. > (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto. > (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto. > (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto. > (len_mask_gather_load<mode><mode>): Ditto. > (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto. > (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto. > (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto. > (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto. > (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto. > (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto. > (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto. > (len_mask_scatter_store<mode><mode>): Ditto. > * config/riscv/predicates.md (const_1_operand): New predicate. > (vector_gs_offset_operand): Ditto. > (vector_gs_scale_operand_16): Ditto. > (vector_gs_scale_operand_32): Ditto. > (vector_gs_scale_operand_64): Ditto. > (vector_gs_extension_operand): Ditto. > (vector_gs_scale_operand_16_rv32): Ditto. > (vector_gs_scale_operand_32_rv32): Ditto. > * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter. > (expand_gather_scatter): New function. > * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter. > (emit_vlmax_masked_store_insn): New function. > (emit_nonvlmax_masked_store_insn): Ditto. > (modulo_sel_indices): Ditto. > (expand_vec_perm): Fix SLP for gather/scatter. > (prepare_gather_scatter): New function. > (strided_load_store_p): Ditto. > (expand_gather_scatter): Ditto. > * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of > (subreg:SI (DI CONST_POLY_INT)). > * config/riscv/vector-iterators.md: Add gather/scatter. > * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead. > (@vec_duplicate<mode>): Ditto. > (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix > name. > (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): > Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: > New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New > test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: > New test. > * > gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New > test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: > New test. > * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: > New test. > > --- > + > +/* Return true if it is the strided load/store. */ > +static bool > +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step) > +{ > + if (const_vec_series_p (vec_offset, base, step)) > + return true; > + > + /* For strided load/store, vectorizer always generates > + VEC_SERIES_EXPR for vec_offset. */ > + tree expr = REG_EXPR (vec_offset); > + if (!expr || TREE_CODE (expr) != SSA_NAME) > + return false; > + > + /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>; */ > + gimple *def_stmt = SSA_NAME_DEF_STMT (expr); > + if (!def_stmt || !is_gimple_assign (def_stmt) > + || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR) > + return false; Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is complete. While you might be able to get REG_EXPR, I would not really expect SSA_NAME_DEF_STMT to be correct. At the least it'll need some way to make sure it's not called at an inappropriate time. > + > +/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}. */ > +void > +expand_gather_scatter (rtx *ops, bool is_load) > +{ > + > + /* We use vlse.v/vsse.v instead of indexed load/store by default > + if it is strided load/store. > + > + FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v. > + We may need COST MODE to adjust it. */ I'd be surprised if we encounter a case where vector strided will be worse than the equivalent vector indexed. In the unlikely event that happens, we'll have to implement a suitable cost model and splat the stride into a vector index register. But I wouldn't worry too much about it at this stage. > + rtx base, step; > + if (strided_load_store_p (vec_offset, &base, &step)) > + { > + if (GET_MODE (step) != Pmode) > + { > + if (CONSTANT_P (step)) > + step = force_reg (Pmode, step); > + else > + { > + rtx extend_step = gen_reg_rtx (Pmode); > + emit_insn (gen_extend_insn (extend_step, step, Pmode, > + GET_MODE (step), > + zero_extend_p ? true : false)); > + step = extend_step; > + } What happens for a non-constant step in a mode the same size as Pmode, particularly in a non-optimizing compilation? Wouldn't that abort with an unrecognized extension insn? I'd have similar concerns with the code that handles the case inner_offsize < inner_vsize. > diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md > index 5b7a17b9d34..19740c89132 100644 > --- a/gcc/config/riscv/vector.md > +++ b/gcc/config/riscv/vector.md > @@ -1357,8 +1357,16 @@ > } > } > else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode) > - && immediate_operand (operands[3], Pmode)) > - operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, > operands[3])); > + && (immediate_operand (operands[3], Pmode) > + || (CONST_POLY_INT_P (operands[3]) > + && known_ge (rtx_to_poly_int64 (operands[3]), 0U) > + && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE > (<MODE>mode))))) Should this have been known_lt rather than known_le? > @@ -1397,6 +1406,12 @@ > (match_dup 2)))] > { > gcc_assert (can_create_pseudo_p ()); > + if (CONST_POLY_INT_P (operands[3])) > + { > + rtx tmp = gen_reg_rtx (<VEL>mode); > + emit_move_insn (tmp, operands[3]); > + operands[3] = tmp; > + } Something's off in your formatting here. I'd guess spaces vs tabs In a few places you're using expand_binop. Those interfaces are really more for gimple->RTL. BUt code like expand_gather_scatter is really RTL, not gimple/tree. Is there a reason why you're not using pure RTL interfaces? Anyway this is mostly good, but I do think there are a few outstanding questions/concerns to work through. Jeff