On 7/7/23 08:32, Juzhe-Zhong wrote:
This patch fully support gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support indexed element width can be same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 both VLA and VLS.
5. Fix bug of handling (subreg:SI (const_poly_int:DI))
6. Fix bug on vec_perm which is used by gather/scatter SLP.
All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully supported these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask
(Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
We use vluxei/vsuxei (un-ordered indexed loads/stores of RVV to code generate
gather/scatter).
Also, we support strided loads/stores with vlse.v/vsse.v. Consider this
following case:
#define TEST_LOOP(DATA_TYPE, BITS) \
void __attribute__ ((noinline, noclone))
\
f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,
\
INDEX##BITS stride, INDEX##BITS n) \
{
\
for (INDEX##BITS i = 0; i < n; ++i)
\
dest[i] += src[i * stride];
\
}
Codegen:
f_int8_t_8:
ble a3,zero,.L10
li a5,1
mv a4,a0
bne a2,a5,.L4
li a2,1
.L6:
vsetvli a5,a3,e8,m2,ta,ma
vle8.v v2,0(a0)
vlse8.v v4,0(a1),a2
vsetvli a6,zero,e8,m2,ta,ma
sub a3,a3,a5
vadd.vv v2,v2,v4
vsetvli zero,a5,e8,m2,ta,ma
vse8.v v2,0(a4)
add a0,a0,a5
add a1,a1,a5
add a4,a4,a5
bne a3,zero,.L6
.L10:
ret
We use vlse.v instead of vluxei.
This patch has been tested on both RV32 and RV64.
gcc/ChangeLog:
* config/riscv/autovec.md
(len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
(len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
(len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
(len_mask_gather_load<mode><mode>): Ditto.
(len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
(len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
(len_mask_scatter_store<mode><mode>): Ditto.
* config/riscv/predicates.md (const_1_operand): New predicate.
(vector_gs_offset_operand): Ditto.
(vector_gs_scale_operand_16): Ditto.
(vector_gs_scale_operand_32): Ditto.
(vector_gs_scale_operand_64): Ditto.
(vector_gs_extension_operand): Ditto.
(vector_gs_scale_operand_16_rv32): Ditto.
(vector_gs_scale_operand_32_rv32): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
(expand_gather_scatter): New function.
* config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
(emit_vlmax_masked_store_insn): New function.
(emit_nonvlmax_masked_store_insn): Ditto.
(modulo_sel_indices): Ditto.
(expand_vec_perm): Fix SLP for gather/scatter.
(prepare_gather_scatter): New function.
(strided_load_store_p): Ditto.
(expand_gather_scatter): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI
(DI CONST_POLY_INT)).
* config/riscv/vector-iterators.md: Add gather/scatter.
* config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
(@vec_duplicate<mode>): Ditto.
(@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix
name.
(@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c:
New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c:
New test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New
test.
*
gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c:
New test.
---
+
+/* Return true if it is the strided load/store. */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+ if (const_vec_series_p (vec_offset, base, step))
+ return true;
+
+ /* For strided load/store, vectorizer always generates
+ VEC_SERIES_EXPR for vec_offset. */
+ tree expr = REG_EXPR (vec_offset);
+ if (!expr || TREE_CODE (expr) != SSA_NAME)
+ return false;
+
+ /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>; */
+ gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+ if (!def_stmt || !is_gimple_assign (def_stmt)
+ || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+ return false;
Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
complete. While you might be able to get REG_EXPR, I would not really
expect SSA_NAME_DEF_STMT to be correct. At the least it'll need some
way to make sure it's not called at an inappropriate time.
+
+/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}. */
+void
+expand_gather_scatter (rtx *ops, bool is_load)
+{
+
+ /* We use vlse.v/vsse.v instead of indexed load/store by default
+ if it is strided load/store.
+
+ FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
+ We may need COST MODE to adjust it. */
I'd be surprised if we encounter a case where vector strided will be
worse than the equivalent vector indexed. In the unlikely event that
happens, we'll have to implement a suitable cost model and splat the
stride into a vector index register. But I wouldn't worry too much
about it at this stage.
+ rtx base, step;
+ if (strided_load_store_p (vec_offset, &base, &step))
+ {
+ if (GET_MODE (step) != Pmode)
+ {
+ if (CONSTANT_P (step))
+ step = force_reg (Pmode, step);
+ else
+ {
+ rtx extend_step = gen_reg_rtx (Pmode);
+ emit_insn (gen_extend_insn (extend_step, step, Pmode,
+ GET_MODE (step),
+ zero_extend_p ? true : false));
+ step = extend_step;
+ }
What happens for a non-constant step in a mode the same size as Pmode,
particularly in a non-optimizing compilation? Wouldn't that abort with
an unrecognized extension insn?
I'd have similar concerns with the code that handles the case
inner_offsize < inner_vsize.
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5b7a17b9d34..19740c89132 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1357,8 +1357,16 @@
}
}
else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
- && immediate_operand (operands[3], Pmode))
- operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode,
operands[3]));
+ && (immediate_operand (operands[3], Pmode)
+ || (CONST_POLY_INT_P (operands[3])
+ && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
+ && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE
(<MODE>mode)))))
Should this have been known_lt rather than known_le?
@@ -1397,6 +1406,12 @@
(match_dup 2)))]
{
gcc_assert (can_create_pseudo_p ());
+ if (CONST_POLY_INT_P (operands[3]))
+ {
+ rtx tmp = gen_reg_rtx (<VEL>mode);
+ emit_move_insn (tmp, operands[3]);
+ operands[3] = tmp;
+ }
Something's off in your formatting here. I'd guess spaces vs tabs
In a few places you're using expand_binop. Those interfaces are really
more for gimple->RTL. BUt code like expand_gather_scatter is really
RTL, not gimple/tree. Is there a reason why you're not using pure RTL
interfaces?
Anyway this is mostly good, but I do think there are a few outstanding
questions/concerns to work through.
Jeff