I think I would prefer handling VLS modes like this.  This is the current VLA pattern:

(define_insn "@pred_<optab><mode>"
  [(set (match_operand:VI 0 "register_operand"            "=vd, vd, vr, vr, vd, vd, vr, vr, vd, vd, vr, vr")
        (if_then_else:VI
          (unspec:<VM>
            [(match_operand:<VM> 1 "vector_mask_operand"  " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
             (match_operand 5 "vector_length_operand"     " rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK")
             (match_operand 6 "const_int_operand"         "  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i")
             (match_operand 7 "const_int_operand"         "  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i")
             (match_operand 8 "const_int_operand"         "  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i,  i")
             (reg:SI VL_REGNUM)
             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
          (any_int_binop:VI
            (match_operand:VI 3 "<binop_rhs1_predicate>"  "<binop_rhs1_constraint>")
            (match_operand:VI 4 "<binop_rhs2_predicate>"  "<binop_rhs2_constraint>"))
          (match_operand:VI 2 "vector_merge_operand"      "vu,0,vu,0,vu,0,vu,0,vu,0,vu,0")))]
  "TARGET_VECTOR"
  "@
   v<insn>.vv\t%0,%3,%4%p1
   v<insn>.vv\t%0,%3,%4%p1
   v<insn>.vv\t%0,%3,%4%p1
   v<insn>.vv\t%0,%3,%4%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1"
  [(set_attr "type" "<int_binop_insn_type>")
   (set_attr "mode" "<MODE>")])
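For reference, this single pattern is what e.g. a plain unmasked integer add intrinsic should end up going through; a minimal sketch (illustrative only, and assuming the __riscv_-prefixed intrinsic naming):

#include <stddef.h>
#include <riscv_vector.h>

/* Unmasked vector-vector integer add; expected to be expanded through the
   pattern above, with <MODE> being one of the VNx*SI modes in "VI"
   (which one depends on -march/VLEN).  */
vint32m1_t
vadd_i32m1 (vint32m1_t a, vint32m1_t b, size_t vl)
{
  return __riscv_vadd_vv_i32m1 (a, b, vl);
}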
And this is the current "VI" mode iterator:

(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI
  (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI
  (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI
  (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
  (VNx2DI "TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64")
  (VNx8DI "TARGET_VECTOR_ELEN_64")
  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])

You can see there are no VLS modes in "VI".  Now, to support VLS, I think we should extend the "VI" iterator:

(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI
  (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI
  (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI
  (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
  (VNx2DI "TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64")
  (VNx8DI "TARGET_VECTOR_ELEN_64")
  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")

  V4SI V2DI V8HI V16QI
])

Then we codegen directly to these VLS patterns without any conversion.  This is the safe way to deal with VLS patterns.
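With the iterator extended like that, a plain fixed-length GNU vector add (a minimal sketch, not the exact godbolt testcase referenced below) could then map straight onto the V4SI instance of the pattern above:

typedef int v4si __attribute__ ((vector_size (16)));

/* Sketch: with V4SI added to "VI", this addition can be expanded directly
   to the V4SI variant of @pred_<optab><mode>, i.e. a single vadd.vv,
   with no VLA<->VLS conversion in between.  */
v4si
add_v4si (v4si a, v4si b)
{
  return a + b;
}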
Thanks.


juzhe.zh...@rivai.ai


From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai> wrote:
>
> In the future, we will definitely mixing VLA and VLS-vlmin together in a
> codegen and it will not cause any issues.
> For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am
> not sure since my SELECT_VL patch is not
> finished, I will check if can work when I am working in SELECT_VL patch).
For the future it would be then good to have the vectorizer re-vectorize
loops with VLS vector uses to VLA style?  I think there's a PR with a draft
patch from a few years ago attached (from me) somewhere.  Currently the
vectorizer will give up when seeing vector operations in a loop but
ideally those should simply be SLPed.
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches VLS modes can help for SLP auto-vectorization.
>
> ________________________________
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make
> >>> use of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization
> > on GNU vectors.
> > You can take a look this example:
> > https://godbolt.org/z/3jYqoM84h
> >
> > GCC need to specify the mrvv size to enable GNU vectors and the codegen
> > only can run on CPU with vector-length = 128bit.
> > However, LLVM doesn't need to specify the vector length, and the codegen
> > can run on any CPU with RVV vector-length >= 128 bits.
> >
> > This is what this patch want to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
> Robin
>