I think I prefer handling VLS modes like this.
These are the current VLA patterns:
(define_insn "@pred_<optab><mode>"
  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr, vd, vd, vr, vr")
  (if_then_else:VI
    (unspec:<VM>
      [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1, Wc1, vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
       (match_operand 5 "vector_length_operand"    " rK, rK, rK,  rK, rK, rK, rK, rK, rK, rK, rK, rK")
       (match_operand 6 "const_int_operand"        "  i,  i,  i,   i,  i,  i,  i,  i,  i,  i,  i,  i")
       (match_operand 7 "const_int_operand"        "  i,  i,  i,   i,  i,  i,  i,  i,  i,  i,  i,  i")
       (match_operand 8 "const_int_operand"        "  i,  i,  i,   i,  i,  i,  i,  i,  i,  i,  i,  i")
       (reg:SI VL_REGNUM)
       (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
    (any_int_binop:VI
      (match_operand:VI 3 "<binop_rhs1_predicate>" "<binop_rhs1_constraint>")
      (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
    (match_operand:VI 2 "vector_merge_operand"     "vu,0,vu,0,vu,0,vu,0,vu,0,vu,0")))]
  "TARGET_VECTOR"
  "@
   v<insn>.vv\t%0,%3,%4%p1
   v<insn>.vv\t%0,%3,%4%p1
   v<insn>.vv\t%0,%3,%4%p1
   v<insn>.vv\t%0,%3,%4%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_vi_variant_insn>\t%0,<binop_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1
   v<binop_reverse_vi_variant_insn>\t%0,<binop_reverse_vi_variant_op>%p1"
  [(set_attr "type" "<int_binop_insn_type>")
   (set_attr "mode" "<MODE>")])
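To make the shape of this pattern concrete, here is a rough C model of what one instance computes lane by lane: the unspec carries the mask (operand 1) and the vector length (operand 5), any_int_binop combines operands 3 and 4, and if_then_else falls back to the merge operand 2 for lanes that are not active. This is only a sketch under stated assumptions: VLMAX is fixed at 4 lanes of 32 bits, only plus is modeled, a mask-undisturbed / tail-undisturbed policy is assumed, the const_int operands 6-8 are not modeled, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed VLMAX for this sketch: one 128-bit register of 32-bit lanes. */
#define VLMAX 4

/* Hypothetical model of one instance of @pred_<optab><mode> with plus:
   dest[i] = (i < vl && mask[i]) ? a[i] + b[i] : merge[i].
   Masked-off lanes and tail lanes (i >= vl) keep the merge value,
   i.e. a mask-undisturbed / tail-undisturbed policy is assumed. */
static void pred_add_model(uint32_t dest[VLMAX], const bool mask[VLMAX],
                           const uint32_t merge[VLMAX],
                           const uint32_t a[VLMAX], const uint32_t b[VLMAX],
                           unsigned vl)
{
    assert(vl <= VLMAX);
    for (unsigned i = 0; i < VLMAX; i++)
        dest[i] = (i < vl && mask[i]) ? a[i] + b[i] /* active lane: do the op */
                                      : merge[i];   /* masked-off or tail lane */
}
```

With vl = 3 and a mask of {1, 0, 1, 1}, lanes 0 and 2 get a + b, lane 1 keeps merge[1] because it is masked off, and lane 3 keeps merge[3] because it lies in the tail.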

(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI
    (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI
    (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI
    (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
    (VNx2DI "TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64")
    (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])

As you can see, there are no VLS modes in "VI". To support VLS, I think we
should extend the "VI" iterator:
(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI
    (VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI
    (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI
    (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
    (VNx2DI "TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64")
    (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
  V4SI (V2DI "TARGET_VECTOR_ELEN_64") V8HI V16QI
])
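For reference, the four fixed-length modes added above are all 128 bits wide, which is the minimum VLEN the full V extension requires. A minimal GNU C sketch of vector types with the same layout (the typedef names are illustrative, and a GCC- or Clang-compatible compiler is assumed for the vector_size attribute):

```c
#include <stdint.h>

/* GNU C vector types whose layout matches the proposed VLS modes
   (names are illustrative; each type is 16 bytes = 128 bits). */
typedef int8_t  v16qi __attribute__((vector_size(16)));  /* V16QI: 16 x 8-bit  */
typedef int16_t v8hi  __attribute__((vector_size(16)));  /* V8HI:   8 x 16-bit */
typedef int32_t v4si  __attribute__((vector_size(16)));  /* V4SI:   4 x 32-bit */
typedef int64_t v2di  __attribute__((vector_size(16)));  /* V2DI:   2 x 64-bit */

/* Lane count = total vector size / element size. */
#define LANES(vt, et) (sizeof(vt) / sizeof(et))

_Static_assert(LANES(v16qi, int8_t) == 16, "V16QI has 16 lanes");
_Static_assert(LANES(v8hi, int16_t) == 8, "V8HI has 8 lanes");
_Static_assert(LANES(v4si, int32_t) == 4, "V4SI has 4 lanes");
_Static_assert(LANES(v2di, int64_t) == 2, "V2DI has 2 lanes");
```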

Then we can generate code directly from these VLS patterns without any
conversion. This is the safe way to handle VLS modes.
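As a rough illustration of what "directly" means here: with VLS modes in "VI", GNU vector code like the sketch below could presumably be expanded through the V4SI instance of @pred_<optab><mode> (for example vadd.vv for plus) instead of going through a VLA mode first. The functions are hypothetical test cases, not taken from the patch, and no assembly output is shown because it depends on the patch itself.

```c
#include <stdint.h>

typedef int32_t v4si __attribute__((vector_size(16)));  /* same layout as V4SI */

/* Element-wise binops on a fixed 128-bit vector. These are assumed to be
   among the operations the any_int_binop code iterator covers. */
v4si add_v4si(v4si a, v4si b) { return a + b; }
v4si sub_v4si(v4si a, v4si b) { return a - b; }
v4si xor_v4si(v4si a, v4si b) { return a ^ b; }
```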

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai> wrote:
>
> In the future, we will definitely mix VLA and VLS-vlmin in the same
> codegen, and it will not cause any issues.
> For VLS-vlmin, I prefer to use it in length-style auto-vectorization (I am
> not sure yet since my SELECT_VL patch is not finished; I will check whether
> it works when I am working on the SELECT_VL patch).
 
For the future, it would then be good to have the vectorizer re-vectorize
loops that use VLS vectors in VLA style?  I think there's a PR somewhere with
a draft patch (from me) attached, from a few years ago.  Currently the
vectorizer gives up when it sees vector operations in a loop, but ideally
those should simply be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> The VLS modes in these patches can help SLP auto-vectorization.
>
> ________________________________
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32-byte vectors and then vector lowering would make
> >>> use of vectors up to 32 bytes?
> >
> > Actually, we don't want to have to specify -mrvv-size=32 to enable
> > vectorization of GNU vectors.
> > You can take a look at this example:
> > https://godbolt.org/z/3jYqoM84h
> >
> > GCC needs the mrvv size to be specified to enable GNU vectors, and the
> > generated code can only run on a CPU with a 128-bit vector length.
> > However, LLVM doesn't need the vector length to be specified, and its
> > generated code can run on any CPU with an RVV vector length >= 128 bits.
> >
> > This is what this patch aims to do.
> >
> > Thanks.
> I think Richard's question was rather whether it wouldn't be better to do it
> more generically and lower vectors to what either the current CPU or the
> user specified, rather than just to 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
> Robin
>
 
