> `.MASK_LEN_FOLD_LEFT_PLUS`(or `mask_len_fold_left_plus_m`) is expecting the > return value will be the start value even if the length is 0. > > However current code gen in RISC-V backend is not meet that semantic, it will > result a random garbage value if length is 0. > > Let example by current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64: > # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, > { -1, ... }, loop_len_161, 0); > vsetvli zero,a5,e64,m1,ta,ma > vfmv.s.f v2,fa5 # insn 1 > vfredosum.vs v1,v1,v2 # insn 2 > vfmv.f.s fa5,v1 # insn 3 > > insn 1: > - vfmv.s.f won't do anything if VL=0, which means v2 will contain garbage > value. > insn 2: > - vfredosum.vs won't do anything if VL=0, and keep vd unchanged even TA. > (v-spec say: `If vl=0, no operation is performed and the destination register > is not updated.`) > insn 3: > - vfmv.f.s will move the value from v1 even VL=0, so this is safe. > > So how we fix that? we need two fix for that: > > 1. insn 1: need always execute with VL=1, so that we can guarantee it will > always work as expect. > 2. insn 2: Add new pattern to force `vd` use same reg as `vs1` (start value) > for > all reduction patterns, then we can guarantee vd[0] will contain > the > start value when vl=0 > > For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2, > we have to add _AV variant reduction to force `vd` use same reg as `vs1` > (start > value).
I think I'm OK with both approaches. If the second one doesn't negatively affect register pressure it certainly looks a bit nicer. On the other hand I would expect the reduction to dominate latency anyway so an additional vsetvl shouldn't really matter. Another thing to consider - can we even hit the vl=0 case with the regular reductions? They are VLMAX only so just affected by when we enable SELECT_VL which we only do with one rgroup so vl=0 shouldn't happen (as opposed to fold left). I still need to convince myself that this is correct but the fact that you only saw fallout with fold left partially confirms it.