On 3/13/23 02:19, juzhe.zh...@rivai.ai wrote:
From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>

Co-authored-by: kito-cheng <kito.ch...@sifive.com>
Co-authored-by: kito-cheng <kito.ch...@gmail.com>

Consider this case:
#include <riscv_vector.h>

void f19 (void *base, void *base2, void *out, size_t vl, int n)
{
     vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
     for (int i = 0; i < n; i++) {
       vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
       vuint64m8_t v = __riscv_vluxei64_v_u64m8_m (m, base, bindex, vl);
       vuint64m8_t v2 = __riscv_vle64_v_u64m8_tu (v, base2 + i, vl);
       vint8m1_t v3 = __riscv_vluxei64_v_i8m1_m (m, base, v, vl);
       vint8m1_t v4 = __riscv_vluxei64_v_i8m1_m (m, base, v2, vl);
       __riscv_vse8_v_i8m1 (out + 100 * i, v3, vl);
       __riscv_vse8_v_i8m1 (out + 222 * i, v4, vl);
     }
}

Due to the current unreasonable register order, this case produces unnecessary
register spills.

Fixing the order helps RA.
Note that this is likely a losing game -- over time you're likely to find that one ordering works better for one set of inputs while another ordering works better for a different set of inputs.

So while I don't object to the patch, in general we try to find a reasonable setting, knowing that it's likely not to be optimal in all cases.

Probably the most important aspect of this patch in my mind is moving the vector mask register to the end so that it's only used for vectors when we've exhausted the whole vector register file. Thus it's more likely to be usable as a mask when we need it for that purpose.
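
To illustrate the point about the mask register, here is a toy model in C. It is only a sketch: it is not GCC's register allocator and not the ordering from the patch. The names fill_orders, alloc_first_free and v0_free_after are hypothetical, and the model ignores LMUL register groups and live ranges. It assumes an allocator that simply takes the first free register in a fixed preference list, and compares a list that starts with v0 against one that ends with it.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VREGS 32   /* RVV has v0..v31; v0 doubles as the mask register */

/* Build two hypothetical preference lists:
   v0_first = v0, v1, ..., v31
   v0_last  = v1, v2, ..., v31, v0  */
static void fill_orders(int *v0_first, int *v0_last)
{
    for (int i = 0; i < NUM_VREGS; i++) {
        v0_first[i] = i;
        v0_last[i] = (i + 1) % NUM_VREGS;
    }
}

/* Take the first register in `order` that is not yet in use.
   Returns the register number, or -1 if every register is taken. */
static int alloc_first_free(const int *order, bool *in_use)
{
    for (int i = 0; i < NUM_VREGS; i++) {
        int r = order[i];
        if (!in_use[r]) {
            in_use[r] = true;
            return r;
        }
    }
    return -1;
}

/* Allocate `count` ordinary vector values with `order` and report
   whether v0 is still free to hold a mask afterwards. */
static bool v0_free_after(const int *order, int count)
{
    bool in_use[NUM_VREGS] = { false };

    assert(count >= 0 && count <= NUM_VREGS);
    for (int i = 0; i < count; i++)
        alloc_first_free(order, in_use);
    return !in_use[0];
}
```

In this model, with v0 first a single ordinary vector value already occupies v0, so a later mask would force a move or a spill. With v0 last, v0 stays free until all of v1..v31 are in use.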

OK for the trunk and backporting to the shared RISC-V sub-branch off gcc-13 (once it's created).

jeff

