This point is selected not by LCM but by Phase 3 (backward fusion and propagation of VL/VTYPE demand info), which I introduced into the VSETVL PASS to enhance LCM and improve vsetvl instruction performance.
This patch suppresses Phase 3's overly aggressive backward fusion and propagation to the top of the function when there is no instruction that defines the AVL (here the AVL is an immediate in 0 ~ 31, since the vsetivli instruction takes an immediate value instead of a register).

You may wonder why we need Phase 3 to do the job. Well, there are many situations that pure LCM fails to optimize. Here is a simple case to demonstrate it:

  #include <stddef.h>
  #include <riscv_vector.h>

  void f (void * restrict in, void * restrict out, int n, int m, int cond)
  {
    size_t vl = 101;
    for (size_t j = 0; j < m; j++){
      if (cond) {
        for (size_t i = 0; i < n; i++)
          {
            vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
            __riscv_vse8_v_i8mf8 (out + i, v, vl);
          }
      } else {
        for (size_t i = 0; i < n; i++)
          {
            vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
            v = __riscv_vadd_vv_i32mf2 (v,v,vl);
            __riscv_vse32_v_i32mf2 (out + i, v, vl);
          }
      }
    }
  }

As you can see, the first inner loop needs vsetvli e8,mf8 for vle+vse, and the second inner loop needs vsetvli e32,mf2 for vle+vadd+vse.

Without Phase 3 (i.e. handled only by LCM, Phase 4), we end up with:

  outerloop:
    ...
    vsetvli e8mf8
    inner loop 1:
      ....
    vsetvli e32mf2
    inner loop 2:
      ....

With Phase 3, however, the vsetvli e32mf2 of inner loop 2 is fused into the vsetvli e8mf8, so after Phase 3 we get:

  outerloop:
    ...
    inner loop 1:
      vsetvli e32mf2
      ....
    inner loop 2:
      vsetvli e32mf2
      ....

Phase 4 (LCM) then optimizes this demand information well; the result after Phase 4 is:

  vsetvli e32mf2
  outerloop:
    ...
    inner loop 1:
      ....
    inner loop 2:
      ....

This is the optimal codegen from the current VSETVL PASS (Phase 3: demand backward fusion and propagation + Phase 4: LCM).

This was a known issue when I started implementing the VSETVL PASS. I left it to be fixed after finishing all the features targeted for GCC 13, and Kito postponed merging this patch until GCC 14 development opens.
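To make the fusion step concrete: e8,mf8 and e32,mf2 share the same SEW/LMUL ratio (8 / (1/8) = 64 and 32 / (1/2) = 64), so for any VLEN they have the same VLMAX and produce the same vl for AVL = 101, while vle8/vse8 encode their element width in the opcode and, in this sketch, only need that ratio to match. The block below is a minimal sketch of such a compatibility check under stated assumptions; `struct vtype_demand`, `sew_lmul_ratio`, `vlmax`, `compute_vl` and `ratio_only_demand_satisfied` are hypothetical names for illustration, not the data structures used by GCC's VSETVL PASS.

```c
#include <stdbool.h>

/* Hypothetical model of a VTYPE demand (not GCC's representation).
   LMUL is stored as a fraction lmul_num / lmul_den, e.g. mf8 = 1/8. */
struct vtype_demand {
  unsigned sew;      /* element width in bits: 8, 16, 32 or 64 */
  unsigned lmul_num;
  unsigned lmul_den;
};

/* SEW/LMUL ratio; configurations with equal ratios have equal VLMAX. */
unsigned sew_lmul_ratio (struct vtype_demand d)
{
  return d.sew * d.lmul_den / d.lmul_num;
}

/* VLMAX = LMUL * VLEN / SEW. */
unsigned vlmax (struct vtype_demand d, unsigned vlen)
{
  return vlen * d.lmul_num / (d.sew * d.lmul_den);
}

/* Simplified vl choice: the spec mandates vl = AVL when AVL <= VLMAX and
   vl = VLMAX when AVL >= 2 * VLMAX; min () is assumed for the gap between. */
unsigned compute_vl (unsigned avl, struct vtype_demand d, unsigned vlen)
{
  unsigned max = vlmax (d, vlen);
  return avl < max ? avl : max;
}

/* An instruction demanding only the ratio (such as a unit-stride
   vle8.v/vse8.v here) can run under any VTYPE with the same ratio,
   which is what lets e32,mf2 absorb the e8,mf8 demand. */
bool ratio_only_demand_satisfied (struct vtype_demand needed,
                                  struct vtype_demand provided)
{
  return sew_lmul_ratio (needed) == sew_lmul_ratio (provided);
}
```

For example, assuming VLEN = 128, both `{ 8, 1, 8 }` (e8,mf8) and `{ 32, 1, 2 }` (e32,mf2) give VLMAX = 2 and the same vl for AVL = 101, whereas e32,m1 (ratio 32) would not satisfy a ratio-only demand from e8,mf8.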
juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-04-03 03:41
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR108279

On 3/27/23 00:59, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zh...@rivai.ai>
>
> PR 108270
>
> Fix bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.
>
> Consider the following testcase:
> void f (void * restrict in, void * restrict out, int l, int n, int m)
> {
>   for (int i = 0; i < l; i++){
>     for (int j = 0; j < m; j++){
>       for (int k = 0; k < n; k++)
>         {
>           vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
>           __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
>         }
>     }
>   }
> }
>
> Compile option: -O3
>
> Before this patch:
>   mv a7,a2
>   mv a6,a0
>   mv t1,a1
>   mv a2,a3
>   vsetivli zero,17,e8,mf8,ta,ma
>   ...
>
> After this patch:
>   mv a7,a2
>   mv a6,a0
>   mv t1,a1
>   mv a2,a3
>   ble a7,zero,.L1
>   ble a4,zero,.L1
>   ble a3,zero,.L1
>   add a1,a0,a4
>   li a0,0
>   vsetivli zero,17,e8,mf8,ta,ma
>   ...
>
> It can produce a bug when:
>
> int main ()
> {
>   vsetivli zero, 100,.....
>   f (in, out, 0,0,0)
>   asm volatile ("csrr a0,vl":::"memory");
>
>   // Before this patch the a0 is 17. (Wrong).
>   // After this patch the a0 is 100. (Correct).
>   ...
> }

So why was that point selected in the first place? I would have expected LCM to select the loop entry edge as the desired insertion point.

Essentially, if LCM selects the point before those branches, then it's violating a fundamental principle of LCM, namely that you never put an evaluation on a path where it didn't have one before.

So I'm not objecting to the patch, but it is raising concerns about the LCM results.

jeff
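Jeff's principle corresponds to the down-safety (anticipability) condition in LCM: an evaluation may only be inserted at a point from which every path to the exit evaluates it anyway. The sketch below assumes a hypothetical, simplified CFG for the PR 108270 testcase (the block names are made up and the three loops are collapsed into a single BODY block) and computes where the vsetivli is anticipated. It models only the anticipability part of LCM, not GCC's implementation.

```c
#include <stdbool.h>

/* Hypothetical simplified CFG of f () from PR 108270; the nested loops
   are collapsed into BODY, and the ble instructions become CHECK_* blocks. */
enum { ENTRY, CHECK_L, CHECK_M, CHECK_N, PREHEADER, BODY, EXIT, NBLOCKS };

#define MAX_SUCCS 2

/* Successor lists; -1 marks an unused slot. */
static const int succs[NBLOCKS][MAX_SUCCS] = {
  [ENTRY]     = { CHECK_L, -1 },
  [CHECK_L]   = { CHECK_M, EXIT },   /* ble a7,zero,.L1 */
  [CHECK_M]   = { CHECK_N, EXIT },   /* ble a4,zero,.L1 */
  [CHECK_N]   = { PREHEADER, EXIT }, /* ble a3,zero,.L1 */
  [PREHEADER] = { BODY, -1 },
  [BODY]      = { BODY, EXIT },      /* back edge and loop exit */
  [EXIT]      = { -1, -1 },
};

/* Blocks containing vector instructions that demand e8,mf8 with AVL 17. */
static const bool uses[NBLOCKS] = { [BODY] = true };

/* antin[b]: the vsetivli is anticipated (down-safe) at the start of b. */
bool antin[NBLOCKS];

/* Backward must-analysis:
     ANTOUT[b] = AND of ANTIN[s] over successors s (false at the exit)
     ANTIN[b]  = USE[b] || ANTOUT[b]
   No block redefines VL/VTYPE, so every block is transparent. */
void compute_anticipability (void)
{
  for (int b = 0; b < NBLOCKS; b++)
    antin[b] = true; /* start from the top of the lattice */

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = NBLOCKS - 1; b >= 0; b--)
        {
          bool antout = false;
          bool has_succ = false;
          for (int i = 0; i < MAX_SUCCS; i++)
            {
              int s = succs[b][i];
              if (s < 0)
                continue;
              antout = has_succ ? (antout && antin[s]) : antin[s];
              has_succ = true;
            }
          bool in = uses[b] || (has_succ && antout);
          if (in != antin[b])
            {
              antin[b] = in;
              changed = true;
            }
        }
    }
}
```

After `compute_anticipability ()`, `antin[PREHEADER]` is true while `antin[ENTRY]` and every `CHECK_*` block are false, because the paths through `.L1` never execute the loop body. Placing the vsetivli at ENTRY would therefore put it on the paths where l, m or n is not positive, which is how the caller's vl of 100 gets clobbered in the quoted main ().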