https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121910
Bug ID: 121910
Summary: RISC-V: dynamic lmul choosing wrong vector mode
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: chenzhongyao.hit at gmail dot com
CC: juzhe.zhong at rivai dot ai, rdapp at gcc dot gnu.org
Target Milestone: ---
Target: riscv

https://godbolt.org/z/T79ozo5jc

The following code from x264 (SPEC2017) spills vector registers to the stack when compiled with:

-march=rv64gcv_zvl128b -O3 -mrvv-max-lmul=dynamic -mrvv-vector-bits=zvl -fdump-tree-vect-details

#include <stdint.h>

/* full chroma mc (ie until 1/8 pixel)*/
void mc_chroma(uint8_t* dst, int i_dst_stride, uint8_t* src, int i_src_stride,
               int mvx, int mvy, int i_width, int i_height) {
    uint8_t* srcp;

    int d8x = mvx & 0x07;
    int d8y = mvy & 0x07;
    int cA = (8 - d8x) * (8 - d8y);
    int cB = d8x * (8 - d8y);
    int cC = (8 - d8x) * d8y;
    int cD = d8x * d8y;

    src += (mvy >> 3) * i_src_stride + (mvx >> 3);
    srcp = &src[i_src_stride];

    for (int y = 0; y < i_height; y++) {
        for (int x = 0; x < i_width; x++)
            dst[x] = (cA * src[x] + cB * src[x + 1] + cC * srcp[x] +
                      cD * srcp[x + 1] + 32) >> 6;

        dst += i_dst_stride;
        src = srcp;
        srcp += i_src_stride;
    }
}

According to the vect (tree) dump log:

/app/example.c:19:27: note: Maximum lmul = 4, At most 20 number of live V_REG
......
/app/example.c:19:27: note: ***** Analysis succeeded with vector mode RVVM4QI
......
/app/example.c:19:27: note: Maximum lmul = 8, At most 40 number of live V_REG
......
/app/example.c:19:27: note: ***** Analysis succeeded with vector mode RVVM2QI
......
/app/example.c:19:27: note: ***** Choosing vector mode RVVM4QI

If register spilling already occurs with the RVVM2QI mode, then RVVM4QI, which requires even more registers, should be more likely to spill. Therefore, choosing RVVM4QI as the final vector mode may not be optimal in this scenario.

If I use -mrvv-max-lmul=m2 to limit the maximum lmul, the spilling issue does not occur.
However, for this x264 case, restricting max-lmul is not an ideal solution, since other parts of the code may benefit from using a larger lmul.

Please help address this bug when -mrvv-max-lmul=dynamic is used. I am currently trying to fix it myself but haven’t found a good solution yet.
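
To make the register-pressure argument concrete, here is a toy model of the budget check. It is only a sketch and not GCC's actual cost model. It relies on two facts: RVV has 32 vector registers, and a register group at integer LMUL = k occupies k of them. It also assumes that the two figures in the dump (20 at lmul 4, 40 at lmul 8) correspond to five simultaneously live register groups scaled by LMUL. The names vregs_needed, fits_without_spill and pick_max_lmul are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* RVV provides 32 architectural vector registers (v0..v31). */
enum { RVV_NUM_VREGS = 32 };

/* Registers occupied by `live_groups` simultaneously live register
   groups when each group uses integer LMUL `lmul` (1, 2, 4 or 8).
   Assumption: demand scales linearly with LMUL. */
static int vregs_needed(int live_groups, int lmul)
{
    assert(live_groups >= 0);
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return live_groups * lmul;
}

/* True if the estimated demand fits in the register file. */
static bool fits_without_spill(int live_groups, int lmul)
{
    return vregs_needed(live_groups, lmul) <= RVV_NUM_VREGS;
}

/* Largest LMUL in {8, 4, 2, 1} whose estimated demand fits.
   Falls back to 1 when nothing fits (spilling is then expected). */
static int pick_max_lmul(int live_groups)
{
    for (int lmul = 8; lmul >= 1; lmul /= 2)
        if (fits_without_spill(live_groups, lmul))
            return lmul;
    return 1;
}
```

Under these assumptions, pick_max_lmul(5) returns 4, because 5 * 4 = 20 fits in 32 registers while 5 * 8 = 40 does not. If the generated code still spills at a mode that such an estimate accepts, one possible explanation is that the live-register count underestimates the real pressure in this loop.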