https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121910

            Bug ID: 121910
           Summary: RISC-V: dynamic lmul choosing wrong vector mode
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chenzhongyao.hit at gmail dot com
                CC: juzhe.zhong at rivai dot ai, rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

https://godbolt.org/z/T79ozo5jc

The following code from x264 (SPEC2017) spills vector registers to the stack
when compiled with the options below.

-march=rv64gcv_zvl128b -O3 -mrvv-max-lmul=dynamic -mrvv-vector-bits=zvl
-fdump-tree-vect-details

#include <stdint.h>

/* full chroma mc (ie until 1/8 pixel)*/
void mc_chroma(uint8_t* dst, int i_dst_stride, uint8_t* src, int i_src_stride,
               int mvx, int mvy, int i_width, int i_height) {
    uint8_t* srcp;

    int d8x = mvx & 0x07;
    int d8y = mvy & 0x07;
    int cA = (8 - d8x) * (8 - d8y);
    int cB = d8x * (8 - d8y);
    int cC = (8 - d8x) * d8y;
    int cD = d8x * d8y;

    src += (mvy >> 3) * i_src_stride + (mvx >> 3);
    srcp = &src[i_src_stride];

    for (int y = 0; y < i_height; y++) {
        for (int x = 0; x < i_width; x++)
            dst[x] = (cA * src[x] + cB * src[x + 1] + cC * srcp[x] +
                      cD * srcp[x + 1] + 32) >>
                     6;
        dst += i_dst_stride;
        src = srcp;
        srcp += i_src_stride;
    }
}
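
For context on the element widths in this loop, here is a small sketch of the
arithmetic range of the expression above. The helper names are made up for
illustration and are not part of x264 or GCC. It only checks that the four
weights always sum to 64 and that the value before the shift never exceeds
16352, so it would fit in 16 bits. Whether the vectorizer actually narrows the
int arithmetic to 16-bit elements is an assumption and is not shown in the dump.

```c
#include <assert.h>

/* Hypothetical helper: sum of the four bilinear weights used by mc_chroma
   for one fractional offset (d8x, d8y), each in the range 0..7. */
int weight_sum(int d8x, int d8y)
{
    int cA = (8 - d8x) * (8 - d8y);
    int cB = d8x * (8 - d8y);
    int cC = (8 - d8x) * d8y;
    int cD = d8x * d8y;
    return cA + cB + cC + cD;
}

/* Hypothetical helper: largest value the expression can reach before the
   ">> 6", assuming all four source pixels are 255. */
int max_intermediate(int d8x, int d8y)
{
    return weight_sum(d8x, d8y) * 255 + 32;
}

/* Walk all 64 fractional offsets and check both properties. */
void check_all_offsets(void)
{
    for (int d8y = 0; d8y < 8; d8y++)
        for (int d8x = 0; d8x < 8; d8x++) {
            assert(weight_sum(d8x, d8y) == 64);
            assert(max_intermediate(d8x, d8y) <= 0xFFFF);
        }
}
```

Since the source and destination are uint8_t but the intermediate needs more
than 8 bits, each QI vector has to be widened at some point, which is where the
extra register pressure comes from.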

According to the vect tree dump (-fdump-tree-vect-details):
/app/example.c:19:27: note:  Maximum lmul = 4, At most 20 number of live V_REG 
......
/app/example.c:19:27: note:  ***** Analysis succeeded with vector mode RVVM4QI
......
/app/example.c:19:27: note:  Maximum lmul = 8, At most 40 number of live V_REG 
......
/app/example.c:19:27: note:  ***** Analysis succeeded with vector mode RVVM2QI
......
/app/example.c:19:27: note:  ***** Choosing vector mode RVVM4QI


If register spilling already occurs with the RVVM2QI mode, then RVVM4QI—which
requires even more registers—should be more likely to spill. Therefore,
choosing RVVM4QI as the final vector mode may not be optimal in this scenario.
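
To make the register budget concrete, here is a rough sketch of the arithmetic.
It assumes that the "live V_REG" figure in the dump counts individual vector
registers, and that the intermediates are widened from 8-bit to 16-bit
elements; neither is confirmed by the dump. The function names are
hypothetical and do not correspond to anything in the GCC sources. The only
hard facts used are that RVV has 32 vector registers (v0..v31) and that a
register group at a given LMUL occupies that many registers.

```c
#include <stdbool.h>

enum { RVV_NUM_VREGS = 32 }; /* v0..v31 */

/* Number of register groups available at a given LMUL (1, 2, 4 or 8). */
int groups_available(int lmul)
{
    return RVV_NUM_VREGS / lmul;
}

/* LMUL needed to hold the same number of elements after widening from
   from_bits to to_bits per element (EMUL = LMUL * to_bits / from_bits). */
int widened_lmul(int lmul, int from_bits, int to_bits)
{
    return lmul * to_bits / from_bits;
}

/* Assumption: live_vregs counts single vector registers, as the dump's
   "At most N number of live V_REG" line appears to. */
bool fits_in_vreg_file(int live_vregs)
{
    return live_vregs <= RVV_NUM_VREGS;
}
```

Under these assumptions, the 20 live registers reported for lmul 4 fit in the
register file while the 40 reported for lmul 8 do not. An RVVM4QI vector
widened to 16-bit elements needs an LMUL 8 group, of which only 4 exist,
whereas RVVM2QI widens to LMUL 4, of which 8 exist.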


If I use -mrvv-max-lmul=m2 to limit the maximum lmul, the spilling issue does
not occur. However, for this x264 case, restricting max-lmul is not an ideal
solution, since other parts of the code may benefit from using a larger lmul.

Please help address this bug when -mrvv-max-lmul=dynamic is used. I am
currently trying to fix it myself but haven't found a good solution yet.
