https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115995

            Bug ID: 115995
           Summary: RISC-V: Can't generate portable RVV code for
                    rv64gcv_zvl512b
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sh.chiang04 at gmail dot com
                CC: juzhe.zhong at rivai dot ai, kito at gcc dot gnu.org,
                    pan2.li at intel dot com, rdapp at gcc dot gnu.org
  Target Milestone: ---

Created attachment 58704
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58704&action=edit
same as gcc.c-torture/execute/990128-1.c

The failing test is main() in gcc.c-torture/execute/990128-1.c, which
exercises a linked list. With -march=rv64gcv_zvl512b, the list-building loop
shown below is vectorized and split into two parts: a main block and a tail.

According to the vector spec's constraints on setting vl, rule 2 is:
  ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)

For example, take vsetivli a2,10,e64,m1,ta,ma (VLEN=512, SEW=64, LMUL=1, so
VLMAX=8; AVL=10). Rule 2 applies, so any vl in the range 5 ≤ vl ≤ 8 is legal.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#63-constraints-on-setting-vl
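
To make the arithmetic above easy to check, here is a minimal C sketch of the
three vl constraints from that section of the spec. The names vl_range,
vlmax_for and legal_vl are hypothetical helpers invented for this note, not
part of GCC or of any RVV intrinsic API, and vlmax_for only handles integer
LMUL.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of vl values an implementation may return. */
struct vl_range { size_t lo, hi; };

/* VLMAX = (VLEN / SEW) * LMUL, for integer LMUL >= 1 only (assumption). */
size_t vlmax_for(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    assert(sew_bits > 0 && lmul > 0);
    return vlen_bits / sew_bits * lmul;
}

/* Legal vl values for a given AVL, per the spec's constraints on setting vl. */
struct vl_range legal_vl(size_t avl, size_t vlmax)
{
    struct vl_range r;

    assert(vlmax > 0);
    if (avl <= vlmax) {             /* rule 1: vl = AVL */
        r.lo = avl;
        r.hi = avl;
    } else if (avl < 2 * vlmax) {   /* rule 2: ceil(AVL / 2) <= vl <= VLMAX */
        r.lo = (avl + 1) / 2;
        r.hi = vlmax;
    } else {                        /* rule 3: vl = VLMAX */
        r.lo = vlmax;
        r.hi = vlmax;
    }
    return r;
}
```

For the case in this report, vlmax_for(512, 64, 1) gives 8 and
legal_vl(10, 8) gives the range 5 to 8, so code that assumes vl == 8 after
that vsetivli relies on one implementation choice among several legal ones.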

In this case the generated code expects the main block to run with VL = 8 and
the tail with VL = 2, but the spec allows VL to be anywhere from 5 to 8. The
failure occurs when VL is 5: the linked list is not completely created.

compile option: -march=rv64gcv_zvl512b1p0 -mabi=lp64d -O2

#define MAX 10
int
main(void)
{
    struct s *pp;
    struct s *next;
    int i;

    p = &ss;
    next = p;
    for ( i = 0; i < MAX; i++ ) {
        next->n = &sss[i];
        next = next->n;
    }
    next->n = 0;
    ...
}

main:
        vsetvli a1,zero,e32,mf2,ta,ma
        lui     a5,%hi(ss)
        vid.v   v5
        addi    a5,a5,%lo(ss)
        vsetvli zero,zero,e64,m1,ta,ma
        lui     a3,%hi(.LANCHOR0)
        vmv.v.x v2,a5
        addi    a3,a3,%lo(.LANCHOR0)
        csrr    a5,vlenb
        vsext.vf2       v1,v5
        vmv.v.x v3,a3
        vmv.v.i v4,8
        srli    a5,a5,3
        addi    a5,a5,-1
        vslidedown.vx   v2,v2,a5
        vmadd.vv        v1,v4,v3
        lui     a0,%hi(ss)
        vmv.x.s a4,v2
        vsetivli        a2,10,e64,m1,ta,ma  // a2 = VL = 5 to 8
        vslide1up.vx    v2,v1,a4
        addi    sp,sp,-32
        lui     a4,%hi(p)
        addi    a6,a0,%lo(ss)
        sd      a6,%lo(p)(a4)
        sd      ra,24(sp)
        vsuxei64.v      v1,(zero),v2       // create linked list
        li      a4,10
        sub     a4,a4,a2                   // a4 = 10 - (5 to 8) = 2 to 5
        bne     a4,zero,.L22
.L16:
        ....
.L22:
        vsetvli a4,a4,e32,mf2,ta,ma       // a4 = VL = 2 to 5 (AVL ≤ VLMAX)
        vmv.v.x v2,a2
        vsetvli a1,zero,e64,m1,ta,ma
        vslidedown.vx   v1,v1,a5
        vsetvli zero,a4,e32,mf2,ta,ma
        vadd.vv v2,v2,v5
        vsetvli zero,zero,e64,m1,ta,ma
        vmv.x.s a5,v1
        vsext.vf2       v1,v2
        vmadd.vv        v1,v4,v3
        vslide1up.vx    v2,v1,a5
        vsuxei64.v      v1,(zero),v2      // create linked list
        j       .L16


The following memory dumps are from QEMU. With VL=5, one link is never
written: the word at 0x12a60 (sss+32), which should hold 0x12a68, stays 0.

Run for VL=8
0x12a40 <sss>:     0x00012a48      0x00000000      0x00012a50      0x00000000
0x12a50 <sss+16>:  0x00012a58      0x00000000      0x00012a60      0x00000000
0x12a60 <sss+32>:  0x00012a68      0x00000000      0x00012a70      0x00000000
0x12a70 <sss+48>:  0x00012a78      0x00000000      0x00012a80      0x00000000
0x12a80 <sss+64>:  0x00012a88      0x00000000      0x00000000      0x00000000

Run for VL=5
0x12a40 <sss>:     0x00012a48      0x00000000      0x00012a50      0x00000000
0x12a50 <sss+16>:  0x00012a58      0x00000000      0x00012a60      0x00000000
0x12a60 <sss+32>:  0x00000000      0x00000000      0x00012a70      0x00000000
0x12a70 <sss+48>:  0x00012a78      0x00000000      0x00012a80      0x00000000
0x12a80 <sss+64>:  0x00012a88      0x00000000      0x00000000      0x00000000
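
The pattern in the two dumps can be reproduced with a small scalar model of
the assembly above. This is only a sketch of my reading of the listing, under
these assumptions: nodes are tracked by index rather than address, the
unordered scatter store is applied in element order, the value carried into
the tail comes from element VLMAX-1 = 7 (the vslidedown by vlenb/8 - 1), and
the final scalar store of the null pointer is not modelled. simulate and
first_missing_link are hypothetical names made up for this note.

```c
#include <assert.h>

#define MAX   10
#define VLMAX 8      /* e64, m1 at VLEN = 512 */
#define UNSET (-1)

/* next_of[k] == j means sss[k].n == &sss[j]; *head models ss.n. */
void simulate(int vl, int next_of[MAX], int *head)
{
    int v1[VLMAX];
    int i, tail, carry;

    assert(vl >= (MAX + 1) / 2 && vl <= VLMAX);   /* legal range for AVL=10 */
    for (i = 0; i < MAX; i++)
        next_of[i] = UNSET;

    /* v1[i] = &sss[i], computed with vl = VLMAX before the vsetivli. */
    for (i = 0; i < VLMAX; i++)
        v1[i] = i;

    /* Main block: element 0 writes ss.n, element i writes sss[i-1].n. */
    *head = v1[0];
    for (i = 1; i < vl; i++)
        next_of[v1[i - 1]] = v1[i];

    /* Tail (.L22): processes the remaining MAX - vl elements. */
    tail = MAX - vl;
    carry = v1[VLMAX - 1];        /* always element 7, not element vl-1 */
    next_of[carry] = vl;          /* element 0 stores &sss[vl] via carry */
    for (i = 1; i < tail; i++)
        next_of[vl + i - 1] = vl + i;
}

/* First k in 0..MAX-2 whose link is not k+1, or -1 if the chain is intact. */
int first_missing_link(int vl)
{
    int next_of[MAX], head, k;

    simulate(vl, next_of, &head);
    if (head != 0)
        return -2;                /* ss.n itself is wrong */
    for (k = 0; k < MAX - 1; k++)
        if (next_of[k] != k + 1)
            return k;
    return -1;
}
```

Under these assumptions first_missing_link(8) reports an intact chain, while
first_missing_link(5) reports node 4, i.e. sss+32 at 0x12a60, which is the
word that stays 0 in the VL=5 dump.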
