https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115995
            Bug ID: 115995
           Summary: RISC-V: Can't generate portable RVV code for
                    rv64gcv_zvl512b
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sh.chiang04 at gmail dot com
                CC: juzhe.zhong at rivai dot ai, kito at gcc dot gnu.org,
                    pan2.li at intel dot com, rdapp at gcc dot gnu.org
  Target Milestone: ---

Created attachment 58704
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58704&action=edit
same as gcc.c-torture/execute/990128-1.c

The failing test is main() from gcc.c-torture/execute/990128-1.c, which tests a
linked list. With -march=rv64gcv_zvl512b, the loop below is split into two
parts: a main loop and a tail.

According to the vector spec's constraints on setting vl, rule 2 is:

  ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)

For example, for
  vsetivli a2,10,e64,m1,ta,ma
(VLEN=512, SEW=64, LMUL=1, VLMAX=8, AVL=10), rule 2 gives a VL range of
5 ≤ vl ≤ 8.

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#63-constraints-on-setting-vl

In this case the generated code expects the main loop to run with VL = 8 and
the tail with VL = 2, but the spec allows VL to be anywhere from 5 to 8. The
failure occurs when VL is 5: the linked list is not completely created.

compile option: -march=rv64gcv_zvl512b1p0 -mabi=lp64d -O2

#define MAX 10

int main(void)
{
  struct s *pp;
  struct s *next;
  int i;

  p = &ss;
  next = p;
  for ( i = 0; i < MAX; i++ ) {
    next->n = &sss[i];
    next = next->n;
  }
  next->n = 0;
  ...
}

main:
        vsetvli a1,zero,e32,mf2,ta,ma
        lui     a5,%hi(ss)
        vid.v   v5
        addi    a5,a5,%lo(ss)
        vsetvli zero,zero,e64,m1,ta,ma
        lui     a3,%hi(.LANCHOR0)
        vmv.v.x v2,a5
        addi    a3,a3,%lo(.LANCHOR0)
        csrr    a5,vlenb
        vsext.vf2       v1,v5
        vmv.v.x v3,a3
        vmv.v.i v4,8
        srli    a5,a5,3
        addi    a5,a5,-1
        vslidedown.vx   v2,v2,a5
        vmadd.vv        v1,v4,v3
        lui     a0,%hi(ss)
        vmv.x.s a4,v2
        vsetivli        a2,10,e64,m1,ta,ma      // VL = 5 to 8
        vslide1up.vx    v2,v1,a4
        addi    sp,sp,-32
        lui     a4,%hi(p)
        addi    a6,a0,%lo(ss)
        sd      a6,%lo(p)(a4)
        sd      ra,24(sp)
        vsuxei64.v      v1,(zero),v2            // create linked list
        li      a4,10
        sub     a4,a4,a2                        // a4 = 10 - VL = 2 to 5
        bne     a4,zero,.L22
.L16:
        ....
.L22:
        vsetvli a4,a4,e32,mf2,ta,ma             // a4 = 2 to 5
        vmv.v.x v2,a2
        vsetvli a1,zero,e64,m1,ta,ma
        vslidedown.vx   v1,v1,a5
        vsetvli zero,a4,e32,mf2,ta,ma
        vadd.vv v2,v2,v5
        vsetvli zero,zero,e64,m1,ta,ma
        vmv.x.s a5,v1
        vsext.vf2       v1,v2
        vmadd.vv        v1,v4,v3
        vslide1up.vx    v2,v1,a5
        vsuxei64.v      v1,(zero),v2            // create linked list
        j       .L16

The following memory dumps are from QEMU. With VL=5, one link is never created:
the pointer to 0x12a68, which should be stored at 0x12a60.

Run for VL=8
0x12a40 <sss>:          0x00012a48      0x00000000      0x00012a50      0x00000000
0x12a50 <sss+16>:       0x00012a58      0x00000000      0x00012a60      0x00000000
0x12a60 <sss+32>:       0x00012a68      0x00000000      0x00012a70      0x00000000
0x12a70 <sss+48>:       0x00012a78      0x00000000      0x00012a80      0x00000000
0x12a80 <sss+64>:       0x00012a88      0x00000000      0x00000000      0x00000000

Run for VL=5
0x12a40 <sss>:          0x00012a48      0x00000000      0x00012a50      0x00000000
0x12a50 <sss+16>:       0x00012a58      0x00000000      0x00012a60      0x00000000
0x12a60 <sss+32>:       0x00000000      0x00000000      0x00012a70      0x00000000
0x12a70 <sss+48>:       0x00012a78      0x00000000      0x00012a80      0x00000000
0x12a80 <sss+64>:       0x00012a88      0x00000000      0x00000000      0x00000000
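
To make the failure mode easier to follow, here is a minimal scalar sketch in C of what the two indexed stores appear to do, based on one reading of the assembly in this report. It is a model for illustration only, not GCC's code and not a spec-accurate simulator. The function names (min_legal_vl, build_links, first_broken_link) are hypothetical, links are modelled as indices into sss instead of addresses, and the stores are assumed to complete in element order even though vsuxei64.v is unordered. The key assumption is that the tail block takes its first store address from lane VLMAX-1 of the main-loop vector (the vslidedown.vx by vlenb/8 - 1), which is only the right lane when the main loop ran with VL = VLMAX.

```c
#include <assert.h>
#include <stddef.h>

#define MAX   10   /* loop trip count from the test case */
#define VLMAX 8    /* VLEN=512, SEW=64, LMUL=1 */

/* Smallest vl an implementation may return for a given AVL, following the
   three rules in the spec's "constraints on setting vl" section. */
size_t min_legal_vl(size_t avl, size_t vlmax)
{
    if (avl <= vlmax)
        return avl;              /* rule 1: vl = AVL */
    if (avl < 2 * vlmax)
        return (avl + 1) / 2;    /* rule 2: ceil(AVL / 2) <= vl <= VLMAX */
    return vlmax;                /* rule 3: vl = VLMAX */
}

/* Model of the two vsuxei64.v stores. next_idx[j] stands for sss[j].n,
   expressed as an index into sss; -1 means "never written".
   main_vl is the vl returned by "vsetivli a2,10,e64,m1,ta,ma". */
void build_links(size_t main_vl, int next_idx[MAX])
{
    assert(main_vl >= 1 && main_vl <= VLMAX);

    for (size_t j = 0; j < MAX; j++)
        next_idx[j] = -1;

    /* Main block: lane 0 writes ss.n (not modelled here); lane i >= 1
       writes sss[i-1].n = &sss[i]. */
    for (size_t i = 1; i < main_vl; i++)
        next_idx[i - 1] = (int)i;

    /* Tail block (.L22): lane 0 stores to &sss[VLMAX-1] no matter what
       main_vl was; lane i >= 1 stores to sss[main_vl+i-1]. */
    size_t tail = MAX - main_vl;
    for (size_t i = 0; i < tail; i++) {
        size_t dst = (i == 0) ? (size_t)(VLMAX - 1) : main_vl + i - 1;
        next_idx[dst] = (int)(main_vl + i);
    }
}

/* Index j of the first sss[j] whose link does not point at sss[j+1],
   or -1 if the chain sss[0] -> ... -> sss[MAX-1] is intact. */
int first_broken_link(size_t main_vl)
{
    int next_idx[MAX];

    build_links(main_vl, next_idx);
    for (int j = 0; j + 1 < MAX; j++)
        if (next_idx[j] != j + 1)
            return j;
    return -1;
}
```

Under these assumptions, first_broken_link(8) reports an intact chain, while first_broken_link(5) reports index 4, i.e. sss+32 (0x12a60), which is the slot left at zero in the VL=5 dump. min_legal_vl(10, 8) gives 5, the lower bound quoted from rule 2.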