On Fri, 28 Feb 2025 12:48:36 +0100, "Robin Dapp" wrote:
> > Okay, let me explain the background of my previous patch.
> >
> > Prior to applying my patch, for the test case bug-10.c (a reduced example
> > of a larger program with incorrect runtime results),
> > the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as
> > follows:
> > 1. vsetvli zero,a4,e16,m4,ta,ma + vsetvli zero,a4,e32,m8,ta,ma + vsetvli
> > zero,a4,e8,m2,ta,ma
> >
> > The vsetvli sequence compiled with --param=vsetvl-strategy=optim was as
> > follows:
> > 2. vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
> >
> > Although vl remains unchanged, the SEW/LMUL ratio in sequence 2 changes,
> > leading to undefined behavior.
>
> The only difference I see with your patch vs without is
>
> < vsetvli zero,zero,e8,m2,ta,ma
> ---
> > vsetvli zero,a3,e8,m2,ta,ma
>
> and we ensure the former doesn't occur in the test.
>
> But that difference doesn't matter because the ratio is the same before and
> after. That's why I'm asking. bug-10.c as is doesn't test anything
> reasonable IMHO. Right, the ratio (or rather the associated LMUL) was wrong
> but the current test doesn't make sure it isn't. Can you share the
> non-reduced (or less reduced) case?
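For reference, the SEW/LMUL ratio argument quoted above can be checked with
a small sketch. The struct and function names below are hypothetical helpers
written for illustration only; they are not part of GCC or of the RVV
intrinsics. The sketch assumes the RVV rule that the vl-preserving form
"vsetvli zero,zero,..." is only usable when the SEW/LMUL ratio (and hence
VLMAX) stays the same.

```c
#include <assert.h>

/* Hypothetical helper type (not from GCC or riscv_vector.h).
   LMUL is stored as a fraction so mf2/mf4/mf8 can be expressed too.  */
struct vtype_cfg
{
  int sew;      /* element width in bits: 8, 16, 32 or 64 */
  int lmul_num; /* LMUL numerator, e.g. 4 for m4, 1 for mf4 */
  int lmul_den; /* LMUL denominator, e.g. 1 for m4, 4 for mf4 */
};

/* SEW/LMUL ratio; VLMAX is VLEN divided by this value.  */
static int
sew_lmul_ratio (struct vtype_cfg c)
{
  return c.sew * c.lmul_den / c.lmul_num;
}

/* Nonzero if switching from PREV to NEXT keeps the ratio unchanged,
   i.e. a "vsetvli zero,zero,..." would be acceptable.  */
static int
vl_preserving_ok (struct vtype_cfg prev, struct vtype_cfg next)
{
  return sew_lmul_ratio (prev) == sew_lmul_ratio (next);
}

/* Checks the two sequences quoted in this thread.  */
static void
check_thread_sequences (void)
{
  struct vtype_cfg e16m4 = { 16, 4, 1 };
  struct vtype_cfg e32m8 = { 32, 8, 1 };
  struct vtype_cfg e32m4 = { 32, 4, 1 };
  struct vtype_cfg e8m2 = { 8, 2, 1 };

  /* Sequence 1 (simple): every configuration has ratio 4.  */
  assert (vl_preserving_ok (e16m4, e32m8));
  assert (vl_preserving_ok (e32m8, e8m2));

  /* Sequence 2 (optim): e32,m4 has ratio 8 but e8,m2 has ratio 4,
     so "vsetvli zero,zero,e8,m2" changes the ratio.  */
  assert (!vl_preserving_ok (e32m4, e8m2));
}
```

With the values from sequence 2, vl_preserving_ok returns 0 for e32,m4
followed by e8,m2 (ratio 8 versus 4), while each adjacent pair in
sequence 1 has ratio 4.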

Hi Robin,

I apologize for the delayed response. I spent quite a bit of time trying to
reproduce the case, and given the passage of time, it wasn't easy to refine
the testing. Fortunately, you can see the results here.

https://godbolt.org/z/Mc8veW7oT

Using GCC version 14.2.0 should allow you to replicate the issue. If all goes
as expected, you will encounter a "Segmentation fault (core dumped)." By
disassembling the binary, you'll notice the presence of
"vsetvli zero,zero,e32,m4,ta,ma", which is where the problem lies, just as I
mentioned previously.

Best regards,
Jin Ma

> > /* { dg-do compile { target { rv64 } } } */
> > /* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3" } */
> >
> > #include <riscv_vector.h>
> >
> > _Float16 a (uint64_t);
> > int8_t b () {
> >   int c = 100;
> >   double *d;
> >   _Float16 *e;
> >   for (size_t f;; c -= f)
> >     {
> >       f = c;
> >       __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 2, f);
> >       vfloat16mf4_t g;
> >       a (1);
> >       g = __riscv_vfmv_s_f_f16mf4 (2, f);
> >       vfloat64m1_t i = __riscv_vfmv_s_f_f64m1 (30491, f);
> >       vuint16mf4_t j;
> >       __riscv_vsoxei16_v_f16mf4 (e, j, g, f);
> >       vuint8mf8_t k = __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 3, f);
> >       __riscv_vsoxei8_v_f64m1 (d, k, i, f);
> >     }
> > }
> >
> > /* { dg-final { scan-assembler-not "e64,mf4" } } */
>
> That works, thanks.
>
> --
> Regards
>  Robin