Hi Robin,

Your suggested code seems to work fine. Let me run more tests and send v2. I guess I just don't know how to explain why it works in a comment :p
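
For the comment, here is a rough standalone sketch of why the ctz form should give the same answer as a loop. It is only an illustration under some assumptions: that the loop in the patch searches for the smallest power-of-two d with factor % (vlenb / d) == 0, that vlenb is always a power of two, and that factor is positive. The helper names (ctz, div_factor_loop, div_factor_ctz, forms_agree) and the plain int types are made up for this sketch, and the hand-written ctz only stands in for wi::ctz. With vlenb = 2^k and d = 2^i, factor is a multiple of 2^(k - i) exactly when ctz (factor) >= k - i, so the smallest i is max (0, ctz (vlenb) - ctz (factor)).

```cpp
#include <cassert>

// Count trailing zero bits of a positive value (stand-in for wi::ctz).
static int ctz (int x)
{
  assert (x > 0);
  int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

// Assumed loop form: smallest power of two d such that
// factor is a multiple of vlenb / d.
static int div_factor_loop (int factor, int vlenb)
{
  for (int d = 1; d <= vlenb; d <<= 1)
    if (factor % (vlenb / d) == 0)
      return d;
  return 0; // Not reached when vlenb is a power of two.
}

// Closed form following the suggestion in this thread.
static int div_factor_ctz (int factor, int vlenb)
{
  // Assumption: vlenb is a positive power of two.
  assert (vlenb > 0 && (vlenb & (vlenb - 1)) == 0);
  int div_factor = ctz (factor) - ctz (vlenb);
  if (div_factor >= 0)
    return 1;                 // factor is already a multiple of vlenb.
  return 1 << -div_factor;    // Number of missing factors of two.
}

// Compare both forms for every power-of-two vlenb up to max_vlenb
// and every factor in [1, 4 * vlenb].
static bool forms_agree (int max_vlenb)
{
  for (int vlenb = 1; vlenb <= max_vlenb; vlenb <<= 1)
    for (int factor = 1; factor <= 4 * vlenb; factor++)
      if (div_factor_loop (factor, vlenb) != div_factor_ctz (factor, vlenb))
        return false;
  return true;
}
```

For example, with factor = 8 and vlenb = 64 both forms give 8, since 8 is a multiple of 64 / 8 but not of 64 / 4.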
On Thu, Oct 5, 2023 at 03:57, Robin Dapp <rdapp....@gmail.com> wrote:

> >> I think the "max poly value" is the LMUL 1 mode coeffs[1]
> >>
> >> See int vlenb = BYTES_PER_RISCV_VECTOR.coeffs[1];
> >>
> >> So I think bumping max_power to exact_log2 (64) is not enough,
> >> since we adjust the LMUL 1 mode size according to TARGET_MIN_VLEN.
> >>
> >> I suspect the testcase you appended in this patch will fail with
> >> -march=rv64gcv_zvl4096b.
> >
> >
> > There is no type smaller than [64, 64] in zvl4096b. RVVMF64BI is
> > [64, 64], the smallest type, and RVVFM1BI is [512, 512] (the size of
> > a single vector reg.), which is at most 64x for zvl4096b, so my
> > understanding is that log2(64) is enough :)
> >
> > And of course, I verified the testcase works with
> > -march=rv64gcv_zvl4096b
>
> I was wondering if the whole hunk couldn't be condensed into something
> like (untested):
>
>   div_factor = wi::ctz (factor) - wi::ctz (vlenb);
>   if (div_factor >= 0)
>     div_factor = 1;
>   else
>     div_factor = 1 << -div_factor;
>
> This would avoid the loop as well.  An assert for the div_factor (not
> exceeding a value) could still be added.
>
> Regards
>  Robin
>