camel-cdr wrote: We got some benchmarks: https://camel-cdr.github.io/rvv-bench-results/spacemit_x100/index.html (vector FP throughput is still missing due to a bug in my code)
I'm not sure `TuneDLenFactor2` is actually desirable. The throughput of simple integer RVV instructions is the same at LMUL=1 and LMUL<1, so defaulting to LMUL=1/2 would waste half of the available performance. The permutation, comparison, and shift instructions do seem faster at LMUL=1/2 than at LMUL=1, but they scale linearly from LMUL=1/2 to LMUL=1, so you should get the same performance per element at LMUL=1.

https://github.com/llvm/llvm-project/pull/173988
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
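
The per-element argument in the comment above can be sketched with a little arithmetic. This is only an illustration: the VLEN, SEW, and instructions-per-cycle figures below are hypothetical placeholders, not values taken from the linked benchmark results, and `elements_per_cycle` is a helper made up for this example.

```python
from fractions import Fraction

def elements_per_cycle(vlen_bits, sew_bits, lmul, instrs_per_cycle):
    """Elements processed per cycle for one instruction class.

    Each instruction handles VLEN * LMUL / SEW elements, so per-element
    throughput is that count times the instruction issue rate.
    """
    elems_per_instr = Fraction(vlen_bits) * lmul / sew_bits
    return elems_per_instr * instrs_per_cycle

# Hypothetical machine parameters, chosen only to make the arithmetic visible.
VLEN = 256
SEW = 32
HALF = Fraction(1, 2)
ONE = Fraction(1)

# Case 1: simple integer ops. Assume the same instruction rate at both LMULs,
# as the comment describes (the rate of 2 per cycle is made up).
simple_half = elements_per_cycle(VLEN, SEW, HALF, 2)
simple_one = elements_per_cycle(VLEN, SEW, ONE, 2)

# Case 2: permutation/comparison/shift ops. Assume linear scaling, i.e. the
# LMUL=1 instruction rate is half the LMUL=1/2 rate (rates are made up).
perm_half = elements_per_cycle(VLEN, SEW, HALF, 2)
perm_one = elements_per_cycle(VLEN, SEW, ONE, 1)

print("simple integer:", simple_half, "vs", simple_one, "elements/cycle")
print("permutation:   ", perm_half, "vs", perm_one, "elements/cycle")
```

Under these assumptions, LMUL=1 doubles per-element throughput for the simple integer case and ties LMUL=1/2 for the linearly scaling case, which is the shape of the argument against defaulting to LMUL=1/2.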
