camel-cdr wrote:

We got some benchmarks: 
https://camel-cdr.github.io/rvv-bench-results/spacemit_x100/index.html (vector 
FP throughput is still missing due to a bug in my code)

I'm not sure if `TuneDLenFactor2` is actually desirable.
The throughput of simple integer RVV instructions for LMUL=1 and LMUL<1 is the 
same, so you'd be wasting half of the performance defaulting to LMUL=1/2.
The permutation, comparison and shift instructions seem faster at LMUL=1/2 than 
at LMUL=1, but they scale linearly from LMUL=1/2 to LMUL=1, so you should get 
the same performance per element at LMUL=1.
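
To make the per-element argument concrete, here is a rough sketch of the arithmetic. The VLEN, SEW and instructions-per-cycle figures are made-up placeholders, not values read off the benchmark page, and `elements_per_cycle` is just a hypothetical helper for illustration.

```python
from fractions import Fraction


def elements_per_cycle(vlen_bits, sew_bits, lmul, instr_per_cycle):
    """Elements processed per cycle = elements per instruction * instructions per cycle."""
    elems_per_instr = Fraction(vlen_bits, sew_bits) * lmul
    return elems_per_instr * instr_per_cycle


# Assumed values for illustration only (not measured numbers).
VLEN = 256
SEW = 32
HALF, ONE = Fraction(1, 2), Fraction(1)

# Case 1: simple integer ops. Instruction throughput is the same at
# LMUL=1/2 and LMUL=1 (assumed 2 instr/cycle here), so LMUL=1/2
# processes only half as many elements per cycle.
simple_half = elements_per_cycle(VLEN, SEW, HALF, Fraction(2))
simple_one = elements_per_cycle(VLEN, SEW, ONE, Fraction(2))

# Case 2: permutation/comparison/shift. Instruction throughput scales
# linearly with LMUL (assumed 2 instr/cycle at LMUL=1/2, 1 at LMUL=1),
# so elements per cycle come out identical.
perm_half = elements_per_cycle(VLEN, SEW, HALF, Fraction(2))
perm_one = elements_per_cycle(VLEN, SEW, ONE, Fraction(1))

print("simple int:", simple_half, "vs", simple_one)
print("perm/cmp/shift:", perm_half, "vs", perm_one)
```

In neither case does LMUL=1/2 come out ahead per element under these assumptions: it loses by 2x in the first case and ties in the second.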

https://github.com/llvm/llvm-project/pull/173988
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
