On Tuesday, 26 September 2023, at 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7

With optimisations enabled and the benchmarking fix, I get this (on the same 
hardware, I believe):

fcmul_add_c: 3.5
fcmul_add_rvv_f32: 6.7

Unfortunate design limitations of the T-Head C910 are certainly to blame to no 
small extent. This is not the first RVV optimisation that turns out slower than 
scalar code because of them, and I still honestly hope that newer (and 
conformant) IP will give saner results. But I also believe that the code could 
be improved regardless.

-- 
Rémi Denis-Courmont
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
