On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7

> +    li          t1, 4
> +    vsetvli     t0, t1, e32, m1, ta, ma

This pair can be a single instruction:

    vsetivli t0, 4, ...

But really, DO NOT use a fixed vector length here. At best, you're wasting
half the vector width. Your input has a variable size; use it.

> +
> +    li          t2, 8
> +
> +    vlsseg2e32.v    v0, (a1), t2

I'm not sure what you are trying to achieve here. With a stride of 8 bytes
and two 32-bit fields per segment, the segments are contiguous, so it seems
that you could just as well use vlseg2 without the register stride, no?

> +    vlsseg2e32.v    v2, (a2), t2
> +    vlsseg2e32.v    v4, (a0), t2
> +
> +    vfmul.vv    v6, v0, v2
> +    vfmul.vv    v7, v1, v3
> +    vfmul.vv    v8, v0, v3
> +    vfmul.vv    v9, v1, v2
> +
> +    vfadd.vv    v4, v4, v6
> +    vfsub.vv    v4, v4, v7
> +    vfadd.vv    v5, v5, v8
> +    vfadd.vv    v5, v5, v9
> +
> +    vssseg2e32.v    v4, (a0), t2

Same here: the store does not seem to need the register stride either.

-- 
レミ・デニ-クールモン
http://www.remlab.net/
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
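
To illustrate the remark about using the variable input size, here is a rough scalar C sketch of the loop shape being asked for. It is only a model, not the patch and not FFmpeg's actual routine: the name cmul_add_chunked, its signature, and the SKETCH_VLMAX constant are all made up for illustration, and the "vl = min(remaining, VLMAX)" line is a simplification of what vsetvli may return. The arithmetic follows the quoted vfmul/vfadd/vfsub sequence on interleaved (re, im) floats.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware maximum vector length.
 * Real RVV code would not hard-code this; vsetvli reports it at run time. */
#define SKETCH_VLMAX 4

/* sum[n] += a[n] * b[n] for len complex numbers stored as interleaved
 * (re, im) floats. Illustrative name and signature only. */
static void cmul_add_chunked(float *sum, const float *a, const float *b,
                             size_t len)
{
    while (len > 0) {
        /* Simplified model of vsetvli: take as many elements as fit. */
        size_t vl = len < SKETCH_VLMAX ? len : SKETCH_VLMAX;

        /* One "vector" iteration: contiguous (unit-stride) access to
         * the interleaved pairs, as a two-field segment load would do. */
        for (size_t i = 0; i < vl; i++) {
            float ar = a[2 * i], ai = a[2 * i + 1];
            float br = b[2 * i], bi = b[2 * i + 1];

            sum[2 * i]     = sum[2 * i]     + ar * br - ai * bi; /* real */
            sum[2 * i + 1] = sum[2 * i + 1] + ar * bi + ai * br; /* imag */
        }

        /* Advance past the elements just processed. */
        a   += 2 * vl;
        b   += 2 * vl;
        sum += 2 * vl;
        len -= vl;
    }
}
```

The point of the shape is that the chunk size is derived from the remaining length on every iteration, so the same code handles any input size and any vector width, including a final partial chunk.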