On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7

+    li t1, 4
+    vsetvli  t0, t1, e32, m1, ta, ma

You can use the immediate form and drop the li:

vsetivli t0, 4, ...

But really, DO NOT use a fixed vector length here. At best, you're wasting half
the vector width. Your input has a variable size; pass it to vsetvli instead.

+
+    li t2, 8
+
+    vlsseg2e32.v v0, (a1), t2

I'm not sure what you are trying to achieve here. It seems that you could just
as well use the unit-stride vlseg2e32.v, without a stride register, no?
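To spell out why the stride register looks redundant: each segment holds two
32-bit fields, i.e. 8 bytes, so a byte stride of 8 places consecutive segments
back to back, which is what the unit-stride segment loads do anyway. A minimal
sketch in GNU assembler syntax, reusing the registers from the patch (the
comments are mine):

```asm
    # Patch:      vlsseg2e32.v v0, (a1), t2   with t2 = 8
    #             segment i is read from a1 + i * t2 = a1 + 8 * i
    # Unit-stride equivalent:
    #             segment i is read from a1 + i * (2 fields * 4 bytes) = a1 + 8 * i
    vlseg2e32.v v0, (a1)    # v0 = first field of each pair, v1 = second field
    vlseg2e32.v v2, (a2)    # v2 = first field of each pair, v3 = second field
    vlseg2e32.v v4, (a0)    # v4 = first field of each pair, v5 = second field
```

Both forms touch the same addresses, and the unit-stride one frees t2.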

+    vlsseg2e32.v v2, (a2), t2
+    vlsseg2e32.v v4, (a0), t2
+
+    vfmul.vv v6, v0, v2
+    vfmul.vv v7, v1, v3
+    vfmul.vv v8, v0, v3
+    vfmul.vv v9, v1, v2
+
+    vfadd.vv v4, v4, v6
+    vfsub.vv v4, v4, v7
+    vfadd.vv v5, v5, v8
+    vfadd.vv v5, v5, v9
+
+    vssseg2e32.v v4, (a0), t2

Same here: vsseg2e32.v without a stride register should do.
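Putting the remarks above together, the loop could look roughly like the
sketch below. It is untested and rests on assumptions that are not in the
quoted hunk: a3 is a hypothetical register holding the number of complex
elements left to process, and it is non-zero on entry. The arithmetic is
copied unchanged from the patch; only the vector length setup, the loads and
store, and the loop bookkeeping differ.

```asm
    # Assumed: a0 = accumulator, a1 and a2 = inputs (as in the patch),
    #          a3 = remaining complex elements (hypothetical, must be > 0).
1:
    vsetvli     t0, a3, e32, m1, ta, ma   # t0 = elements handled in this pass
    vlseg2e32.v v0, (a1)                  # v0/v1 = first/second field of input 1
    vlseg2e32.v v2, (a2)                  # v2/v3 = first/second field of input 2
    vlseg2e32.v v4, (a0)                  # v4/v5 = first/second field of accumulator
    sub         a3, a3, t0                # elements still to do
    slli        t1, t0, 3                 # bytes consumed: 8 per complex element

    vfmul.vv    v6, v0, v2
    vfmul.vv    v7, v1, v3
    vfmul.vv    v8, v0, v3
    vfmul.vv    v9, v1, v2

    vfadd.vv    v4, v4, v6
    vfsub.vv    v4, v4, v7                # v4 += v0 * v2 - v1 * v3
    vfadd.vv    v5, v5, v8
    vfadd.vv    v5, v5, v9                # v5 += v0 * v3 + v1 * v2

    vsseg2e32.v v4, (a0)
    add         a0, a0, t1                # advance all three pointers
    add         a1, a1, t1
    add         a2, a2, t1
    bnez        a3, 1b
    ret
```

Because vsetvli caps t0 at what the hardware supports, the same code uses the
full vector width on any implementation and handles the tail without a
separate scalar loop.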


-- 
レミ・デニ-クールモン
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
