Re: [FFmpeg-devel] [PATCH v3 5/9] lavc/vp9dsp: R-V V mc avg

2024-05-17 Thread flow gg
Yeah, updated it in the reply. Rémi Denis-Courmont wrote on Fri, 17 May 2024 at 23:11: > On Monday, 13 May 2024, 19.59.22 EEST, u...@foxmail.com wrote: > > From: sunyuechi > > > > C908: > > vp9_avg4_8bpp_c: 1.2 > > vp9_avg4_8bpp_rvv_i64: 1.0 > > vp9_avg8_8bpp_c: 3.7 > > vp9_avg8_8bpp_rvv_i64:

Re: [FFmpeg-devel] [PATCH v4 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-18 Thread flow gg
Fixed issues with .irp and comma, as well as the ifc issue (same modifications as previously done for vp8). Quoting the message of Sun, 19 May 2024 at 02:16: > From: sunyuechi > > C908: > vp9_avg4_8bpp_c: 1.2 > vp9_avg4_8bpp_rvv_i64: 1.0 > vp9_avg8_8bpp_c: 3.7 > vp9_avg8_8bpp_rvv_i64: 1.5 > vp9_avg16_8bpp_c: 14.7 > vp9_a

Re: [FFmpeg-devel] [PATCH v3 6/9] lavc/vp9dsp: R-V V mc bilin h v

2024-05-18 Thread flow gg
Fixed in v4. Rémi Denis-Courmont wrote on Sat, 18 May 2024 at 23:56: > On Monday, 13 May 2024, 19.59.23 EEST, u...@foxmail.com wrote: > > From: sunyuechi > > > > C908: > > vp9_avg_bilin_4h_8bpp_c: 5.2 > > vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2 > > vp9_avg_bilin_4v_8bpp_c: 5.5 > > vp9_avg_bilin_4v

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp8dsp: R-V V put_epel hv

2024-05-19 Thread flow gg
Fixed the .irp use. Quoting the message of Sun, 19 May 2024 at 16:18: > From: sunyuechi > > C908: > vp8_put_epel4_h4v4_c: 20.0 > vp8_put_epel4_h4v4_rvv_i32: 11.0 > vp8_put_epel4_h4v6_c: 25.2 > vp8_put_epel4_h4v6_rvv_i32: 13.5 > vp8_put_epel4_h6v4_c: 22.2 > vp8_put_epel4_h6v4_rvv_i32: 14.5 > vp8_put_epel4_h6v6_c: 29.0 > vp8_put_

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
To obtain the test results, the if (w == h) check in tests/checkasm/vvc_mc.c needs to be commented out. Because vset needs to be used in the loop, I manually wrote a cumbersome vset macro. Quoting the message of Tue, 21 May 2024 at 15:38: > From: sunyuechi > > C908 X60 > avg_8_2x2_

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
There are three unused lines which I forgot to delete before submitting. I have updated them here. Quoting the message of Tue, 21 May 2024 at 15:47: > From: sunyuechi > > C908 X60 > avg_8_2x2_c: 1.0 1.0 > avg_8_2x2_rvv_i

Re: [FFmpeg-devel] [PATCH v4 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-21 Thread flow gg
> Please put commas between operands. Okay > This should probably be ff_avg_vp9 or something slightly more specific. Is it necessary here? Many macros in the C file are copied from MIPS, where it is called ff_avg4_msa. Here, it has been simply changed to ff_avg4_rvv. Rémi Denis-Courmont 于2024年

Re: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-21 Thread flow gg
> Please put commas between operands. > This should probably be ff_avg_vp9 or something slightly more specific. Updated here. Quoting the message of Wed, 22 May 2024 at 01:14: > From: sunyuechi > > C908: > vp9_avg4_8bpp_c: 1.2 > vp9_avg4_8bpp_rvv_i64: 1.0 > vp9_avg8_8bpp_c: 3.7 > vp9_avg8_8bpp_rvv_i64: 1.5 > vp9_avg16_8

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
> I would expect that you can get better performance by interleaving scalar and vector stuff, and possibly also vector loads and vector arithmetic. Okay, I will try > These labels lead to nowhere? If you actually mean to implicitly fall through to the next function, you can use the function name

Re: [FFmpeg-devel] [PATCH v2 2/5] lavc/vp9dsp: R-V V mc bilin h v

2024-05-21 Thread flow gg
Do macro definitions also need commas? I noticed that much of my old code and SiFive's code doesn't use them. Rémi Denis-Courmont wrote on Wed, 22 May 2024 at 02:29: > On Tuesday, 21 May 2024, 20.13.16 EEST, u...@foxmail.com wrote: > > From: sunyuechi > > > diff --git a/libavcodec/riscv/vp9_mc_

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
Reordered some here. Quoting the message of Wed, 22 May 2024 at 03:24: > From: sunyuechi > > C908 X60 > avg_8_2x2_c: 1.0 1.0 > avg_8_2x2_rvv_i32: 0.7 0.7 > avg_8_2x4_c

Re: [FFmpeg-devel] [PATCH] lavc/rv34dsp: optimise R-V V idct_dc_add

2024-05-22 Thread flow gg
Unfortunately I only test to obtain benchmarks and basic correctness. I always feel the need for a professional to write the tests. Rémi Denis-Courmont wrote on Thu, 23 May 2024 at 04:35: > > > On 22 May 2024 23:28:54 GMT+03:00, "Rémi Denis-Courmont" wrote: > >This removes one stray LI and reworks the

Re: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-23 Thread flow gg
I want to update the VP9 bilin load, just like you did with VP8, but it seems that the merged patch ([PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg) is the previous version rather than the current update posted here, so the subsequent patches will have conflicts. flow gg wrote on Wed, 22 May 2024 at 01

Re: [FFmpeg-devel] [PATCH v2 3/5] lavc/vp9dsp: R-V V mc tap h v

2024-05-25 Thread flow gg
> Is there a reason that you cannot use the tables from C code? Similar to VP8, to adjust the positive and negative data and prevent small probability overflow during calculations. > AFAICT, regular and sharp are identical, except for the base address of the > filter table, so it should be possib

Re: [FFmpeg-devel] [PATCH v2 3/5] lavc/vp9dsp: R-V V mc tap h v

2024-05-25 Thread flow gg
One more thing I remember is that after adjusting the sign, vmacc can be used; otherwise, due to the sign, mul + add are needed. flow gg wrote on Sat, 25 May 2024 at 18:38: > > Is there a reason that you cannot use the tables from C code? > > Similar to VP8, to adjust the positive and negat
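One way to read the remark about sign adjustment, sketched in scalar C rather than RVV assembly: if the taps are stored as magnitudes and the sign pattern is fixed by position, each term becomes a single multiply-accumulate or multiply-subtract on unsigned pixel data. The tap values and both function names below are illustrative assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Illustrative 4-tap row with mixed signs (assumed values, not the patch's table). */
static const int8_t  taps_signed[4]    = { -6, 123, 12, -1 };
/* Same row stored as magnitudes; the sign pattern (-, +, +, -) is implied by position. */
static const uint8_t taps_magnitude[4] = {  6, 123, 12,  1 };

/* Reference: signed taps, one multiply followed by one add per term. */
static int32_t filter_signed_taps(const uint8_t px[4])
{
    int32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (int32_t)px[i] * taps_signed[i];
    return sum;
}

/* Sign-adjusted form: every step is a fused accumulate or subtract of an
 * unsigned product, mirroring a multiply-accumulate instruction. */
static int32_t filter_magnitude_taps(const uint8_t px[4])
{
    int32_t acc = px[1] * taps_magnitude[1];   /* initial multiply       */
    acc += px[2] * taps_magnitude[2];          /* multiply-accumulate    */
    acc -= px[0] * taps_magnitude[0];          /* multiply-subtract      */
    acc -= px[3] * taps_magnitude[3];          /* multiply-subtract      */
    return acc;
}
```

Both functions return the same sum for any input; the second only changes how the sign is carried.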

Re: [FFmpeg-devel] [PATCH 5/5] lavc/vp8dsp: factor R-V V EPEL functions for all lengths

2024-05-25 Thread flow gg
Would it be better to replace the two vsetvlstatic8 and vsetvlstatic16 with two vsetvl? This would require the previous patch and this one to work together, increasing the number of lines of code and making the code a bit harder to read. Additionally, I have a question about patch 4 'save one R-V G

Re: [FFmpeg-devel] [PATCH 5/5] lavc/vp8dsp: factor R-V V EPEL functions for all lengths

2024-05-25 Thread flow gg
The reduction in code size seems to be due to switching to using j labels; it doesn't seem to be about vset, but another issue. j labels are indeed better. I will make similar modifications. Rémi Denis-Courmont wrote on Sun, 26 May 2024 at 02:29: > On Saturday, 25 May 2024, 21.16.22 EEST, flow gg wrote

Re: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-26 Thread flow gg
Hi, maybe we can prioritize this revert: https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/0c1304ae11b0361ede055ee8ffc6e83529468c73 Using [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg to avoid conflicts with other patches. flow gg wrote on Fri, 24 May 2024 at 14:13: > I want to update the VP9 bilin load, just l

Re: [FFmpeg-devel] [PATCH v3 4/5] lavc/vp9dsp: R-V V mc tap h v

2024-05-29 Thread flow gg
A portion has been modified according to the previous review, but there are still some parts that haven't been updated > Similarly, it > should be possible to share most of the horizontal and vertical code (maybe > also for bilinear. not just EPel) with separate load/store then inner > procedures.

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
I directly copied the VP9 modifications over... Since len <= 16, it seems like it can be improved a bit more. Quoting the message of Thu, 30 May 2024 at 23:27: > From: sunyuechi > > Since len < 64, the registers are sufficient, so it can be > directly unrolled (a4 is even). > > Another benefit of unrolling is that it redu

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
Well.. because scalar registers are limited, the direct unrolling will be like this for now. We can handle different lengths separately in the future. flow gg wrote on Thu, 30 May 2024 at 23:36: > I directly copied the VP9 modifications over... Since len <= 16, it seems > like it can be improved a

Re: [FFmpeg-devel] [PATCH v2] lavc/vvc_mc: R-V V avg w_avg

2024-06-01 Thread flow gg
> In keeping in line with the rest of the project, that should probably go into > **libavcodec/riscv/vvc/** > Expanding the macro 49 times, with up to 14 **branches** to get there is maybe not > such a great idea. It might look nice on the checkasm µbenchmarks because the > branches under test get

Re: [FFmpeg-devel] [PATCH v2] lavc/vvc_mc: R-V V avg w_avg

2024-06-01 Thread flow gg
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will > end up faster than vector code on future hardware, especially out-of-order > pipelines. I want to drop 2x2, but since there's only one function to handle all situations instead of 7*7 functions, how can I drop only 2x2

Re: [FFmpeg-devel] [PATCH v3] lavc/vvc_mc: R-V V avg w_avg

2024-06-11 Thread flow gg
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will > end up faster than vector code on future hardware, especially out-of-order > pipelines. I want to drop 2x2, but since there's only one function to handle all situations instead of 7*7 functions.. > AFAIU, this will ge

Re: [FFmpeg-devel] [PATCH v4] lavc/vvc_mc: R-V V avg w_avg

2024-06-11 Thread flow gg
> Nit: for overall code base consistency, I'd use csrwi here. Reason being that > for other rounding modes, csrwi is the better option. > > Probably faster to swap the two above, to avoid stalling on LD. > > If you check more than one length, better to get ff_get_rv_vlenb() into a local > variable.
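The review suggestion about reading the vector length once can be sketched as follows. This is a minimal illustration under stated assumptions: `get_vlenb_stub()` is a hypothetical stand-in for `ff_get_rv_vlenb()` so the example builds anywhere, and the `impl` enum and thresholds are invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for ff_get_rv_vlenb(); it counts its calls and
 * pretends the vector registers are 16 bytes (128 bits) wide. */
static int vlenb_calls;
static size_t get_vlenb_stub(void)
{
    vlenb_calls++;
    return 16;
}

/* Illustrative set of implementations to choose from. */
enum impl { IMPL_C, IMPL_RVV_128, IMPL_RVV_256 };

/* Query the vector length once, keep it in a local, and reuse that local
 * for every length check instead of calling the helper repeatedly. */
static enum impl pick_impl(void)
{
    const size_t vlenb = get_vlenb_stub();

    if (vlenb >= 32)
        return IMPL_RVV_256;
    if (vlenb >= 16)
        return IMPL_RVV_128;
    return IMPL_C;
}
```

With the local variable, two length checks cost a single query.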

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread flow gg
ping. Quoting the message of Thu, 30 May 2024 at 23:27: > From: sunyuechi > > Since len < 64, the registers are sufficient, so it can be > directly unrolled (a4 is even). > > Another benefit of unrolling is that it reduces one load operation > vertically compared to horizontally. > > old

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread flow gg
> Does this not render the type parameter of bilin_load useless (always h)? > (Not a blocker for this patch.) Yes, this was needed in the initial version, but it is no longer required. I just sent a patch. > Not sure if I already asked this but is this really faster than slide1? > Normally we wan

Re: [FFmpeg-devel] [PATCH v4 1/4] lavc/vp9dsp: R-V V mc bilin h v

2024-06-15 Thread flow gg
Just like in VP8, the unroll has been updated. Quoting the message of Sat, 15 Jun 2024 at 19:51: > From: sunyuechi > > C908 X60 > vp9_avg_bilin_4h_8bpp_c: 5.5 4.7 > vp9_avg_bilin_4h_8bpp_rvv_i32: 1.7 1.5 >

Re: [FFmpeg-devel] [PATCH v4 2/4] lavc/vp9dsp: R-V V mc bilin hv

2024-06-15 Thread flow gg
> Copying vectors is rarely justified - mostly only before destructive > instructions such as FMA. It is slightly different from VP8. In VP8, many scalar values are positive, so the related calculations can be easily replaced. However, in this context of VP9, since t2 is a negative number, vwmaccs

Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-06-15 Thread flow gg
> You can directly LLA filters + 16 * 8 * 2 and save one add. Same below. You can > also use .equ to alias the filter addresses, and avoid if's. > That's a lot of address dependencies, which is going to hurt performance. It > might help to just spill more S registers if needed. > This can be done

Re: [FFmpeg-devel] [PATCH v4 2/4] lavc/vp9dsp: R-V V mc bilin hv

2024-06-30 Thread flow gg
Initially, I tried using `vnclip.wi` with reference to h264, - vwadd.wx v16, v16, t4 - vnsra.wi v16, v16, 4 + vnclip.wi v16, v16, 4 but couldn't find the correct way... I think there might be some overflow issues that I didn't understand correctly. How do y
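A scalar C model may help compare the two instruction sequences in the diff above. It assumes that t4 holds the rounding constant 8 and that vxrm is set to round-to-nearest-up; neither is stated in the snippet. The model also includes the unsigned variant `vnclipu.wi` for comparison. Whether signed saturation is what went wrong in the patch is not known from this message; the sketch only shows where the sequences can differ.

```c
#include <stdint.h>

/* Shift right with round-to-nearest-up, as the fixed-point rounding mode
 * "rnu" does. Right-shifting a negative value is assumed to be arithmetic,
 * which holds on mainstream compilers. */
static int32_t shift_rnu(int32_t wide, unsigned shift)
{
    if (shift == 0)
        return wide;
    return (wide + (1 << (shift - 1))) >> shift;
}

/* Model of vwadd.wx + vnsra.wi: add a constant, shift, keep the low 8 bits.
 * No saturation takes place. */
static uint8_t add_then_nsra(int16_t wide, int16_t round, unsigned shift)
{
    return (uint8_t)(((int32_t)wide + round) >> shift);
}

/* Model of vnclip.wi: rounded shift, then saturate to the signed 8-bit range. */
static int8_t nclip_rnu(int16_t wide, unsigned shift)
{
    int32_t v = shift_rnu(wide, shift);
    if (v > INT8_MAX) v = INT8_MAX;
    if (v < INT8_MIN) v = INT8_MIN;
    return (int8_t)v;
}

/* Model of vnclipu.wi: rounded shift, then saturate to the unsigned 8-bit range. */
static uint8_t nclipu_rnu(uint16_t wide, unsigned shift)
{
    int32_t v = shift_rnu(wide, shift);
    return v > UINT8_MAX ? UINT8_MAX : (uint8_t)v;
}
```

For results up to 127 all three agree; above that, the signed clip saturates while the other two still return the pixel value.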

Re: [FFmpeg-devel] [PATCH 2/2] lavc/h264dsp: R-V V 8-bit luma loop filter

2024-07-01 Thread flow gg
The horizontal loop filter in vp8 also has this issue. Rémi Denis-Courmont wrote on Sun, 30 Jun 2024 at 17:04: > T-Head C908 (cycles): > h264_h_loop_filter_luma_8bpp_c: 297.5 > h264_h_loop_filter_luma_8bpp_rvv_i32: 374.7 > h264_v_loop_filter_luma_8bpp_c: 862.7 > h264_v_loop_filter_luma_8bpp_rvv

Re: [FFmpeg-devel] [PATCH v5] lavc/vvc_mc: R-V V avg w_avg

2024-07-01 Thread flow gg
> I am not sure what is_w means or serves here. If you need special cases, this > feels a bit out of place for this macro. It is a special case added to merge the vset of avg and w_avg, how about giving it a default value so that it doesn't affect the use of other functions? > I am not sure if I

Re: [FFmpeg-devel] [PATCH v5] lavc/vvc_mc: R-V V avg w_avg

2024-07-01 Thread flow gg
I reviewed it again, the purpose of is_w is to limit lmul to a maximum of 1/4 of vlen, to prevent vector register shortage, which can also be considered as vset limiting lmul. I renamed it to quarter_len_limit. t0 is changed to t1. Quoting the message of Tue, 2 Jul 2024 at 00:07: > From: sunyuechi > >

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-13 Thread flow gg
-Courmont wrote on Thu, 28 Sep 2023 at 21:33: > > > On 28 September 2023 08:45:44 GMT+03:00, flow gg wrote: > >Okay, I revert the volatile in ff_read_time > > > >How about this version? > > It's still using register stride which is all but guaranteed to be slow on >

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-15 Thread flow gg
Okay, I have updated these issues in the patch. Rémi Denis-Courmont wrote on Mon, 13 Nov 2023 at 23:35: >Hi, > > On Monday, 13 November 2023, 11.43.01 EET, flow gg wrote: > > Sorry for the long delay in responding. > > No problem. Working with T-Head C910 (or C920?) cor

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-15 Thread flow gg
Okay, I have modified them to 64 and added some descriptions. Rémi Denis-Courmont wrote on Wed, 15 Nov 2023 at 23:06: > On Wednesday, 15 November 2023, 10.59.55 EET, flow gg wrote: > > Okay, I have updated these issues in the patch. > > It does not assemble but I can fix it locally

[FFmpeg-devel] [PATCH] checkasm: add test for dcmul_add

2023-11-17 Thread flow gg
From 2785ce57f68dbb2373c951b9432afa73796f7cc1 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 18 Nov 2023 10:58:17 +0800 Subject: [PATCH] checkasm: test for dcmul_add --- tests/checkasm/af_afir.c | 141 +++ 1 file changed, 98 insertions(+), 43 deletions(-

Re: [FFmpeg-devel] [PATCH] checkasm: add test for dcmul_add

2023-11-18 Thread flow gg
dst[i]); +fail(); +break; +} +} +memcpy(odst, src0, (BUF_SIZE) * sizeof(double)); +bench_new(odst, src1, src2, LEN); +} + +report("dcmul_add"); +} + + +void checkasm_check_afir(void) +{ + AudioFIRDSPContext fir =

[FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
c910 float_to_fixed24_c: 208.2 float_to_fixed24_rvv_f32: 71.5 From 69da974fd0febaa74db4dd551b05172caeefb846 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 22 Nov 2023 14:57:29 +0800 Subject: [PATCH] lavc/ac3dsp: R-V V float_to_fixed24 c910 float_to_fixed24_c: 208.2 float_to_f

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
. (We > *do* have Zba and Zbb now though, hence the existing extract_exponents()). > > Also: > - This does not seem according to the C ABI. AFAIK `unsigned` is > sign-extended. > - ALU right before dependent conditional branch should be avoided. > - SHxADD can be used advantage
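The ABI remark quoted above refers to the RV64 calling convention, in which 32-bit scalars, including `unsigned`, are held in 64-bit registers sign-extended from bit 31. A small C model of the register image follows; the function name is made up for illustration.

```c
#include <stdint.h>

/* Model of how a 32-bit value sits in a 64-bit RV64 integer register under
 * the standard calling convention: bits 63..32 replicate bit 31, even when
 * the C type is unsigned. */
static uint64_t rv64_reg_image_u32(uint32_t x)
{
    const uint64_t upper = (x & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0;
    return upper | x;
}
```

Assembly that treats such an argument as a zero-extended 64-bit length would therefore misbehave for values with bit 31 set, unless it extends the value itself.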

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
qemu-riscv64 -cpu rv64,v=true,g=true,c=true,zba=true,vlen=128 checkasm --test=ac3dsp flow gg wrote on Wed, 22 Nov 2023 at 22:30: > > How did you test it? > > I wrote a test, but it was a bit rough, so I want to modify it before > submitting. I've added it to this reply. > > >

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
GMT+02:00, flow gg wrote: > >> How did you test it? > > > >I wrote a test, but it was a bit rough, so I want to modify it before > >submitting. I've added it to this reply. > > > >> This does not seem according to the C ABI. AFAIK `unsigned

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
Wow, thank you for reviewing this. I just wanted to see if the function was working properly. There are so many bugs in the test code ...

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in float_to_fixed24." So I removed the part #if (__riscv_xlen == 64) and restored the patch. From 3e790fdccd780257f464aa8f8a56a37321ddd429 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 22 Nov 2023 14:57:29 +0800 Subject: [PATCH]

[FFmpeg-devel] [PATCH] checkasm/ac3dsp: add float_to_fixed24 test

2023-11-22 Thread flow gg
From 02dd534bd602ba3ec79e51070934949a98f780e2 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 22 Nov 2023 14:57:29 +0800 Subject: [PATCH] checkasm/ac3dsp: add float_to_fixed24 test --- tests/checkasm/Makefile | 1 + tests/checkasm/ac3dsp.c | 71 +++

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
I modified the temporary test and sent it in "[FFmpeg-devel] [PATCH] checkasm/ac3dsp: add float_to_fixed24 test". So the test time results have changed, and I updated them in the patch. c910 float_to_fixed24_c: 2207.2 float_to_fixed24_rvv_f32: 696.2 flow gg wrote on Wed, 22 Nov 2023 at 20:00

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-23 Thread flow gg
Okay, changed. Rémi Denis-Courmont wrote on Fri, 24 Nov 2023 at 01:09: > On Thursday, 23 November 2023, 1.17.03 EET, flow gg wrote: > > Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in > > float_to_fixed24." > > > > So I removed the part #if (__ris

Re: [FFmpeg-devel] [PATCH] checkasm/ac3dsp: add float_to_fixed24 test

2023-11-23 Thread flow gg
> You should probably add the test case to tests/fate/checkasm.mak > This one is not necessary. You can reuse dst or dst2 for the bench() as it's write only. > Changed BUF_SIZE instead of 10. Okay, changed. James Almer wrote on Fri, 24 Nov 2023 at 01:11: > On 11/23/2023 4:08

Re: [FFmpeg-devel] [PATCH] checkasm: add test for dcmul_add

2023-11-26 Thread flow gg
This is a bit confusing for me.. I tried pulling the latest code, and then used `git am checkasm-test-for-dcmul_add.patch` without any patch corruption. Rémi Denis-Courmont wrote on Mon, 27 Nov 2023 at 03:36: > On Sunday, 19 November 2023, 0.28.10 EET, flow gg wrote: >

Re: [FFmpeg-devel] [PATCH] checkasm: add test for dcmul_add

2023-11-27 Thread flow gg
also posed no problems. (I am using the Gmail web page.) Rémi Denis-Courmont wrote on Mon, 27 Nov 2023 at 20:17: > > > On 26 November 2023 22:54:28 GMT+02:00, flow gg wrote: > >This is a bit confusing for me.. I tried pulling the latest code, and then > >used `git am checkasm-

[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-11-28 Thread flow gg
From 85e60d75554894964825f5718d14591294ec4e88 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 28 Nov 2023 14:08:12 +0800 Subject: [PATCH 1/2] checkasm: test for abs_pow34 --- libavcodec/aacenc.c| 24 +++-- libavcodec/aacenc.h| 1 + tests/checkasm/Makefile| 1 +

[FFmpeg-devel] [PATCH 2/2] lavc/aacencdsp: R-V V abs_pow34

2023-11-28 Thread flow gg
c910: abs_pow34_c: 24610.7 abs_pow34_rvv_f32: 6177.7 (need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34" first) From 86577c2d40d29422c4b769c854df99a88c7b3c77 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 28 Nov 2023 20:14:14 +0800 Subject: [PATCH 2/2] lavc/aacencdsp:

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-11-30 Thread flow gg
Okay, I split it and attached the result. Rémi Denis-Courmont wrote on Thu, 30 Nov 2023 at 23:31: > On Tuesday, 28 November 2023, 18.59.38 EET, flow gg wrote: > > > > Since nobody else commented, I shall note that you should probably split > the > underlying lavc changes into a separ

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-12-01 Thread flow gg
Okay, changed and attached. Rémi Denis-Courmont wrote on Sat, 2 Dec 2023 at 02:38: > On Friday, 1 December 2023, 20.35.10 EET, Rémi Denis-Courmont wrote: > > On Friday, 24 November 2023, 0.39.39 EET, flow gg wrote: > > > Okay, changed > > > > src/l

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-12-01 Thread flow gg
I forgot to modify the Makefile; I've made the changes in this reply. flow gg wrote on Sat, 2 Dec 2023 at 03:50: > Okay, changed and attached > > Rémi Denis-Courmont wrote on Sat, 2 Dec 2023 at 02:38: > >> On Friday, 1 December 2023, 20.35.10 EET, Rémi Denis-Courmont

[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-03 Thread flow gg
c910 vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0 vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0 vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2 vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5 vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0 vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 75.7 vc1dsp.vc1_inv_trans_8x8_dc_c:

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
ma - vsetvli zero, zero, e64, m4, ta, ma + vsetivli zero, 8, e8, mf2, ta, ma ``` And ISCAS seems to have no announcement about getting an RVV 1.0 board. I plan to ask about it from time to time. Rémi Denis-Courmont wrote on Mon, 4 Dec 2023 at 01:17: > On Sunday, 3 December

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
I found that in the case of nosplat, an additional vset can be removed, and the time is basically the same, so I updated the patch. Rémi Denis-Courmont wrote on Mon, 4 Dec 2023 at 23:15: > On Monday, 4 December 2023, 10.48.56 EET, flow gg wrote: > > > Probably missing VLENB checks. >

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-04 Thread flow gg
Because there was a conflict, the patch was updated in the reply. flow gg wrote on Fri, 1 Dec 2023 at 04:25: > Okay, I splited and attached > > > > Rémi Denis-Courmont wrote on Thu, 30 Nov 2023 at 23:31: > >> On Tuesday, 28 November 2023, 18.59.38 EET, flow gg wrote: >> > >

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
Okay, after using zext, two vset can be deleted, which is better than splat. I have updated the patch in this reply. Rémi Denis-Courmont wrote on Mon, 4 Dec 2023 at 23:15: > On Monday, 4 December 2023, 10.48.56 EET, flow gg wrote: > > > Probably missing VLENB checks. > > > > Ch

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
> This block can be folded into the next. You don't need to check VLENB twice. Changed. > Instruction scheduling could be better, especially on in-order CPUs. I put the vload at the front, and then proceeded with the t2 operation, but I'm not sure... > You don't need to reset the AVL here, just

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
CSRxI immediate Changed. Rémi Denis-Courmont wrote on Wed, 6 Dec 2023 at 04:11: > On Tuesday, 5 December 2023, 21.25.12 EET, flow gg wrote: > > > This block can be folded into the next. You don't need to check VLENB > > > > twice. > > > > Changed. > > > >

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
> FWIW CanMV-K230 boards are on sale for under 500 RMB. I just made a payment ~ (I saw you mention in IRC that you're going to write about K230+Debian. Looking forward to it) Rémi Denis-Courmont wrote on Wed, 6 Dec 2023 at 04:11: > On Tuesday, 5 December 2023, 21.25.12 EET, flow gg wrote:

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-07 Thread flow gg
023, 16.40.08 EET, flow gg wrote: > > c910 > > vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0 > > vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0 > > vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2 > > vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5 > >

Re: [FFmpeg-devel] [PATCH 2/2] lavc/aacencdsp: R-V V abs_pow34

2023-12-09 Thread flow gg
Updated the patch to resolve conflicts, updated m4 to m8, using c908's benchmark. flow gg wrote on Wed, 29 Nov 2023 at 01:00: > c910: > abs_pow34_c: 24610.7 > abs_pow34_rvv_f32: 6177.7 > > (need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-09 Thread flow gg
em-rss:0kB If I remove the line 1429 with FF_CODEC_ENCODE_CB(aac_encode_frame), there is no error on k230, but I am unsure of the reason. flow gg wrote on Tue, 5 Dec 2023 at 05:46: > Because there was a conflict, the patch was updated in the reply > > flow gg wrote on Fri, 1 Dec 2023 at 04:25: >

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-09 Thread flow gg
To be clear, I mean removing libavcodec/aacenc.c:1429 FF_CODEC_ENCODE_CB(aac_encode_frame)

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_epel h

2024-03-26 Thread flow gg
Okay, changed to use const, updated at this GitHub link ( https://github.com/hleft/FFmpeg/tree/vp8vp9) Rémi Denis-Courmont wrote on Wed, 27 Mar 2024 at 02:38: > On Friday, 22 March 2024, 8.01.00 EET, flow gg wrote: > > (This should be used after applying these 4 patches) > > >

Re: [FFmpeg-devel] [PATCH 1/7] lavc/vp9dsp: R-V mc copy_avg

2024-03-26 Thread flow gg
Hi, here's the GitHub link (https://github.com/hleft/FFmpeg/tree/vp8vp9) Rémi Denis-Courmont wrote on Wed, 27 Mar 2024 at 02:30: > Hi, > > On Friday, 22 March 2024, 8.12.41 EET, flow gg wrote: > > It might be a bit inconvenient to find the patches related to vp8, vp9

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_epel h

2024-03-27 Thread flow gg
Alright, updated it in this reply. Rémi Denis-Courmont wrote on Wed, 27 Mar 2024 at 16:18: > Hi, > > On 27 March 2024 04:37:02 GMT+02:00, flow gg wrote: > >Okay, changed to use const, updated at this GitHub link ( > >https://github.com/hleft/FFmpeg/tree/vp8vp9) > > OK, th

Re: [FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v

2024-03-27 Thread flow gg
s just that vp9 doesn't have enough) Rémi Denis-Courmont wrote on Wed, 27 Mar 2024 at 23:36: > On Friday, 22 March 2024, 8.01.21 EET, flow gg wrote: > > > > IMO, you could just as well share the code and avoid most if's. Not like > one > additional `li a3, 1` per

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-27 Thread flow gg
Wed, 27 Mar at 23:41: > On Friday, 22 March 2024, 8.02.08 EET, flow gg wrote: > > Using macros to shorten function definitions, updated in this response > > Did you try to share the common code after getdc and see how slower it is? > If > an extra static branch ha

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-04-06 Thread flow gg
Okay, updated it in the reply and on GitHub ( https://github.com/hleft/FFmpeg/tree/vp8vp9) Rémi Denis-Courmont wrote on Thu, 4 Apr 2024 at 04:22: > On Thursday, 28 March 2024, 4.44.33 EEST, flow gg wrote: > > I don't quite understand, I think here 8x8 because zve64x is not suitable >

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-04-06 Thread flow gg
ping. flow gg wrote on Fri, 8 Mar 2024 at 17:46: > Alright, using m8, but for now don't add code to address dependencies in > loops that have a minor impact. Updated in the reply > > Rémi Denis-Courmont wrote on Fri, 8 Mar 2024 at 17:08: > >> >> >> On 8 March 2024 02:45:46 GMT+02:00

[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V loop_filter_simple

2024-04-20 Thread flow gg
From 2f516e0236bd84d78ce6fd7e55c4b1a3c9d99baa Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 20 Apr 2024 23:32:10 +0800 Subject: [PATCH 1/3] lavc/vp8dsp: R-V V loop_filter_simple C908: vp8_loop_filter_simple_h_c: 416.0 vp8_loop_filter_simple_h_rvv_i32: 187.5 vp8_loop_filter_simple_v_c: 429.

[FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V loop_filter_inner

2024-04-20 Thread flow gg
From c033ab8d30135dc02b09b1747c0761baefdcbb4a Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 20 Apr 2024 23:13:07 +0800 Subject: [PATCH 2/3] lavc/vp8dsp: R-V V loop_filter_inner C908: vp8_loop_filter8uv_inner_v_c: 738.2 vp8_loop_filter8uv_inner_v_rvv_i32: 455.2 vp8_loop_filter16y_inner_h_c:

[FFmpeg-devel] [PATCH 3/3] lavc/vp8dsp: R-V V loop_filter

2024-04-20 Thread flow gg
From cff79c9500b94f4c0abdd9cd68c91cc736366c78 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 20 Apr 2024 23:26:58 +0800 Subject: [PATCH 3/3] lavc/vp8dsp: R-V V loop_filter C908: vp8_loop_filter8uv_v_c: 745.5 vp8_loop_filter8uv_v_rvv_i32: 467.2 vp8_loop_filter16y_h_c: 674.2 vp8_loop_filter16

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V loop_filter_simple

2024-04-20 Thread flow gg
GitHub link: https://github.com/hleft/FFmpeg/tree/vp8vp9 flow gg wrote on Sat, 20 Apr 2024 at 23:55: > >

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-04-29 Thread flow gg
Happy to see you back :) Rémi Denis-Courmont wrote on Mon, 29 Apr 2024 at 02:06: > On Sunday, 7 April 2024, 8.38.54 EEST, flow gg wrote: > > ping > > I have been away for a while, and catching up takes time, sorry. > > -- > レミ・デニ-クールモン

[FFmpeg-devel] [PATCH 1/2] checkasm/blockdsp: add fill_block test

2024-04-29 Thread flow gg
From 0c196a37cb4036d8c618c06c02a011b910cc56ce Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 29 Apr 2024 14:18:23 +0800 Subject: [PATCH 1/2] checkasm/blockdsp: add fill_block test --- tests/checkasm/blockdsp.c | 32 1 file changed, 32 insertions(+) diff --

[FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V fill_block

2024-04-29 Thread flow gg
From 4315f4e4774e3006d7cc55b6d235cb80e0173cf9 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 6 Mar 2024 12:46:03 +0800 Subject: [PATCH 2/2] lavc/blockdsp: R-V V fill_block C908: blockdsp.fill_block_tab[0]_c: 550.0 blockdsp.fill_block_tab[0]_rvv_i64: 48.2 blockdsp.fill_block_tab[1]_c: 148.7

Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert

2024-04-29 Thread flow gg
Updated it in the reply and at https://github.com/hleft/FFmpeg/tree/vp8vp9 Rémi Denis-Courmont wrote on Tue, 30 Apr 2024 at 01:57: > On Friday, 22 March 2024, 8.02.38 EEST, flow gg wrote: > > Because the previous patch was updated, so it was updated in this > response > > Seem

Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V fill_block

2024-04-29 Thread flow gg
29 April 2024, 10.09.41 EEST, flow gg wrote: > > > > Are you sure that this works with all vector lengths? > The block8 code looks odd. > > -- > レミ・デニ-クールモン > http://www.remlab.net/

Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V fill_block

2024-04-29 Thread flow gg
Since there is no 8x16, I changed m8 to m4, and updated it in the reply. flow gg wrote on Tue, 30 Apr 2024 at 08:26: > Hi, I initially used a loop, but according to libavcodec/blockdsp.h, > > the maximum is 8x16 = 128 bytes, so using ff_get_rv_vlenb() >= 16 and m8 > does not
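The sizing argument in the quoted message is plain arithmetic: one vector register group holds VLENB × LMUL bytes, so 16-byte registers with m8 hold 128 bytes. A minimal C sketch of that check follows; the helper names are hypothetical, and the 64-byte case is only an illustration of what m4 holds at the same register width.

```c
#include <stddef.h>

/* Bytes held by one vector register group: register width in bytes
 * (VLENB) multiplied by the integer group multiplier (LMUL). */
static size_t group_bytes(size_t vlenb, unsigned lmul)
{
    return vlenb * lmul;
}

/* Nonzero when a block of the given size fits in a single register group,
 * i.e. when the group can hold it without a loop over several groups. */
static int fits_one_group(size_t block_bytes, size_t vlenb, unsigned lmul)
{
    return block_bytes <= group_bytes(vlenb, lmul);
}
```

For example, `fits_one_group(8 * 16, 16, 8)` holds while `fits_one_group(8 * 16, 16, 4)` does not, which matches the message's reasoning for pairing VLENB >= 16 with m8 for a 128-byte block.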

Re: [FFmpeg-devel] [PATCH 1/2] checkasm/blockdsp: add fill_block test

2024-04-29 Thread flow gg
Since there is no 8x16, 8x16 is no longer tested; updated in this reply. flow gg wrote on Mon, 29 Apr 2024 at 15:09: > > From fc7c28cb78e0c90880f31c0b8d6f2fc16d0fe581 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 29 Apr 2024 14:18:23 +0800 Subject: [PATCH 1/2] checkasm/blockdsp: add fill_bloc

Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V fill_block

2024-04-30 Thread flow gg
Since the number of stores is controlled by a3 and not by zero, it doesn't have to be exactly 16 bytes? Rémi Denis-Courmont wrote on Tue, 30 Apr 2024 at 14:40: > > > On 30 April 2024 03:26:25 GMT+03:00, flow gg wrote: > >Hi, I initially used a loop, but according to libavcodec

[FFmpeg-devel] [PATCH 1/2] checkasm/rv40dsp: add chroma_mc test

2024-04-30 Thread flow gg
From 07c0b8a26b76e31c46ecabddb251f317c48c73a3 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 30 Apr 2024 12:43:57 +0800 Subject: [PATCH 1/2] checkasm/rv40dsp: add chroma_mc test This is similar to h264. --- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm

[FFmpeg-devel] [PATCH 2/2] lavc/rv40dsp: R-V V chroma_mc

2024-04-30 Thread flow gg
From 3e66b2bbe257cc91a4c2169362163e92aba6760b Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 30 Apr 2024 18:24:00 +0800 Subject: [PATCH 2/2] lavc/rv40dsp: R-V V chroma_mc This is similar to h264, but here we use manual_avg instead of vaaddu because rv40's OP differs from h264. If we use vaa

Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert

2024-05-02 Thread flow gg
Sorry, this is because a 'bpp == 8' was missed. It has been fixed at this link. Rémi Denis-Courmont wrote on Thu, May 2, 2024 at 22:11: > On Tuesday, 30 April 2024, 2.36.22 EEST, flow gg wrote: > > updated it in the reply and https://github.com/hleft/FFmpeg/tree/vp8vp9 > > V

Re: [FFmpeg-devel] [RFC] 5 year plan & Inovation

2024-05-03 Thread flow gg
I saw about comparing emails and gitlab/hub .., I did not comprehensively understand their advantages and disadvantages, but I want to say that I support it to change to gitlab/hub Simple reason: If you need to use git-send-email, I may not be able to submit any code If you do not need to use git

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels

2024-05-04 Thread flow gg
Hi, it's me. I accidentally repeated it but it seems to be correct. Written on Sat, May 4, 2024 at 18:01: > From: sunyuechi > > vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 869.7 > vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 148.7 > vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 220.5 > vc1dsp.avg_vc1_mspel_pixels_ta

Re: [FFmpeg-devel] [PATCH 01/10] lavc/vp8dsp: R-V V put_vp8_pixels

2024-05-04 Thread flow gg
I've reorganized it, and the GitHub link is: https://github.com/hleft/FFmpeg/tree/vp8 Written on Sat, May 4, 2024 at 22:49: > From: sunyuechi > > C908: > vp8_put_pixels4_c: 87.5 > vp8_put_pixels4_rvv_i32: 42.7 > vp8_put_pixels8_c: 284.5 > vp8_put_pixels8_rvv_i32: 77.7 > vp8_put_pixels16_c: 1087.7 > vp8_put

Re: [FFmpeg-devel] [PATCH 01/10] lavc/vp9dsp: R-V V ipred vert

2024-05-04 Thread flow gg
The GitHub link: https://github.com/hleft/FFmpeg/tree/vp9 Written on Sat, May 4, 2024 at 23:03: > From: sunyuechi > > C908: > vp9_vert_8x8_8bpp_c: 22.0 > vp9_vert_8x8_8bpp_rvv_i64: 18.5 > vp9_vert_16x16_8bpp_c: 71.2 > vp9_vert_16x16_8bpp_rvv_i32: 50.7 > vp9_vert_32x32_8bpp_c: 300.2 > vp9_vert_32x32_8bpp_rvv_i3

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels

2024-05-05 Thread flow gg
> Is it not faster to compute the address ahead of time, e.g.: > Ditto below and in other patches. Yes, update here and I will check other patches > Copying 64-bit quantities should not need RVV at all. Maybe the C version needs to be improved instead, but if that is not possible, then an RVI ver
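The review point about 64-bit copies can be illustrated with a short scalar sketch. This is not the code from the patch or from FFmpeg; `copy_pixels8` and its parameters are hypothetical names, and it only assumes that an 8-pixel-wide row of 8-bit samples fits in one 64-bit integer register, so a plain load/store per row is enough and no vector unit is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not FFmpeg's implementation): copy an 8-pixel-wide
 * block of 8-bit samples, one row at a time. Each row is exactly 64 bits,
 * so a single integer load and store per row does the job. memcpy with a
 * constant size keeps the access alignment-safe; compilers typically
 * lower it to one load and one store where unaligned access is allowed. */
static void copy_pixels8(uint8_t *dst, const uint8_t *src,
                         ptrdiff_t stride, int rows)
{
    assert(rows >= 0);
    for (int y = 0; y < rows; y++) {
        uint64_t row;
        memcpy(&row, src, sizeof(row)); /* load 8 pixels */
        memcpy(dst, &row, sizeof(row)); /* store 8 pixels */
        src += stride;
        dst += stride;
    }
}
```

Whether a C version like this or a hand-written RVI version is faster on a given core would still have to be measured with checkasm.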

Re: [FFmpeg-devel] [PATCH 01/10] lavc/vp8dsp: R-V put_vp8_pixels

2024-05-05 Thread flow gg
Made these changes according to the previous review: moved func into macro, added macro vset to reduce if else, used rvi, supplemented __riscv_xlen Written on Mon, May 6, 2024 at 00:45: > From: sunyuechi > > C908: > vp8_put_pixels4_c: 78.0 > vp8_put_pixels4_rvi: 33.7 > vp8_put_pixels8_c: 278.0 > vp8_put_pixels

Re: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp8dsp: R-V V put_bilin_h v

2024-05-05 Thread flow gg
> Doesn't this effectively discard the last element, t5? > Can't we skip the slide and just load the vector at a2+1? Also then, we can > keep VL=len and halve the multiplier. Yes, this is better, I remember that using slide1down was better in the initial version testing, but now it has changed.. I
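The suggestion to load a second vector at a2+1 instead of sliding the first one can be seen in a scalar model of the horizontal bilinear filter. The sketch below is an illustration only: `put_bilin_h` and its parameters are hypothetical names, and the weights `8 - mx` and `mx` with rounding `+ 4` and shift `>> 3` are my recollection of the VP8 bilinear filter in the C reference, stated here as an assumption rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical, for illustration) of a VP8-style horizontal
 * bilinear filter. The right-hand neighbour of every pixel is simply the
 * same row read one byte later, which is why a vector version can issue a
 * second load at src + 1 instead of a slide1down on the first vector. */
static void put_bilin_h(uint8_t *dst, ptrdiff_t dstride,
                        const uint8_t *src, ptrdiff_t sstride,
                        int w, int h, int mx)
{
    const int a = 8 - mx, b = mx; /* assumed weights, sum is 8 */

    assert(mx >= 0 && mx < 8);
    for (int y = 0; y < h; y++) {
        const uint8_t *left  = src;     /* models the load at a2     */
        const uint8_t *right = src + 1; /* models the load at a2 + 1 */

        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((a * left[x] + b * right[x] + 4) >> 3);
        dst += dstride;
        src += sstride;
    }
}
```

With two loads, both inputs have exactly `w` valid elements, so the vector length can stay at the block width rather than being extended by one element to feed the slide.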

Re: [FFmpeg-devel] [PATCH v3 6/9] lavc/vp8dsp: R-V V put_epel hv

2024-05-06 Thread flow gg
> IMO, passing a complete register name, if you really need to vary it, would be simpler and more flexible than an ABI register type prefix. If the full register name is passed here, some require four parameters, some require six parameters, and there is often repetition. I feel it's easy to get c

Re: [FFmpeg-devel] [PATCH v2 1/9] lavc/vp9dsp: R-V ipred vert

2024-05-07 Thread flow gg
Fixed issues similar to vp8. Written on Tue, May 7, 2024 at 15:36: > From: sunyuechi > > C908: > vp9_vert_8x8_8bpp_c: 22.0 > vp9_vert_8x8_8bpp_rvi: 15.7 > vp9_vert_16x16_8bpp_c: 71.2 > vp9_vert_16x16_8bpp_rvi: 39.0 > vp9_vert_32x32_8bpp_c: 300.2 > vp9_vert_32x32_8bpp_rvi: 135.2 > --- > libavcodec/riscv/Makefil
