Re: [FFmpeg-devel] [PATCH] Fix the tail handling in R-V V sad

2025-01-18 Thread flow gg
ping On Mon, Dec 23, 2024 at 23:02, wrote: > From: sunyuechi > > --- > libavcodec/riscv/h26x/asm.S| 36 +- > libavcodec/riscv/vvc/sad_rvv.S | 2 +- > 2 files changed, 19 insertions(+), 19 deletions(-) > > diff --git a/libavcodec/riscv/h26x/asm.S b/libavcodec/riscv/h26x/a

Re: [FFmpeg-devel] [PATCH] Fix the tail handling in R-V V sad

2025-01-09 Thread flow gg
It seems that v0 and v24 need to be set to 0, and they have already been set. On Wed, Jan 8, 2025 at 02:23, Rémi Denis-Courmont wrote: > On Monday, 23 December 2024, 17:01:32 UTC+2, uk7b-at- > foxmail@ffmpeg.org wrote: > > From: sunyuechi > > > > --- > > libavcodec/riscv/h26x/asm.S| 36 +

Re: [FFmpeg-devel] [PATCH] Fix the tail handling in R-V V sad

2024-12-31 Thread flow gg
ping On Mon, Dec 23, 2024 at 23:02, wrote: > From: sunyuechi > > --- > libavcodec/riscv/h26x/asm.S| 36 +- > libavcodec/riscv/vvc/sad_rvv.S | 2 +- > 2 files changed, 19 insertions(+), 19 deletions(-) > > diff --git a/libavcodec/riscv/h26x/asm.S b/libavcodec/riscv/h26x/a

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: reduce sequential dependency in R-V V sad

2024-12-30 Thread flow gg
On Tue, Dec 31, 2024 at 02:26, wrote: > On Tuesday, 24 December 2024, 15:30:00 EET, Nuo Mi wrote: > > On Mon, Dec 23, 2024 at 11:18 PM flow gg wrote: > > > Hi, It looks like you submitted your review comments not long after the > > > patch was merged. > > > > >

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: reduce sequential dependency in R-V V sad

2024-12-23 Thread flow gg
Hi, It looks like you submitted your review comments not long after the patch was merged. Previously, regarding the VVC avg patch, you mentioned "LGTM for the RISC-V side. No clue about the VVC side", so I contacted Nuomi in the hope that he could help merge the patch that had been pending for a w

Re: [FFmpeg-devel] [PATCH] Fix the tail handling in R-V V sad

2024-12-23 Thread flow gg
> That makes zero sense. The logical multiplier does not accommodate larger > vector lengths than 256 bits as things stand, and in the extreme, you can > always have vector lengths to large that even the smallest valid multiplier is > "too" large. Yes, I didn't consider vlen > 256. What do you thi

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: reduce sequential dependency in R-V V sad

2024-12-21 Thread flow gg
> Don't clobber v8 here. > Use vsub.vv here to avoid the sequential dependency. Updated. On Sat, Dec 21, 2024 at 20:22, wrote: > From: sunyuechi > > --- > libavcodec/riscv/vvc/vvc_sad_rvv.S | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/libavcodec/riscv/vvc/vvc_sad_rvv.

Re: [FFmpeg-devel] [PATCH v2 1/2] avcodec/vvcdec: remove vvc prefix for x86 and riscv

2024-12-21 Thread flow gg
LGTM. On Sat, Dec 21, 2024 at 18:19, Nuo Mi wrote: > --- > libavcodec/riscv/vvc/Makefile | 6 +++--- > libavcodec/riscv/vvc/{vvcdsp_init.c => dsp_init.c} | 0 > libavcodec/riscv/vvc/{vvc_mc_rvv.S => mc_rvv.S}| 0 > libavcodec/riscv/vvc/{vvc_sad_rvv.S => sad_rvv.S} | 0 > libavco

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/vvcdec: remove vvc prefix for x86 and riscv

2024-12-20 Thread flow gg
Hi, other RISC-V assembly file names usually include the extensions being used, such as rvv, rvb, etc. How about naming them mc_rvv.S and sad_rvv.S? On Tue, Dec 17, 2024 at 11:59, Nuo Mi wrote: > --- > libavcodec/riscv/vvc/Makefile | 6 +++--- > libavcodec/riscv/vvc/{vvcdsp_init.c => ds

Re: [FFmpeg-devel] [PATCH v2_2 6/6] lavc/vvc_mc R-V V sad

2024-12-17 Thread flow gg
> Don't clobber v8 here. > Use vsub.vv here to avoid the sequential dependency. Thanks, I will update later > Are you sure this does not require tail-undisturbed mode? I think you're > setting tail-agnostic mode up. I’m not sure if I understood correctly. My understanding is that tail-undisturbe

Re: [FFmpeg-devel] [PATCH v2_2 1/6] Update R-V V vvc_mc vset to support more lengths

2024-12-15 Thread flow gg
Resolved the conflict (because #elif ARCH_WASM was newly added in master). On Sun, Dec 15, 2024 at 23:56, wrote: > From: sunyuechi > > --- > libavcodec/riscv/vvc/vvc_mc_rvv.S | 46 +++ > 1 file changed, 23 insertions(+), 23 deletions(-) > > diff --git a/libavcodec/riscv/vvc/vvc_m

Re: [FFmpeg-devel] [PATCH v2 1/6] Update R-V V vvc_mc vset to support more lengths

2024-12-10 Thread flow gg
Thank you, this approach can indeed address similar if else scenarios. vsetvlstatic \w, \vlen, e8, mf8, mf4, mf2, m1, m2, m4 vsetvlstatic \w, \vlen, e16, mf4, mf2, m1, m2, m4, m8 vsetvlstatic \w, \vlen, e32, mf2, m1, m2, m4, m8, m8 I plan to submit it after this patch set gets merged. Nuo Mi 于2
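
The three vsetvlstatic lines above list one length multiplier per block width. As a rough illustration of where such a list could come from, here is a small C sketch that picks the smallest LMUL whose register group holds one row of pixels. `pick_lmul_log2` is a hypothetical helper and not part of FFmpeg, and reading the lists as widths 4 to 128 at VLEN = 256 is an assumption on my part, not something the thread states.

```c
/* Hypothetical helper, not FFmpeg code.
 * Returns log2(LMUL) for one row of `width` elements of `sew` bits on a
 * machine with `vlen`-bit vector registers:
 *   -3 = mf8, -2 = mf4, -1 = mf2, 0 = m1, 1 = m2, 2 = m4, 3 = m8. */
static int pick_lmul_log2(unsigned width, unsigned sew, unsigned vlen)
{
    unsigned bits = width * sew;   /* bits needed to hold one row */
    int log2_lmul = -3;            /* start at the smallest multiplier (mf8) */

    /* Grow LMUL until VLEN * LMUL covers the row, stopping at m8. */
    while (log2_lmul < 3) {
        unsigned capacity = log2_lmul < 0 ? vlen >> -log2_lmul
                                          : vlen << log2_lmul;
        if (capacity >= bits)
            break;
        log2_lmul++;
    }
    return log2_lmul;
}
```

Under those assumptions, `pick_lmul_log2(4, 8, 256)` gives -3 (mf8) and `pick_lmul_log2(128, 32, 256)` is capped at 3 (m8), which lines up with the first and last entries of the lists above. The sketch does not check the architectural limits on fractional LMUL for wide elements.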

Re: [FFmpeg-devel] [PATCH v2 1/6] Update R-V V vvc_mc vset to support more lengths

2024-12-08 Thread flow gg
ping On Sun, Dec 1, 2024 at 13:11, wrote: > From: sunyuechi > > --- > libavcodec/riscv/vvc/vvc_mc_rvv.S | 46 +++ > 1 file changed, 23 insertions(+), 23 deletions(-) > > diff --git a/libavcodec/riscv/vvc/vvc_mc_rvv.S > b/libavcodec/riscv/vvc/vvc_mc_rvv.S > index 45f4750f82..18532

Re: [FFmpeg-devel] [PATCH v2 1/2] checkasm/rv40dsp: cover more cases

2024-12-05 Thread flow gg
Thank you for your detailed explanation! :) On Thu, Dec 5, 2024 at 20:38, Ronald S. Bultje wrote: > Hi, > > Christophe asked me to chime in. > > On Wed, Dec 4, 2024 at 4:14 AM wrote: > > > --- a/tests/checkasm/rv40dsp.c > > +++ b/tests/checkasm/rv40dsp.c > > @@ -27,7 +27,7 @@ > > #define randomize_buffers()

Re: [FFmpeg-devel] [PATCH v2 1/2] checkasm/rv40dsp: cover more cases

2024-12-04 Thread flow gg
Hi, the original issue I encountered was that FATE failed on RISC-V because the assembly code didn't handle `rv40_bias` correctly. I submitted a new patch: "checkasm/rv40dsp: cover more cases for rv40_bias" to test this situation. The value of `src` was indeed just copied from `h264_chroma_mc`. M

Re: [FFmpeg-devel] [PATCH 1/2] checkasm/rv40dsp: cover more cases for rv40_bias

2024-12-04 Thread flow gg
Update here because there's no need to change it to 0xFF. On Thu, Dec 5, 2024 at 12:31, wrote: > From: sunyuechi > > --- > tests/checkasm/rv40dsp.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tests/checkasm/rv40dsp.c b/tests/checkasm/rv40dsp.c > index a1a873d430..0600b07d09

Re: [FFmpeg-devel] [PATCH 2/2] lavc/rv40dsp: fix RISC-V chroma_mc

2024-12-04 Thread flow gg
The message was sent twice; please ignore this one. On Thu, Dec 5, 2024 at 12:29, wrote: > From: sunyuechi > > --- > libavcodec/riscv/rv40dsp_rvv.S | 116 ++--- > 1 file changed, 78 insertions(+), 38 deletions(-) > > diff --git a/libavcodec/riscv/rv40dsp_rvv.S > b/libavcodec/risc

Re: [FFmpeg-devel] [PATCH 2/2] lavc/rv40dsp: fix RISC-V chroma_mc

2024-11-30 Thread flow gg
Hi, why is there an issue with the ABI? I previously just thought that s0 shouldn't be used here. On Sun, Dec 1, 2024 at 00:19, Rémi Denis-Courmont wrote: > On Wednesday, 20 November 2024, 3:26:52 EET, u...@foxmail.com > wrote: > > From: sunyuechi > > > > --- > > libavcodec/riscv/rv40dsp_rvv.S | 11

Re: [FFmpeg-devel] [PATCH 1/4] lavc/riscv: Move VVC macro to h26x

2024-11-26 Thread flow gg
's necessary to organize them together and resubmit, please let me know clearly. On Wed, Nov 27, 2024 at 01:15, Rémi Denis-Courmont wrote: > Hi, > > On Thursday, 21 November 2024, 12:43:38 EET, flow gg wrote: > > This patch comes after: > > [PATCH 1/2] Update R-V V vvc_mc vs

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-11-26 Thread flow gg
solved. On Wed, Nov 27, 2024 at 01:12, Rémi Denis-Courmont wrote: > On Tuesday, 26 November 2024, 5:02:57 EET, flow gg wrote: > > ping > > Unless I am mistaken this set (as a whole) had unaddressed review comments. > > -- > Rémi Denis-Co

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-11-25 Thread flow gg
ping On Sat, Oct 12, 2024 at 17:28, wrote: > From: sunyuechi > > k230 banana_f3 > dmvr_8_12x20_c: 619.3 ( 1.00x) 624.1 ( 1.00x) > dmvr_8_12x20_rvv_i32: 128.6 ( 4.82x) 103.4 ( 6.04x) > dmvr_8_20x12_c:

Re: [FFmpeg-devel] [PATCH 4/4] lavc/vvc_mc R-V V sad

2024-11-25 Thread flow gg
Updated them. On Mon, Nov 18, 2024 at 04:23, Rémi Denis-Courmont wrote: > On Sunday, 17 November 2024, 15:16:23 EET, u...@foxmail.com > wrote: > > From: sunyuechi > > > > k230 banana_f3 > > sad_8x16_c: 385.9 ( 1.00x) 403.1 ( 1.00x) > > s

Re: [FFmpeg-devel] [PATCH 1/4] lavc/riscv: Move VVC macro to h26x

2024-11-21 Thread flow gg
This patch comes after: [PATCH 1/2] Update R-V V vvc_mc vset to support more lengths [PATCH 2/2] lavc/vvc_mc: R-V V dmvr. On Tue, Nov 19, 2024 at 04:10, Rémi Denis-Courmont wrote: > On Sunday, 17 November 2024, 15:17:49 EET, flow gg wrote: > > > Generally speaking, I think that moving

Re: [FFmpeg-devel] [PATCH 2/2] lavc/rv40dsp: fix RISC-V chroma_mc

2024-11-19 Thread flow gg
Updated. On Wed, Nov 20, 2024 at 00:19, Rémi Denis-Courmont wrote: > On Tuesday, 19 November 2024, 11:11:40 EET, u...@foxmail.com wrote: > > From: sunyuechi > > That patch does not conform to the ABI. > > -- > レミ・デニ-クールモン > http://www.remlab.net/ > ___ > ffm

Re: [FFmpeg-devel] [PATCH 2/2] lavc/rv40dsp: fix RISC-V chroma_mc

2024-11-19 Thread flow gg
Use this instead. On Tue, Nov 19, 2024 at 17:12, wrote: > From: sunyuechi > > --- > libavcodec/riscv/rv40dsp_rvv.S | 113 ++--- > 1 file changed, 75 insertions(+), 38 deletions(-) > > diff --git a/libavcodec/riscv/rv40dsp_rvv.S > b/libavcodec/riscv/rv40dsp_rvv.S > index ca431eb8ab

Re: [FFmpeg-devel] [PATCH 2/2] lavc/rv40dsp: fix RISC-V chroma_mc

2024-11-19 Thread flow gg
Please ignore this. On Tue, Nov 19, 2024 at 17:08, wrote: > From: sunyuechi > > --- > libavcodec/riscv/rv40dsp_rvv.S | 111 ++--- > 1 file changed, 73 insertions(+), 38 deletions(-) > > diff --git a/libavcodec/riscv/rv40dsp_rvv.S > b/libavcodec/riscv/rv40dsp_rvv.S > index ca431eb8

Re: [FFmpeg-devel] [PATCH 1/4] lavc/riscv: Move VVC macro to h26x

2024-11-18 Thread flow gg
> Generally speaking, I think that moving code should be done in dedicated > patches. > You can branch here. The rest of the byte code is the same in all but one > cases. Updated this. On Sun, Nov 17, 2024 at 21:17, wrote: > From: sunyuechi > > --- > libavcodec/riscv/h26x/asm.S | 127 +++

Re: [FFmpeg-devel] [PATCH] Revert "lavc/rv40dsp: R-V V chroma_mc"

2024-11-18 Thread flow gg
It seems like something overflowed, I'll take a look at it... On Mon, Nov 18, 2024 at 00:51, Rémi Denis-Courmont wrote: > This reverts commit 5bc3b7f51308b8027e5468ef60d8336a960193e2. > > put_chroma_mc4, put_chroma_mc8 and avg_chroma_mc8 are confirmed to > break `fate-rv40`. It is probably just luck that avg_

Re: [FFmpeg-devel] [PATCH 3/5] lavc/vvc_mc: R-V V put_uni_pixels

2024-11-10 Thread flow gg
> Is this going to be reused anywhere? it seems the macro is only used once atm. The next patch will use ([PATCH 4/5] lavc/hevc: R-V V pel_uni(pow2)) > Also is there a reason to use RVV here instead of just unaligned RVI? Yes, RVI is enough; I deleted it and resent it. Rémi Denis-Courmont 于202

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-11-08 Thread flow gg
ping On Sat, Oct 12, 2024 at 17:28, wrote: > From: sunyuechi > > k230 banana_f3 > dmvr_8_12x20_c: 619.3 ( 1.00x) 624.1 ( 1.00x) > dmvr_8_12x20_rvv_i32: 128.6 ( 4.82x) 103.4 ( 6.04x) > dmvr_8_20x12_c:

Re: [FFmpeg-devel] [PATCH 1/5] lavc/vvc_mc: R-V V put_pixels

2024-10-28 Thread flow gg
> Even without Zvbb's widening shift, widening multiplication is probably faster here. Updated, it has indeed gotten faster. On Tue, Oct 29, 2024 at 00:44, wrote: > From: sunyuechi > > k230 > banana_f3 > put_chroma_pixels_8_4x4_c:

Re: [FFmpeg-devel] [PATCH 3/5] lavc/vvc_mc: R-V V put_uni_pixels

2024-10-28 Thread flow gg
> Up to 64-bit rows, you can use strided loads and stores here. Due to the SRC_OFFSET in testing, only e8 and e16 can be loaded; e32 cannot be loaded (Bus error). Since the width ranges from 4 to 128, it seems that strided loads may not be possible. > Though for memory copying, unaligned scalar a

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-10-12 Thread flow gg
Fixed the assembly failure by changing `dmvr_hv\vlen\w:` to `func dmvr_hv\vlen\w, zve32x, zbb, zba`. On Sat, Oct 12, 2024 at 14:33, Rémi Denis-Courmont wrote: > Hi, > > This fails to assemble here (binutils 2.43.1). > > -- > 雷米‧德尼-库尔蒙 > http://www.remlab.net/ > ___ > ffmpeg-devel mailing

Re: [FFmpeg-devel] [PATCH 4/5] lavc/hevc: R-V V pel_uni(pow2)

2024-10-11 Thread flow gg
Fix init. On Sat, Oct 12, 2024 at 14:25, wrote: > From: sunyuechi > > k230 > banana_f3 > put_hevc_pel_uni_pixels4_8_c: 126.3 ( 1.00x) > 90.5 ( 1.00x) > put_hevc_pel_uni_pixels4_8_rvv_i32: 24.6 ( 5.14x) > 17.5

Re: [FFmpeg-devel] [PATCH] riscv/vvc: fix UNDEF whilst initialising DSP

2024-10-11 Thread flow gg
LGTM. On Sat, Oct 12, 2024 at 13:38, Rémi Denis-Courmont wrote: > The current triggers an illegal instruction if the CPU does not support > vectors. > --- > libavcodec/riscv/vvc/vvcdsp_init.c | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/libavcodec/riscv/vvc/vvcdsp_init

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-10-11 Thread flow gg
ping. ([PATCH 1/5] lavc/vvc_mc: R-V V put_pixels is after this) On Sun, Sep 29, 2024 at 00:47, wrote: > From: sunyuechi > > k230 banana_f3 > dmvr_8_12x20_c: 619.3 ( 1.00x) 624.1 ( 1.00x) > dmvr_8_12x20_rvv_i32: 128.6 (

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-09-28 Thread flow gg
s down.. after updating, it has indeed become faster. On Sat, Sep 28, 2024 at 21:49, Rémi Denis-Courmont wrote: > > > On 28 September 2024 12:42:37 GMT+03:00, flow gg > wrote: > >> Is 4x unroll really faster than 2x here? We don't typically unroll 4x > >> manually. > > &

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr

2024-09-28 Thread flow gg
> Is 4x unroll really faster than 2x here? We don't typically unroll 4x > manually. I first did 2x and then changed it to 4x. The test results are similar, and I'm not sure how to choose between them... > t5 seems to be 8-bit, so vwmulu.vx should work better here? Since you > leveraged it in the

Re: [FFmpeg-devel] [PATCH 1/2] lavc/vp9dsp: R-V V mc tap h v

2024-09-24 Thread flow gg
ping. On Wed, Aug 28, 2024 at 14:43, flow gg wrote: > It seems that the previous patch was partly missing the RVB check, but now it > has if (flags & AV_CPU_FLAG_RVB). > > On Wed, Aug 28, 2024 at 03:00, Rémi Denis-Courmont wrote: > >> On Sunday, 25 August 2024, 14:41:22 EEST, flow gg wrote:

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-09-21 Thread flow gg
It feels like this patch has been sitting idle for quite a long time... Maybe it's time to merge it. On Sat, Sep 14, 2024 at 22:45, Rémi Denis-Courmont wrote: > Hi, > > LGTM for the RISC-V side. No clue about the VVC side. > ___ > ffmpeg-devel mailing list > ffmpeg-de

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-09-15 Thread flow gg
> LGTM for the RISC-V side. No clue about the VVC side. Hi, Nuomi, could you please reply here? Thanks. On Fri, Sep 13, 2024 at 00:45, flow gg wrote: > ping > > On Wed, Aug 28, 2024 at 14:38, flow gg wrote: > >> Updated: zve32x -> zve32x, zbb, zba >> >> 于2024年8月2

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-09-12 Thread flow gg
ping. On Wed, Aug 28, 2024 at 14:38, flow gg wrote: > Updated: zve32x -> zve32x, zbb, zba > > On Wed, Aug 28, 2024 at 14:37, wrote: > >> From: sunyuechi >> >> C908 X60 >> avg_8_2x2_c

Re: [FFmpeg-devel] [PATCH 1/2] lavc/vp9dsp: R-V V mc tap h v

2024-08-27 Thread flow gg
It seems that the previous patch was partly missing the RVB check, but now it has `if (flags & AV_CPU_FLAG_RVB)`. On Wed, Aug 28, 2024 at 03:00, Rémi Denis-Courmont wrote: > On Sunday, 25 August 2024, 14:41:22 EEST, flow gg wrote: > > > Does not assemble with binutils 2.43.1 a

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-27 Thread flow gg
Updated: zve32x -> zve32x, zbb, zba. On Wed, Aug 28, 2024 at 14:37, wrote: > From: sunyuechi > > C908 X60 > avg_8_2x2_c: 1.2 1.0 > avg_8_2x2_rvv_i32: 0.7 0.7 > avg_8_2x4_

Re: [FFmpeg-devel] [PATCH 1/2] lavc/vp9dsp: R-V V mc tap h v

2024-08-25 Thread flow gg
> Does not assemble with binutils 2.43.1 and default flags. Fixed by changing zve32x -> zve32x, zba. On Sun, Aug 25, 2024 at 19:40, wrote: > From: sunyuechi > > C908 X60 > vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2 > vp9_avg_8tap_smoot

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-18 Thread flow gg
I wrote `ff_vvc_w_avg_8_rvv` by mimicking the h264 weight function. Based on the test results for 49 different resolutions, most of them were significantly slower. Only 2x32 and 2x64 had similar performance, without noticeable speed improvement. I'm not sure about the reason. Some differences ar

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-17 Thread flow gg
How can I test the weight and biweight of H.264? I haven't seen the related test code.. tests/checkasm/checkasm --bench --test=h264dsp On Thu, Aug 15, 2024 at 16:10, Rémi Denis-Courmont wrote: > > > On 3 August 2024 13:30:34 GMT+03:00, u...@foxmail.com wrote: > >From: sunyuechi > > > >

Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V mc bilin hv

2024-08-09 Thread flow gg
> That seems suboptimal and unnecessary. Updated it, there is no longer any vmv. On Fri, Aug 9, 2024 at 22:24, wrote: > From: sunyuechi > > C908 X60 > vp9_avg_bilin_4hv_8bpp_c : 10.7 9.5 > vp9_avg_bilin_4hv_8bpp_rvv_i32

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-03 Thread flow gg
Added lpad and resolved conflicts with master. On Sat, Aug 3, 2024 at 18:31, wrote: > From: sunyuechi > > C908 X60 > avg_8_2x2_c: 1.2 1.0 > avg_8_2x2_rvv_i32: 0.7 0.7 >

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V mc bilin h v

2024-08-03 Thread flow gg
> Looks OK, but missing CFI landing pads. Added lpad. On Sat, Aug 3, 2024 at 17:51, wrote: > From: sunyuechi > > C908 X60 > vp9_avg_bilin_4h_8bpp_c: 5.5 4.7 > vp9_avg_bilin_4h_8bpp_rvv_i32 :1.7

Re: [FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-08-01 Thread flow gg
> Use rounding. Updated it and resolved conflicts with master. On Thu, Aug 1, 2024 at 20:16, wrote: > From: sunyuechi > > C908 X60 > vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2 > vp9_avg_8tap_smooth_4h_8bpp_rvv_i32:

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp8dsp: R-V V 256 bilin,epel

2024-07-31 Thread flow gg
Denis-Courmont wrote on Wed, Jul 31, 2024 at 23:06: > On Tuesday, 30 July 2024, 20:57:28 EEST, flow gg wrote: > > From my understanding, moving from supporting only 128b to adding 256b > > versions can simultaneously improve LMUL and solve some issues related to > > insufficient

Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-07-31 Thread flow gg
I'm a bit confused because the calculation here goes up to 32 bits and then returns to 8 bits. It seems that the vmax and vnclipu instructions can't be removed by using round-related instructions? On Mon, Jul 29, 2024 at 23:21, Rémi Denis-Courmont wrote: > On Tuesday, 23 July 2024, 11:51:48 EEST, u...@f
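
For reference, the value that a zero clamp (vmax) followed by a rounding, saturating narrow (vnclipu) produces can be modelled in scalar C. This is only a sketch of the arithmetic, assuming round-to-nearest-up rounding and a right shift of 7 as used by the VP9 subpel filters; `round_shift_clip_u8` is a made-up name and does not come from the patch.

```c
#include <stdint.h>

/* Scalar model (illustrative only): clamp a signed 32-bit filter sum at
 * zero, shift right with round-to-nearest-up, then saturate to 8 bits.
 * `shift` must be at least 1. */
static uint8_t round_shift_clip_u8(int32_t sum, unsigned shift)
{
    if (sum < 0)
        sum = 0;                                  /* the vmax-with-zero step */

    /* Add half of the divisor before shifting (round-to-nearest-up). */
    uint32_t rounded = ((uint32_t)sum + (1u << (shift - 1))) >> shift;

    /* Unsigned saturation, as a narrowing clip would apply. */
    return rounded > 255 ? 255 : (uint8_t)rounded;
}
```

For example, `round_shift_clip_u8(12800, 7)` yields 100, while a negative sum yields 0 and a large positive sum saturates at 255. Note that the hardware narrowing instruction halves the element width per step, so going from 32 bits to 8 bits takes more than one narrowing operation; the model collapses that into one function.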

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp8dsp: R-V V 256 bilin,epel

2024-07-30 Thread flow gg
Hi, these four patches have v2 (although the first one seems to be the same). From my understanding, moving from supporting only 128b to adding 256b versions can simultaneously improve LMUL and solve some issues related to insufficient vector registers (vvc, vp9). This can be very helpful in certa

Re: [FFmpeg-devel] [PATCH v4 4/4] lavc/vp9dsp: R-V V mc tap hv

2024-07-23 Thread flow gg
Updated this one because of the 3/4 update. On Tue, Jul 23, 2024 at 16:59, wrote: > From: sunyuechi > > C908 X60 > vp9_avg_8tap_smooth_4hv_8bpp_c : 32.0 28.0 > vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32 : 15.0 13.2 > vp9_av

Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-07-23 Thread flow gg
> TBH it is very hard to review this due to the large extents of code > conditionals. This should avoidable at least partly. You can name macros for > each filter and then expand those macros instead of using if's. Do you mean that before the addition of .equ ff_vp9_subpel_filters_xxx, epel_filter

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-07-21 Thread flow gg
Okay, updated it. On Fri, Jul 19, 2024 at 23:56, Rémi Denis-Courmont wrote: > On Thursday, 18 July 2024, 18:04:15 EEST, flow gg wrote: > > > Again, I don't think that a maximum multiplier belongs here. If the > > > calling code cannot scale the multiplier up, then it sho

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-07-18 Thread flow gg
> Again, I don't think that a maximum multiplier belongs here. If the calling > code cannot scale the multiplier up, then it should be a normal loop providing > the same code for all VLENs. I think it's acceptable to add such a parameter, which isn't particularly common in other files, because thi

Re: [FFmpeg-devel] [PATCH v2 2/4] lavc/vp8dsp: R-V V loop_filter_simple

2024-07-14 Thread flow gg
> vssseg2e8 > vlsseg4e8 > vwadd.wv > I can't find where VXRM is initialised for that. Updated them and added csrwi. On Mon, Jul 15, 2024 at 00:30, wrote: > From: sunyuechi > > C908 X60 > vp8_loop_filter_simple_h_c : 6.2 5.7 > v
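
Since the review asks where VXRM is initialised: `csrwi vxrm, <mode>` selects how fixed-point vector instructions round when they shift bits out. The C model below sketches the four modes defined by the RVV specification for an unsigned right shift. The enum and function names are illustrative only and do not appear in FFmpeg.

```c
#include <stdint.h>

/* Encodings of the vxrm CSR as given in the RVV specification. */
enum vxrm_mode {
    VXRM_RNU = 0,  /* round to nearest, ties up */
    VXRM_RNE = 1,  /* round to nearest, ties to even */
    VXRM_RDN = 2,  /* round down (truncate) */
    VXRM_ROD = 3   /* round to odd ("jam") */
};

/* Illustrative model of a rounding right shift of v by d bits (0 <= d < 32). */
static uint32_t roundoff_u32(uint32_t v, unsigned d, enum vxrm_mode mode)
{
    if (d == 0)
        return v;                                 /* nothing is shifted out */

    uint32_t shifted = v >> d;
    uint32_t half = (v >> (d - 1)) & 1;           /* most significant lost bit */
    uint32_t rest = v & ((1u << (d - 1)) - 1);    /* remaining lost bits */
    uint32_t inc;

    switch (mode) {
    case VXRM_RNU: inc = half; break;
    case VXRM_RNE: inc = half & ((rest != 0) | (shifted & 1)); break;
    case VXRM_RDN: inc = 0; break;
    default:       inc = !(shifted & 1) & (half | (rest != 0)); break;
    }
    return shifted + inc;
}
```

As a quick check of the model, shifting 5 right by one bit (2.5) gives 3 under RNU, 2 under RNE and RDN, and 3 under ROD.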

Re: [FFmpeg-devel] [PATCH v5] lavc/vvc_mc: R-V V avg w_avg

2024-07-10 Thread flow gg
function, then vsetvlstatic16 uses max_lmul == m8. If e32 is involved in the function, then vsetvlstatic16 uses max_lmul == m4. I think it is clearer now. On Mon, Jul 8, 2024 at 23:41, Rémi Denis-Courmont wrote: > On Monday, 1 July 2024, 19:09:01 EEST, flow gg wrote: > > I reviewed it again, th

Re: [FFmpeg-devel] [PATCH v5] lavc/vvc_mc: R-V V avg w_avg

2024-07-01 Thread flow gg
I reviewed it again, the purpose of is_w is to limit lmul to a maximum of 1/4 of vlen, to prevent vector register shortage, which can also be considered as vset limiting lmul. I renamed it to quarter_len_limit. t0 is changed to t1. On Tue, Jul 2, 2024 at 00:07, wrote: > From: sunyuechi > >

Re: [FFmpeg-devel] [PATCH v5] lavc/vvc_mc: R-V V avg w_avg

2024-07-01 Thread flow gg
> I am not sure what is_w means or serves here. If you need special cases, this > feels a bit out of place for this macro. It is a special case added to merge the vset of avg and w_avg, how about giving it a default value so that it doesn't affect the use of other functions? > I am not sure if I

Re: [FFmpeg-devel] [PATCH 2/2] lavc/h264dsp: R-V V 8-bit luma loop filter

2024-07-01 Thread flow gg
The horizontal loop filter in VP8 also has this issue... On Sun, Jun 30, 2024 at 17:04, Rémi Denis-Courmont wrote: > T-Head C908 (cycles): > h264_h_loop_filter_luma_8bpp_c: 297.5 > h264_h_loop_filter_luma_8bpp_rvv_i32: 374.7 > h264_v_loop_filter_luma_8bpp_c: 862.7 > h264_v_loop_filter_luma_8bpp_rvv

Re: [FFmpeg-devel] [PATCH v4 2/4] lavc/vp9dsp: R-V V mc bilin hv

2024-06-30 Thread flow gg
Initially, I tried using `vnclip.wi` with reference to h264,
-vwadd.wx v16, v16, t4
-vnsra.wi v16, v16, 4
+vnclip.wi v16, v16, 4
but couldn't find the correct way... I think there might be some overflow issues that I didn't understand correctly. How do y

Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-06-15 Thread flow gg
> You can directly LLA filters + 16 * 8 * 2 and save one add. Same below. You can > also use .equ to alias the filter addresses, and avoid if's. > That's a lot of address dependencies, which is going to hurt performance. It > might help to just spill more S registers if needed. > This can be done

Re: [FFmpeg-devel] [PATCH v4 2/4] lavc/vp9dsp: R-V V mc bilin hv

2024-06-15 Thread flow gg
> Copying vectors is rarely justified - mostly only before destructive > instructions such as FMA. It is slightly different from VP8. In VP8, many scalar values are positive, so the related calculations can be easily replaced. However, in this context of VP9, since t2 is a negative number, vwmaccs

Re: [FFmpeg-devel] [PATCH v4 1/4] lavc/vp9dsp: R-V V mc bilin h v

2024-06-15 Thread flow gg
Just like in VP8, the unroll has been updated. On Sat, Jun 15, 2024 at 19:51, wrote: > From: sunyuechi > > C908 X60 > vp9_avg_bilin_4h_8bpp_c: 5.5 4.7 > vp9_avg_bilin_4h_8bpp_rvv_i32: 1.7 1.5 >

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread flow gg
> Does this not render the type parameter of bilin_load useless (always h)? > (Not a blocker for this patch.) Yes, this was needed in the initial version, but it is no longer required. I just sent a patch. > Not sure if I already asked this but is this really faster than slide1? > Normally we wan

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread flow gg
ping On Thu, May 30, 2024 at 23:27, wrote: > From: sunyuechi > > Since len < 64, the registers are sufficient, so it can be > directly unrolled (a4 is even). > > Another benefit of unrolling is that it reduces one load operation > vertically compared to horizontally. > > old

Re: [FFmpeg-devel] [PATCH v4] lavc/vvc_mc: R-V V avg w_avg

2024-06-11 Thread flow gg
> Nit: for overall code base consistency, I'd use csrwi here. Reason being that > for other rounding modes, csrwi is the better option. > > Probably faster to swap the two above, to avoid stalling on LD. > > If you check more than one length, better to get ff_get_rv_vlenb() into a local > variable.

Re: [FFmpeg-devel] [PATCH v3] lavc/vvc_mc: R-V V avg w_avg

2024-06-11 Thread flow gg
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will > end up faster than vector code on future hardware, especially out-of-order > pipelines. I want to drop 2x2, but since there's only one function to handle all situations instead of 7*7 functions.. > AFAIU, this will ge

Re: [FFmpeg-devel] [PATCH v2] lavc/vvc_mc: R-V V avg w_avg

2024-06-01 Thread flow gg
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will > end up faster than vector code on future hardware, especially out-of-order > pipelines. I want to drop 2x2, but since there's only one function to handle all situations instead of 7*7 functions, how can I drop only 2x2

Re: [FFmpeg-devel] [PATCH v2] lavc/vvc_mc: R-V V avg w_avg

2024-06-01 Thread flow gg
> In keeping in line with the rest of the project, that should probably go into > libavcodec/riscv/vvc/ > Expanding the macro 49 times, with up to 14 branches to get there is maybe not > such a great idea. It might look nice on the checkasm µbenchmarks because the > branches under test get

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
Well.. because scalar registers are limited, the direct unrolling will be like this for now. We can handle different lengths separately in the future. On Thu, May 30, 2024 at 23:36, flow gg wrote: > I directly copied the VP9 modifications over... Since len <= 16, it seems > like it can be improved a

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
I directly copied the VP9 modifications over... Since len <= 16, it seems like it can be improved a bit more. On Thu, May 30, 2024 at 23:27, wrote: > From: sunyuechi > > Since len < 64, the registers are sufficient, so it can be > directly unrolled (a4 is even). > > Another benefit of unrolling is that it redu

Re: [FFmpeg-devel] [PATCH v3 4/5] lavc/vp9dsp: R-V V mc tap h v

2024-05-29 Thread flow gg
A portion has been modified according to the previous review, but there are still some parts that haven't been updated > Similarly, it > should be possible to share most of the horizontal and vertical code (maybe > also for bilinear. not just EPel) with separate load/store then inner > procedures.

Re: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-26 Thread flow gg
Hi, maybe we can prioritize this revert: https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/0c1304ae11b0361ede055ee8ffc6e83529468c73 Using [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg to avoid conflicts with other patches. On Fri, May 24, 2024 at 14:13, flow gg wrote: > I want to update the VP9 bilin load, just l

Re: [FFmpeg-devel] [PATCH 5/5] lavc/vp8dsp: factor R-V V EPEL functions for all lengths

2024-05-25 Thread flow gg
reduction in code size seems to be due to switching to using j labels, doesn't seem to be about vset, but another issue. j labels are indeed better. I will make similar modifications. On Sun, May 26, 2024 at 02:29, Rémi Denis-Courmont wrote: > Le lauantaina 25. toukokuuta 2024, 21.16.22 EEST flow gg a écri

Re: [FFmpeg-devel] [PATCH 5/5] lavc/vp8dsp: factor R-V V EPEL functions for all lengths

2024-05-25 Thread flow gg
Would it be better to replace the two vsetvlstatic8 and vsetvlstatic16 with two vsetvl? This would require the previous patch and this one to work together, increasing the number of lines of code and making the code a bit harder to read. Additionally, I have a question about patch 4 'save one R-V G

Re: [FFmpeg-devel] [PATCH v2 3/5] lavc/vp9dsp: R-V V mc tap h v

2024-05-25 Thread flow gg
One more thing I remember is that after adjusting the sign, vmacc can be used; otherwise, due to the sign, mul + add are needed. On Sat, May 25, 2024 at 18:38, flow gg wrote: > > Is there a reason that you cannot use the tables from C code? > > Similar to VP8, to adjust the positive and negat

Re: [FFmpeg-devel] [PATCH v2 3/5] lavc/vp9dsp: R-V V mc tap h v

2024-05-25 Thread flow gg
> Is there a reason that you cannot use the tables from C code? Similar to VP8, to adjust the positive and negative data and prevent small probability overflow during calculations. > AFAICT, regular and sharp are identical, except for the base address of the > filter table, so it should be possib

Re: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-23 Thread flow gg
I want to update the VP9 bilin load, just like you did with VP8, but it seems like this patch([PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg) doesn't merge the current updates here but merges the previous version instead, so the subsequent patches will have conflicts. flow gg 于2024年5月22日周三 01

Re: [FFmpeg-devel] [PATCH] lavc/rv34dsp: optimise R-V V idct_dc_add

2024-05-22 Thread flow gg
Unfortunately I only test to obtain benchmarks and basic correctness. I always feel the need for a professional to write the tests. On Thu, May 23, 2024 at 04:35, Rémi Denis-Courmont wrote: > > > On 22 May 2024 23:28:54 GMT+03:00, "Rémi Denis-Courmont" > wrote: > >This removes one stray LI and reworks the

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
Reordered some here. On Wed, May 22, 2024 at 03:24, wrote: > From: sunyuechi > > C908 X60 > avg_8_2x2_c: 1.0 1.0 > avg_8_2x2_rvv_i32: 0.7 0.7 > avg_8_2x4_c

Re: [FFmpeg-devel] [PATCH v2 2/5] lavc/vp9dsp: R-V V mc bilin h v

2024-05-21 Thread flow gg
Do macro definitions also need commas? I noticed that much of my old code and SiFive's code doesn't have them. On Wed, May 22, 2024 at 02:29, Rémi Denis-Courmont wrote: > On Tuesday, 21 May 2024, 20:13:16 EEST, u...@foxmail.com wrote: > > From: sunyuechi > > > diff --git a/libavcodec/riscv/vp9_mc_

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
> I would expect that you can get better performance by interleaving scalar and vector stuff, and possibly also vector loads and vector arithmetic. Okay, I will try > These labels lead to nowhere? If you actually mean to implicitly fall through to the next function, you can use the function name

Re: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-21 Thread flow gg
> Please put commas between operands. > This should probably be ff_avg_vp9 or something slightly more specific. Updated here. On Wed, May 22, 2024 at 01:14, wrote: > From: sunyuechi > > C908: > vp9_avg4_8bpp_c: 1.2 > vp9_avg4_8bpp_rvv_i64: 1.0 > vp9_avg8_8bpp_c: 3.7 > vp9_avg8_8bpp_rvv_i64: 1.5 > vp9_avg16_8

Re: [FFmpeg-devel] [PATCH v4 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-21 Thread flow gg
> Please put commas between operands. Okay > This should probably be ff_avg_vp9 or something slightly more specific. Is it necessary here? Many macros in the C file are copied from MIPS, where it is called ff_avg4_msa. Here, it has been simply changed to ff_avg4_rvv. Rémi Denis-Courmont 于2024年

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
There are three unused lines which I forgot to delete before submitting. I have updated them here. On Tue, May 21, 2024 at 15:47, wrote: > From: sunyuechi > > C908 X60 > avg_8_2x2_c: 1.0 1.0 > avg_8_2x2_rvv_i

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-05-21 Thread flow gg
To obtain test results, the `if (w == h)` in tests/checkasm/vvc_mc.c needs to be commented out. Because vset needs to be used in the loop, I manually wrote a cumbersome vset macro. On Tue, May 21, 2024 at 15:38, wrote: > From: sunyuechi > > C908 X60 > avg_8_2x2_

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp8dsp: R-V V put_epel hv

2024-05-19 Thread flow gg
Fixed the .irp use. On Sun, May 19, 2024 at 16:18, wrote: > From: sunyuechi > > C908: > vp8_put_epel4_h4v4_c: 20.0 > vp8_put_epel4_h4v4_rvv_i32: 11.0 > vp8_put_epel4_h4v6_c: 25.2 > vp8_put_epel4_h4v6_rvv_i32: 13.5 > vp8_put_epel4_h6v4_c: 22.2 > vp8_put_epel4_h6v4_rvv_i32: 14.5 > vp8_put_epel4_h6v6_c: 29.0 > vp8_put_

Re: [FFmpeg-devel] [PATCH v3 6/9] lavc/vp9dsp: R-V V mc bilin h v

2024-05-18 Thread flow gg
Fixed in v4. On Sat, May 18, 2024 at 23:56, Rémi Denis-Courmont wrote: > On Monday, 13 May 2024, 19:59:23 EEST, u...@foxmail.com > wrote: > > From: sunyuechi > > > > C908: > > vp9_avg_bilin_4h_8bpp_c: 5.2 > > vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2 > > vp9_avg_bilin_4v_8bpp_c: 5.5 > > vp9_avg_bilin_4v

Re: [FFmpeg-devel] [PATCH v4 1/5] lavc/vp9dsp: R-V V mc avg

2024-05-18 Thread flow gg
Fixed issues with .irp and comma, as well as the ifc issue (same modifications as previously done for vp8). On Sun, May 19, 2024 at 02:16, wrote: > From: sunyuechi > > C908: > vp9_avg4_8bpp_c: 1.2 > vp9_avg4_8bpp_rvv_i64: 1.0 > vp9_avg8_8bpp_c: 3.7 > vp9_avg8_8bpp_rvv_i64: 1.5 > vp9_avg16_8bpp_c: 14.7 > vp9_a

Re: [FFmpeg-devel] [PATCH v3 5/9] lavc/vp9dsp: R-V V mc avg

2024-05-17 Thread flow gg
Yeah, updated it in the reply. On Fri, May 17, 2024 at 23:11, Rémi Denis-Courmont wrote: > On Monday, 13 May 2024, 19:59:22 EEST, u...@foxmail.com > wrote: > > From: sunyuechi > > > > C908: > > vp9_avg4_8bpp_c: 1.2 > > vp9_avg4_8bpp_rvv_i64: 1.0 > > vp9_avg8_8bpp_c: 3.7 > > vp9_avg8_8bpp_rvv_i64:

Re: [FFmpeg-devel] [PATCHv2 2/2] lavc/startcode: add R-V V startcode_find_candidate

2024-05-15 Thread flow gg
Is the test result missing here? On Thu, May 16, 2024 at 01:11, Rémi Denis-Courmont wrote: > --- > libavcodec/riscv/Makefile| 1 + > libavcodec/riscv/h264dsp_init.c | 5 > libavcodec/riscv/startcode_rvv.S | 44 > libavcodec/riscv/vc1dsp_init.c | 16 +++---

Re: [FFmpeg-devel] [PATCH 4/9] lavc/vp9dsp: R-V V ipred tm

2024-05-14 Thread flow gg
Updated for cleaner code. On Wed, May 15, 2024 at 11:56, wrote: > From: sunyuechi > > C908: > vp9_tm_4x4_8bpp_c: 116.5 > vp9_tm_4x4_8bpp_rvv_i32: 43.5 > vp9_tm_8x8_8bpp_c: 416.2 > vp9_tm_8x8_8bpp_rvv_i32: 86.0 > vp9_tm_16x16_8bpp_c: 1665.5 > vp9_tm_16x16_8bpp_rvv_i32: 187.2 > vp9_tm_32x32_8bpp_c: 6974.2 > vp9_tm

Re: [FFmpeg-devel] [PATCH v3 4/9] lavc/vp9dsp: R-V V ipred tm

2024-05-14 Thread flow gg
in the reply. On Wed, May 15, 2024 at 02:08, Rémi Denis-Courmont wrote: > On Tuesday, 14 May 2024, 20:57:17 EEST, flow gg wrote: > > Why is it unnecessary to reset the vector configuration every time? I > think > > it is necessary to reset e16/e8 each time. > > I misread the p

Re: [FFmpeg-devel] [PATCH v3 4/9] lavc/vp9dsp: R-V V ipred tm

2024-05-14 Thread flow gg
Why is it unnecessary to reset the vector configuration every time? I think it is necessary to reset e16/e8 each time. On Wed, May 15, 2024 at 01:46, Rémi Denis-Courmont wrote: > On Monday, 13 May 2024, 19:59:21 EEST, u...@foxmail.com > wrote: > > From: sunyuechi > > > > C908: > > vp9_tm_4x4_8bp

Re: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert

2024-05-14 Thread flow gg
Okay, learned it. On Wed, May 15, 2024 at 01:00, Rémi Denis-Courmont wrote: > On Tuesday, 14 May 2024, 7:45:29 EEST, flow gg wrote: > > I am locally using: > > if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags & > > AV_CPU_FLAG_RVB_ADDR)) { > &g

Re: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp9dsp: R-V mc copy

2024-05-14 Thread flow gg
Using this will give output: `if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {` Did you comment out the MISALIGNED flag check but not add RVI, resulting in no output? On Wed, May 15, 2024 at 01:02, Rémi Denis-Courmont wrote: > On Tuesday, 14 May 2024, 7:44:55 EEST, flow gg wrote: > &g
