Re: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp8dsp: R-V V put_bilin_h v

2024-05-07 Thread flow gg
I didn't understand what you mean... What does judging whether the type is 'h' or 'v' have to do with the number? Rémi Denis-Courmont 于2024年5月8日周三 00:00写道: > Le maanantaina 6. toukokuuta 2024, 6.38.02 EEST u...@foxmail.com a écrit : > > From: sunyuechi > > > > C908: > > vp8_put_bilin4_h_c: 367.

Re: [FFmpeg-devel] [PATCH v4 2/9] lavc/vp8dsp: R-V V put_bilin_h v

2024-05-07 Thread flow gg
> h is not a number so that's not a valid condition. Fixed two instances of this issue. wrote on Wed, 8 May 2024 at 00:55: > From: sunyuechi > > C908: > vp8_put_bilin4_h_c: 367.0 > vp8_put_bilin4_h_rvv_i32: 137.7 > vp8_put_bilin4_v_c: 377.0 > vp8_put_bilin4_v_rvv_i32: 137.7 > vp8_put_bilin8_h_c: 1431.0 > vp8_put_bili

Re: [FFmpeg-devel] [PATCH v2 3/9] lavc/vp9dsp: R-V V ipred hor

2024-05-07 Thread flow gg
> Do you gain much by unrolling all the way to 16x? Given that you have the > counter value already in t0, it should not make much difference to just unroll > 2x or maybe 4x and then loop. I chose this simple method because I think the effect is about the same.. Do I need to change it? > It might

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels

2024-05-10 Thread flow gg
Hi, I got BananaPi F3, made some fixes, updated in reply Rémi Denis-Courmont 于2024年5月6日周一 03:26写道: > Le sunnuntaina 5. toukokuuta 2024, 12.18.56 EEST flow gg a écrit : > > > Does MF2 actually improve perfs over M1 here? > > > > The difference here seems very small, but

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels

2024-05-11 Thread flow gg
: > Le perjantaina 10. toukokuuta 2024, 11.22.53 EEST flow gg a écrit : > > Hi, I got BananaPi F3, made some fixes, updated in reply > > So... Does it benefit from halving the logical multiplier to process > fixed-sized > block as compared to C908, or can we stick to the same code r

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp9dsp: fix indentation

2024-05-11 Thread flow gg
The patch `lavc/vp9dsp: R-V ipred vert` needs to add `#if HAVE_RV`. How about I modify these `#if HAVE_RVV` indentations together in this patch? Rémi Denis-Courmont 于2024年5月11日周六 00:39写道: > --- > libavcodec/riscv/vp9dsp_init.c | 50 +- > 1 file changed, 25 insert

Re: [FFmpeg-devel] [PATCH v4 6/9] lavc/vp8dsp: R-V V put_epel hv

2024-05-11 Thread flow gg
Okay, updated it in the reply Rémi Denis-Courmont 于2024年5月10日周五 23:41写道: > Le tiistaina 7. toukokuuta 2024, 19.54.09 EEST u...@foxmail.com a écrit : > > From: sunyuechi > > > > C908: > > vp8_put_epel4_h4v4_c: 20.0 > > vp8_put_epel4_h4v4_rvv_i32: 11.0 > > vp8_put_epel4_h4v6_c: 25.2 > > vp8_put_e

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels

2024-05-11 Thread flow gg
In banana_f3, further reducing the value of mf resulted in another performance improvement. I think in the end we might need to use different functions depending on vlen in init.. Rémi Denis-Courmont 于2024年5月11日周六 18:24写道: > Le lauantaina 11. toukokuuta 2024, 13.02.02 EEST flow gg a éc
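
The trade-off discussed above (a fixed-size block, a smaller multiplier on wider hardware) can be sketched with the RVV relation VLMAX = VLEN / SEW * LMUL. The helper names below are hypothetical and are not taken from the patch; the sketch also ignores the ELEN-related limits on fractional LMUL, so treat it as an illustration rather than the selection logic actually used in init.

```c
#include <assert.h>

/* LMUL expressed in eighths so that fractional multipliers stay integral:
 * mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, m2 = 16, m4 = 32, m8 = 64. */
enum {
    LMUL_MF8 = 1, LMUL_MF4 = 2, LMUL_MF2 = 4,
    LMUL_M1 = 8, LMUL_M2 = 16, LMUL_M4 = 32, LMUL_M8 = 64
};

/* Maximum number of elements per register group: VLEN / SEW * LMUL. */
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul_x8)
{
    assert(sew_bits != 0);
    return vlen_bits / sew_bits * lmul_x8 / 8;
}

/* Smallest multiplier (in eighths) whose register group holds `elems`
 * elements, or 0 if even m8 is too small (hypothetical helper). */
unsigned smallest_lmul_x8(unsigned vlen_bits, unsigned sew_bits, unsigned elems)
{
    for (unsigned l = LMUL_MF8; l <= LMUL_M8; l *= 2)
        if (vlmax(vlen_bits, sew_bits, l) >= elems)
            return l;
    return 0;
}
```

Under these assumptions, a 16-byte row needs m1 when VLEN is 128 but only mf2 when VLEN is 256, which is one way to read the suggestion of picking different functions per vlen.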

Re: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp8dsp: R-V put_vp8_pixels

2024-05-11 Thread flow gg
Wow, got it Rémi Denis-Courmont 于2024年5月11日周六 22:39写道: > Le maanantaina 6. toukokuuta 2024, 6.38.01 EEST u...@foxmail.com a écrit : > > From: sunyuechi > > > > C908: > > vp8_put_pixels4_c: 78.0 > > vp8_put_pixels4_rvi: 33.7 > > vp8_put_pixels8_c: 278.0 > > vp8_put_pixels8_rvi: 55.0 > > vp8_put_

Re: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert

2024-05-12 Thread flow gg
> It should be possible to improve ordering to avoid immediate dependency from ADD to SD Okay, updated it. Additionally improved the mc-tap_64 on vlen>=256 and something 于2024年5月12日周日 18:04写道: > From: sunyuechi > > C908: > vp9_vert_8x8_8bpp_c: 22.0 > vp9_vert_8x8_8bpp_rvi: 15.7 > vp9_vert_16x

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels

2024-05-12 Thread flow gg
It seems like it can't... update using AV_CPU_FLAG_RV_MISALIGNED Rémi Denis-Courmont 于2024年5月12日周日 19:48写道: > Le perjantaina 10. toukokuuta 2024, 11.21.14 EEST u...@foxmail.com a > écrit : > > From: sunyuechi > > > > C908 X60 > > vc1dsp.avg_

Re: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert

2024-05-13 Thread flow gg
just rebase 于2024年5月14日周二 01:00写道: > From: sunyuechi > > C908: > vp9_vert_8x8_8bpp_c: 22.0 > vp9_vert_8x8_8bpp_rvi: 15.7 > vp9_vert_16x16_8bpp_c: 71.2 > vp9_vert_16x16_8bpp_rvi: 39.0 > vp9_vert_32x32_8bpp_c: 300.2 > vp9_vert_32x32_8bpp_rvi: 135.2 > --- > libavcodec/riscv/Makefile| 1 +

Re: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp9dsp: R-V mc copy

2024-05-13 Thread flow gg
I am locally using: if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) { this performs better on k230/banana_f3 than C. For email, refer to [FFmpeg-devel] [PATCH 2/2] lavc/vp8dsp: restrict RVI optimisations and change it to if (bpp == 8 && (flags & AV_CPU_FLAG_RV_MISALIGNED)) { So no output, but I

Re: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert

2024-05-13 Thread flow gg
I am locally using: if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags & AV_CPU_FLAG_RVB_ADDR)) { this performs better on k230/banana_f3 than C. For email, refer to [FFmpeg-devel] [PATCH 2/2] lavc/vp8dsp: restrict RVI optimisations and change it to if (bpp == 8 && (flags & AV_CPU_FLAG_RV_M
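
The flag checks quoted above are plain bitmask tests. A minimal sketch follows; the DEMO_ constants and function names are placeholders invented for illustration (the real AV_CPU_FLAG_* values live in libavutil/cpu.h and may differ), so this only shows the shape of the gating, not the actual init code.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration only. */
#define DEMO_FLAG_RVI           (1u << 0)
#define DEMO_FLAG_RVB_ADDR      (1u << 1)
#define DEMO_FLAG_RV_MISALIGNED (1u << 2)

/* Local variant: enable the scalar path whenever RVI is reported. */
bool demo_use_rvi(int bpp, unsigned flags)
{
    assert(bpp > 0);
    return bpp == 8 && (flags & DEMO_FLAG_RVI);
}

/* Local variant that also requires the address-generation extension. */
bool demo_use_rvi_rvb_addr(int bpp, unsigned flags)
{
    const unsigned need = DEMO_FLAG_RVI | DEMO_FLAG_RVB_ADDR;
    assert(bpp > 0);
    return bpp == 8 && (flags & need) == need;
}

/* Restricted variant: only when fast misaligned access is reported. */
bool demo_use_misaligned(int bpp, unsigned flags)
{
    assert(bpp > 0);
    return bpp == 8 && (flags & DEMO_FLAG_RV_MISALIGNED);
}
```

On a CPU that reports RVI but not the misaligned flag, the first function returns true and the last returns false, which would explain a benchmark printing no optimised results after the restriction.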

Re: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp9dsp: R-V mc copy

2024-05-14 Thread flow gg
Using this will give output `if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {` Did you comment out the MISALIGNED flag check but not add RVI, resulting in no output? Rémi Denis-Courmont 于2024年5月15日周三 01:02写道: > Le tiistaina 14. toukokuuta 2024, 7.44.55 EEST flow gg a écrit : > &g

Re: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert

2024-05-14 Thread flow gg
Okay, learned it Rémi Denis-Courmont 于2024年5月15日周三 01:00写道: > Le tiistaina 14. toukokuuta 2024, 7.45.29 EEST flow gg a écrit : > > I am locally using: > > if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags & > > AV_CPU_FLAG_RVB_ADDR)) { > &g

Re: [FFmpeg-devel] [PATCH v3 4/9] lavc/vp9dsp: R-V V ipred tm

2024-05-14 Thread flow gg
Why is it unnecessary to reset the vector configuration every time? I think it is necessary to reset e16/e8 each time. Rémi Denis-Courmont 于2024年5月15日周三 01:46写道: > Le maanantaina 13. toukokuuta 2024, 19.59.21 EEST u...@foxmail.com a > écrit : > > From: sunyuechi > > > > C908: > > vp9_tm_4x4_8bp

Re: [FFmpeg-devel] [PATCH v3 4/9] lavc/vp9dsp: R-V V ipred tm

2024-05-14 Thread flow gg
in the reply Rémi Denis-Courmont 于2024年5月15日周三 02:08写道: > Le tiistaina 14. toukokuuta 2024, 20.57.17 EEST flow gg a écrit : > > Why is it unnecessary to reset the vector configuration every time? I > think > > it is necessary to reset e16/e8 each time. > > I misread the p

Re: [FFmpeg-devel] [PATCH 4/9] lavc/vp9dsp: R-V V ipred tm

2024-05-14 Thread flow gg
updated for clean code 于2024年5月15日周三 11:56写道: > From: sunyuechi > > C908: > vp9_tm_4x4_8bpp_c: 116.5 > vp9_tm_4x4_8bpp_rvv_i32: 43.5 > vp9_tm_8x8_8bpp_c: 416.2 > vp9_tm_8x8_8bpp_rvv_i32: 86.0 > vp9_tm_16x16_8bpp_c: 1665.5 > vp9_tm_16x16_8bpp_rvv_i32: 187.2 > vp9_tm_32x32_8bpp_c: 6974.2 > vp9_tm

Re: [FFmpeg-devel] [PATCHv2 2/2] lavc/startcode: add R-V V startcode_find_candidate

2024-05-15 Thread flow gg
Is the test result missing here? Rémi Denis-Courmont 于2024年5月16日周四 01:11写道: > --- > libavcodec/riscv/Makefile| 1 + > libavcodec/riscv/h264dsp_init.c | 5 > libavcodec/riscv/startcode_rvv.S | 44 > libavcodec/riscv/vc1dsp_init.c | 16 +++---

[FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp

2024-02-12 Thread flow gg
checkasm in [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test From 1aa51d60def8d4313c1b11a50528662ec832530e Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 13 Feb 2024 08:41:20 +0800 Subject: [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp This asm

Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks

2024-02-12 Thread flow gg
ok, updated it in the reply Rémi Denis-Courmont 于2024年2月13日周二 03:49写道: > Le perjantaina 2. helmikuuta 2024, 3.14.39 EET flow gg a écrit : > > Ok, updated it in the reply > > Sorry I meant directive, not macro. .rept is just fine here. > > -- > レミ・デニ-クールモン

Re: [FFmpeg-devel] [PATCH 4/4] lavc/rv34dsp: R-V V rv34_idct_dc_add

2024-02-12 Thread flow gg
I tested this in '[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'. The logic here is the same, using vext can reduce vset, making it a bit faster Rémi Denis-Courmont 于2024年2月13日周二 03:46写道: > Le keskiviikkona 31. tammikuuta 2024, 19.58.55 EET flow gg a écrit : > > Fixe

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V vp8_idct_dc_add

2024-02-12 Thread flow gg
The xxx_idct_dc_add functions are quite similar: vext can reduce vset usage, so it is a bit faster than using vwadd. This was tested in '[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'. Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:53: > Hi, > > I think you can use vwadd here? > > -- > Rémi Denis-Courmont > ht

Re: [FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V vp8_idct_dc_add4y

2024-02-12 Thread flow gg
Okay, updated it in the reply. Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:54: > Hi, > > To avoid repeating the code, you can either use .rept or .irp. You can > even > use assembler conditionals to elide the redundant code on the last > iteration. > > -- > レミ・デニ-クールモン > http://www.remlab.net/ > _

Re: [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

2024-02-12 Thread flow gg
I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp". Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:37: > On Friday, 2 February 2024, 2.47.16 EET, flow gg wrote: > > It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm

Re: [FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp

2024-02-13 Thread flow gg
Thank you for your guidance. Do you mean that the test should be modified like this? - declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc); + declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *, ptrdiff_t, int); I tried it this way, but the test still failed. I am not sure why ...

Re: [FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp

2024-02-13 Thread flow gg
I made a mistake. It can be fixed your way. Please ignore this reply. flow gg wrote on Tue, 13 Feb 2024 at 17:47: > Thank you for your guidance. Do you mean that the test should be modified > like this? > > - declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc); > + declare_func_emms(

Re: [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

2024-02-13 Thread flow gg
It was due to testing, not MMX. Fixed it in this reply. flow gg wrote on Tue, 13 Feb 2024 at 10:37: > I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly > rv34_inv_transform_dc in rv34dsp" > > Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:37: > >> Le perjantaina 2. helmi

Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2024-02-18 Thread flow gg
ping flow gg 于2024年1月30日周二 00:22写道: > > I expect that it would be faster to make one large load, and then 4 small > > stores, but that might work only for exactly 128-bit vectors? > > This seems to require vle128, so I didn't modify it. > > > That's not

[FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels

2024-02-19 Thread flow gg
The reason for using m1+le8 instead of stride load + larger group multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs." In the test, there is: #define src (buf + 2 * SRC_BUF_STRIDE + 2 + 1) Therefore src is not aligned, and not using e8 results in: (fatal signal 7: Bus error). From 6d

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels

2024-02-21 Thread flow gg
llo, > > Le maanantaina 19. helmikuuta 2024, 13.13.43 EET flow gg a écrit : > > The reason for using m1+le8 instead of stride load + larger group > > multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: > R-V > > V pix_abs." > > > >

Re: [FFmpeg-devel] [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad

2024-02-21 Thread flow gg
: asm=917745 c=3865 Rémi Denis-Courmont 于2024年2月22日周四 02:07写道: > Le tiistaina 6. helmikuuta 2024, 17.56.32 EET flow gg a écrit : > > > > Did you try to compute integral absolute values with the ad-hoc (floating > point) instruction instead of vneg/vmax? It should work since the

Re: [FFmpeg-devel] [PATCH 7/7] lavc/me_cmp: R-V V nsse

2024-02-22 Thread flow gg
Okay, updated it in the reply Rémi Denis-Courmont 于2024年2月22日周四 23:20写道: > Le tiistaina 6. helmikuuta 2024, 17.56.59 EET flow gg a écrit : > > > > Use 'static' functions where possible. > > -- > レミ・デニ-クールモン > http://www.remlab.net/ >

[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

2024-02-23 Thread flow gg
From b773a2b640ba38a106539da7f3414d6892364c4f Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 23 Feb 2024 13:27:42 +0800 Subject: [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h C908: vp8_put_bilin4_h_c: 373.5 vp8_put_bilin4_h_rvv_i32: 158.7 vp8_put_bilin8_h_c: 1437.7 vp8_put_bilin8_h_rvv_i32: 31

[FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V put_bilin_v

2024-02-23 Thread flow gg
From 488d0cd6645b2c6936c3298e010615facb6d0bd0 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 23 Feb 2024 22:35:01 +0800 Subject: [PATCH 2/3] lavc/vp8dsp: R-V V put_bilin_v C908: vp8_put_bilin4_v_c: 383.5 vp8_put_bilin4_v_rvv_i32: 139.7 vp8_put_bilin8_v_c: 1455.7 vp8_put_bilin8_v_rvv_i32: 29

[FFmpeg-devel] [PATCH 3/3] lavc/vp8dsp: R-V V put_bilin_hv

2024-02-23 Thread flow gg
From e1a01b1e0a365935868d7825d53c7cc64e2c1787 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 23 Feb 2024 22:35:23 +0800 Subject: [PATCH 3/3] lavc/vp8dsp: R-V V put_bilin_hv C908: vp8_put_bilin4_hv_c: 567.7 vp8_put_bilin4_hv_rvv_i32: 255.7 vp8_put_bilin8_hv_c: 2169.5 vp8_put_bilin8_hv_rvv_i3

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

2024-02-23 Thread flow gg
.ifc \len,4 -vsetivli zero, 5, e8, mf2, ta, ma +vsetivli zero, 5, e8, m1, ta, ma .elseif \len == 8 vsetivli zero, 9, e8, m1, ta, ma .else @@ -112,9 +112,9 @@ endfunc vslide1down.vx v2, \dst, t5 .ifc \len,4 -vsetivli zero, 4

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

2024-02-24 Thread flow gg
> > On 24 February 2024 at 03:07:36 GMT+02:00, flow gg wrote: > > .ifc \len,4 > >-vsetivli zero, 5, e8, mf2, ta, ma > >+vsetivli zero, 5, e8, m1, ta, ma > > .elseif \len == 8 > > vsetivli zero, 9, e8, m1,

[FFmpeg-devel] [PATCH 1/3] lavc/vp9dsp: R-V V ipred vert

2024-02-26 Thread flow gg
From 54d784dfd5d0d04456164f250766a3620d42c8c2 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 26 Feb 2024 14:42:17 +0800 Subject: [PATCH 1/3] lavc/vp9dsp: R-V V ipred vert C908 vp9_vert_16x16_8bpp_c: 80.2 vp9_vert_16x16_8bpp_rvv_i32: 55.7 vp9_vert_32x32_8bpp_c: 308.2 vp9_vert_32x32_8bpp_rvv_

[FFmpeg-devel] [PATCH 2/3] lavc/vp9dsp: R-V V ipred hor

2024-02-26 Thread flow gg
From e791fada3a4777fae87dec806c0b46b595d265db Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 27 Feb 2024 00:06:25 +0800 Subject: [PATCH 2/3] lavc/vp9dsp: R-V V ipred hor C908: vp9_hor_4x4_8bpp_c: 37.7 vp9_hor_4x4_8bpp_rvv_i32: 33.7 vp9_hor_8x8_8bpp_c: 82.7 vp9_hor_8x8_8bpp_rvv_i32: 51.5 vp9

[FFmpeg-devel] [PATCH 3/3] lavc/vp9dsp: R-V V ipred dc dc_left dc_top

2024-02-26 Thread flow gg
From 1a83f04530e3c299b28bd56dd10694aaa6b963d7 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 27 Feb 2024 00:07:08 +0800 Subject: [PATCH 3/3] lavc/vp9dsp: R-V V ipred dc dc_left dc_top C908: vp9_dc_16x16_8bpp_c: 117.0 vp9_dc_16x16_8bpp_rvv_i32: 81.7 vp9_dc_32x32_8bpp_c: 373.2 vp9_dc_32x32_8b

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp9dsp: R-V V ipred vert

2024-02-28 Thread flow gg
Found some problems.. I'll come back to modify this later. (to prevent wasting time on this now) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-deve

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp9dsp: R-V V ipred vert

2024-03-01 Thread flow gg
please ignore this, updated in "[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc" flow gg 于2024年2月27日周二 00:19写道: > >

Re: [FFmpeg-devel] [PATCH 2/3] lavc/vp9dsp: R-V V ipred hor

2024-03-01 Thread flow gg
please ignore this, updated in "[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc" flow gg 于2024年2月27日周二 00:19写道: > >

Re: [FFmpeg-devel] [PATCH 3/3] lavc/vp9dsp: R-V V ipred dc dc_left dc_top

2024-03-01 Thread flow gg
please ignore this, updated in "[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc" flow gg 于2024年2月27日周二 00:19写道: > >

[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-01 Thread flow gg
From adaae06a3e18bccec1772a3134334cbea652ae77 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 26 Feb 2024 14:42:17 +0800 Subject: [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc C908: vp9_dc_8x8_8bpp_c: 46.0 vp9_dc_8x8_8bpp_rvv_i64: 41.0 vp9_dc_16x16_8bpp_c: 109.2 vp9_dc_16x16_8bpp_rvv_i32: 72.7 vp9

[FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert

2024-03-01 Thread flow gg
From 7abd262daa281cee412a905ea75a5f10dd0b1fbe Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 1 Mar 2024 18:38:43 +0800 Subject: [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert C908: vp9_vert_8x8_8bpp_c: 22.0 vp9_vert_8x8_8bpp_rvv_i64: 18.5 vp9_vert_16x16_8bpp_c: 71.2 vp9_vert_16x16_8bpp_rvv_i32:

[FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor

2024-03-01 Thread flow gg
From 173072b33d3237b924f3fa342e20558d96a72457 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 2 Mar 2024 08:35:39 +0800 Subject: [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor C908: vp9_hor_8x8_8bpp_c: 74.7 vp9_hor_8x8_8bpp_rvv_i32: 35.7 vp9_hor_16x16_8bpp_c: 175.5 vp9_hor_16x16_8bpp_rvv_i32: 80.2

[FFmpeg-devel] [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm

2024-03-01 Thread flow gg
From 3128765d298f5a44fd13be7b3da2ef88c96083f9 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 2 Mar 2024 09:35:22 +0800 Subject: [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm C908: vp9_tm_4x4_8bpp_c: 116.5 vp9_tm_4x4_8bpp_rvv_i32: 43.5 vp9_tm_8x8_8bpp_c: 416.2 vp9_tm_8x8_8bpp_rvv_i32: 86.0 vp9_tm_

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-02 Thread flow gg
Okay, reduced if/else in the response. Rémi Denis-Courmont 于2024年3月2日周六 17:03写道: > Le lauantaina 2. maaliskuuta 2024, 9.42.06 EET flow gg a écrit : > > > > You would need a lot fewer if/else if you passed the order/bit-width > instead > of the size as macro parameter. >

[FFmpeg-devel] [PATCH 1/2] checkasm/vc1dsp: add mspel_pixels test

2024-03-02 Thread flow gg
From efcb91959cb373145f2fc9fcbfcc6659610172cc Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 1 Mar 2024 19:45:53 +0800 Subject: [PATCH 1/2] checkasm/vc1dsp: add mspel_pixels test --- tests/checkasm/vc1dsp.c | 37 + 1 file changed, 37 insertions(+) diff

[FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-02 Thread flow gg
Here adjusting the order, rather than simply using .rept, will be 13%-24% faster. From 07aa3e2eff0fe1660ac82dec5d06d50fa4c433a4 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 28 Feb 2024 16:32:39 +0800 Subject: [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels vc1dsp.avg_vc1_mspel_pixels_tab[0][0]

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-02 Thread flow gg
Updated with a small improvement in this reply. flow gg wrote on Sat, 2 Mar 2024 at 17:48: > Okay, reduced if/else in the response. > > Rémi Denis-Courmont wrote on Sat, 2 Mar 2024 at 17:03: > >> On Saturday, 2 March 2024, 9.42.06 EET, flow gg wrote: >> > >> >> You would nee

Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert

2024-03-02 Thread flow gg
Due to the PATCH 1/4 update, this patch is updated here. flow gg wrote on Sat, 2 Mar 2024 at 15:42: > > From ed44215bff4cbf0372cd04f87f45a6ba25274564 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 1 Mar 2024 18:38:43 +0800 Subject: [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert C908: vp9_vert_8x8_8bpp_c

Re: [FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor

2024-03-02 Thread flow gg
flow gg 于2024年3月2日周六 15:42写道: > > From 006dcbe723592a3653bceb0d7f8cc3004e05cb05 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 2 Mar 2024 08:35:39 +0800 Subject: [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor C908: vp9_hor_8x8_8bpp_c: 74.7 vp9_hor_8x8_8bpp_rvv_i32: 35.7 vp9_hor_16x16_

Re: [FFmpeg-devel] [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm

2024-03-02 Thread flow gg
Due to the PATCH 1/4 update, updates are made here. flow gg 于2024年3月2日周六 15:42写道: > > From d7aa14940f52b627baf0ae4905e8af6038dc16fc Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 2 Mar 2024 09:35:22 +0800 Subject: [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm C908: vp9_tm_4x4_8bpp_c:

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

2024-03-03 Thread flow gg
am *.patch'. Rémi Denis-Courmont 于2024年3月3日周日 22:39写道: > Le perjantaina 23. helmikuuta 2024, 16.45.46 EET flow gg a écrit : > > > > Looks like this needs rebasing, or otherwise does not apply. > > -- > Rémi Denis-Courmont > http://www.remlab.net/ > > > > _

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-03 Thread flow gg
> Similarly, you can use \restore as a truth value directly: `.if \restore`. Okay FWIW, it seems that you could just as well include func/endfunc inside the macros. Do you mean to generate func/endfunc using macros? Rémi Denis-Courmont 于2024年3月3日周日 22:46写道: > Le sunnuntaina 3. maaliskuuta 20

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-07 Thread flow gg
Updated it in the reply. flow gg wrote on Sun, 3 Mar 2024 at 23:31: > > As noted earlier, I don't understand why you have two size parameters. > It > seems that \size is always either the same as (1 << (\size2 - 1)) a.k.a. > ((1 > << \size2) / 2), or unused. The

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-07 Thread flow gg
uuta 2024, 14.06.13 EET flow gg a écrit : > > Here adjusting the order, rather than simply using .rept, will be 13%-24% > > faster. > > Isn't it also faster to max LMUL for the adds here? > > Also this might not be much noticeable on C908, but avoiding sequential >

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-08 Thread flow gg
Alright, using m8, but for now don't add code to address dependencies in loops that have a minor impact. Updated in the reply Rémi Denis-Courmont 于2024年3月8日周五 17:08写道: > > > Le 8 mars 2024 02:45:46 GMT+02:00, flow gg a > écrit : > >> Isn't it also faste

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

2024-03-17 Thread flow gg
ping flow gg wrote on Sun, 3 Mar 2024 at 23:03: > Sorry, since I did not send the emails all at once, all 4 > patches cannot be applied together with git am *.patch. Instead, first apply the > patch with 'git am '[PATCH] lavc/vp8dsp: R-V V put_vp8_pixels'', and then

[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_epel h

2024-03-21 Thread flow gg
(This should be used after applying these 4 patches) ``` [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h 1-3 ``` From 201274b32ef49fdeb6782498634ed78491a9519a Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 9 Mar 2024 08:41:31

[FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v

2024-03-21 Thread flow gg
From a59509c554a319f8271ad4175da40788445f7a56 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 17:49:54 +0800 Subject: [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v C908: vp8_put_epel4_v4_c: 11.0 vp8_put_epel4_v4_rvv_i32: 5.0 vp8_put_epel4_v6_c: 16.5 vp8_put_epel4_v6_rvv_i32: 6.2 vp8_

[FFmpeg-devel] [PATCH 3/3] lavc/vp8dsp: R-V V put_epel hv

2024-03-21 Thread flow gg
From 278e473681eddaf24977e47c88f715620105c6b3 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 17:50:58 +0800 Subject: [PATCH 3/3] lavc/vp8dsp: R-V V put_epel hv C908: vp8_put_epel4_h4v4_c: 20.0 vp8_put_epel4_h4v4_rvv_i32: 11.0 vp8_put_epel4_h4v6_c: 25.2 vp8_put_epel4_h4v6_rvv_i32

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc

2024-03-21 Thread flow gg
Used macros to shorten the function definitions; updated in this response. flow gg wrote on Thu, 7 Mar 2024 at 19:20: > updated it in the reply > > flow gg wrote on Sun, 3 Mar 2024 at 23:31: > >> > As noted earlier, I don't understand why you have two size parameters. >> It >> seems t

Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert

2024-03-21 Thread flow gg
Because the previous patch was updated, this one is updated in this response. flow gg wrote on Sun, 3 Mar 2024 at 10:01: > Due to the PATCH 1/4 update, updates here. > > flow gg wrote on Sat, 2 Mar 2024 at 15:42: > >> >> From 6feb148e9167e1f0cc6d8a0e9ca701d61222c03e Mon Sep 17 00:00:00 2001 From:

Re: [FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor

2024-03-21 Thread flow gg
Because the previous patch was updated, this one is updated in this response. flow gg wrote on Sun, 3 Mar 2024 at 10:01: > > > flow gg wrote on Sat, 2 Mar 2024 at 15:42: > >> >> From a4672687a10a49702623449e8569d68913e91346 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 21:39:50 +

Re: [FFmpeg-devel] [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm

2024-03-21 Thread flow gg
Because the previous patch was updated, this one is updated in this response. flow gg wrote on Sun, 3 Mar 2024 at 10:01: > Due to the PATCH 1/4 update, updates are made here. > > flow gg wrote on Sat, 2 Mar 2024 at 15:42: > >> >> From 9561d35be25c330a0be3a371269289ce21f5ada3 Mon Sep 17 00:00:00 20

[FFmpeg-devel] [PATCH 1/7] lavc/vp9dsp: R-V mc copy_avg

2024-03-21 Thread flow gg
(This should be used after applying these patches) ``` [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc 1-4 ``` From ea81872215165ff859a0b5b2e003c5c678ea8ed0 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 22:01:18 +0800 Subject: [PATCH 1/7] lavc/vp9dsp: R-V mc copy_avg vp9

[FFmpeg-devel] [PATCH 2/7] lavc/vp9dsp: R-V V mc bilin h

2024-03-21 Thread flow gg
From 7ad03f4bc70e4c334d8e52dce2ea2b6f09a9a244 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 22:11:26 +0800 Subject: [PATCH 2/7] lavc/vp9dsp: R-V V mc bilin h C908: vp9_avg_bilin_4h_8bpp_c: 5.5 vp9_avg_bilin_4h_8bpp_rvv_i64: 2.5 vp9_avg_bilin_8h_8bpp_c: 19.7 vp9_avg_bilin_8h_8bp

[FFmpeg-devel] [PATCH 3/7] lavc/vp9dsp: R-V V mc tap h

2024-03-21 Thread flow gg
The order of some instructions appears imperfect because, when len==32, the registers for operations like hv can only just suffice, making it difficult to adjust. It's possible to create a separate function for len<32, but it likely won't have a significant impact, so this hasn't been done yet. Fro

[FFmpeg-devel] [PATCH 4/7] lavc/vp9dsp: R-V V mc bilin v

2024-03-21 Thread flow gg
From eb004dcf5cc6a3c379cb6cb7b8592afa65626c5c Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 23:00:19 +0800 Subject: [PATCH 4/7] lavc/vp9dsp: R-V V mc bilin v C908: vp9_avg_bilin_4v_8bpp_c: 5.5 vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2 vp9_avg_bilin_8v_8bpp_c: 20.7 vp9_avg_bilin_8v_8bp

[FFmpeg-devel] [PATCH 5/7] lavc/vp9dsp: R-V V mc tap v

2024-03-21 Thread flow gg
From 94aacf6d1d49cc009669f89c91db71038a13285d Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 23:08:01 +0800 Subject: [PATCH 5/7] lavc/vp9dsp: R-V V mc tap v C908: vp9_avg_8tap_smooth_4v_8bpp_c: 13.7 vp9_avg_8tap_smooth_4v_8bpp_rvv_i64: 5.0 vp9_avg_8tap_smooth_8v_8bpp_c: 49.7 vp9

[FFmpeg-devel] [PATCH 6/7] lavc/vp9dsp: R-V V mc bilin hv

2024-03-21 Thread flow gg
From 5df2835fd182378b78530e001669c65f3638946d Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 23:14:10 +0800 Subject: [PATCH 6/7] lavc/vp9dsp: R-V V mc bilin hv C908: vp9_avg_bilin_4hv_8bpp_c: 10.7 vp9_avg_bilin_4hv_8bpp_rvv_i64: 4.5 vp9_avg_bilin_8hv_8bpp_c: 38.7 vp9_avg_bilin_8

[FFmpeg-devel] [PATCH 7/7] lavc/vp9dsp: R-V V mc tap hv

2024-03-21 Thread flow gg
From 5d29de366bab4736b1e05e2167d976d344dd8c44 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Mar 2024 23:21:18 +0800 Subject: [PATCH 7/7] lavc/vp9dsp: R-V V mc tap hv C908: vp9_avg_8tap_smooth_4hv_8bpp_c: 32.2 vp9_avg_8tap_smooth_4hv_8bpp_rvv_i64: 15.2 vp9_avg_8tap_smooth_8hv_8bpp_c: 98.

Re: [FFmpeg-devel] [PATCH 1/7] lavc/vp9dsp: R-V mc copy_avg

2024-03-21 Thread flow gg
It might be a bit inconvenient to find the patches related to vp8, vp9 that were sent earlier. Here, I've placed them in a zip file in this reply flow gg 于2024年3月22日周五 14:03写道: > (This should be used after applying these patches) > > ``` > [FFmpeg-devel] [PATCH 1/4] lavc/vp9ds

[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread flow gg
benchmark: fcmul_add_c: 19.7 fcmul_add_rvv_f32: 6.7 From 6bef2523728a472bb803ce085a1aafdfd624e212 Mon Sep 17 00:00:00 2001 From: h Date: Tue, 26 Sep 2023 15:03:12 +0800 Subject: [PATCH] af_afir: RISC-V V fcmul_add fcmul_add_c: 19.7 fcmul_add_rvv_f32: 6.7 --- libavfilter/af_afirdsp.h | 3
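
As a quick reading aid for the figures above, the speedup is simply the ratio of the two reported timings (in whatever unit the benchmark printed). The function name is hypothetical; for the quoted 19.7 and 6.7 the ratio comes out at roughly 2.9.

```c
#include <assert.h>

/* Ratio of the C reference timing to the optimised timing;
 * values above 1.0 mean the optimised version is faster. */
double speedup(double c_time, double asm_time)
{
    assert(asm_time > 0.0);
    return c_time / asm_time;
}
```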

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread flow gg
Courmont 于2023年9月27日周三 02:44写道: > Le tiistaina 26. syyskuuta 2023, 21.40.12 EEST Paul B Mahol a écrit : > > On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont > wrote: > > > Le tiistaina 26. syyskuuta 2023, 12.24.58 EEST flow gg a écrit : > > > > benchmark: >

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-27 Thread flow gg
] fcmul_add_c: 4.2 fcmul_add_rvv_f32: 4.2 - af_afir.fcmul_add [OK] fcmul_add_c: 4.5 fcmul_add_rvv_f32: 4.2 - af_afir.fcmul_add [OK] fcmul_add_c: 4.7 fcmul_add_rvv_f32: 3.5 Rémi Denis-Courmont 于2023年9月28日周四 00:41写道: > Le tiistaina 26. syyskuuta 2023, 12.24.58 EEST flow gg a écrit : > >

Re: [FFmpeg-devel] [PATCH v5] lavc/vvc_mc: R-V V avg w_avg

2024-07-10 Thread flow gg
function, then vsetvlstatic16 uses max_lmul == m8. If e32 is involved in the function, then vsetvlstatic16 uses max_lmul == m4. I think it is clearer now. Rémi Denis-Courmont 于2024年7月8日周一 23:41写道: > Le maanantaina 1. heinäkuuta 2024, 19.09.01 EEST flow gg a écrit : > > I reviewed it again, th
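
One way to read the m8/m4 rule above: a register group may span at most 8 registers, and elements widened from e16 to e32 occupy a group twice as large, so the e16 multiplier has to leave room for that. The sketch below encodes that interpretation; the function name is hypothetical and this is not the macro from the patch.

```c
#include <assert.h>

/* Largest integer LMUL usable at `sew_bits` when the widest element type
 * touched by the function is `widest_sew_bits` wide. The widened group
 * (LMUL * widest / sew) must not exceed 8 registers. */
unsigned max_lmul_for(unsigned sew_bits, unsigned widest_sew_bits)
{
    assert(sew_bits != 0 && widest_sew_bits >= sew_bits);
    return 8 * sew_bits / widest_sew_bits;
}
```

With only e16 in play this gives 8 (m8); once e32 is involved it gives 4 (m4), matching the two cases described in the message.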

Re: [FFmpeg-devel] [PATCH v2 2/4] lavc/vp8dsp: R-V V loop_filter_simple

2024-07-14 Thread flow gg
> vssseg2e8 > vlsseg4e8 > vwadd.wv > I can't find where VXRM is initialised for that. Updated them and added csrwi. wrote on Mon, 15 Jul 2024 at 00:30: > From: sunyuechi > > C908 X60 > vp8_loop_filter_simple_h_c : 6.2 5.7 > v

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-07-18 Thread flow gg
> Again, I don't think that a maximum multiplier belongs here. If the calling > code cannot scale the multiplier up, then it should be a normal loop providing > the same code for all VLENs. I think it's acceptable to add such a parameter, which isn't particularly common in other files, because thi

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-07-21 Thread flow gg
Okay, updated it. Rémi Denis-Courmont wrote on Fri, 19 Jul 2024 at 23:56: > On Thursday, 18 July 2024, 18.04.15 EEST, flow gg wrote: > > > Again, I don't think that a maximum multiplier belongs here. If the > > > calling code cannot scale the multiplier up, then it sho

Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-07-23 Thread flow gg
> TBH it is very hard to review this due to the large extents of code > conditionals. This should be avoidable at least partly. You can name macros for > each filter and then expand those macros instead of using if's. Do you mean that before the addition of .equ ff_vp9_subpel_filters_xxx, epel_filter

Re: [FFmpeg-devel] [PATCH v4 4/4] lavc/vp9dsp: R-V V mc tap hv

2024-07-23 Thread flow gg
Because of the 3/4 update, updated it. wrote on Tue, 23 Jul 2024 at 16:59: > From: sunyuechi > > C908 X60 > vp9_avg_8tap_smooth_4hv_8bpp_c : 32.0 28.0 > vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32 : 15.0 13.2 > vp9_av

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp8dsp: R-V V 256 bilin,epel

2024-07-30 Thread flow gg
Hi, these four patches have v2 (although the first one seems to be the same). From my understanding, moving from supporting only 128b to adding 256b versions can simultaneously improve LMUL and solve some issues related to insufficient vector registers (vvc, vp9). This can be very helpful in certa

Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-07-31 Thread flow gg
I'm a bit confused because the calculation here goes up to 32 bits and then returns to 8 bits. It seems that the vmax and vnclipu instructions can't be removed by using round-related instructions? Rémi Denis-Courmont 于2024年7月29日周一 23:21写道: > Le tiistaina 23. heinäkuuta 2024, 11.51.48 EEST u...@f
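
For reference, the scalar operation that a vmax (clamp at zero) followed by a rounding, saturating narrowing shift such as vnclipu stands for can be sketched as below. The function name is hypothetical, round-to-nearest-up is assumed as the rounding mode, and the shift of 7 used in the note afterwards is an assumption about the 8-tap filter scaling rather than something stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Round a wide accumulator to nearest (ties up), shift it down and
 * saturate to the unsigned 8-bit range. */
uint8_t round_shift_clip_u8(int32_t sum, unsigned shift)
{
    assert(shift >= 1 && shift < 31);
    /* 64-bit intermediate so that adding the rounding bias cannot overflow. */
    int64_t biased = (int64_t)sum + ((int64_t)1 << (shift - 1));
    if (biased < 0)
        return 0;            /* negative results clamp to 0 (the vmax step) */
    int64_t v = biased >> shift;
    return v > 255 ? 255 : (uint8_t)v;   /* unsigned saturation */
}
```

For example, with a shift of 7 an accumulator of 64 rounds to 1, while 63 rounds to 0; negative sums give 0 and very large sums give 255.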

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp8dsp: R-V V 256 bilin,epel

2024-07-31 Thread flow gg
Denis-Courmont 于2024年7月31日周三 23:06写道: > Le tiistaina 30. heinäkuuta 2024, 20.57.28 EEST flow gg a écrit : > > From my understanding, moving from supporting only 128b to adding 256b > > versions can simultaneously improve LMUL and solve some issues related to > > insufficient

Re: [FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: R-V V mc tap h v

2024-08-01 Thread flow gg
> Use rounding. Updated it and resolved conflicts with master. 于2024年8月1日周四 20:16写道: > From: sunyuechi > > C908 X60 > vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2 > vp9_avg_8tap_smooth_4h_8bpp_rvv_i32:

Re: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V mc bilin h v

2024-08-03 Thread flow gg
> Looks OK, but missing CFI landing pads. Added lpad. wrote on Sat, 3 Aug 2024 at 17:51: > From: sunyuechi > > C908 X60 > vp9_avg_bilin_4h_8bpp_c : 5.5 4.7 > vp9_avg_bilin_4h_8bpp_rvv_i32 : 1.7

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-03 Thread flow gg
Added lpad and resolved conflicts with master. wrote on Sat, 3 Aug 2024 at 18:31: > From: sunyuechi > > C908 X60 > avg_8_2x2_c : 1.2 1.0 > avg_8_2x2_rvv_i32 : 0.7 0.7 >

Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: R-V V mc bilin hv

2024-08-09 Thread flow gg
> That seems suboptimal and unnecessary. Updated it; there is no longer any vmv. wrote on Fri, 9 Aug 2024 at 22:24: > From: sunyuechi > > C908 X60 > vp9_avg_bilin_4hv_8bpp_c : 10.7 9.5 > vp9_avg_bilin_4hv_8bpp_rvv_i32

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-17 Thread flow gg
How can I test the weight and biweight of H.264? I haven't seen the related test code.. tests/checkasm/checkasm --bench --test=h264dsp Rémi Denis-Courmont 于2024年8月15日周四 16:10写道: > > > Le 3 août 2024 13:30:34 GMT+03:00, u...@foxmail.com a écrit : > >From: sunyuechi > > > >

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-18 Thread flow gg
I wrote `ff_vvc_w_avg_8_rvv` by mimicking the h264 weight function. Based on the test results for 49 different resolutions, most of them were significantly slower. Only 2x32 and 2x64 had similar performance, without noticeable speed improvement. I'm not sure about the reason. Some differences ar

Re: [FFmpeg-devel] [PATCH 1/2] lavc/vp9dsp: R-V V mc tap h v

2024-08-25 Thread flow gg
> Does not assemble with binutils 2.43.1 and default flags. Fixed through zve32x -> zve32x, zba 于2024年8月25日周日 19:40写道: > From: sunyuechi > > C908 X60 > vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2 > vp9_avg_8tap_smoot

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-08-27 Thread flow gg
Updated: zve32x -> zve32x, zbb, zba. wrote on Wed, 28 Aug 2024 at 14:37: > From: sunyuechi > > C908 X60 > avg_8_2x2_c : 1.2 1.0 > avg_8_2x2_rvv_i32 : 0.7 0.7 > avg_8_2x4_

Re: [FFmpeg-devel] [PATCH 1/2] lavc/vp9dsp: R-V V mc tap h v

2024-08-27 Thread flow gg
It seems that the previous patch partially lacked the RVB check, but now it has if (flags & AV_CPU_FLAG_RVB). Rémi Denis-Courmont wrote on Wed, 28 Aug 2024 at 03:00: > On Sunday, 25 August 2024, 14.41.22 EEST, flow gg wrote: > > > Does not assemble with binutils 2.43.1 a

Re: [FFmpeg-devel] [PATCH] lavc/vvc_mc: R-V V avg w_avg

2024-09-12 Thread flow gg
ping flow gg 于2024年8月28日周三 14:38写道: > Updated: zve32x -> zve32x, zbb, zba > > 于2024年8月28日周三 14:37写道: > >> From: sunyuechi >> >> C908 X60 >> avg_8_2x2_c
