I don't understand what you mean. What does checking whether the type is
'h' or 'v' have to do with a number?
Rémi Denis-Courmont wrote on Wed, May 8, 2024 at 00:00:
> On Monday, 6 May 2024, 6.38.02 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp8_put_bilin4_h_c: 367.
> h is not a number so that's not a valid condition.
Fixed two instances of this issue.
wrote on Wed, May 8, 2024 at 00:55:
> From: sunyuechi
>
> C908:
> vp8_put_bilin4_h_c: 367.0
> vp8_put_bilin4_h_rvv_i32: 137.7
> vp8_put_bilin4_v_c: 377.0
> vp8_put_bilin4_v_rvv_i32: 137.7
> vp8_put_bilin8_h_c: 1431.0
> vp8_put_bili
> Do you gain much by unrolling all the way to 16x? Given that you have the
> counter value already in t0, it should not make much difference to just unroll
> 2x or maybe 4x and then loop.
I chose this simple method because I think the effect is about the same.
Do I need to change it?
> It might
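A rough C sketch of the trade-off raised in the quoted question, not the assembly from the patch: rather than unrolling a fixed-height block all the way, unroll 4x and let the row counter drive a loop. The function name `copy_rows_unroll4` and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy h rows of `width` bytes, 4 rows per iteration.
 * Assumes h is a positive multiple of 4. */
static inline void copy_rows_unroll4(uint8_t *dst, const uint8_t *src,
                                     ptrdiff_t stride, size_t width, int h)
{
    assert(h > 0 && h % 4 == 0);
    do {
        memcpy(dst,              src,              width);
        memcpy(dst + stride,     src + stride,     width);
        memcpy(dst + 2 * stride, src + 2 * stride, width);
        memcpy(dst + 3 * stride, src + 3 * stride, width);
        dst += 4 * stride;
        src += 4 * stride;
        h   -= 4;               /* the row counter drives the loop */
    } while (h > 0);
}
```

Whether 2x, 4x or 16x is fastest depends on the core; the sketch only shows the structure being discussed.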
Hi, I got a BananaPi F3, made some fixes, and updated the patch in this reply.
Rémi Denis-Courmont wrote on Mon, May 6, 2024 at 03:26:
> On Sunday, 5 May 2024, 12.18.56 EEST, flow gg wrote:
> > > Does MF2 actually improve perfs over M1 here?
> >
> > The difference here seems very small, but
:
> On Friday, 10 May 2024, 11.22.53 EEST, flow gg wrote:
> > Hi, I got BananaPi F3, made some fixes, updated in reply
>
> So... Does it benefit from halving the logical multiplier to process
> fixed-sized
> block as compared to C908, or can we stick to the same code r
The patch `lavc/vp9dsp: R-V ipred vert` needs to add `#if HAVE_RV`. How
about I modify these `#if HAVE_RVV` indentations together in this patch?
Rémi Denis-Courmont wrote on Sat, May 11, 2024 at 00:39:
> ---
> libavcodec/riscv/vp9dsp_init.c | 50 +-
> 1 file changed, 25 insert
Okay, updated it in the reply
Rémi Denis-Courmont wrote on Fri, May 10, 2024 at 23:41:
> On Tuesday, 7 May 2024, 19.54.09 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp8_put_epel4_h4v4_c: 20.0
> > vp8_put_epel4_h4v4_rvv_i32: 11.0
> > vp8_put_epel4_h4v6_c: 25.2
> > vp8_put_e
On banana_f3, further reducing the value of mf resulted in another
performance improvement. I think in the end we might need to use different
functions depending on vlen in init.
Rémi Denis-Courmont wrote on Sat, May 11, 2024 at 18:24:
> Le lauantaina 11. toukokuuta 2024, 13.02.02 EEST flow gg a éc
Wow, got it
Rémi Denis-Courmont wrote on Sat, May 11, 2024 at 22:39:
> On Monday, 6 May 2024, 6.38.01 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp8_put_pixels4_c: 78.0
> > vp8_put_pixels4_rvi: 33.7
> > vp8_put_pixels8_c: 278.0
> > vp8_put_pixels8_rvi: 55.0
> > vp8_put_
> It should be possible to improve ordering to avoid immediate dependency
> from ADD to SD
Okay, updated it.
Additionally improved the mc-tap_64 on vlen>=256 and something
wrote on Sun, May 12, 2024 at 18:04:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvi: 15.7
> vp9_vert_16x
It seems like it can't... update using AV_CPU_FLAG_RV_MISALIGNED
Rémi Denis-Courmont wrote on Sun, May 12, 2024 at 19:48:
> On Friday, 10 May 2024, 11.21.14 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908 X60
> > vc1dsp.avg_
just rebase
wrote on Tue, May 14, 2024 at 01:00:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvi: 15.7
> vp9_vert_16x16_8bpp_c: 71.2
> vp9_vert_16x16_8bpp_rvi: 39.0
> vp9_vert_32x32_8bpp_c: 300.2
> vp9_vert_32x32_8bpp_rvi: 135.2
> ---
> libavcodec/riscv/Makefile | 1 +
I am locally using:
if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {
this performs better on k230/banana_f3 than C.
For email, refer to [FFmpeg-devel] [PATCH 2/2] lavc/vp8dsp: restrict RVI
optimisations and change it to
if (bpp == 8 && (flags & AV_CPU_FLAG_RV_MISALIGNED)) {
So no output, but I
I am locally using:
if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags &
AV_CPU_FLAG_RVB_ADDR)) {
this performs better on k230/banana_f3 than C.
For email, refer to [FFmpeg-devel] [PATCH 2/2] lavc/vp8dsp: restrict RVI
optimisations and change it to
if (bpp == 8 && (flags & AV_CPU_FLAG_RV_M
Using `if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {` will give output.
Did you comment out the MISALIGNED flag check but not add RVI, resulting in
no output?
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:02:
> On Tuesday, 14 May 2024, 7.44.55 EEST, flow gg wrote:
> >
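To illustrate why the choice of flag in the init condition decides whether the benchmark prints anything, here is a minimal C sketch. The `DEMO_FLAG_*` constants and `use_rvi_path` are hypothetical stand-ins, not the real `AV_CPU_FLAG_*` values or any FFmpeg function. Presumably the optimised function is only installed, and therefore only benchmarked, when every required flag is reported by the CPU.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the CPU flags named in the thread;
 * the real definitions and values live in FFmpeg, not here. */
enum {
    DEMO_FLAG_RVI           = 1 << 0,
    DEMO_FLAG_RVB_ADDR      = 1 << 1,
    DEMO_FLAG_RV_MISALIGNED = 1 << 2,
};

/* Returns true when the 8-bit optimised path would be selected,
 * i.e. when all bits in `required` are present in `flags`. */
static inline bool use_rvi_path(int bpp, int flags, int required)
{
    return bpp == 8 && (flags & required) == required;
}
```

With a CPU that reports only the RVI stand-in, requiring the MISALIGNED stand-in selects nothing, which would match the "no output" observation; requiring only RVI selects the path.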
Okay, learned it
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:00:
> On Tuesday, 14 May 2024, 7.45.29 EEST, flow gg wrote:
> > I am locally using:
> > if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags &
> > AV_CPU_FLAG_RVB_ADDR)) {
>
>
Why is it unnecessary to reset the vector configuration every time? I think
it is necessary to reset e16/e8 each time.
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:46:
> On Monday, 13 May 2024, 19.59.21 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_tm_4x4_8bp
in the reply
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 02:08:
> On Tuesday, 14 May 2024, 20.57.17 EEST, flow gg wrote:
> > Why is it unnecessary to reset the vector configuration every time? I think
> > it is necessary to reset e16/e8 each time.
>
> I misread the p
Updated for cleaner code.
wrote on Wed, May 15, 2024 at 11:56:
> From: sunyuechi
>
> C908:
> vp9_tm_4x4_8bpp_c: 116.5
> vp9_tm_4x4_8bpp_rvv_i32: 43.5
> vp9_tm_8x8_8bpp_c: 416.2
> vp9_tm_8x8_8bpp_rvv_i32: 86.0
> vp9_tm_16x16_8bpp_c: 1665.5
> vp9_tm_16x16_8bpp_rvv_i32: 187.2
> vp9_tm_32x32_8bpp_c: 6974.2
> vp9_tm
Is the test result missing here?
Rémi Denis-Courmont wrote on Thu, May 16, 2024 at 01:11:
> ---
> libavcodec/riscv/Makefile | 1 +
> libavcodec/riscv/h264dsp_init.c | 5
> libavcodec/riscv/startcode_rvv.S | 44
> libavcodec/riscv/vc1dsp_init.c | 16 +++---
checkasm in [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add
rv34_inv_transform_dc test
From 1aa51d60def8d4313c1b11a50528662ec832530e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 13 Feb 2024 08:41:20 +0800
Subject: [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp
This asm
ok, updated it in the reply
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:49:
> On Friday, 2 February 2024, 3.14.39 EET, flow gg wrote:
> > Ok, updated it in the reply
>
> Sorry I meant directive, not macro. .rept is just fine here.
>
> --
> レミ・デニ-クールモン
I tested this in '[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'. The
logic here is the same: using vext reduces the number of vset instructions,
making it a bit faster.
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:46:
> On Wednesday, 31 January 2024, 19.58.55 EET, flow gg wrote:
> > Fixe
xxx_idct_dc_add is quite similar: vext reduces the number of vset
instructions, so it is a bit faster than using vwadd. This was tested in
'[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'.
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:53:
> Hi,
>
> I think you can use vwadd here?
>
> --
> Rémi Denis-Courmont
> ht
Okay, updated it in the reply
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:54:
> Hi,
>
> To avoid repeating the code, you can either use .rept or .irp. You can even
> use assembler conditionals to elide the redundant code on the last
> iteration.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
> _
I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly
rv34_inv_transform_dc in rv34dsp"
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:37:
> On Friday, 2 February 2024, 2.47.16 EET, flow gg wrote:
> > It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm
Thank you for your guidance. Do you mean that the test should be modified
like this?
- declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc);
+ declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *, ptrdiff_t, int);
I tried it this way, but the test still failed. I'm not sure why.
_
I made a mistake. It can be fixed your way. Please ignore this reply.
flow gg wrote on Tue, Feb 13, 2024 at 17:47:
> Thank you for your guidance. Do you mean that the test should be modified
> like this?
>
> - declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc);
> + declare_func_emms(
It was due to testing, not MMX. Fixed it in this reply.
flow gg wrote on Tue, Feb 13, 2024 at 10:37:
> I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly
> rv34_inv_transform_dc in rv34dsp"
>
> Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:37:
>
>> Le perjantaina 2. helmi
ping
flow gg wrote on Tue, Jan 30, 2024 at 00:22:
> > I expect that it would be faster to make one large load, and then 4 small
> > stores, but that might work only for exactly 128-bit vectors?
>
> This seems to require vle128, so I didn't modify it.
>
> > That's not
The reason for using m1+le8 instead of stride load + larger group
multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V
V pix_abs."
In the test, there is
#define src (buf + 2 * SRC_BUF_STRIDE + 2 + 1)
Therefore, not using e8 results in: (fatal signal 7: Bus error).
From 6d
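The alignment argument above can be sketched in C. This is only an illustration: `DEMO_SRC_BUF_STRIDE`, `demo_buf`, `DEMO_SRC`, `is_aligned` and `load_u32_unaligned` are hypothetical names, and the stride value is an assumption (the real `SRC_BUF_STRIDE` is not shown in the message). With an aligned buffer and an even stride, the trailing `+ 1` makes the source pointer odd, so any access wider than one byte is misaligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SRC_BUF_STRIDE 32   /* assumed value, for illustration only */

static _Alignas(16) uint8_t demo_buf[8 * DEMO_SRC_BUF_STRIDE];

/* Same shape as the src macro quoted above: the trailing "+ 1" makes it odd. */
#define DEMO_SRC (demo_buf + 2 * DEMO_SRC_BUF_STRIDE + 2 + 1)

/* True when p is a multiple of n bytes. */
static inline bool is_aligned(const void *p, size_t n)
{
    assert(n != 0);
    return (uintptr_t)p % n == 0;
}

/* Portable unaligned 32-bit load: memcpy leaves the choice of a safe
 * access sequence to the compiler. */
static inline uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Byte-wide (e8) vector loads avoid the problem in the same way that the byte-wise copy does here, at the cost of not using wider element accesses.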
llo,
>
> On Monday, 19 February 2024, 13.13.43 EET, flow gg wrote:
> > The reason for using m1+le8 instead of stride load + larger group
> > multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V
> > V pix_abs."
> >
> >
: asm=917745 c=3865
Rémi Denis-Courmont wrote on Thu, Feb 22, 2024 at 02:07:
> On Tuesday, 6 February 2024, 17.56.32 EET, flow gg wrote:
> >
>
> Did you try to compute integral absolute values with the ad-hoc (floating
> point) instruction instead of vneg/vmax? It should work since the
Okay, updated it in the reply
Rémi Denis-Courmont wrote on Thu, Feb 22, 2024 at 23:20:
> On Tuesday, 6 February 2024, 17.56.59 EET, flow gg wrote:
> >
>
> Use 'static' functions where possible.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>
From b773a2b640ba38a106539da7f3414d6892364c4f Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 23 Feb 2024 13:27:42 +0800
Subject: [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h
C908:
vp8_put_bilin4_h_c: 373.5
vp8_put_bilin4_h_rvv_i32: 158.7
vp8_put_bilin8_h_c: 1437.7
vp8_put_bilin8_h_rvv_i32: 31
From 488d0cd6645b2c6936c3298e010615facb6d0bd0 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 23 Feb 2024 22:35:01 +0800
Subject: [PATCH 2/3] lavc/vp8dsp: R-V V put_bilin_v
C908:
vp8_put_bilin4_v_c: 383.5
vp8_put_bilin4_v_rvv_i32: 139.7
vp8_put_bilin8_v_c: 1455.7
vp8_put_bilin8_v_rvv_i32: 29
From e1a01b1e0a365935868d7825d53c7cc64e2c1787 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 23 Feb 2024 22:35:23 +0800
Subject: [PATCH 3/3] lavc/vp8dsp: R-V V put_bilin_hv
C908:
vp8_put_bilin4_hv_c: 567.7
vp8_put_bilin4_hv_rvv_i32: 255.7
vp8_put_bilin8_hv_c: 2169.5
vp8_put_bilin8_hv_rvv_i3
.ifc \len,4
-vsetivli zero, 5, e8, mf2, ta, ma
+vsetivli zero, 5, e8, m1, ta, ma
.elseif \len == 8
vsetivli zero, 9, e8, m1, ta, ma
.else
@@ -112,9 +112,9 @@ endfunc
vslide1down.vx v2, \dst, t5
.ifc \len,4
-vsetivli zero, 4
>
> On 24 February 2024 03:07:36 GMT+02:00, flow gg wrote:
> > .ifc \len,4
> >-vsetivli zero, 5, e8, mf2, ta, ma
> >+vsetivli zero, 5, e8, m1, ta, ma
> > .elseif \len == 8
> > vsetivli zero, 9, e8, m1,
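As background for the `mf2` versus `m1` choice in the diff above, the smallest LMUL that can hold a given element count follows from VLMAX = LMUL * VLEN / SEW. The C sketch below is only a worked calculation; `min_lmul_x8` is a hypothetical helper, and it ignores the additional constraint that ties fractional LMUL to ELEN.

```c
#include <assert.h>

/* Returns the smallest LMUL, expressed in eighths, that holds n_elems
 * elements of sew_bits each on a machine with vlen_bits-wide registers:
 * 1 = mf8, 2 = mf4, 4 = mf2, 8 = m1, 16 = m2, 32 = m4, 64 = m8.
 * Returns 0 when even m8 is too small.
 * Assumption: the SEW/ELEN limit on fractional LMUL is not modelled. */
static inline int min_lmul_x8(int vlen_bits, int sew_bits, int n_elems)
{
    assert(vlen_bits > 0 && sew_bits > 0 && n_elems > 0);
    for (int lmul_x8 = 1; lmul_x8 <= 64; lmul_x8 *= 2) {
        /* VLMAX = LMUL * VLEN / SEW, with LMUL = lmul_x8 / 8 */
        int vlmax = vlen_bits * lmul_x8 / (8 * sew_bits);
        if (vlmax >= n_elems)
            return lmul_x8;
    }
    return 0;
}
```

With VLEN = 128, five e8 elements fit in mf2 and nine need m1; with VLEN = 256 the same counts fit in mf4 and mf2. The smallest LMUL that fits is not necessarily the fastest on a given core, which is what the benchmarks in the thread are measuring.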
From 54d784dfd5d0d04456164f250766a3620d42c8c2 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 26 Feb 2024 14:42:17 +0800
Subject: [PATCH 1/3] lavc/vp9dsp: R-V V ipred vert
C908
vp9_vert_16x16_8bpp_c: 80.2
vp9_vert_16x16_8bpp_rvv_i32: 55.7
vp9_vert_32x32_8bpp_c: 308.2
vp9_vert_32x32_8bpp_rvv_
From e791fada3a4777fae87dec806c0b46b595d265db Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 27 Feb 2024 00:06:25 +0800
Subject: [PATCH 2/3] lavc/vp9dsp: R-V V ipred hor
C908:
vp9_hor_4x4_8bpp_c: 37.7
vp9_hor_4x4_8bpp_rvv_i32: 33.7
vp9_hor_8x8_8bpp_c: 82.7
vp9_hor_8x8_8bpp_rvv_i32: 51.5
vp9
From 1a83f04530e3c299b28bd56dd10694aaa6b963d7 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 27 Feb 2024 00:07:08 +0800
Subject: [PATCH 3/3] lavc/vp9dsp: R-V V ipred dc dc_left dc_top
C908:
vp9_dc_16x16_8bpp_c: 117.0
vp9_dc_16x16_8bpp_rvv_i32: 81.7
vp9_dc_32x32_8bpp_c: 373.2
vp9_dc_32x32_8b
Found some problems. I'll come back to modify this later (to avoid
wasting time on this now).
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-deve
please ignore this, updated in "[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V
V ipred dc"
flow gg wrote on Tue, Feb 27, 2024 at 00:19:
>
>
From adaae06a3e18bccec1772a3134334cbea652ae77 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 26 Feb 2024 14:42:17 +0800
Subject: [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc
C908:
vp9_dc_8x8_8bpp_c: 46.0
vp9_dc_8x8_8bpp_rvv_i64: 41.0
vp9_dc_16x16_8bpp_c: 109.2
vp9_dc_16x16_8bpp_rvv_i32: 72.7
vp9
From 7abd262daa281cee412a905ea75a5f10dd0b1fbe Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 1 Mar 2024 18:38:43 +0800
Subject: [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert
C908:
vp9_vert_8x8_8bpp_c: 22.0
vp9_vert_8x8_8bpp_rvv_i64: 18.5
vp9_vert_16x16_8bpp_c: 71.2
vp9_vert_16x16_8bpp_rvv_i32:
From 173072b33d3237b924f3fa342e20558d96a72457 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 08:35:39 +0800
Subject: [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor
C908:
vp9_hor_8x8_8bpp_c: 74.7
vp9_hor_8x8_8bpp_rvv_i32: 35.7
vp9_hor_16x16_8bpp_c: 175.5
vp9_hor_16x16_8bpp_rvv_i32: 80.2
From 3128765d298f5a44fd13be7b3da2ef88c96083f9 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 09:35:22 +0800
Subject: [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm
C908:
vp9_tm_4x4_8bpp_c: 116.5
vp9_tm_4x4_8bpp_rvv_i32: 43.5
vp9_tm_8x8_8bpp_c: 416.2
vp9_tm_8x8_8bpp_rvv_i32: 86.0
vp9_tm_
Okay, reduced if/else in the response.
Rémi Denis-Courmont wrote on Sat, Mar 2, 2024 at 17:03:
> On Saturday, 2 March 2024, 9.42.06 EET, flow gg wrote:
> >
>
> You would need a lot fewer if/else if you passed the order/bit-width instead
> of the size as macro parameter.
>
From efcb91959cb373145f2fc9fcbfcc6659610172cc Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 1 Mar 2024 19:45:53 +0800
Subject: [PATCH 1/2] checkasm/vc1dsp: add mspel_pixels test
---
tests/checkasm/vc1dsp.c | 37 +
1 file changed, 37 insertions(+)
diff
Adjusting the instruction order here, rather than simply using .rept, is
13%-24% faster.
From 07aa3e2eff0fe1660ac82dec5d06d50fa4c433a4 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 28 Feb 2024 16:32:39 +0800
Subject: [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]
Updated with a small improvement in this reply.
flow gg wrote on Sat, Mar 2, 2024 at 17:48:
> Okay, reduced if/else in the response.
>
> Rémi Denis-Courmont wrote on Sat, Mar 2, 2024 at 17:03:
>
>> On Saturday, 2 March 2024, 9.42.06 EET, flow gg wrote:
>> >
>>
>> You would nee
Because PATCH 1/4 was updated, this patch is updated here as well.
flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>
From ed44215bff4cbf0372cd04f87f45a6ba25274564 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 1 Mar 2024 18:38:43 +0800
Subject: [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert
C908:
vp9_vert_8x8_8bpp_c
flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>
From 006dcbe723592a3653bceb0d7f8cc3004e05cb05 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 08:35:39 +0800
Subject: [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor
C908:
vp9_hor_8x8_8bpp_c: 74.7
vp9_hor_8x8_8bpp_rvv_i32: 35.7
vp9_hor_16x16_
Due to the PATCH 1/4 update, updates are made here.
flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>
From d7aa14940f52b627baf0ae4905e8af6038dc16fc Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 09:35:22 +0800
Subject: [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm
C908:
vp9_tm_4x4_8bpp_c:
am *.patch'.
Rémi Denis-Courmont wrote on Sun, Mar 3, 2024 at 22:39:
> On Friday, 23 February 2024, 16.45.46 EET, flow gg wrote:
> >
>
> Looks like this needs rebasing, or otherwise does not apply.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>
>
>
> _
> Similarly, you can use \restore as a truth value directly: `.if \restore`.
Okay
> FWIW, it seems that you could just as well include func/endfunc inside the
> macros.
Do you mean to generate func/endfunc using macros?
Rémi Denis-Courmont wrote on Sun, Mar 3, 2024 at 22:46:
> Le sunnuntaina 3. maaliskuuta 20
updated it in the reply
flow gg wrote on Sun, Mar 3, 2024 at 23:31:
> > As noted earlier, I don't understand why you have two size parameters. It
> > seems that \size is always either the same as (1 << (\size2 - 1)) a.k.a.
> > ((1 << \size2) / 2), or unused. The
uuta 2024, 14.06.13 EET flow gg a écrit :
> > Here adjusting the order, rather than simply using .rept, will be 13%-24%
> > faster.
>
> Isn't it also faster to max LMUL for the adds here?
>
> Also this might not be much noticeable on C908, but avoiding sequential
>
Alright, now using m8, but for now I have not added code to address the
dependencies in loops, since they have only a minor impact. Updated in the reply.
Rémi Denis-Courmont wrote on Fri, Mar 8, 2024 at 17:08:
>
>
> On 8 March 2024 02:45:46 GMT+02:00, flow gg wrote:
> >> Isn't it also faste
ping
flow gg wrote on Sun, Mar 3, 2024 at 23:03:
> Sorry, since I did not send the emails all at once, the 4 patches cannot all
> be applied together with git am *.patch. Instead, one needs to first apply the
> patch with 'git am '[PATCH] lavc/vp8dsp: R-V V put_vp8_pixels'', and then
>
(This should be used after applying these 4 patches)
```
[FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels
[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h
1-3
```
From 201274b32ef49fdeb6782498634ed78491a9519a Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 9 Mar 2024 08:41:31
From a59509c554a319f8271ad4175da40788445f7a56 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 17:49:54 +0800
Subject: [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v
C908:
vp8_put_epel4_v4_c: 11.0
vp8_put_epel4_v4_rvv_i32: 5.0
vp8_put_epel4_v6_c: 16.5
vp8_put_epel4_v6_rvv_i32: 6.2
vp8_
From 278e473681eddaf24977e47c88f715620105c6b3 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 17:50:58 +0800
Subject: [PATCH 3/3] lavc/vp8dsp: R-V V put_epel hv
C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
vp8_put_epel4_h4v6_rvv_i32
Used macros to shorten the function definitions; updated in this response.
flow gg wrote on Thu, Mar 7, 2024 at 19:20:
> updated it in the reply
>
> flow gg wrote on Sun, Mar 3, 2024 at 23:31:
>
>> > As noted earlier, I don't understand why you have two size parameters. It
>> seems t
Because the previous patch was updated, this one is updated in this response as well.
flow gg wrote on Sun, Mar 3, 2024 at 10:01:
> Because PATCH 1/4 was updated, this patch is updated here as well.
>
> flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>>
>>
From 6feb148e9167e1f0cc6d8a0e9ca701d61222c03e Mon Sep 17 00:00:00 2001
From:
Because the previous patch was updated, this one is updated in this response as well.
flow gg wrote on Sun, Mar 3, 2024 at 10:01:
>
>
> flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>>
>>
From a4672687a10a49702623449e8569d68913e91346 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 21:39:50 +
Because the previous patch was updated, this one is updated in this response as well.
flow gg wrote on Sun, Mar 3, 2024 at 10:01:
> Due to the PATCH 1/4 update, updates are made here.
>
> flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>>
>>
From 9561d35be25c330a0be3a371269289ce21f5ada3 Mon Sep 17 00:00:00 20
(This should be used after applying these patches)
```
[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc
1-4
```
From ea81872215165ff859a0b5b2e003c5c678ea8ed0 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 22:01:18 +0800
Subject: [PATCH 1/7] lavc/vp9dsp: R-V mc copy_avg
vp9
From 7ad03f4bc70e4c334d8e52dce2ea2b6f09a9a244 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 22:11:26 +0800
Subject: [PATCH 2/7] lavc/vp9dsp: R-V V mc bilin h
C908:
vp9_avg_bilin_4h_8bpp_c: 5.5
vp9_avg_bilin_4h_8bpp_rvv_i64: 2.5
vp9_avg_bilin_8h_8bpp_c: 19.7
vp9_avg_bilin_8h_8bp
The order of some instructions appears imperfect because, when len==32, the
registers for operations like hv can only just suffice, making it difficult
to adjust.
It's possible to create a separate function for len<32, but it likely won't
have a significant impact, so this hasn't been done yet.
Fro
From eb004dcf5cc6a3c379cb6cb7b8592afa65626c5c Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:00:19 +0800
Subject: [PATCH 4/7] lavc/vp9dsp: R-V V mc bilin v
C908:
vp9_avg_bilin_4v_8bpp_c: 5.5
vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2
vp9_avg_bilin_8v_8bpp_c: 20.7
vp9_avg_bilin_8v_8bp
From 94aacf6d1d49cc009669f89c91db71038a13285d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:08:01 +0800
Subject: [PATCH 5/7] lavc/vp9dsp: R-V V mc tap v
C908:
vp9_avg_8tap_smooth_4v_8bpp_c: 13.7
vp9_avg_8tap_smooth_4v_8bpp_rvv_i64: 5.0
vp9_avg_8tap_smooth_8v_8bpp_c: 49.7
vp9
From 5df2835fd182378b78530e001669c65f3638946d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:14:10 +0800
Subject: [PATCH 6/7] lavc/vp9dsp: R-V V mc bilin hv
C908:
vp9_avg_bilin_4hv_8bpp_c: 10.7
vp9_avg_bilin_4hv_8bpp_rvv_i64: 4.5
vp9_avg_bilin_8hv_8bpp_c: 38.7
vp9_avg_bilin_8
From 5d29de366bab4736b1e05e2167d976d344dd8c44 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:21:18 +0800
Subject: [PATCH 7/7] lavc/vp9dsp: R-V V mc tap hv
C908:
vp9_avg_8tap_smooth_4hv_8bpp_c: 32.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i64: 15.2
vp9_avg_8tap_smooth_8hv_8bpp_c: 98.
It might be a bit inconvenient to find the patches related to vp8 and vp9 that
were sent earlier, so I've placed them in a zip file in this reply.
flow gg wrote on Fri, Mar 22, 2024 at 14:03:
> (This should be used after applying these patches)
>
> ```
> [FFmpeg-devel] [PATCH 1/4] lavc/vp9ds
benchmark:
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
From 6bef2523728a472bb803ce085a1aafdfd624e212 Mon Sep 17 00:00:00 2001
From: h
Date: Tue, 26 Sep 2023 15:03:12 +0800
Subject: [PATCH] af_afir: RISC-V V fcmul_add
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
---
libavfilter/af_afirdsp.h | 3
Courmont wrote on Wed, Sep 27, 2023 at 02:44:
> On Tuesday, 26 September 2023, 21.40.12 EEST, Paul B Mahol wrote:
> > On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont
> wrote:
> > > On Tuesday, 26 September 2023, 12.24.58 EEST, flow gg wrote:
> > > > benchmark:
>
]
fcmul_add_c: 4.2
fcmul_add_rvv_f32: 4.2
- af_afir.fcmul_add [OK]
fcmul_add_c: 4.5
fcmul_add_rvv_f32: 4.2
- af_afir.fcmul_add [OK]
fcmul_add_c: 4.7
fcmul_add_rvv_f32: 3.5
Rémi Denis-Courmont wrote on Thu, Sep 28, 2023 at 00:41:
> On Tuesday, 26 September 2023, 12.24.58 EEST, flow gg wrote:
> >
function, then vsetvlstatic16 uses max_lmul == m8.
If e32 is involved in the function, then vsetvlstatic16 uses max_lmul == m4.
I think it is clearer now.
Rémi Denis-Courmont wrote on Mon, Jul 8, 2024 at 23:41:
> On Monday, 1 July 2024, 19.09.01 EEST, flow gg wrote:
> > I reviewed it again, th
> vssseg2e8
> vlsseg4e8
> vwadd.wv
> I can't find where VXRM is initialised for that.
Updated them and added csrwi.
wrote on Mon, Jul 15, 2024 at 00:30:
> From: sunyuechi
>
> C908 X60
> vp8_loop_filter_simple_h_c : 6.2 5.7
> v
> Again, I don't think that a maximum multiplier belongs here. If the calling
> code cannot scale the multiplier up, then it should be a normal loop providing
> the same code for all VLENs.
I think it's acceptable to add such a parameter, which isn't particularly
common in other files, because thi
Okay, updated it
Rémi Denis-Courmont wrote on Fri, Jul 19, 2024 at 23:56:
> On Thursday, 18 July 2024, 18.04.15 EEST, flow gg wrote:
> > > Again, I don't think that a maximum multiplier belongs here. If the
> > > calling code cannot scale the multiplier up, then it sho
> TBH it is very hard to review this due to the large extents of code
> conditionals. This should be avoidable at least partly. You can name macros for
> each filter and then expand those macros instead of using if's.
Do you mean that before the addition of .equ ff_vp9_subpel_filters_xxx,
epel_filter
Because of the 3/4 update, updated it.
wrote on Tue, Jul 23, 2024 at 16:59:
> From: sunyuechi
>
> C908 X60
> vp9_avg_8tap_smooth_4hv_8bpp_c : 32.0 28.0
> vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32 : 15.0 13.2
> vp9_av
Hi, these four patches have v2 (although the first one seems to be the
same).
From my understanding, moving from supporting only 128b to adding 256b
versions can simultaneously improve LMUL and solve some issues related to
insufficient vector registers (vvc, vp9).
This can be very helpful in certa
I'm a bit confused because the calculation here goes up to 32 bits and then
returns to 8 bits. It seems that the vmax and vnclipu instructions can't be
removed by using round-related instructions?
Rémi Denis-Courmont wrote on Mon, Jul 29, 2024 at 23:21:
> Le tiistaina 23. heinäkuuta 2024, 11.51.48 EEST u...@f
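For readers following the rounding question, here is a scalar C model of the clamp-then-narrow step being discussed: a maximum against zero followed by an unsigned narrowing clip with round-to-nearest-up. `narrow_clip_u8` is a hypothetical name, and collapsing 32 bits to 8 bits in one step is a simplification, since vnclipu narrows from 2*SEW to SEW and real code would narrow in two steps.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model, under the assumptions stated above:
 *   1. clamp negative values to zero (the role of vmax with 0),
 *   2. shift right with round-to-nearest-up (fixed-point rounding mode rnu),
 *   3. saturate to the unsigned 8-bit range (the role of vnclipu). */
static inline uint8_t narrow_clip_u8(int32_t v, unsigned shift)
{
    assert(shift < 32);
    if (v < 0)
        v = 0;
    uint32_t u = (uint32_t)v;
    if (shift > 0)
        u = (u + (1u << (shift - 1))) >> shift;   /* add half, then shift */
    return u > 255 ? 255 : (uint8_t)u;
}
```

The model shows why the clamp matters: the narrowing clip treats its input as unsigned, so a negative intermediate would otherwise saturate to 255 instead of 0.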
Denis-Courmont wrote on Wed, Jul 31, 2024 at 23:06:
> On Tuesday, 30 July 2024, 20.57.28 EEST, flow gg wrote:
> > From my understanding, moving from supporting only 128b to adding 256b
> > versions can simultaneously improve LMUL and solve some issues related to
> > insufficient
> Use rounding.
Updated it and resolved conflicts with master.
wrote on Thu, Aug 1, 2024 at 20:16:
> From: sunyuechi
>
> C908 X60
> vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2
> vp9_avg_8tap_smooth_4h_8bpp_rvv_i32:
> Looks OK, but missing CFI landing pads.
Added lpad.
wrote on Sat, Aug 3, 2024 at 17:51:
> From: sunyuechi
>
> C908 X60
> vp9_avg_bilin_4h_8bpp_c : 5.5 4.7
> vp9_avg_bilin_4h_8bpp_rvv_i32 : 1.7
Added lpad and resolved conflicts with master.
wrote on Sat, Aug 3, 2024 at 18:31:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_c : 1.2 1.0
> avg_8_2x2_rvv_i32 : 0.7 0.7
>
> That seems suboptimal and unnecessary.
Updated it, there is no longer any vmv.
wrote on Fri, Aug 9, 2024 at 22:24:
> From: sunyuechi
>
> C908 X60
> vp9_avg_bilin_4hv_8bpp_c : 10.7 9.5
> vp9_avg_bilin_4hv_8bpp_rvv_i32
How can I test the weight and biweight of H.264? I haven't seen the related
test code..
tests/checkasm/checkasm --bench --test=h264dsp
Rémi Denis-Courmont wrote on Thu, Aug 15, 2024 at 16:10:
>
>
> On 3 August 2024 13:30:34 GMT+03:00, u...@foxmail.com wrote:
> >From: sunyuechi
> >
> >
I wrote `ff_vvc_w_avg_8_rvv` by mimicking the h264 weight function.
Based on the test results for 49 different resolutions, most of them were
significantly slower.
Only 2x32 and 2x64 had similar performance, without noticeable speed
improvement.
I'm not sure about the reason. Some differences ar
> Does not assemble with binutils 2.43.1 and default flags.
Fixed through zve32x -> zve32x, zba
wrote on Sun, Aug 25, 2024 at 19:40:
> From: sunyuechi
>
> C908 X60
> vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2
> vp9_avg_8tap_smoot
Updated: zve32x -> zve32x, zbb, zba
wrote on Wed, Aug 28, 2024 at 14:37:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_c : 1.2 1.0
> avg_8_2x2_rvv_i32 : 0.7 0.7
> avg_8_2x4_
It seems that the previous patch was partially missing the RVB check, but now
it has `if (flags & AV_CPU_FLAG_RVB)`.
Rémi Denis-Courmont wrote on Wed, Aug 28, 2024 at 03:00:
> On Sunday, 25 August 2024, 14.41.22 EEST, flow gg wrote:
> > > Does not assemble with binutils 2.43.1 a
ping
flow gg wrote on Wed, Aug 28, 2024 at 14:38:
> Updated: zve32x -> zve32x, zbb, zba
>
> wrote on Wed, Aug 28, 2024 at 14:37:
>
>> From: sunyuechi
>>
>> C908 X60
>> avg_8_2x2_c