yeah, updated it in the reply
Rémi Denis-Courmont wrote on Fri, May 17, 2024 at 23:11:
> On Monday, 13 May 2024, 19.59.22 EEST u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_avg4_8bpp_c: 1.2
> > vp9_avg4_8bpp_rvv_i64: 1.0
> > vp9_avg8_8bpp_c: 3.7
> > vp9_avg8_8bpp_rvv_i64:
Fixed issues with .irp and comma, as well as the ifc issue (same
modifications as previously done for vp8).
wrote on Sun, May 19, 2024 at 02:16:
> From: sunyuechi
>
> C908:
> vp9_avg4_8bpp_c: 1.2
> vp9_avg4_8bpp_rvv_i64: 1.0
> vp9_avg8_8bpp_c: 3.7
> vp9_avg8_8bpp_rvv_i64: 1.5
> vp9_avg16_8bpp_c: 14.7
> vp9_a
fixed in v4
Rémi Denis-Courmont wrote on Sat, May 18, 2024 at 23:56:
> On Monday, 13 May 2024, 19.59.23 EEST u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_avg_bilin_4h_8bpp_c: 5.2
> > vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
> > vp9_avg_bilin_4v_8bpp_c: 5.5
> > vp9_avg_bilin_4v
fix .irp use
wrote on Sun, May 19, 2024 at 16:18:
> From: sunyuechi
>
> C908:
> vp8_put_epel4_h4v4_c: 20.0
> vp8_put_epel4_h4v4_rvv_i32: 11.0
> vp8_put_epel4_h4v6_c: 25.2
> vp8_put_epel4_h4v6_rvv_i32: 13.5
> vp8_put_epel4_h6v4_c: 22.2
> vp8_put_epel4_h6v4_rvv_i32: 14.5
> vp8_put_epel4_h6v6_c: 29.0
> vp8_put_
To obtain the test results, the `if (w == h)` check in
tests/checkasm/vvc_mc.c needs to be commented out.
Because vset needs to be used inside the loop, I wrote a cumbersome
vset macro by hand.
wrote on Tue, May 21, 2024 at 15:38:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_
There are three unused lines which I forgot to delete before submitting. I
have updated them here.
wrote on Tue, May 21, 2024 at 15:47:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_c: 1.0 1.0
> avg_8_2x2_rvv_i
> Please put commas between operands.
Okay
> This should probably be ff_avg_vp9 or something slightly more specific.
Is it necessary here? Many macros in the C file are copied from MIPS, where
it is called ff_avg4_msa. Here, it has been simply changed to ff_avg4_rvv.
Rémi Denis-Courmont 于2024年
> Please put commas between operands.
> This should probably be ff_avg_vp9 or something slightly more specific.
Updated here.
wrote on Wed, May 22, 2024 at 01:14:
> From: sunyuechi
>
> C908:
> vp9_avg4_8bpp_c: 1.2
> vp9_avg4_8bpp_rvv_i64: 1.0
> vp9_avg8_8bpp_c: 3.7
> vp9_avg8_8bpp_rvv_i64: 1.5
> vp9_avg16_8
> I would expect that you can get better performance by interleaving scalar and
> vector stuff, and possibly also vector loads and vector arithmetic.
Okay, I will try
> These labels lead to nowhere? If you actually mean to implicitly fall through
> to the next function, you can use the function name
Do macro definitions also need commas? I noticed that much of my old code
and SiFive's code doesn't have them.
Rémi Denis-Courmont wrote on Wed, May 22, 2024 at 02:29:
> On Tuesday, 21 May 2024, 20.13.16 EEST u...@foxmail.com wrote:
> > From: sunyuechi
>
> > diff --git a/libavcodec/riscv/vp9_mc_
Reordered some here.
wrote on Wed, May 22, 2024 at 03:24:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_c: 1.0 1.0
> avg_8_2x2_rvv_i32: 0.7 0.7
> avg_8_2x4_c
Unfortunately, I only run tests to obtain benchmarks and check basic
correctness. I always feel that the tests need to be written by a professional.
Rémi Denis-Courmont wrote on Thu, May 23, 2024 at 04:35:
>
>
> On 22 May 2024 23:28:54 GMT+03:00, "Rémi Denis-Courmont"
> wrote:
> >This removes one stray LI and reworks the
I want to update the VP9 bilin load, just like you did with VP8, but it
seems that what was merged for this patch ([PATCH v2 1/5] lavc/vp9dsp: R-V V
mc avg) is not the current update posted here but the previous version, so
the subsequent patches will have conflicts.
flow gg wrote on Wed, May 22, 2024 at 01
> Is there a reason that you cannot use the tables from C code?
As with VP8, this is to adjust the signs of the data and to prevent a
low-probability overflow during the calculations.
> AFAICT, regular and sharp are identical, except for the base address of the
> filter table, so it should be possib
One more thing I remember is that after adjusting the sign, vmacc can be
used; otherwise, due to the sign, mul + add are needed.
flow gg wrote on Sat, May 25, 2024 at 18:38:
> > Is there a reason that you cannot use the tables from C code?
>
> Similar to VP8, to adjust the positive and negat
Would it be better to replace the two vsetvlstatic8 and vsetvlstatic16 with
two vsetvl? This would require the previous patch and this one to work
together, increasing the number of lines of code and making the code a bit
harder to read.
Additionally, I have a question about patch 4 'save one R-V G
reduction in code size seems to be due to switching to using j labels,
doesn't seem to be about vset, but another issue. j labels are indeed
better. I will make similar modifications.
Rémi Denis-Courmont wrote on Sun, May 26, 2024 at 02:29:
> On Saturday, 25 May 2024, 21.16.22 EEST flow gg wrote
Hi, maybe we can prioritize this revert:
https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/0c1304ae11b0361ede055ee8ffc6e83529468c73
Using [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg to avoid conflicts with
other patches.
flow gg wrote on Fri, May 24, 2024 at 14:13:
> I want to update the VP9 bilin load, just l
A portion has been modified according to the previous review, but there are
still some parts that haven't been updated
> Similarly, it
> should be possible to share most of the horizontal and vertical code (maybe
> also for bilinear, not just EPel) with separate load/store then inner
> procedures.
I directly copied the VP9 modifications over... Since len <= 16, it seems
like it can be improved a bit more
wrote on Thu, May 30, 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it redu
Well.. because scalar registers are limited, the direct unrolling will be
like this for now. We can handle different lengths separately in the future
flow gg wrote on Thu, May 30, 2024 at 23:36:
> I directly copied the VP9 modifications over... Since len <= 16, it seems
> like it can be improved a
> In keeping in line with the rest of the project, that should probably go into
> libavcodec/riscv/vvc/
> Expanding the macro 49 times, with up to 14 branches to get there is maybe not
> such a great idea. It might look nice on the checkasm µbenchmarks because the
> branches under test get
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will
> end up faster than vector code on future hardware, especially out-of-order
> pipelines.
I want to drop 2x2, but since there's only one function to handle all
situations instead of 7*7 functions, how can I drop only 2x2?
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will
> end up faster than vector code on future hardware, especially out-of-order
> pipelines.
I want to drop 2x2, but since there's only one function to handle all
situations instead of 7*7 functions..
> AFAIU, this will ge
> Nit: for overall code base consistency, I'd use csrwi here. Reason being that
> for other rounding modes, csrwi is the better option.
>
> Probably faster to swap the two above, to avoid stalling on LD.
>
> If you check more than one length, better to get ff_get_rv_vlenb() into a local
> variable.
ping
wrote on Thu, May 30, 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it reduces one load operation
> vertically compared to horizontally.
>
> old
> Does this not render the type parameter of bilin_load useless (always h)?
> (Not a blocker for this patch.)
Yes, this was needed in the initial version, but it is no longer required.
I just sent a patch.
> Not sure if I already asked this but is this really faster than slide1?
> Normally we wan
Just like in VP8, the unroll has been updated.
wrote on Sat, Jun 15, 2024 at 19:51:
> From: sunyuechi
>
> C908 X60
> vp9_avg_bilin_4h_8bpp_c: 5.5 4.7
> vp9_avg_bilin_4h_8bpp_rvv_i32: 1.7 1.5
>
> Copying vectors is rarely justified - mostly only before destructive
> instructions such as FMA.
It is slightly different from VP8. In VP8, many scalar values are positive,
so the related calculations can be easily replaced. However, in this
context of VP9, since t2 is a negative number, vwmaccs
> You can directly LLA filters + 16 * 8 * 2 and save one add. Same below. You can
> also use .equ to alias the filter addresses, and avoid if's.
> That's a lot of address dependencies, which is going to hurt performance. It
> might help to just spill more S registers if needed.
> This can be done
Initially, I tried using `vnclip.wi` with reference to h264,
-vwadd.wx v16, v16, t4
-vnsra.wi v16, v16, 4
+vnclip.wi v16, v16, 4
but couldn't find the correct way... I think there might be some overflow
issues that I didn't understand correctly. How do y
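One possible source of the mismatch can be seen in a scalar model. This is only a sketch under assumptions that are not stated in the thread: t4 is taken to hold the rounding bias 8 (half of 1 << 4), the elements are treated as signed 16-bit values narrowed to 8 bits, and vxrm is assumed to be rnu. The helper names are made up for illustration. Under those assumptions both sequences round the same way, but vnsra.wi wraps on overflow while vnclip.wi saturates.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2^s), written without shifting negative values. */
static int32_t floor_shift(int32_t v, unsigned s)
{
    if (v >= 0)
        return v >> s;
    return -((-v + (1 << s) - 1) >> s);
}

/* Keep the low 16 bits, as the 16-bit wide add does. */
static int16_t wrap16(int32_t v)
{
    int32_t u = v & 0xffff;
    return (int16_t)(u < 32768 ? u : u - 65536);
}

/* Keep the low 8 bits, as a narrowing shift does. */
static int8_t wrap8(int32_t v)
{
    int32_t u = v & 0xff;
    return (int8_t)(u < 128 ? u : u - 256);
}

/* Model of "vwadd.wx (bias) + vnsra.wi": add, shift, truncate. */
static int8_t add_then_vnsra(int16_t x, int16_t bias, unsigned s)
{
    return wrap8(floor_shift(wrap16((int32_t)x + bias), s));
}

/* Model of "vnclip.wi" with vxrm = rnu: round to nearest, then saturate. */
static int8_t vnclip_rnu(int16_t x, unsigned s)
{
    int32_t half = s ? 1 << (s - 1) : 0;
    int32_t r = floor_shift((int32_t)x + half, s);

    if (r > INT8_MAX)
        return INT8_MAX;
    if (r < INT8_MIN)
        return INT8_MIN;
    return (int8_t)r;
}

/* In range the two agree; out of range one wraps and the other clips. */
static void demo_narrowing(void)
{
    assert(add_then_vnsra(100, 8, 4) == vnclip_rnu(100, 4));
    assert(add_then_vnsra(2040, 8, 4) == -128);
    assert(vnclip_rnu(2040, 4) == 127);
}
```

If the intermediate values can leave the 8-bit range, or if vxrm is not rnu at that point, the two sequences would give different results, which may be what showed up as a test failure.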
The loop filter horizontal in vp8 also has this issue ..
Rémi Denis-Courmont wrote on Sun, Jun 30, 2024 at 17:04:
> T-Head C908 (cycles):
> h264_h_loop_filter_luma_8bpp_c: 297.5
> h264_h_loop_filter_luma_8bpp_rvv_i32: 374.7
> h264_v_loop_filter_luma_8bpp_c: 862.7
> h264_v_loop_filter_luma_8bpp_rvv
> I am not sure what is_w means or serves here. If you need special cases, this
> feels a bit out of place for this macro.
It is a special case added to merge the vset of avg and w_avg, how about
giving it a default value so that it doesn't affect the use of other
functions?
> I am not sure if I
I reviewed it again. The purpose of is_w is to limit lmul to a maximum of
1/4 of vlen, to prevent a shortage of vector registers; it can also be
seen as vset limiting lmul. I renamed it to quarter_len_limit.
t0 is changed to t1.
wrote on Tue, Jul 2, 2024 at 00:07:
> From: sunyuechi
>
>
-Courmont wrote on Thu, Sep 28, 2023 at 21:33:
>
>
> On 28 September 2023 08:45:44 GMT+03:00, flow gg wrote:
> >Okay, I reverted the volatile in ff_read_time
> >
> >How about this version?
>
> It's still using register stride which is all but guaranteed to be slow on
>
Okay, I have updated these issues in the patch.
Rémi Denis-Courmont wrote on Mon, Nov 13, 2023 at 23:35:
>Hi,
>
> On Monday, 13 November 2023, 11.43.01 EET flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cor
Okay, I have modified them to 64 and added some descriptions.
Rémi Denis-Courmont wrote on Wed, Nov 15, 2023 at 23:06:
> On Wednesday, 15 November 2023, 10.59.55 EET flow gg wrote:
> > Okay, I have updated these issues in the patch.
>
> It does not assemble but I can fix it locally
From 2785ce57f68dbb2373c951b9432afa73796f7cc1 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 18 Nov 2023 10:58:17 +0800
Subject: [PATCH] checkasm: test for dcmul_add
---
tests/checkasm/af_afir.c | 141 +++
1 file changed, 98 insertions(+), 43 deletions(-
dst[i]);
+                fail();
+                break;
+            }
+        }
+        memcpy(odst, src0, (BUF_SIZE) * sizeof(double));
+        bench_new(odst, src1, src2, LEN);
+    }
+
+    report("dcmul_add");
+}
+
+
+void checkasm_check_afir(void)
+{
+    AudioFIRDSPContext fir =
c910
float_to_fixed24_c: 208.2
float_to_fixed24_rvv_f32: 71.5
From 69da974fd0febaa74db4dd551b05172caeefb846 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 22 Nov 2023 14:57:29 +0800
Subject: [PATCH] lavc/ac3dsp: R-V V float_to_fixed24
c910
float_to_fixed24_c: 208.2
float_to_f
. (We
> *do* have Zba and Zbb now though, hence the existing extract_exponents()).
>
> Also:
> - This does not seem according to the C ABI. AFAIK `unsigned` is
> sign-extended.
> - ALU right before dependent conditional branch should be avoided.
> - SHxADD can be used advantage
qemu-riscv64 -cpu rv64,v=true,g=true,c=true,zba=true,vlen=128 checkasm
--test=ac3dsp
flow gg wrote on Wed, Nov 22, 2023 at 22:30:
> > How did you test it?
>
> I wrote a test, but it was a bit rough, so I want to modify it before
> submitting. I've added it to this reply.
>
> >
GMT+02:00, flow gg wrote:
> >> How did you test it?
> >
> >I wrote a test, but it was a bit rough, so I want to modify it before
> >submitting. I've added it to this reply.
> >
> >> This does not seem according to the C ABI. AFAIK `unsigned
Wow, thank you for reviewing this. I just wanted to see if the function was
working properly. There are so many bugs in the test code ...
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscrib
Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in
float_to_fixed24."
So I removed the part #if (__riscv_xlen == 64) and restored the patch.
From 3e790fdccd780257f464aa8f8a56a37321ddd429 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 22 Nov 2023 14:57:29 +0800
Subject: [PATCH]
From 02dd534bd602ba3ec79e51070934949a98f780e2 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 22 Nov 2023 14:57:29 +0800
Subject: [PATCH] checkasm/ac3dsp: add float_to_fixed24 test
---
tests/checkasm/Makefile | 1 +
tests/checkasm/ac3dsp.c | 71 +++
I modified the temporary test and sent it in "[FFmpeg-devel] [PATCH]
checkasm/ac3dsp: add float_to_fixed24 test".
So the test time results have changed, and I updated them in the patch.
c910
float_to_fixed24_c: 2207.2
float_to_fixed24_rvv_f32: 696.2
flow gg wrote on Wed, Nov 22, 2023 at 20:00
Okay, changed
Rémi Denis-Courmont wrote on Fri, Nov 24, 2023 at 01:09:
> On Thursday, 23 November 2023, 1.17.03 EET flow gg wrote:
> > Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in
> > float_to_fixed24."
> >
> > So I removed the part #if (__ris
> You should probably add the test case to tests/fate/checkasm.mak
> This one is not necessary. You can reuse dst or dst2 for the bench() as
> it's write only.
> Changed BUF_SIZE instead of 10.
Okay, changed.
James Almer wrote on Fri, Nov 24, 2023 at 01:11:
> On 11/23/2023 4:08
This is a bit confusing for me.. I tried pulling the latest code, and then
used `git am checkasm-test-for-dcmul_add.patch` without any patch
corruption.
Rémi Denis-Courmont wrote on Mon, Nov 27, 2023 at 03:36:
> On Sunday, 19 November 2023, 0.28.10 EET flow gg wrote:
>
also posed no problems.
(I am using the Gmail web page.)
Rémi Denis-Courmont wrote on Mon, Nov 27, 2023 at 20:17:
>
>
> On 26 November 2023 22:54:28 GMT+02:00, flow gg wrote:
> >This is a bit confusing for me.. I tried pulling the latest code, and then
> >used `git am checkasm-
From 85e60d75554894964825f5718d14591294ec4e88 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 28 Nov 2023 14:08:12 +0800
Subject: [PATCH 1/2] checkasm: test for abs_pow34
---
libavcodec/aacenc.c| 24 +++--
libavcodec/aacenc.h| 1 +
tests/checkasm/Makefile| 1 +
c910:
abs_pow34_c: 24610.7
abs_pow34_rvv_f32: 6177.7
(the patch "[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34" needs to be applied first)
From 86577c2d40d29422c4b769c854df99a88c7b3c77 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 28 Nov 2023 20:14:14 +0800
Subject: [PATCH 2/2] lavc/aacencdsp:
Okay, I split it and attached it
Rémi Denis-Courmont wrote on Thu, Nov 30, 2023 at 23:31:
> On Tuesday, 28 November 2023, 18.59.38 EET flow gg wrote:
> >
>
> Since nobody else commented, I shall note that you should probably split the
> underlying lavc changes into a separ
Okay, changed and attached
Rémi Denis-Courmont wrote on Sat, Dec 2, 2023 at 02:38:
> On Friday, 1 December 2023, 20.35.10 EET Rémi Denis-Courmont wrote:
> > On Friday, 24 November 2023, 0.39.39 EET flow gg wrote:
> > > Okay, changed
> >
> > src/l
I forgot to modify the Makefile; I've made the changes in this reply.
flow gg wrote on Sat, Dec 2, 2023 at 03:50:
> Okay, changed and attached
>
> Rémi Denis-Courmont wrote on Sat, Dec 2, 2023 at 02:38:
>
>> On Friday, 1 December 2023, 20.35.10 EET Rémi Denis-Courmont wrote
c910
vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 75.7
vc1dsp.vc1_inv_trans_8x8_dc_c:
ma
- vsetvli zero, zero, e64, m4, ta, ma
+ vsetivli zero, 8, e8, mf2, ta, ma
```
And ISCAS seems to have no announcement about getting an RVV 1.0 board. I
plan to ask about it from time to time.
Rémi Denis-Courmont wrote on Mon, Dec 4, 2023 at 01:17:
> On Sunday, 3 December
I found that in the case of nosplat, an additional vset can be removed, and
the time is basically the same, so I updated the patch.
Rémi Denis-Courmont wrote on Mon, Dec 4, 2023 at 23:15:
> On Monday, 4 December 2023, 10.48.56 EET flow gg wrote:
> > > Probably missing VLENB checks.
>
Because there was a conflict, the patch was updated in the reply
flow gg wrote on Fri, Dec 1, 2023 at 04:25:
> Okay, I split it and attached it
>
>
>
> Rémi Denis-Courmont wrote on Thu, Nov 30, 2023 at 23:31:
>
>> On Tuesday, 28 November 2023, 18.59.38 EET flow gg wrote:
>> >
>
Okay, after using zext, two vsets can be deleted, which is better than splat. I
have updated the patch in this reply.
Rémi Denis-Courmont wrote on Mon, Dec 4, 2023 at 23:15:
> On Monday, 4 December 2023, 10.48.56 EET flow gg wrote:
> > > Probably missing VLENB checks.
> >
> > Ch
> This block can be folded into the next. You don't need to check VLENB twice.
Changed.
> Instruction scheduling could be better, especially on in-order CPUs.
I put the vload at the front, and then proceeded with the t2 operation, but
I'm not sure...
> You don't need to reset the AVL here, just
CSRxI immediate
Changed.
Rémi Denis-Courmont wrote on Wed, Dec 6, 2023 at 04:11:
> On Tuesday, 5 December 2023, 21.25.12 EET flow gg wrote:
> > > This block can be folded into the next. You don't need to check VLENB
> >
> > twice.
> >
> > Changed.
> >
> >
> FWIW CanMV-K230 boards are on sale for under 500 RMB.
I just made a payment ~ (I saw you mention in IRC that you're going to
write about K230+Debian. Looking forward to it)
Rémi Denis-Courmont wrote on Wed, Dec 6, 2023 at 04:11:
> On Tuesday, 5 December 2023, 21.25.12 EET flow gg wrote:
023, 16.40.08 EET flow gg wrote:
> > c910
> > vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
> > vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
> > vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
> > vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
> >
Updated the patch to resolve conflicts, updated m4 to m8, using c908's
benchmark.
flow gg wrote on Wed, Nov 29, 2023 at 01:00:
> c910:
> abs_pow34_c: 24610.7
> abs_pow34_rvv_f32: 6177.7
>
> (need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for
em-rss:0kB
If I remove the line 1429 with FF_CODEC_ENCODE_CB(aac_encode_frame), there
is no error on k230, but I am unsure of the reason.
flow gg wrote on Tue, Dec 5, 2023 at 05:46:
> Because there was a conflict, the patch was updated in the reply
>
> flow gg wrote on Fri, Dec 1, 2023 at 04:25:
>
To be clear, I mean removing
libavcodec/aacenc.c:1429 FF_CODEC_ENCODE_CB(aac_encode_frame)
Okay, changed to use const, updated at this GitHub link (
https://github.com/hleft/FFmpeg/tree/vp8vp9)
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 02:38:
> On Friday, 22 March 2024, 8.01.00 EET flow gg wrote:
> > (This should be used after applying these 4 patches)
> >
>
Hi, here's the github link (https://github.com/hleft/FFmpeg/tree/vp8vp9)
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 02:30:
> Hi,
>
> On Friday, 22 March 2024, 8.12.41 EET flow gg wrote:
> > It might be a bit inconvenient to find the patches related to vp8, vp9
Alright, updated it in this reply
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 16:18:
> Hi,
>
> On 27 March 2024 04:37:02 GMT+02:00, flow gg wrote:
> >Okay, changed to use const, updated at this GitHub link (
> >https://github.com/hleft/FFmpeg/tree/vp8vp9)
>
> OK, th
s just that
vp9 doesn't have enough)
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 23:36:
> On Friday, 22 March 2024, 8.01.21 EET flow gg wrote:
> >
>
> IMO, you could just as well share the code and avoid most if's. Not like one
> additional `li a3, 1` per
wrote on Wed, Mar 27 at 23:41:
> On Friday, 22 March 2024, 8.02.08 EET flow gg wrote:
> > Using macros to shorten function definitions, updated in this response
>
> Did you try to share the common code after getdc and see how slower it is? If
> an extra static branch ha
Okay, updated it in the reply and github(
https://github.com/hleft/FFmpeg/tree/vp8vp9)
Rémi Denis-Courmont wrote on Thu, Apr 4, 2024 at 04:22:
> On Thursday, 28 March 2024, 4.44.33 EEST flow gg wrote:
> > I don't quite understand, I think here 8x8 because zve64x is not suitable
>
ping
flow gg wrote on Fri, Mar 8, 2024 at 17:46:
> Alright, using m8, but for now don't add code to address dependencies in
> loops that have a minor impact. Updated in the reply
>
> Rémi Denis-Courmont wrote on Fri, Mar 8, 2024 at 17:08:
>
>>
>>
>> On 8 March 2024 02:45:46 GMT+02:00
From 2f516e0236bd84d78ce6fd7e55c4b1a3c9d99baa Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 20 Apr 2024 23:32:10 +0800
Subject: [PATCH 1/3] lavc/vp8dsp: R-V V loop_filter_simple
C908:
vp8_loop_filter_simple_h_c: 416.0
vp8_loop_filter_simple_h_rvv_i32: 187.5
vp8_loop_filter_simple_v_c: 429.
From c033ab8d30135dc02b09b1747c0761baefdcbb4a Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 20 Apr 2024 23:13:07 +0800
Subject: [PATCH 2/3] lavc/vp8dsp: R-V V loop_filter_inner
C908:
vp8_loop_filter8uv_inner_v_c: 738.2
vp8_loop_filter8uv_inner_v_rvv_i32: 455.2
vp8_loop_filter16y_inner_h_c:
From cff79c9500b94f4c0abdd9cd68c91cc736366c78 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 20 Apr 2024 23:26:58 +0800
Subject: [PATCH 3/3] lavc/vp8dsp: R-V V loop_filter
C908:
vp8_loop_filter8uv_v_c: 745.5
vp8_loop_filter8uv_v_rvv_i32: 467.2
vp8_loop_filter16y_h_c: 674.2
vp8_loop_filter16
github link: https://github.com/hleft/FFmpeg/tree/vp8vp9
flow gg wrote on Sat, Apr 20, 2024 at 23:55:
>
>
Happy to see you back :)
Rémi Denis-Courmont wrote on Mon, Apr 29, 2024 at 02:06:
> On Sunday, 7 April 2024, 8.38.54 EEST flow gg wrote:
> > ping
>
> I have been away for a while, and catching up takes time, sorry.
>
> --
> レミ・デニ-クールモン
From 0c196a37cb4036d8c618c06c02a011b910cc56ce Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 29 Apr 2024 14:18:23 +0800
Subject: [PATCH 1/2] checkasm/blockdsp: add fill_block test
---
tests/checkasm/blockdsp.c | 32
1 file changed, 32 insertions(+)
diff --
From 4315f4e4774e3006d7cc55b6d235cb80e0173cf9 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 6 Mar 2024 12:46:03 +0800
Subject: [PATCH 2/2] lavc/blockdsp: R-V V fill_block
C908:
blockdsp.fill_block_tab[0]_c: 550.0
blockdsp.fill_block_tab[0]_rvv_i64: 48.2
blockdsp.fill_block_tab[1]_c: 148.7
updated it in the reply and https://github.com/hleft/FFmpeg/tree/vp8vp9
Rémi Denis-Courmont wrote on Tue, Apr 30, 2024 at 01:57:
> On Friday, 22 March 2024, 8.02.38 EEST flow gg wrote:
> > Because the previous patch was updated, so it was updated in this
> > response
>
> Seem
29 April 2024, 10.09.41 EEST flow gg wrote:
> >
>
> Are you sure that this works with all vector lengths?
> The block8 code looks odd.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
> ___
> ffmpeg-devel mailing
Since there is no 8x16, I changed m8 to m4, and updated it in the reply
flow gg wrote on Tue, Apr 30, 2024 at 08:26:
> Hi, I initially used a loop, but according to libavcodec/blockdsp.h,
>
> the maximum is 8x16 = 128 bytes, so using ff_get_rv_vlenb() >= 16 and m8
> does not
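The sizing argument in the quoted message is plain arithmetic, and a small sketch makes it easy to check. It assumes integer LMUL only and one byte per element; `group_bytes` and `block_fits` are names invented for this illustration, not functions from FFmpeg.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes held by one vector register group: VLENB bytes per register,
 * LMUL registers per group (integer LMUL only in this sketch). */
static size_t group_bytes(size_t vlenb, unsigned lmul)
{
    return vlenb * lmul;
}

/* Nonzero if a block of width * height bytes fits in one register group. */
static int block_fits(size_t width, size_t height, size_t vlenb, unsigned lmul)
{
    return width * height <= group_bytes(vlenb, lmul);
}

/* The figures from the thread: VLENB >= 16 with m8 covers 128 bytes,
 * while m4 covers 64 bytes. */
static void demo_block_fits(void)
{
    assert(block_fits(8, 16, 16, 8));  /* 128 bytes fit in m8 */
    assert(!block_fits(8, 16, 16, 4)); /* 128 bytes do not fit in m4 */
    assert(block_fits(8, 8, 16, 4));   /* 64 bytes fit in m4 */
}
```

With VLENB = 16, dropping from m8 to m4 halves the capacity from 128 to 64 bytes, which is enough once the 8x16 case no longer has to be covered.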
Since there is no 8x16, it is no longer tested; updated in this reply
flow gg wrote on Mon, Apr 29, 2024 at 15:09:
>
>
From fc7c28cb78e0c90880f31c0b8d6f2fc16d0fe581 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 29 Apr 2024 14:18:23 +0800
Subject: [PATCH 1/2] checkasm/blockdsp: add fill_bloc
Since the number of stores is controlled by a3 and not by zero, it doesn't
have to be exactly 16 bytes?
Rémi Denis-Courmont wrote on Tue, Apr 30, 2024 at 14:40:
>
>
> On 30 April 2024 03:26:25 GMT+03:00, flow gg wrote:
> >Hi, I initially used a loop, but according to libavcodec
From 07c0b8a26b76e31c46ecabddb251f317c48c73a3 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 30 Apr 2024 12:43:57 +0800
Subject: [PATCH 1/2] checkasm/rv40dsp: add chroma_mc test
This is similar to h264.
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 ++
tests/checkasm
From 3e66b2bbe257cc91a4c2169362163e92aba6760b Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 30 Apr 2024 18:24:00 +0800
Subject: [PATCH 2/2] lavc/rv40dsp: R-V V chroma_mc
This is similar to h264, but here we use manual_avg instead of vaaddu
because rv40's OP differs from h264. If we use vaa
Sorry, this is because a 'bpp == 8' was missed. It has been fixed in this
link
Rémi Denis-Courmont wrote on Thu, May 2, 2024 at 22:11:
> On Tuesday, 30 April 2024, 2.36.22 EEST flow gg wrote:
> > updated it in the reply and https://github.com/hleft/FFmpeg/tree/vp8vp9
>
> V
I saw the discussion comparing email and GitLab/GitHub. I do not fully
understand their advantages and disadvantages, but I want to say that I
support changing to GitLab/GitHub.
Simple reason:
If git-send-email is required, I may not be able to submit any code
If you do not need to use git
Hi, it's me. I accidentally repeated it but it seems to be correct.
wrote on Sat, May 4, 2024 at 18:01:
> From: sunyuechi
>
> vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 869.7
> vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 148.7
> vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 220.5
> vc1dsp.avg_vc1_mspel_pixels_ta
I've reorganized it, and the github link is at:
https://github.com/hleft/FFmpeg/tree/vp8
wrote on Sat, May 4, 2024 at 22:49:
> From: sunyuechi
>
> C908:
> vp8_put_pixels4_c: 87.5
> vp8_put_pixels4_rvv_i32: 42.7
> vp8_put_pixels8_c: 284.5
> vp8_put_pixels8_rvv_i32: 77.7
> vp8_put_pixels16_c: 1087.7
> vp8_put
the github link: https://github.com/hleft/FFmpeg/tree/vp9
wrote on Sat, May 4, 2024 at 23:03:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvv_i64: 18.5
> vp9_vert_16x16_8bpp_c: 71.2
> vp9_vert_16x16_8bpp_rvv_i32: 50.7
> vp9_vert_32x32_8bpp_c: 300.2
> vp9_vert_32x32_8bpp_rvv_i3
> Is it not faster to compute the address ahead of time, e.g.:
> Ditto below and in other patches.
Yes, update here and I will check other patches
> Copying 64-bit quantities should not need RVV at all. Maybe the C version
> needs to be improved instead, but if that is not possible, then an RVI
> ver
Made these changes according to the previous review:
moved func into macro, added macro vset to reduce if else, used rvi,
supplemented __riscv_xlen
wrote on Mon, May 6, 2024 at 00:45:
> From: sunyuechi
>
> C908:
> vp8_put_pixels4_c: 78.0
> vp8_put_pixels4_rvi: 33.7
> vp8_put_pixels8_c: 278.0
> vp8_put_pixels
> Doesn't this effectively discard the last element, t5?
> Can't we skip the slide and just load the vector at a2+1? Also then, we can
> keep VL=len and halve the multiplier.
Yes, this is better. I remember that slide1down was faster when I tested the
initial version, but now that has changed.
I
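The equivalence behind the review suggestion can be shown with a scalar model. This is a sketch only: `VL`, `slide1down`, `vload` and `slide_equals_offset_load` are illustrative names, and the buffer stands in for the row that a2 points to. Sliding a loaded vector down by one and inserting the next byte from memory gives the same elements as loading directly from the address plus one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VL = 8 }; /* illustrative vector length in elements */

/* Model of vslide1down.vx: dst[i] = src[i + 1], scalar goes in the top slot. */
static void slide1down(uint8_t *dst, const uint8_t *src, uint8_t scalar)
{
    for (int i = 0; i < VL - 1; i++)
        dst[i] = src[i + 1];
    dst[VL - 1] = scalar;
}

/* Model of a unit-stride load of VL bytes. */
static void vload(uint8_t *dst, const uint8_t *mem)
{
    memcpy(dst, mem, VL);
}

/* Nonzero if load + slide1down matches a single load at mem + 1.
 * mem must hold at least VL + 1 bytes. */
static int slide_equals_offset_load(const uint8_t *mem)
{
    uint8_t v[VL], slid[VL], direct[VL];

    vload(v, mem);
    slide1down(slid, v, mem[VL]); /* extra scalar load for the last element */
    vload(direct, mem + 1);       /* one vector load, no slide */
    return memcmp(slid, direct, VL) == 0;
}

static void demo_slide(void)
{
    const uint8_t row[VL + 1] = { 3, 1, 4, 1, 5, 9, 2, 6, 5 };

    assert(slide_equals_offset_load(row));
}
```

The second load touches mostly the same bytes as the first, so whether it beats the slide depends on the hardware, which matches the benchmark results changing between versions.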
> IMO, passing a complete register name, if you really need to vary it, would be
> simpler and more flexible than an ABI register type prefix.
If the full register name is passed here, some require four parameters,
some require six parameters, and there is often repetition.
I feel it's easy to get c
Fixed issues similar to vp8
wrote on Tue, May 7, 2024 at 15:36:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvi: 15.7
> vp9_vert_16x16_8bpp_c: 71.2
> vp9_vert_16x16_8bpp_rvi: 39.0
> vp9_vert_32x32_8bpp_c: 300.2
> vp9_vert_32x32_8bpp_rvi: 135.2
> ---
> libavcodec/riscv/Makefil