ping
wrote on Mon, Dec 23, 2024 at 23:02:
> From: sunyuechi
>
> ---
> libavcodec/riscv/h26x/asm.S| 36 +-
> libavcodec/riscv/vvc/sad_rvv.S | 2 +-
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/libavcodec/riscv/h26x/asm.S b/libavcodec/riscv/h26x/a
It seems that v0 and v24 need to be set to 0, and they have already been
set.
Rémi Denis-Courmont wrote on Wed, Jan 8, 2025 at 02:23:
> On Monday, 23 December 2024, 17:01:32 UTC+2, uk7b-at-foxmail@ffmpeg.org
> wrote:
> > From: sunyuechi
> >
> > ---
> > libavcodec/riscv/h26x/asm.S| 36 +
ping
wrote on Mon, Dec 23, 2024 at 23:02:
> From: sunyuechi
>
> ---
> libavcodec/riscv/h26x/asm.S| 36 +-
> libavcodec/riscv/vvc/sad_rvv.S | 2 +-
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/libavcodec/riscv/h26x/asm.S b/libavcodec/riscv/h26x/a
wrote on Tue, Dec 31, 2024 at 02:26:
> On Tuesday, 24 December 2024, 15:30:00 EET, Nuo Mi wrote:
> > On Mon, Dec 23, 2024 at 11:18 PM flow gg wrote:
> > > Hi, It looks like you submitted your review comments not long after the
> > > patch was merged.
> > >
> >
Hi, It looks like you submitted your review comments not long after the
patch was merged.
Previously, regarding the VVC avg patch, you mentioned "LGTM for the RISC-V
side. No clue about the VVC side",
so I contacted Nuomi in the hope that he could help merge the patch that
had been pending for a w
> That makes zero sense. The logical multiplier does not accommodate larger
> vector lengths than 256 bits as things stand, and in the extreme, you can
> always have vector lengths so large that even the smallest valid
> multiplier is "too" large.
Yes, I didn't consider vlen > 256. What do you thi
> Don't clobber v8 here.
> Use vsub.vv here to avoid the sequential dependency.
Updated.
wrote on Sat, Dec 21, 2024 at 20:22:
> From: sunyuechi
>
> ---
> libavcodec/riscv/vvc/vvc_sad_rvv.S | 10 +-
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/libavcodec/riscv/vvc/vvc_sad_rvv.
LGTM.
Nuo Mi wrote on Sat, Dec 21, 2024 at 18:19:
> ---
> libavcodec/riscv/vvc/Makefile | 6 +++---
> libavcodec/riscv/vvc/{vvcdsp_init.c => dsp_init.c} | 0
> libavcodec/riscv/vvc/{vvc_mc_rvv.S => mc_rvv.S}| 0
> libavcodec/riscv/vvc/{vvc_sad_rvv.S => sad_rvv.S} | 0
> libavco
Hi, other RISC-V assembly file names usually include the extensions being
used, such as rvv, rvb, etc.
How about naming them mc_rvv.S and sad_rvv.S?
Nuo Mi wrote on Tue, Dec 17, 2024 at 11:59:
> ---
> libavcodec/riscv/vvc/Makefile | 6 +++---
> libavcodec/riscv/vvc/{vvcdsp_init.c => ds
> Don't clobber v8 here.
> Use vsub.vv here to avoid the sequential dependency.
Thanks, I will update later
> Are you sure this does not require tail-undisturbed mode? I think you're
> setting tail-agnostic mode up.
I’m not sure if I understood correctly.
My understanding is that tail-undisturbe
Resolved the conflict (because #elif ARCH_WASM was newly added in master).
wrote on Sun, Dec 15, 2024 at 23:56:
> From: sunyuechi
>
> ---
> libavcodec/riscv/vvc/vvc_mc_rvv.S | 46 +++
> 1 file changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/libavcodec/riscv/vvc/vvc_m
Thank you, this approach can indeed address similar if else scenarios.
vsetvlstatic \w, \vlen, e8, mf8, mf4, mf2, m1, m2, m4
vsetvlstatic \w, \vlen, e16, mf4, mf2, m1, m2, m4, m8
vsetvlstatic \w, \vlen, e32, mf2, m1, m2, m4, m8, m8
I plan to submit it after this patch set gets merged.
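The three vsetvlstatic lines above are consistent with a simple rule: pick the smallest LMUL whose register group can hold `w` elements of the given element size. A minimal C sketch of that rule follows, assuming VLEN = 128 and widths 2 to 64 (neither is stated in the message). `pick_lmul_eighths` and its `max_eighths` cap are hypothetical names used for illustration only; they are not part of the patch, and the sketch ignores the rule that very small LMUL values may be unsupported for wide elements.

```c
#include <assert.h>

/* LMUL is expressed in eighths so that fractional values fit in an int:
 * 1 = mf8, 2 = mf4, 4 = mf2, 8 = m1, 16 = m2, 32 = m4, 64 = m8. */
static int pick_lmul_eighths(int width, int sew_bits, int vlen_bits,
                             int max_eighths)
{
    assert(width > 0 && sew_bits >= 8 && vlen_bits >= 32);

    int need_bits = width * sew_bits;   /* bits needed for one row */
    int lmul = 1;                       /* start at mf8 */

    /* Grow LMUL until vlen * lmul / 8 covers the row, or the cap is hit. */
    while (lmul < max_eighths && vlen_bits * lmul < need_bits * 8)
        lmul *= 2;
    return lmul;
}
```

For example, `pick_lmul_eighths(16, 8, 128, 64)` returns 8 (m1), and passing `max_eighths = 32` caps the result at m4, which is one way to model a limit that keeps register groups small.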
Nuo Mi 于2
ping
wrote on Sun, Dec 1, 2024 at 13:11:
> From: sunyuechi
>
> ---
> libavcodec/riscv/vvc/vvc_mc_rvv.S | 46 +++
> 1 file changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/libavcodec/riscv/vvc/vvc_mc_rvv.S
> b/libavcodec/riscv/vvc/vvc_mc_rvv.S
> index 45f4750f82..18532
Thank you for your detailed explanation! :)
Ronald S. Bultje wrote on Thu, Dec 5, 2024 at 20:38:
> Hi,
>
> Christophe asked me to chime in.
>
> On Wed, Dec 4, 2024 at 4:14 AM wrote:
>
> > --- a/tests/checkasm/rv40dsp.c
> > +++ b/tests/checkasm/rv40dsp.c
> > @@ -27,7 +27,7 @@
> > #define randomize_buffers()
Hi, the original issue I encountered was that FATE failed on RISC-V because
the assembly code didn't handle `rv40_bias` correctly.
I submitted a new patch: "checkasm/rv40dsp: cover more cases for rv40_bias"
to test this situation.
The value of `src` was indeed just copied from `h264_chroma_mc`. M
Update here because there's no need to change it to 0xFF.
wrote on Thu, Dec 5, 2024 at 12:31:
> From: sunyuechi
>
> ---
> tests/checkasm/rv40dsp.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tests/checkasm/rv40dsp.c b/tests/checkasm/rv40dsp.c
> index a1a873d430..0600b07d09
The message was sent twice; please ignore this one.
wrote on Thu, Dec 5, 2024 at 12:29:
> From: sunyuechi
>
> ---
> libavcodec/riscv/rv40dsp_rvv.S | 116 ++---
> 1 file changed, 78 insertions(+), 38 deletions(-)
>
> diff --git a/libavcodec/riscv/rv40dsp_rvv.S
> b/libavcodec/risc
Hi, why is there an issue with the ABI? I previously just thought that s0
shouldn't be used here.
Rémi Denis-Courmont wrote on Sun, Dec 1, 2024 at 00:19:
> On Wednesday, 20 November 2024, 3:26:52 EET, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > ---
> > libavcodec/riscv/rv40dsp_rvv.S | 11
's necessary to organize them together and resubmit, please let me
know clearly.
Rémi Denis-Courmont wrote on Wed, Nov 27, 2024 at 01:15:
> Hi,
>
> On Thursday, 21 November 2024, 12:43:38 EET, flow gg wrote:
> > This patch comes after:
> > [PATCH 1/2] Update R-V V vvc_mc vs
solved.
Rémi Denis-Courmont wrote on Wed, Nov 27, 2024 at 01:12:
> On Tuesday, 26 November 2024, 5:02:57 EET, flow gg wrote:
> > ping
>
> Unless I am mistaken this set (as a whole) had unaddressed review comments.
>
> --
> Rémi Denis-Co
ping
wrote on Sat, Oct 12, 2024 at 17:28:
> From: sunyuechi
>
>                         k230             banana_f3
> dmvr_8_12x20_c:         619.3 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32:   128.6 ( 4.82x)   103.4 ( 6.04x)
> dmvr_8_20x12_c:
Updated them.
Rémi Denis-Courmont wrote on Mon, Nov 18, 2024 at 04:23:
> On Sunday, 17 November 2024, 15:16:23 EET, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> >               k230             banana_f3
> > sad_8x16_c:   385.9 ( 1.00x)   403.1 ( 1.00x)
> > s
This patch comes after:
[PATCH 1/2] Update R-V V vvc_mc vset to support more lengths
[PATCH 2/2] lavc/vvc_mc: R-V V dmvr.
Rémi Denis-Courmont wrote on Tue, Nov 19, 2024 at 04:10:
> On Sunday, 17 November 2024, 15:17:49 EET, flow gg wrote:
> > > Generally speaking, I think that moving
Updated.
Rémi Denis-Courmont wrote on Wed, Nov 20, 2024 at 00:19:
> On Tuesday, 19 November 2024, 11:11:40 EET, u...@foxmail.com wrote:
> > From: sunyuechi
>
> That patch does not conform to the ABI.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
> ___
> ffm
Use this instead
wrote on Tue, Nov 19, 2024 at 17:12:
> From: sunyuechi
>
> ---
> libavcodec/riscv/rv40dsp_rvv.S | 113 ++---
> 1 file changed, 75 insertions(+), 38 deletions(-)
>
> diff --git a/libavcodec/riscv/rv40dsp_rvv.S
> b/libavcodec/riscv/rv40dsp_rvv.S
> index ca431eb8ab
Please ignore this
wrote on Tue, Nov 19, 2024 at 17:08:
> From: sunyuechi
>
> ---
> libavcodec/riscv/rv40dsp_rvv.S | 111 ++---
> 1 file changed, 73 insertions(+), 38 deletions(-)
>
> diff --git a/libavcodec/riscv/rv40dsp_rvv.S
> b/libavcodec/riscv/rv40dsp_rvv.S
> index ca431eb8
> Generally speaking, I think that moving code should be done in dedicated
> patches.
> You can branch here. The rest of the byte code is the same in all but one
> cases.
Updated this.
wrote on Sun, Nov 17, 2024 at 21:17:
> From: sunyuechi
>
> ---
> libavcodec/riscv/h26x/asm.S | 127 +++
It seems like something overflowed, I'll take a look at it...
Rémi Denis-Courmont wrote on Mon, Nov 18, 2024 at 00:51:
> This reverts commit 5bc3b7f51308b8027e5468ef60d8336a960193e2.
>
> put_chroma_mc4, put_chroma_mc8 and avg_chroma_mc8 are confirmed to
> break `fate-rv40`. It is probably just luck that avg_
> Is this going to be reused anywhere? It seems the macro is only used once
> atm.
The next patch will use it ([PATCH 4/5] lavc/hevc: R-V V pel_uni(pow2)).
> Also is there a reason to use RVV here instead of just unaligned RVI?
Yes, RVI is enough; I deleted it and resent it.
Rémi Denis-Courmont 于202
ping
wrote on Sat, Oct 12, 2024 at 17:28:
> From: sunyuechi
>
>                         k230             banana_f3
> dmvr_8_12x20_c:         619.3 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32:   128.6 ( 4.82x)   103.4 ( 6.04x)
> dmvr_8_20x12_c:
> Even without Zvbb's widening shift, widening multiplication is probably
> faster here.
Updated, it has indeed gotten faster.
wrote on Tue, Oct 29, 2024 at 00:44:
> From: sunyuechi
>
>                              k230   banana_f3
> put_chroma_pixels_8_4x4_c:
> Up to 64-bit rows, you can use strided loads and stores here.
Due to the SRC_OFFSET in testing, only e8 and e16 can be loaded; e32 cannot
be loaded (Bus error).
Since the width ranges from 4 to 128, it seems that strided loads may not
be possible.
> Though for memory copying, unaligned scalar a
Fixed the assembly by changing `dmvr_hv\vlen\w:` to `func dmvr_hv\vlen\w, zve32x, zbb,
zba`.
Rémi Denis-Courmont wrote on Sat, Oct 12, 2024 at 14:33:
> Hi,
>
> This fails to assemble here (binutils 2.43.1).
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
> ___
> ffmpeg-devel mailing
Fix init.
wrote on Sat, Oct 12, 2024 at 14:25:
> From: sunyuechi
>
>                                       k230             banana_f3
> put_hevc_pel_uni_pixels4_8_c:         126.3 ( 1.00x)   90.5 ( 1.00x)
> put_hevc_pel_uni_pixels4_8_rvv_i32:   24.6 ( 5.14x)    17.5
LGTM
Rémi Denis-Courmont wrote on Sat, Oct 12, 2024 at 13:38:
> The current triggers an illegal instruction if the CPU does not support
> vectors.
> ---
> libavcodec/riscv/vvc/vvcdsp_init.c | 12 +++-
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/libavcodec/riscv/vvc/vvcdsp_init
ping. ([PATCH 1/5] lavc/vvc_mc: R-V V put_pixels is after this)
wrote on Sun, Sep 29, 2024 at 00:47:
> From: sunyuechi
>
>                         k230             banana_f3
> dmvr_8_12x20_c:         619.3 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32: 128.6 (
s down.. after updating,
it has indeed become faster.
Rémi Denis-Courmont wrote on Sat, Sep 28, 2024 at 21:49:
>
>
> On 28 September 2024 12:42:37 GMT+03:00, flow gg wrote:
> >> Is 4x unroll really faster than 2x here? We don't typically unroll 4x
> >> manually.
> >
> Is 4x unroll really faster than 2x here? We don't typically unroll 4x
> manually.
I first did 2x and then changed it to 4x. The test results are similar, and
I'm not sure how to choose between them...
> t5 seems to be 8-bit, so vwmulu.vx should work better here? Since you
> leveraged it in the
ping
flow gg wrote on Wed, Aug 28, 2024 at 14:43:
> It seems that the previous patch was partially missing the RVB check, but now
> it has if (flags & AV_CPU_FLAG_RVB).
>
> Rémi Denis-Courmont wrote on Wed, Aug 28, 2024 at 03:00:
>
>> On Sunday, 25 August 2024, 14:41:22 EEST, flow gg wrote:
It feels like this patch has been sitting idle for quite a long time...
Maybe it's time to merge it
Rémi Denis-Courmont wrote on Sat, Sep 14, 2024 at 22:45:
> Hi,
>
> LGTM for the RISC-V side. No clue about the VVC side.
> ___
> ffmpeg-devel mailing list
> ffmpeg-de
> LGTM for the RISC-V side. No clue about the VVC side.
Hi, Nuomi, could you please reply here? Thanks
flow gg wrote on Fri, Sep 13, 2024 at 00:45:
> ping
>
> flow gg wrote on Wed, Aug 28, 2024 at 14:38:
>
>> Updated: zve32x -> zve32x, zbb, zba
>>
>> 于2024年8月2
ping
flow gg wrote on Wed, Aug 28, 2024 at 14:38:
> Updated: zve32x -> zve32x, zbb, zba
>
> wrote on Wed, Aug 28, 2024 at 14:37:
>
>> From: sunyuechi
>>
>> C908 X60
>> avg_8_2x2_c
It seems that the previous patch was partially missing the RVB check, but now
it has if (flags & AV_CPU_FLAG_RVB).
Rémi Denis-Courmont wrote on Wed, Aug 28, 2024 at 03:00:
> On Sunday, 25 August 2024, 14:41:22 EEST, flow gg wrote:
> > > Does not assemble with binutils 2.43.1 a
Updated: zve32x -> zve32x, zbb, zba
wrote on Wed, Aug 28, 2024 at 14:37:
> From: sunyuechi
>
>                       C908   X60
> avg_8_2x2_c         :  1.2   1.0
> avg_8_2x2_rvv_i32   :  0.7   0.7
> avg_8_2x4_
> Does not assemble with binutils 2.43.1 and default flags.
Fixed through zve32x -> zve32x, zba
wrote on Sun, Aug 25, 2024 at 19:40:
> From: sunyuechi
>
> C908 X60
> vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2
> vp9_avg_8tap_smoot
I wrote `ff_vvc_w_avg_8_rvv` by mimicking the h264 weight function.
Based on the test results for 49 different resolutions, most of them were
significantly slower.
Only 2x32 and 2x64 had similar performance, without noticeable speed
improvement.
I'm not sure about the reason. Some differences ar
How can I test the weight and biweight of H.264? I haven't seen the related
test code..
tests/checkasm/checkasm --bench --test=h264dsp
Rémi Denis-Courmont wrote on Thu, Aug 15, 2024 at 16:10:
>
>
> On 3 August 2024 13:30:34 GMT+03:00, u...@foxmail.com wrote:
> >From: sunyuechi
> >
> >
> That seems suboptimal and unnecessary.
Updated it, there is no longer any vmv.
wrote on Fri, Aug 9, 2024 at 22:24:
> From: sunyuechi
>
>                              C908   X60
> vp9_avg_bilin_4hv_8bpp_c   : 10.7   9.5
> vp9_avg_bilin_4hv_8bpp_rvv_i32
Added lpad and resolved conflicts with master.
wrote on Sat, Aug 3, 2024 at 18:31:
> From: sunyuechi
>
>                       C908   X60
> avg_8_2x2_c         :  1.2   1.0
> avg_8_2x2_rvv_i32   :  0.7   0.7
>
> Looks OK, but missing CFI landing pads.
Added lpad.
wrote on Sat, Aug 3, 2024 at 17:51:
> From: sunyuechi
>
>                             C908   X60
> vp9_avg_bilin_4h_8bpp_c   :  5.5   4.7
> vp9_avg_bilin_4h_8bpp_rvv_i32 :1.7
> Use rounding.
Updated it and resolved conflicts with master.
wrote on Thu, Aug 1, 2024 at 20:16:
> From: sunyuechi
>
> C908 X60
> vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2
> vp9_avg_8tap_smooth_4h_8bpp_rvv_i32:
Denis-Courmont wrote on Wed, Jul 31, 2024 at 23:06:
> On Tuesday, 30 July 2024, 20:57:28 EEST, flow gg wrote:
> > From my understanding, moving from supporting only 128b to adding 256b
> > versions can simultaneously improve LMUL and solve some issues related to
> > insufficient
I'm a bit confused because the calculation here goes up to 32 bits and then
returns to 8 bits. It seems that the vmax and vnclipu instructions can't be
removed by using round-related instructions?
Rémi Denis-Courmont wrote on Mon, Jul 29, 2024 at 23:21:
> On Tuesday, 23 July 2024, 11:51:48 EEST, u...@f
Hi, these four patches have v2 (although the first one seems to be the
same).
From my understanding, moving from supporting only 128b to adding 256b
versions can simultaneously improve LMUL and solve some issues related to
insufficient vector registers (vvc, vp9).
This can be very helpful in certa
Because of the 3/4 update, updated it.
wrote on Tue, Jul 23, 2024 at 16:59:
> From: sunyuechi
>
> C908 X60
> vp9_avg_8tap_smooth_4hv_8bpp_c : 32.0 28.0
> vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32 : 15.0 13.2
> vp9_av
> TBH it is very hard to review this due to the large extents of code
> conditionals. This should be avoidable at least partly. You can name macros
> for each filter and then expand those macros instead of using if's.
Do you mean that before the addition of .equ ff_vp9_subpel_filters_xxx,
epel_filter
Okay, updated it
Rémi Denis-Courmont wrote on Fri, Jul 19, 2024 at 23:56:
> On Thursday, 18 July 2024, 18:04:15 EEST, flow gg wrote:
> > > Again, I don't think that a maximum multiplier belongs here. If the
> > > calling code cannot scale the multiplier up, then it sho
> Again, I don't think that a maximum multiplier belongs here. If the calling
> code cannot scale the multiplier up, then it should be a normal loop providing
> the same code for all VLENs.
I think it's acceptable to add such a parameter, which isn't particularly
common in other files, because thi
> vssseg2e8
> vlsseg4e8
> vwadd.wv
> I can't find where VXRM is initialised for that.
Updated them and added csrwi.
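Why VXRM matters here: RVV fixed-point instructions such as vnclipu round the bits they shift out according to the vxrm CSR, so it has to be written (for example with csrwi) before they run. Below is a rough C model of the four rounding modes applied to a single unsigned value, based on my reading of the RVV 1.0 specification. `round_shift_right` and the enum names are made up for this sketch and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm encodings: round-to-nearest-up, round-to-nearest-even,
 * round-down (truncate), round-to-odd. */
enum { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Hypothetical helper: shift v right by d bits, rounding per vxrm. */
static uint32_t round_shift_right(uint32_t v, unsigned d, int vxrm)
{
    assert(d < 32);
    if (d == 0)
        return v;

    uint32_t res  = v >> d;
    uint32_t lsb  = res & 1;                     /* bit d of v */
    uint32_t half = (v >> (d - 1)) & 1;          /* bit d-1 of v */
    uint32_t low  = v & ((1u << (d - 1)) - 1);   /* bits d-2..0 of v */
    uint32_t r;

    switch (vxrm) {
    case VXRM_RNU: r = half;                          break;
    case VXRM_RNE: r = half & ((low != 0) | lsb);     break;
    case VXRM_RDN: r = 0;                             break;
    default: /* VXRM_ROD: set the LSB if any bit was discarded */
        r = !lsb & (half | (low != 0));
        break;
    }
    return res + r;
}
```

With this model, shifting 7 right by one bit gives 4 under RNU but 3 under RDN, which is the kind of off-by-one that shows up when vxrm is left at whatever value the caller had.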
wrote on Mon, Jul 15, 2024 at 00:30:
> From: sunyuechi
>
>                                C908   X60
> vp8_loop_filter_simple_h_c   :  6.2   5.7
> v
function, then vsetvlstatic16 uses max_lmul == m8.
If e32 is involved in the function, then vsetvlstatic16 uses max_lmul == m4.
I think it is clearer now.
Rémi Denis-Courmont wrote on Mon, Jul 8, 2024 at 23:41:
> On Monday, 1 July 2024, 19:09:01 EEST, flow gg wrote:
> > I reviewed it again, th
I reviewed it again, the purpose of is_w is to limit lmul to a maximum of
1/4 of vlen, to prevent vector register shortage, which can also be
considered as vset limiting lmul. I renamed it to quarter_len_limit.
t0 is changed to t1.
wrote on Tue, Jul 2, 2024 at 00:07:
> From: sunyuechi
>
>
> I am not sure what is_w means or serves here. If you need special cases, this
> feels a bit out of place for this macro.
It is a special case added to merge the vset of avg and w_avg. How about
giving it a default value so that it doesn't affect the use of other
functions?
> I am not sure if I
The loop filter horizontal in vp8 also has this issue ..
Rémi Denis-Courmont wrote on Sun, Jun 30, 2024 at 17:04:
> T-Head C908 (cycles):
> h264_h_loop_filter_luma_8bpp_c: 297.5
> h264_h_loop_filter_luma_8bpp_rvv_i32: 374.7
> h264_v_loop_filter_luma_8bpp_c: 862.7
> h264_v_loop_filter_luma_8bpp_rvv
Initially, I tried using `vnclip.wi` with reference to h264,
-vwadd.wx v16, v16, t4
-vnsra.wi v16, v16, 4
+vnclip.wi v16, v16, 4
but couldn't find the correct way... I think there might be some overflow
issues that I didn't understand correctly. How do y
> You can directly LLA filters + 16 * 8 * 2 and save one add. Same below. You can
> also use .equ to alias the filter addresses, and avoid if's.
> That's a lot of address dependencies, which is going to hurt performance. It
> might help to just spill more S registers if needed.
> This can be done
> Copying vectors is rarely justified - mostly only before destructive
> instructions such as FMA.
It is slightly different from VP8. In VP8, many scalar values are positive,
so the related calculations can be easily replaced. However, in this
context of VP9, since t2 is a negative number, vwmaccs
Just like in VP8, the unroll has been updated.
wrote on Sat, Jun 15, 2024 at 19:51:
> From: sunyuechi
>
>                                   C908   X60
> vp9_avg_bilin_4h_8bpp_c         :  5.5   4.7
> vp9_avg_bilin_4h_8bpp_rvv_i32   :  1.7   1.5
>
> Does this not render the type parameter of bilin_load useless (always h)?
> (Not a blocker for this patch.)
Yes, this was needed in the initial version, but it is no longer required.
I just sent a patch.
> Not sure if I already asked this but is this really faster than slide1?
> Normally we wan
ping
wrote on Thu, May 30, 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it reduces one load operation
> vertically compared to horizontally.
>
> old
> Nit: for overall code base consistency, I'd use csrwi here. Reason being that
> for other rounding modes, csrwi is the better option.
>
> Probably faster to swap the two above, to avoid stalling on LD.
>
> If you check more than one length, better to get ff_get_rv_vlenb() into a local
> variable.
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will
> end up faster than vector code on future hardware, especially out-of-order
> pipelines.
I want to drop 2x2, but since there's only one function to handle all
situations instead of 7*7 functions..
> AFAIU, this will ge
> I think we can drop the 2x2 transforms. In all likelihood, scalar code will
> end up faster than vector code on future hardware, especially out-of-order
> pipelines.
I want to drop 2x2, but since there's only one function to handle all
situations instead of 7*7 functions, how can I drop only 2x2
> In keeping in line with the rest of the project, that should probably go into
> libavcodec/riscv/vvc/
> Expanding the macro 49 times, with up to 14 branches to get there is maybe not
> such a great idea. It might look nice on the checkasm µbenchmarks because the
> branches under test get
Well.. because scalar registers are limited, the direct unrolling will be
like this for now. We can handle different lengths separately in the future
flow gg wrote on Thu, May 30, 2024 at 23:36:
> I directly copied the VP9 modifications over... Since len <= 16, it seems
> like it can be improved a
I directly copied the VP9 modifications over... Since len <= 16, it seems
like it can be improved a bit more
wrote on Thu, May 30, 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it redu
A portion has been modified according to the previous review, but there are
still some parts that haven't been updated
> Similarly, it
> should be possible to share most of the horizontal and vertical code (maybe
> also for bilinear, not just EPel) with separate load/store then inner
> procedures.
Hi, maybe we can prioritize this revert:
https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/0c1304ae11b0361ede055ee8ffc6e83529468c73
Using [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg to avoid conflicts with
other patches.
flow gg wrote on Fri, May 24, 2024 at 14:13:
> I want to update the VP9 bilin load, just l
reduction in code size seems to be due to switching to using j labels,
doesn't seem to be about vset, but another issue. j labels are indeed
better. I will make similar modifications.
Rémi Denis-Courmont wrote on Sun, May 26, 2024 at 02:29:
> On Saturday, 25 May 2024, 21:16:22 EEST, flow gg wrote
Would it be better to replace the two vsetvlstatic8 and vsetvlstatic16 with
two vsetvl? This would require the previous patch and this one to work
together, increasing the number of lines of code and making the code a bit
harder to read.
Additionally, I have a question about patch 4 'save one R-V G
One more thing I remember is that after adjusting the sign, vmacc can be
used; otherwise, due to the sign, mul + add are needed.
flow gg wrote on Sat, May 25, 2024 at 18:38:
> > Is there a reason that you cannot use the tables from C code?
>
> Similar to VP8, to adjust the positive and negat
> Is there a reason that you cannot use the tables from C code?
Similar to VP8, to adjust the positive and negative data and prevent small
probability overflow during calculations.
> AFAICT, regular and sharp are identical, except for the base address of the
> filter table, so it should be possib
I want to update the VP9 bilin load, just like you did with VP8, but it
seems like this patch([PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg) doesn't
merge the current updates here but merges the previous version instead, so
the subsequent patches will have conflicts.
flow gg wrote on Wed, May 22, 2024 at 01
Unfortunately I only test to obtain benchmarks and basic correctness. I
always feel the need for a professional to write the tests.
Rémi Denis-Courmont wrote on Thu, May 23, 2024 at 04:35:
>
>
> On 22 May 2024 23:28:54 GMT+03:00, "Rémi Denis-Courmont"
> wrote:
> >This removes one stray LI and reworks the
Reordered some here.
wrote on Wed, May 22, 2024 at 03:24:
> From: sunyuechi
>
>                       C908   X60
> avg_8_2x2_c         :  1.0   1.0
> avg_8_2x2_rvv_i32   :  0.7   0.7
> avg_8_2x4_c
Do macro definitions also need commas? I noticed that much of my old code
and SiFive's code doesn't have them.
Rémi Denis-Courmont wrote on Wed, May 22, 2024 at 02:29:
> On Tuesday, 21 May 2024, 20:13:16 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
>
> > diff --git a/libavcodec/riscv/vp9_mc_
> I would expect that you can get better performance by interleaving scalar and
> vector stuff, and possibly also vector loads and vector arithmetic.
Okay, I will try
> These labels lead to nowhere? If you actually mean to implicitly fall through
> to the next function, you can use the function name
> Please put commas between operands.
> This should probably be ff_avg_vp9 or something slightly more specific.
Updated here.
wrote on Wed, May 22, 2024 at 01:14:
> From: sunyuechi
>
> C908:
> vp9_avg4_8bpp_c: 1.2
> vp9_avg4_8bpp_rvv_i64: 1.0
> vp9_avg8_8bpp_c: 3.7
> vp9_avg8_8bpp_rvv_i64: 1.5
> vp9_avg16_8
> Please put commas between operands.
Okay
> This should probably be ff_avg_vp9 or something slightly more specific.
Is it necessary here? Many macros in the C file are copied from MIPS, where
it is called ff_avg4_msa. Here, it has been simply changed to ff_avg4_rvv.
Rémi Denis-Courmont 于2024年
There are three unused lines which I forgot to delete before submitting. I
have updated them here.
wrote on Tue, May 21, 2024 at 15:47:
> From: sunyuechi
>
>                 C908   X60
> avg_8_2x2_c   :  1.0   1.0
> avg_8_2x2_rvv_i
To obtain test results, you need to comment out the if (w == h) in
tests/checkasm/vvc_mc.c.
Because vset needs to be used in the loop, I manually wrote a cumbersome
vset macro.
wrote on Tue, May 21, 2024 at 15:38:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_
fix .irp use
wrote on Sun, May 19, 2024 at 16:18:
> From: sunyuechi
>
> C908:
> vp8_put_epel4_h4v4_c: 20.0
> vp8_put_epel4_h4v4_rvv_i32: 11.0
> vp8_put_epel4_h4v6_c: 25.2
> vp8_put_epel4_h4v6_rvv_i32: 13.5
> vp8_put_epel4_h6v4_c: 22.2
> vp8_put_epel4_h6v4_rvv_i32: 14.5
> vp8_put_epel4_h6v6_c: 29.0
> vp8_put_
fixed in v4
Rémi Denis-Courmont wrote on Sat, May 18, 2024 at 23:56:
> On Monday, 13 May 2024, 19:59:23 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_avg_bilin_4h_8bpp_c: 5.2
> > vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
> > vp9_avg_bilin_4v_8bpp_c: 5.5
> > vp9_avg_bilin_4v
Fixed issues with .irp and comma, as well as the ifc issue (same
modifications as previously done for vp8).
wrote on Sun, May 19, 2024 at 02:16:
> From: sunyuechi
>
> C908:
> vp9_avg4_8bpp_c: 1.2
> vp9_avg4_8bpp_rvv_i64: 1.0
> vp9_avg8_8bpp_c: 3.7
> vp9_avg8_8bpp_rvv_i64: 1.5
> vp9_avg16_8bpp_c: 14.7
> vp9_a
yeah, updated it in the reply
Rémi Denis-Courmont wrote on Fri, May 17, 2024 at 23:11:
> On Monday, 13 May 2024, 19:59:22 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_avg4_8bpp_c: 1.2
> > vp9_avg4_8bpp_rvv_i64: 1.0
> > vp9_avg8_8bpp_c: 3.7
> > vp9_avg8_8bpp_rvv_i64:
Is the test result missing here?
Rémi Denis-Courmont wrote on Thu, May 16, 2024 at 01:11:
> ---
> libavcodec/riscv/Makefile| 1 +
> libavcodec/riscv/h264dsp_init.c | 5
> libavcodec/riscv/startcode_rvv.S | 44
> libavcodec/riscv/vc1dsp_init.c | 16 +++---
updated for clean code
wrote on Wed, May 15, 2024 at 11:56:
> From: sunyuechi
>
> C908:
> vp9_tm_4x4_8bpp_c: 116.5
> vp9_tm_4x4_8bpp_rvv_i32: 43.5
> vp9_tm_8x8_8bpp_c: 416.2
> vp9_tm_8x8_8bpp_rvv_i32: 86.0
> vp9_tm_16x16_8bpp_c: 1665.5
> vp9_tm_16x16_8bpp_rvv_i32: 187.2
> vp9_tm_32x32_8bpp_c: 6974.2
> vp9_tm
in the reply
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 02:08:
> On Tuesday, 14 May 2024, 20:57:17 EEST, flow gg wrote:
> > Why is it unnecessary to reset the vector configuration every time? I
> think
> > it is necessary to reset e16/e8 each time.
>
> I misread the p
Why is it unnecessary to reset the vector configuration every time? I think
it is necessary to reset e16/e8 each time.
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:46:
> On Monday, 13 May 2024, 19:59:21 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_tm_4x4_8bp
Okay, learned it
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:00:
> On Tuesday, 14 May 2024, 7:45:29 EEST, flow gg wrote:
> > I am locally using:
> > if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags &
> > AV_CPU_FLAG_RVB_ADDR)) {
>
Using this will give output `if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {`
Did you comment out the MISALIGNED flag check but not add RVI, resulting in
no output?
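A small C sketch of the kind of flag gating being discussed, assuming (as I understand checkasm) that a function left at its C fallback produces no benchmark line. The `DEMO_CPU_FLAG_*` values and both helper names are hypothetical stand-ins; the real AV_CPU_FLAG_RVI and AV_CPU_FLAG_RVB_ADDR constants come from libavutil and are not reproduced here.

```c
#include <assert.h>

/* Illustrative bit values only, not the real libavutil constants. */
#define DEMO_CPU_FLAG_RVI       (1 << 0)
#define DEMO_CPU_FLAG_RVB_ADDR  (1 << 1)

/* Returns 1 when every bit in `required` is set in `flags`. */
static int has_all_flags(int flags, int required)
{
    return (flags & required) == required;
}

/* Hypothetical init-time gate for the two-flag condition discussed in
 * this thread: the optimized path is selected only when both are set. */
static int select_8bpp_path(int bpp, int flags)
{
    assert(bpp > 0);
    return bpp == 8 &&
           has_all_flags(flags, DEMO_CPU_FLAG_RVI | DEMO_CPU_FLAG_RVB_ADDR);
}
```

If the running CPU reports only one of the two flags, `select_8bpp_path` returns 0, the optimized function is never installed, and there is nothing for the benchmark to report.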
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:02:
> On Tuesday, 14 May 2024, 7:44:55 EEST, flow gg wrote: