> Unless I am mistaken this set (as a whole) had unaddressed review
> comments.
Which comment remains unresolved? It seems that the last comment here
> This fails to assemble here (binutils 2.43.1).
has already been addressed.
As for the next 1-4 patches, the comments have also been resolved.
Ré
On Tuesday, 26 November 2024, 5.02.57 EET, flow gg wrote:
> ping
Unless I am mistaken this set (as a whole) had unaddressed review comments.
--
Rémi Denis-Courmont
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
h
ping
On Saturday, 12 October 2024, at 17:28, wrote:
> From: sunyuechi
>
>                        k230             banana_f3
> dmvr_8_12x20_c:        619.3 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32:  128.6 ( 4.82x)   103.4 ( 6.04x)
> dmvr_8_20x12_c:
ping
On Saturday, 12 October 2024, at 17:28, wrote:
> From: sunyuechi
>
>                        k230             banana_f3
> dmvr_8_12x20_c:        619.3 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32:  128.6 ( 4.82x)   103.4 ( 6.04x)
> dmvr_8_20x12_c:
Fixed the assembly by changing `dmvr_hv\vlen\w:` to
`func dmvr_hv\vlen\w, zve32x, zbb, zba`.
On Saturday, 12 October 2024, at 14:33, Rémi Denis-Courmont wrote:
> Hi,
>
> This fails to assemble here (binutils 2.43.1).
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
> ___
> ffmpeg-devel mailing
From: sunyuechi
                       k230             banana_f3
dmvr_8_12x20_c:        619.3 ( 1.00x)   624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32:  128.6 ( 4.82x)   103.4 ( 6.04x)
dmvr_8_20x12_c:        610.0 ( 1.00x)   665.6 ( 1.00x)
dm
Hi,
This fails to assemble here (binutils 2.43.1).
--
雷米‧德尼-库尔蒙
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ.
ping. ([PATCH 1/5] lavc/vvc_mc: R-V V put_pixels comes after this one)
On Sunday, 29 September 2024, at 00:47, wrote:
> From: sunyuechi
>
>                        k230             banana_f3
> dmvr_8_12x20_c:        619.3 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32:  128.6 (
> At similar speed, shorter code is better.
Okay, updated it.
> Sure but so what? vsetvli/vsetivli is pretty fast (unlike vsetvl), and in
> this case the code would be shorter. Or are you trying to factor the code
> for different VTYPEs?
I mistakenly thought these vsets would slow things down.. afte
From: sunyuechi
                       k230             banana_f3
dmvr_8_12x20_c:        619.3 ( 1.00x)   624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32:  128.6 ( 4.82x)   103.4 ( 6.04x)
dmvr_8_20x12_c:        610.0 ( 1.00x)   665.6 ( 1.00x)
dm
On 28 September 2024 12:42:37 GMT+03:00, flow gg wrote:
>> Is 4x unroll really faster than 2x here? We don't typically unroll 4x
>> manually.
>
>I first did 2x and then changed it to 4x. The test results are similar, and
>I'm not sure how to choose between them...
At similar speed, shorter
> Is 4x unroll really faster than 2x here? We don't typically unroll 4x
> manually.
I first did 2x and then changed it to 4x. The test results are similar, and
I'm not sure how to choose between them...
> t5 seems to be 8-bit, so vwmulu.vx should work better here? Since you
> leveraged it in the
From: sunyuechi
                       k230             banana_f3
dmvr_8_12x20_c:        626.5 ( 1.00x)   621.7 ( 1.00x)
dmvr_8_12x20_rvv_i32:  126.3 ( 4.96x)    79.9 ( 7.78x)
dmvr_8_20x12_c:        608.0 ( 1.00x)   652.9 ( 1.00x)
dmv
Hi,
On Friday, 27 September 2024, 20.09.30 EEST, u...@foxmail.com wrote:
> From: sunyuechi
>
>                        k230             banana_f3
> dmvr_8_12x20_c:        628.5 ( 1.00x)   624.1 ( 1.00x)
> dmvr_8_12x20_rvv_i32:  137.5 ( 4.57x)
From: sunyuechi
                       k230             banana_f3
dmvr_8_12x20_c:        628.5 ( 1.00x)   624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32:  137.5 ( 4.57x)    92.9 ( 6.72x)
dmvr_8_20x12_c:        609.7 ( 1.00x)   655.4 ( 1.00x)
dmv
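
For readers checking the figures: the value in parentheses appears to be the C reference timing divided by the RVV timing for the same board. The sketch below is only an illustration of that arithmetic, using the dmvr_8_12x20 rows of the posting above. The function name `speedup` and the `results` dictionary are hypothetical helpers, not part of the patch or of checkasm, and the timing unit is not stated in the thread.

```python
def speedup(c_time: float, rvv_time: float) -> float:
    """Ratio of the C reference timing to the RVV timing, rounded to two decimals."""
    return round(c_time / rvv_time, 2)


# dmvr_8_12x20 timings copied from the posting above: (C, RVV) per board.
results = {
    "k230": (628.5, 137.5),
    "banana_f3": (624.1, 92.9),
}

for board, (c_time, rvv_time) in results.items():
    print(f"{board}: {speedup(c_time, rvv_time):.2f}x")
```

With these inputs the ratios agree with the 4.57x and 6.72x shown in the table.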