On Saturday, 10 February 2024, 11:14:11 EET, Rémi Denis-Courmont wrote:
> But your patchset seems to leave those out anyway.
Never mind that bit, I missed the other mails.
--
レミ・デニ-クールモン
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-deve
On Friday, 9 February 2024, 17:34:40 EET, flow gg wrote:
> The issue here is that any load greater than e8 will fail the test (Bus
> error), so it cannot use vlse64 or similar methods...
AFAICT, data is aligned on 16 bytes here, so using larger element sizes should
not be a problem. That
The issue here is that any load greater than e8 will fail the test (Bus
error), so it cannot use vlse64 or similar methods...
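
A minimal plain-C sketch of the alignment concern, not RVV code: a byte pointer offset by an arbitrary x is 8-byte aligned for only one offset in eight, so a 64-bit element load at that address may fault on hardware that does not handle misaligned accesses. WIDTH and all helper names here are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIDTH 64 /* hypothetical row stride, for illustration only */

/* Returns 1 if p is aligned to `align` bytes; align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Portable 64-bit load that is valid at any byte address. */
static uint64_t load64_unaligned(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Counts the x offsets in [0, 8) for which img + y * WIDTH + x
 * is aligned to 8 bytes. */
static int count_aligned_offsets(const uint8_t *img, int y)
{
    int n = 0;
    for (int x = 0; x < 8; x++)
        n += is_aligned(img + y * WIDTH + x, 8);
    return n;
}
```

With a 16-byte-aligned buffer and a stride that is a multiple of 8, only x = 0 gives an 8-byte-aligned address, which is consistent with byte-sized (e8) loads being the only ones safe for every x.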
On Friday, 9 February 2024 at 18:32, Rémi Denis-Courmont wrote:
>
>
> On 9 February 2024 00:39:38 GMT+02:00, flow gg wrote:
> >From my understanding, to use larger group multipliers, one n
On 9 February 2024 00:39:38 GMT+02:00, flow gg wrote:
>From my understanding, to use larger group multipliers, one needs to
>utilize vlse64 (8x8) vlse128 (16x16).
>
>However, due to the use in tests of
>
>ptr = img2 + y * WIDTH + x;
>d2 = call_ref(NULL, img1, ptr, WIDTH, h);
>d1 = call_new(NUL
From my understanding, to use larger group multipliers, one needs to
utilize vlse64 (8x8) vlse128 (16x16).
However, due to the use in tests of
ptr = img2 + y * WIDTH + x;
d2 = call_ref(NULL, img1, ptr, WIDTH, h);
d1 = call_new(NULL, img1, ptr, WIDTH, h);
will get: pix_abs_1_0_rvv_i32 (fatal sig
On Wednesday, 7 February 2024, 2:01:23 EET, flow gg wrote:
> I think in most cases it is like this, but specifically for this function,
> using Reduction only once would be slower.
>
> The currently submitted version roughly takes:
> pix_abs_0_0_rvv_i32: 136.2
>
> The version that uses Re
I think in most cases it is like this, but specifically for this function,
using Reduction only once would be slower.
The currently submitted version roughly takes:
pix_abs_0_0_rvv_i32: 136.2
The version that uses Reduction only once takes:
pix_abs_0_0_rvv_i32: 169.2
Here is the implementation o
Hi,
To sum a vector, you should only reduce once at the end of the function, cf.
how it's done in existing scalar products. Reduction instructions are
(intrinsically) slow.
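
The two accumulation strategies can be sketched in scalar C as below. This only models the structure (per-lane partial sums versus one reduction per row); it is not RVV code, and LANES and the function names are hypothetical. Both variants return the same sum; which one is faster on a given core is a matter for measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 16 /* hypothetical vector width in elements */

/* SAD over a 16-wide block of h rows: keep one partial sum per lane
 * and reduce across lanes a single time at the end. */
static int sad16_reduce_once(const uint8_t *a, const uint8_t *b,
                             ptrdiff_t stride, int h)
{
    unsigned acc[LANES] = {0}; /* models a vector accumulator */
    unsigned sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        for (int i = 0; i < LANES; i++) /* element-wise add, no reduction */
            acc[i] += (unsigned)abs(a[i] - b[i]);
        a += stride;
        b += stride;
    }
    for (int i = 0; i < LANES; i++) /* the single reduction */
        sum += acc[i];
    return (int)sum;
}

/* Same result, but each row is reduced to a scalar before moving on
 * (models one reduction instruction per row). */
static int sad16_reduce_per_row(const uint8_t *a, const uint8_t *b,
                                ptrdiff_t stride, int h)
{
    unsigned sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        unsigned row = 0;
        for (int i = 0; i < LANES; i++)
            row += (unsigned)abs(a[i] - b[i]);
        sum += row; /* per-row reduction result folded into the total */
        a += stride;
        b += stride;
    }
    return (int)sum;
}
```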
--
Rémi Denis-Courmont
http://www.remlab.net/
From d4d6b3ea040f3f7997463b4452813bc75d1c9f9d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 3 Feb 2024 10:58:13 +0800
Subject: [PATCH 1/7] lavc/me_cmp: R-V V pix_abs
C908:
pix_abs_0_0_c: 534.0
pix_abs_0_0_rvv_i32: 136.2
pix_abs_1_0_c: 287.7
pix_abs_1_0_rvv_i32: 125.2
sad_0_c: 534.0
sad_0_r