On Saturday, 10 February 2024, 11:14:11 EET, Rémi Denis-Courmont wrote:
> But your patchset seems to leave those out anyway.
Never mind that bit; I missed the other mails.
--
レミ・デニ-クールモン
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-deve
On Friday, 9 February 2024, 17:34:40 EET, flow gg wrote:
> The issue here is that any load wider than e8 fails the test (bus
> error), so vlse64 or similar methods cannot be used...
AFAICT, data is aligned on 16 bytes here, so using larger element sizes should
not be a problem. That
The issue here is that any load wider than e8 fails the test (bus
error), so vlse64 or similar methods cannot be used...
On Friday, 9 February 2024, 18:32, Rémi Denis-Courmont wrote:
>
>
> On 9 February 2024 00:39:38 GMT+02:00, flow gg wrote:
> >From my understanding, to use larger group multipliers, one n
On 9 February 2024 00:39:38 GMT+02:00, flow gg wrote:
>From my understanding, to use larger group multipliers, one needs to
>utilize vlse64 (8x8) vlse128 (16x16).
>
>However, due to the use in tests of
>
>ptr = img2 + y * WIDTH + x;
>d2 = call_ref(NULL, img1, ptr, WIDTH, h);
>d1 = call_new(NUL
From my understanding, to use larger group multipliers, one needs to
utilize vlse64 (8x8) vlse128 (16x16).
However, due to the use in tests of
ptr = img2 + y * WIDTH + x;
d2 = call_ref(NULL, img1, ptr, WIDTH, h);
d1 = call_new(NULL, img1, ptr, WIDTH, h);
will get: pix_abs_1_0_rvv_i32 (fatal sig
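The alignment concern can be illustrated with a small C sketch. It assumes a hypothetical row width (WIDTH here is 64, not necessarily the value used by the test) and two hypothetical helpers, misalignment() and load64_unaligned(), none of which come from the patch. The point is only that img2 + y * WIDTH + x is byte-aligned for arbitrary x, so a load with 64-bit elements from it may trap on hardware without misaligned-access support, whereas a byte-wise copy makes no alignment assumption.

```c
#include <stdint.h>
#include <string.h>

#define WIDTH 64 /* hypothetical row width, for illustration only */

/* Hypothetical helper: how many bytes p lies past the previous
 * multiple of `align`. Zero means p is aligned to `align`. */
static unsigned misalignment(const void *p, unsigned align)
{
    return (unsigned)((uintptr_t)p % align);
}

/* Hypothetical helper: read 8 bytes from a possibly unaligned address.
 * memcpy() does not require any particular alignment of p, unlike a
 * direct dereference of a uint64_t pointer. */
static uint64_t load64_unaligned(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

With an 8-byte-aligned img2, y = 1 and x = 3, misalignment(img2 + y * WIDTH + x, 8) is 3, which is the kind of pointer the test hands to the function under test.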
On Wednesday, 7 February 2024, 2:01:23 EET, flow gg wrote:
> I think in most cases it is like this, but specifically for this function,
> using Reduction only once would be slower.
>
> The currently submitted version roughly takes:
> pix_abs_0_0_rvv_i32: 136.2
>
> The version that uses Re
I think in most cases it is like this, but specifically for this function,
using Reduction only once would be slower.
The currently submitted version roughly takes:
pix_abs_0_0_rvv_i32: 136.2
The version that uses Reduction only once takes:
pix_abs_0_0_rvv_i32: 169.2
Here is the implementation o
Hi,
To sum a vector, you should only reduce once at the end of the function, cf.
how it's done in existing scalar products. Reduction instructions are
(intrinsically) slow.
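
As a rough illustration of the "reduce once" pattern, here is a minimal scalar C sketch, not the RVV code under review. The array acc[] stands in for a vector register of per-lane partial sums; LANES and the function name sad16_reduce_once are assumptions made up for this example. It shows only the structure of the computation and makes no claim about which variant is faster on real hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 16 /* hypothetical number of lanes (one per pixel in a row) */

/* Sum of absolute differences over a block LANES pixels wide and h rows
 * high. Per-lane partial sums accumulate in acc[] across all rows and
 * are folded into a scalar only once, after the row loop. */
static int sad16_reduce_once(const uint8_t *a, const uint8_t *b,
                             ptrdiff_t stride, int h)
{
    uint32_t acc[LANES] = {0};

    for (int y = 0; y < h; y++) {
        /* Element-wise step: no cross-lane work inside the loop. */
        for (int i = 0; i < LANES; i++)
            acc[i] += (uint32_t)abs(a[i] - b[i]);
        a += stride;
        b += stride;
    }

    /* The single reduction: fold the lanes into one scalar. */
    uint32_t sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += acc[i];
    return (int)sum;
}
```

The alternative discussed in this thread performs the fold inside the row loop, once per row, instead of once per call.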
--
Rémi Denis-Courmont
http://www.remlab.net/