Re: [FFmpeg-devel] [PATCH] swscale: Disable avx2 hscale 8to15 on IceLake and below due to Intel Gather Data Sampling mitigation performance loss

2025-08-21 Thread Alan Kelly via ffmpeg-devel
On Fri, Aug 8, 2025 at 2:23 PM Alan Kelly wrote:
> On Fri, Aug 8, 2025 at 2:21 PM Alan Kelly wrote:
>> Intel provided a microcode update to mitigate this security
>> vulnerability which has a huge negative performance impact on gather
>> instructions. This means that hscale 8to15 avx2, whi

Re: [FFmpeg-devel] [PATCH] swscale: Disable avx2 hscale 8to15 on IceLake and below due to Intel Gather Data Sampling mitigation performance loss

2025-08-08 Thread Alan Kelly via ffmpeg-devel
On Fri, Aug 8, 2025 at 2:21 PM Alan Kelly wrote:
> Intel provided a microcode update to mitigate this security
> vulnerability which has a huge negative performance impact on gather
> instructions. This means that hscale 8to15 avx2, which uses gather
> extensively, is no longer faster than SSSE3

[FFmpeg-devel] [PATCH] swscale: Disable avx2 hscale 8to15 on IceLake and below due to Intel Gather Data Sampling mitigation performance loss

2025-08-08 Thread Alan Kelly via ffmpeg-devel
Intel provided a microcode update to mitigate this security vulnerability which has a huge negative performance impact on gather instructions. This means that hscale 8to15 avx2, which uses gather extensively, is no longer faster than SSSE3 on impacted CPUs.
---
 libavutil/x86/cpu.c | 6 --
 1 fi
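For context, gating a code path on CPU generation usually starts from the family and model fields of the CPUID leaf 1 signature. The sketch below is not the code from this patch (the diff to libavutil/x86/cpu.c is cut off above); it only shows the standard decoding of those fields. The names CpuSig, decode_cpuid_signature and model_in_list are made up for illustration, and any model list passed in is the caller's assumption, not the list the patch uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded fields (not an FFmpeg type). */
typedef struct {
    unsigned family;
    unsigned model;
} CpuSig;

/* Decode family/model from the EAX value returned by CPUID leaf 1.
 * The extended model bits apply only when the base family is 6 or 15;
 * the extended family bits apply only when the base family is 15. */
CpuSig decode_cpuid_signature(uint32_t eax)
{
    CpuSig s;
    unsigned base_family = (eax >> 8) & 0xF;

    s.family = base_family;
    s.model  = (eax >> 4) & 0xF;
    if (base_family == 6 || base_family == 15)
        s.model |= ((eax >> 16) & 0xF) << 4;
    if (base_family == 15)
        s.family += (eax >> 20) & 0xFF;
    return s;
}

/* Return 1 if sig is a family 6 part whose model appears in the
 * caller-supplied list, 0 otherwise. The list is an assumption made
 * by the caller; this sketch does not ship one. */
int model_in_list(CpuSig sig, const unsigned *models, size_t count)
{
    if (sig.family != 6)
        return 0;
    for (size_t i = 0; i < count; i++)
        if (models[i] == sig.model)
            return 1;
    return 0;
}
```

With this decoding, the signature 0x706E5 yields family 6, model 0x7E. A caller could then compare the model against whatever set of affected parts it trusts before choosing between the AVX2 and SSSE3 paths.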

Re: [FFmpeg-devel] [PATCH] swscale: Break loop-carried dependency enabling parallel out of order execution of the gathers.

2025-08-07 Thread Alan Kelly via ffmpeg-devel
On Mon, Aug 4, 2025 at 10:04 PM Hendrik Leppkes wrote:
> On Mon, Aug 4, 2025 at 7:19 PM Jacob Lifshay wrote:
> > On August 4, 2025 6:49:20 AM PDT, Alan Kelly via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:
> > > The gather i

[FFmpeg-devel] [PATCH] swscale: Break loop-carried dependency enabling parallel out of order execution of the gathers.

2025-08-04 Thread Alan Kelly via ffmpeg-devel
The gather is unmasked but the instruction does a merge into ymm4, which depends on the value of ymm4 from the previous loop iteration. The out-of-order scheduler does not know statically that the instruction is fully unmasked, preventing parallel out-of-order execution of the gathers.
---
 libswsc
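The dependency described above can be illustrated with AVX2 intrinsics. This is a sketch only, not the assembly from the patch (which is cut off above): the function names are hypothetical, the compiler rather than the programmer picks the registers, and an optimizing compiler may already break the dependency on its own. The first variant seeds each gather's destination with the previous iteration's result; the second seeds it with a fresh zero register, which is one common way to make the gathers independent. Both use an all-ones mask, so they load the same data.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Merge form: the destination carries over from the previous iteration,
 * so each gather nominally waits for the one before it. */
__attribute__((target("avx2")))
static void sum_gathers_carried(const int32_t *base, const int32_t *idx,
                                int iters, int32_t out[8])
{
    const __m256i all = _mm256_set1_epi32(-1);      /* fully unmasked */
    __m256i dst = _mm256_setzero_si256();
    __m256i sum = _mm256_setzero_si256();
    for (int i = 0; i < iters; i++) {
        __m256i vidx = _mm256_loadu_si256((const __m256i *)(idx + 8 * i));
        dst = _mm256_mask_i32gather_epi32(dst, base, vidx, all, 4);
        sum = _mm256_add_epi32(sum, dst);
    }
    _mm256_storeu_si256((__m256i *)out, sum);
}

/* Independent form: the destination is a fresh zero each iteration,
 * so no gather depends on an earlier gather's result. */
__attribute__((target("avx2")))
static void sum_gathers_independent(const int32_t *base, const int32_t *idx,
                                    int iters, int32_t out[8])
{
    const __m256i all = _mm256_set1_epi32(-1);
    __m256i sum = _mm256_setzero_si256();
    for (int i = 0; i < iters; i++) {
        __m256i vidx = _mm256_loadu_si256((const __m256i *)(idx + 8 * i));
        __m256i dst = _mm256_mask_i32gather_epi32(_mm256_setzero_si256(),
                                                  base, vidx, all, 4);
        sum = _mm256_add_epi32(sum, dst);
    }
    _mm256_storeu_si256((__m256i *)out, sum);
}
#endif

/* Returns 1 if both variants produce the same sums, 0 if they differ.
 * Also returns 1 when AVX2 is unavailable, since there is nothing to compare. */
int gather_variants_agree(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int32_t table[64], idx[32], a[8], b[8];

    if (!__builtin_cpu_supports("avx2"))
        return 1;
    for (int i = 0; i < 64; i++)
        table[i] = i * 3 + 1;
    for (int i = 0; i < 32; i++)
        idx[i] = (i * 7) % 64;                      /* always within table */
    sum_gathers_carried(table, idx, 4, a);
    sum_gathers_independent(table, idx, 4, b);
    for (int j = 0; j < 8; j++)
        if (a[j] != b[j])
            return 0;
    return 1;
#else
    return 1;
#endif
}
```

The two variants are functionally identical because the mask is all ones; the only difference is what the gather's destination depends on, which is the point the commit message makes about ymm4.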

Re: [FFmpeg-devel] [PATCH 1/3] swscale/x86/swscale: Process yuv2yuvX tails using next largest register size

2023-09-06 Thread Alan Kelly via ffmpeg-devel
On Tue, Sep 5, 2023 at 12:03 AM Michael Niedermayer wrote:
> On Mon, Sep 04, 2023 at 02:30:00PM +0200, Alan Kelly via ffmpeg-devel wrote:
> > Hi,
> >
> > Any issues with this patch or can it be merged?
>
> are all cases covered by tests ?
> if yes and the te

[FFmpeg-devel] [PATCH 2/2] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-09-06 Thread Alan Kelly via ffmpeg-devel
---
 libswscale/x86/swscale.c    | 19 ---
 libswscale/x86/yuv2yuvX.asm | 24 ++--
 2 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 00e42b4bec..6980002e9e 100644
--- libswscale/x86/swscale

[FFmpeg-devel] [PATCH 1/2] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-09-06 Thread Alan Kelly via ffmpeg-devel
---
 libswscale/x86/swscale.c    | 7 +++
 libswscale/x86/yuv2yuvX.asm | 19 ++-
 2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index ff16398988..00e42b4bec 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale

Re: [FFmpeg-devel] [PATCH 1/3] swscale/x86/swscale: Process yuv2yuvX tails using next largest register size

2023-09-04 Thread Alan Kelly via ffmpeg-devel
Hi,
Any issues with this patch or can it be merged?
Thanks,
Alan
On Fri, Jul 14, 2023 at 12:08 PM Alan Kelly wrote:
> ---
>  libswscale/x86/swscale.c | 8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
> index ff