Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422

2025-02-18 Thread James Almer
On 2/18/2025 11:58 AM, Shreesh Adiga wrote:
On Mon, Feb 3, 2025 at 10:03 PM Shreesh Adiga <16567adigashre...@gmail.com> wrote:
The scalar loop is replaced with masked AVX512 instructions.
For extracting the Y from UYVY, vperm2b is used instead of various AND and packuswb.
Instead of loading th

Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422

2025-02-18 Thread Shreesh Adiga
On Mon, Feb 3, 2025 at 10:03 PM Shreesh Adiga <16567adigashre...@gmail.com> wrote:
>
> The scalar loop is replaced with masked AVX512 instructions.
> For extracting the Y from UYVY, vperm2b is used instead of
> various AND and packuswb.
>
> Instead of loading the vectors with interleaved lanes as d

[FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422

2025-02-03 Thread Shreesh Adiga
The scalar loop is replaced with masked AVX512 instructions. For extracting the Y from UYVY, vperm2b is used instead of various AND and packuswb. Instead of loading the vectors with interleaved lanes as done in the AVX2 version, a normal load is used. At the end of packuswb, for U and V, an extra permut
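
For readers unfamiliar with the layout being discussed, here is a minimal scalar sketch in C of the per-line operation the patch vectorizes, assuming the usual UYVY byte order (U0 Y0 V0 Y1 ...). The function name and signature are hypothetical and are not FFmpeg's API. The sketch only shows which source bytes land in which plane: Y comes from the odd byte positions, which is the selection a byte permute can do in one step, while U and V alternate on the even positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): split one line of packed UYVY
 * into planar Y, U and V. `width` is the line width in pixels and is
 * assumed to be even, since each 4-byte group U Y V Y covers two pixels. */
static void uyvy_line_to_planar(const uint8_t *src, uint8_t *ydst,
                                uint8_t *udst, uint8_t *vdst, size_t width)
{
    assert(width % 2 == 0);
    for (size_t i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0]; /* even byte, first of the pair  */
        ydst[2 * i + 0] = src[4 * i + 1]; /* odd byte: luma of pixel 2i    */
        vdst[i]         = src[4 * i + 2]; /* even byte, second of the pair */
        ydst[2 * i + 1] = src[4 * i + 3]; /* odd byte: luma of pixel 2i+1  */
    }
}
```

A vector version performs the same byte selection on many groups at once; masked loads and stores are one way to handle widths that do not fill a whole vector without falling back to a scalar tail loop.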