Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422

2025-04-05 Thread Kieran Kunhya via ffmpeg-devel
On Mon, 3 Mar 2025, 16:38 Shreesh Adiga, <16567adigashre...@gmail.com> wrote:
> On Thu, Feb 20, 2025 at 6:51 PM Shreesh Adiga <16567adigashre...@gmail.com> wrote:
> >
> > Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
> > 4 vinsertq to have interleaving of the

Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422

2025-03-03 Thread Shreesh Adiga
On Thu, Feb 20, 2025 at 6:51 PM Shreesh Adiga <16567adigashre...@gmail.com> wrote:
>
> Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
> 4 vinsertq to have interleaving of the vector lanes during load from memory.
> 4 vperm2i128 inside 4 RSHIFT_COPY calls to achie

[FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422

2025-02-20 Thread Shreesh Adiga
Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
4 vinsertq to have interleaving of the vector lanes during load from memory.
4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.
This patch replaces the above 8 instructions with 2 vpermq and 2 vperm
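
For readers unfamiliar with the function being optimized, the data rearrangement it performs can be sketched in scalar C. This is an illustrative sketch only, not the FFmpeg implementation and not the patch. The name uyvy_row_to_planar and its parameters are hypothetical, and the code assumes the standard UYVY byte order (U0 Y0 V0 Y1) and an even width. The lane inserts and permutes mentioned in the thread presumably serve to do this same byte split on many pixels per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not FFmpeg code): split one row of packed
 * UYVY into planar Y, U and V. Each 4-byte group is U0 Y0 V0 Y1, so two
 * luma samples share one U and one V sample. `width` is the number of
 * luma samples in the row and is assumed to be even. */
static void uyvy_row_to_planar(const uint8_t *src, uint8_t *y,
                               uint8_t *u, uint8_t *v, size_t width)
{
    for (size_t i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0];
        y[2 * i]     = src[4 * i + 1];
        v[i]         = src[4 * i + 2];
        y[2 * i + 1] = src[4 * i + 3];
    }
}
```

A vectorized version has to produce the same three output planes. Only the number of instructions spent shuffling bytes into place differs, which is the cost the patch description is counting.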