On Mon, 3 Mar 2025, 16:38 Shreesh Adiga, <16567adigashre...@gmail.com>
wrote:
> On Thu, Feb 20, 2025 at 6:51 PM Shreesh Adiga
> <16567adigashre...@gmail.com> wrote:
Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
4 vinsertq to have interleaving of the vector lanes during load from memory.
4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.
This patch replaces the above 8 instructions with 2 vpermq and
2 vperm