On 2/18/2025 11:58 AM, Shreesh Adiga wrote:
On Mon, Feb 3, 2025 at 10:03 PM Shreesh Adiga
<16567adigashre...@gmail.com> wrote:
The scalar loop is replaced with masked AVX512 instructions.
For extracting the Y from UYVY, vperm2b is used instead of
various AND and packuswb.
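
To illustrate the idea above, here is a minimal scalar sketch in C of how a two-source byte permute can pull the Y samples out of UYVY in one step, and how a lane mask covers a short tail without a scalar loop. It is a portable model, not the patch's assembly: the function names `permute2_bytes` and `extract_y` are hypothetical, and the merge-masking behavior (masked-off lanes keep their old value) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a two-source byte permute: each index picks
 * one byte out of the 128-byte concatenation of a and b. Lanes whose mask
 * bit is clear are left untouched (assumed merge-masking). */
static void permute2_bytes(uint8_t dst[64], const uint8_t a[64],
                           const uint8_t b[64], const uint8_t idx[64],
                           uint64_t mask)
{
    for (int i = 0; i < 64; i++) {
        if ((mask >> i) & 1) {
            unsigned sel = idx[i] & 127;
            dst[i] = sel < 64 ? a[sel] : b[sel - 64];
        }
    }
}

/* Extract up to 64 Y samples from 128 bytes of UYVY (U0 Y0 V0 Y1 ...). */
static void extract_y(uint8_t y[64], const uint8_t uyvy[128], int count)
{
    uint8_t idx[64];
    uint64_t mask;

    assert(count >= 0 && count <= 64);

    /* Y sits at every odd byte offset of the packed input. */
    for (int i = 0; i < 64; i++)
        idx[i] = (uint8_t)(2 * i + 1);

    /* Tail handling: enable only the first `count` lanes. */
    mask = count == 64 ? ~UINT64_C(0) : (UINT64_C(1) << count) - 1;

    permute2_bytes(y, uyvy, uyvy + 64, idx, mask);
}
```

With a single index table the two input vectors are deinterleaved directly, where the AND plus packuswb approach needs a mask step per input followed by a pack.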
Instead of loading the vectors with interleaved lanes as done
in the AVX2 version, a normal load is used. At the end of packuswb,
for U and V, an extra permut