rgb2rgb: add uyvytoyuv422 avx2

Wu, Jianhua Tue, 28 Sep 2021 00:13:42 -0700

Min Chen wrote:
> 
> The current algorithm can be improved. Could you combine these optimizations
> with your patches? The extra VPERM makes the code a little slower.
> 
> 
> 
> On Haswell
> Current algorithm:
> RSHIFT_COPY m6, m2, 1 ; UYVY UYVY -> YVYU YVY...
> pand m6, m1           ; YxYx YxYx...
> RSHIFT_COPY m7, m3, 1 ; UYVY UYVY -> YVYU YVY...
> pand m7, m1           ; YxYx YxYx...
> packuswb m6, m7       ; YYYY YYYY...
> 
> 
> Latency:
> 1 + 1 + 1 + 1 + 1 = 5
> 
> 
> Proposed:
> pshufb m6, m2, mX ; UYVY UYVY -> xxxx YYYY
> pshufb m7, m3, mX
> punpcklqdq m6, m7 ; YYYY YYYY
> 
> 
> Latency:
> 1 + 1 + 1 = 3
> 
> 
> I guess the current algorithm is optimized for compatibility with SSE2,
> because PSHUFB was only added in SSSE3.
> Now that we are optimizing for AVX, AVX2 and AVX512, I suggest we use the
> proposed algorithm to get more performance.
> 
> 
> Regards,
> Min Chen
>


Hi Min Chen,

Thanks for the careful review. You're right.

Using the instructions added in AVX2/AVX512 should be better. I'll try your
proposal and see whether it performs better. If so, I'll resubmit the
patches.
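
For reference, here is a minimal scalar C sketch of the two sequences quoted above, to check that they extract the same luma bytes. It models a single 128-bit register only and says nothing about latency or throughput. The helper names (model_psrldq, luma_current, and so on) are hypothetical, and the contents of m1 and mX are assumptions: m1 is taken to be 0x00FF in every 16-bit word, and mX is taken to move the odd source bytes into the low 8 bytes and zero the rest. RSHIFT_COPY is modelled as a plain byte shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for one 128-bit XMM register; b[0] is the lowest byte. */
typedef struct { uint8_t b[16]; } vec128;

/* psrldq: byte i takes byte i+n; zeros enter at the top. */
static vec128 model_psrldq(vec128 v, int n)
{
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (i + n < 16) ? v.b[i + n] : 0;
    return r;
}

/* pand: bytewise AND. */
static vec128 model_pand(vec128 a, vec128 m)
{
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = a.b[i] & m.b[i];
    return r;
}

/* Saturate one signed 16-bit word (little-endian byte pair) to 0..255. */
static uint8_t sat_word(const uint8_t *p)
{
    int w = p[0] | (p[1] << 8);
    if (w >= 0x8000)
        w -= 0x10000;
    return (uint8_t)(w < 0 ? 0 : w > 255 ? 255 : w);
}

/* packuswb: 8 words of a go to the low half, 8 words of b to the high half. */
static vec128 model_packuswb(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < 8; i++) {
        r.b[i]     = sat_word(&a.b[2 * i]);
        r.b[i + 8] = sat_word(&b.b[2 * i]);
    }
    return r;
}

/* pshufb: a control byte with bit 7 set yields 0, otherwise it indexes src. */
static vec128 model_pshufb(vec128 src, vec128 ctl)
{
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (ctl.b[i] & 0x80) ? 0 : src.b[ctl.b[i] & 0x0F];
    return r;
}

/* punpcklqdq: low 8 bytes of a, then low 8 bytes of b. */
static vec128 model_punpcklqdq(vec128 a, vec128 b)
{
    vec128 r;
    memcpy(r.b, a.b, 8);
    memcpy(r.b + 8, b.b, 8);
    return r;
}

/* Current sequence: shift, mask, shift, mask, pack. */
static vec128 luma_current(vec128 m2, vec128 m3)
{
    vec128 m1; /* assumed mask: 0x00FF in every 16-bit word */
    for (int i = 0; i < 16; i++)
        m1.b[i] = (i & 1) ? 0x00 : 0xFF;
    vec128 m6 = model_pand(model_psrldq(m2, 1), m1);
    vec128 m7 = model_pand(model_psrldq(m3, 1), m1);
    return model_packuswb(m6, m7);
}

/* Proposed sequence: shuffle, shuffle, unpack. */
static vec128 luma_proposed(vec128 m2, vec128 m3)
{
    vec128 mX; /* assumed control: odd source bytes into the low half */
    for (int i = 0; i < 16; i++)
        mX.b[i] = (i < 8) ? (uint8_t)(2 * i + 1) : 0x80;
    return model_punpcklqdq(model_pshufb(m2, mX), model_pshufb(m3, mX));
}

/* Returns 1 when both sequences extract the same 16 luma bytes. */
static int luma_paths_agree(void)
{
    vec128 m2, m3;
    for (int i = 0; i < 16; i++) {
        m2.b[i] = (uint8_t)(i * 7 + 3);   /* arbitrary UYVY test bytes */
        m3.b[i] = (uint8_t)(200 + i * 3);
    }
    vec128 a = luma_current(m2, m3);
    vec128 b = luma_proposed(m2, m3);
    assert(a.b[0] == m2.b[1]); /* first output byte is the first Y sample */
    return memcmp(a.b, b.b, 16) == 0;
}
```

Calling luma_paths_agree() from a test harness is enough to confirm the byte layout matches under these assumptions. Whether the shorter sequence is actually faster still has to be measured on real hardware.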

Best regards,
Jianhua

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2
