input: add AVX2 optimized uyvytoyuv422

Rémi Denis-Courmont Fri, 07 Jun 2024 00:25:19 -0700


On 6 June 2024 10:01:24 GMT+03:00, Christophe Gisquet 
<christophe.gisq...@gmail.com> wrote:
>On Thu, 6 June 2024 at 08:11, Rémi Denis-Courmont <r...@remlab.net> wrote:
>> >James Almer:
>> >> uyvytoyuv422_c: 23991.8
>> >> uyvytoyuv422_sse2: 2817.8
>> >> uyvytoyuv422_avx: 2819.3
>> >
>> >Why don't you nuke the avx version in a follow-up patch?
>>
>> Same problem with the RGBA stuff as well. Are the AVX functions expected to 
>> be faster than SSE2 on processors *without* AVX2?
>
>Something frequent in this type of question is that people are using
>numbers from a CPU that has had 10 years of arch improvements (and
>probably a doubling in throughput for any instruction set) over one
>that supported at most AVX. The presence of an AVX function (whose
>benefit is only 3-operand instructions, so admittedly small) would
>ideally only be benchmarked on that kind of CPU.


This feels a bit dense for someone like me who is not intimate with x86 
innards. Intuitively, 3-operand instructions certainly help avoid vector 
copies around destructive operations. But for any given function, it should 
be clear whether this actually helps or not.
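
To make the point about vector copies concrete, here is a minimal C sketch 
using SSE2 intrinsics. The helper name sum_and_diff16 is hypothetical and is 
not taken from swscale, and the instruction sequences in the comments are only 
what a compiler would typically emit (an assumption, not something measured 
in this thread). Both inputs stay live after the first operation, which is the 
case where a destructive 2-operand encoding needs an extra register copy.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from swscale): for 16 bytes, computes
 * sum[i] = a[i] + b[i] and diff[i] = a[i] - b[i], with wrap-around. */
static void sum_and_diff16(const uint8_t *a, const uint8_t *b,
                           uint8_t *sum, uint8_t *diff)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* va and vb are both needed again for the subtraction below.
     *
     * Legacy SSE2 encoding (2 operands, destination is overwritten),
     * typically:
     *     movdqa xmm2, xmm0        ; copy to preserve va
     *     paddb  xmm2, xmm1
     *
     * VEX encoding, e.g. when built with -mavx (3 operands), typically:
     *     vpaddb xmm2, xmm0, xmm1  ; no copy needed
     */
    __m128i vsum  = _mm_add_epi8(va, vb);
    __m128i vdiff = _mm_sub_epi8(va, vb);

    _mm_storeu_si128((__m128i *)sum,  vsum);
    _mm_storeu_si128((__m128i *)diff, vdiff);
#else
    /* Scalar fallback so the sketch also builds on non-x86 targets. */
    for (int i = 0; i < 16; i++) {
        sum[i]  = (uint8_t)(a[i] + b[i]);
        diff[i] = (uint8_t)(a[i] - b[i]);
    }
#endif
}
```

Whether the saved copy matters depends on how many such live values a given 
function has, so comparing the disassembly of the SSE2 and AVX builds of a 
function is one way to check it case by case.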

That being said, I have no objections as such to run-time optimisations for 
middle-aged processors.

If anything, if we want to save space, I think we should automatically trim 
the useless C code, at least via dead code elimination (DCE). We can assume at 
least SSE2 on x86-64, no? Of course, this would break checkasm and some 
command-line flags.
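
The DCE idea can be sketched roughly as follows. Every name here 
(HAVE_SSE2_BASELINE, CPU_FLAG_SSE2, select_luma_fn and the two extract_luma_* 
functions) is hypothetical and does not reflect swscale's actual init code, 
and the "SIMD" variant is a scalar stand-in so that the sketch builds on any 
target. When the baseline macro is a compile-time constant 1, the C fallback 
becomes unreachable and the compiler is free to drop it.

```c
#include <stdint.h>

/* Hypothetical CPU flag bit, for illustration only. */
#define CPU_FLAG_SSE2 1u

/* SSE2 is part of the x86-64 baseline, so it can be assumed at build time. */
#if defined(__x86_64__) || defined(_M_X64)
#define HAVE_SSE2_BASELINE 1
#else
#define HAVE_SSE2_BASELINE 0
#endif

typedef void (*luma_fn)(uint8_t *dst, const uint8_t *src, int n);

/* Plain C version: UYVY stores bytes as U Y0 V Y1, so luma is at odd offsets. */
static void extract_luma_c(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[2 * i + 1];
}

/* Stand-in for a SIMD version; a real one would use SSE2 instructions. */
static void extract_luma_simd(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[2 * i + 1];
}

static luma_fn select_luma_fn(unsigned cpu_flags)
{
    /* With HAVE_SSE2_BASELINE == 1 this condition is constant-true, so the
     * final return is dead code and extract_luma_c has no remaining users. */
    if (HAVE_SSE2_BASELINE || (cpu_flags & CPU_FLAG_SSE2))
        return extract_luma_simd;
    return extract_luma_c;
}
```

Note that on x86-64 this selector returns the SIMD version even when called 
with cpu_flags == 0. That is the breakage mentioned above: a tool that 
compares the C and SIMD versions, or a flag that masks CPU features, can no 
longer reach the C path.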

Re: [FFmpeg-devel] [PATCH 2/2] swscale/x86/input: add AVX2 optimized uyvytoyuv422
