Re: [FFmpeg-devel] swscale/rgb2rgb : add X86_64 SIMD (SSSE3 and AVX2) for shuffle_bytes func

Carl Eugen Hoyos Sun, 18 Mar 2018 11:01:18 -0700

2018-03-18 18:20 GMT+01:00, Paul B Mahol <one...@gmail.com>:
> On 3/18/18, Carl Eugen Hoyos <ceffm...@gmail.com> wrote:
>> 2018-03-18 17:46 GMT+01:00, Martin Vignali <martin.vign...@gmail.com>:
>>> 2018-03-18 17:37 GMT+01:00 Paul B Mahol <one...@gmail.com>:
>>>
>>>> On 3/18/18, Nicolas George <geo...@nsup.org> wrote:
>>>> > Martin Vignali (2018-03-18):
>>>> >> I ran the test again with a bigger width (512 instead of 128).
>>>> >> These are my results:
>>>> >> shuffle_bytes_0321_c: 128.6
>>>> >> shuffle_bytes_0321_ssse3: 41.6
>>>> >> shuffle_bytes_0321_avx2: 23.4
>>>> >
>>>> > IIUC, these benchmarks are expressed in CPU cycles. But what James
>>>> > says is that it can cause the CPU frequency to be throttled: if that
>>>> > happens, fewer cycles can take more time and, even worse, cause other
>>>> > unrelated code to take more time. A benchmark in actual time with a
>>>> > typical use case would be needed to decide.
>>>>
>>>> Yes, always also test the overall result with a typical use case.
>>
>> +1
>>
>>> I tested it using a "benchmark" command line, which tests two shuffle functions:
>>> ./ffmpeg -benchmark -f lavfi -i rgbtestsrc=size=3840x2160:duration=10 -vf format=argb,format=rgba -f null -
>>>
>>> With the patch:
>>> bench: utime=3.611s
>>> With only SSSE3 (AVX2 part disabled), I get a similar result.
>>
>> Doesn't this indicate that James' original comment, that the AVX2
>> optimization makes no sense, is correct?
>
> You are almost always wrong.
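For reference, a sketch of what the benchmarked shuffle_bytes_0321 computes, assuming the convention of the C reference in libswscale/rgb2rgb.c, where shuffle_bytes_abcd writes src[i+a], src[i+b], src[i+c], src[i+d] for each 4-byte pixel. The block-wise path emulates a pshufb-style 16-byte shuffle in portable C; that the patch's SSSE3/AVX2 code uses such a mask is an assumption here, and all helper names (shuffle_bytes_0321_ref, pshufb_emulated, mask_0321, shuffle_bytes_0321_blocks) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: for each 4-byte pixel, output byte k is input byte
 * "digit k" of the name, so 0321 keeps byte 0 and reverses bytes 1..3. */
static void shuffle_bytes_0321_ref(const uint8_t *src, uint8_t *dst, int src_size)
{
    for (int i = 0; i + 3 < src_size; i += 4) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 3];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 1];
    }
}

/* Portable model of SSSE3 pshufb on one 16-byte register: a mask byte
 * with bit 7 set yields 0, otherwise its low 4 bits index the input. */
static void pshufb_emulated(const uint8_t *in, const uint8_t *mask, uint8_t *out)
{
    for (int j = 0; j < 16; j++)
        out[j] = (mask[j] & 0x80) ? 0 : in[mask[j] & 0x0F];
}

/* The 0321 pattern repeated for the 4 pixels held in a 16-byte register. */
static const uint8_t mask_0321[16] = {
    0, 3, 2, 1,   4, 7, 6, 5,   8, 11, 10, 9,   12, 15, 14, 13
};

/* Hypothetical block-wise version: one "register" (4 pixels) per step,
 * then the scalar reference for the remaining tail. */
static void shuffle_bytes_0321_blocks(const uint8_t *src, uint8_t *dst, int src_size)
{
    int i = 0;
    for (; i + 16 <= src_size; i += 16)
        pshufb_emulated(src + i, mask_0321, dst + i);
    shuffle_bytes_0321_ref(src + i, dst + i, src_size - i);
}

/* Checks that both paths agree on src (src_size: multiple of 4, at most 256). */
static int shuffle_0321_paths_agree(const uint8_t *src, int src_size)
{
    uint8_t ref[256], blk[256];
    assert(src_size >= 0 && src_size <= 256 && src_size % 4 == 0);
    shuffle_bytes_0321_ref(src, ref, src_size);
    shuffle_bytes_0321_blocks(src, blk, src_size);
    return memcmp(ref, blk, (size_t)src_size) == 0;
}
```

On AVX2, vpshufb applies the same 16-byte lookup independently to each 128-bit half of a ymm register, so repeating this mask twice would cover 32 bytes (8 pixels) per step.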


I tend to agree, but I wonder how you know that I am wrong here:
what in the above mail indicates that AVX2 has an advantage over
SSSE3?

Carl Eugen
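The throttling argument can be stated as a break-even condition. If a kernel takes c cycles at clock f, its time is c/f, so the AVX2 version only wins in wall-clock time as long as c_avx2/f_avx2 < c_ssse3/f_ssse3, i.e. f_avx2/f_ssse3 > c_avx2/c_ssse3. A minimal sketch of that arithmetic follows; the function names are hypothetical, and any clock values passed to it are illustrative inputs, not measurements.

```c
#include <assert.h>

/* Time of one kernel call in ns: cycles / (freq_ghz * 1e9 cycles/s). */
static double kernel_time_ns(double cycles, double freq_ghz)
{
    assert(freq_ghz > 0.0);
    return cycles / freq_ghz;
}

/* Minimum f_avx2 / f_ssse3 clock ratio for the AVX2 call to still be faster:
 * c_avx2 / f_avx2 < c_ssse3 / f_ssse3  <=>  f_avx2 / f_ssse3 > c_avx2 / c_ssse3.
 * Both counts must use the same unit; the ratio itself is then unit-free. */
static double break_even_clock_ratio(double cycles_avx2, double cycles_ssse3)
{
    assert(cycles_ssse3 > 0.0);
    return cycles_avx2 / cycles_ssse3;
}
```

With Martin's 512-width figures, break_even_clock_ratio(23.4, 41.6) is 0.5625, so per call the AVX2 kernel stays ahead unless its clock drops below about 56% of the SSSE3 clock. This per-call model does not capture the slowdown of unrelated code that keeps running at a reduced clock after the AVX2 section, which is why an end-to-end timing such as the -benchmark run is still needed.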
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
