On 3/18/2018 1:28 PM, Nicolas George wrote:
> Martin Vignali (2018-03-18):
>> I run the test again with a bigger width (512 instead of 128)
>> This is my result :
>> shuffle_bytes_0321_c: 128.6
>> shuffle_bytes_0321_ssse3: 41.6
>> shuffle_bytes_0321_avx2: 23.4
>
> IIUC, these benchmarks are expressed in CPU cycles. But what James says
> is that it can cause the CPU frequency to be throttled: if that happens,
> less cycles can use more time, and even worse, cause other unrelated to
> take more time. A benchmark in actual time and typical use case would be
> needed to decide.
>
> Regards,
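
For reference, a wall-clock measurement of the kind asked for above
could look roughly like the sketch below. Everything in it is an
assumption for illustration: shuffle_bytes_0321_ref is a plain C
stand-in written for this example (not the swscale code), and
time_ns_per_call is a hypothetical helper. It reports time from
CLOCK_MONOTONIC instead of cycle counts, so any frequency throttling
would show up in the result.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Stand-in for illustration only: in every 4-byte group, keep bytes 0
 * and 2 in place and swap bytes 1 and 3 (the "0321" order). */
static void shuffle_bytes_0321_ref(const uint8_t *src, uint8_t *dst,
                                   int src_size)
{
    for (int i = 0; i + 3 < src_size; i += 4) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 3];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 1];
    }
}

/* Hypothetical helper: average wall-clock nanoseconds per call of fn,
 * measured over iters back-to-back calls. */
static double time_ns_per_call(void (*fn)(const uint8_t *, uint8_t *, int),
                               const uint8_t *src, uint8_t *dst,
                               int size, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(src, dst, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Passing the C, SSSE3 and AVX2 variants through the same helper with a
512-pixel row (2048 bytes) would give numbers in nanoseconds that can be
compared directly. A tight loop like this is still not a typical use
case, and an optimizing compiler may fold repeated identical calls, so
the figures should be read with that in mind.
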

In any case, short of swscale being used without any decoding going on,
AVX2 code is most likely going to be used anyway, and said throttling
will already have taken place because of countless other functions.
And a 2x speedup from an AVX2 version is basically the best you're
going to get out of such an implementation.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel