Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is 1.4x faster than AVX
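
For readers unfamiliar with the instruction, here is a minimal scalar sketch of what VPBLENDD computes on a 256-bit register. It is not code from the patch, and the function name `blendd_model` is hypothetical. The assumption behind the port 5 remark is that on Intel cores of this generation shuffles such as VSHUFPS issue only on port 5, while VPBLENDD can issue on several ports. A blend can stand in for a shuffle only when every wanted dword already sits in the right lane of one of the two sources.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of VPBLENDD on a 256-bit register (8 dword lanes):
 * lane i is taken from b when bit i of imm8 is set, otherwise from a.
 * No lane ever moves to a different position. */
static void blendd_model(uint32_t dst[8], const uint32_t a[8],
                         const uint32_t b[8], uint8_t imm8)
{
    assert(dst && a && b);
    for (int i = 0; i < 8; i++)
        dst[i] = ((imm8 >> i) & 1) ? b[i] : a[i];
}
```

With `imm8 = 0xAA` the odd lanes come from `b` and the even lanes from `a`.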
---
libavcodec/v210dec.c | 10 +-
libavcodec/x86/v210-init.c | 8 +
libavcodec/x86/v210.asm | 72 +-
3 files changed, 73 insertions(+), 17 deletions(-)
Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is now 1.4x faster than AVX
Tested on Broadwell CPU, Ubuntu 18.10 x86_64
~/FFmpeg$ tests/checkasm/checkasm --bench --test=v210dec
benchmarking with native FFmpeg timers
nop: 94.1
checkasm: using random seed 3963743306
SSSE3:
- v210
---
libavcodec/v210dec.c | 10 +-
libavcodec/x86/v210-init.c | 8 +
libavcodec/x86/v210.asm | 63 --
3 files changed, 64 insertions(+), 17 deletions(-)
diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index ddc5dbe8be..26954c0df3 10064
The AVX2 code leverages VPERMD to process 12 pixels/iteration. This is my
first patch submission so any comments are greatly appreciated.
-Mike
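
As background for the 12 pixels/iteration figure, here is a scalar sketch of the v210 layout. It is a reference illustration, not the patch's assembly, and the helper names `read_le32` and `v210_unpack_group` are hypothetical. v210 stores three 10-bit samples in each little-endian 32-bit word, so a 16-byte group carries 6 pixels of 4:2:2 video. A 32-byte load therefore covers two groups, i.e. 12 pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Read one little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Unpack one 16-byte v210 group into 6 luma, 3 Cb (u) and 3 Cr (v)
 * samples. Component order per word, from the low bits upward:
 *   word 0: Cb0 Y0 Cr0    word 1: Y1 Cb1 Y2
 *   word 2: Cr1 Y3 Cb2    word 3: Y4 Cr2 Y5 */
static void v210_unpack_group(const uint8_t *src, uint16_t y[6],
                              uint16_t u[3], uint16_t v[3])
{
    assert(src && y && u && v);
    uint32_t w0 = read_le32(src);
    uint32_t w1 = read_le32(src + 4);
    uint32_t w2 = read_le32(src + 8);
    uint32_t w3 = read_le32(src + 12);

    u[0] = w0 & 0x3FF; y[0] = (w0 >> 10) & 0x3FF; v[0] = (w0 >> 20) & 0x3FF;
    y[1] = w1 & 0x3FF; u[1] = (w1 >> 10) & 0x3FF; y[2] = (w1 >> 20) & 0x3FF;
    v[1] = w2 & 0x3FF; y[3] = (w2 >> 10) & 0x3FF; u[2] = (w2 >> 20) & 0x3FF;
    y[4] = w3 & 0x3FF; v[2] = (w3 >> 10) & 0x3FF; y[5] = (w3 >> 20) & 0x3FF;
}
```

A SIMD version has to do the same shift-and-mask work and then sort the interleaved samples into separate Y, Cb and Cr outputs, which is where the permute and blend instructions come in.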
Tested on Skylake (Win32 & Win64)
1920x1080 input frame
=====================
C code - 440 fps
SSSE3 - 920 fps
AVX - 930 fps
AVX2 - 1040 fps
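
To put the frame rates above in relative terms, here is a small sketch that computes speedup as a ratio of fps values. The helper name `speedup` is hypothetical and not part of the patch.

```c
#include <assert.h>

/* Speedup of one implementation over a baseline, both given in fps. */
static double speedup(double fps, double baseline_fps)
{
    assert(baseline_fps > 0.0);
    return fps / baseline_fps;
}
```

With the numbers listed, `speedup(1040.0, 440.0)` is about 2.36 (AVX2 over C) and `speedup(1040.0, 930.0)` is about 1.12 (AVX2 over AVX) for this whole-frame test.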
Reg