I measured the speed gains by using ffmpeg to decode a 720p yuv422p10 file encoded with each transform. The results are summarized below.
Decoding speed in fps ("-" = no AVX version):

         C     SSE2   AVX    AVX2
Haar    119    204    206    221
5_3      94    118     -     121
9_7      84    111     -     115

Is the AVX version worth it for Haar? Is the AVX2 version worth it for the latter two? I added those later, which is why they are separate patches. If I keep them, I will squash them before pushing.

James Darnley (6):
  diracdec: add 10-bit Haar SIMD functions
  diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
  diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass function
  diracdec: avx2 legall
  diracdec: avx2 dd97
  diracdec: increase rodata alignment for avx2

 libavcodec/dirac_dwt.c                |   7 +-
 libavcodec/dirac_dwt.h                |   1 +
 libavcodec/x86/Makefile               |   6 +-
 libavcodec/x86/dirac_dwt_10bit.asm    | 209 +++++++++++++++++++++++++
 libavcodec/x86/dirac_dwt_init_10bit.c | 210 ++++++++++++++++++++++++++
 5 files changed, 430 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
 create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c

-- 
2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel