On 19 July 2018 at 15:52, James Darnley <jdarn...@obe.tv> wrote:
> I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file
> encoded with the relevant transform. The summary is below.
>
> Haar
> C: 119fps
> SSE2: 204fps
> AVX: 206fps
> AVX2: 221fps
>
> 5_3
> C: 94fps
> SSE2: 118fps
> AVX2: 121fps
>
> 9_7
> C: 84fps
> SSE2: 111fps
> AVX2: 115fps
>
> Is the AVX worth it in Haar? Is the AVX2 worth it in the latter two? I added
> those later which is why they are separate patches. I will squash them before
> pushing if I keep them.
>
> James Darnley (6):
>   diracdec: add 10-bit Haar SIMD functions
>   diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
>   diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass function
>   diracdec: avx2 legall
>   diracdec: avx2 dd97
>   diracdec: increase rodata alignment for avx2
>
>  libavcodec/dirac_dwt.c                |   7 +-
>  libavcodec/dirac_dwt.h                |   1 +
>  libavcodec/x86/Makefile               |   6 +-
>  libavcodec/x86/dirac_dwt_10bit.asm    | 209 +++++++++++++++++++++++++
>  libavcodec/x86/dirac_dwt_init_10bit.c | 210 ++++++++++++++++++++++++++
>  5 files changed, 430 insertions(+), 3 deletions(-)
>  create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
>  create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c
>
> --
> 2.17.1
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Could you provide standard timings for the whole transform using START/STOP_TIMER rather than overall decoding speed? Coefficient sizes, and therefore Golomb unpacking speed, change with the transform, so there could be some bottleneck in the decoding that happens before the inverse transform, which would mask differences between the transform implementations.