On Tue, Oct 6, 2015 at 3:43 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> ---
>  libavcodec/x86/Makefile                     |   1 +
>  libavcodec/x86/vp9dsp_init.c                |   4 +-
>  libavcodec/x86/vp9dsp_init.h                |  15 ++--
>  libavcodec/x86/vp9dsp_init_16bpp_template.c |  14 +++-
>  libavcodec/x86/vp9itxfm.asm                 |  16 +----
>  libavcodec/x86/vp9itxfm_16bpp.asm           | 108 ++++++++++++++++++++++++++++
>  libavcodec/x86/vp9itxfm_template.asm        |  37 ++++++++++
>  7 files changed, 173 insertions(+), 22 deletions(-)
>  create mode 100644 libavcodec/x86/vp9itxfm_16bpp.asm
>  create mode 100644 libavcodec/x86/vp9itxfm_template.asm

Did you look into using SSE2 instead? That would eliminate instructions in some parts but might make other parts more complex. Note that some MMX instructions have only half the throughput of the equivalent SSE/AVX ones on Skylake (and most likely on future Intel µarchs as well).