I think I have reached the final state for these patches. There has been little change to the 1st, 3rd, 4th, and 5th.

The 2nd adds an option to explicitly control what the macro does after the
IDCT. This allows a small optimisation for 8-bit: the data does not need to be
stored back to the source block.

The 6th lets the IDCT use slightly different coefficients so that its output
exactly matches the MMX original. This is rather messy but I think it is
slightly better than trying to alter the code macro. A word diff looks much
cleaner than the line diff git uses by default.

If people would kindly give their opinion on the 2nd and 6th patches in
particular I would greatly appreciate it.

Performance gain decoding an MPEG2 HD sample over the old MMX:
- Yorkfield: 210 to 224 fps
- Haswell: 387 to 426 fps

Would anyone like me to get some timer figures for the functions themselves?

James Darnley (6):
  avcodec/x86: cleanup simple_idct10
  avcodec/x86: modify simple_idct10 macros to add an action paramter
  avcodec/x86: add x86-64 8-bit simple_idct function
  avcodec/x86: add x86-64 8-bit simple_idct put function
  avcodec/x86: add x86-64 8-bit simple_idct add function
  avcodec/x86: allow 8-bit simple_idct to use slightly different
    coefficients

 libavcodec/tests/x86/dct.c                |   2 +
 libavcodec/x86/idctdsp_init.c             |  23 +++++
 libavcodec/x86/proresdsp.asm              |  22 ++---
 libavcodec/x86/simple_idct.h              |   9 ++
 libavcodec/x86/simple_idct10.asm          | 139 ++++++++++++++++++++++++++----
 libavcodec/x86/simple_idct10_template.asm | 136 ++++++++++++++++-------------
 6 files changed, 244 insertions(+), 87 deletions(-)

-- 
2.13.0