This patch series does not attempt to change the core implementation of the iDCT.
The first patch is relatively straightforward. I've only dropped the
alignment on a series of jumps, which I didn't see helping at all.

The second patch is less so, as I've also tried to reuse tables. Some of
them seem to be similar to what can be found in, e.g., fdct.c. This MMX
code is not compiled for ARCH_X86_64. I also decided to edit the licence
header.

The last 2 patches are more questionable. They attempt to merge the
{put,add}_clamped and the iDCT for the SSE2 versions. This leads to
little object size increase, as the iDCT was always inlined in them. To
achieve this merge, I targeted ease of implementation rather than code
minimization. The gain is roughly 10 cycles (about 10%), but that's
hardly noticeable.

This has been tested under Win32 and Win64, on a 140000-frame video,
producing the expected CRC. The patch series passes FATE's xvid-idct and
xvid-custom-matrix tests. However, Linux was not tested, and this is
arguably sensitive code, so further evaluation is welcome.

Christophe Gisquet (4):
  x86: xvid: port SSE2 idct to yasm
  x86: xvid_idct: port MMX IDCT to yasm
  x86: xvid_idct: merged idct_put SSE2 versions
  x86: xvid_idct: SSE2 merged add version

 libavcodec/x86/Makefile        |   3 +-
 libavcodec/x86/xvididct.asm    | 983 +++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/xvididct_init.c |  49 +-
 libavcodec/x86/xvididct_mmx.c  | 549 -----------------------
 libavcodec/x86/xvididct_sse2.c | 406 -----------------
 5 files changed, 1024 insertions(+), 966 deletions(-)
 create mode 100644 libavcodec/x86/xvididct.asm
 delete mode 100644 libavcodec/x86/xvididct_mmx.c
 delete mode 100644 libavcodec/x86/xvididct_sse2.c

-- 
1.9.2.msysgit.0