Some more assembly for review. This time we have 10-bit h chroma functions.
The intra ones have some strange benchmark results. Overall the improvement isn't that large, particularly for the 4:2:0 intra function, and the avx version of that function is slower than the sse2 version by quite a margin. I will definitely try benchmarking it on my Nehalem after sending these emails.

Suggestions greatly appreciated.

James Darnley (6):
  avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
  avcodec/h264: clean up and expand x86 function definitions
  whitespace changes after last commit
  avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter
  avcodec/h264: mmx2, sse2, avx 10-bit h chroma intra deblock/loop filter
  avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma intra deblock/loop filter

 libavcodec/x86/h264_deblock_10bit.asm | 213 ++++++++++++++++++++++++++++++++++
 libavcodec/x86/h264dsp_init.c         |  74 ++++++++----
 2 files changed, 262 insertions(+), 25 deletions(-)

-- 
2.10.2