As far as I can see, the only reason those functions are SSE4 is the pextrw needed for the following block widths:
- 2, used only by chroma;
- 6, used by chroma and indirectly by luma;
- 12, used by both.

The better solution would be to convert all chroma handling to NV12, but it is vastly simpler to modify the above cases to not use pextrw.
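
For illustration only, here is a minimal C sketch of what "not using pextrw" can look like for the narrow stores. The background is that the form of pextrw that writes straight to memory is SSE4.1, whereas the form that extracts into a general-purpose register is plain SSE2. The helper names (store2_via_gpr, store6_via_gpr) and the HAVE_SSE2_SKETCH macro are hypothetical and are not taken from hevc_mc.asm, whose actual macros may well differ; the scalar fallback is only there so the sketch builds on non-x86 targets.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>              /* SSE2 only, no SSE4.1 header needed */
#define HAVE_SSE2_SKETCH 1          /* hypothetical macro for this sketch */
#else
#define HAVE_SSE2_SKETCH 0
#endif

/* Hypothetical helper: store the low 2 bytes of a 16-byte vector. */
static void store2_via_gpr(uint8_t *dst, const uint8_t src[16])
{
#if HAVE_SSE2_SKETCH
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* movd xmm -> GPR, then keep only the low 16 bits for the store. */
    uint16_t lo = (uint16_t)_mm_cvtsi128_si32(v);
    memcpy(dst, &lo, 2);
#else
    memcpy(dst, src, 2);            /* portable fallback, same result */
#endif
}

/* Hypothetical helper: store the low 6 bytes of a 16-byte vector. */
static void store6_via_gpr(uint8_t *dst, const uint8_t src[16])
{
#if HAVE_SSE2_SKETCH
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* Bytes 0..3: movd xmm -> GPR, then a 32-bit store. */
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(v);
    /* Bytes 4..5: pextrw xmm -> GPR (the SSE2 form), then a 16-bit store.
     * This is where the extra GPR is spent. */
    uint16_t hi = (uint16_t)_mm_extract_epi16(v, 2);
    memcpy(dst, &lo, 4);
    memcpy(dst + 4, &hi, 2);
#else
    memcpy(dst, src, 6);            /* portable fallback, same result */
#endif
}
```

Both helpers write exactly 2 or 6 bytes and leave the bytes after them untouched, which is what a partial-width store has to guarantee.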

This is done in 2 steps:
- Fix width of 12 to do 8+4 instead of 6+6;
- Modify the store macros for width 2 and 6 by passing data through a GPR (alas, for some functions, at the cost of a supplementary GPR).

Christophe Gisquet (2):
  x86: hevc_mc: split differently calls
  x86: hevc_mc: convert to ssse3

 libavcodec/x86/hevc_mc.asm    |  63 +++--
 libavcodec/x86/hevcdsp.h      |  48 ++--
 libavcodec/x86/hevcdsp_init.c | 561 ++++++++++++++++++++++--------------------
 3 files changed, 362 insertions(+), 310 deletions(-)

-- 
1.9.2.msysgit.0