Preamble: I don't see an easy way out of the issue, and this patchset has several drawbacks, so I don't mind if it is not applied.
The dsp init code instantiates most widths and thus unrolls the calls. As a consequence, the object size balloons quite quickly:

  x86/hevc_mc.o:       115920
  x86/hevcdsp_init.o:  185404

Compare this to vp9 (although it probably has fewer special cases):

  x86/vp9mc.o:          11408
  x86/vp9dsp_init.o:    25260

To reduce this, use "proxy" functions instead, which loop over calls to a specific function to achieve the intended width.

The current code is somewhat dirty (copy-pasted) and will probably make it difficult to add new instruction sets. But adding them while keeping the current code would cause an even larger increase (as already seen with the SSSE3 and SSE4 versions), which doesn't sound acceptable to me.

Besides the code size reduction, it is possible (although probably difficult to measure) that this amount of code causes significant cache pressure.

Overall, this is more of a hackish patch. The issue looks wider to me, and probably requires a serious amount of work.

Christophe Gisquet (2):
  x86: hevc_mc: use proxy functions
  x86: hevc_mt: use proxy functions for WP

 libavcodec/x86/hevcdsp_init.c | 998 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 822 insertions(+), 176 deletions(-)

-- 
1.9.2.msysgit.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
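For illustration, the "proxy" idea described in this cover letter can be sketched in plain C. This is a minimal sketch under stated assumptions, not the code from the patches: the names `put_pixels_w8` and `put_pixels_proxy`, the fixed destination stride `DST_STRIDE`, and the `src << 6` operation are all hypothetical stand-ins. In the real code, the 8-wide kernel would be an assembly function. The point is that a single width-specific kernel, called once per 8-column strip, covers every multiple of 8 without instantiating one function per width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed stride of the int16_t destination buffer
 * (assumption, loosely modeled on HEVC MC intermediate buffers). */
#define DST_STRIDE 64

/* Hypothetical stand-in for a width-specific SIMD kernel:
 * processes exactly 8 columns for `height` rows. */
static void put_pixels_w8(int16_t *dst, const uint8_t *src,
                          ptrdiff_t srcstride, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (int16_t)(src[x] << 6);
        dst += DST_STRIDE;
        src += srcstride;
    }
}

/* Proxy: reaches the requested width by calling the 8-wide kernel
 * on consecutive column strips, instead of one function per width. */
static void put_pixels_proxy(int16_t *dst, const uint8_t *src,
                             ptrdiff_t srcstride, int height, int width)
{
    assert(width % 8 == 0 && width <= DST_STRIDE);
    for (int x = 0; x < width; x += 8)
        put_pixels_w8(dst + x, src + x, srcstride, height);
}
```

The trade-off, as the cover letter suggests, is one extra call per strip in exchange for fewer instantiated functions, and therefore smaller object files and possibly less instruction-cache pressure.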