Preamble: I don't see an easy way out of the issue, and this patchset
has several drawbacks, so I don't mind if it is not applied.

The DSP init instantiates a function for most widths and thus unrolls
the calls. As a consequence, the object size balloons quite quickly:
x86/hevc_mc.o:      115920
x86/hevcdsp_init.o: 185404

This is to be compared to vp9 (although it probably has fewer special
cases):
x86/vp9mc.o:       11408
x86/vp9dsp_init.o: 25260

To reduce this, use "proxy" functions instead: each one loops, calling
a specific function repeatedly until the intended width is covered. The
current code is somewhat dirty (copy-paste) and will probably make it
difficult to add new instruction sets. But doing so while keeping the
current code would cause an even larger increase (as experienced with
the SSSE3 and SSE4 versions), which doesn't sound acceptable to me.
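
As a rough illustration of the proxy idea (this is not the code from
the patches; the names put_pixels8 and put_pixels_proxy8 are made up,
and the plain C loop merely stands in for an assembly kernel), a minimal
sketch could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width kernel: copies an 8-pixel-wide column of
 * `height` rows. In the real code this would be an asm routine. */
static void put_pixels8(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride,
                        int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = src[x];
        dst += dst_stride;
        src += src_stride;
    }
}

/* Hypothetical proxy: handles any width that is a multiple of 8 by
 * calling the 8-wide kernel on successive 8-pixel columns, instead of
 * instantiating one function per width. */
static void put_pixels_proxy8(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int height, int width)
{
    for (int x = 0; x < width; x += 8)
        put_pixels8(dst + x, dst_stride, src + x, src_stride, height);
}
```

With this shape, widths 16, 24, 32, ... can all share one kernel and
one small loop, at the cost of an extra call per 8-pixel column.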

Besides the code size reduction, it is possible (although probably
difficult to measure) that this amount of code causes significant
cache pressure.

Overall, this is more of a hackish patch. The issue looks wider to me,
and probably requires a serious amount of work.

Christophe Gisquet (2):
  x86: hevc_mc: use proxy functions
  x86: hevc_mt: use proxy functions for WP

 libavcodec/x86/hevcdsp_init.c | 998 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 822 insertions(+), 176 deletions(-)

-- 
1.9.2.msysgit.0

