On 23/08/14 10:22 AM, Christophe Gisquet wrote:
> The only sse4 instruction is pextrw, which is used on rather minor
> functions for small blocks. Therefore use whichever GPR is available
> to extract the output word.
>
> Before (sse4), for block_w == 6:
> 4627 decicycles in epel_uni, 16377 runs, 7 skips
> 7422 decicycles in epel_bi, 65501 runs, 35 skips
>
> After:
> 4649 decicycles in epel_uni, 16371 runs, 13 skips
> 7432 decicycles in epel_bi, 65505 runs, 31 skips
> ---
> libavcodec/x86/hevc_mc.asm | 63 +++--
> libavcodec/x86/hevcdsp.h | 48 ++--
> libavcodec/x86/hevcdsp_init.c | 522 +++++++++++++++++++++--------------------
> 3 files changed, 323 insertions(+), 310 deletions(-)
>
> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index e2236ec..eb61b18 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -52,9 +52,9 @@ hevc_epel_filters_%4_%1 times %2 d%3 -2, 58
>
>
>
> -EPEL_TABLE 8, 8, b, sse4
> -EPEL_TABLE 10, 4, w, sse4
> -EPEL_TABLE 12, 4, w, sse4
> +EPEL_TABLE 8, 8, b, ssse3
> +EPEL_TABLE 10, 4, w, ssse3
> +EPEL_TABLE 12, 4, w, ssse3
>
> %macro QPEL_TABLE 4
> hevc_qpel_filters_%4_%1 times %2 d%3 -1, 4
> @@ -71,13 +71,13 @@ hevc_qpel_filters_%4_%1 times %2 d%3 -1, 4
> times %2 d%3 4, -1
> %endmacro
>
> -QPEL_TABLE 8, 8, b, sse4
> -QPEL_TABLE 10, 4, w, sse4
> -QPEL_TABLE 12, 4, w, sse4
> +QPEL_TABLE 8, 8, b, ssse3
> +QPEL_TABLE 10, 4, w, ssse3
> +QPEL_TABLE 12, 4, w, ssse3

Do these need to be duplicated? You could just remove the suffix and
let every version of the function use the same tables.
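
A minimal sketch of what removing the suffix might look like, using QPEL_TABLE as the example. The three-parameter macro signature and the unsuffixed label hevc_qpel_filters_%1 are assumptions for illustration, not part of the patch. Only the first and last rows visible in the quoted hunk are reproduced; the rows in between are left out rather than guessed.

```nasm
; Sketch only: hypothetical suffix-free variant of QPEL_TABLE.
; %1 = bit depth, %2 = repeat count, %3 = element size (b or w)
%macro QPEL_TABLE 3
hevc_qpel_filters_%1 times %2 d%3 -1, 4
                     ; intermediate rows omitted here, unchanged from the existing macro
                     times %2 d%3  4, -1
%endmacro

; One table per bit depth, shared by every version of the function.
QPEL_TABLE  8, 8, b
QPEL_TABLE 10, 4, w
QPEL_TABLE 12, 4, w
```

Under this assumption, every function version would reference hevc_qpel_filters_8 (and the 10- and 12-bit tables) regardless of its instruction-set suffix, and EPEL_TABLE could presumably be treated the same way. This only works as long as all versions want the same repeat count in %2.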