On 23/08/14 10:22 AM, Christophe Gisquet wrote:
> The only sse4 instruction is pextrw, which is used on rather minor
> functions for small blocks. Therefore use whichever GPR is available
> to extract the output word.
> 
> Before (sse4), for block_w == 6:
> 4627 decicycles in epel_uni, 16377 runs, 7 skips
> 7422 decicycles in epel_bi, 65501 runs, 35 skips
> 
> After:
> 4649 decicycles in epel_uni, 16371 runs, 13 skips
> 7432 decicycles in epel_bi, 65505 runs, 31 skips
> ---
>  libavcodec/x86/hevc_mc.asm    |  63 +++--
>  libavcodec/x86/hevcdsp.h      |  48 ++--
>  libavcodec/x86/hevcdsp_init.c | 522 +++++++++++++++++++++---------------------
>  3 files changed, 323 insertions(+), 310 deletions(-)
> 
> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index e2236ec..eb61b18 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -52,9 +52,9 @@ hevc_epel_filters_%4_%1 times %2 d%3 -2, 58
>  
>  
>  
> -EPEL_TABLE  8, 8, b, sse4
> -EPEL_TABLE 10, 4, w, sse4
> -EPEL_TABLE 12, 4, w, sse4
> +EPEL_TABLE  8, 8, b, ssse3
> +EPEL_TABLE 10, 4, w, ssse3
> +EPEL_TABLE 12, 4, w, ssse3
>  
>  %macro QPEL_TABLE 4
>  hevc_qpel_filters_%4_%1 times %2 d%3  -1,  4
> @@ -71,13 +71,13 @@ hevc_qpel_filters_%4_%1 times %2 d%3  -1,  4
>                          times %2 d%3   4, -1
>  %endmacro
>  
> -QPEL_TABLE  8, 8, b, sse4
> -QPEL_TABLE 10, 4, w, sse4
> -QPEL_TABLE 12, 4, w, sse4
> +QPEL_TABLE  8, 8, b, ssse3
> +QPEL_TABLE 10, 4, w, ssse3
> +QPEL_TABLE 12, 4, w, ssse3

Do these tables need to be duplicated? You could just remove the suffix and
let every version of the function use the same tables.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
