Re: [FFmpeg-devel] [PATCH] x86/hecv_res_add: add ff_hevc_transform_add{8, 16, 32}_8_avx

Christophe Gisquet Wed, 20 Aug 2014 00:30:07 -0700

Hi,

2014-08-20 4:55 GMT+02:00 James Almer <jamr...@gmail.com>:
> ~15% faster than sse2
[...]
> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>              if (ARCH_X86_64) {
>                  c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>                  c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
> +
> +                c->transform_add[2]    = ff_hevc_transform_add16_8_avx;
> +                c->transform_add[3]    = ff_hevc_transform_add32_8_avx;


Does AVX imply ARCH_X86_64? (I didn't know.) If it doesn't, the register
count seems fine, meaning the condition is unneeded.
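
To make the suggestion concrete, here is a rough sketch of the init with the
two assignments hoisted out of the ARCH_X86_64 guard. It assumes the AVX
versions really do fit in the registers available on 32-bit builds, which is
exactly the open question. Every name prefixed with sketch_/Sketch/SKETCH_, the
flag value, and the function pointer signature are hypothetical stand-ins so
the snippet builds on its own; they are not FFmpeg's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds on its own; none of these
 * are FFmpeg's real definitions. */
#ifndef ARCH_X86_64
#define ARCH_X86_64 0               /* pretend this is a 32-bit build */
#endif
#define SKETCH_CPU_FLAG_AVX 0x1     /* assumed flag value */

typedef void (*sketch_transform_add_fn)(uint8_t *dst, int16_t *coeffs,
                                        ptrdiff_t stride);

typedef struct SketchDSPContext {
    sketch_transform_add_fn transform_add[4];
    int has_64bit_loop_filters;     /* stands in for the loop-filter pointers */
} SketchDSPContext;

/* Empty placeholders for the assembly routines. */
static void sketch_add8_avx(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
{ (void)dst; (void)coeffs; (void)stride; }
static void sketch_add16_avx(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
{ (void)dst; (void)coeffs; (void)stride; }
static void sketch_add32_avx(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
{ (void)dst; (void)coeffs; (void)stride; }

static void sketch_dsp_init(SketchDSPContext *c, int cpu_flags)
{
    assert(c != NULL);
    if (cpu_flags & SKETCH_CPU_FLAG_AVX) {
        if (ARCH_X86_64) {
            /* Only what truly needs the extra 64-bit registers stays here. */
            c->has_64bit_loop_filters = 1;
        }
        /* Hoisted out of the guard: set on 32-bit and 64-bit builds alike. */
        c->transform_add[1] = sketch_add8_avx;
        c->transform_add[2] = sketch_add16_avx;
        c->transform_add[3] = sketch_add32_avx;
    }
}
```

With this layout only the loop filters depend on the architecture check; the
three transform_add entries depend on the CPU flag alone.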

>              }
> +            c->transform_add[1]    = ff_hevc_transform_add8_8_avx;

I'm not entirely sure, but this is instantiated through INIT_YMM avx2,
and I wouldn't expect a performance improvement beyond what the
3-operand form gives?

So couldn't this one be instantiated to use xmm registers? (mmx may be a
burden, e.g. the need for emms and the need to rewrite it.)

-- 
Christophe
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
