Hi,

2014-08-20 4:55 GMT+02:00 James Almer <jamr...@gmail.com>:
> ~15% faster than sse2
[...]
> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>              if (ARCH_X86_64) {
>                  c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>                  c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
> +
> +                c->transform_add[2] = ff_hevc_transform_add16_8_avx;
> +                c->transform_add[3] = ff_hevc_transform_add32_8_avx;
Does AVX imply ARCH_X86_64? (I didn't know that.) Otherwise the register
count seems fine, meaning the ARCH_X86_64 condition is unneeded.

>              }
> +            c->transform_add[1] = ff_hevc_transform_add8_8_avx;

I'm not entirely sure, but this one is instantiated through INIT_YMM avx2,
and I wouldn't expect any performance improvement beyond what the
3-operand form already brings. So couldn't this one be instantiated to use
xmm registers instead? (MMX may be a burden, e.g. the need for emms and
the need to rewrite it.)

-- 
Christophe
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
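For reference, here is a scalar sketch of what an 8-bit transform_add variant computes. It assumes the usual HEVC behaviour: a size x size block of int16 residuals is added to 8-bit pixels, and each result is clipped to [0, 255]. The names `transform_add_ref` and `clip_uint8` and the signature are hypothetical and for illustration only. They are not FFmpeg's actual prototypes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 8-bit pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference (not FFmpeg's prototype): add a
 * size x size block of int16 residuals to 8-bit pixels, row by row.
 * dst advances by stride bytes per row; coeffs are stored densely,
 * size values per row. */
static void transform_add_ref(uint8_t *dst, const int16_t *coeffs,
                              ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_uint8(dst[x] + coeffs[x]); /* promoted to int, then clipped */
        dst += stride;
        coeffs += size;
    }
}
```

For the 8x8 case, one row of residuals is 8 x 16 bits = 128 bits, which exactly fills one xmm register. That arithmetic underlies the question above about whether ymm registers buy anything for transform_add8. Whether they do in practice depends on how the asm packs rows, for example two rows per ymm register.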