On 20/08/14 4:29 AM, Christophe Gisquet wrote:
> Hi,
>
> 2014-08-20 4:55 GMT+02:00 James Almer <jamr...@gmail.com>:
>> ~15% faster than sse2
> [...]
>> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>>          if (ARCH_X86_64) {
>>              c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>>              c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
>> +
>> +            c->transform_add[2] = ff_hevc_transform_add16_8_avx;
>> +            c->transform_add[3] = ff_hevc_transform_add32_8_avx;
>
> Does avx => ARCH_X86_64 (didn't know)? Otherwise the reg count seems
> fine, meaning the condition is unneeded.

No, AVX does not imply x86_64. These functions currently use 12 xmm
registers, so they are x86_64 only (x86_32 has just 8). I'll send a
patch later to bring that down to 8 or so.

>
>>          }
>> +        c->transform_add[1] = ff_hevc_transform_add8_8_avx;
>
> I'm not entirely sure, but this is instantiated through INIT_YMM avx2,
> and I wouldn't expect performance improvement past the 3-op-form?
>
> So couldn't this one be instantiated to use xmm regs? (mmx may be a
> burden eg need for emms and need to rewrite it).

Aren't you thinking of the 10-bit functions? All three AVX functions
I'm adding here are 8-bit and use xmm registers. There are currently
no 8-bit AVX2 functions.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
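
The register-count argument in the reply (AVX support alone does not make 12 xmm registers available) can be modeled with a small C sketch. It assumes the usual architectural counts of 8 xmm registers in 32-bit mode and 16 in 64-bit mode, ignoring AVX-512's extra registers. The helper names are hypothetical and not part of FFmpeg. The quoted diff enforces the same constraint by placing the assignments under `ARCH_X86_64`.

```c
#include <stdbool.h>

/* Hypothetical helper: xmm registers addressable in each mode
 * (xmm0-xmm7 on x86_32, xmm0-xmm15 on x86_64; AVX-512's xmm16-31 ignored). */
static int xmm_regs_available(bool arch_x86_64)
{
    return arch_x86_64 ? 16 : 8;
}

/* Hypothetical helper: an AVX function can be installed only if the CPU
 * supports AVX *and* the target mode has enough xmm registers for it.
 * AVX support says nothing about the mode (AVX does not imply x86_64). */
static bool can_install_avx_func(bool cpu_has_avx, bool arch_x86_64,
                                 int xmm_regs_needed)
{
    return cpu_has_avx && xmm_regs_needed <= xmm_regs_available(arch_x86_64);
}
```

With the current 12-register versions, `can_install_avx_func(true, false, 12)` is false and `can_install_avx_func(true, true, 12)` is true. A version reduced to 8 registers would pass in both modes, so the `ARCH_X86_64` guard would no longer be needed.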