On 20/08/14 4:29 AM, Christophe Gisquet wrote:
> Hi,
> 
> 2014-08-20 4:55 GMT+02:00 James Almer <jamr...@gmail.com>:
>> ~15% faster than sse2
> [...]
>> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>>              if (ARCH_X86_64) {
>>                  c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>>                  c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
>> +
>> +                c->transform_add[2]    = ff_hevc_transform_add16_8_avx;
>> +                c->transform_add[3]    = ff_hevc_transform_add32_8_avx;
> 
> Does avx => ARCH_X86_64 (didn't know) ? Otherwise the reg count seems
> fine, meaning the condition is unneeded.

No, AVX does not imply x86_64. These functions currently use 12 xmm regs, and
32-bit x86 only has 8 of them (xmm0-xmm7), so they are x86_64 only for now.
I'll send a patch to get them down to 8 or so later.
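
To make the constraint concrete, here is a minimal sketch of the dispatch
logic in the hunk above. It is not the actual FFmpeg code: DSPContext, the
stub functions and the two int parameters are hypothetical stand-ins (in the
real init function ARCH_X86_64 is a compile-time constant and AVX support
comes from the CPU flags).

```c
#include <stddef.h>

/* Hypothetical stand-in for the function pointer table in HEVCDSPContext. */
typedef void (*transform_add_fn)(void);

typedef struct DSPContext {
    transform_add_fn transform_add[4]; /* index 1: 8x8, 2: 16x16, 3: 32x32 */
} DSPContext;

/* Empty stubs standing in for the assembly functions. */
static void add8_avx(void)  {}
static void add16_avx(void) {}
static void add32_avx(void) {}

/* Assumption: have_avx and is_x86_64 are passed in so both paths can be
 * exercised; they are not how FFmpeg exposes this information. */
static void dsp_init(DSPContext *c, int have_avx, int is_x86_64)
{
    if (!have_avx)
        return;

    if (is_x86_64) {
        /* These use more than 8 xmm regs, so they need xmm8-xmm15. */
        c->transform_add[2] = add16_avx;
        c->transform_add[3] = add32_avx;
    }
    /* Fits in 8 xmm regs, so it is also usable on 32-bit x86. */
    c->transform_add[1] = add8_avx;
}
```

Once the 16x16 and 32x32 versions fit in 8 registers, the two assignments
inside the x86_64 check could move out next to the 8x8 one.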

> 
>>              }
>> +            c->transform_add[1]    = ff_hevc_transform_add8_8_avx;
> 
> I'm not entirely sure, but this is instantiated through INIT_YMM avx2,
> and I wouldn't expect performance improvement past the 3-op-form?
> 
> So couldn't this one be instantiated to use xmm regs? (mmx may be a
> burden eg need for emms and need to rewrite it).

Aren't you thinking of the 10-bit functions? All three AVX functions I'm adding
here are 8-bit and use xmm regs. There are currently no 8-bit AVX2 functions.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel