On 25/01/15 10:11 AM, Christophe Gisquet wrote:
> Hi,
> 
> 2015-01-25 2:05 GMT+01:00 James Almer <jamr...@gmail.com>:
>> 2 to 2.5 times faster.
>>
>> Signed-off-by: James Almer <jamr...@gmail.com>
>> ---
>>  libavcodec/x86/sbrdsp.asm    | 114 +++++++++++++++++++++++++++++++++++++++++++
> 
> Not the first time I have noticed this, but memory moves are often
> suboptimal with the old SSE instructions.
> While movlhps is fine, movlps isn't, on my old Core i5. You may want
> to validate this with the attached patch, where storing ps_mask3 in m8
> is a gain on Win64 (the gain does not match the number of loops, but
> it is still there).

I can reproduce the gains using mov{q,sd} instead of movlps, but not with the
mask loaded into m8 (tested on Win64 with a K10 CPU and on Linux x86-64 with a
Haswell CPU).
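
For anyone following along, here is a minimal C sketch of the two kinds of
64-bit load being compared. It uses SSE intrinsics rather than the x86inc
macros in sbrdsp.asm, and the function names are hypothetical, not taken from
the patch. One plausible reason for the difference (an assumption, nothing in
this thread confirms it) is that a movlps load merges into the destination
register and so depends on its previous value, while movq/movsd loads zero the
upper half and do not.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Load two floats into the low half of dst and keep its high half.
 * Compilers typically emit movlps xmm, m64 for this, which has to
 * merge with the old register contents. */
static __m128 load_low_merge(__m128 dst, const float *src)
{
    return _mm_loadl_pi(dst, (const __m64 *)src);
}

/* Load two floats into the low half and zero the high half.
 * Compilers typically emit movq (or movsd) xmm, m64 for this, which
 * does not read the previous register contents. */
static __m128 load_low_zero(const float *src)
{
    return _mm_castsi128_ps(_mm_loadl_epi64((const __m128i *)src));
}

/* Hypothetical self-check, not part of any patch: both variants agree
 * on the low two lanes and differ only in the high two. */
static void check_loads(void)
{
    const float src[2] = { 1.0f, 2.0f };
    float a[4], b[4];

    _mm_storeu_ps(a, load_low_merge(_mm_set1_ps(9.0f), src));
    _mm_storeu_ps(b, load_low_zero(src));

    assert(a[0] == 1.0f && a[1] == 2.0f && a[2] == 9.0f && a[3] == 9.0f);
    assert(b[0] == 1.0f && b[1] == 2.0f && b[2] == 0.0f && b[3] == 0.0f);
}
```

The zeroing form is only a drop-in replacement where the upper half of the
register is overwritten afterwards (for example by a following movlhps/movhps)
or is not needed at all.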

> 
> Benchmarks:
> x64:  6023 decicycles in g, 262108 runs, 36 skips
> SSE:  3049 decicycles in g, 262130 runs, 14 skips
> SSE3: 2843 decicycles in g, 262086 runs, 58 skips
> movq: 2693 decicycles in g, 262117 runs, 27 skips
> m8:   2648 decicycles in g, 262083 runs, 61 skips
> 
> Thanks for doing it, I had only 3-year-old scraps left and no further
> motivation to tackle the start/tail parts.

I applied the first part for now.

Thanks.

> 
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 