flacenc: add AVX2 version of the 32-bit LPC encoder

James Almer Mon, 27 Nov 2017 09:01:31 -0800

On 11/27/2017 1:50 PM, Henrik Gramner wrote:
> On Sun, Nov 26, 2017 at 11:51 PM, James Darnley <[email protected]> 
> wrote:
>> -pd_0_int_min: times  2 dd 0, -2147483648
>> -pq_int_min:   times  2 dq -2147483648
>> -pq_int_max:   times  2 dq  2147483647
>> +pd_0_int_min: times  4 dd 0, -2147483648
>> +pq_int_min:   times  4 dq -2147483648
>> +pq_int_max:   times  4 dq  2147483647
> 
> Using 128-bit broadcasts is preferable over duplicating the constants
> to 256-bit unless there's a good reason for doing so since it wastes
> less cache and is faster on AMD CPUs.


What would that reason be? AFAIK broadcasts are expensive, since they
both load from memory and then splat data across lanes. Using them inside
loops doesn't sound like a good idea. But I guess you have more
experience testing on a wider variety of chips than I do.
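
For reference, the two approaches being compared can be sketched in plain
NASM roughly as follows. This is only an illustration under assumptions,
not code from the patch: the _256/_128 label suffixes and the register
choices are made up for the example, and it makes no claim about which
variant is faster on a given chip.

```nasm
; Sketch only: labels and registers are hypothetical, not from the patch.
section .rodata align=32

; Variant A: constant duplicated to a full 256-bit vector (32 bytes).
pq_int_max_256: times 4 dq 2147483647

; Variant B: constant kept at 128 bits (16 bytes).
pq_int_max_128: times 2 dq 2147483647

section .text

    ; Variant A: plain aligned 32-byte load into a ymm register.
    vmovdqa        ymm0, [rel pq_int_max_256]

    ; Variant B: load 16 bytes and replicate them into both 128-bit lanes.
    vbroadcasti128 ymm1, [rel pq_int_max_128]

    ; At this point ymm0 and ymm1 hold the same four qwords.
```

If a spare register is available, either load can be done once before
the loop rather than inside it, in which case the per-iteration cost of
the broadcast does not come into play.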

Also, by AMD CPUs do you mean Ryzen? Because on Bulldozer-based CPUs we
purposely disabled functions using ymm regs.
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder
