On 11/27/2017 2:17 PM, Martin Vignali wrote:
> 2017-11-27 17:59 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
>
>> On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali
>> <martin.vign...@gmail.com> wrote:
>>> Hello,
>>>
>>> Attached is a patch to convert pb_bswap32 to a ymm constant
>>> and remove the vbroadcasti128 part.
>>>
>>> Speed seems to be similar to me.
>>
>> This just wastes cache for no reason. A tiny amount, sure, but minor
>> things tend to add up eventually.
>>
>> 128-bit broadcasts are the same speed as 256-bit loads on Intel CPU:s
>> and twice as fast as 256-bit loads on AMD CPU:s.
>>
>> A better solution if you want to avoid ifdeffery would be to create a
>> macro that uses vbroadcasti128 when mmsize == 32 and mova otherwise.
>>
>
> Hello,
>
> Thanks for your comments.
> Do you have an idea for the name of this macro?
It doesn't currently exist, so look at the existing ones in x86util.asm
and add one for vbroadcasti128.

>
> Regarding the earlier, similar patch under discussion:
> avcodec/x86/exrdsp : use ymm constant for pb_80 instead of vbroadcasti128
>
> Do you think we should avoid the YMM constants (declared in constants.h/c)
> and convert the constants to XMM in this file, with a vbroadcasti128 load?

There's no need to convert them back to xmm to use broadcasts.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
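
For illustration, the macro suggested in the thread (vbroadcasti128 when mmsize == 32, mova otherwise) could look roughly like the sketch below. The name VBROADCASTI128 and its placement are assumptions, not something settled in this discussion. The fragment is not standalone: it relies on x86inc.asm to provide mmsize and mova, and it assumes the constant is 16-byte aligned.

```nasm
; Hypothetical helper; the name is an assumption, not an existing macro.
; Relies on x86inc.asm for mmsize and mova (set up by INIT_XMM / INIT_YMM).
%macro VBROADCASTI128 2 ; %1 = dst register, %2 = 128-bit memory source
%if mmsize == 32
    vbroadcasti128 %1, %2 ; ymm: copy the 128-bit constant into both lanes
%else
    mova           %1, %2 ; xmm: a plain aligned 128-bit load is enough
%endif
%endmacro
```

A call site would then read the same for both register widths, for example `VBROADCASTI128 m3, [pb_bswap32]` (m3 is an arbitrary register picked for the example), with pb_bswap32 kept as a 128-bit constant.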