On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali <martin.vign...@gmail.com> wrote:
> Hello,
>
> In attach patch to convert pb_bswap32 to ymm constant
> and remove the vbroadcasti128 part
>
> Speed seems to be similar to me

This just wastes cache for no reason, since the constant grows from 16 to 32 bytes. A tiny amount, sure, but minor things tend to add up eventually. 128-bit broadcasts are the same speed as 256-bit loads on Intel CPUs and twice as fast as 256-bit loads on AMD CPUs.

A better solution if you want to avoid ifdeffery would be to create a macro that uses vbroadcasti128 when mmsize == 32 and mova otherwise.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
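
For illustration, the macro suggested above could look roughly like the sketch below. It assumes the usual x86inc.asm environment, where mmsize and mova are defined by INIT_XMM/INIT_YMM. The macro name VBROADCASTI128 and the register m0 in the usage line are only placeholders chosen for this example, not something taken from the patch.

```nasm
; Sketch only: relies on x86inc.asm for mmsize and mova.
; The macro name is hypothetical, picked for this example.
; %1 = destination register (xmm or ymm), %2 = 128-bit memory source
%macro VBROADCASTI128 2
%if mmsize == 32
    vbroadcasti128 %1, %2   ; AVX2: copy the 16-byte constant into both lanes
%else
    mova           %1, %2   ; SSE: plain aligned 16-byte load
%endif
%endmacro

; Possible call site, with pb_bswap32 kept as a 16-byte constant:
;     VBROADCASTI128 m0, [pb_bswap32]
```

With something like this, the same function body can be assembled for both the xmm and ymm versions without an %if at each load, and the constant stays 16 bytes.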