On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali <martin.vign...@gmail.com> wrote:
> 2017-12-13 17:37 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
>> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
>> packuswb + vpermq.
>>
> Sorry don't understand this part
> do you mean 128 bit packuswb + movh for each lane ?
> or something else ?

    packuswb m0, m0
    vpermq   m0, m0, q3120

vs.

    vextracti128 xm1, m0, 1
    packuswb     xm0, xm1

The second version uses a 128-bit op instead of a 256-bit one, which is
generally preferable whenever possible.
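
To illustrate why the two sequences are interchangeable, here is a rough
scalar model in Python. It is only a sketch: the helper names (sat_u8,
packuswb128, seq_256, seq_128) are made up for this example, m0 is modelled
as a list of 16 signed 16-bit words, and it assumes only the low 128 bits
of the result are used afterwards (the upper half is not modelled).

```python
def sat_u8(w):
    # Saturate a signed 16-bit word to the unsigned byte range 0..255.
    return max(0, min(255, w))

def packuswb128(a, b):
    # 128-bit packuswb: 8 words from a, then 8 words from b -> 16 bytes.
    return [sat_u8(w) for w in a] + [sat_u8(w) for w in b]

def seq_256(m0):
    # packuswb m0, m0 (256-bit) operates on each 128-bit lane separately.
    lo = packuswb128(m0[:8], m0[:8])
    hi = packuswb128(m0[8:], m0[8:])
    ymm = lo + hi
    # vpermq m0, m0, q3120: output qwords are source qwords 0, 2, 1, 3.
    q = [ymm[i * 8:(i + 1) * 8] for i in range(4)]
    out = q[0] + q[2] + q[1] + q[3]
    return out[:16]  # only the low 128 bits are compared

def seq_128(m0):
    xm1 = m0[8:]                     # vextracti128 xm1, m0, 1
    return packuswb128(m0[:8], xm1)  # packuswb xm0, xm1

# Arbitrary sample words, including values that saturate in both directions.
words = [-5, 0, 1, 255, 256, 300, 32767, -32768,
         10, 20, 30, 40, 250, 260, -1, 128]
assert seq_256(words) == seq_128(words)
```

Both paths end up with the 16 packed bytes in the low half of the register;
the second simply gets there without a 256-bit pack and a cross-lane permute.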