On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
[...]
> +    movd xm0, [blockq]
> +    mova m1, [pw_11585x2]
> +    pmulhrsw m0, m1
> +    pmulhrsw m0, m1
> +    vpbroadcastw m0, xm0
> +    pmulhrsw m0, [pw_512]

The vpbroadcastw could be done from memory at the beginning, which would get rid of the movd.

Is it mathematically possible to merge consecutive pmulhrsw instructions into a single one using a different constant? I'm guessing no, but I'm not sure.

[...]
> +    ; at the end of the loop, m7 should still be zero
> +    ; use that to zero out block coefficients
> +    ZERO_BLOCK blockq, 64, 16, m1

The comment says m7, but the code says m1.

[...]
> +    ; at the end of the loop, m7 should still be zero
> +    ; use that to zero out block coefficients
> +    ZERO_BLOCK blockq, 64, 32, m1

Ditto.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
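
On the question of merging the two consecutive pmulhrsw instructions, here is a rough way to check it outside the assembler. This is only a sketch: `pmulhrsw` below is a scalar Python model of one 16-bit lane, written from the documented behaviour of the instruction (multiply, shift right by 14, add 1, shift right by 1, keep the low 16 bits). The helper names `two_step` and `find_single_constants` are made up for illustration and are not part of the patch. The only value taken from the quoted code is the constant name pw_11585x2, assumed here to mean 11585 * 2.

```python
PW_11585X2 = 11585 * 2  # assumed value of the pw_11585x2 constant


def pmulhrsw(a, b):
    """Scalar model of one signed 16-bit lane of pmulhrsw."""
    r = (((a * b) >> 14) + 1) >> 1   # Python's >> is arithmetic on negative ints
    r &= 0xFFFF                      # keep the low 16 bits, like the instruction
    return r - 0x10000 if r >= 0x8000 else r


def two_step(x):
    """The sequence from the patch: multiply by pw_11585x2 twice, rounding each time."""
    return pmulhrsw(pmulhrsw(x, PW_11585X2), PW_11585X2)


def find_single_constants(inputs):
    """Return every 16-bit constant c with pmulhrsw(x, c) == two_step(x) for all inputs."""
    candidates = list(range(-32768, 32768))
    for x in inputs:
        target = two_step(x)
        candidates = [c for c in candidates if pmulhrsw(x, c) == target]
        if not candidates:
            break  # no constant survives, so stop early
    return candidates


if __name__ == "__main__":
    # The obvious merged constant would be 16384 (about 0.70709 ** 2 * 32768),
    # but the double rounding already disagrees with it at x = 3:
    print(two_step(3), pmulhrsw(3, 16384))  # prints "1 2"
    # Search over the full signed 16-bit input range:
    print(find_single_constants(range(-32768, 32768)))  # prints "[]"
```

Under this model no single constant reproduces the two-step result over the full signed 16-bit input range, because the intermediate rounding makes the pair behave differently from one multiply by roughly 0.5. It does not say whether a merged constant would match on the narrower range of coefficients that can actually reach this code path. That would need a separate check with the real input bounds.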