On Wed, Oct 7, 2015 at 3:59 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> diff --git a/libavcodec/x86/vp9itxfm_16bpp.asm b/libavcodec/x86/vp9itxfm_16bpp.asm
> +%macro IADST4_12BPP_1D 0
> + pand m4, m0, [pd_3fff]
> + pand m5, m1, [pd_3fff]
> + psrad m0, 14
> + psrad m1, 14
> + packssdw m5, m1
> + packssdw m4, m0
> + punpckhwd m1, m4, m5
> + punpcklwd m4, m5
> + pand m5, m2, [pd_3fff]
> + pand m6, m3, [pd_3fff]

mova m6, [pd_3fff]

> + pmaddwd m7, m5, [pw_15212_9929]
> + pmaddwd m6, m4, [pw_5283_13377]
> + pmaddwd m2, m3, [pw_15212_9929]
> + pmaddwd m0, m1, [pw_5283_13377]

mova m2, [pw_15212_9929]
mova m0, [pw_5283_13377]

> + pmaddwd m7, m5, [pw_m13377_13377]
> + pmaddwd m2, m4, [pw_13377_0]
> + pmaddwd m8, m3, [pw_m13377_13377]
> + pmaddwd m9, m1, [pw_13377_0]

mova m8, [pw_m13377_13377]
mova m9, [pw_13377_0]

> + pmaddwd m7, m5, [pw_m5283_m15212]
> + pmaddwd m6, m4, [pw_9929_13377]
> + pmaddwd m8, m3, [pw_m5283_m15212]
> + pmaddwd m9, m1, [pw_9929_13377]

mova m8, [pw_m5283_m15212]
mova m9, [pw_9929_13377]

> +%macro IADST4_12BPP_FN 4
> +INIT_XMM sse2

I'd use INIT_* when invoking the macro instead, unless there's a reason
not to.

> +cglobal vp9_%1_%3_4x4_add_12, 3, 3, 10, dst, stride, block, eob
[...]
> + paddd m0, [pd_8]
> + paddd m1, [pd_8]
> + paddd m2, [pd_8]
> + paddd m3, [pd_8]
> + psrad m0, 4
> + psrad m1, 4
> + psrad m2, 4
> + psrad m3, 4

Store [pd_8] in a register. In general, SIMD code is not load-bound
(and modern CPUs have two load units), so redundant loads of the same
value are fine, but it's often a good idea to do only a single load
into a register when doing so reduces code size.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
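The arithmetic behind the "load the constant once" suggestions can be sketched in C with SSE2 intrinsics, which map one-to-one onto the quoted instructions (paddd is `_mm_add_epi32`, psrad is `_mm_srai_epi32`, pmaddwd is `_mm_madd_epi16`). This is an illustrative sketch, not code from the patch: the function names `round_shift4_rows` and `madd_shared_coef` are hypothetical, the word layout of `pw_15212_9929` (15212 and 9929 alternating) is assumed from its name, and because a C compiler does its own register allocation, the code-size saving described in the review only applies to hand-written asm.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* One row of four int32 coefficients fills exactly one XMM register. */
static_assert(sizeof(int32_t) * 4 == sizeof(__m128i), "row size");

/* Rounded shift (x + 8) >> 4 on four rows, mirroring
 * "paddd mN, [pd_8]" followed by "psrad mN, 4", but with the rounding
 * constant materialized once and reused for every row. */
static void round_shift4_rows(int32_t rows[4][4])
{
    const __m128i pd_8 = _mm_set1_epi32(8); /* loaded once */
    for (int i = 0; i < 4; i++) {
        __m128i r = _mm_loadu_si128((const __m128i *)rows[i]);
        r = _mm_add_epi32(r, pd_8); /* paddd */
        r = _mm_srai_epi32(r, 4);   /* psrad: arithmetic shift keeps the sign */
        _mm_storeu_si128((__m128i *)rows[i], r);
    }
}

/* Two pmaddwd operations sharing one coefficient register, mirroring the
 * suggested "mova m2, [pw_15212_9929]" rewrite.
 * ASSUMPTION: the constant holds the words 15212, 9929 alternating. */
static void madd_shared_coef(const int16_t a[8], const int16_t b[8],
                             int32_t ra[4], int32_t rb[4])
{
    const __m128i coef = _mm_setr_epi16(15212, 9929, 15212, 9929,
                                        15212, 9929, 15212, 9929);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* each int32 lane i: x[2i] * 15212 + x[2i + 1] * 9929 */
    _mm_storeu_si128((__m128i *)ra, _mm_madd_epi16(va, coef));
    _mm_storeu_si128((__m128i *)rb, _mm_madd_epi16(vb, coef));
}
```

With this rounding, (x + 8) >> 4 maps 24 to 2 and -24 to -1, i.e. division by 16 rounded to nearest with ties toward positive infinity.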