On Wed, Jan 13, 2016 at 4:55 PM, James Darnley <james.darn...@gmail.com> wrote:
> diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
> index 859e2d9..a8f3d3c 100644
> --- a/libavcodec/x86/v210enc.asm
> +++ b/libavcodec/x86/v210enc.asm
> -cextern pb_FE
> -%define v210_enc_max_8 pb_FE
> +;cextern pb_FE
> +local_pb_FE: times 32 db 0xfe
> +%define v210_enc_max_8 local_pb_FE

You could change ff_pb_FE to be 32 bytes instead of duplicating it.

> +%if cpuflag(avx2)
> +    movu xm1, [yq+widthq*2]
> +    vinserti128 m1, m1, [yq+widthq*2+12], 1
> +%else
>      movu m1, [yq+2*widthq]
> +%endif

xmN can be used unconditionally, which gets rid of the %else. E.g.

    movu xm1, [yq+widthq*2]
%if cpuflag(avx2)
    vinserti128 m1, m1, [yq+widthq*2+12], 1
%endif

> +%if cpuflag(avx2)
> +    movq xm3, [uq+widthq]
> +    movhps xm3, [vq+widthq]
> +    movq xm7, [uq+widthq+6]
> +    movhps xm7, [vq+widthq+6]
> +    vinserti128 m3, m3, xm7, 1
> +%else
>      movq m3, [uq+widthq]
>      movhps m3, [vq+widthq]
> +%endif

Ditto. Also use xm2 instead of xm7: it's unused at this point, so the
AVX2 version doesn't need an extra vector register.

> +%if cpuflag(avx2)
> +    movu [dstq], xm0
> +    movu [dstq+16], xm1
> +    vextracti128 [dstq+32], m0, 1
> +    vextracti128 [dstq+48], m1, 1
> +%else
>      movu [dstq], m0
>      movu [dstq+mmsize], m1
> +%endif

Ditto.

Otherwise LGTM.
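
For anyone following along, here is a rough scalar C model of what the suggested luma load leaves in the register. It is only a sketch: load_two_lanes is a hypothetical helper, not anything in FFmpeg, and src stands in for yq+widthq*2. It just mirrors the addressing in the quoted asm: the low 16-byte lane comes from src, the high lane from src+12, so the two lanes overlap by 4 source bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg. Scalar model of:
 *     movu        xm1, [src]
 *     vinserti128 m1, m1, [src+12], 1
 * dst plays the role of the 32-byte register m1.
 * The caller must guarantee at least 28 readable bytes at src. */
static void load_two_lanes(uint8_t dst[32], const uint8_t *src)
{
    memcpy(dst,      src,      16); /* low lane:  bytes 0..15 of src  */
    memcpy(dst + 16, src + 12, 16); /* high lane: bytes 12..27 of src */
}
```

In the non-AVX2 build only the first copy happens, which is why the xmN load can sit outside the %if.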