On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly
<alankelly-at-google....@ffmpeg.org> wrote:
> +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset, src

Only 8 xmm registers are used, so 8 should be used instead of 16 here.
Otherwise it causes unnecessary spilling of registers on 64-bit Windows.

> +%if ARCH_X86_64
> +%define ptr_size 8
[...]
> +%else
> +%define ptr_size 4

The predefined variable gprsize already exists for this purpose, so that
can be used instead.

> + movq xmm3, [ditherq]

If
    vpbroadcastq m3, [ditherq]
is used for AVX2 here, then the following

> + vperm2i128 m3, m3, m3, 0

instruction can be eliminated.

> + punpcklwd m1, m1
> + punpckldq m1, m1

Can be replaced with
    pshuflw m1, m1, q0000

> + mov srcq, [filterSizeq]
> + test srcd, srcd

    test srcq, srcq
should be used here, since the lower 32 bits of a valid pointer could
randomly happen to be zero on a 64-bit system.

> + REP_RET

Since non-temporal stores are being used, this should be replaced with
    sfence
    RET
to guarantee proper memory ordering semantics in multi-threaded use
cases. Things will usually work fine without it, but may potentially
break in "fun to debug" ways.
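
The pshuflw and test srcq, srcq points above can be sanity-checked with a
small C model. This is only a sketch and not part of the patch: it models an
xmm register as eight 16-bit words, and all function names are made up for
illustration. It assumes only the low 64 bits of m1 matter after the shuffle.

```c
#include <stdint.h>
#include <string.h>

/* One xmm register modelled as eight 16-bit words; index 0 is the lowest word. */

/* punpcklwd x, x: interleave the low four words with themselves. */
static void model_punpcklwd_self(uint16_t r[8])
{
    uint16_t t[8];
    for (int i = 0; i < 8; i++)
        t[i] = r[i / 2];
    memcpy(r, t, sizeof(t));
}

/* punpckldq x, x: interleave the low two dwords with themselves. */
static void model_punpckldq_self(uint16_t r[8])
{
    uint16_t t[8];
    for (int i = 0; i < 8; i++)
        t[i] = r[2 * (i / 4) + (i % 2)];
    memcpy(r, t, sizeof(t));
}

/* pshuflw x, x, q0000: copy word 0 into the low four words, keep the high four. */
static void model_pshuflw_q0000(uint16_t r[8])
{
    r[1] = r[2] = r[3] = r[0];
}

/* Returns 1 if both sequences leave the same low 64 bits (words 0..3). */
int low_half_matches(const uint16_t in[8])
{
    uint16_t a[8], b[8];
    memcpy(a, in, sizeof(a));
    memcpy(b, in, sizeof(b));
    model_punpcklwd_self(a);
    model_punpckldq_self(a);
    model_pshuflw_q0000(b);
    return memcmp(a, b, 4 * sizeof(uint16_t)) == 0;
}

/* test srcd, srcd: sets ZF from the low 32 bits only. */
int test_srcd_sees_zero(uint64_t src) { return (uint32_t)src == 0; }

/* test srcq, srcq: sets ZF from all 64 bits. */
int test_srcq_sees_zero(uint64_t src) { return src == 0; }
```

In this model the two shuffle sequences agree in words 0..3 but not in the
upper half: the punpck pair fills it with copies of word 1, while pshuflw
leaves it untouched. A made-up pointer value such as 0x00007f2500000000 is
non-NULL, yet the 32-bit test would treat it as zero.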