On 9/4/2019 5:47 PM, Henrik Gramner wrote:
> On Wed, Sep 4, 2019 at 10:01 PM James Almer <jamr...@gmail.com> wrote:
>> On 9/4/2019 4:28 PM, Paul B Mahol wrote:
>>> +    vpmulld m3, m1, m0
>>> +    vpaddd m1, m3, m2
>>
>> pmulld m1, m0
>> paddd m1, m2
>
> Could use pmaddwd instead as well, it's faster than pmulld on pretty
> much every CPU.
>
>>> +    mova m2, m4
>>
>> Pointless mova. Just use m4 in the vpgatherdd below.
>
> No, it's required. Gathers overwrite the mask register.

Ah, my bad.

>
>>> +    vpgatherdd m5, [srcq + m1], m2
>>> +    vextracti128 xm3, m5, 1
>>> +    vpshufb m1, m5, m6
>>> +    vpshufb m2, m3, m6
>>
>> You could make these two pshufb use xmm regs, since you don't care
>> what's in the upper 128 bits.
>
> Or a single ymm pshufb before the vextracti128.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
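
For reference on the pmaddwd suggestion quoted above, here is a minimal scalar sketch in C of what a single 32-bit lane of pmulld and pmaddwd computes. The helper names (lane_pmulld, lane_pmaddwd, check_small_range) are hypothetical and not part of the patch. The range condition is an assumption about the operands that the thread does not state: the swap should only be safe when one factor lies in [0, 32767] and the other in [-32768, 32767], so that the high 16-bit halves contribute nothing to the sum.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of pmulld: the low 32 bits of the product. */
static int32_t lane_pmulld(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* One 32-bit lane of pmaddwd: each dword is treated as two signed
 * 16-bit words; matching words are multiplied and the two 32-bit
 * products are added together. */
static int32_t lane_pmaddwd(int32_t a, int32_t b)
{
    int16_t a_lo = (int16_t)(uint16_t)((uint32_t)a & 0xFFFF);
    int16_t a_hi = (int16_t)(uint16_t)((uint32_t)a >> 16);
    int16_t b_lo = (int16_t)(uint16_t)((uint32_t)b & 0xFFFF);
    int16_t b_hi = (int16_t)(uint16_t)((uint32_t)b >> 16);
    uint32_t lo  = (uint32_t)((int32_t)a_lo * (int32_t)b_lo);
    uint32_t hi  = (uint32_t)((int32_t)a_hi * (int32_t)b_hi);

    return (int32_t)(lo + hi);
}

/* Assumed operand range (not stated in the thread): a in [0, 32767],
 * b in [-32768, 32767]. The high word of a is then zero, so only the
 * low-word product survives and both instructions agree. */
static void check_small_range(void)
{
    for (int32_t a = 0; a <= 32767; a += 257)
        for (int32_t b = -32768; b <= 32767; b += 127)
            assert(lane_pmaddwd(a, b) == lane_pmulld(a, b));
}
```

Outside that range the two differ. For example, lane_pmaddwd(40000, 3) reinterprets the low word of 40000 as a negative 16-bit value, and lane_pmaddwd(-2, -3) picks up an extra +1 from the two sign-extended high words. Whether the values in this filter stay within the safe range is something the patch author would need to confirm.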