On Sun, Aug 18, 2024 at 11:18 AM James Almer <jamr...@gmail.com> wrote:
> On 8/17/2024 10:48 PM, Nuo Mi wrote:
> > +    pxor                m6, m6
> > +    phaddw             m%2, m6
> > +    phaddw             m%2, m6
>
> Horizontal adds are slow. Can't you do this with normal adds, shifts and
> blend?
>
> > +    vpermq             m%2, m%2, q0020
> > +    pshufd             m%2, m%2, q1120
> > +    pmovsxwd           m%2, xmm%2           ; 4 sgxgy
> > +
> > +    pmulld             m%2, m11             ; 4 vx * sgxgy

Hi James,
thank you for the review.

> Similarly, pmulld is super slow (ten cycles on some architectures), and
> that's on top of a pmovsx.

Fixed in v2.

> Since you have m6 zeroed already, wouldn't pmaddwd work here?

Fixed.

> The pd_15 and pd_m15 constants would need to be changed to words, as
> would the values to be clipped.

We are clipping the dword, not a word.

> > +    psrad              m%2, 1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
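
For anyone following the pmaddwd point, here is a rough scalar model in C of why a single pmaddwd can stand in for pmovsxwd + pmulld when one operand has zeros in its odd word slots. It is only a sketch under assumptions, not the v2 code: the helper names are made up for illustration, and it assumes vx stays within [-15, 15] (taken from the pd_15 / pd_m15 constants mentioned in the review) so that its dword form is just a sign-extended word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of pmaddwd: two signed
 * 16-bit products are summed into one signed 32-bit result. The final
 * narrowing relies on the usual two's-complement wraparound. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)((int64_t)a0 * b0 + (int64_t)a1 * b1);
}

/* Hypothetical scalar model of pmovsxwd followed by pmulld: sign-extend
 * the word to a dword, multiply, keep the low 32 bits of the product. */
static int32_t pmovsxwd_pmulld_lane(int16_t a, int32_t b)
{
    return (int32_t)((int64_t)a * b);
}

/* With the word value paired with a zero word (e.g. interleaved with a
 * zeroed register), the high word of the vx dword is multiplied by zero,
 * so both paths give the same lane result. Assumes vx in [-15, 15]. */
static void check_pmaddwd_equivalence(void)
{
    for (int s = INT16_MIN; s <= INT16_MAX; s++) {
        for (int32_t vx = -15; vx <= 15; vx++) {
            int16_t vx_lo = (int16_t)vx;          /* low word of the dword  */
            int16_t vx_hi = vx < 0 ? -1 : 0;      /* high word: sign fill   */
            assert(pmaddwd_lane((int16_t)s, 0, vx_lo, vx_hi) ==
                   pmovsxwd_pmulld_lane((int16_t)s, vx));
        }
    }
}
```

Calling check_pmaddwd_equivalence() walks every 16-bit input against every assumed vx value. The clipping that follows still operates on dwords in this model, since each lane result is a full 32-bit value.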