vvcdec: inter, add optical flow avx2 code

Nuo Mi Tue, 20 Aug 2024 06:25:42 -0700

On Sun, Aug 18, 2024 at 11:18 AM James Almer <jamr...@gmail.com> wrote:


> On 8/17/2024 10:48 PM, Nuo Mi wrote:
> > +    pxor                    m6, m6
> > +    phaddw                 m%2, m6
> > +    phaddw                 m%2, m6
>
> Horizontal adds are slow. Can't you do this with normal adds, shifts and
> blend?
>
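For anyone following the thread, the suggestion above can be illustrated with a scalar model. This is only a sketch under assumptions, not the code from the patch or from v2: the helper names (add_words, sum4_reference, sum4_shift_add) are hypothetical, a single 64-bit lane stands in for a SIMD register, and the blend step is left out because only the lowest word is read back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for paddw on one 64-bit lane:
 * four independent, wrapping 16-bit additions. */
static uint64_t add_words(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t s = (uint16_t)((a >> (16 * i)) + (b >> (16 * i)));
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}

/* Reference: wrapping sum of four 16-bit words, which is what two rounds
 * of pairwise horizontal addition leave in the lowest word. */
static uint16_t sum4_reference(const uint16_t w[4])
{
    return (uint16_t)(w[0] + w[1] + w[2] + w[3]);
}

/* Same sum using only whole-lane shifts and word-wise adds. */
static uint16_t sum4_shift_add(const uint16_t w[4])
{
    uint64_t x = (uint64_t)w[0]       | (uint64_t)w[1] << 16 |
                 (uint64_t)w[2] << 32 | (uint64_t)w[3] << 48;

    x = add_words(x, x >> 16);  /* word 0 now holds w0 + w1 */
    x = add_words(x, x >> 32);  /* word 0 now holds w0 + w1 + w2 + w3 */
    return (uint16_t)x;         /* the upper words hold partial sums and are ignored */
}
```

In SIMD terms each add_words call would correspond to a paddw and each shift to a psrld or psrlq; whether that is actually faster than phaddw depends on the target microarchitecture.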
> > +    vpermq                 m%2, m%2, q0020
> > +    pshufd                 m%2, m%2, q1120
> > +    pmovsxwd               m%2, xmm%2               ; 4 sgxgy
> > +
> > +    pmulld                 m%2, m11                 ; 4 vx * sgxgy
>
Hi James,
Thank you for the review.

> Similarly, pmulld is super slow (ten cycles on some architectures), and
> that's on top of a pmovsx.
>
Fixed in v2.

> Since you have m6 zeroed already, wouldn't pmaddwd work here?

Fixed.
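As a rough illustration of why pmaddwd can stand in for pmovsxwd + pmulld here, the following scalar model compares one dword lane of each. It is a sketch under assumptions, not the v2 code: the function names are hypothetical, and it assumes both factors fit in a signed 16-bit word and that the odd word of one operand is zero (for example after interleaving with the zeroed m6).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one pmaddwd dword lane:
 * two signed 16x16 products, summed into 32 bits (wrapping). */
static int32_t madd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    return (int32_t)(uint32_t)sum;
}

/* Hypothetical scalar model of pmovsxwd followed by pmulld on one lane:
 * sign-extend the words to dwords, then multiply. */
static int32_t sext_mul_lane(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* With the odd word zeroed, the second product vanishes and both
 * models agree for every pair of 16-bit inputs. */
static int lanes_agree(int16_t a, int16_t b)
{
    return madd_lane(a, 0, b, 0) == sext_mul_lane(a, b);
}
```

The equivalence only holds while both factors fit in 16 bits; if one of them needs the full dword range, pmaddwd is not a drop-in replacement.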

> The pd_15
> and pd_m15 constants would need to be changed to words, as would the
> values to be clipped.
>
We are clipping a dword, not a word.

>
> > +    psrad                  m%2, 1
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 3/4] x86/vvcdec: inter, add optical flow avx2 code
