On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamr...@gmail.com> wrote:

> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> >   Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> >   email thread.
>
> I was very explicit about this not being ok. We're not disabling all ymm
> usage for Haswell just for one or two swscale functions using gathers.
>
> Lets go with Lynne's latest suggestion and not change the flags at all
> and use gathers on Haswell, same as other arches, by looking at the
> AVX2_FAST flag.
>
> >   libavutil/cpu.h     |  1 +
> >   libavutil/x86/cpu.c | 11 ++++++++++-
> >   2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> > index c069076439..ec3073d021 100644
> > --- a/libavutil/cpu.h
> > +++ b/libavutil/cpu.h
> > @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
> >    *  av_set_cpu_flags_mask(), then this function will behave as if AVX
> is not
> >    *  present.
> >    */
> > +
> >   size_t av_cpu_max_align(void);
> >
> >   #endif /* AVUTIL_CPU_H */
> > diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> > index bcd41a50a2..158e2170c4 100644
> > --- a/libavutil/x86/cpu.c
> > +++ b/libavutil/x86/cpu.c
> > @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
> >       if (max_std_level >= 7) {
> >           cpuid(7, eax, ebx, ecx, edx);
> >   #if HAVE_AVX2
> > -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> > +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
> >               rval |= AV_CPU_FLAG_AVX2;
> > +
> > +            cpuid(1, eax, ebx, ecx, std_caps);
> > +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> > +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> > +            // Haswell and earlier has slow gather
> > +            if(family == 6 && model < 70)
> > +                rval |= AV_CPU_FLAG_AVXSLOW;
> > +        }
> > +
> >   #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
> >           if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
> >               if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) ==
> 0xd0030000)
> >
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>

OK, apologies for the misunderstanding. In that case part 1 of this patch
is not required. Part two remains valid with the function protected by
EXTERNAL_AVX2_FAST. Should part 2 be re-submitted as a standalone patch or
is it OK as is?
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to