On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamr...@gmail.com> wrote:
> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> > email thread.
>
> I was very explicit about this not being ok. We're not disabling all ymm
> usage for Haswell just for one or two swscale functions using gathers.
>
> Lets go with Lynne's latest suggestion and not change the flags at all
> and use gathers on Haswell, same as other arches, by looking at the
> AVX2_FAST flag.
>
> >  libavutil/cpu.h     |  1 +
> >  libavutil/x86/cpu.c | 11 ++++++++++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> > index c069076439..ec3073d021 100644
> > --- a/libavutil/cpu.h
> > +++ b/libavutil/cpu.h
> > @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
> >   * av_set_cpu_flags_mask(), then this function will behave as if AVX is not
> >   * present.
> >   */
> > +
> >  size_t av_cpu_max_align(void);
> >
> >  #endif /* AVUTIL_CPU_H */
> > diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> > index bcd41a50a2..158e2170c4 100644
> > --- a/libavutil/x86/cpu.c
> > +++ b/libavutil/x86/cpu.c
> > @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
> >      if (max_std_level >= 7) {
> >          cpuid(7, eax, ebx, ecx, edx);
> >  #if HAVE_AVX2
> > -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> > +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
> >              rval |= AV_CPU_FLAG_AVX2;
> > +
> > +            cpuid(1, eax, ebx, ecx, std_caps);
> > +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> > +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> > +            // Haswell and earlier has slow gather
> > +            if(family == 6 && model < 70)
> > +                rval |= AV_CPU_FLAG_AVXSLOW;
> > +        }
> > +
> >  #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
> >          if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
> >              if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
> >
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

OK, apologies for the misunderstanding. In that case, part 1 of this patch is
not required. Part 2 remains valid, with the function protected by
EXTERNAL_AVX2_FAST.

Should part 2 be resubmitted as a standalone patch, or is it OK as is?
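
For reference, here is a minimal standalone sketch of the family/model decode that part 1 of the quoted patch performs on the EAX value from CPUID leaf 1. The names CpuSignature, decode_cpu_signature, patch_marks_slow and cpu_signature_self_check are hypothetical and not part of libavutil; the arithmetic and the "family == 6 && model < 70" cutoff are copied from the quoted hunk as-is, not verified against any CPU model list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of libavutil. */
typedef struct CpuSignature {
    unsigned family;
    unsigned model;
} CpuSignature;

/* Decode family and model from the EAX value returned by CPUID leaf 1,
 * using the same arithmetic as the quoted patch: base family plus extended
 * family, and base model plus (extended model << 4). */
static inline CpuSignature decode_cpu_signature(uint32_t eax)
{
    CpuSignature sig;
    sig.family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
    sig.model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
    return sig;
}

/* The condition under which the quoted patch would have set
 * AV_CPU_FLAG_AVXSLOW. The cutoff is taken verbatim from the patch. */
static inline int patch_marks_slow(CpuSignature sig)
{
    return sig.family == 6 && sig.model < 70;
}

/* Small self-check on one example leaf-1 EAX value. */
void cpu_signature_self_check(void)
{
    CpuSignature sig = decode_cpu_signature(0x000306C3u);
    assert(sig.family == 6);
    assert(sig.model == 60);
    assert(patch_marks_slow(sig));
}
```

Taking the raw EAX value as a parameter keeps the decode testable without executing CPUID, which is why the sketch does not call cpuid() itself.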