On Mon, Dec 20, 2021 at 3:53 PM James Almer <jamr...@gmail.com> wrote:
> On 12/20/2021 11:47 AM, Lynne wrote:
> > 20 Dec 2021, 15:43 by alankelly-at-google....@ffmpeg.org:
> >
> >> This flag is set on Haswell and earlier and all AMD cpus.
> >> ---
> >> Removes unnecessary indentation, clarifies comment and only sets flag on AMD
> >> cpus with AVX2.
> >>  libavutil/cpu.h     |  1 +
> >>  libavutil/x86/cpu.c | 14 +++++++++++++-
> >>  2 files changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> >> index ae443eccad..ce9bf14bf7 100644
> >> --- a/libavutil/cpu.h
> >> +++ b/libavutil/cpu.h
> >> @@ -54,6 +54,7 @@
> >>  #define AV_CPU_FLAG_BMI1 0x20000 ///< Bit Manipulation Instruction Set 1
> >>  #define AV_CPU_FLAG_BMI2 0x40000 ///< Bit Manipulation Instruction Set 2
> >>  #define AV_CPU_FLAG_AVX512 0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
> >> +#define AV_CPU_FLAG_SLOW_GATHER 0x2000000 ///< CPU has slow gathers.
> >>
> >>  #define AV_CPU_FLAG_ALTIVEC 0x0001 ///< standard
> >>  #define AV_CPU_FLAG_VSX 0x0002 ///< ISA 2.06
> >> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> >> index bcd41a50a2..563984f234 100644
> >> --- a/libavutil/x86/cpu.c
> >> +++ b/libavutil/x86/cpu.c
> >> @@ -146,8 +146,16 @@ int ff_get_cpu_flags_x86(void)
> >>      if (max_std_level >= 7) {
> >>          cpuid(7, eax, ebx, ecx, edx);
> >>  #if HAVE_AVX2
> >> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> >> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) {
> >>              rval |= AV_CPU_FLAG_AVX2;
> >> +            cpuid(1, eax, ebx, ecx, std_caps);
> >> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> >> +            model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> >> +            /* Haswell has slow gather */
> >> +            if(family == 6 && model < 70)
> >> +                rval |= AV_CPU_FLAG_SLOW_GATHER;
> >> +        }
> >> +
> >>  #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
> >>          if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
> >>              if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
> >> @@ -196,6 +204,10 @@ int ff_get_cpu_flags_x86(void)
> >>          used unless explicitly disabled by checking AV_CPU_FLAG_AVXSLOW. */
> >>          if ((family == 0x15 || family == 0x16) && (rval & AV_CPU_FLAG_AVX))
> >>              rval |= AV_CPU_FLAG_AVXSLOW;
> >> +
> >> +        /* AMD cpus have slow gather */
> >> +        if(rval & AV_CPU_FLAG_AVX2)
> >> +            rval |= AV_CPU_FLAG_SLOW_GATHER;
> >>      }
> >>
> >
> > No, I'd rather limit AMD CPUs to all currently released CPUs.
> > Future ones are getting AVX512, which did speed up gathers on
> > Intel CPUs, as the ISA extension extended gathers and added
> > scatters.
>
> I wouldn't hold my breath for that, but it's probably a good idea
> anyway. A check so it's flagged only on Excavator and Zen <= 3.
>
> >
> > Also your previous patch introduces ff_shuffle_filter_coefficients()
> > which is so bad it pretty much needs a complete rewrite.
> > You're also not detecting malloc errors or propagating them back.
>
> That's unrelated to this patch.
>
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel@ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>

Updated patch sent with a check for family <= 25, so that future AMD CPUs
will have the AVX2 hscale enabled by default.

I may have time this week to look at ff_shuffle_filter_coefficients.
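
For reference, a minimal standalone sketch of the check discussed in this
thread. It is not the code from the updated patch: the helper names
(cpu_family, cpu_model, slow_gather) and the is_amd/has_avx2 parameters are
hypothetical, the family/model decoding mirrors the expressions in the quoted
diff, and the thresholds (Intel family 6 with model < 70, AMD family <= 25)
are taken from the messages above rather than from any vendor documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the CPU family from CPUID leaf 1 EAX, using the same expression
 * as the quoted diff (base family plus extended family). */
static unsigned cpu_family(uint32_t eax)
{
    return ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
}

/* Decode the CPU model from CPUID leaf 1 EAX, using the same expression
 * as the quoted diff (base model plus extended model shifted into the
 * high nibble). */
static unsigned cpu_model(uint32_t eax)
{
    return ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
}

/* Hypothetical predicate: should the slow-gather flag be set?
 * Assumptions, all taken from the thread:
 *   - only CPUs with AVX2 are considered at all;
 *   - AMD: flagged for family <= 25, so later families are not flagged;
 *   - Intel: flagged for family 6 with model < 70. */
static bool slow_gather(bool is_amd, bool has_avx2, uint32_t eax)
{
    unsigned family = cpu_family(eax);
    unsigned model  = cpu_model(eax);

    if (!has_avx2)
        return false;
    if (is_amd)
        return family <= 25;
    return family == 6 && model < 70;
}
```

Taking the raw EAX value as a parameter, instead of executing CPUID inside
the helper, keeps the decision testable with recorded signatures on any
host.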