Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread James Almer
On 7/16/2021 11:46 AM, Alan Kelly wrote: On Fri, Jul 16, 2021 at 4:02 PM James Almer wrote: On 7/16/2021 10:44 AM, Alan Kelly wrote: Broadwell and later and Zen3 and later have fast gather instructions. --- Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the email thread

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread Alan Kelly
On Fri, Jul 16, 2021 at 4:02 PM James Almer wrote:
> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> > email thread.
>
> I was very explicit abo

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread James Almer
On 7/16/2021 10:44 AM, Alan Kelly wrote:
Broadwell and later and Zen3 and later have fast gather instructions.
---
Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the email thread.

I was very explicit about this not being ok. We're not disabling all ymm usage for Haswell ju

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread Alan Kelly
Broadwell and later and Zen3 and later have fast gather instructions.
---
Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the email thread.

 libavutil/cpu.h     |  1 +
 libavutil/x86/cpu.c | 11 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/libavutil/c

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-12 Thread Lynne
12 Jul 2021, 13:53 by jamr...@gmail.com:
> On 7/12/2021 7:46 AM, Lynne wrote:
>
>> 12 Jul 2021, 11:29 by alankelly-at-google@ffmpeg.org:
>>
>>> On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote:
>>> On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
> Jun 25, 2021, 09:54 by alankelly

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-12 Thread James Almer
On 7/12/2021 7:46 AM, Lynne wrote: 12 Jul 2021, 11:29 by alankelly-at-google@ffmpeg.org: On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote: On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote: Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org: Broadwell and later and Zen3 and later

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-12 Thread Lynne
12 Jul 2021, 11:29 by alankelly-at-google@ffmpeg.org:
> On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote:
>
>> On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
>>
>>> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>>>
>>> > Broadwell and later and Zen3 and later have fast gather ins

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-12 Thread Alan Kelly
On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote:
> On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
>
>> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>>
>> > Broadwell and later and Zen3 and later have fast gather instructions.
>> > ---
>> > Gather requires between 9 and 12 cycles o

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Alan Kelly
On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
> > and 2 to 5 on Skylake a

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Lynne
Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
> Broadwell and later and Zen3 and later have fast gather instructions.
> ---
> Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
> and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3.
> libavutil

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Alan Kelly
Broadwell and later and Zen3 and later have fast gather instructions.
---
Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3.

 libavutil/cpu.h     |  2 ++
 libavutil/x86/cpu.c | 18 --
 libav