Re: [FFmpeg-devel] [PATCH 1/3] swscale/x86/swscale: Process yuv2yuvX tails using next largest register size

2023-09-06 Thread Alan Kelly via ffmpeg-devel
On Tue, Sep 5, 2023 at 12:03 AM Michael Niedermayer wrote: > On Mon, Sep 04, 2023 at 02:30:00PM +0200, Alan Kelly via ffmpeg-devel > wrote: > > Hi, > > > > Any issues with this patch or can it be merged? > > are all cases covered by tests ? > if yes and the te

[FFmpeg-devel] [PATCH 2/2] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-09-06 Thread Alan Kelly via ffmpeg-devel
--- libswscale/x86/swscale.c| 19 --- libswscale/x86/yuv2yuvX.asm | 24 ++-- 2 files changed, 26 insertions(+), 17 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 00e42b4bec..6980002e9e 100644 --- a/libswscale/x86/swscale

[FFmpeg-devel] [PATCH 1/2] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-09-06 Thread Alan Kelly via ffmpeg-devel
--- libswscale/x86/swscale.c| 7 +++ libswscale/x86/yuv2yuvX.asm | 19 ++- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index ff16398988..00e42b4bec 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale

Re: [FFmpeg-devel] [PATCH 1/3] swscale/x86/swscale: Process yuv2yuvX tails using next largest register size

2023-09-04 Thread Alan Kelly via ffmpeg-devel
Hi, Any issues with this patch or can it be merged? Thanks, Alan On Fri, Jul 14, 2023 at 12:08 PM Alan Kelly wrote: > --- > libswscale/x86/swscale.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/libswscale/x86/swscale.c b/libswscale/x86/sw

[FFmpeg-devel] [PATCH 3/3] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-07-17 Thread Alan Kelly
--- libswscale/x86/swscale.c| 11 --- libswscale/x86/yuv2yuvX.asm | 24 ++-- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 600c7d6c91..6980002e9e 100644 --- a/libswscale/x86/swscale.c +++ b

Re: [FFmpeg-devel] [PATCH 3/3] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-07-17 Thread Alan Kelly
On Sat, Jul 15, 2023 at 10:40 PM Michael Niedermayer wrote: > On Fri, Jul 14, 2023 at 12:08:46PM +0200, Alan Kelly wrote: > > --- > > libswscale/x86/swscale.c| 11 --- > > libswscale/x86/yuv2yuvX.asm | 12 ++-- > > 2 files changed, 14 insertions(+

[FFmpeg-devel] [PATCH 2/3] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-07-17 Thread Alan Kelly
--- Checks for EXTERNAL_AVX512ICL to prevent downclocking on Skylake libswscale/x86/swscale.c| 7 +++ libswscale/x86/yuv2yuvX.asm | 19 ++- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 8c67bf4fab.

Re: [FFmpeg-devel] [PATCH 2/3] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-07-17 Thread Alan Kelly
Happy to add the check. Thanks, Alan On Fri, Jul 14, 2023 at 4:59 PM James Almer wrote: > On 7/14/2023 11:57 AM, Kieran Kunhya wrote: > > On Fri, 14 Jul 2023 at 14:03, James Almer wrote: > > > >> On 7/14/2023 9:59 AM, Kieran Kunhya wrote: > +#if ARCH_X86_64 && HAVE_AVX512_EXTERNAL >

[FFmpeg-devel] [PATCH 3/3] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-07-14 Thread Alan Kelly
--- libswscale/x86/swscale.c| 11 --- libswscale/x86/yuv2yuvX.asm | 12 ++-- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 52423a1199..71434f58d3 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x

[FFmpeg-devel] [PATCH 2/3] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-07-14 Thread Alan Kelly
--- libswscale/x86/swscale.c| 7 +++ libswscale/x86/yuv2yuvX.asm | 19 ++- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 8c67bf4fab..52423a1199 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale

[FFmpeg-devel] [PATCH 1/3] swscale/x86/swscale: Process yuv2yuvX tails using next largest register size

2023-07-14 Thread Alan Kelly
--- libswscale/x86/swscale.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index ff16398988..8c67bf4fab 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -194,7 +194,7 @@ static void yuv2yuvX_ #
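The subject line of this patch describes handing the leftover pixels (the tail) to the next-narrower register width instead of dropping straight to the narrowest one. The snippet is cut off before the diff body, so what follows is only a rough sketch of that idea in plain C. `TailSplit`, `fake_kernel` and `process_row` are hypothetical names, and the kernel is a scalar stand-in for the real assembly routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: how many elements each tier ended up handling. */
typedef struct TailSplit {
    size_t wide;    /* elements handled at the widest step        */
    size_t narrow;  /* elements handled at half the widest step   */
    size_t scalar;  /* elements left over for a one-by-one pass   */
} TailSplit;

/* Hypothetical kernel stand-in: adds 1 to every element in [begin, end). */
static void fake_kernel(const uint8_t *src, uint8_t *dst,
                        size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}

/* Split n elements into a wide part, a half-width part and a remainder.
 * wide_step must be at least 2 so that the half-width step is non-zero. */
static TailSplit process_row(const uint8_t *src, uint8_t *dst,
                             size_t n, size_t wide_step)
{
    TailSplit s;
    size_t narrow_step = wide_step / 2;
    size_t rest;

    assert(wide_step >= 2);

    s.wide   = n - n % wide_step;
    rest     = n - s.wide;
    s.narrow = rest - rest % narrow_step;
    s.scalar = rest - s.narrow;

    /* Each call is skipped when its range is empty. */
    if (s.wide > 0)
        fake_kernel(src, dst, 0, s.wide);
    if (s.narrow > 0)
        fake_kernel(src, dst, s.wide, s.wide + s.narrow);
    if (s.scalar > 0)
        fake_kernel(src, dst, s.wide + s.narrow, n);
    return s;
}
```

With 100 elements and a wide step of 64, this split gives 64 elements to the wide tier, 32 to the half-width tier and 4 to the final pass.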

Re: [FFmpeg-devel] [PATCH] sws: Don't compile yuv2yuvX for mmx

2022-08-19 Thread Alan Kelly
Thanks for doing this! On Fri, Aug 19, 2022 at 10:53 AM Andreas Rheinhardt < andreas.rheinha...@outlook.com> wrote: > Alan Kelly: > > --- > > libswscale/x86/yuv2yuvX.asm | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/libswscale/x86/yuv

[FFmpeg-devel] [PATCH] sws: Don't compile yuv2yuvX for mmx

2022-08-19 Thread Alan Kelly
--- libswscale/x86/yuv2yuvX.asm | 2 -- 1 file changed, 2 deletions(-) diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm index b6294cb919..d5b03495fd 100644 --- a/libswscale/x86/yuv2yuvX.asm +++ b/libswscale/x86/yuv2yuvX.asm @@ -124,8 +124,6 @@ cglobal yuv2yuvX, 7, 7, 8, filt

Re: [FFmpeg-devel] [PATCH v2] checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX

2022-08-18 Thread Alan Kelly
Thanks Martin for doing this. On Thu, Aug 18, 2022 at 10:16 AM Martin Storsjö wrote: > This avoids triggering overflows in the filters, and avoids stray > test failures in the approximate functions on x86; due to rounding > differences, one implementation might overflow while another one > doesn

[FFmpeg-devel] [PATCH] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

2022-08-17 Thread Alan Kelly
--- Remove yuv2yuvX_mmx as it is no longer used. libswscale/x86/swscale.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 32d441245d..89ef9f5d2b 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.

[FFmpeg-devel] [PATCH] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

2022-08-17 Thread Alan Kelly
--- Call yuv2yuvX_mmxext on line 208 also. libswscale/x86/swscale.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 32d441245d..e0f90d5c58 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -205

[FFmpeg-devel] [PATCH] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

2022-08-17 Thread Alan Kelly
--- libswscale/x86/swscale.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 32d441245d..881a4b7798 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -211,7 +211,7 @@ static void yuv2yuvX_ ##opt(con

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-08-15 Thread Alan Kelly
Hi Michael, Is there anything blocking this change being applied? Is there anything I can do to help? Thanks, Alan On Mon, Jul 18, 2022 at 6:49 PM Michael Niedermayer wrote: > On Mon, Jul 18, 2022 at 09:54:39AM +0200, Alan Kelly wrote: > > Hi Michael, > > > > I have t

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-18 Thread Alan Kelly
Sat, Jul 16, 2022 at 1:14 PM Michael Niedermayer wrote: > On Fri, Jul 15, 2022 at 05:03:56PM +0200, Alan Kelly wrote: > > Hi Michael, > > > > Thanks for looking at this. I fixed the test issue. > > seems to be still failing here: > make distclean ; ./configure &

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-15 Thread Alan Kelly
Hi Michael, Thanks for looking at this. I fixed the test issue. Alan On Fri, Jul 15, 2022 at 4:59 PM Alan Kelly wrote: > ff_shuffle_filter_coefficients shuffles the tail as required. > --- > libswscale/utils.c| 19 --- > libswscale/x86/swscale.c | 6 ++-

[FFmpeg-devel] [PATCH v2 5/5] checkasm/sw_scale: hscale does not require cpuflag test.

2022-07-15 Thread Alan Kelly
This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 798990a6cf..7be107bef1 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_sc

[FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-15 Thread Alan Kelly
ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c| 19 --- libswscale/x86/swscale.c | 6 ++ tests/checkasm/sw_scale.c | 2 +- 3 files changed, 19 insertions(+), 8 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c in

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-13 Thread Alan Kelly
Pushing this back up to the top. This is required to enable the previous patch in this chain. Thanks On Fri, Apr 22, 2022 at 10:04 AM Alan Kelly wrote: > Ping! > > On Thu, Feb 17, 2022 at 11:04 AM Alan Kelly wrote: > >> ff_shuffle_filter_coefficients shuffles th

Re: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-07-13 Thread Alan Kelly
Hi, Are there any further comments on this patch or can it be committed? Thanks, Alan On Tue, Apr 26, 2022 at 10:00 AM Alan Kelly wrote: > The main loop processes blocks of 16 pixels. The tail processes blocks > of size 4. > --- > libswscale/x86/scale_a

[FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-04-26 Thread Alan Kelly
The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 44 ++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd633
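The two-stage structure described here (a main loop over blocks of 16 pixels, then a tail loop over blocks of 4) can be sketched in scalar C as follows. This is not the AVX2 assembly from the patch: `widen_pixel`, `LoopStats` and `hscale_sketch` are hypothetical names, the per-pixel operation is a placeholder for the real filter, and the final one-at-a-time loop is an assumption added so the sketch accepts any width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real filter: widens an 8-bit pixel to 15 bits. */
static int16_t widen_pixel(uint8_t p)
{
    return (int16_t)(p << 7);
}

/* Iteration counts per stage, returned so the split can be inspected. */
typedef struct LoopStats {
    size_t main_blocks;  /* iterations of the 16-pixel loop */
    size_t tail_blocks;  /* iterations of the 4-pixel loop  */
    size_t leftover;     /* pixels handled one at a time    */
} LoopStats;

static LoopStats hscale_sketch(const uint8_t *src, int16_t *dst, size_t w)
{
    LoopStats st = { 0, 0, 0 };
    size_t x = 0;

    assert(src && dst);

    /* Main loop: one block of 16 pixels per iteration. */
    for (; x + 16 <= w; x += 16, st.main_blocks++)
        for (size_t i = 0; i < 16; i++)
            dst[x + i] = widen_pixel(src[x + i]);

    /* Tail loop: one block of 4 pixels per iteration. */
    for (; x + 4 <= w; x += 4, st.tail_blocks++)
        for (size_t i = 0; i < 4; i++)
            dst[x + i] = widen_pixel(src[x + i]);

    /* Assumed fallback for widths that are not a multiple of 4. */
    for (; x < w; x++, st.leftover++)
        dst[x] = widen_pixel(src[x]);

    return st;
}
```

For a width of 38 the sketch runs the main loop twice, the tail loop once, and handles the last 2 pixels individually.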

Re: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-04-26 Thread Alan Kelly
filter size of 4 is processed 35% faster (506 vs 771). Thanks for the tip on countq, one add has been removed from each loop. Alan On Fri, Apr 22, 2022 at 7:43 PM Michael Niedermayer wrote: > On Thu, Feb 17, 2022 at 11:04:04AM +0100, Alan Kelly wrote: > > The main loop processes blo

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-04-22 Thread Alan Kelly
Ping! On Thu, Feb 17, 2022 at 11:04 AM Alan Kelly wrote: > ff_shuffle_filter_coefficients shuffles the tail as required. > --- > libswscale/utils.c | 19 --- > libswscale/x86/swscale.c | 6 ++ > 2 files changed, 18 insertions(+), 7 deletions(-)

Re: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-04-22 Thread Alan Kelly
Hi, Is anyone interested in this patch? This makes AVX2 hscale work on all input sizes. Thanks, Alan On Mon, Mar 7, 2022 at 4:27 PM Alan Kelly wrote: > Hi Michael, > > Thanks for reviewing the first two parts of this patchset. > > Is there anybody interested in revi

Re: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-03-07 Thread Alan Kelly
Hi Michael, Thanks for reviewing the first two parts of this patchset. Is there anybody interested in reviewing this part? Thanks, Alan On Thu, Feb 17, 2022 at 5:21 PM Michael Niedermayer wrote: > On Thu, Feb 17, 2022 at 11:04:04AM +0100, Alan Kelly wrote: > > The main loop process

[FFmpeg-devel] [PATCH v2 5/5] checkasm/sw_scale: hscale does not require cpuflag test.

2022-02-17 Thread Alan Kelly
This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 3c0a083b42..4c57b6a372 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_sc

[FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-02-17 Thread Alan Kelly
ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c | 19 --- libswscale/x86/swscale.c | 6 ++ 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 7c8e1bbdde..d818c9ce55 100644 ---

[FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-02-17 Thread Alan Kelly
The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd63

[FFmpeg-devel] [PATCH v2 2/5] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-17 Thread Alan Kelly
Make the code more readable and follow the style guide. --- libswscale/utils.c | 66 +- 1 file changed, 36 insertions(+), 30 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 344c87dfdf..7c8e1bbdde 100644 --- a/libswscale/utils.c +

[FFmpeg-devel] [PATCH v2 1/5] libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients.

2022-02-17 Thread Alan Kelly
--- libswscale/swscale_internal.h | 2 +- libswscale/utils.c| 11 --- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index 3a78d95ba6..26d28d42e6 100644 --- a/libswscale/swscale_internal.h +++ b/libs
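The general pattern named in this subject line, checking an allocation inside a helper and passing the failure up to the caller, might look like the sketch below. It does not reproduce ff_shuffle_filter_coefficients: `shuffle_coefficients`, `init_scaler`, `alloc_fn` and `failing_alloc` are hypothetical, the pair swap is a placeholder for the real shuffle, and a plain negative `errno` value is assumed as the error code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocator passed in as a parameter so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical helper: swaps adjacent coefficient pairs via a scratch copy.
 * Returns 0 on success or -ENOMEM if the scratch buffer cannot be allocated. */
static int shuffle_coefficients(int16_t *filter, size_t count, alloc_fn alloc)
{
    int16_t *tmp;

    if (count == 0)
        return 0;

    tmp = alloc(count * sizeof(*tmp));
    if (!tmp)
        return -ENOMEM;          /* report the failure instead of ignoring it */

    memcpy(tmp, filter, count * sizeof(*tmp));
    for (size_t i = 0; i + 1 < count; i += 2) {
        filter[i]     = tmp[i + 1];
        filter[i + 1] = tmp[i];
    }
    free(tmp);
    return 0;
}

/* Hypothetical caller: forwards the error code unchanged. */
static int init_scaler(int16_t *filter, size_t count, alloc_fn alloc)
{
    int ret = shuffle_coefficients(filter, count, alloc);
    if (ret < 0)
        return ret;
    return 0;
}

/* Allocator that always fails, used only to demonstrate the error path. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling `init_scaler(filter, n, malloc)` takes the success path, while passing `failing_alloc` returns `-ENOMEM` and leaves the coefficients untouched.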

Re: [FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-09 Thread Alan Kelly
:11 PM Michael Niedermayer wrote: > On Mon, Jan 10, 2022 at 03:58:33PM +0100, Alan Kelly wrote: > > Make the code more readable, follow the style guide and propagate memory > > allocation errors. > > Cosmetics and bugfixes should not be in the same patch >

[FFmpeg-devel] [PATCH 5/5] checkasm/sw_scale: hscale does not require cpuflag test.

2022-02-09 Thread Alan Kelly
This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 3c0a083b42..e7f916d3a8 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_

[FFmpeg-devel] [PATCH 4/5] libswscale: Propagate error codes from ff_shuffle_filter_coefficients

2022-02-09 Thread Alan Kelly
--- libswscale/swscale_internal.h | 2 +- libswscale/utils.c| 14 -- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index 3a78d95ba6..26d28d42e6 100644 --- a/libswscale/swscale_internal.h +++ b/l

[FFmpeg-devel] [PATCH 3/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-02-09 Thread Alan Kelly
ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c | 19 --- libswscale/x86/swscale.c | 6 ++ 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 1d919e863a..31c365fcee 100644 ---

[FFmpeg-devel] [PATCH 2/5] libswscale: Avx2 hscale can process inputs of any size.

2022-02-09 Thread Alan Kelly
The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd63

[FFmpeg-devel] [PATCH 1/5] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-09 Thread Alan Kelly
Make the code more readable and follow the style guide. --- libswscale/utils.c | 64 +++--- 1 file changed, 37 insertions(+), 27 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index c5ea8853d5..1d919e863a 100644 --- a/libswscale/utils.c +

Re: [FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-02 Thread Alan Kelly
Hi, Is anybody interested in this patch set? Thanks! On Mon, Jan 10, 2022, 15:58 Alan Kelly wrote: > Make the code more readable, follow the style guide and propagate memory > allocation errors. > --- > libswscale/swscale_internal.h | 2 +- > libswscale/utils.c

[FFmpeg-devel] [PATCH 4/4] checkasm/sw_scale: hscale does not require cpuflag test.

2022-01-10 Thread Alan Kelly
This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 3c0a083b42..e7f916d3a8 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_

[FFmpeg-devel] [PATCH 3/4] libswscale: Enable hscale_avx2 for input sizes which are multiples of 4.

2022-01-10 Thread Alan Kelly
ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c | 17 +++-- libswscale/x86/swscale.c | 4 ++-- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 52f07e1661..7e1e9c3834 100644 --- a/l

[FFmpeg-devel] [PATCH 2/4] libswscale: Avx2 hscale can process any input of size which is a multiple of 4.

2022-01-10 Thread Alan Kelly
The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd63

[FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-01-10 Thread Alan Kelly
Make the code more readable, follow the style guide and propagate memory allocation errors. --- libswscale/swscale_internal.h | 2 +- libswscale/utils.c| 68 --- 2 files changed, 40 insertions(+), 30 deletions(-) diff --git a/libswscale/swscale_interna

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-21 Thread Alan Kelly
This flag is set on Haswell and earlier and all AMD cpus. --- Checks for family for Haswell. All checks are done where AVX2 flag is set as this is clearer. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 15 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/libavut

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly
On Mon, Dec 20, 2021 at 3:53 PM James Almer wrote: > > > On 12/20/2021 11:47 AM, Lynne wrote: > > 20 Dec 2021, 15:43 by alankelly-at-google@ffmpeg.org: > > > >> This flag is set on Haswell and earlier and all AMD cpus. > >> --- > >> Removes unnecessary indentation, clarifies comment and onl

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly
This flag is set on Haswell and earlier and all AMD cpus. --- Sets this flag on Zen 3 and earlier. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 14 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/libavutil/cpu.h b/libavutil/cpu.h index ae443eccad..ce9bf14bf7 100

[FFmpeg-devel] [PATCH 2/2] libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.

2021-12-20 Thread Alan Kelly
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster. --- Whoops! Corrects check so that this flag is only enabled where fast avx2 and fast gathers are available. libswscale/utils.c| 2 +- libswscale/x86/swscale.c | 2 +- tests/chec
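The selection rule described here (use the AVX2 hscale only where both AVX2 and fast gathers are available) can be illustrated with a small dispatch function. The flag bits and names below are hypothetical placeholders, not the values from libavutil/cpu.h, and the fallback order is an assumption.

```c
/* Hypothetical flag bits for illustration; the real ones live in libavutil/cpu.h. */
enum {
    FLAG_SSSE3       = 1 << 0,
    FLAG_AVX2        = 1 << 1,
    FLAG_SLOW_GATHER = 1 << 2
};

typedef enum { IMPL_C, IMPL_SSSE3, IMPL_AVX2 } Impl;

/* Pick the AVX2 path only when gathers are not flagged as slow;
 * otherwise fall back to SSSE3 if present, then to plain C. */
static Impl pick_hscale(unsigned flags)
{
    if ((flags & FLAG_AVX2) && !(flags & FLAG_SLOW_GATHER))
        return IMPL_AVX2;
    if (flags & FLAG_SSSE3)
        return IMPL_SSSE3;
    return IMPL_C;
}
```

A CPU reporting AVX2 together with the slow-gather flag therefore keeps the SSSE3 routine, which matches the stated goal of using the AVX2 functions only where they are faster.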

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly
This flag is set on Haswell and earlier and all AMD cpus. --- Removes unnecessary indentation, clarifies comment and only sets flag on AMD cpus with AVX2. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 14 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/libavutil

[FFmpeg-devel] [PATCH 2/2] libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.

2021-12-20 Thread Alan Kelly
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster. --- libswscale/utils.c| 2 +- libswscale/x86/swscale.c | 2 +- tests/checkasm/sw_scale.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/libswscale/utils.c b

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly
This flag is set on Haswell and earlier and all AMD cpus. --- As discussed on IRC last week. libavutil/cpu.h | 57 +++-- libavutil/x86/cpu.c | 13 ++- 2 files changed, 41 insertions(+), 29 deletions(-) diff --git a/libavutil/cpu.h b/libavutil/c

[FFmpeg-devel] [PATCH] x86/scale_avx2: Change asm indent from 2 to 4 spaces.

2021-12-16 Thread Alan Kelly
--- libswscale/x86/scale_avx2.asm | 96 +-- 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 2cd7e968d3..eb472db12f 100644 --- a/libswscale/x86/scale_avx2.asm +++ b/libswscale/x86/sca

Re: [FFmpeg-devel] [PATCH] x86/swscale: fix minor coding style issues

2021-12-16 Thread Alan Kelly
Thanks Lynne for the patch. On Thu, Dec 16, 2021 at 5:05 PM Alan Kelly wrote: > --- > libswscale/x86/swscale.c | 14 +++--- > tests/checkasm/sw_scale.c | 3 +-- > 2 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/libswscale/x86/swscale.c b/libsws

[FFmpeg-devel] [PATCH] x86/swscale: fix minor coding style issues

2021-12-16 Thread Alan Kelly
--- libswscale/x86/swscale.c | 14 +++--- tests/checkasm/sw_scale.c | 3 +-- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 164b06d6ba..c49a05c37b 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-12-15 Thread Alan Kelly
Fixes so that fate under 64 bit Windows passes. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112 +++

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-12-15 Thread Alan Kelly
On Tue, Dec 14, 2021 at 6:07 PM James Almer wrote: > On 12/14/2021 12:23 PM, Alan Kelly wrote: > > Patch has been rebased from latest commits. > > These functions replace all ff_hscale8to15_*_ssse3 when avx2 is > available. > > --- > > libswscale/swscale_inter

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-12-14 Thread Alan Kelly
Patch has been rebased from latest commits. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-07-26 Thread Alan Kelly
On Wed, Jul 21, 2021 at 11:11 AM Alan Kelly wrote: > > > On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote: > >> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. >> --- >> EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHE

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-07-21 Thread Alan Kelly
On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote: > These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. > --- > EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as > discussed in the email thread for part 1 of this patch. > > Benchmark

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread Alan Kelly
On Fri, Jul 16, 2021 at 4:02 PM James Almer wrote: > On 7/16/2021 10:44 AM, Alan Kelly wrote: > > Broadwell and later and Zen3 and later have fast gather instructions. > > --- > > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the > > email thre

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-07-16 Thread Alan Kelly
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as discussed in the email thread for part 1 of this patch. Benchmark results on Skylake and Haswell: Skylake Haswell h

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread Alan Kelly
Broadwell and later and Zen3 and later have fast gather instructions. --- Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the email thread. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 11 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/libavutil/c

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-12 Thread Alan Kelly
On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote: > On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote: > >> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org: >> >> > Broadwell and later and Zen3 and later have fast gather instructions. >> > --- >>

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-06-25 Thread Alan Kelly
On Fri, Jun 25, 2021 at 1:26 PM Ronald S. Bultje wrote: > Hi Alan, > > On Fri, Jun 25, 2021 at 3:59 AM Alan Kelly < > alankelly-at-google@ffmpeg.org> wrote: > >> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. >> > > Re-asking

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Alan Kelly
On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote: > Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org: > > > Broadwell and later and Zen3 and later have fast gather instructions. > > --- > > Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell, > > and 2 to 5 on Skylake a

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-06-25 Thread Alan Kelly
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112 ++ libswscale/x86/swsca

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Alan Kelly
Broadwell and later and Zen3 and later have fast gather instructions. --- Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell, and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3. libavutil/cpu.h | 2 ++ libavutil/x86/cpu.c | 18 -- libav

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds av_cpu_has_fast_gather to detect cpus with avx fast gather instruction

2021-06-24 Thread Alan Kelly
je wrote: > > Hi Alan, > > > > On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly < > > alankelly-at-google@ffmpeg.org> wrote: > > > >> Broadwell and later have fast gather instructions. > >> --- > >> This is so that the avx2 version of ff

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-06-14 Thread Alan Kelly
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112 ++ libswscale/x86/swsca

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds av_cpu_has_fast_gather to detect cpus with avx fast gather instruction

2021-06-14 Thread Alan Kelly
Broadwell and later have fast gather instructions. --- This is so that the avx2 version of ff_hscale8to15X which uses gather instructions is only selected on machines where it will actually be faster. libavutil/cpu.c | 6 ++ libavutil/cpu.h | 6 ++ libavutil/cpu_inte

[FFmpeg-devel] [PATCH 1/3] libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext

2021-04-01 Thread Alan Kelly
--- This is so that inputs of size 8 are supported, as was the case with the original implementation. A bug was found with inputs not divisible by 16. libswscale/x86/yuv2yuvX.asm | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/yuv2yuvX.asm b/lib

[FFmpeg-devel] [PATCH 3/3] tests/checkasm/sw_scale: adds additional test sizes for yuv2yuvX

2021-04-01 Thread Alan Kelly
--- tests/checkasm/sw_scale.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index a10118704b..3ac0f9082f 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_scale.c @@ -68,8 +68,8 @@ static void check_yuv2

[FFmpeg-devel] [PATCH 2/3] libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0

2021-04-01 Thread Alan Kelly
--- libswscale/x86/swscale.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index cc9e8b0155..0848a31461 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -197,7 +197,8 @@ static void yuv2yuvX_ ##o

[FFmpeg-devel] [PATCH 1/3] libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext

2021-02-23 Thread Alan Kelly
--- This is so that tails of size 8 may safely be processed libswscale/x86/yuv2yuvX.asm | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm index 521880dabe..b6294cb919 100644 --- a/libswscale/x86/yuv2yuvX.as

[FFmpeg-devel] [PATCH 3/3] tests/checkasm/sw_scale: adds additional test sizes for yuv2yuvX

2021-02-23 Thread Alan Kelly
--- tests/checkasm/sw_scale.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index a10118704b..3ac0f9082f 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_scale.c @@ -68,8 +68,8 @@ static void check_yuv2

[FFmpeg-devel] [PATCH 2/3] libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0

2021-02-23 Thread Alan Kelly
--- libswscale/x86/swscale.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 1e865914cb..71961a9ae0 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -206,7 +206,8 @@ static void yuv2yuvX_ ##o

[FFmpeg-devel] [PATCH 1/2] tests/checkasm/sw_scale.c

2021-02-19 Thread Alan Kelly
Initialises each item in src and filter arrays to fix valgrind uninitialised value warning. --- casts pointers to uint8_t* and multiplies the buffer size by sizeof(uint16_t). tests/checkasm/sw_scale.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/sw_scale.
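The fix described here, writing every element of the test buffers before they are read and sizing the fill in bytes rather than in elements, can be sketched as below. This is not the checkasm code: `fill_bytes`, `init_inputs` and `N_COEFFS` are hypothetical, and a simple linear congruential generator stands in for whatever randomisation the real test uses.

```c
#include <stddef.h>
#include <stdint.h>

#define N_COEFFS 8  /* hypothetical element count for the sketch */

/* Writes `size` pseudo-random bytes into buf and returns the number written. */
static size_t fill_bytes(uint8_t *buf, size_t size, unsigned seed)
{
    for (size_t i = 0; i < size; i++) {
        seed = seed * 1103515245u + 12345u;  /* simple LCG step */
        buf[i] = (uint8_t)(seed >> 16);
    }
    return size;
}

/* Initialises every 16-bit element. The pointer is cast to uint8_t * and the
 * length is given in bytes, i.e. element count times sizeof(uint16_t). */
static size_t init_inputs(uint16_t coeffs[N_COEFFS], unsigned seed)
{
    return fill_bytes((uint8_t *)coeffs, N_COEFFS * sizeof(uint16_t), seed);
}
```

Passing only `N_COEFFS` as the size would leave the second half of the array unwritten, which is the kind of gap valgrind reports as an uninitialised read.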

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-02-19 Thread Alan Kelly
b94cd55155d8c061f1e1faca9076afe540149c27 as the problematic commit. On Thu, Feb 18, 2021 at 11:23 PM James Almer wrote: > On 2/17/2021 5:24 PM, Paul B Mahol wrote: > > On Tue, Feb 16, 2021 at 6:31 PM Alan Kelly < > > alankelly-at-google@ffmpeg.org> wrote: > > > >> Looks like there are n

[FFmpeg-devel] [PATCH 2/2] tests/checkasm/sw_scale.c

2021-02-19 Thread Alan Kelly
Checks av_mallocs --- tests/checkasm/sw_scale.c | 4 1 file changed, 4 insertions(+) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index a4866723d7..ef414c0a82 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_scale.c @@ -103,7 +103,11 @@ static void check_y

[FFmpeg-devel] [PATCH 1/2] tests/checkasm/sw_scale.c

2021-02-19 Thread Alan Kelly
Initialises each item in src and filter arrays to fix valgrind uninitialised value warning. --- tests/checkasm/sw_scale.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 7504f8b45f..a4866723d7 100644 --- a/tests/

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-02-16 Thread Alan Kelly
Looks like there are no comments, is this OK to be applied? Thanks On Tue, Feb 9, 2021 at 6:25 PM Paul B Mahol wrote: > Will apply in no comments.

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-02-09 Thread Alan Kelly
Ping! On Thu, Jan 14, 2021 at 3:47 PM Alan Kelly wrote: > --- > Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro > Tests for multiple sizes in checkasm-sw_scale > checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned > loads > libsw

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-14 Thread Alan Kelly
--- Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro Tests for multiple sizes in checkasm-sw_scale checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned loads libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c | 130 ---
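The note about aligning the test memory on 8 bytes can be illustrated with a small sketch. Aligned SSE loads such as movdqa fault on addresses that are not 16-byte aligned, so a test buffer that is only 8-byte aligned exposes them. The helper names below (is_aligned, offset_for_test) are hypothetical and are not taken from the patch or from checkasm; this only shows the idea, assuming the test starts from a 32-byte aligned allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of a bytes, 0 otherwise. */
static int is_aligned(const void *p, uintptr_t a)
{
    return ((uintptr_t)p % a) == 0;
}

/* Hypothetical helper: given a 32-byte aligned base pointer, return a
 * pointer that is 8-byte aligned but neither 16- nor 32-byte aligned.
 * Code that wrongly uses aligned loads on such a pointer would fault. */
static uint8_t *offset_for_test(uint8_t *base32)
{
    assert(is_aligned(base32, 32));
    return base32 + 8;
}
```

The caller has to allocate at least 8 bytes more than the test needs, since the usable region starts 8 bytes into the buffer.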

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-14 Thread Alan Kelly
32 so that the test catches problems with alignment. On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer wrote: > On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote: > > --- > > Fixes a bug where if there is no offset and a tail which is not > processed by the > >

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-11 Thread Alan Kelly
--- Fixes a bug where if there is no offset and a tail which is not processed by the sse3/avx2 version the dither is modified Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it to yuv2yuvX.asm to reduce code duplication and so that it may be used to process the tail from th

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-11 Thread Alan Kelly
on a solution. On Sun, Jan 10, 2021 at 4:26 PM Michael Niedermayer wrote: > On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote: > > --- > > Replaces mova with movdqu due to alignment issues > > libswscale/x86/Makefile | 1 + > > l

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-07 Thread Alan Kelly
Thanks for your patience with this, I have replaced mova with movdqu - movu generated a compile error on ssse3. What system did this crash on? On Wed, Jan 6, 2021 at 9:10 PM Michael Niedermayer wrote: > On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote: > > Ping! > >

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-07 Thread Alan Kelly
--- Replaces mova with movdqu due to alignment issues libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2yuvX.asm | 117 tests/checkasm/sw_scale.c | 98 ++

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-05 Thread Alan Kelly
Ping! On Thu, Dec 17, 2020 at 11:42 AM Alan Kelly wrote: > --- > Fixes memory alignment problem in checkasm-sw_scale > Tested on Linux 32 and 64 bit and mingw32 > libswscale/x86/Makefile | 1 + > libswscale/x86/swscale.c| 106 +--- >

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-17 Thread Alan Kelly
--- Fixes memory alignment problem in checkasm-sw_scale Tested on Linux 32 and 64 bit and mingw32 libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2yuvX.asm | 117 tests/checkasm/sw_sca

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-10 Thread Alan Kelly
--- Replaces ff_sws_init_swscale_x86 with ff_getSwsFunc Load offset if not gprsize but 8 on both 32 and 64 bit Removes sfence as NT store no longer used libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2yuvX.asm | 117 +++

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-09 Thread Alan Kelly
good reason. If you think it better to use NT stores, I will replace them. On Fri, Dec 4, 2020 at 2:00 PM Anton Khirnov wrote: > Quoting Alan Kelly (2020-11-19 09:41:56) > > --- > > All of Henrik's suggestions have been implemented. Additionally, > > m3 and m6 are per

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-09 Thread Alan Kelly
--- Activates avx2 version of yuv2yuvX Adds checkasm for yuv2yuvX Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_* Replaces non-temporal stores with temporal stores libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2y

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-01 Thread Alan Kelly
Ping On Thu, Nov 19, 2020 at 9:42 AM Alan Kelly wrote: > --- > All of Henrik's suggestions have been implemented. Additionally, > m3 and m6 are permuted in avx2 before storing to ensure bit by bit > identical results in avx2. > libswscale/x86/Makefile | 1 + > l

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-19 Thread Alan Kelly
--- All of Henrik's suggestions have been implemented. Additionally, m3 and m6 are permuted in avx2 before storing to ensure bit by bit identical results in avx2. libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 +++ libswscale/x86/yuv2yuvX.asm | 118 ++

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-16 Thread Alan Kelly
--- Fixes bug in sse3 path where m1 is not set correctly resulting in off by one errors. The results are now bit by bit identical. libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 libswscale/x86/yuv2yuvX.asm | 114 ++

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-12 Thread Alan Kelly
--- It now works on x86-32 libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 libswscale/x86/yuv2yuvX.asm | 110 3 files changed, 121 insertions(+), 65 deletions(-) create mode 100644 libswscale/x86/yuv2yuvX.asm

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-10 Thread Alan Kelly
--- yuv2yuvX.asm: Ports yuv2yuvX to asm, unrolls main loop and adds other small optimizations for ~20% speed-up. Copyright updated to include the original from swscale.c swscale.c: Removes yuv2yuvX_sse3 and calls new function ff_yuv2yuvX_sse3. Calls yuv2yuvX_mmxext on remainining elements if r
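The message above is cut off, so the exact condition used in the patch is not visible here. As a rough sketch of the general idea only (a wide SIMD routine handles the largest multiple of its step, and a narrower routine handles the rest), the split could be computed as follows. The names WidthSplit and split_width are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical result type: how many elements go to the wide main loop
 * and how many are left over for the narrower tail routine. */
typedef struct WidthSplit {
    int main_len;
    int tail_len;
} WidthSplit;

/* Split width into the largest multiple of step plus a remainder.
 * step is the number of elements the wide routine consumes per iteration. */
static WidthSplit split_width(int width, int step)
{
    WidthSplit s;
    assert(width >= 0 && step > 0);
    s.main_len = width - width % step;
    s.tail_len = width - s.main_len;
    return s;
}
```

With a step of 32, a width of 100 would give 96 elements to the main loop and 4 to the tail; a width that is already a multiple of the step leaves no tail.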

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-06 Thread Alan Kelly
? Thank you. On Sat, Oct 31, 2020 at 1:02 PM Carl Eugen Hoyos wrote: > Am Di., 27. Okt. 2020 um 09:56 Uhr schrieb Alan Kelly > : > > > --- /dev/null > > +++ b/libswscale/x86/yuv2yuvX.a
