from:"Alan Kelly"

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-02-09 Thread Alan Kelly

Ping! On Thu, Jan 14, 2021 at 3:47 PM Alan Kelly wrote: > --- > Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro > Tests for multiple sizes in checkasm-sw_scale > checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned > loads > libsw

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-02-16 Thread Alan Kelly

Looks like there are no comments, is this OK to be applied? Thanks On Tue, Feb 9, 2021 at 6:25 PM Paul B Mahol wrote: > Will apply in no comments. > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-

[FFmpeg-devel] [PATCH 1/2] tests/checkasm/sw_scale.c

2021-02-19 Thread Alan Kelly

Initialises each item in src and filter arrays to fix valgrind uninitialised value warning. --- tests/checkasm/sw_scale.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 7504f8b45f..a4866723d7 100644 --- a/tests/

[FFmpeg-devel] [PATCH 2/2] tests/checkasm/sw_scale.c

2021-02-19 Thread Alan Kelly

Checks av_mallocs --- tests/checkasm/sw_scale.c | 4 1 file changed, 4 insertions(+) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index a4866723d7..ef414c0a82 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_scale.c @@ -103,7 +103,11 @@ static void check_y

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-02-19 Thread Alan Kelly

b94cd55155d8c061f1e1faca9076afe540149c27 as the problematic commit. On Thu, Feb 18, 2021 at 11:23 PM James Almer wrote: > On 2/17/2021 5:24 PM, Paul B Mahol wrote: > > On Tue, Feb 16, 2021 at 6:31 PM Alan Kelly < > > alankelly-at-google@ffmpeg.org> wrote: > > > >> Looks like there are n

[FFmpeg-devel] [PATCH 1/2] tests/checkasm/sw_scale.c

2021-02-19 Thread Alan Kelly

Initialises each item in src and filter arrays to fix valgrind uninitialised value warning. --- casts pointers to uint8_t* and multiplies the buffer size by sizeof(uint16_t). tests/checkasm/sw_scale.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/sw_scale.
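The size fix mentioned in this message is a common pitfall when a buffer of 16-bit samples is initialised through a byte pointer: the length has to be given in bytes, not in elements. A minimal sketch of the idea follows. `fill_bytes`, `init_samples` and `N_SAMPLES` are hypothetical names used only for illustration and are not taken from tests/checkasm/sw_scale.c.

```c
#include <stddef.h>
#include <stdint.h>

#define N_SAMPLES 64 /* hypothetical element count */

/* Hypothetical byte-wise fill; stands in for whatever initialiser the test uses. */
static void fill_bytes(uint8_t *buf, size_t size_in_bytes, uint8_t value)
{
    for (size_t i = 0; i < size_in_bytes; i++)
        buf[i] = value;
}

/* Initialise every 16-bit sample and return the number of bytes written. */
static size_t init_samples(int16_t *samples)
{
    /* Passing N_SAMPLES alone would initialise only the first half of the buffer. */
    size_t size_in_bytes = N_SAMPLES * sizeof(int16_t);

    fill_bytes((uint8_t *)samples, size_in_bytes, 0x5A);
    return size_in_bytes;
}
```

With the element count used as the byte count, the second half of the array would stay uninitialised, which is the kind of read valgrind reports.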

[FFmpeg-devel] [PATCH 2/3] libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0

2021-02-23 Thread Alan Kelly

--- libswscale/x86/swscale.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 1e865914cb..71961a9ae0 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -206,7 +206,8 @@ static void yuv2yuvX_ ##o

[FFmpeg-devel] [PATCH 3/3] tests/checkasm/sw_scale: adds additional test sizes for yuv2yuvX

2021-02-23 Thread Alan Kelly

--- tests/checkasm/sw_scale.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index a10118704b..3ac0f9082f 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_scale.c @@ -68,8 +68,8 @@ static void check_yuv2

[FFmpeg-devel] [PATCH 1/3] libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext

2021-02-23 Thread Alan Kelly

--- This is so that tails of size 8 may safely be processed libswscale/x86/yuv2yuvX.asm | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm index 521880dabe..b6294cb919 100644 --- a/libswscale/x86/yuv2yuvX.as

[FFmpeg-devel] [PATCH 2/3] libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0

2021-04-01 Thread Alan Kelly

--- libswscale/x86/swscale.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index cc9e8b0155..0848a31461 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -197,7 +197,8 @@ static void yuv2yuvX_ ##o

[FFmpeg-devel] [PATCH 3/3] tests/checkasm/sw_scale: adds additional test sizes for yuv2yuvX

2021-04-01 Thread Alan Kelly

--- tests/checkasm/sw_scale.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index a10118704b..3ac0f9082f 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_scale.c @@ -68,8 +68,8 @@ static void check_yuv2

[FFmpeg-devel] [PATCH 1/3] libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext

2021-04-01 Thread Alan Kelly

--- This is so that inputs of size 8 are supported, as was the case with the original implementation. A bug was found with inputs not divisible by 16. libswscale/x86/yuv2yuvX.asm | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/yuv2yuvX.asm b/lib

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds av_cpu_has_fast_gather to detect cpus with avx fast gather instruction

2021-06-14 Thread Alan Kelly

Broadwell and later have fast gather instructions. --- This is so that the avx2 version of ff_hscale8to15X which uses gather instructions is only selected on machines where it will actually be faster. libavutil/cpu.c | 6 ++ libavutil/cpu.h | 6 ++ libavutil/cpu_inte

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-06-14 Thread Alan Kelly

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112 ++ libswscale/x86/swsca

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds av_cpu_has_fast_gather to detect cpus with avx fast gather instruction

2021-06-24 Thread Alan Kelly

je wrote: > > Hi Alan, > > > > On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly < > > alankelly-at-google@ffmpeg.org> wrote: > > > >> Broadwell and later have fast gather instructions. > >> --- > >> This is so that the avx2 version of ff

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Alan Kelly

Broadwell and later and Zen3 and later have fast gather instructions. --- Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell, and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3. libavutil/cpu.h | 2 ++ libavutil/x86/cpu.c | 18 -- libav
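The cycle counts quoted in this message can be turned into a small lookup to show why Haswell is excluded. This is only a sketch under stated assumptions: the `uarch` enum, the `gather_cost_for` table and the 7-cycle threshold are hypothetical and chosen for illustration; the actual patch detects CPUs inside libavutil/x86/cpu.c, which is not shown in this excerpt.

```c
#include <stdbool.h>

/* Hypothetical microarchitecture labels, for illustration only. */
enum uarch { UARCH_HASWELL, UARCH_BROADWELL, UARCH_SKYLAKE_OR_NEWER };

struct gather_cost {
    int min_cycles;
    int max_cycles;
};

/* Cycle ranges as quoted in the message above. */
static struct gather_cost gather_cost_for(enum uarch u)
{
    switch (u) {
    case UARCH_HASWELL:
        return (struct gather_cost){ 9, 12 };
    case UARCH_BROADWELL:
        return (struct gather_cost){ 5, 7 };
    default:
        return (struct gather_cost){ 2, 5 };
    }
}

/* Assumed threshold: gather counts as fast if the worst case is at most 7 cycles. */
static bool has_fast_gather(enum uarch u)
{
    return gather_cost_for(u).max_cycles <= 7;
}
```

Under that assumed threshold, Broadwell and newer count as fast and Haswell does not, which matches the split described in the message.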

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-06-25 Thread Alan Kelly

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112 ++ libswscale/x86/swsca

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-06-25 Thread Alan Kelly

On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote: > Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org: > > > Broadwell and later and Zen3 and later have fast gather instructions. > > --- > > Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell, > > and 2 to 5 on Skylake a

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-06-25 Thread Alan Kelly

On Fri, Jun 25, 2021 at 1:26 PM Ronald S. Bultje wrote: > Hi Alan, > > On Fri, Jun 25, 2021 at 3:59 AM Alan Kelly < > alankelly-at-google@ffmpeg.org> wrote: > >> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. >> > > Re-asking

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-12 Thread Alan Kelly

On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote: > On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote: > >> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org: >> >> > Broadwell and later and Zen3 and later have fast gather instructions. >> > --- >>

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread Alan Kelly

Broadwell and later and Zen3 and later have fast gather instructions. --- Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the email thread. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 11 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/libavutil/c

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-07-16 Thread Alan Kelly

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as discussed in the email thread for part 1 of this patch. Benchmark results on Skylake and Haswell: Skylake Haswell h

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

2021-07-16 Thread Alan Kelly

On Fri, Jul 16, 2021 at 4:02 PM James Almer wrote: > On 7/16/2021 10:44 AM, Alan Kelly wrote: > > Broadwell and later and Zen3 and later have fast gather instructions. > > --- > > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the > > email thre

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-07-21 Thread Alan Kelly

On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote: > These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. > --- > EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as > discussed in the email thread for part 1 of this patch. > > Benchmark

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-07-26 Thread Alan Kelly

On Wed, Jul 21, 2021 at 11:11 AM Alan Kelly wrote: > > > On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote: > >> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. >> --- >> EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHE

[FFmpeg-devel] [PATCH] Unrolls main loop of yuv2yuvX_sse3 and general code tidying for ~20% speedup

2020-09-15 Thread Alan Kelly

--- libswscale/x86/swscale.c | 138 --- 1 file changed, 72 insertions(+), 66 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 3160fedf04..e47fee2bbd 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -201,

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup. AVX2 version is ready and tested, although local tests show a significant speed-up

2020-10-22 Thread Alan Kelly

Other functions to be ported to avx2 have been identified and are on the todo list. --- libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 72 +++-- libswscale/x86/yuv2yuvX.asm | 105 3 files changed, 112 insertions(+), 66 d

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup. AVX2 version is ready and tested, however, although local tests show a significant

2020-10-23 Thread Alan Kelly

Fixed. The wrong step size was used, causing a write past the end of the buffer. yuv2yuvX_mmxext is now called if there are any remaining pixels. --- libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 -- libswscale/x86/yuv2yuvX.asm | 105
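The fix described in this message, a wide kernel for whole blocks plus a narrower routine for the remainder, can be sketched in plain C. Everything here is an assumption made for illustration: `STEP`, `kernel_wide`, `kernel_narrow` and `process_row` are hypothetical names, and the kernels only copy bytes instead of filtering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STEP 16 /* assumed number of pixels handled per wide iteration */

/* Stand-in for the wide kernel: always writes whole STEP-sized blocks. */
static void kernel_wide(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i += STEP)
        memcpy(dst + i, src + i, STEP);
}

/* Stand-in for the narrow fallback: handles any remaining length. */
static void kernel_narrow(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

static void process_row(const uint8_t *src, uint8_t *dst, size_t width)
{
    /* Largest multiple of STEP that fits, so the wide kernel never overruns. */
    size_t main_len = width - width % STEP;

    if (main_len > 0)
        kernel_wide(src, dst, main_len);
    if (width > main_len)
        kernel_narrow(src + main_len, dst + main_len, width - main_len);
}
```

Calling `kernel_wide` with the full width instead of `main_len` would write up to `STEP - 1` bytes past the end of `dst`, which is the class of bug the message reports. The `main_len > 0` guard also skips the wide kernel entirely for rows shorter than one block.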

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup. AVX2 version is ready and tested, although local tests show a significant spee

2020-10-23 Thread Alan Kelly

pmulhw m5, m0, [srcq + offsetq * 2 + 3 * mmsize] + paddw m6, m6, m2 +paddwm1, m1, m5 +add rsiq, $10 +mov srcq, [rsiq] +test srcd, srcd +jnz

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-10-27 Thread Alan Kelly

--- libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 -- libswscale/x86/yuv2yuvX.asm | 105 3 files changed, 116 insertions(+), 65 deletions(-) create mode 100644 libswscale/x86/yuv2yuvX.asm diff --git a/libswscal

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup. AVX2 version is ready and tested, however, although local tests show a signifi

2020-10-27 Thread Alan Kelly

Thanks for the review, I have made the required changes. As I have changed the subject the patch is in a new thread. On Fri, Oct 23, 2020 at 4:10 PM James Almer wrote: > On 10/23/2020 10:17 AM, Alan Kelly wrote: > > Fixed. The wrong step size was used causing a write past the end of

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-10-27 Thread Alan Kelly

probably due to cpu frequency scaling. checkasm will follow in a separate patch. On Tue, Oct 27, 2020 at 9:56 AM Alan Kelly wrote: > --- > libswscale/x86/Makefile | 1 + > libswscale/x86/swscale.c| 75 -- > libswscale/x86/yuv2yu

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-10-27 Thread Alan Kelly

--- libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 - libswscale/x86/yuv2yuvX.asm | 109 3 files changed, 120 insertions(+), 65 deletions(-) create mode 100644 libswscale/x86/yuv2yuvX.asm diff --git a/libswscale

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-10-27 Thread Alan Kelly

Thanks for the feedback, Anton. The second patch incorporates changes suggested by James Almer: avx2 instructions are wrapped in if cpuflag(avx2) and movddup restored; mm1 is replaced by m1 on x86_32. On Tue, Oct 27, 2020 at 10:40 AM Anton Khirnov wrote: > Hi, > Quoting Alan Kelly (2020

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-06 Thread Alan Kelly

? Thank you. On Sat, Oct 31, 2020 at 1:02 PM Carl Eugen Hoyos wrote: > Am Di., 27. Okt. 2020 um 09:56 Uhr schrieb Alan Kelly > : > > > --- /dev/null > > +++ b/libswscale/x86/yuv2yuvX.a

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-10 Thread Alan Kelly

--- yuv2yuvX.asm: Ports yuv2yuvX to asm, unrolls main loop and adds other small optimizations for ~20% speed-up. Copyright updated to include the original from swscale.c. swscale.c: Removes yuv2yuvX_sse3 and calls new function ff_yuv2yuvX_sse3. Calls yuv2yuvX_mmxext on remaining elements if r

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-12 Thread Alan Kelly

--- It now works on x86-32 libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 libswscale/x86/yuv2yuvX.asm | 110 3 files changed, 121 insertions(+), 65 deletions(-) create mode 100644 libswscale/x86/yuv2yuvX.asm

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-16 Thread Alan Kelly

--- Fixes bug in sse3 path where m1 is not set correctly resulting in off by one errors. The results are now bit by bit identical. libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 libswscale/x86/yuv2yuvX.asm | 114 ++

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-19 Thread Alan Kelly

--- All of Henrik's suggestions have been implemented. Additionally, m3 and m6 are permuted in avx2 before storing to ensure bit by bit identical results in avx2. libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 75 +++ libswscale/x86/yuv2yuvX.asm | 118 ++

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-01 Thread Alan Kelly

Ping On Thu, Nov 19, 2020 at 9:42 AM Alan Kelly wrote: > --- > All of Henrik's suggestions have been implemented. Additionally, > m3 and m6 are permuted in avx2 before storing to ensure bit by bit > identical results in avx2. > libswscale/x86/Makefile | 1 + > l

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-09 Thread Alan Kelly

--- Activates avx2 version of yuv2yuvX Adds checkasm for yuv2yuvX Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_* Replaces non-temporal stores with temporal stores libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2y

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-09 Thread Alan Kelly

good reason. If you think it better to use NT stores, I will replace them. On Fri, Dec 4, 2020 at 2:00 PM Anton Khirnov wrote: > Quoting Alan Kelly (2020-11-19 09:41:56) > > --- > > All of Henrik's suggestions have been implemented. Additionally, > > m3 and m6 are per

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-10 Thread Alan Kelly

--- Replaces ff_sws_init_swscale_x86 with ff_getSwsFunc. Load offset is not gprsize but 8 on both 32 and 64 bit. Removes sfence as NT store no longer used. libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2yuvX.asm | 117 +++

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-12-17 Thread Alan Kelly

--- Fixes memory alignment problem in checkasm-sw_scale Tested on Linux 32 and 64 bit and mingw32 libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2yuvX.asm | 117 tests/checkasm/sw_sca

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-05 Thread Alan Kelly

Ping! On Thu, Dec 17, 2020 at 11:42 AM Alan Kelly wrote: > --- > Fixes memory alignment problem in checkasm-sw_scale > Tested on Linux 32 and 64 bit and mingw32 > libswscale/x86/Makefile | 1 + > libswscale/x86/swscale.c| 106 +--- >

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-07 Thread Alan Kelly

--- Replaces mova with movdqu due to alignment issues libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c| 106 +--- libswscale/x86/yuv2yuvX.asm | 117 tests/checkasm/sw_scale.c | 98 ++

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-07 Thread Alan Kelly

Thanks for your patience with this, I have replaced mova with movdqu - movu generated a compile error on ssse3. What system did this crash on? On Wed, Jan 6, 2021 at 9:10 PM Michael Niedermayer wrote: > On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote: > > Ping! > >

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-11 Thread Alan Kelly

on a solution. On Sun, Jan 10, 2021 at 4:26 PM Michael Niedermayer wrote: > On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote: > > --- > > Replaces mova with movdqu due to alignment issues > > libswscale/x86/Makefile | 1 + > > l

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-11 Thread Alan Kelly

--- Fixes a bug where if there is no offset and a tail which is not processed by the sse3/avx2 version the dither is modified Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it to yuv2yuvX.asm to reduce code duplication and so that it may be used to process the tail from th

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-14 Thread Alan Kelly

32 so that the test catches problems with alignment. On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer wrote: > On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote: > > --- > > Fixes a bug where if there is no offset and a tail which is not > processed by the > >

[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-14 Thread Alan Kelly

--- Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro Tests for multiple sizes in checkasm-sw_scale checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned loads libswscale/x86/Makefile | 1 + libswscale/x86/swscale.c | 130 ---

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-12-14 Thread Alan Kelly

Patch has been rebased from latest commits. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112

Re: [FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-12-15 Thread Alan Kelly

On Tue, Dec 14, 2021 at 6:07 PM James Almer wrote: > On 12/14/2021 12:23 PM, Alan Kelly wrote: > > Patch has been rebased from latest commits. > > These functions replace all ff_hscale8to15_*_ssse3 when avx2 is > available. > > --- > > libswscale/swscale_inter

[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

2021-12-15 Thread Alan Kelly

Fixes so that fate under 64 bit Windows passes. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. --- libswscale/swscale_internal.h | 2 + libswscale/utils.c| 37 +++ libswscale/x86/Makefile | 1 + libswscale/x86/scale_avx2.asm | 112 +++

[FFmpeg-devel] [PATCH] x86/swscale: fix minor coding style issues

2021-12-16 Thread Alan Kelly

--- libswscale/x86/swscale.c | 14 +++--- tests/checkasm/sw_scale.c | 3 +-- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 164b06d6ba..c49a05c37b 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.

Re: [FFmpeg-devel] [PATCH] x86/swscale: fix minor coding style issues

2021-12-16 Thread Alan Kelly

Thanks Lynne for the patch. On Thu, Dec 16, 2021 at 5:05 PM Alan Kelly wrote: > --- > libswscale/x86/swscale.c | 14 +++--- > tests/checkasm/sw_scale.c | 3 +-- > 2 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/libswscale/x86/swscale.c b/libsws

[FFmpeg-devel] [PATCH] x86/scale_avx2: Change asm indent from 2 to 4 spaces.

2021-12-16 Thread Alan Kelly

--- libswscale/x86/scale_avx2.asm | 96 +-- 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 2cd7e968d3..eb472db12f 100644 --- a/libswscale/x86/scale_avx2.asm +++ b/libswscale/x86/sca

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly

This flag is set on Haswell and earlier and all AMD cpus. --- As discussed on IRC last week. libavutil/cpu.h | 57 +++-- libavutil/x86/cpu.c | 13 ++- 2 files changed, 41 insertions(+), 29 deletions(-) diff --git a/libavutil/cpu.h b/libavutil/c

[FFmpeg-devel] [PATCH 2/2] libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.

2021-12-20 Thread Alan Kelly

This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster. --- libswscale/utils.c| 2 +- libswscale/x86/swscale.c | 2 +- tests/checkasm/sw_scale.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/libswscale/utils.c b

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly

This flag is set on Haswell and earlier and all AMD cpus. --- Removes unnecessary indentation, clarifies comment and only sets flag on AMD cpus with AVX2. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 14 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/libavutil

[FFmpeg-devel] [PATCH 2/2] libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.

2021-12-20 Thread Alan Kelly

This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster. --- Whoops! Corrects check so that this flag is only enabled where fast avx2 and fast gathers are available. libswscale/utils.c| 2 +- libswscale/x86/swscale.c | 2 +- tests/chec
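The corrected check described in this message amounts to a bitmask test: the AVX2 hscale path is selected only when AVX2 is present and neither "slow" flag is set. The sketch below is hedged accordingly. The flag names and bit values are hypothetical placeholders rather than the real AV_CPU_FLAG_* constants, and `use_avx2_hscale` is an illustrative helper, not a libswscale function.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real AV_CPU_FLAG_* values live in libavutil/cpu.h. */
#define CPU_FLAG_AVX2        (1u << 0)
#define CPU_FLAG_AVX_SLOW    (1u << 1)
#define CPU_FLAG_SLOW_GATHER (1u << 2)

/* Select the AVX2 hscale path only when AVX2 is fast and gathers are fast. */
static bool use_avx2_hscale(unsigned cpu_flags)
{
    bool has_avx2    = (cpu_flags & CPU_FLAG_AVX2) != 0;
    bool avx_slow    = (cpu_flags & CPU_FLAG_AVX_SLOW) != 0;
    bool slow_gather = (cpu_flags & CPU_FLAG_SLOW_GATHER) != 0;

    return has_avx2 && !avx_slow && !slow_gather;
}
```

Testing for the slow-gather bit being set, rather than clear, would invert the selection, which appears to be the kind of slip the follow-up message corrects.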

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly

This flag is set on Haswell and earlier and all AMD cpus. --- Sets this flag on Zen 3 and earlier. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 14 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/libavutil/cpu.h b/libavutil/cpu.h index ae443eccad..ce9bf14bf7 100

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-20 Thread Alan Kelly

On Mon, Dec 20, 2021 at 3:53 PM James Almer wrote: > > > On 12/20/2021 11:47 AM, Lynne wrote: > > 20 Dec 2021, 15:43 by alankelly-at-google@ffmpeg.org: > > > >> This flag is set on Haswell and earlier and all AMD cpus. > >> --- > >> Removes unnecessary indentation, clarifies comment and onl

[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

2021-12-21 Thread Alan Kelly

This flag is set on Haswell and earlier and all AMD cpus. --- Checks for family for Haswell. All checks are done where AVX2 flag is set as this is clearer. libavutil/cpu.h | 1 + libavutil/x86/cpu.c | 15 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/libavut

[FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-01-10 Thread Alan Kelly

Make the code more readable, follow the style guide and propagate memory allocation errors. --- libswscale/swscale_internal.h | 2 +- libswscale/utils.c| 68 --- 2 files changed, 40 insertions(+), 30 deletions(-) diff --git a/libswscale/swscale_interna
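Propagating allocation errors usually means changing a void helper into one that returns a status code the caller checks. The following is a minimal sketch of that pattern, not the actual ff_shuffle_filter_coefficients: `shuffle_coefficients` and `init_scaler` are hypothetical names, the permutation is a placeholder (a simple reversal), and plain `malloc` with `-ENOMEM` stands in for FFmpeg's allocation helpers and error macros.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: permute a filter through a temporary copy.
 * Returns 0 on success or -ENOMEM if the temporary buffer cannot be allocated. */
static int shuffle_coefficients(int16_t *filter, size_t count)
{
    int16_t *tmp;

    if (count == 0)
        return 0; /* nothing to do, and avoids a zero-sized allocation */

    tmp = malloc(count * sizeof(*tmp));
    if (!tmp)
        return -ENOMEM;

    memcpy(tmp, filter, count * sizeof(*tmp));
    for (size_t i = 0; i < count; i++)
        filter[i] = tmp[count - 1 - i]; /* placeholder permutation: reverse */

    free(tmp);
    return 0;
}

/* Hypothetical caller: forwards the error instead of ignoring it. */
static int init_scaler(int16_t *filter, size_t count)
{
    int ret = shuffle_coefficients(filter, count);

    if (ret < 0)
        return ret;
    return 0;
}
```

Returning the error lets the initialisation path fail cleanly instead of continuing with a filter that was never shuffled.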

[FFmpeg-devel] [PATCH 2/4] libswscale: Avx2 hscale can process any input of size which is a multiple of 4.

2022-01-10 Thread Alan Kelly

The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd63
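The split between a 16-pixel main loop and a 4-pixel tail can be illustrated with a little arithmetic. This sketch assumes the width is a multiple of 4, as the subject line states; `block_plan` and `plan_blocks` are hypothetical names and do not appear in scale_avx2.asm.

```c
/* Number of iterations each loop would run for a given width. */
struct block_plan {
    int blocks16; /* iterations of the 16-pixel main loop */
    int blocks4;  /* iterations of the 4-pixel tail loop */
};

/* Returns 0 on success, -1 if width is not a positive multiple of 4. */
static int plan_blocks(int width, struct block_plan *plan)
{
    if (width <= 0 || width % 4 != 0)
        return -1;

    plan->blocks16 = width / 16;
    plan->blocks4  = (width % 16) / 4;
    return 0;
}
```

For example, a width of 44 gives two main-loop iterations (32 pixels) and three tail iterations (12 pixels).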

[FFmpeg-devel] [PATCH 3/4] libswscale: Enable hscale_avx2 for input sizes which are multiples of 4.

2022-01-10 Thread Alan Kelly

ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c | 17 +++-- libswscale/x86/swscale.c | 4 ++-- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 52f07e1661..7e1e9c3834 100644 --- a/l

[FFmpeg-devel] [PATCH 4/4] checkasm/sw_scale: hscale does not require cpuflag test.

2022-01-10 Thread Alan Kelly

This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 3c0a083b42..e7f916d3a8 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_

Re: [FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-02 Thread Alan Kelly

Hi, Is anybody interested in this patch set? Thanks! On Mon, Jan 10, 2022, 15:58 Alan Kelly wrote: > Make the code more readable, follow the style guide and propagate memory > allocation errors. > --- > libswscale/swscale_internal.h | 2 +- > libswscale/utils.c

[FFmpeg-devel] [PATCH 1/5] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-09 Thread Alan Kelly

Make the code more readable and follow the style guide. --- libswscale/utils.c | 64 +++--- 1 file changed, 37 insertions(+), 27 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index c5ea8853d5..1d919e863a 100644 --- a/libswscale/utils.c +

[FFmpeg-devel] [PATCH 2/5] libswscale: Avx2 hscale can process inputs of any size.

2022-02-09 Thread Alan Kelly

The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd63

[FFmpeg-devel] [PATCH 3/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-02-09 Thread Alan Kelly

ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c | 19 --- libswscale/x86/swscale.c | 6 ++ 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 1d919e863a..31c365fcee 100644 ---

[FFmpeg-devel] [PATCH 4/5] libswscale: Propagate error codes from ff_shuffle_filter_coefficients

2022-02-09 Thread Alan Kelly

--- libswscale/swscale_internal.h | 2 +- libswscale/utils.c| 14 -- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index 3a78d95ba6..26d28d42e6 100644 --- a/libswscale/swscale_internal.h +++ b/l

[FFmpeg-devel] [PATCH 5/5] checkasm/sw_scale: hscale does not require cpuflag test.

2022-02-09 Thread Alan Kelly

This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 3c0a083b42..e7f916d3a8 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_

Re: [FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-09 Thread Alan Kelly

:11 PM Michael Niedermayer wrote: > On Mon, Jan 10, 2022 at 03:58:33PM +0100, Alan Kelly wrote: > > Make the code more readable, follow the style guide and propagate memory > > allocation errors. > > Cosmetics and bugfixes should not be in the same patch > &

[FFmpeg-devel] [PATCH v2 1/5] libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients.

2022-02-17 Thread Alan Kelly

--- libswscale/swscale_internal.h | 2 +- libswscale/utils.c| 11 --- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index 3a78d95ba6..26d28d42e6 100644 --- a/libswscale/swscale_internal.h +++ b/libs

[FFmpeg-devel] [PATCH v2 2/5] libswscale: Re-factor ff_shuffle_filter_coefficients.

2022-02-17 Thread Alan Kelly

Make the code more readable and follow the style guide. --- libswscale/utils.c | 66 +- 1 file changed, 36 insertions(+), 30 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 344c87dfdf..7c8e1bbdde 100644 --- a/libswscale/utils.c +

[FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-02-17 Thread Alan Kelly

The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd63

[FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-02-17 Thread Alan Kelly

ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c | 19 --- libswscale/x86/swscale.c | 6 ++ 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 7c8e1bbdde..d818c9ce55 100644 ---

[FFmpeg-devel] [PATCH v2 5/5] checkasm/sw_scale: hscale does not require cpuflag test.

2022-02-17 Thread Alan Kelly

This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 3c0a083b42..4c57b6a372 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_sc

Re: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-03-07 Thread Alan Kelly

Hi Michael, Thanks for reviewing the first two parts of this patchset. Is there anybody interested in reviewing this part? Thanks, Alan On Thu, Feb 17, 2022 at 5:21 PM Michael Niedermayer wrote: > On Thu, Feb 17, 2022 at 11:04:04AM +0100, Alan Kelly wrote: > > The main loop process

[FFmpeg-devel] [PATCH 1/3] swscale/x86/swscale: Process yuv2yuvX tails using next largest register size

2023-07-14 Thread Alan Kelly

--- libswscale/x86/swscale.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index ff16398988..8c67bf4fab 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -194,7 +194,7 @@ static void yuv2yuvX_ #

[FFmpeg-devel] [PATCH 2/3] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-07-14 Thread Alan Kelly

--- libswscale/x86/swscale.c| 7 +++ libswscale/x86/yuv2yuvX.asm | 19 ++- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 8c67bf4fab..52423a1199 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale

[FFmpeg-devel] [PATCH 3/3] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-07-14 Thread Alan Kelly

--- libswscale/x86/swscale.c| 11 --- libswscale/x86/yuv2yuvX.asm | 12 ++-- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 52423a1199..71434f58d3 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x

Re: [FFmpeg-devel] [PATCH 2/3] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-07-17 Thread Alan Kelly

Happy to add the check. Thanks, Alan On Fri, Jul 14, 2023 at 4:59 PM James Almer wrote: > On 7/14/2023 11:57 AM, Kieran Kunhya wrote: > > On Fri, 14 Jul 2023 at 14:03, James Almer wrote: > > > >> On 7/14/2023 9:59 AM, Kieran Kunhya wrote: > +#if ARCH_X86_64 && HAVE_AVX512_EXTERNAL >

[FFmpeg-devel] [PATCH 2/3] swscale/x86/yuv2yuvX: Add yuv2yuvX avx512

2023-07-17 Thread Alan Kelly

--- Checks for EXTERNAL_AVX512ICL to prevent downclocking on Skylake libswscale/x86/swscale.c| 7 +++ libswscale/x86/yuv2yuvX.asm | 19 ++- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 8c67bf4fab.

Re: [FFmpeg-devel] [PATCH 3/3] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-07-17 Thread Alan Kelly

On Sat, Jul 15, 2023 at 10:40 PM Michael Niedermayer wrote: > On Fri, Jul 14, 2023 at 12:08:46PM +0200, Alan Kelly wrote: > > --- > > libswscale/x86/swscale.c| 11 --- > > libswscale/x86/yuv2yuvX.asm | 12 ++-- > > 2 files changed, 14 insertions(+

[FFmpeg-devel] [PATCH 3/3] swscale/x86/yuv2yuvX: Process tails by jumping back into the main loop.

2023-07-17 Thread Alan Kelly

--- libswscale/x86/swscale.c| 11 --- libswscale/x86/yuv2yuvX.asm | 24 ++-- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 600c7d6c91..6980002e9e 100644 --- a/libswscale/x86/swscale.c +++ b

Re: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.

2022-07-13 Thread Alan Kelly

Hi, Are there any further comments on this patch or can it be committed? Thanks, Alan On Tue, Apr 26, 2022 at 10:00 AM Alan Kelly wrote: > The main loop processes blocks of 16 pixels. The tail processes blocks > of size 4. > --- > libswscale/x86/scale_a

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-13 Thread Alan Kelly

Pushing this back up to the top. This is required to enable the previous patch in this chain. Thanks On Fri, Apr 22, 2022 at 10:04 AM Alan Kelly wrote: > Ping! > > On Thu, Feb 17, 2022 at 11:04 AM Alan Kelly wrote: > >> ff_shuffle_filter_coefficients shuffles th

[FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-15 Thread Alan Kelly

ff_shuffle_filter_coefficients shuffles the tail as required. --- libswscale/utils.c| 19 --- libswscale/x86/swscale.c | 6 ++ tests/checkasm/sw_scale.c | 2 +- 3 files changed, 19 insertions(+), 8 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c in

[FFmpeg-devel] [PATCH v2 5/5] checkasm/sw_scale: hscale does not requires cpuflag test.

2022-07-15 Thread Alan Kelly

This is done in ff_shuffle_filter_coefficients. --- tests/checkasm/sw_scale.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index 798990a6cf..7be107bef1 100644 --- a/tests/checkasm/sw_scale.c +++ b/tests/checkasm/sw_sc

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-15 Thread Alan Kelly

Hi Michael, Thanks for looking at this. I fixed the test issue. Alan On Fri, Jul 15, 2022 at 4:59 PM Alan Kelly wrote: > ff_shuffle_filter_coefficients shuffles the tail as required. > --- > libswscale/utils.c| 19 --- > libswscale/x86/swscale.c | 6 ++-

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-07-18 Thread Alan Kelly

Sat, Jul 16, 2022 at 1:14 PM Michael Niedermayer wrote: > On Fri, Jul 15, 2022 at 05:03:56PM +0200, Alan Kelly wrote: > > Hi Michael, > > > > Thanks for looking at this. I fixed the test issue. > > seems to be still failing here: > make distclean ; ./configure &

Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.

2022-08-15 Thread Alan Kelly

Hi Michael, Is there anything blocking this change being applied? Is there anything I can do to help? Thanks, Alan On Mon, Jul 18, 2022 at 6:49 PM Michael Niedermayer wrote: > On Mon, Jul 18, 2022 at 09:54:39AM +0200, Alan Kelly wrote: > > Hi Michael, > > > > I have t

[FFmpeg-devel] [PATCH] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

2022-08-17 Thread Alan Kelly

--- libswscale/x86/swscale.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 32d441245d..881a4b7798 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -211,7 +211,7 @@ static void yuv2yuvX_ ##opt(con

[FFmpeg-devel] [PATCH] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

2022-08-17 Thread Alan Kelly

--- Call yuv2yuvX_mmxext on line 208 also. libswscale/x86/swscale.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 32d441245d..e0f90d5c58 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -205

[FFmpeg-devel] [PATCH] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

2022-08-17 Thread Alan Kelly

--- Remove yuv2yuvX_mmx as it is no longer used. libswscale/x86/swscale.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index 32d441245d..89ef9f5d2b 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.

Re: [FFmpeg-devel] [PATCH v2] checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX

2022-08-18 Thread Alan Kelly

Thanks Martin for doing this. On Thu, Aug 18, 2022 at 10:16 AM Martin Storsjö wrote: > This avoids triggering overflows in the filters, and avoids stray > test failures in the approximate functions on x86; due to rounding > differences, one implementation might overflow while another one > doesn

[FFmpeg-devel] [PATCH] sws: Don't compile yuv2yuvX for mmx

2022-08-19 Thread Alan Kelly

--- libswscale/x86/yuv2yuvX.asm | 2 -- 1 file changed, 2 deletions(-) diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm index b6294cb919..d5b03495fd 100644 --- a/libswscale/x86/yuv2yuvX.asm +++ b/libswscale/x86/yuv2yuvX.asm @@ -124,8 +124,6 @@ cglobal yuv2yuvX, 7, 7, 8, filt
