Ping!
On Thu, Jan 14, 2021 at 3:47 PM Alan Kelly wrote:
> ---
> Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro
> Tests for multiple sizes in checkasm-sw_scale
> checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned
> loads
> libsw
Looks like there are no comments, is this OK to be applied? Thanks
On Tue, Feb 9, 2021 at 6:25 PM Paul B Mahol wrote:
> Will apply if no comments.
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-
Initialises each item in src and filter arrays to fix valgrind
uninitialised value warning.
---
tests/checkasm/sw_scale.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 7504f8b45f..a4866723d7 100644
--- a/tests/
Checks av_mallocs
---
tests/checkasm/sw_scale.c | 4
1 file changed, 4 insertions(+)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a4866723d7..ef414c0a82 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -103,7 +103,11 @@ static void check_y
b94cd55155d8c061f1e1faca9076afe540149c27 as the problematic
commit.
On Thu, Feb 18, 2021 at 11:23 PM James Almer wrote:
> On 2/17/2021 5:24 PM, Paul B Mahol wrote:
> > On Tue, Feb 16, 2021 at 6:31 PM Alan Kelly <
> > alankelly-at-google@ffmpeg.org> wrote:
> >
> >> Looks like there are n
Initialises each item in src and filter arrays to fix valgrind
uninitialised value warning.
---
casts pointers to uint8_t* and multiplies the buffer size by sizeof(uint16_t).
tests/checkasm/sw_scale.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/sw_scale.
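The note above about multiplying the buffer size by sizeof(uint16_t) can be illustrated with a small sketch. The snippet does not show which call takes the size, so this assumes a memcmp-style byte comparison; buffers_equal_u16 is a hypothetical helper and not a checkasm function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compares two uint16_t buffers of `count` elements.
 * memcmp works on bytes, so the element count is scaled by
 * sizeof(uint16_t). Passing `count` directly would compare only the
 * first half of each buffer. */
static int buffers_equal_u16(const uint16_t *a, const uint16_t *b, size_t count)
{
    assert(a && b);
    return memcmp((const uint8_t *)a, (const uint8_t *)b,
                  count * sizeof(uint16_t)) == 0;
}
```

With an unscaled size, a mismatch in the second half of the buffers would go unnoticed, which is the kind of error the byte-count fix guards against.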
---
libswscale/x86/swscale.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 1e865914cb..71961a9ae0 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -206,7 +206,8 @@ static void yuv2yuvX_ ##o
---
tests/checkasm/sw_scale.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a10118704b..3ac0f9082f 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -68,8 +68,8 @@ static void check_yuv2
---
This is so that tails of size 8 may safely be processed
libswscale/x86/yuv2yuvX.asm | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index 521880dabe..b6294cb919 100644
--- a/libswscale/x86/yuv2yuvX.as
---
libswscale/x86/swscale.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index cc9e8b0155..0848a31461 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -197,7 +197,8 @@ static void yuv2yuvX_ ##o
---
tests/checkasm/sw_scale.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a10118704b..3ac0f9082f 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -68,8 +68,8 @@ static void check_yuv2
---
This is so that inputs of size 8 are supported, as was the case with
the original implementation. A bug was found with inputs not divisible
by 16.
libswscale/x86/yuv2yuvX.asm | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/yuv2yuvX.asm b/lib
Broadwell and later have fast gather instructions.
---
This is so that the avx2 version of ff_hscale8to15X which uses gather
instructions is only selected on machines where it will actually be
faster.
libavutil/cpu.c | 6 ++
libavutil/cpu.h | 6 ++
libavutil/cpu_inte
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c | 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112 ++
libswscale/x86/swsca
je wrote:
> > Hi Alan,
> >
> > On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly <
> > alankelly-at-google@ffmpeg.org> wrote:
> >
> >> Broadwell and later have fast gather instructions.
> >> ---
> >> This is so that the avx2 version of ff
Broadwell and later and Zen3 and later have fast gather instructions.
---
Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3.
libavutil/cpu.h | 2 ++
libavutil/x86/cpu.c | 18 --
libav
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c | 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112 ++
libswscale/x86/swsca
On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
> > and 2 to 5 on Skylake a
On Fri, Jun 25, 2021 at 1:26 PM Ronald S. Bultje wrote:
> Hi Alan,
>
> On Fri, Jun 25, 2021 at 3:59 AM Alan Kelly <
> alankelly-at-google@ffmpeg.org> wrote:
>
>> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
>>
>
> Re-asking
On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote:
> On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
>
>> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>>
>> > Broadwell and later and Zen3 and later have fast gather instructions.
>> > ---
>>
Broadwell and later and Zen3 and later have fast gather instructions.
---
Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
email thread.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 11 ++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/libavutil/c
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as
discussed in the email thread for part 1 of this patch.
Benchmark results on Skylake and Haswell:
Skylake Haswell
h
On Fri, Jul 16, 2021 at 4:02 PM James Almer wrote:
> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> > email thre
On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote:
> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
> ---
> EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as
> discussed in the email thread for part 1 of this patch.
>
> Benchmark
On Wed, Jul 21, 2021 at 11:11 AM Alan Kelly wrote:
>
>
> On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote:
>
>> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
>> ---
>> EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHE
---
libswscale/x86/swscale.c | 138 ---
1 file changed, 72 insertions(+), 66 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 3160fedf04..e47fee2bbd 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -201,
Other functions to be ported to avx2 have been identified and are on
the todo list.
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 72 +++--
libswscale/x86/yuv2yuvX.asm | 105
3 files changed, 112 insertions(+), 66 d
Fixed. The wrong step size was used, causing a write past the end of
the buffer. yuv2yuvX_mmxext is now called if there are any remaining pixels.
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 75 --
libswscale/x86/yuv2yuvX.asm | 105
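The fix described above (a step size matching the block size, plus a fallback for leftover pixels) follows a common pattern that can be sketched in C. This is only an illustration under assumptions: BLOCK, add_one_block and add_one_scalar are hypothetical stand-ins for the SIMD kernel and the mmxext tail path, and the per-pixel operation is a placeholder, not the yuv2yuvX filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* hypothetical number of pixels handled per wide iteration */

/* Hypothetical wide kernel: always touches exactly BLOCK pixels. */
static void add_one_block(const uint8_t *src, uint8_t *dst)
{
    for (size_t i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}

/* Hypothetical narrow fallback for the remaining pixels. */
static void add_one_scalar(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}

static void add_one(const uint8_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;

    assert(src && dst);
    /* The loop advances by the same amount the kernel writes, and only
     * runs while a full block still fits, so it never writes past n. */
    for (; i + BLOCK <= n; i += BLOCK)
        add_one_block(src + i, dst + i);
    /* Any pixels left over go through the narrow path. */
    if (i < n)
        add_one_scalar(src + i, dst + i, n - i);
}
```

A guard byte placed after the destination buffer is a simple way to check that neither path writes beyond the requested length.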
pmulhw m5, m0, [srcq + offsetq * 2 + 3 * mmsize]
+paddw m6, m6, m2
+paddw m1, m1, m5
+add rsiq, $10
+mov srcq, [rsiq]
+test srcd, srcd
+jnz
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 75 --
libswscale/x86/yuv2yuvX.asm | 105
3 files changed, 116 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
diff --git a/libswscal
Thanks for the review, I have made the required changes. As I have changed
the subject the patch is in a new thread.
On Fri, Oct 23, 2020 at 4:10 PM James Almer wrote:
> On 10/23/2020 10:17 AM, Alan Kelly wrote:
> > Fixed. The wrong step size was used, causing a write past the end of
probably due to cpu
frequency scaling.
checkasm will follow in a separate patch.
On Tue, Oct 27, 2020 at 9:56 AM Alan Kelly wrote:
> ---
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c | 75 --
> libswscale/x86/yuv2yu
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 75 -
libswscale/x86/yuv2yuvX.asm | 109
3 files changed, 120 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
diff --git a/libswscale
Thanks for the feedback Anton.
The second patch incorporates changes suggested by James Almer:
avx2 instructions are wrapped in if cpuflag(avx2) and movddup restored
mm1 is replaced by m1 on x86_32
On Tue, Oct 27, 2020 at 10:40 AM Anton Khirnov wrote:
> Hi,
> Quoting Alan Kelly (2020
? Thank you.
On Sat, Oct 31, 2020 at 1:02 PM Carl Eugen Hoyos wrote:
> Am Di., 27. Okt. 2020 um 09:56 Uhr schrieb Alan Kelly
> :
>
> > --- /dev/null
> > +++ b/libswscale/x86/yuv2yuvX.a
---
yuv2yuvX.asm: Ports yuv2yuvX to asm, unrolls main loop and adds
other small optimizations for ~20% speed-up. Copyright updated to
include the original from swscale.c
swscale.c: Removes yuv2yuvX_sse3 and calls new function ff_yuv2yuvX_sse3.
Calls yuv2yuvX_mmxext on remaining elements if r
---
It now works on x86-32
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 75
libswscale/x86/yuv2yuvX.asm | 110
3 files changed, 121 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
---
Fixes bug in sse3 path where m1 is not set correctly resulting in off
by one errors. The results are now bit by bit identical.
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 75
libswscale/x86/yuv2yuvX.asm | 114 ++
---
All of Henrik's suggestions have been implemented. Additionally,
m3 and m6 are permuted in avx2 before storing to ensure bit by bit
identical results in avx2.
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 75 +++
libswscale/x86/yuv2yuvX.asm | 118 ++
Ping
On Thu, Nov 19, 2020 at 9:42 AM Alan Kelly wrote:
> ---
> All of Henrik's suggestions have been implemented. Additionally,
> m3 and m6 are permuted in avx2 before storing to ensure bit by bit
> identical results in avx2.
> libswscale/x86/Makefile | 1 +
> l
---
Activates avx2 version of yuv2yuvX
Adds checkasm for yuv2yuvX
Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_*
Replaces non-temporal stores with temporal stores
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 106 +---
libswscale/x86/yuv2y
good reason. If you think it better to use NT stores, I
will replace them.
On Fri, Dec 4, 2020 at 2:00 PM Anton Khirnov wrote:
> Quoting Alan Kelly (2020-11-19 09:41:56)
> > ---
> > All of Henrik's suggestions have been implemented. Additionally,
> > m3 and m6 are per
---
Replaces ff_sws_init_swscale_x86 with ff_getSwsFunc
Load offset is not gprsize but 8 on both 32 and 64 bit
Removes sfence as NT store no longer used
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 106 +---
libswscale/x86/yuv2yuvX.asm | 117 +++
---
Fixes memory alignment problem in checkasm-sw_scale
Tested on Linux 32 and 64 bit and mingw32
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 106 +---
libswscale/x86/yuv2yuvX.asm | 117
tests/checkasm/sw_sca
Ping!
On Thu, Dec 17, 2020 at 11:42 AM Alan Kelly wrote:
> ---
> Fixes memory alignment problem in checkasm-sw_scale
> Tested on Linux 32 and 64 bit and mingw32
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c | 106 +---
> >
---
Replaces mova with movdqu due to alignment issues
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 106 +---
libswscale/x86/yuv2yuvX.asm | 117
tests/checkasm/sw_scale.c | 98 ++
Thanks for your patience with this, I have replaced mova with movdqu - movu
generated a compile error on ssse3. What system did this crash on?
On Wed, Jan 6, 2021 at 9:10 PM Michael Niedermayer
wrote:
> On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> > Ping!
>
>
on a
solution.
On Sun, Jan 10, 2021 at 4:26 PM Michael Niedermayer
wrote:
> On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote:
> > ---
> > Replaces mova with movdqu due to alignment issues
> > libswscale/x86/Makefile | 1 +
> > l
---
Fixes a bug where if there is no offset and a tail which is not processed by
the
sse3/avx2 version the dither is modified
Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
to yuv2yuvX.asm to reduce code duplication and so that it may be used
to process the tail from th
32 so that
the test catches problems with alignment.
On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer
wrote:
> On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> > ---
> > Fixes a bug where if there is no offset and a tail which is not
> processed by the
> >
---
Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro
Tests for multiple sizes in checkasm-sw_scale
checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned loads
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 130 ---
Patch has been rebased from latest commits.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c | 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112
On Tue, Dec 14, 2021 at 6:07 PM James Almer wrote:
> On 12/14/2021 12:23 PM, Alan Kelly wrote:
> > Patch has been rebased from latest commits.
> > These functions replace all ff_hscale8to15_*_ssse3 when avx2 is
> available.
> > ---
> > libswscale/swscale_inter
Fixes so that fate under 64 bit Windows passes.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c | 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112 +++
---
libswscale/x86/swscale.c | 14 +++---
tests/checkasm/sw_scale.c | 3 +--
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 164b06d6ba..c49a05c37b 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.
Thanks Lynne for the patch.
On Thu, Dec 16, 2021 at 5:05 PM Alan Kelly wrote:
> ---
> libswscale/x86/swscale.c | 14 +++---
> tests/checkasm/sw_scale.c | 3 +--
> 2 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/libswscale/x86/swscale.c b/libsws
---
libswscale/x86/scale_avx2.asm | 96 +--
1 file changed, 48 insertions(+), 48 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 2cd7e968d3..eb472db12f 100644
--- a/libswscale/x86/scale_avx2.asm
+++ b/libswscale/x86/sca
This flag is set on Haswell and earlier and all AMD cpus.
---
As discussed on IRC last week.
libavutil/cpu.h | 57 +++--
libavutil/x86/cpu.c | 13 ++-
2 files changed, 41 insertions(+), 29 deletions(-)
diff --git a/libavutil/cpu.h b/libavutil/c
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
---
libswscale/utils.c| 2 +-
libswscale/x86/swscale.c | 2 +-
tests/checkasm/sw_scale.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/libswscale/utils.c b
This flag is set on Haswell and earlier and all AMD cpus.
---
Removes unnecessary indentation, clarifies comment and only sets flag on AMD
cpus with AVX2.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 14 +-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/libavutil
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
---
Whoops! Corrects check so that this flag is only enabled where fast
avx2 and fast gathers are available.
libswscale/utils.c| 2 +-
libswscale/x86/swscale.c | 2 +-
tests/chec
This flag is set on Haswell and earlier and all AMD cpus.
---
Sets this flag on Zen 3 and earlier.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 14 +-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index ae443eccad..ce9bf14bf7 100
On Mon, Dec 20, 2021 at 3:53 PM James Almer wrote:
>
>
> On 12/20/2021 11:47 AM, Lynne wrote:
> > 20 Dec 2021, 15:43 by alankelly-at-google@ffmpeg.org:
> >
> >> This flag is set on Haswell and earlier and all AMD cpus.
> >> ---
> >> Removes unnecessary indentation, clarifies comment and onl
This flag is set on Haswell and earlier and all AMD cpus.
---
Checks for family for Haswell. All checks are done where AVX2 flag is
set as this is clearer.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 15 ++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/libavut
Make the code more readable, follow the style guide and propagate memory
allocation errors.
---
libswscale/swscale_internal.h | 2 +-
libswscale/utils.c | 68 ---
2 files changed, 40 insertions(+), 30 deletions(-)
diff --git a/libswscale/swscale_interna
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 48 +--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd63
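The commit message above describes a main loop over blocks of 16 pixels and a tail over blocks of 4. A minimal sketch of how a width splits between the two, assuming (this is not stated in the snippet) that the width is a multiple of 4; BlockSplit and split_width are hypothetical names and do not appear in the patch:

```c
#include <assert.h>

/* Hypothetical description of how a row is divided between the loops. */
typedef struct {
    int main_blocks; /* iterations of the 16-pixel main loop */
    int tail_blocks; /* iterations of the 4-pixel tail loop  */
} BlockSplit;

static BlockSplit split_width(int dstW)
{
    BlockSplit s;

    /* Assumption for this sketch: the width is a multiple of 4. */
    assert(dstW >= 0 && dstW % 4 == 0);
    s.main_blocks = dstW / 16;
    s.tail_blocks = (dstW % 16) / 4;
    return s;
}
```

For example, a width of 40 pixels would give two main-loop iterations (32 pixels) and two tail iterations (8 pixels).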
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 17 +++--
libswscale/x86/swscale.c | 4 ++--
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 52f07e1661..7e1e9c3834 100644
--- a/l
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..e7f916d3a8 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_
Hi,
Is anybody interested in this patch set?
Thanks!
On Mon, Jan 10, 2022, 15:58 Alan Kelly wrote:
> Make the code more readable, follow the style guide and propagate memory
> allocation errors.
> ---
> libswscale/swscale_internal.h | 2 +-
> libswscale/utils.c
Make the code more readable and follow the style guide.
---
libswscale/utils.c | 64 +++---
1 file changed, 37 insertions(+), 27 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index c5ea8853d5..1d919e863a 100644
--- a/libswscale/utils.c
+
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 48 +--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd63
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 19 ---
libswscale/x86/swscale.c | 6 ++
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 1d919e863a..31c365fcee 100644
---
---
libswscale/swscale_internal.h | 2 +-
libswscale/utils.c | 14 --
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 3a78d95ba6..26d28d42e6 100644
--- a/libswscale/swscale_internal.h
+++ b/l
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..e7f916d3a8 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_
:11 PM Michael Niedermayer
wrote:
> On Mon, Jan 10, 2022 at 03:58:33PM +0100, Alan Kelly wrote:
> > Make the code more readable, follow the style guide and propagate memory
> > allocation errors.
>
> Cosmetics and bugfixes should not be in the same patch
>
> >
---
libswscale/swscale_internal.h | 2 +-
libswscale/utils.c | 11 ---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 3a78d95ba6..26d28d42e6 100644
--- a/libswscale/swscale_internal.h
+++ b/libs
Make the code more readable and follow the style guide.
---
libswscale/utils.c | 66 +-
1 file changed, 36 insertions(+), 30 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 344c87dfdf..7c8e1bbdde 100644
--- a/libswscale/utils.c
+
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 48 +--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd63
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 19 ---
libswscale/x86/swscale.c | 6 ++
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 7c8e1bbdde..d818c9ce55 100644
---
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..4c57b6a372 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_sc
Hi Michael,
Thanks for reviewing the first two parts of this patchset.
Is there anybody interested in reviewing this part?
Thanks,
Alan
On Thu, Feb 17, 2022 at 5:21 PM Michael Niedermayer
wrote:
> On Thu, Feb 17, 2022 at 11:04:04AM +0100, Alan Kelly wrote:
> > The main loop process
---
libswscale/x86/swscale.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index ff16398988..8c67bf4fab 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -194,7 +194,7 @@ static void yuv2yuvX_ #
---
libswscale/x86/swscale.c | 7 +++
libswscale/x86/yuv2yuvX.asm | 19 ++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 8c67bf4fab..52423a1199 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale
---
libswscale/x86/swscale.c | 11 ---
libswscale/x86/yuv2yuvX.asm | 12 ++--
2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 52423a1199..71434f58d3 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x
Happy to add the check.
Thanks,
Alan
On Fri, Jul 14, 2023 at 4:59 PM James Almer wrote:
> On 7/14/2023 11:57 AM, Kieran Kunhya wrote:
> > On Fri, 14 Jul 2023 at 14:03, James Almer wrote:
> >
> >> On 7/14/2023 9:59 AM, Kieran Kunhya wrote:
> +#if ARCH_X86_64 && HAVE_AVX512_EXTERNAL
>
---
Checks for EXTERNAL_AVX512ICL to prevent downclocking on Skylake
libswscale/x86/swscale.c | 7 +++
libswscale/x86/yuv2yuvX.asm | 19 ++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 8c67bf4fab.
On Sat, Jul 15, 2023 at 10:40 PM Michael Niedermayer
wrote:
> On Fri, Jul 14, 2023 at 12:08:46PM +0200, Alan Kelly wrote:
> > ---
> > libswscale/x86/swscale.c | 11 ---
> > libswscale/x86/yuv2yuvX.asm | 12 ++--
> > 2 files changed, 14 insertions(+
---
libswscale/x86/swscale.c | 11 ---
libswscale/x86/yuv2yuvX.asm | 24 ++--
2 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 600c7d6c91..6980002e9e 100644
--- a/libswscale/x86/swscale.c
+++ b
Hi,
Are there any further comments on this patch or can it be committed?
Thanks,
Alan
On Tue, Apr 26, 2022 at 10:00 AM Alan Kelly wrote:
> The main loop processes blocks of 16 pixels. The tail processes blocks
> of size 4.
> ---
> libswscale/x86/scale_a
Pushing this back up to the top. This is required to enable the previous
patch in this chain. Thanks
On Fri, Apr 22, 2022 at 10:04 AM Alan Kelly wrote:
> Ping!
>
> On Thu, Feb 17, 2022 at 11:04 AM Alan Kelly wrote:
>
>> ff_shuffle_filter_coefficients shuffles th
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 19 ---
libswscale/x86/swscale.c | 6 ++
tests/checkasm/sw_scale.c | 2 +-
3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
in
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 798990a6cf..7be107bef1 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_sc
Hi Michael,
Thanks for looking at this. I fixed the test issue.
Alan
On Fri, Jul 15, 2022 at 4:59 PM Alan Kelly wrote:
> ff_shuffle_filter_coefficients shuffles the tail as required.
> ---
> libswscale/utils.c | 19 ---
> libswscale/x86/swscale.c | 6 ++-
Sat, Jul 16, 2022 at 1:14 PM Michael Niedermayer
wrote:
> On Fri, Jul 15, 2022 at 05:03:56PM +0200, Alan Kelly wrote:
> > Hi Michael,
> >
> > Thanks for looking at this. I fixed the test issue.
>
> seems to be still failing here:
> make distclean ; ./configure &
Hi Michael,
Is there anything blocking this change being applied? Is there anything I
can do to help?
Thanks,
Alan
On Mon, Jul 18, 2022 at 6:49 PM Michael Niedermayer
wrote:
> On Mon, Jul 18, 2022 at 09:54:39AM +0200, Alan Kelly wrote:
> > Hi Michael,
> >
> > I have t
---
libswscale/x86/swscale.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..881a4b7798 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -211,7 +211,7 @@ static void yuv2yuvX_ ##opt(con
---
Call yuv2yuvX_mmxext on line 208 also.
libswscale/x86/swscale.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..e0f90d5c58 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -205
---
Remove yuv2yuvX_mmx as it is no longer used.
libswscale/x86/swscale.c | 7 ++-
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..89ef9f5d2b 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.
Thanks Martin for doing this.
On Thu, Aug 18, 2022 at 10:16 AM Martin Storsjö wrote:
> This avoids triggering overflows in the filters, and avoids stray
> test failures in the approximate functions on x86; due to rounding
> differences, one implementation might overflow while another one
> doesn
---
libswscale/x86/yuv2yuvX.asm | 2 --
1 file changed, 2 deletions(-)
diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index b6294cb919..d5b03495fd 100644
--- a/libswscale/x86/yuv2yuvX.asm
+++ b/libswscale/x86/yuv2yuvX.asm
@@ -124,8 +124,6 @@ cglobal yuv2yuvX, 7, 7, 8, filt