On Tue, Sep 5, 2023 at 12:03 AM Michael Niedermayer
wrote:
> On Mon, Sep 04, 2023 at 02:30:00PM +0200, Alan Kelly via ffmpeg-devel
> wrote:
> > Hi,
> >
> > Any issues with this patch or can it be merged?
>
> are all cases covered by tests ?
> if yes and the te
---
libswscale/x86/swscale.c| 19 ---
libswscale/x86/yuv2yuvX.asm | 24 ++--
2 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 00e42b4bec..6980002e9e 100644
--- a/libswscale/x86/swscale
---
libswscale/x86/swscale.c| 7 +++
libswscale/x86/yuv2yuvX.asm | 19 ++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index ff16398988..00e42b4bec 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale
Hi,
Any issues with this patch or can it be merged?
Thanks,
Alan
On Fri, Jul 14, 2023 at 12:08 PM Alan Kelly wrote:
> ---
> libswscale/x86/swscale.c | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/libswscale/x86/swscale.c b/libswscale/x86/sw
---
libswscale/x86/swscale.c| 11 ---
libswscale/x86/yuv2yuvX.asm | 24 ++--
2 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 600c7d6c91..6980002e9e 100644
--- a/libswscale/x86/swscale.c
+++ b
On Sat, Jul 15, 2023 at 10:40 PM Michael Niedermayer
wrote:
> On Fri, Jul 14, 2023 at 12:08:46PM +0200, Alan Kelly wrote:
> > ---
> > libswscale/x86/swscale.c| 11 ---
> > libswscale/x86/yuv2yuvX.asm | 12 ++--
> > 2 files changed, 14 insertions(+
---
Checks for EXTERNAL_AVX512ICL to prevent downclocking on Skylake
libswscale/x86/swscale.c| 7 +++
libswscale/x86/yuv2yuvX.asm | 19 ++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 8c67bf4fab.
Happy to add the check.
Thanks,
Alan
On Fri, Jul 14, 2023 at 4:59 PM James Almer wrote:
> On 7/14/2023 11:57 AM, Kieran Kunhya wrote:
> > On Fri, 14 Jul 2023 at 14:03, James Almer wrote:
> >
> >> On 7/14/2023 9:59 AM, Kieran Kunhya wrote:
> +#if ARCH_X86_64 && HAVE_AVX512_EXTERNAL
>
---
libswscale/x86/swscale.c| 11 ---
libswscale/x86/yuv2yuvX.asm | 12 ++--
2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 52423a1199..71434f58d3 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x
---
libswscale/x86/swscale.c| 7 +++
libswscale/x86/yuv2yuvX.asm | 19 ++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 8c67bf4fab..52423a1199 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale
---
libswscale/x86/swscale.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index ff16398988..8c67bf4fab 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -194,7 +194,7 @@ static void yuv2yuvX_ #
Thanks for doing this!
On Fri, Aug 19, 2022 at 10:53 AM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:
> Alan Kelly:
> > ---
> > libswscale/x86/yuv2yuvX.asm | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/libswscale/x86/yuv
---
libswscale/x86/yuv2yuvX.asm | 2 --
1 file changed, 2 deletions(-)
diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index b6294cb919..d5b03495fd 100644
--- a/libswscale/x86/yuv2yuvX.asm
+++ b/libswscale/x86/yuv2yuvX.asm
@@ -124,8 +124,6 @@ cglobal yuv2yuvX, 7, 7, 8, filt
Thanks Martin for doing this.
On Thu, Aug 18, 2022 at 10:16 AM Martin Storsjö wrote:
> This avoids triggering overflows in the filters, and avoids stray
> test failures in the approximate functions on x86; due to rounding
> differences, one implementation might overflow while another one
> doesn
---
Remove yuv2yuvX_mmx as it is no longer used.
libswscale/x86/swscale.c | 7 ++-
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..89ef9f5d2b 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.
---
Call yuv2yuvX_mmxext on line 208 also.
libswscale/x86/swscale.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..e0f90d5c58 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -205
---
libswscale/x86/swscale.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..881a4b7798 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -211,7 +211,7 @@ static void yuv2yuvX_ ##opt(con
Hi Michael,
Is there anything blocking this change being applied? Is there anything I
can do to help?
Thanks,
Alan
On Mon, Jul 18, 2022 at 6:49 PM Michael Niedermayer
wrote:
> On Mon, Jul 18, 2022 at 09:54:39AM +0200, Alan Kelly wrote:
> > Hi Michael,
> >
> > I have t
Sat, Jul 16, 2022 at 1:14 PM Michael Niedermayer
wrote:
> On Fri, Jul 15, 2022 at 05:03:56PM +0200, Alan Kelly wrote:
> > Hi Michael,
> >
> > Thanks for looking at this. I fixed the test issue.
>
> seems to be still failing here:
> make distclean ; ./configure &
Hi Michael,
Thanks for looking at this. I fixed the test issue.
Alan
On Fri, Jul 15, 2022 at 4:59 PM Alan Kelly wrote:
> ff_shuffle_filter_coefficients shuffles the tail as required.
> ---
> libswscale/utils.c| 19 ---
> libswscale/x86/swscale.c | 6 ++-
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 798990a6cf..7be107bef1 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_sc
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c| 19 ---
libswscale/x86/swscale.c | 6 ++
tests/checkasm/sw_scale.c | 2 +-
3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
in
Pushing this back up to the top. This is required to enable the previous
patch in this chain. Thanks
On Fri, Apr 22, 2022 at 10:04 AM Alan Kelly wrote:
> Ping!
>
> On Thu, Feb 17, 2022 at 11:04 AM Alan Kelly wrote:
>
>> ff_shuffle_filter_coefficients shuffles th
Hi,
Are there any further comments on this patch or can it be committed?
Thanks,
Alan
On Tue, Apr 26, 2022 at 10:00 AM Alan Kelly wrote:
> The main loop processes blocks of 16 pixels. The tail processes blocks
> of size 4.
> ---
> libswscale/x86/scale_a
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 44 ++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd633
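The loop structure described in this patch (a main loop over blocks of 16 pixels, followed by a tail over blocks of 4) can be sketched in plain C. This is only an illustration of the control flow, not the AVX2 assembly from the patch: `process_row` and `halve` are hypothetical names, the per-pixel operation is a stand-in for the real filter, and the sketch assumes the width is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel operation standing in for the real filter. */
static inline uint8_t halve(uint8_t v)
{
    return (uint8_t)(v >> 1);
}

/* Processes `width` pixels: full blocks of 16 first, then blocks of 4.
 * Assumption of this sketch: width is a multiple of 4, so nothing is left
 * over after the tail loop. */
static void process_row(uint8_t *dst, const uint8_t *src, size_t width)
{
    size_t i = 0;

    /* Main loop: one iteration per block of 16 pixels. */
    for (; i + 16 <= width; i += 16)
        for (size_t j = 0; j < 16; j++)
            dst[i + j] = halve(src[i + j]);

    /* Tail: remaining pixels in blocks of 4. */
    for (; i + 4 <= width; i += 4)
        for (size_t j = 0; j < 4; j++)
            dst[i + j] = halve(src[i + j]);
}
```

With a width of 20, the main loop runs once and the tail loop runs once, so every pixel is written exactly once.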
filter size of 4 is
processed 35% faster (506 vs 771). Thanks for the tip on countq, one add
has been removed from each loop.
Alan
On Fri, Apr 22, 2022 at 7:43 PM Michael Niedermayer
wrote:
> On Thu, Feb 17, 2022 at 11:04:04AM +0100, Alan Kelly wrote:
> > The main loop processes blo
Ping!
On Thu, Feb 17, 2022 at 11:04 AM Alan Kelly wrote:
> ff_shuffle_filter_coefficients shuffles the tail as required.
> ---
> libswscale/utils.c | 19 ---
> libswscale/x86/swscale.c | 6 ++
> 2 files changed, 18 insertions(+), 7 deletions(-)
Hi,
Is anyone interested in this patch? This makes AVX2 hscale work on all
input sizes.
Thanks,
Alan
On Mon, Mar 7, 2022 at 4:27 PM Alan Kelly wrote:
> Hi Michael,
>
> Thanks for reviewing the first two parts of this patchset.
>
> Is there anybody interested in revi
Hi Michael,
Thanks for reviewing the first two parts of this patchset.
Is there anybody interested in reviewing this part?
Thanks,
Alan
On Thu, Feb 17, 2022 at 5:21 PM Michael Niedermayer
wrote:
> On Thu, Feb 17, 2022 at 11:04:04AM +0100, Alan Kelly wrote:
> > The main loop process
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..4c57b6a372 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_sc
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 19 ---
libswscale/x86/swscale.c | 6 ++
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 7c8e1bbdde..d818c9ce55 100644
---
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 48 +--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd63
Make the code more readable and follow the style guide.
---
libswscale/utils.c | 66 +-
1 file changed, 36 insertions(+), 30 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 344c87dfdf..7c8e1bbdde 100644
--- a/libswscale/utils.c
+
---
libswscale/swscale_internal.h | 2 +-
libswscale/utils.c| 11 ---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 3a78d95ba6..26d28d42e6 100644
--- a/libswscale/swscale_internal.h
+++ b/libs
:11 PM Michael Niedermayer
wrote:
> On Mon, Jan 10, 2022 at 03:58:33PM +0100, Alan Kelly wrote:
> > Make the code more readable, follow the style guide and propagate memory
> > allocation errors.
>
> Cosmetics and bugfixes should not be in the same patch
>
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..e7f916d3a8 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_
---
libswscale/swscale_internal.h | 2 +-
libswscale/utils.c| 14 --
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 3a78d95ba6..26d28d42e6 100644
--- a/libswscale/swscale_internal.h
+++ b/l
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 19 ---
libswscale/x86/swscale.c | 6 ++
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 1d919e863a..31c365fcee 100644
---
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 48 +--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd63
Make the code more readable and follow the style guide.
---
libswscale/utils.c | 64 +++---
1 file changed, 37 insertions(+), 27 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index c5ea8853d5..1d919e863a 100644
--- a/libswscale/utils.c
+
Hi,
Is anybody interested in this patch set?
Thanks!
On Mon, Jan 10, 2022, 15:58 Alan Kelly wrote:
> Make the code more readable, follow the style guide and propagate memory
> allocation errors.
> ---
> libswscale/swscale_internal.h | 2 +-
> libswscale/utils.c
This is done in ff_shuffle_filter_coefficients.
---
tests/checkasm/sw_scale.c | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..e7f916d3a8 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_
ff_shuffle_filter_coefficients shuffles the tail as required.
---
libswscale/utils.c | 17 +++--
libswscale/x86/swscale.c | 4 ++--
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 52f07e1661..7e1e9c3834 100644
--- a/l
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.
---
libswscale/x86/scale_avx2.asm | 48 +--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd63
Make the code more readable, follow the style guide and propagate memory
allocation errors.
---
libswscale/swscale_internal.h | 2 +-
libswscale/utils.c| 68 ---
2 files changed, 40 insertions(+), 30 deletions(-)
diff --git a/libswscale/swscale_interna
This flag is set on Haswell and earlier and all AMD cpus.
---
Checks for family for Haswell. All checks are done where AVX2 flag is
set as this is clearer.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 15 ++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/libavut
On Mon, Dec 20, 2021 at 3:53 PM James Almer wrote:
>
>
> On 12/20/2021 11:47 AM, Lynne wrote:
> > 20 Dec 2021, 15:43 by alankelly-at-google@ffmpeg.org:
> >
> >> This flag is set on Haswell and earlier and all AMD cpus.
> >> ---
> >> Removes unnecessary indentation, clarifies comment and onl
This flag is set on Haswell and earlier and all AMD cpus.
---
Sets this flag on Zen 3 and earlier.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 14 +-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index ae443eccad..ce9bf14bf7 100
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
---
Whoops! Corrects check so that this flag is only enabled where fast
avx2 and fast gathers are available.
libswscale/utils.c| 2 +-
libswscale/x86/swscale.c | 2 +-
tests/chec
This flag is set on Haswell and earlier and all AMD cpus.
---
Removes unnecessary indentation, clarifies comment and only sets flag on AMD
cpus with AVX2.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 14 +-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/libavutil
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
---
libswscale/utils.c| 2 +-
libswscale/x86/swscale.c | 2 +-
tests/checkasm/sw_scale.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/libswscale/utils.c b
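The selection logic discussed here (use the avx2 hscale functions only where gathers are fast) might look roughly like the sketch below. The flag bits, the enum, and `pick_hscale` are all hypothetical names invented for illustration; they are not the identifiers or values used in libavutil or libswscale.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch only; not libavutil's values. */
#define SKETCH_FLAG_AVX2        (1u << 0)
#define SKETCH_FLAG_SLOW_GATHER (1u << 1)

/* Which hscale implementation the dispatcher would select. */
typedef enum { HSCALE_SSSE3, HSCALE_AVX2 } hscale_impl;

/* Picks the AVX2 path only when AVX2 is present and gathers are not
 * marked slow; otherwise stays on the SSSE3 path. */
static hscale_impl pick_hscale(unsigned flags)
{
    if ((flags & SKETCH_FLAG_AVX2) && !(flags & SKETCH_FLAG_SLOW_GATHER))
        return HSCALE_AVX2;
    return HSCALE_SSSE3;
}
```

Expressing the restriction as a "slow" flag that vetoes the fast path means CPUs that never set it need no special handling in the dispatcher.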
This flag is set on Haswell and earlier and all AMD cpus.
---
As discussed on IRC last week.
libavutil/cpu.h | 57 +++--
libavutil/x86/cpu.c | 13 ++-
2 files changed, 41 insertions(+), 29 deletions(-)
diff --git a/libavutil/cpu.h b/libavutil/c
---
libswscale/x86/scale_avx2.asm | 96 +--
1 file changed, 48 insertions(+), 48 deletions(-)
diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 2cd7e968d3..eb472db12f 100644
--- a/libswscale/x86/scale_avx2.asm
+++ b/libswscale/x86/sca
Thanks Lynne for the patch.
On Thu, Dec 16, 2021 at 5:05 PM Alan Kelly wrote:
> ---
> libswscale/x86/swscale.c | 14 +++---
> tests/checkasm/sw_scale.c | 3 +--
> 2 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/libswscale/x86/swscale.c b/libsws
---
libswscale/x86/swscale.c | 14 +++---
tests/checkasm/sw_scale.c | 3 +--
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 164b06d6ba..c49a05c37b 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.
Fixes so that fate under 64 bit Windows passes.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c| 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112 +++
On Tue, Dec 14, 2021 at 6:07 PM James Almer wrote:
> On 12/14/2021 12:23 PM, Alan Kelly wrote:
> > Patch has been rebased from latest commits.
> > These functions replace all ff_hscale8to15_*_ssse3 when avx2 is
> available.
> > ---
> > libswscale/swscale_inter
Patch has been rebased from latest commits.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c| 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112
On Wed, Jul 21, 2021 at 11:11 AM Alan Kelly wrote:
>
>
> On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote:
>
>> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
>> ---
>> EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHE
On Fri, Jul 16, 2021 at 3:48 PM Alan Kelly wrote:
> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
> ---
> EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as
> discussed in the email thread for part 1 of this patch.
>
> Benchmark
On Fri, Jul 16, 2021 at 4:02 PM James Almer wrote:
> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> > email thre
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
EXTERNAL_AVX2_FAST is now used instead of EXTERNAL_AVX2_FAST_GATHER as
discussed in the email thread for part 1 of this patch.
Benchmark results on Skylake and Haswell:
Skylake Haswell
h
Broadwell and later and Zen3 and later have fast gather instructions.
---
Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
email thread.
libavutil/cpu.h | 1 +
libavutil/x86/cpu.c | 11 ++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/libavutil/c
On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly wrote:
> On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
>
>> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>>
>> > Broadwell and later and Zen3 and later have fast gather instructions.
>> > ---
>>
On Fri, Jun 25, 2021 at 1:26 PM Ronald S. Bultje wrote:
> Hi Alan,
>
> On Fri, Jun 25, 2021 at 3:59 AM Alan Kelly <
> alankelly-at-google@ffmpeg.org> wrote:
>
>> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
>>
>
> Re-asking
On Fri, Jun 25, 2021 at 10:40 AM Lynne wrote:
> Jun 25, 2021, 09:54 by alankelly-at-google@ffmpeg.org:
>
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> > Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
> > and 2 to 5 on Skylake a
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c| 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112 ++
libswscale/x86/swsca
Broadwell and later and Zen3 and later have fast gather instructions.
---
Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3.
libavutil/cpu.h | 2 ++
libavutil/x86/cpu.c | 18 --
libav
je wrote:
> > Hi Alan,
> >
> > On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly <
> > alankelly-at-google@ffmpeg.org> wrote:
> >
> >> Broadwell and later have fast gather instructions.
> >> ---
> >> This is so that the avx2 version of ff
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
---
libswscale/swscale_internal.h | 2 +
libswscale/utils.c| 37 +++
libswscale/x86/Makefile | 1 +
libswscale/x86/scale_avx2.asm | 112 ++
libswscale/x86/swsca
Broadwell and later have fast gather instructions.
---
This is so that the avx2 version of ff_hscale8to15X which uses gather
instructions is only selected on machines where it will actually be
faster.
libavutil/cpu.c | 6 ++
libavutil/cpu.h | 6 ++
libavutil/cpu_inte
---
This is so that inputs of size 8 are supported, as was the case with
the original implementation. A bug was found with inputs not divisible
by 16.
libswscale/x86/yuv2yuvX.asm | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/yuv2yuvX.asm b/lib
---
tests/checkasm/sw_scale.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a10118704b..3ac0f9082f 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -68,8 +68,8 @@ static void check_yuv2
---
libswscale/x86/swscale.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index cc9e8b0155..0848a31461 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -197,7 +197,8 @@ static void yuv2yuvX_ ##o
---
This is so that tails of size 8 may safely be processed
libswscale/x86/yuv2yuvX.asm | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index 521880dabe..b6294cb919 100644
--- a/libswscale/x86/yuv2yuvX.as
---
tests/checkasm/sw_scale.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a10118704b..3ac0f9082f 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -68,8 +68,8 @@ static void check_yuv2
---
libswscale/x86/swscale.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 1e865914cb..71961a9ae0 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -206,7 +206,8 @@ static void yuv2yuvX_ ##o
Initialises each item in src and filter arrays to fix valgrind
uninitialised value warning.
---
casts pointers to uint8_t* and multiplies the buffer size by sizeof(uint16_t).
tests/checkasm/sw_scale.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/sw_scale.
b94cd55155d8c061f1e1faca9076afe540149c27 as the problematic
commit.
On Thu, Feb 18, 2021 at 11:23 PM James Almer wrote:
> On 2/17/2021 5:24 PM, Paul B Mahol wrote:
> > On Tue, Feb 16, 2021 at 6:31 PM Alan Kelly <
> > alankelly-at-google@ffmpeg.org> wrote:
> >
> >> Looks like there are n
Checks av_mallocs
---
tests/checkasm/sw_scale.c | 4
1 file changed, 4 insertions(+)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a4866723d7..ef414c0a82 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -103,7 +103,11 @@ static void check_y
Initialises each item in src and filter arrays to fix valgrind
uninitialised value warning.
---
tests/checkasm/sw_scale.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 7504f8b45f..a4866723d7 100644
--- a/tests/
Looks like there are no comments, is this OK to be applied? Thanks
On Tue, Feb 9, 2021 at 6:25 PM Paul B Mahol wrote:
> Will apply if no comments.
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-
Ping!
On Thu, Jan 14, 2021 at 3:47 PM Alan Kelly wrote:
> ---
> Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro
> Tests for multiple sizes in checkasm-sw_scale
> checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned
> loads
> libsw
---
Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro
Tests for multiple sizes in checkasm-sw_scale
checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned loads
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 130 ---
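The testing idea in the note above (give the function a buffer that is aligned to 8 bytes but not to 32, so that an aligned load in the code under test faults instead of passing by luck) can be sketched as follows. `align8_not32` is a hypothetical helper written for this illustration and is not taken from checkasm; it assumes the caller over-allocates by at least 40 bytes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns a pointer inside `base` that is 8-byte aligned but deliberately
 * NOT 32-byte aligned. Assumption of this sketch: the allocation behind
 * `base` has at least 40 spare bytes beyond what the caller will use. */
static uint8_t *align8_not32(uint8_t *base)
{
    /* Round up to the next 32-byte boundary... */
    uintptr_t p = ((uintptr_t)base + 31) & ~(uintptr_t)31;
    /* ...then step 8 bytes past it: still a multiple of 8, not of 32. */
    return (uint8_t *)(p + 8);
}
```

A test would allocate its source and destination with the extra slack, pass the adjusted pointers to the function under test, and free the original pointers afterwards.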
32 so that
the test catches problems with alignment.
On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer
wrote:
> On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> > ---
> > Fixes a bug where if there is no offset and a tail which is not
> processed by the
> >
---
Fixes a bug where if there is no offset and a tail which is not processed by
the
sse3/avx2 version the dither is modified
Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
to yuv2yuvX.asm to reduce code duplication and so that it may be used
to process the tail from th
on a
solution.
On Sun, Jan 10, 2021 at 4:26 PM Michael Niedermayer
wrote:
> On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote:
> > ---
> > Replaces mova with movdqu due to alignment issues
> > libswscale/x86/Makefile | 1 +
> > l
Thanks for your patience with this, I have replaced mova with movdqu - movu
generated a compile error on ssse3. What system did this crash on?
On Wed, Jan 6, 2021 at 9:10 PM Michael Niedermayer
wrote:
> On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> > Ping!
>
>
---
Replaces mova with movdqu due to alignment issues
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2yuvX.asm | 117
tests/checkasm/sw_scale.c | 98 ++
Ping!
On Thu, Dec 17, 2020 at 11:42 AM Alan Kelly wrote:
> ---
> Fixes memory alignment problem in checkasm-sw_scale
> Tested on Linux 32 and 64 bit and mingw32
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 106 +---
---
Fixes memory alignment problem in checkasm-sw_scale
Tested on Linux 32 and 64 bit and mingw32
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2yuvX.asm | 117
tests/checkasm/sw_sca
---
Replaces ff_sws_init_swscale_x86 with ff_getSwsFunc
Load offset is not gprsize but 8 on both 32 and 64 bit
Removes sfence as NT store no longer used
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2yuvX.asm | 117 +++
good reason. If you think it better to use NT stores, I
will replace them.
On Fri, Dec 4, 2020 at 2:00 PM Anton Khirnov wrote:
> Quoting Alan Kelly (2020-11-19 09:41:56)
> > ---
> > All of Henrik's suggestions have been implemented. Additionally,
> > m3 and m6 are per
---
Activates avx2 version of yuv2yuvX
Adds checkasm for yuv2yuvX
Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_*
Replaces non-temporal stores with temporal stores
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2y
Ping
On Thu, Nov 19, 2020 at 9:42 AM Alan Kelly wrote:
> ---
> All of Henrik's suggestions have been implemented. Additionally,
> m3 and m6 are permuted in avx2 before storing to ensure bit by bit
> identical results in avx2.
> libswscale/x86/Makefile | 1 +
> l
---
All of Henrik's suggestions have been implemented. Additionally,
m3 and m6 are permuted in avx2 before storing to ensure bit by bit
identical results in avx2.
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75 +++
libswscale/x86/yuv2yuvX.asm | 118 ++
---
Fixes bug in sse3 path where m1 is not set correctly resulting in off
by one errors. The results are now bit by bit identical.
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75
libswscale/x86/yuv2yuvX.asm | 114 ++
---
It now works on x86-32
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75
libswscale/x86/yuv2yuvX.asm | 110
3 files changed, 121 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
---
yuv2yuvX.asm: Ports yuv2yuvX to asm, unrolls main loop and adds
other small optimizations for ~20% speed-up. Copyright updated to
include the original from swscale.c
swscale.c: Removes yuv2yuvX_sse3 and calls new function ff_yuv2yuvX_sse3.
Calls yuv2yuvX_mmxext on remaining elements if r
? Thank you.
On Sat, Oct 31, 2020 at 1:02 PM Carl Eugen Hoyos wrote:
> Am Di., 27. Okt. 2020 um 09:56 Uhr schrieb Alan Kelly
> :
>
> > --- /dev/null
> > +++ b/libswscale/x86/yuv2yuvX.a