[FFmpeg-devel] [RFC] x86 external assembler requirements

2017-06-09 Thread James Darnley
to prevent confusion. We could just change them to "nasm" and be done. We could provide compatibility options. We could adopt Libav's generic "x86asm". James Darnley (1): configure: require NASM version 2.11 or newer for external x86 assembly configure | 17 +

Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-10 Thread James Darnley
On 2017-06-09 13:41, Ivan Kalvachev wrote: > On 6/9/17, Michael Niedermayer wrote: >> seems this breaks build with mingw64, didnt investigate but it >> fails with these errors: >> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): >> relocation trunc

[FFmpeg-devel] [WIP] [PATCH 0/5] sse2/avx functions for 8-bit simple_idct

2017-06-10 Thread James Darnley
still have a small optimisation to make and I need to use the correct coefficients. This will require a large change to the macros. I am sending this so that people can nitpick my changes. James Darnley (5): avcodec/x86: cleanup simple_idct10 avcodec/x86: add x86-64 8-bit simple_idct function

[FFmpeg-devel] [PATCH 3/5] more cleanup

2017-06-10 Thread James Darnley
--- libavcodec/x86/simple_idct10_template.asm | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm index c0d1637ca2..3f398985a5 100644 --- a/libavcodec/x86/simple_idct10_template.

[FFmpeg-devel] [PATCH 1/5] avcodec/x86: cleanup simple_idct10

2017-06-10 Thread James Darnley
Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register. --- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 ++-- libavcodec/x86/simple_i
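
To show why a ptrdiff_t stride removes the sign-extension step, here is a minimal C sketch. The helper is hypothetical and not part of the patch. A ptrdiff_t already has pointer width and can be negative, so it can go straight into address arithmetic. An int stride passed to x86-64 assembly would first need widening, for example with movsxd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the first byte of each of 8 rows.
 * `stride` is a ptrdiff_t, so it already has pointer width and may be
 * negative (e.g. walking a bottom-up image). With an int stride, x86-64
 * assembly would first have to sign-extend the 32-bit argument
 * (movsxd) before using it in an address calculation. */
static int sum_column8(const uint8_t *p, ptrdiff_t stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++)
        sum += p[y * stride];   /* int * ptrdiff_t -> ptrdiff_t offset */
    return sum;
}
```

Called as sum_column8(buf + 7 * w, -w), the same code walks the rows bottom-up without any special casing.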

[FFmpeg-devel] [PATCH 2/5] avcodec/x86: add x86-64 8-bit simple_idct function

2017-06-10 Thread James Darnley
Rounding contributed by Ronald S. Bultje --- libavcodec/tests/x86/dct.c | 2 ++ libavcodec/x86/idctdsp_init.c| 19 +++ libavcodec/x86/simple_idct.h | 3 +++ libavcodec/x86/simple_idct10.asm | 8 4 files changed, 32 insertions(+) diff --git a/libavcodec/te

[FFmpeg-devel] [PATCH 4/5] avcodec/x86: add x86-64 8-bit simple_idct put function

2017-06-10 Thread James Darnley
--- libavcodec/x86/idctdsp_init.c| 2 ++ libavcodec/x86/simple_idct.h | 3 +++ libavcodec/x86/simple_idct10.asm | 23 +++ 3 files changed, 28 insertions(+) diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index 4b2145e478..1826d01e0e 100644

[FFmpeg-devel] [PATCH 5/5] avcodec/x86: add x86-64 8-bit simple_idct add function

2017-06-10 Thread James Darnley
--- libavcodec/x86/idctdsp_init.c| 2 ++ libavcodec/x86/simple_idct.h | 3 ++ libavcodec/x86/simple_idct10.asm | 61 3 files changed, 66 insertions(+) diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index 1826d01e0e..ca

Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-11 Thread James Darnley
On 2017-06-11 11:34, Ivan Kalvachev wrote: > On 6/10/17, James Darnley wrote: >> On 2017-06-09 13:41, Ivan Kalvachev wrote: >>> >>> const_*_edge is used on only one place is the code. >>> Would you check if this patch fixes the issue. >>> >>>

[FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

2017-06-12 Thread James Darnley
kindly give their opinion on the 2nd and 6th patches in particular I would greatly appreciate it. Performance gain decoding an MPEG2 HD sample over the old MMX: - Yorkfield: 210 to 224 fps - Haswell: 387 to 426 fps Would anyone like me to get some timer figures for the functions themselves? James
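
For scale, the decoding throughput figures quoted above work out to these relative gains over the old MMX code:

```latex
\text{Yorkfield: } \frac{224 - 210}{210} \approx 6.7\%, \qquad
\text{Haswell: } \frac{426 - 387}{387} \approx 10.1\%
```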

[FFmpeg-devel] [PATCH 1/6] avcodec/x86: cleanup simple_idct10

2017-06-12 Thread James Darnley
Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register. --- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 ++-- libavcodec/x86/simple_i

[FFmpeg-devel] [PATCH 6/6] avcodec/x86: allow 8-bit simple_idct to use slightly different coefficients

2017-06-12 Thread James Darnley
This makes it exact to the old MMX one, as reported by libavcodec/tests/dct. --- libavcodec/x86/proresdsp.asm | 18 + libavcodec/x86/simple_idct10.asm | 33 ++- libavcodec/x86/simple_idct10_template.asm | 19 ++ 3 fi

[FFmpeg-devel] [PATCH 5/6] avcodec/x86: add x86-64 8-bit simple_idct add function

2017-06-12 Thread James Darnley
--- libavcodec/x86/idctdsp_init.c| 2 ++ libavcodec/x86/simple_idct.h | 3 ++ libavcodec/x86/simple_idct10.asm | 61 3 files changed, 66 insertions(+) diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index 1826d01e0e..9d

[FFmpeg-devel] [PATCH 2/6] avcodec/x86: modify simple_idct10 macros to add an action paramter

2017-06-12 Thread James Darnley
--- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 +++ libavcodec/x86/simple_idct10_template.asm | 37 +-- 3 files changed, 25 insertions(+), 22 deletions(-) diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/

[FFmpeg-devel] [PATCH 4/6] avcodec/x86: add x86-64 8-bit simple_idct put function

2017-06-12 Thread James Darnley
--- libavcodec/x86/idctdsp_init.c| 2 ++ libavcodec/x86/simple_idct.h | 3 +++ libavcodec/x86/simple_idct10.asm | 23 +++ 3 files changed, 28 insertions(+) diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index 4b2145e478..1826d01e0e 100644

[FFmpeg-devel] [PATCH 3/6] avcodec/x86: add x86-64 8-bit simple_idct function

2017-06-12 Thread James Darnley
Rounding contributed by Ronald S. Bultje --- libavcodec/tests/x86/dct.c | 2 ++ libavcodec/x86/idctdsp_init.c| 19 +++ libavcodec/x86/simple_idct.h | 3 +++ libavcodec/x86/simple_idct10.asm | 8 4 files changed, 32 insertions(+) diff --git a/libavcodec/te

Re: [FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

2017-06-12 Thread James Darnley
On 2017-06-12 18:57, Michael Niedermayer wrote: > On Mon, Jun 12, 2017 at 03:36:06PM +0200, James Darnley wrote: >> Rounding contributed by Ronald S. Bultje >> --- >> libavcodec/tests/x86/dct.c | 2 ++ >> libavcodec/x86/idctdsp_init.c| 19 +++

Re: [FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

2017-06-13 Thread James Darnley
On 2017-06-13 00:18, James Darnley wrote: > On 2017-06-12 18:57, Michael Niedermayer wrote: >> ./ffplay ~/videos/matrixbench_mpeg2.mpg >> looks pretty bad > > If that would happen to be the FATE sample > mpeg2/matrixbench_mpeg2.lq1.mpg then I see that too. > >

[FFmpeg-devel] [PATCH 1/6] fate: add test of -idct simpleauto

2017-06-15 Thread James Darnley
--- tests/fate/video.mak | 3 +++ tests/ref/fate/idct-simpleauto | 27 +++ 2 files changed, 30 insertions(+) create mode 100644 tests/ref/fate/idct-simpleauto diff --git a/tests/fate/video.mak b/tests/fate/video.mak index d1d35335f2..455c1d3564 100644 --- a/tes

[FFmpeg-devel] [PATCH 0/6] [v2] sse2/avx functions for 8-bit simple idct

2017-06-15 Thread James Darnley
r platforms use their own functions for simpleauto. I might follow this with a patch to cleanup idctdsp_init.c James Darnley (6): fate: add test of -idct simpleauto avcodec/x86: cleanup simple_idct10 avcodec/x86: modify simple_idct10 macros to add an action paramter avcodec/x86: allow future

[FFmpeg-devel] [PATCH 5/6] avcodec/x86: allow future 8-bit simple idct to have "DC only hack"

2017-06-15 Thread James Darnley
Created by Ronald S. Bultje --- libavcodec/x86/simple_idct10_template.asm | 38 +++ 1 file changed, 38 insertions(+) diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm index d8ea0bcc6b..51baf84c82 100644 --- a/libavcodec
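
The idea behind a "DC only hack" can be sketched in C. This assumes the standard orthonormal 8x8 IDCT scaling. It is a floating-point illustration only. FFmpeg's simple IDCT uses its own fixed-point scaling, and the actual hack in this patch is x86 assembly whose details are not shown here. If every AC coefficient is zero, the transform collapses to a constant, so it can be skipped.

```c
#include <assert.h>

/* Hypothetical floating-point "DC only" shortcut for an orthonormal 8x8
 * inverse DCT. With every AC coefficient zero, each output equals
 * (1/4) * C(0) * C(0) * F(0,0) = F(0,0) / 8, because C(0) = 1/sqrt(2)
 * and both cosine factors are cos(0) = 1. Returns 1 if the shortcut
 * applied, 0 if the caller must run the full transform. */
static int idct8x8_dc_only(const double in[64], double out[64])
{
    for (int i = 1; i < 64; i++)
        if (in[i] != 0.0)
            return 0;               /* some AC coefficient is non-zero */
    for (int i = 0; i < 64; i++)
        out[i] = in[0] / 8.0;       /* flat block */
    return 1;
}
```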

[FFmpeg-devel] [PATCH 4/6] avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients

2017-06-15 Thread James Darnley
--- libavcodec/x86/proresdsp.asm | 18 ++ libavcodec/x86/simple_idct10.asm | 29 + libavcodec/x86/simple_idct10_template.asm | 19 +++ 3 files changed, 50 insertions(+), 16 deletions(-) diff --git a/libavcodec/x86/p

[FFmpeg-devel] [PATCH 3/6] avcodec/x86: modify simple_idct10 macros to add an action paramter

2017-06-15 Thread James Darnley
--- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 +++ libavcodec/x86/simple_idct10_template.asm | 37 +-- 3 files changed, 25 insertions(+), 22 deletions(-) diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/

[FFmpeg-devel] [PATCH 2/6] avcodec/x86: cleanup simple_idct10

2017-06-15 Thread James Darnley
Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register. --- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 ++-- libavcodec/x86/simple_i

[FFmpeg-devel] [PATCH 6/6] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-15 Thread James Darnley
Includes add/put functions Rounding contributed by Ronald S. Bultje --- libavcodec/tests/x86/dct.c | 2 + libavcodec/x86/idctdsp_init.c| 23 ++ libavcodec/x86/simple_idct.h | 9 libavcodec/x86/simple_idct10.asm | 94 4 files ch

[FFmpeg-devel] [PATCH] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-15 Thread James Darnley
Includes add/put functions Rounding contributed by Ronald S. Bultje --- I must be stupid. I dropped the stack space change somewhere. libavcodec/tests/x86/dct.c | 2 + libavcodec/x86/idctdsp_init.c| 23 ++ libavcodec/x86/simple_idct.h | 9 libavcodec/x86/simple_idct

Re: [FFmpeg-devel] [PATCH] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-16 Thread James Darnley
On 2017-06-16 03:58, Michael Niedermayer wrote: > On Thu, Jun 15, 2017 at 05:08:33PM +0200, James Darnley wrote: >> Includes add/put functions >> >> Rounding contributed by Ronald S. Bultje >> --- >> I must be stupid. I dropped the stack space change somewhere.

Re: [FFmpeg-devel] [PATCH] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-16 Thread James Darnley
On 2017-06-16 12:48, Paul B Mahol wrote: > On 6/16/17, James Darnley wrote: >> On 2017-06-16 03:58, Michael Niedermayer wrote: >>> theres something wrong with this >>> it totally breaks this: >>> make -j12 ffmpeg && ./ffmpeg -ss 1 -i cache:matrixbench

[FFmpeg-devel] [PATCH 2/2] avcodec/x86/mpegenc: support transpose permuation type

2017-06-16 Thread James Darnley
--- libavcodec/x86/mpegvideoenc_template.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c index 3ce72e1367..1201be514b 100644 --- a/libavcodec/x86/mpegvideoenc_t

[FFmpeg-devel] [PATCH 1/2] avcodec/x86/mpegenc: check IDCT permutation type is a valid value

2017-06-16 Thread James Darnley
--- libavcodec/x86/mpegvideoenc_template.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c index b2512744ca..3ce72e1367 100644 --- a/libavcodec/x86/mpegvideoenc_template.c +++ b/libavcodec/x
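
A rough C sketch of what these two patches deal with follows: filling a coefficient permutation table, rejecting unknown permutation types, and applying a "transpose" permutation. FFmpeg's idctdsp names such a type FF_IDCT_PERM_TRANSPOSE. The enum, function names and return convention below are hypothetical. The row/column swap formula is the usual one and is not copied from the patch. permute_matrix shows the same table being used to store a natural-order matrix, such as a quant matrix, in permuted order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum standing in for the IDCT permutation types. */
enum perm_type { PERM_NONE, PERM_TRANSPOSE, PERM_NB };

/* Fill perm[] for `type`; return -1 for an unknown type rather than
 * leaving perm[] uninitialised. */
static int init_permutation(uint8_t perm[64], int type)
{
    switch (type) {
    case PERM_NONE:
        for (int i = 0; i < 64; i++)
            perm[i] = (uint8_t)i;
        return 0;
    case PERM_TRANSPOSE:
        /* swap the row (i >> 3) and column (i & 7) of each index */
        for (int i = 0; i < 64; i++)
            perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
        return 0;
    default:
        return -1;
    }
}

/* Store a natural-order 8x8 matrix in the permuted order. */
static void permute_matrix(uint16_t dst[64], const uint16_t src[64],
                           const uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        dst[perm[i]] = src[i];
}
```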

Re: [FFmpeg-devel] [PATCH 2/2] avcodec/x86/mpegenc: support transpose permuation type

2017-06-16 Thread James Darnley
On 2017-06-16 17:48, Michael Niedermayer wrote: > On Fri, Jun 16, 2017 at 03:53:28PM +0200, James Darnley wrote: >> --- >> libavcodec/x86/mpegvideoenc_template.c | 47 >> +- >> 1 file changed, 46 insertions(+), 1 deletion(-) >

Re: [FFmpeg-devel] [PATCH 1/6] fate: add test of -idct simpleauto

2017-06-16 Thread James Darnley
On 2017-06-16 20:31, Michael Niedermayer wrote: > On Thu, Jun 15, 2017 at 03:34:21PM +0200, James Darnley wrote: >> --- >> tests/fate/video.mak | 3 +++ >> tests/ref/fate/idct-simpleauto | 27 +++ >> 2 files changed, 30 insertions(+) &

[FFmpeg-devel] [PATCH 01/11] avcodec/x86/mpegenc: check IDCT permutation type is a valid value

2017-06-19 Thread James Darnley
--- libavcodec/x86/mpegvideoenc_template.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c index b2512744ca..3ce72e1367 100644 --- a/libavcodec/x86/mpegvideoenc_template.c +++ b/libavcodec/x

[FFmpeg-devel] [PATCH 00/11] [v3] sse2/avx functions for 8-bit simple idct

2017-06-19 Thread James Darnley
since outdated and I added a comment to the "DC-only hack" above it in the conditional section. The other patches either fix or work around bugs in other code. James Darnley (9): avcodec/x86/mpegenc: check IDCT permutation type is a valid value avcodec/x86/mpegenc: support transpose

[FFmpeg-devel] [PATCH 02/11] avcodec/x86/mpegenc: support transpose permuation type

2017-06-19 Thread James Darnley
--- libavcodec/x86/mpegvideoenc_template.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c index 3ce72e1367..1201be514b 100644 --- a/libavcodec/x86/mpegvideoenc_t

[FFmpeg-devel] [PATCH 03/11] avcodec/mdec: override IDCT choice before initing DSP structs

2017-06-19 Thread James Darnley
--- libavcodec/mdec.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/libavcodec/mdec.c b/libavcodec/mdec.c index 8e28aa04f0..97bfebbeb7 100644 --- a/libavcodec/mdec.c +++ b/libavcodec/mdec.c @@ -213,6 +213,9 @@ static av_cold int decode_init(AVCodecContext *avctx) {

[FFmpeg-devel] [PATCH 08/11] avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients

2017-06-19 Thread James Darnley
--- libavcodec/x86/proresdsp.asm | 18 ++ libavcodec/x86/simple_idct10.asm | 29 + libavcodec/x86/simple_idct10_template.asm | 19 +++ 3 files changed, 50 insertions(+), 16 deletions(-) diff --git a/libavcodec/x86/p

[FFmpeg-devel] [PATCH 09/11] avcodec/x86: allow future 8-bit simple idct to have "DC only hack"

2017-06-19 Thread James Darnley
Created by Ronald S. Bultje --- libavcodec/x86/simple_idct10_template.asm | 38 +++ 1 file changed, 38 insertions(+) diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm index d8ea0bcc6b..51baf84c82 100644 --- a/libavcodec

[FFmpeg-devel] [PATCH 06/11] avcodec/x86: cleanup simple_idct10

2017-06-19 Thread James Darnley
Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register. --- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 ++-- libavcodec/x86/simple_i

[FFmpeg-devel] [PATCH 10/11] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-19 Thread James Darnley
Includes add/put functions Rounding contributed by Ronald S. Bultje --- libavcodec/tests/x86/dct.c| 2 + libavcodec/x86/idctdsp_init.c | 23 libavcodec/x86/simple_idct.h | 9 +++ libavcodec/x86/simple_idct10.asm | 92 +++

[FFmpeg-devel] [PATCH 07/11] avcodec/x86: modify simple_idct10 macros to add an action paramter

2017-06-19 Thread James Darnley
--- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 +++ libavcodec/x86/simple_idct10_template.asm | 37 +-- 3 files changed, 25 insertions(+), 22 deletions(-) diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/

[FFmpeg-devel] [PATCH 05/11] avcodec/mpegenc: do not use unquantize shortcuts for wmv1

2017-06-19 Thread James Darnley
From: "Ronald S. Bultje" Commit message by James Darnley The shortcut is based on end-of-block positions. This leads to some coefficients not being unquantized. This is the symptom of the bug. A possible candidate for the real bug is the scan table used here in unquantize does not
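
Here is a small C model of the kind of end-of-block shortcut described above. It uses a hypothetical 4x4 block, where real codecs use 8x8. It also uses H.263-style dequantization, with qmul = 2*qscale and qadd = (qscale - 1) | 1, chosen as an illustration. raster_end[i] is the largest raster index among scan positions 0..i. Dequantizing raster positions 1..raster_end[last] therefore covers every coded coefficient, but only if the table was built from the same scan order that produced `last`. With a mismatched table, coefficients are silently skipped, which matches the symptom described in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define NCOEF 16   /* hypothetical 4x4 block */

/* raster_end[i] = max raster index among scan positions 0..i */
static void build_raster_end(const uint8_t scan[NCOEF], uint8_t raster_end[NCOEF])
{
    int end = -1;
    for (int i = 0; i < NCOEF; i++) {
        if (scan[i] > end)
            end = scan[i];
        raster_end[i] = (uint8_t)end;
    }
}

/* H.263-style inverse quantization of AC coefficients up to
 * raster position `last_raster` (the DC at index 0 is left alone). */
static void unquantize(int16_t block[NCOEF], int last_raster, int qscale)
{
    int qmul = qscale * 2;
    int qadd = (qscale - 1) | 1;
    for (int i = 1; i <= last_raster; i++) {
        int level = block[i];
        if (level)
            block[i] = (int16_t)(level < 0 ? level * qmul - qadd
                                           : level * qmul + qadd);
    }
}
```

In the test below the only non-zero coefficient sits at raster index 4, which is scan position 1 in column order. The raster_end table built from the column scan reaches it. The table built from a row scan stops at index 1 and misses it.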

[FFmpeg-devel] [PATCH 04/11] avcodec/mdec: allow/use permuted IDCTs

2017-06-19 Thread James Darnley
From: "Ronald S. Bultje" --- libavcodec/mdec.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/libavcodec/mdec.c b/libavcodec/mdec.c index 97bfebbeb7..1e1c8f4c55 100644 --- a/libavcodec/mdec.c +++ b/libavcodec/mdec.c @@ -94,7 +94,7 @@ static inline int mdec_decode_block

[FFmpeg-devel] [PATCH 11/11] avcodec/x86: use new x86-64 functions for -idct simple

2017-06-19 Thread James Darnley
They now match according to FATE, barring any further bugs with untested parts --- libavcodec/x86/idctdsp_init.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index 9da60d1a1e..162560d411 100644 --- a/libavco

Re: [FFmpeg-devel] [PATCH 05/11] avcodec/mpegenc: do not use unquantize shortcuts for wmv1

2017-06-20 Thread James Darnley
On 2017-06-20 00:08, Michael Niedermayer wrote: > On Mon, Jun 19, 2017 at 05:10:58PM +0200, James Darnley wrote: >> From: "Ronald S. Bultje" >> >> Commit message by James Darnley >> >> The shortcut is based on end-of-block positions. This leads to

Re: [FFmpeg-devel] [PATCH 02/11] avcodec/x86/mpegenc: support transpose permuation type

2017-06-20 Thread James Darnley
On 2017-06-19 18:30, Michael Niedermayer wrote: > On Mon, Jun 19, 2017 at 05:10:55PM +0200, James Darnley wrote: >> --- >> libavcodec/x86/mpegvideoenc_template.c | 47 >> +- >> 1 file changed, 46 insertions(+), 1 deletion(-) > &

Re: [FFmpeg-devel] [PATCH 06/11] avcodec/x86: cleanup simple_idct10

2017-06-20 Thread James Darnley
On 2017-06-19 20:30, Ronald S. Bultje wrote: > Hi, > > On Mon, Jun 19, 2017 at 11:10 AM, James Darnley wrote: > >> Use named arguments for the functions so we can remove a define. The >> stride/linesize argument is now ptrdiff_t type so we no longer need to >

Re: [FFmpeg-devel] [PATCH 07/11] avcodec/x86: modify simple_idct10 macros to add an action paramter

2017-06-20 Thread James Darnley
On 2017-06-19 20:31, Ronald S. Bultje wrote: > Hi, > > On Mon, Jun 19, 2017 at 11:11 AM, James Darnley wrote: > >> --- >> libavcodec/x86/proresdsp.asm | 2 +- >> libavcodec/x86/simple_idct10.asm | 8 +++ >> libavcodec

Re: [FFmpeg-devel] [PATCH 09/11] avcodec/x86: allow future 8-bit simple idct to have "DC only hack"

2017-06-20 Thread James Darnley
On 2017-06-20 13:56, Ronald S. Bultje wrote: > Hi, > > On Mon, Jun 19, 2017 at 11:11 AM, James Darnley wrote: > >> Created by Ronald S. Bultje >> --- >> libavcodec/x86/simple_idct10_template.asm | 38 >> +++ >> 1 file ch

Re: [FFmpeg-devel] [PATCH] mdec: use correctly permutated quant matrix for dequantization.

2017-06-20 Thread James Darnley
On 2017-06-20 14:47, Ronald S. Bultje wrote: > This allows using non-simple (e.g. simplemmx) IDCT implementations. > The result is not bitexact (which is why the fate test continues to > use -idct simple), but the PSNR between C/MMX goes from ~35 to ~90. > --- > libavcodec/mdec.c | 14 ++--
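
For scale, assuming the usual 8-bit definition of PSNR, a rise from about 35 dB to about 90 dB means the mean squared error between the C and MMX output shrank by roughly five and a half orders of magnitude:

```latex
\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}}
\quad\Rightarrow\quad
\frac{\mathrm{MSE}_{35\,\mathrm{dB}}}{\mathrm{MSE}_{90\,\mathrm{dB}}}
  = 10^{(90-35)/10} = 10^{5.5} \approx 3.2\times 10^{5}
```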

Re: [FFmpeg-devel] [PATCH] mdec: use correctly permutated quant matrix for dequantization.

2017-06-20 Thread James Darnley
On 2017-06-20 18:16, Ronald S. Bultje wrote: > On Tue, Jun 20, 2017 at 12:04 PM, James Darnley wrote: >>> @@ -231,6 +230,13 @@ static av_cold int decode_init(AVCodecContext >> *avctx) >>> avctx->pix_fmt = AV_PIX_FMT_YUVJ420P; >>>

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients

2017-06-20 Thread James Darnley
On 2017-06-20 13:55, Ronald S. Bultje wrote: > Hi, > > On Mon, Jun 19, 2017 at 11:11 AM, James Darnley wrote: > >> --- >> libavcodec/x86/proresdsp.asm | 18 ++ >> libavcodec/x86/simple_idct10.asm | 29 >>

Re: [FFmpeg-devel] [PATCH 10/11] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-20 Thread James Darnley
On 2017-06-19 17:11, James Darnley wrote: > diff --git a/libavcodec/x86/simple_idct10_template.asm > b/libavcodec/x86/simple_idct10_template.asm > index 51baf84c82..02fd445ec0 100644 > --- a/libavcodec/x86/simple_idct10_template.asm > +++ b/libavcodec/x86/simple_idct10_template.

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/mpegvideo: Use intra_scantable in dct_unquantize_h263_intra_c()

2017-06-20 Thread James Darnley
On 2017-06-20 00:37, Michael Niedermayer wrote: > Signed-off-by: Michael Niedermayer > --- > libavcodec/mpegvideo.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/libavcodec/mpegvideo.c b/libavcodec/mpegvideo.c > index 63a30b93ce..e29558b3a2 100644 > --- a/libavcodec/mp

Re: [FFmpeg-devel] [PATCH 6/8] x86inc: reduce difference to x264 upstream

2017-11-06 Thread James Darnley
On 2017-10-31 04:30, Michael Niedermayer wrote: > On Mon, Oct 30, 2017 at 02:08:33PM +0100, James Darnley wrote: >> These changes were commited to x264 in b568a256 "Experimental nasm >> support" >> --- >> libavutil/x86/x86inc.asm | 16 ++-- &g

Re: [FFmpeg-devel] [PATCH 6/8] x86inc: reduce difference to x264 upstream

2017-11-06 Thread James Darnley
On 2017-11-06 21:15, James Almer wrote: > On 11/6/2017 4:56 PM, James Darnley wrote: >> Line 733 is the align command to align the start of the function. I >> can't see why it fails here but not on any other function in that file >> or any other file. >> >> A

[FFmpeg-devel] [PATCH 00/11] AVX-512 support (v.2)

2017-11-09 Thread James Darnley
hat actually use ZMM registers. Henrik Gramner (1): x86inc: AVX-512 support James Darnley (10): configure: test whether x86 assembler supports AVX-512 avutil: add AVX-512 flags avutil: detect when AVX-512 is available avutil: add alignment needed for AVX-512 avcodec: add stride alignmen

[FFmpeg-devel] [PATCH 01/11] configure: test whether x86 assembler supports AVX-512

2017-11-09 Thread James Darnley
--- configure | 5 + 1 file changed, 5 insertions(+) diff --git a/configure b/configure index f396abda5b..146a87324c 100755 --- a/configure +++ b/configure @@ -406,6 +406,7 @@ Optimization options (experts only): --disable-fma3 disable FMA3 optimizations --disable-fma4

[FFmpeg-devel] [PATCH 04/11] avutil: add alignment needed for AVX-512

2017-11-09 Thread James Darnley
--- This patch gained the alignment increase in mem.c libavutil/mem.c | 2 +- libavutil/x86/cpu.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/libavutil/mem.c b/libavutil/mem.c index 6ad409daf4..cdf539306f 100644 --- a/libavutil/mem.c +++ b/libavutil/mem.c @@ -61,7 +6
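
Why the allocation alignment has to grow: a ZMM register is 512 bits, or 64 bytes, so aligned AVX-512 loads and stores such as vmovdqa64 on a zmm operand need 64-byte-aligned addresses. Below is a minimal C11 sketch of a 64-byte-aligned allocator. The function name is hypothetical, and this is not how libavutil/mem.c is written. C11's aligned_alloc expects the size to be a multiple of the alignment, so the size is rounded up first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 64 bytes = one ZMM register */
#define SIMD_ALIGN 64

/* Hypothetical allocator returning 64-byte-aligned memory.
 * The size is rounded up to a multiple of SIMD_ALIGN, as C11
 * aligned_alloc expects; a zero size is bumped to one unit. */
static void *alloc_simd(size_t size)
{
    size_t rounded = (size + SIMD_ALIGN - 1) & ~(size_t)(SIMD_ALIGN - 1);
    return aligned_alloc(SIMD_ALIGN, rounded ? rounded : SIMD_ALIGN);
}
```

Memory from alloc_simd is released with the ordinary free().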

[FFmpeg-devel] [PATCH 11/11] avcodec/lossless_videodsp: add AVX-512 version of add_bytes

2017-11-09 Thread James Darnley
--- libavcodec/x86/lossless_videodsp.asm| 5 + libavcodec/x86/lossless_videodsp_init.c | 5 + 2 files changed, 10 insertions(+) diff --git a/libavcodec/x86/lossless_videodsp.asm b/libavcodec/x86/lossless_videodsp.asm index ba4d4f0153..5649348f86 100644 --- a/libavcodec/x86/lossless_v

[FFmpeg-devel] [PATCH 07/11] checkasm: support for AVX-512 functions

2017-11-09 Thread James Darnley
--- tests/checkasm/checkasm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index b8b0e32dbd..9fb1438bdb 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -192,6 +192,7 @@ static const struct { { "FMA3", "

[FFmpeg-devel] [PATCH 10/11] avcodec/blockdsp: add AVX-512 version of clear_block(s)

2017-11-09 Thread James Darnley
From: James Darnley Also adjust alignment requirements where necessary. --- Whether this patch is committed or not, the change to 4xm.c should be picked to master because the alignment is wrong for the AVX version of this function. I assume it hasn't been noticed yet because it manages to
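
The alignment remark can be made concrete in C. One block of 64 int16_t coefficients is 128 bytes, exactly two 64-byte ZMM stores. A clear_blocks-style function zeroes six such blocks. Here, C11 _Alignas(64) stands in for a DECLARE_ALIGNED(64, ...) declaration. The container struct is hypothetical. A buffer declared with less alignment than the store width is exactly what an aligned SIMD store can fault on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container: six 8x8 coefficient blocks, 64-byte aligned.
 * Each block is 128 bytes, so every block[i] is also 64-byte aligned. */
typedef struct {
    _Alignas(64) int16_t blocks[6][64];
} MacroblockCoeffs;

/* Scalar equivalents of clear_block / clear_blocks. */
static void clear_block(int16_t *block)
{
    memset(block, 0, 64 * sizeof(*block));
}

static void clear_blocks(int16_t *blocks)
{
    memset(blocks, 0, 6 * 64 * sizeof(*blocks));
}
```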

[FFmpeg-devel] [PATCH 06/11] x86inc: AVX-512 support

2017-11-09 Thread James Darnley
From: Henrik Gramner AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs: * AVX-512 Foundation (F) * AVX-512 Conflict Detect

[FFmpeg-devel] [PATCH 03/11] avutil: detect when AVX-512 is available

2017-11-09 Thread James Darnley
--- I've changed this patch slightly because I discovered that it would cause an illegal instruction exception on much older processors (probably all without AVX). I was running xgetbv() almost unconditionally. Now it is a little more like what is in the x264 patch. libavutil/x86/cpu.c | 12 +++
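
The ordering problem described above is easy to show as a pure C decision function. XGETBV may only run once CPUID reports OSXSAVE. The mask values come from Intel's documentation: XCR0 bits 1, 2, 5, 6 and 7 cover SSE, AVX, opmask, ZMM_Hi256 and Hi16_ZMM state, and CPUID leaf 7 EBX bits 16, 17, 28, 30 and 31 are AVX512 F, DQ, CD, BW and VL. Function and type names are hypothetical. FFmpeg's real check lives in libavutil/x86/cpu.c and also depends on the AVX2 result. The fake XGETBV below is a stand-in that counts calls so the ordering can be checked.

```c
#include <assert.h>
#include <stdint.h>

#define OSXSAVE_BIT     (1u << 27)   /* CPUID.1:ECX */
#define XCR0_AVX512     0xE6u        /* SSE|AVX|opmask|ZMM_Hi256|Hi16_ZMM */
#define AVX512_BASELINE 0xD0030000u  /* CPUID.(7,0):EBX F|DQ|CD|BW|VL */

/* Caller-supplied wrapper around the XGETBV instruction (ECX = 0). */
typedef uint64_t (*xgetbv_fn)(void);

static int avx512_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                         xgetbv_fn read_xcr0)
{
    if (!(cpuid1_ecx & OSXSAVE_BIT))
        return 0;                 /* XGETBV would be an illegal instruction */
    if ((read_xcr0() & XCR0_AVX512) != XCR0_AVX512)
        return 0;                 /* OS does not save opmask/ZMM state */
    return (cpuid7_ebx & AVX512_BASELINE) == AVX512_BASELINE;
}

/* Stand-in for testing: pretends all relevant XCR0 bits are set. */
static int xgetbv_calls;
static uint64_t fake_xgetbv(void)
{
    xgetbv_calls++;
    return 0xE7;
}
```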

[FFmpeg-devel] [PATCH 09/11] avcodec/blockdsp: roll-up x86asm preprocessor loop

2017-11-09 Thread James Darnley
From: James Darnley --- libavcodec/x86/blockdsp.asm | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm index 9d203df8f5..9d0e8a3242 100644 --- a/libavcodec/x86/blockdsp.asm +++ b/libavcodec/x86/blockdsp.asm

[FFmpeg-devel] [PATCH 02/11] avutil: add AVX-512 flags

2017-11-09 Thread James Darnley
--- libavutil/cpu.c | 6 +- libavutil/cpu.h | 1 + libavutil/tests/cpu.c | 1 + libavutil/x86/cpu.h | 2 ++ 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/libavutil/cpu.c b/libavutil/cpu.c index c8401b8258..6548cc3042 100644 --- a/libavutil/cpu.c +++ b/libavutil/cp

[FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-09 Thread James Darnley
--- libavcodec/x86/v210enc.asm| 5 + libavcodec/x86/v210enc_init.c | 7 +++ 2 files changed, 12 insertions(+) diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm index 965f2bea3c..5068af27f8 100644 --- a/libavcodec/x86/v210enc.asm +++ b/libavcodec/x86/v210enc.asm @@ -
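
For context on what a "10-bit line pack" produces, here is a scalar C sketch of the v210 layout. Three 10-bit components go into each little-endian 32-bit word, at bits 0-9, 10-19 and 20-29. Six 4:2:2 pixels (6 Y, 3 Cb, 3 Cr) fill four words, or 16 bytes. The function names are hypothetical. This sketch only masks each value to 10 bits. The real encoder also clips samples into range, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Three 10-bit values into one word: bits 0-9, 10-19, 20-29. */
static uint32_t pack3(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint32_t)(a & 0x3FF)
         | (uint32_t)(b & 0x3FF) << 10
         | (uint32_t)(c & 0x3FF) << 20;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Pack one group of six 4:2:2 pixels into 16 bytes of v210. */
static void v210_pack6(uint8_t dst[16], const uint16_t y[6],
                       const uint16_t u[3], const uint16_t v[3])
{
    write_le32(dst +  0, pack3(u[0], y[0], v[0]));
    write_le32(dst +  4, pack3(y[1], u[1], y[2]));
    write_le32(dst +  8, pack3(v[1], y[3], u[2]));
    write_le32(dst + 12, pack3(y[4], v[2], y[5]));
}
```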

[FFmpeg-devel] [PATCH 05/11] avcodec: add stride alignment needed for AVX-512

2017-11-09 Thread James Darnley
--- configure | 2 ++ libavcodec/internal.h | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/configure b/configure index 146a87324c..fce8030d91 100755 --- a/configure +++ b/configure @@ -1886,6 +1886,7 @@ ARCH_FEATURES=" local_aligned simd_align_16
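
Stride alignment means rounding each line size up to a multiple of the alignment, so that every row starts on a 64-byte boundary whenever the plane does. The rounding is the same bit trick as FFmpeg's FFALIGN macro. The function name below is hypothetical, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical: padded line size for a row of `width_bytes` bytes. */
static size_t linesize_for(size_t width_bytes, size_t stride_align)
{
    return ALIGN_UP(width_bytes, stride_align);
}
```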

Re: [FFmpeg-devel] [PATCH 10/11] avcodec/blockdsp: add AVX-512 version of clear_block(s)

2017-11-10 Thread James Darnley
On 2017-11-09 20:35, Martin Vignali wrote: > 2017-11-09 12:58 GMT+01:00 James Darnley : > >> From: James Darnley >> >> Also adjust alignment requirements where nessecary. >> --- >> Whether this patch is committed or not the change to 4xm.c should be >>

Re: [FFmpeg-devel] [PATCH 11/11] avcodec/lossless_videodsp: add AVX-512 version of add_bytes

2017-11-10 Thread James Darnley
On 2017-11-09 20:43, Martin Vignali wrote: > 2017-11-09 20:37 GMT+01:00 Martin Vignali : >> lgtm >> >> Can you post your checkasm benchmark result for this ? Yep > $ ./tests/checkasm/checkasm --bench --test=llviddsp > benchmarking with native FFmpeg timers > nop: 26.0 > checkasm: using random see

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-10 Thread James Darnley
On 2017-11-09 20:42, Martin Vignali wrote: > I doesn't want to block this patch, but > like you say (in your previous version), that this version is not faster, > i'm not sure, it's interesting to apply it. > You already made "real" avx512 version for other funcs, in order to check > the rest of yo

Re: [FFmpeg-devel] [PATCH 05/11] avcodec: add stride alignment needed for AVX-512

2017-11-10 Thread James Darnley
On 2017-11-10 02:38, James Almer wrote: > On 11/9/2017 8:58 AM, James Darnley wrote: >> --- >> configure | 2 ++ >> libavcodec/internal.h | 4 +++- >> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/configure b/configure

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-10 Thread James Darnley
On 2017-11-10 14:32, James Darnley wrote: > I mentioned previously that using ZMM registers will cause the CPU to > reduce its frequency. > > Gramner said on IRC that a user should spend 20-30% of time in > AVX-512/ZMM code for it to be a net gain in speed. > From ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avfilter/vf_threshold: add x86 SIMD

2017-11-12 Thread James Darnley
On 2017-11-12 21:15, Rostislav Pehlivanov wrote: > On 12 November 2017 at 19:15, Paul B Mahol wrote: > +movam7, [pb_128] >> +addinq, wq >> +add thresholdq, wq >> +add minq, wq >> +add maxq, wq >> +add outq, wq >> +neg wq >> +.ne
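
The quoted asm uses two common idioms that a scalar C model makes explicit. The model assumes the filter's documented behaviour: min is picked where the input is below the threshold, otherwise max. The first idiom is the pointer setup: every pointer is advanced by the width, and a negated width counts up to zero. A single register then serves as both loop counter and offset. The second idiom is the pb_128 constant, presumably used because pcmpgtb compares signed bytes. XOR with 0x80 maps unsigned 0..255 onto signed -128..127 and keeps the ordering, which is the same as subtracting 128. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void threshold_row(const uint8_t *in, const uint8_t *thr,
                          const uint8_t *min, const uint8_t *max,
                          uint8_t *out, ptrdiff_t w)
{
    /* point one past the end and index with a negative offset */
    in += w; thr += w; min += w; max += w; out += w;
    for (ptrdiff_t x = -w; x < 0; x++) {
        /* signed view of each byte, equal to (int8_t)(b ^ 0x80) */
        int a = (int)in[x] - 128;
        int t = (int)thr[x] - 128;
        out[x] = (t > a) ? min[x] : max[x];   /* in < threshold -> min */
    }
}
```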

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-13 Thread James Darnley
On 2017-11-10 22:13, James Darnley wrote: > The IRC log should appear at the link below. >> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-November/004651.html Of course when I try to predict what number an email will get based on the past few it ends up being out of order. T

[FFmpeg-devel] [PATCH] configure: add audio_frame_queue dependency for aptx codec

2017-11-19 Thread James Darnley
--- configure | 2 ++ 1 file changed, 2 insertions(+) diff --git a/configure b/configure index 8b7b7e164b..48761934be 100755 --- a/configure +++ b/configure @@ -2439,6 +2439,8 @@ amv_encoder_select="aandcttables jpegtables mpegvideoenc" ape_decoder_select="bswapdsp llauddsp" apng_decoder_select

Re: [FFmpeg-devel] [PATCH 03/11] avutil: detect when AVX-512 is available

2017-11-20 Thread James Darnley
On 2017-11-10 03:11, James Almer wrote: > On 11/9/2017 8:58 AM, James Darnley wrote: >> @@ -154,6 +155,13 @@ int ff_get_cpu_flags_x86(void) >> if (ebx & 0x0100) >> rval |= AV_CPU_FLAG_BMI2; >> } >> +#if HAVE_AVX512 /*

[FFmpeg-devel] [PATCH 5/8] lavc/x86/flac_dsp_gpl: cosmetic whitespace alignment

2017-11-26 Thread James Darnley
--- libavcodec/x86/flac_dsp_gpl.asm | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm index 4d212ed212..952fc8b86b 100644 --- a/libavcodec/x86/flac_dsp_gpl.asm +++ b/libav

[FFmpeg-devel] [PATCH 0/8] left-overs of an ancient patch set for the flac encoder

2017-11-26 Thread James Darnley
benchmarking I originally did a little less useful because both types of the lpc coder are used for both sample depths (16 and 24). That does make the 32-bit version more useful though because it gets used with 16-bit samples when the intermediates overflow 32 bits. James Darnley (8): avcodec/flac
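
A worst-case bound shows why 16-bit samples can still need the 32-bit (wide-accumulator) coder. The prediction sum over `order` terms can exceed int32_t once sample depth, coefficient precision and order are large enough. The check below is a hypothetical illustration, not the selection rule FFmpeg's encoder actually uses. It assumes |sample| <= 2^(bps-1) and |coef| <= 2^(prec-1). The largest possible term is the product of two minimum values, so the full sum and every partial sum are bounded by order times that product.

```c
#include <assert.h>
#include <stdint.h>

/* Can sum_{j<order} coef[j] * smp[i-j-1] exceed int32_t? */
static int lpc_fits_int32(int bps, int coef_prec, int order)
{
    int64_t worst = (int64_t)order << (bps - 1 + coef_prec - 1);
    return worst <= INT32_MAX;
}
```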

[FFmpeg-devel] [PATCH 4/8] avcodec/flac: partially unroll loop in flac_enc_lpc_32

2017-11-26 Thread James Darnley
Now does 6 samples per iteration, up from 2. From 1.6 to 2.1 times faster again. 2.5 to 3.9 times faster overall. Runtime is reduced by a further 4 to 17%. Reduced by 9 to 65% overall. Same conditions as previously. --- libavcodec/x86/flac_dsp_gpl.asm | 30 +- 1 fil
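
"Samples per iteration" refers to how many residuals the outer loop produces at once. Here is a C sketch of the LPC residual with 64-bit accumulation, first in plain form and then unrolled to two samples per iteration with a scalar tail. The function names are hypothetical, and the asm in the patch unrolls further (6 samples). The order limit of 32 is the one documented in patch 1/8. Each coefficient is loaded once and used for both outputs, which is where the gain comes from.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LPC_ORDER 32

/* Plain form: one residual per outer iteration. */
static void lpc_residual(int32_t *res, const int32_t *smp, int len,
                         int order, const int32_t *coefs, int shift)
{
    assert(order >= 1 && order <= MAX_LPC_ORDER);
    for (int i = 0; i < order && i < len; i++)
        res[i] = smp[i];                          /* warm-up samples */
    for (int i = order; i < len; i++) {
        int64_t s = 0;
        for (int j = 0; j < order; j++)
            s += (int64_t)coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (int32_t)(s >> shift);
    }
}

/* Same result, two residuals per outer iteration plus an odd tail. */
static void lpc_residual_x2(int32_t *res, const int32_t *smp, int len,
                            int order, const int32_t *coefs, int shift)
{
    assert(order >= 1 && order <= MAX_LPC_ORDER);
    int i;
    for (i = 0; i < order && i < len; i++)
        res[i] = smp[i];
    for (i = order; i + 1 < len; i += 2) {
        int64_t s0 = 0, s1 = 0;
        for (int j = 0; j < order; j++) {
            int64_t c = coefs[j];                 /* loaded once, used twice */
            s0 += c * smp[i - j - 1];
            s1 += c * smp[i - j];
        }
        res[i]     = smp[i]     - (int32_t)(s0 >> shift);
        res[i + 1] = smp[i + 1] - (int32_t)(s1 >> shift);
    }
    if (i < len) {                                /* odd number left over */
        int64_t s = 0;
        for (int j = 0; j < order; j++)
            s += (int64_t)coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (int32_t)(s >> shift);
    }
}
```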

[FFmpeg-devel] [PATCH 2/8] avcodec/flac: add AVX2 version of the 16-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4 version, runtime is reduced by 0.5 to 20%. After a bug fix long, long ago in e609cfd697, the 16-bit lpc encoder is used so little that the runtime reduction is no longer correct. The function itself is around 2 times faster. (As one might expect for doing twice as many sam
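
The gap between "2 times faster" and a much smaller runtime reduction follows from Amdahl's law. Assume "runtime" means total encoding time, f is the fraction of it spent in the function, and s is the function's speed-up:

```latex
T_{\text{new}} = T\left[(1 - f) + \frac{f}{s}\right]
\quad\Rightarrow\quad
\frac{T - T_{\text{new}}}{T} = f\left(1 - \frac{1}{s}\right)
```

With s of about 2 the reduction is about f/2. On that reading, the 0.5% to 20% range corresponds to the function taking roughly 1% to 40% of the total runtime.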

[FFmpeg-devel] [PATCH 8/8] checkasm: add tests for flacenc lpc coder

2017-11-26 Thread James Darnley
--- tests/checkasm/flacdsp.c | 72 1 file changed, 72 insertions(+) diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c index dccb54d672..08e5e264ea 100644 --- a/tests/checkasm/flacdsp.c +++ b/tests/checkasm/flacdsp.c @@ -20,13 +20,16

[FFmpeg-devel] [PATCH 3/8] avcodec/flac: add SSE4.2 version of the 32-bit lpc encoder

2017-11-26 Thread James Darnley
From 1.3 to 2.5 times faster. Runtime reduced by 4 to 58%. As with the 16-bit version, the speed-up generally increases with compression_level. Also like the 16-bit version, it is not used with levels less than 3. After this bug fix long, long ago in e609cfd697, this 32-bit lpc encoder is heav

[FFmpeg-devel] [PATCH 1/8] avcodec/flac: document limitations of the LPC encoder

2017-11-26 Thread James Darnley
State that the maximum value of order is 32. This limit is used in both C and x86 assembly code. --- libavcodec/flacdsp.h | 8 1 file changed, 8 insertions(+) diff --git a/libavcodec/flacdsp.h b/libavcodec/flacdsp.h index 7bb0dd0e9a..90fd3f04b5 100644 --- a/libavcodec/flacdsp.h +++ b/lib

[FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread James Darnley
Around 1.1 times faster and reduces runtime by up to 6%. --- libavcodec/x86/flac_dsp_gpl.asm | 91 - 1 file changed, 72 insertions(+), 19 deletions(-) diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm index 952fc8b86b..91989ce56

[FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The function itself is around 2 times faster. --- libavcodec/x86/flac_dsp_gpl.asm | 56 +++-- libavcodec/x86/flacdsp_init.c | 5 +++- 2 files changed, 47 insertions(+), 14 deletions(-) dif

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:13, Rostislav Pehlivanov wrote: > On 26 November 2017 at 22:51, James Darnley wrote: >> @@ -123,7 +123,10 @@ RET >> %endmacro >> >> %macro PMINSQ 3 >> -pcmpgtq %3, %2, %1 >> +mova%3, %2 >> +; We cannot use the
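Before AVX-512 there is no packed signed 64-bit minimum instruction (`vpminsq` arrives with AVX-512F). A macro like `PMINSQ` therefore has to build one from a `pcmpgtq` compare and a select. The exact sequence in the patch is cut off in the quote above, so the following C model shows only the generic compare-and-blend technique. The `v2q` type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an xmm register holding two int64 lanes. */
typedef struct { int64_t q[2]; } v2q;

/* Per-lane signed compare, the role pcmpgtq plays:
 * all ones (-1) where a > b, zero elsewhere. */
static v2q cmpgt_q(v2q a, v2q b)
{
    v2q m;
    for (int k = 0; k < 2; k++)
        m.q[k] = -(int64_t)(a.q[k] > b.q[k]);
    return m;
}

/* Bitwise select: take b where the mask is set, a elsewhere. */
static v2q blend_q(v2q a, v2q b, v2q m)
{
    v2q r;
    for (int k = 0; k < 2; k++)
        r.q[k] = (b.q[k] & m.q[k]) | (a.q[k] & ~m.q[k]);
    return r;
}

/* min(a, b) per lane: where a > b the smaller value is b. */
static v2q pminsq_emul(v2q a, v2q b)
{
    return blend_q(a, b, cmpgt_q(a, b));
}
```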

Re: [FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:17, Rostislav Pehlivanov wrote: > On 26 November 2017 at 22:51, James Darnley wrote: >> @@ -152,13 +152,13 @@ RET >> %macro FUNCTION_BODY_32 0 >> >> %if ARCH_X86_64 >> -cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs &

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-11-27 Thread James Darnley
On 2017-11-27 20:19, Martin Vignali wrote: > +%macro VBROADCASTI128 2 ; dst xmm/ymm, src : 128bits val > +%if mmsize == 32 > +vbroadcasti128 %1, %2 > +%else > +mova %1, %2 > +%endif > +%endmacro If the condition was made "mmsize > 16" would this work correctly for zmm registers?
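The data movement that the quoted `VBROADCASTI128` macro performs can be modelled in C. One 16-byte value is copied into every 16-byte lane of an `mmsize`-byte register. For `mmsize == 16` this reduces to the plain `mova` load. The hypothetical `broadcast_128` below models only the result for 16, 32 and 64 bytes. It says nothing about which instruction forms the assembler accepts for zmm operands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the 16-byte src into each 16-byte lane of an mmsize-byte dst. */
static void broadcast_128(uint8_t *dst, const uint8_t src[16], int mmsize)
{
    assert(mmsize == 16 || mmsize == 32 || mmsize == 64);
    for (int off = 0; off < mmsize; off += 16)
        memcpy(dst + off, src, 16);
}
```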

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-12-02 Thread James Darnley
On 2017-11-27 17:50, Henrik Gramner wrote: > On Sun, Nov 26, 2017 at 11:51 PM, James Darnley > wrote: >> -pd_0_int_min: times 2 dd 0, -2147483648 >> -pq_int_min: times 2 dq -2147483648 >> -pq_int_max: times 2 dq 2147483647 >> +pd_0_int_min: times 4 dd

Re: [FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-03 Thread James Darnley
On 2017-12-03 19:30, Martin Vignali wrote: > libavfilter/x86/vf_threshold.asm| 19 ++- > libavfilter/x86/vf_threshold_init.c | 34 -- > 2 files changed, 34 insertions(+), 19 deletions(-) > > diff --git a/libavfilter/x86/vf_threshold.asm > b/lib

[FFmpeg-devel] [PATCH 6/7] x86inc: AVX-512 support

2017-12-21 Thread James Darnley
From: Henrik Gramner AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs: * AVX-512 Foundation (F) * AVX-512 Conflict Detect

[FFmpeg-devel] [PATCH 3/7] avutil: detect when AVX-512 is available

2017-12-21 Thread James Darnley
--- libavutil/x86/cpu.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c index f33088c8c7..696f47b3bf 100644 --- a/libavutil/x86/cpu.c +++ b/libavutil/x86/cpu.c @@ -97,6 +97,7 @@ int ff_get_cpu_flags_x86(void) int max_
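Detecting AVX-512 needs two checks. First, the CPU must report the feature bits in CPUID leaf 7 (subleaf 0) EBX. Second, the OS must have enabled the opmask and ZMM register state in XCR0, which is read with `xgetbv`. The minimal C sketch below takes the register values as arguments, so it runs without executing `cpuid`. The macro and function names are hypothetical. Treating F, CD, BW, DQ and VL as the single baseline group is an assumption, because the x86inc description in patch 6/7 is truncated after CD.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 7, subleaf 0, EBX feature bits. */
#define EBX_AVX512F  (1u << 16)
#define EBX_AVX512DQ (1u << 17)
#define EBX_AVX512CD (1u << 28)
#define EBX_AVX512BW (1u << 30)
#define EBX_AVX512VL (1u << 31)

/* Assumed baseline group (see lead-in); equals 0xD0030000. */
#define EBX_AVX512_BASELINE (EBX_AVX512F | EBX_AVX512CD | EBX_AVX512BW | \
                             EBX_AVX512DQ | EBX_AVX512VL)

/* XCR0 state components the OS must enable: SSE (bit 1), AVX upper
 * halves (bit 2), opmask (bit 5), upper ZMM0-15 (bit 6), ZMM16-31 (bit 7). */
#define XCR0_AVX512_STATE 0xE6u

/* Hypothetical check: returns 1 only if both the OS and the CPU
 * support every extension in the baseline group. */
static int has_avx512_baseline(uint32_t cpuid7_ebx, uint64_t xcr0)
{
    if ((xcr0 & XCR0_AVX512_STATE) != XCR0_AVX512_STATE)
        return 0; /* registers not saved/restored by the OS */
    return (cpuid7_ebx & EBX_AVX512_BASELINE) == EBX_AVX512_BASELINE;
}
```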

[FFmpeg-devel] [PATCH 5/7] avcodec: add stride alignment needed for AVX-512

2017-12-21 Thread James Darnley
--- configure | 2 ++ libavcodec/internal.h | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/configure b/configure index 07fb825f91..d3187d71ed 100755 --- a/configure +++ b/configure @@ -1892,6 +1892,7 @@ ARCH_FEATURES=" local_aligned simd_align_16

[FFmpeg-devel] [PATCH 1/7] configure: test whether x86 assembler supports AVX-512

2017-12-21 Thread James Darnley
--- configure | 5 + 1 file changed, 5 insertions(+) diff --git a/configure b/configure index d09eec4155..07fb825f91 100755 --- a/configure +++ b/configure @@ -411,6 +411,7 @@ Optimization options (experts only): --disable-fma3 disable FMA3 optimizations --disable-fma4

[FFmpeg-devel] [PATCH 7/7] checkasm: support for AVX-512 functions

2017-12-21 Thread James Darnley
--- tests/checkasm/checkasm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 45a70aa87f..ff0ca5b68d 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -204,6 +204,7 @@ static const struct { { "FMA3", "

[FFmpeg-devel] [PATCH 0/7] AVX-512 support (v.3)

2017-12-21 Thread James Darnley
I have addressed all the comments raised in the previous threads. While some patches were okayed last time, I am still sending them as part of these to give everyone a final chance to see them again and to object if they wish. Henrik Gramner (1): x86inc: AVX-512 support James Darnley (6

[FFmpeg-devel] [PATCH 2/7] avutil: add AVX-512 flags

2017-12-21 Thread James Darnley
--- Changelog | 1 + doc/APIchanges| 3 +++ libavutil/cpu.c | 6 +- libavutil/cpu.h | 1 + libavutil/tests/cpu.c | 1 + libavutil/version.h | 2 +- libavutil/x86/cpu.h | 2 ++ 7 files changed, 14 insertions(+), 2 deletions(-) diff --git a/Changelog b/Change

[FFmpeg-devel] [PATCH 4/7] avutil: add alignment needed for AVX-512

2017-12-21 Thread James Darnley
--- libavutil/mem.c | 2 +- libavutil/x86/cpu.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/libavutil/mem.c b/libavutil/mem.c index 6ad409daf4..79e8b597f1 100644 --- a/libavutil/mem.c +++ b/libavutil/mem.c @@ -61,7 +61,7 @@ void free(void *ptr); #include "mem_inte
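ZMM registers are 64 bytes wide. Aligned vector loads and stores fault on addresses that are not aligned to the operand size. Buffers and strides used with full-width AVX-512 accesses therefore need 64-byte alignment. The values actually changed in `libavutil/mem.c` and `libavcodec/internal.h` are cut off in the snippets, so 64 here is an assumption based on the register width. `align_up`, `align_ptr` and `SIMD_ALIGN` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 64 /* assumption: width of one zmm register in bytes */

/* Round n up to a multiple of align; align must be a power of two. */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Round a pointer up to the next align-byte boundary. The caller must
 * have over-allocated by at least align - 1 bytes. */
static uint8_t *align_ptr(uint8_t *p, size_t align)
{
    uintptr_t u = (uintptr_t)p;
    u = (u + align - 1) & ~(uintptr_t)(align - 1);
    return (uint8_t *)u;
}
```

Padding the stride to a multiple of the alignment keeps every row aligned once the first row is aligned. For example, a 1000-byte row gets a 1024-byte stride.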

Re: [FFmpeg-devel] [PATCH 0/7] AVX-512 support (v.3)

2017-12-21 Thread James Darnley
On 2017-12-21 15:06, Carl Eugen Hoyos wrote: > 2017-12-21 14:40 GMT+01:00 James Darnley : >> I have addressed all the comments raised in the previous threads. >> While some patches were okayed last time I am still sending them >> as part of these to give everyone a final cha
