to prevent confusion.
We could just change them to "nasm" and be done. We could provide compatibility
options. We could adopt Libav's generic "x86asm".
James Darnley (1):
configure: require NASM version 2.11 or newer for external x86
assembly
configure | 17 +
On 2017-06-09 13:41, Ivan Kalvachev wrote:
> On 6/9/17, Michael Niedermayer wrote:
>> seems this breaks build with mingw64, didn't investigate but it
>> fails with these errors:
>>
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation trunc
still have a small optimisation to make and I need to use the correct
coefficients. This will require a large change to the macros. I am sending
this so that people can nitpick my changes.
James Darnley (5):
avcodec/x86: cleanup simple_idct10
avcodec/x86: add x86-64 8-bit simple_idct function
---
libavcodec/x86/simple_idct10_template.asm | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/libavcodec/x86/simple_idct10_template.asm
b/libavcodec/x86/simple_idct10_template.asm
index c0d1637ca2..3f398985a5 100644
--- a/libavcodec/x86/simple_idct10_template.
Use named arguments for the functions so we can remove a define. The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 ++--
libavcodec/x86/simple_i
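
The sign-extension remark in the commit message above can be illustrated with a small C model. This is only a sketch, not code from the patch: the function names are hypothetical, and the "not extended" case is modelled as zero-extension, whereas on real hardware the upper 32 bits of a register holding an int argument are simply unspecified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit int stride placed in a 64-bit register
 * and used as an address offset WITHOUT sign extension (modelled here
 * as zero-extension of the low 32 bits). */
static int64_t offset_zero_extended(int32_t stride)
{
    return (int64_t)(uint64_t)(uint32_t)stride;
}

/* The same stride after an explicit sign extension (what a movsxd
 * would do in the assembly). */
static int64_t offset_sign_extended(int32_t stride)
{
    return (int64_t)stride;
}

/* With a ptrdiff_t linesize the caller already passes a full-width
 * signed value, so no extra extension step is needed. */
static const uint8_t *next_row(const uint8_t *row, ptrdiff_t linesize)
{
    return row + linesize;
}
```

A negative stride of -16 becomes the large positive offset 4294967280 in the zero-extended model, while the sign-extended and ptrdiff_t versions step back 16 bytes as intended.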
Rounding contributed by Ronald S. Bultje
---
libavcodec/tests/x86/dct.c | 2 ++
libavcodec/x86/idctdsp_init.c| 19 +++
libavcodec/x86/simple_idct.h | 3 +++
libavcodec/x86/simple_idct10.asm | 8
4 files changed, 32 insertions(+)
diff --git a/libavcodec/te
---
libavcodec/x86/idctdsp_init.c| 2 ++
libavcodec/x86/simple_idct.h | 3 +++
libavcodec/x86/simple_idct10.asm | 23 +++
3 files changed, 28 insertions(+)
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 4b2145e478..1826d01e0e 100644
---
libavcodec/x86/idctdsp_init.c| 2 ++
libavcodec/x86/simple_idct.h | 3 ++
libavcodec/x86/simple_idct10.asm | 61
3 files changed, 66 insertions(+)
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 1826d01e0e..ca
On 2017-06-11 11:34, Ivan Kalvachev wrote:
> On 6/10/17, James Darnley wrote:
>> On 2017-06-09 13:41, Ivan Kalvachev wrote:
>>>
>>> const_*_edge is used in only one place in the code.
>>> Would you check if this patch fixes the issue?
>>>
>>>
kindly give their opinion on the 2nd and 6th patches in
particular I would greatly appreciate it.
Performance gain decoding an MPEG2 HD sample over the old MMX:
- Yorkfield: 210 to 224 fps
- Haswell: 387 to 426 fps
Would anyone like me to get some timer figures for the functions themselves?
James
Use named arguments for the functions so we can remove a define. The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 ++--
libavcodec/x86/simple_i
This makes it exact to the old MMX one, as reported by libavcodec/tests/dct.
---
libavcodec/x86/proresdsp.asm | 18 +
libavcodec/x86/simple_idct10.asm | 33 ++-
libavcodec/x86/simple_idct10_template.asm | 19 ++
3 fi
---
libavcodec/x86/idctdsp_init.c| 2 ++
libavcodec/x86/simple_idct.h | 3 ++
libavcodec/x86/simple_idct10.asm | 61
3 files changed, 66 insertions(+)
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 1826d01e0e..9d
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 +++
libavcodec/x86/simple_idct10_template.asm | 37 +--
3 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/
---
libavcodec/x86/idctdsp_init.c| 2 ++
libavcodec/x86/simple_idct.h | 3 +++
libavcodec/x86/simple_idct10.asm | 23 +++
3 files changed, 28 insertions(+)
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 4b2145e478..1826d01e0e 100644
Rounding contributed by Ronald S. Bultje
---
libavcodec/tests/x86/dct.c | 2 ++
libavcodec/x86/idctdsp_init.c| 19 +++
libavcodec/x86/simple_idct.h | 3 +++
libavcodec/x86/simple_idct10.asm | 8
4 files changed, 32 insertions(+)
diff --git a/libavcodec/te
On 2017-06-12 18:57, Michael Niedermayer wrote:
> On Mon, Jun 12, 2017 at 03:36:06PM +0200, James Darnley wrote:
>> Rounding contributed by Ronald S. Bultje
>> ---
>> libavcodec/tests/x86/dct.c | 2 ++
>> libavcodec/x86/idctdsp_init.c| 19 +++
On 2017-06-13 00:18, James Darnley wrote:
> On 2017-06-12 18:57, Michael Niedermayer wrote:
>> ./ffplay ~/videos/matrixbench_mpeg2.mpg
>> looks pretty bad
>
> If that would happen to be the FATE sample
> mpeg2/matrixbench_mpeg2.lq1.mpg then I see that too.
>
>
---
tests/fate/video.mak | 3 +++
tests/ref/fate/idct-simpleauto | 27 +++
2 files changed, 30 insertions(+)
create mode 100644 tests/ref/fate/idct-simpleauto
diff --git a/tests/fate/video.mak b/tests/fate/video.mak
index d1d35335f2..455c1d3564 100644
--- a/tes
r platforms use their
own functions for simpleauto.
I might follow this with a patch to cleanup idctdsp_init.c
James Darnley (6):
fate: add test of -idct simpleauto
avcodec/x86: cleanup simple_idct10
avcodec/x86: modify simple_idct10 macros to add an action parameter
avcodec/x86: allow future
Created by Ronald S. Bultje
---
libavcodec/x86/simple_idct10_template.asm | 38 +++
1 file changed, 38 insertions(+)
diff --git a/libavcodec/x86/simple_idct10_template.asm
b/libavcodec/x86/simple_idct10_template.asm
index d8ea0bcc6b..51baf84c82 100644
--- a/libavcodec
---
libavcodec/x86/proresdsp.asm | 18 ++
libavcodec/x86/simple_idct10.asm | 29 +
libavcodec/x86/simple_idct10_template.asm | 19 +++
3 files changed, 50 insertions(+), 16 deletions(-)
diff --git a/libavcodec/x86/p
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 +++
libavcodec/x86/simple_idct10_template.asm | 37 +--
3 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/
Use named arguments for the functions so we can remove a define. The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 ++--
libavcodec/x86/simple_i
Includes add/put functions
Rounding contributed by Ronald S. Bultje
---
libavcodec/tests/x86/dct.c | 2 +
libavcodec/x86/idctdsp_init.c| 23 ++
libavcodec/x86/simple_idct.h | 9
libavcodec/x86/simple_idct10.asm | 94
4 files ch
Includes add/put functions
Rounding contributed by Ronald S. Bultje
---
I must be stupid. I dropped the stack space change somewhere.
libavcodec/tests/x86/dct.c | 2 +
libavcodec/x86/idctdsp_init.c| 23 ++
libavcodec/x86/simple_idct.h | 9
libavcodec/x86/simple_idct
On 2017-06-16 03:58, Michael Niedermayer wrote:
> On Thu, Jun 15, 2017 at 05:08:33PM +0200, James Darnley wrote:
>> Includes add/put functions
>>
>> Rounding contributed by Ronald S. Bultje
>> ---
>> I must be stupid. I dropped the stack space change somewhere.
On 2017-06-16 12:48, Paul B Mahol wrote:
> On 6/16/17, James Darnley wrote:
>> On 2017-06-16 03:58, Michael Niedermayer wrote:
>>> there's something wrong with this
>>> it totally breaks this:
>>> make -j12 ffmpeg && ./ffmpeg -ss 1 -i cache:matrixbench
---
libavcodec/x86/mpegvideoenc_template.c | 47 +-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/mpegvideoenc_template.c
b/libavcodec/x86/mpegvideoenc_template.c
index 3ce72e1367..1201be514b 100644
--- a/libavcodec/x86/mpegvideoenc_t
---
libavcodec/x86/mpegvideoenc_template.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/mpegvideoenc_template.c
b/libavcodec/x86/mpegvideoenc_template.c
index b2512744ca..3ce72e1367 100644
--- a/libavcodec/x86/mpegvideoenc_template.c
+++ b/libavcodec/x
On 2017-06-16 17:48, Michael Niedermayer wrote:
> On Fri, Jun 16, 2017 at 03:53:28PM +0200, James Darnley wrote:
>> ---
>> libavcodec/x86/mpegvideoenc_template.c | 47
>> +-
>> 1 file changed, 46 insertions(+), 1 deletion(-)
>
On 2017-06-16 20:31, Michael Niedermayer wrote:
> On Thu, Jun 15, 2017 at 03:34:21PM +0200, James Darnley wrote:
>> ---
>> tests/fate/video.mak | 3 +++
>> tests/ref/fate/idct-simpleauto | 27 +++
>> 2 files changed, 30 insertions(+)
---
libavcodec/x86/mpegvideoenc_template.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/mpegvideoenc_template.c
b/libavcodec/x86/mpegvideoenc_template.c
index b2512744ca..3ce72e1367 100644
--- a/libavcodec/x86/mpegvideoenc_template.c
+++ b/libavcodec/x
since outdated and I added
a comment to the "DC-only hack" above it in the conditional section.
The other patches either fix or workaround bugs in other code.
James Darnley (9):
avcodec/x86/mpegenc: check IDCT permutation type is a valid value
avcodec/x86/mpegenc: support transpose
---
libavcodec/x86/mpegvideoenc_template.c | 47 +-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/mpegvideoenc_template.c
b/libavcodec/x86/mpegvideoenc_template.c
index 3ce72e1367..1201be514b 100644
--- a/libavcodec/x86/mpegvideoenc_t
---
libavcodec/mdec.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/libavcodec/mdec.c b/libavcodec/mdec.c
index 8e28aa04f0..97bfebbeb7 100644
--- a/libavcodec/mdec.c
+++ b/libavcodec/mdec.c
@@ -213,6 +213,9 @@ static av_cold int decode_init(AVCodecContext *avctx)
{
---
libavcodec/x86/proresdsp.asm | 18 ++
libavcodec/x86/simple_idct10.asm | 29 +
libavcodec/x86/simple_idct10_template.asm | 19 +++
3 files changed, 50 insertions(+), 16 deletions(-)
diff --git a/libavcodec/x86/p
Created by Ronald S. Bultje
---
libavcodec/x86/simple_idct10_template.asm | 38 +++
1 file changed, 38 insertions(+)
diff --git a/libavcodec/x86/simple_idct10_template.asm
b/libavcodec/x86/simple_idct10_template.asm
index d8ea0bcc6b..51baf84c82 100644
--- a/libavcodec
Use named arguments for the functions so we can remove a define. The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 ++--
libavcodec/x86/simple_i
Includes add/put functions
Rounding contributed by Ronald S. Bultje
---
libavcodec/tests/x86/dct.c| 2 +
libavcodec/x86/idctdsp_init.c | 23
libavcodec/x86/simple_idct.h | 9 +++
libavcodec/x86/simple_idct10.asm | 92 +++
---
libavcodec/x86/proresdsp.asm | 2 +-
libavcodec/x86/simple_idct10.asm | 8 +++
libavcodec/x86/simple_idct10_template.asm | 37 +--
3 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/
From: "Ronald S. Bultje"
Commit message by James Darnley
The shortcut is based on end-of-block positions. This leads to some
coefficients not being unquantized. This is the symptom of the bug.
A possible candidate for the real bug is the scan table used here in
unquantize does not
From: "Ronald S. Bultje"
---
libavcodec/mdec.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libavcodec/mdec.c b/libavcodec/mdec.c
index 97bfebbeb7..1e1c8f4c55 100644
--- a/libavcodec/mdec.c
+++ b/libavcodec/mdec.c
@@ -94,7 +94,7 @@ static inline int mdec_decode_block
They now match according to FATE, barring any further bugs with untested
parts
---
libavcodec/x86/idctdsp_init.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 9da60d1a1e..162560d411 100644
--- a/libavco
On 2017-06-20 00:08, Michael Niedermayer wrote:
> On Mon, Jun 19, 2017 at 05:10:58PM +0200, James Darnley wrote:
>> From: "Ronald S. Bultje"
>>
>> Commit message by James Darnley
>>
>> The shortcut is based on end-of-block positions. This leads to
On 2017-06-19 18:30, Michael Niedermayer wrote:
> On Mon, Jun 19, 2017 at 05:10:55PM +0200, James Darnley wrote:
>> ---
>> libavcodec/x86/mpegvideoenc_template.c | 47
>> +-
>> 1 file changed, 46 insertions(+), 1 deletion(-)
>
On 2017-06-19 20:30, Ronald S. Bultje wrote:
> Hi,
>
> On Mon, Jun 19, 2017 at 11:10 AM, James Darnley wrote:
>
>> Use named arguments for the functions so we can remove a define. The
>> stride/linesize argument is now ptrdiff_t type so we no longer need to
>
On 2017-06-19 20:31, Ronald S. Bultje wrote:
> Hi,
>
> On Mon, Jun 19, 2017 at 11:11 AM, James Darnley wrote:
>
>> ---
>> libavcodec/x86/proresdsp.asm | 2 +-
>> libavcodec/x86/simple_idct10.asm | 8 +++
>> libavcodec
On 2017-06-20 13:56, Ronald S. Bultje wrote:
> Hi,
>
> On Mon, Jun 19, 2017 at 11:11 AM, James Darnley wrote:
>
>> Created by Ronald S. Bultje
>> ---
>> libavcodec/x86/simple_idct10_template.asm | 38
>> +++
>> 1 file ch
On 2017-06-20 14:47, Ronald S. Bultje wrote:
> This allows using non-simple (e.g. simplemmx) IDCT implementations.
> The result is not bitexact (which is why the fate test continues to
> use -idct simple), but the PSNR between C/MMX goes from ~35 to ~90.
> ---
> libavcodec/mdec.c | 14 ++--
On 2017-06-20 18:16, Ronald S. Bultje wrote:
> On Tue, Jun 20, 2017 at 12:04 PM, James Darnley wrote:
>>> @@ -231,6 +230,13 @@ static av_cold int decode_init(AVCodecContext
>> *avctx)
>>> avctx->pix_fmt = AV_PIX_FMT_YUVJ420P;
>>>
On 2017-06-20 13:55, Ronald S. Bultje wrote:
> Hi,
>
> On Mon, Jun 19, 2017 at 11:11 AM, James Darnley wrote:
>
>> ---
>> libavcodec/x86/proresdsp.asm | 18 ++
>> libavcodec/x86/simple_idct10.asm | 29
>>
On 2017-06-19 17:11, James Darnley wrote:
> diff --git a/libavcodec/x86/simple_idct10_template.asm
> b/libavcodec/x86/simple_idct10_template.asm
> index 51baf84c82..02fd445ec0 100644
> --- a/libavcodec/x86/simple_idct10_template.asm
> +++ b/libavcodec/x86/simple_idct10_template.
On 2017-06-20 00:37, Michael Niedermayer wrote:
> Signed-off-by: Michael Niedermayer
> ---
> libavcodec/mpegvideo.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/mpegvideo.c b/libavcodec/mpegvideo.c
> index 63a30b93ce..e29558b3a2 100644
> --- a/libavcodec/mp
On 2017-10-31 04:30, Michael Niedermayer wrote:
> On Mon, Oct 30, 2017 at 02:08:33PM +0100, James Darnley wrote:
>> These changes were committed to x264 in b568a256 "Experimental nasm
>> support"
>> ---
>> libavutil/x86/x86inc.asm | 16 ++--
On 2017-11-06 21:15, James Almer wrote:
> On 11/6/2017 4:56 PM, James Darnley wrote:
>> Line 733 is the align command to align the start of the function. I
>> can't see why it fails here but not on any other function in that file
>> or any other file.
>>
>> A
hat actually use ZMM registers.
Henrik Gramner (1):
x86inc: AVX-512 support
James Darnley (10):
configure: test whether x86 assembler supports AVX-512
avutil: add AVX-512 flags
avutil: detect when AVX-512 is available
avutil: add alignment needed for AVX-512
avcodec: add stride alignmen
---
configure | 5 +
1 file changed, 5 insertions(+)
diff --git a/configure b/configure
index f396abda5b..146a87324c 100755
--- a/configure
+++ b/configure
@@ -406,6 +406,7 @@ Optimization options (experts only):
--disable-fma3 disable FMA3 optimizations
--disable-fma4
---
This patch gained the alignment increase in mem.c
libavutil/mem.c | 2 +-
libavutil/x86/cpu.c | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/libavutil/mem.c b/libavutil/mem.c
index 6ad409daf4..cdf539306f 100644
--- a/libavutil/mem.c
+++ b/libavutil/mem.c
@@ -61,7 +6
---
libavcodec/x86/lossless_videodsp.asm| 5 +
libavcodec/x86/lossless_videodsp_init.c | 5 +
2 files changed, 10 insertions(+)
diff --git a/libavcodec/x86/lossless_videodsp.asm
b/libavcodec/x86/lossless_videodsp.asm
index ba4d4f0153..5649348f86 100644
--- a/libavcodec/x86/lossless_v
---
tests/checkasm/checkasm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index b8b0e32dbd..9fb1438bdb 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -192,6 +192,7 @@ static const struct {
{ "FMA3", "
From: James Darnley
Also adjust alignment requirements where necessary.
---
Whether this patch is committed or not the change to 4xm.c should be picked to
master because the alignment is wrong for the AVX version of this function. I
assume it hasn't been noticed yet because it manages to
From: Henrik Gramner
AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
* AVX-512 Foundation (F)
* AVX-512 Conflict Detect
---
I've changed this patch slightly because I discovered that it would cause an
illegal instruction exception on much older processors (probably all without
AVX). I was running xgetbv() almost unconditionally. Now it is a little more
like what is in the x264 patch.
libavutil/x86/cpu.c | 12 +++
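
The xgetbv() problem described in the note above can be sketched as follows. This is a minimal model, not the code from the patch: the function, enum, and stub names are hypothetical, and the XCR0 read is passed in as a function pointer so the gating logic can be exercised without executing the instruction. The bit positions used (CPUID leaf 1 ECX bits 27 and 28 for OSXSAVE and AVX, XCR0 bits 1-2 for SSE/AVX state and bits 5-7 for the AVX-512 state) follow Intel's documentation.

```c
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */
#define XCR0_SSE_AVX       0x06u      /* XMM and YMM state enabled */
#define XCR0_AVX512        0xe0u      /* opmask, ZMM_Hi256, Hi16_ZMM */

enum { OS_AVX = 1, OS_AVX512 = 2 };   /* hypothetical result flags */

typedef uint32_t (*read_xcr0_fn)(void);

/* Only read XCR0 when CPUID reports that XGETBV is usable; running it
 * on a CPU without OSXSAVE raises an illegal instruction exception. */
static int os_vector_support(uint32_t cpuid1_ecx, read_xcr0_fn read_xcr0)
{
    const uint32_t need = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    int flags = 0;

    if ((cpuid1_ecx & need) != need)
        return 0;                     /* never touches XGETBV */

    uint32_t xcr0 = read_xcr0();
    if ((xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX) {
        flags |= OS_AVX;
        if ((xcr0 & XCR0_AVX512) == XCR0_AVX512)
            flags |= OS_AVX512;
    }
    return flags;
}

/* Stand-ins for the real XGETBV read, for illustration only. */
static int xcr0_reads;
static uint32_t stub_xcr0_avx_only(void) { xcr0_reads++; return 0x07; }
static uint32_t stub_xcr0_avx512(void)   { xcr0_reads++; return 0xe7; }
```

With a CPUID value that lacks OSXSAVE the reader is never called, which is the behaviour the revised patch is described as aiming for.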
From: James Darnley
---
libavcodec/x86/blockdsp.asm | 11 ---
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm
index 9d203df8f5..9d0e8a3242 100644
--- a/libavcodec/x86/blockdsp.asm
+++ b/libavcodec/x86/blockdsp.asm
---
libavutil/cpu.c | 6 +-
libavutil/cpu.h | 1 +
libavutil/tests/cpu.c | 1 +
libavutil/x86/cpu.h | 2 ++
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index c8401b8258..6548cc3042 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cp
---
libavcodec/x86/v210enc.asm| 5 +
libavcodec/x86/v210enc_init.c | 7 +++
2 files changed, 12 insertions(+)
diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
index 965f2bea3c..5068af27f8 100644
--- a/libavcodec/x86/v210enc.asm
+++ b/libavcodec/x86/v210enc.asm
@@ -
---
configure | 2 ++
libavcodec/internal.h | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/configure b/configure
index 146a87324c..fce8030d91 100755
--- a/configure
+++ b/configure
@@ -1886,6 +1886,7 @@ ARCH_FEATURES="
local_aligned
simd_align_16
On 2017-11-09 20:35, Martin Vignali wrote:
> 2017-11-09 12:58 GMT+01:00 James Darnley :
>
>> From: James Darnley
>>
>> Also adjust alignment requirements where necessary.
>> ---
>> Whether this patch is committed or not the change to 4xm.c should be
>>
On 2017-11-09 20:43, Martin Vignali wrote:
> 2017-11-09 20:37 GMT+01:00 Martin Vignali :
>> lgtm
>>
>> Can you post your checkasm benchmark result for this ?
Yep
> $ ./tests/checkasm/checkasm --bench --test=llviddsp
> benchmarking with native FFmpeg timers
> nop: 26.0
> checkasm: using random see
On 2017-11-09 20:42, Martin Vignali wrote:
> I don't want to block this patch, but
> as you say (in your previous version) that this version is not faster,
> I'm not sure it's worth applying.
> You already made "real" avx512 version for other funcs, in order to check
> the rest of yo
On 2017-11-10 02:38, James Almer wrote:
> On 11/9/2017 8:58 AM, James Darnley wrote:
>> ---
>> configure | 2 ++
>> libavcodec/internal.h | 4 +++-
>> 2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/configure b/configure
On 2017-11-10 14:32, James Darnley wrote:
> I mentioned previously that using ZMM registers will cause the CPU to
> reduce its frequency.
>
> Gramner said on IRC that a user should spend 20-30% of time in
> AVX-512/ZMM code for it to be a net gain in speed.
> From ffmpeg-devel
On 2017-11-12 21:15, Rostislav Pehlivanov wrote:
> On 12 November 2017 at 19:15, Paul B Mahol wrote:
> +mova m7, [pb_128]
>> +add inq, wq
>> +add thresholdq, wq
>> +add minq, wq
>> +add maxq, wq
>> +add outq, wq
>> +neg wq
>> +.ne
On 2017-11-10 22:13, James Darnley wrote:
> The IRC log should appear at the link below.
>> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-November/004651.html
Of course when I try to predict what number an email will get based on
the past few it ends up being out of order.
T
---
configure | 2 ++
1 file changed, 2 insertions(+)
diff --git a/configure b/configure
index 8b7b7e164b..48761934be 100755
--- a/configure
+++ b/configure
@@ -2439,6 +2439,8 @@ amv_encoder_select="aandcttables jpegtables mpegvideoenc"
ape_decoder_select="bswapdsp llauddsp"
apng_decoder_select
On 2017-11-10 03:11, James Almer wrote:
> On 11/9/2017 8:58 AM, James Darnley wrote:
>> @@ -154,6 +155,13 @@ int ff_get_cpu_flags_x86(void)
>> if (ebx & 0x0100)
>> rval |= AV_CPU_FLAG_BMI2;
>> }
>> +#if HAVE_AVX512 /*
---
libavcodec/x86/flac_dsp_gpl.asm | 40
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 4d212ed212..952fc8b86b 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libav
benchmarking I originally
did a little less useful because both types of the lpc coder are used for both
sample depths (16 and 24). That does make the 32-bit version more useful though
because it gets used with 16-bit samples when the intermediates overflow 32
bits.
James Darnley (8):
avcodec/flac
Now does 6 samples per iteration, up from 2.
From 1.6 to 2.1 times faster again. 2.5 to 3.9 times faster overall.
Runtime is reduced by a further 4 to 17%. Reduced by 9 to 65% overall.
Same conditions as previously.
---
libavcodec/x86/flac_dsp_gpl.asm | 30 +-
1 fil
When compared to the SSE4 version, runtime is reduced by 0.5 to 20%.
After a bug fix long, long ago in e609cfd697 the 16-bit lpc encoder is
used so little that the runtime reduction is no longer correct. The
function itself is around 2 times faster. (As one might expect for
doing twice as many sam
---
tests/checkasm/flacdsp.c | 72
1 file changed, 72 insertions(+)
diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c
index dccb54d672..08e5e264ea 100644
--- a/tests/checkasm/flacdsp.c
+++ b/tests/checkasm/flacdsp.c
@@ -20,13 +20,16
From 1.3 to 2.5 times faster. Runtime reduced by 4 to 58%. As with the
16-bit version the speed-up generally increases with compression_level.
Also like the 16-bit version, it is not used with levels less than 3.
After the bug fix long, long ago in e609cfd697 this 32-bit lpc
encoder is heav
State that the maximum value of order is 32. This limit is used in both
C and x86 assembly code.
---
libavcodec/flacdsp.h | 8
1 file changed, 8 insertions(+)
diff --git a/libavcodec/flacdsp.h b/libavcodec/flacdsp.h
index 7bb0dd0e9a..90fd3f04b5 100644
--- a/libavcodec/flacdsp.h
+++ b/lib
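
The order limit stated in the commit message above can be shown in a plain C sketch of an LPC residual calculation. This is an assumption-laden illustration, not FFmpeg's implementation: the function name and the MAX_LPC_ORDER macro are hypothetical, and the real function's signature and handling of warm-up samples may differ.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LPC_ORDER 32 /* limit stated in the commit message */

/* Hypothetical sketch: residual = sample - (prediction >> shift),
 * where the prediction is a weighted sum of the previous `order`
 * samples. The first `order` samples are copied unchanged. */
static void lpc_encode_sketch(int32_t *res, const int32_t *smp, int len,
                              int order,
                              const int32_t coefs[MAX_LPC_ORDER],
                              int shift)
{
    /* Both C and assembly versions may rely on this bound, for
     * example to size a fixed coefficient array. */
    assert(order >= 1 && order <= MAX_LPC_ORDER);

    for (int i = 0; i < order && i < len; i++)
        res[i] = smp[i];

    for (int i = order; i < len; i++) {
        int64_t pred = 0; /* 64-bit accumulator avoids overflow */
        for (int j = 0; j < order; j++)
            pred += (int64_t)coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (int32_t)(pred >> shift);
    }
}
```

For a linear ramp of samples and the order-2 coefficients {2, -1} with a shift of 0, every predicted sample matches the input, so the residual after the first two samples is zero.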
Around 1.1 times faster and reduces runtime by up to 6%.
---
libavcodec/x86/flac_dsp_gpl.asm | 91 -
1 file changed, 72 insertions(+), 19 deletions(-)
diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 952fc8b86b..91989ce56
When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The
function itself is around 2 times faster.
---
libavcodec/x86/flac_dsp_gpl.asm | 56 +++--
libavcodec/x86/flacdsp_init.c | 5 +++-
2 files changed, 47 insertions(+), 14 deletions(-)
dif
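
The two figures quoted in the message above (total runtime reduced by 1 to 26%, function itself around 2 times faster) can be related with Amdahl's law. The sketch below is a worked illustration under that assumption only; the function names are hypothetical and the thread does not state what share of runtime the function takes.

```c
/* Fraction of total runtime saved when code that accounts for
 * `fraction` of the runtime becomes `speedup` times faster
 * (Amdahl's law, assuming everything else is unchanged). */
static double runtime_reduction(double fraction, double speedup)
{
    return fraction * (1.0 - 1.0 / speedup);
}

/* Inverse: the share of runtime the function must have had for a
 * given overall reduction and function speedup. */
static double implied_fraction(double reduction, double speedup)
{
    return reduction / (1.0 - 1.0 / speedup);
}
```

Under this model a 2 times faster function yields a 26% reduction only if it accounted for about 52% of the runtime, and a 1% reduction corresponds to a share of about 2%.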
On 2017-11-27 00:13, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley wrote:
>> @@ -123,7 +123,10 @@ RET
>> %endmacro
>>
>> %macro PMINSQ 3
>> -pcmpgtq %3, %2, %1
>> +mova %3, %2
>> +; We cannot use the
On 2017-11-27 00:17, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley wrote:
>> @@ -152,13 +152,13 @@ RET
>> %macro FUNCTION_BODY_32 0
>>
>> %if ARCH_X86_64
>> -cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
On 2017-11-27 20:19, Martin Vignali wrote:
> +%macro VBROADCASTI128 2 ; dst xmm/ymm, src : 128bits val
> +%if mmsize == 32
> +vbroadcasti128 %1, %2
> +%else
> +mova %1, %2
> +%endif
> +%endmacro
If the condition was made "mmsize > 16" would this work correctly for
zmm registers?
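
The intent behind the question above can be modelled in scalar C: copy one 128-bit value into every 128-bit lane of a register of mmsize bytes. This is only a model of the desired result, with a hypothetical function name, and says nothing about which instruction the macro should emit. Intel's manual lists vbroadcasti128 with a ymm destination only, and the 512-bit broadcasts have separate mnemonics such as vbroadcasti32x4, so changing the condition alone may not be enough unless the macro layer handles the mapping.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 128-bit broadcast: `dst` holds
 * `mmsize` bytes (16 for xmm, 32 for ymm, 64 for zmm). */
static void broadcast128_model(uint8_t *dst, const uint8_t src[16],
                               int mmsize)
{
    if (mmsize > 16) {
        /* Wider registers: repeat the value in each 128-bit lane. */
        for (int lane = 0; lane < mmsize / 16; lane++)
            memcpy(dst + 16 * lane, src, 16);
    } else {
        /* xmm: a plain move is already a "broadcast". */
        memcpy(dst, src, 16);
    }
}
```

With mmsize set to 64 the model fills four lanes, which is the result a zmm version of the macro would need to produce.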
On 2017-11-27 17:50, Henrik Gramner wrote:
> On Sun, Nov 26, 2017 at 11:51 PM, James Darnley
> wrote:
>> -pd_0_int_min: times 2 dd 0, -2147483648
>> -pq_int_min: times 2 dq -2147483648
>> -pq_int_max: times 2 dq 2147483647
>> +pd_0_int_min: times 4 dd
On 2017-12-03 19:30, Martin Vignali wrote:
> libavfilter/x86/vf_threshold.asm| 19 ++-
> libavfilter/x86/vf_threshold_init.c | 34 --
> 2 files changed, 34 insertions(+), 19 deletions(-)
>
> diff --git a/libavfilter/x86/vf_threshold.asm
> b/lib
From: Henrik Gramner
AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
* AVX-512 Foundation (F)
* AVX-512 Conflict Detect
---
libavutil/x86/cpu.c | 12 ++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
index f33088c8c7..696f47b3bf 100644
--- a/libavutil/x86/cpu.c
+++ b/libavutil/x86/cpu.c
@@ -97,6 +97,7 @@ int ff_get_cpu_flags_x86(void)
int max_
---
configure | 2 ++
libavcodec/internal.h | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/configure b/configure
index 07fb825f91..d3187d71ed 100755
--- a/configure
+++ b/configure
@@ -1892,6 +1892,7 @@ ARCH_FEATURES="
local_aligned
simd_align_16
---
configure | 5 +
1 file changed, 5 insertions(+)
diff --git a/configure b/configure
index d09eec4155..07fb825f91 100755
--- a/configure
+++ b/configure
@@ -411,6 +411,7 @@ Optimization options (experts only):
--disable-fma3 disable FMA3 optimizations
--disable-fma4
---
tests/checkasm/checkasm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 45a70aa87f..ff0ca5b68d 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -204,6 +204,7 @@ static const struct {
{ "FMA3", "
I have addressed all the comments raised in the previous threads. While some
patches were okayed last time I am still sending them as part of these to give
everyone a final chance to see them again and to object if they wish.
Henrik Gramner (1):
x86inc: AVX-512 support
James Darnley (6
---
Changelog | 1 +
doc/APIchanges| 3 +++
libavutil/cpu.c | 6 +-
libavutil/cpu.h | 1 +
libavutil/tests/cpu.c | 1 +
libavutil/version.h | 2 +-
libavutil/x86/cpu.h | 2 ++
7 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/Changelog b/Change
---
libavutil/mem.c | 2 +-
libavutil/x86/cpu.c | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/libavutil/mem.c b/libavutil/mem.c
index 6ad409daf4..79e8b597f1 100644
--- a/libavutil/mem.c
+++ b/libavutil/mem.c
@@ -61,7 +61,7 @@ void free(void *ptr);
#include "mem_inte
On 2017-12-21 15:06, Carl Eugen Hoyos wrote:
> 2017-12-21 14:40 GMT+01:00 James Darnley :
>> I have addressed all the comments raised in the previous threads.
>> While some patches were okayed last time I am still sending them
>> as part of these to give everyone a final cha