From: Reimar Döffinger
The test can currently pass when _Pragma is not supported, since
_Pragma might be treated as an implicitly declared function.
This happens, e.g., with tinycc.
Extending the check to two pragmas both matches the actual use
better and avoids this misdetection.
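Here is a sketch of why two pragmas make the probe stricter. It assumes the check only compiles a small C snippet and does not link it. A compiler that treats _Pragma as an implicitly declared function accepts _Pragma("...") as an ordinary call expression. Two _Pragma operators in a row, however, form two call expressions with nothing between them, and that is a syntax error. The function name pragma_probe and the specific GCC diagnostic pragmas below are illustrative choices, not necessarily what configure uses.

```c
/* Hypothetical probe: compiles on a C99 compiler that implements the
 * _Pragma operator. If _Pragma were parsed as an implicitly declared
 * function, the two adjacent "calls" below would have no operator or
 * semicolon between them and the compile would fail. */
static int pragma_probe(void)
{
    _Pragma("GCC diagnostic push")
    _Pragma("GCC diagnostic ignored \"-Wunused-variable\"")
    int unused_probe_value = 0; /* warning suppressed by the pragma above */
    _Pragma("GCC diagnostic pop")
    return 1;
}
```

A single pragma would not catch the tinycc case, because one implicit "call" followed by a semicolon is still a valid statement to such a compiler.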
---
configure | 2 +-
From: Reimar Döffinger
Fixes compilation with tcc, which does not have aarch64
inline asm support.
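The patch body is truncated here, so the exact change is not visible. A common way to keep a file building on compilers without inline asm support is to gate the asm on a configure-provided macro and fall back to plain C. This is a minimal sketch of that pattern. HAVE_INLINE_ASM is defaulted locally so the example compiles on its own, which is an assumption. The helper pass_through is hypothetical.

```c
#include <stdint.h>

/* Normally provided by the generated config.h; defaulted here (assumption)
 * so the sketch is self-contained. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Hypothetical helper: moves a value through a register with aarch64
 * inline asm when available, otherwise uses plain C (e.g. for tcc). */
static uint64_t pass_through(uint64_t x)
{
#if HAVE_INLINE_ASM && defined(__aarch64__)
    uint64_t r;
    __asm__ ("mov %0, %1" : "=r"(r) : "r"(x));
    return r;
#else
    return x; /* fallback path when inline asm cannot be used */
#endif
}
```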
---
libavutil/aarch64/cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index bd780e8591..0d7c1e268d 100644
--- a/libavutil
From: Reimar Döffinger
This is cleaner, but it is also a workaround for when
the header exists but cannot be compiled.
This will happen when the compiler has no inline asm
support.
Possibly the configure check should be improved as well.
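As an illustration of including a header only when the feature that needs it is enabled (so a header that exists but cannot be compiled is never touched), here is a sketch. All macro, header and function names are hypothetical placeholders, not the ones used in log.c.

```c
/* Hypothetical configure results: the header was found, but the feature
 * using it may be disabled. Defaults (assumptions) keep the sketch
 * self-contained. */
#ifndef HAVE_EXAMPLE_DEBUG_H
#define HAVE_EXAMPLE_DEBUG_H 0
#endif
#ifndef CONFIG_EXAMPLE_BACKTRACE
#define CONFIG_EXAMPLE_BACKTRACE 0
#endif

/* Include the header only when it will actually be used. */
#if HAVE_EXAMPLE_DEBUG_H && CONFIG_EXAMPLE_BACKTRACE
#include <example_debug.h>
#endif

/* Returns 1 if a backtrace was printed, 0 otherwise. */
static int maybe_print_backtrace(void)
{
#if HAVE_EXAMPLE_DEBUG_H && CONFIG_EXAMPLE_BACKTRACE
    example_print_backtrace(); /* hypothetical function from the header */
    return 1;
#else
    return 0; /* header never included, nothing to do */
#endif
}
```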
---
libavutil/log.c | 2 +-
1 file changed, 1 insertion(+
From: Reimar Döffinger
Currently it is done in several different ways, which
might cause needless dependencies or, in the case of
tx_float_neon.S, is incorrect.
Signed-off-by: Reimar Döffinger
---
libavcodec/aarch64/fft_neon.S | 3 +-
libavcodec/aarch64/h264idct_neon.S | 6 +-
libavco
From: Reimar Döffinger
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10-bit) sample video these were consuming the most time,
and this optimization reduced overall decode time from 19.4s to 16.4s,
a reduction of approximately 15%.
Test sample was the
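For reference, the 15% figure is the reduction in wall-clock decode time. Expressed as a throughput ratio, the same two measurements give roughly 1.18x:

```latex
\[
\frac{19.4\,\mathrm{s} - 16.4\,\mathrm{s}}{19.4\,\mathrm{s}} \approx 0.155,
\qquad
\frac{19.4\,\mathrm{s}}{16.4\,\mathrm{s}} \approx 1.18
\]
```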
From: Reimar Döffinger
Speedup is fairly small, around 1.5%, but these are fairly simple.
---
libavcodec/aarch64/hevcdsp_idct_neon.S| 190 ++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 24 +++
2 files changed, 214 insertions(+)
diff --git a/libavcodec/aarch64/hevcdsp_i
From: Reimar Döffinger
This requests loops to be vectorized using SIMD
instructions.
The performance increase is far from hand-optimized
assembly but still significant over the plain C version.
Typical values are a 2-4x speedup, where a hand-written
version would achieve 4-10x.
So it is far from
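This sketch shows a loop annotated for vectorization. It uses the OpenMP "simd" directive, which GCC and Clang honour with -fopenmp-simd. Whether the patch uses this exact pragma is an assumption, since the patch body is truncated. C requires unrecognized pragmas to be ignored, so the loop stays correct on compilers that do not vectorize it. The function add_clip_u8 is a hypothetical example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop: saturating byte-wise add. The pragma only
 * requests SIMD code generation; the result is identical either way. */
static void add_clip_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++) {
        unsigned s = a[i] + b[i];
        dst[i] = s > 255 ? 255 : s; /* clip to the uint8_t range */
    }
}
```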
From: Reimar Döffinger
Trivially expand the hscale assembler to support > 8-bit formats,
both for input and output.
16-bit input is not supported, as I am not certain how to
get sufficient test coverage.
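The scalar C below is a simplified reference for what a horizontal scaling filter computes with high-bit-depth input. It is loosely modelled on the kind of C fallback such assembler replaces, and it is not the actual swscale source. Three details are assumptions: coefficients in Q14 (1 << 14 represents 1.0), a 15-bit intermediate output, and a shift of bit_depth - 1. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical reference: each output sample is a filter_size-tap FIR over
 * the input, starting at filter_pos[i]. The shift is chosen so that
 * full-scale input times a unity filter stays just below 2^15, e.g.
 * 1023 * 16384 >> 9 = 32736 for 10-bit input. */
static void hscale_hbd_to15(int16_t *dst, int dst_w, const uint16_t *src,
                            int bit_depth, const int16_t *filter,
                            const int32_t *filter_pos, int filter_size)
{
    int sh = bit_depth - 1;
    for (int i = 0; i < dst_w; i++) {
        int64_t val = 0; /* wide accumulator avoids overflow */
        for (int j = 0; j < filter_size; j++)
            val += (int64_t)src[filter_pos[i] + j] * filter[i * filter_size + j];
        if (val < 0)
            val = 0;                   /* clamp before shifting */
        val >>= sh;
        dst[i] = (int16_t)(val > 32767 ? 32767 : val);
    }
}
```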
---
libswscale/aarch64/hscale.S | 53 ++--
libswscale/aarch64/sws
From: Reimar Döffinger
It would immediately get overridden by $cc, which, in case
gas-preprocessor is missing, would result in it trying
to use cl.exe for asm files instead of erroring out.
This is because cl.exe does not fail but just prints a warning
when it is given a file it does not know what t
From: Reimar Döffinger
Change some internal APIs a bit to make it harder to make
such mistakes.
In particular, have the read chunk functions return an error
when the result is incomplete.
This might be less flexible, but since there has been no
use-case for that so far, avoiding coding mistakes s
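This sketch shows the API shape described above. It uses a hypothetical in-memory reader, and the struct, function and error-code names are invented for illustration. The actual functions touched by the patch are not visible here. The read either delivers the complete chunk or fails, so a caller cannot accidentally continue with a partially filled buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error codes for the sketch. */
enum { CHUNK_OK = 0, CHUNK_ERR_INCOMPLETE = -1 };

/* Hypothetical reader over a memory buffer; invariant: pos <= size. */
typedef struct ChunkReader {
    const uint8_t *buf;
    size_t size;
    size_t pos;
} ChunkReader;

/* Reads exactly len bytes into out. On a short read it returns an error
 * and leaves both out and the read position untouched. */
static int read_chunk(ChunkReader *r, uint8_t *out, size_t len)
{
    if (r->size - r->pos < len)
        return CHUNK_ERR_INCOMPLETE;
    memcpy(out, r->buf + r->pos, len);
    r->pos += len;
    return CHUNK_OK;
}
```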
From: Reimar Döffinger
ret can be given an argument instead.
This is also consistent with how other assembler code
in FFmpeg does it.
---
libavcodec/aarch64/hevcdsp_idct_neon.S | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S
b/lib
From: Reimar Döffinger
Enough to make it run on macOS.
In particular:
- fix "empty subexpression" errors caused by constructs like (smth|),
use ? instead to make them optional
- xargs has no -d option, so use the more standard -0 and use tr to
replace newlines with NUL characters.
Not sure if these cause is
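The first bullet can be illustrated with the POSIX regex API. POSIX leaves an empty alternative, such as the one in (smth|), undefined, and macOS rejects it with an "empty (sub)expression" error. The form (smth)? expresses the same optional group portably. The script in the patch is not written in C, so this sketch only demonstrates the pattern rewrite. The helper name ere_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the POSIX extended regex pattern, 0 if it does
 * not, and -1 if the pattern fails to compile. */
static int ere_matches(const char *pattern, const char *s)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With the portable form, "^x(smth)?$" matches both "x" and "xsmth", which is what "^x(smth|)$" was meant to express.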