[FFmpeg-devel] [PATCH 1/3] configure: fix _Pragma check.

2023-10-29 Thread Reimar . Doeffinger
From: Reimar Döffinger The test can currently pass when _Pragma is not supported, since _Pragma might be treated as an implicitly declared function. This happens e.g. with tinycc. Extending the check to 2 pragmas both matches the actual use better and avoids this misdetection. --- configure | 2 +-
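
To illustrate the misdetection described above, here is a minimal sketch of a probe that uses two consecutive _Pragma operators. The function name and the pragma strings are assumptions for illustration; the exact test text in configure is not shown in this excerpt.

```c
/* Hypothetical probe body; the real configure test is not shown above. */
static int sketch_two_pragmas(void)
{
    /* Two operators in a row, with no ';' between them: a compiler that
     * treats _Pragma as an undeclared function cannot parse this, whereas
     * a lone `_Pragma("...");` would look like an ordinary call to it. */
    _Pragma("GCC diagnostic push")
    _Pragma("GCC diagnostic ignored \"-Wdeprecated-declarations\"")
    int value = 1;
    _Pragma("GCC diagnostic pop")
    return value;
}
```

A compiler with real _Pragma support consumes the operators during preprocessing, so the function body reduces to the declaration and the return statement.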

[FFmpeg-devel] [PATCH 2/3] libavutil/aarch64/cpu.c: HWCAPS requires inline asm support.

2023-10-29 Thread Reimar . Doeffinger
From: Reimar Döffinger Fixes compilation with tcc, which does not have aarch64 inline asm support. --- libavutil/aarch64/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index bd780e8591..0d7c1e268d 100644 --- a/libavutil

[FFmpeg-devel] [PATCH 3/3] libavutil/log.c: only include valgrind header when used.

2023-10-29 Thread Reimar . Doeffinger
From: Reimar Döffinger This is cleaner, but it is also a workaround for when the header exists, but cannot be compiled. This will happen when the compiler has no inline asm support. Possibly the configure check should be improved as well. --- libavutil/log.c | 2 +- 1 file changed, 1 insertion(+

[FFmpeg-devel] [PATCH] aarch64: Implement stack spilling in a consistent way.

2022-10-09 Thread Reimar . Doeffinger
From: Reimar Döffinger Currently it is done in several different ways, which might cause needless dependencies or, in the case of tx_float_neon.S, is incorrect. Signed-off-by: Reimar Döffinger --- libavcodec/aarch64/fft_neon.S | 3 +- libavcodec/aarch64/h264idct_neon.S | 6 +- libavco

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-08 Thread Reimar . Doeffinger
From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was the

[FFmpeg-devel] [PATCH] libavcodec/aarch64/hevcdsp_idct_neon.S: Also port add_residual functions.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger Speedup is fairly small, around 1.5%, but these are fairly simple. --- libavcodec/aarch64/hevcdsp_idct_neon.S| 190 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 24 +++ 2 files changed, 214 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_i

[FFmpeg-devel] [PATCH] libavcodec/aarch64/hevcdsp_idct_neon.S: Also port add_residual functions.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger Speedup is fairly small, around 1.5%, but these are fairly simple. --- libavcodec/aarch64/hevcdsp_idct_neon.S| 190 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 24 +++ 2 files changed, 214 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_i

[FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger This requests loops to be vectorized using SIMD instructions. The performance increase is far from hand-optimized assembly but still significant over the plain C version. Typical values are a 2-4x speedup where a hand-written version would achieve 4x-10x. So it is far from
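
As a rough sketch of what such a request looks like in C, the loop below is annotated with the OpenMP "omp simd" directive through a macro. The macro name SKETCH_PRAGMA_SIMD and the function are hypothetical and are not taken from the patch; the directive only takes effect when the compiler is told to honor it (for example with GCC's or Clang's -fopenmp-simd option) and is otherwise ignored.

```c
#include <stddef.h>

/* Hypothetical wrapper macro; the name used in the patch is not shown above. */
#define SKETCH_PRAGMA_SIMD _Pragma("omp simd")

/* dst[i] += gain * src[i] for n elements; dst and src must not overlap. */
static void sketch_scale_add(float *dst, const float *src, float gain, size_t n)
{
    /* Asks the compiler to vectorize the loop that follows. */
    SKETCH_PRAGMA_SIMD
    for (size_t i = 0; i < n; i++)
        dst[i] += gain * src[i];
}
```

The result is the same with or without the directive; only the generated code differs, which is why such an annotation can be added to plain C loops without touching their logic.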

[FFmpeg-devel] [PATCH] libswscale/aarch64/hscale.S: Support more bit-depth variants.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger Trivially expand hscale assembler to support > 8 bit formats both for input and output. 16-bit input is not supported as I am not certain how to get sufficient test coverage. --- libswscale/aarch64/hscale.S | 53 ++-- libswscale/aarch64/sws

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Reimar . Doeffinger
From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was the

[FFmpeg-devel] [PATCH] configure: Set MSVC as_default later.

2021-01-15 Thread Reimar . Doeffinger
From: Reimar Döffinger It would get immediately overridden to $cc, which in case of gas-preprocessor missing would result in it trying to use cl.exe for asm files instead of erroring out. This is because cl.exe does not fail but just prints a warning when it is given a file it does not know what t

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-15 Thread Reimar . Doeffinger
From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was the

[FFmpeg-devel] [PATCH] libavformat: fix incorrect handling of incomplete AVBPrint.

2023-06-22 Thread Reimar . Doeffinger
From: Reimar Döffinger Change some internal APIs a bit to make it harder to make such mistakes. In particular, have the read chunk functions return an error when the result is incomplete. This might be less flexible, but since there has been no use-case for that so far, avoiding coding mistakes s
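
The idea of having a read function report an error when its result is incomplete can be sketched as follows. Everything here (SketchBuf and the sketch_* functions) is a hypothetical stand-in written for illustration; it is not FFmpeg's AVBPrint API and not the code from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded buffer; not FFmpeg's AVBPrint. */
typedef struct SketchBuf {
    char   data[8];  /* fixed capacity, including the terminating NUL */
    size_t len;      /* bytes requested so far, even if some were dropped */
} SketchBuf;

static void sketch_buf_init(SketchBuf *b)
{
    b->len = 0;
    b->data[0] = '\0';
}

/* Append n bytes; whatever does not fit is dropped but still counted. */
static void sketch_buf_append(SketchBuf *b, const char *src, size_t n)
{
    size_t room = sizeof(b->data) - 1;
    size_t used = b->len < room ? b->len : room;
    size_t copy = n < room - used ? n : room - used;

    memcpy(b->data + used, src, copy);
    b->data[used + copy] = '\0';
    b->len += n;
}

/* The buffer is complete only if nothing had to be dropped. */
static int sketch_buf_is_complete(const SketchBuf *b)
{
    return b->len < sizeof(b->data);
}

/* Hypothetical read helper: returns an error instead of silently
 * handing truncated data back to the caller. */
static int sketch_read_chunk(SketchBuf *b, const char *src, size_t n)
{
    sketch_buf_append(b, src, n);
    return sketch_buf_is_complete(b) ? 0 : -ENOMEM;
}
```

With this shape a caller cannot forget the completeness check, because the failure arrives through the return value it already has to handle.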

[FFmpeg-devel] [PATCH] libavformat: fix incorrect handling of incomplete AVBPrint.

2023-07-23 Thread Reimar . Doeffinger
From: Reimar Döffinger Change some internal APIs a bit to make it harder to make such mistakes. In particular, have the read chunk functions return an error when the result is incomplete. This might be less flexible, but since there has been no use-case for that so far, avoiding coding mistakes s

[FFmpeg-devel] [PATCH] hevcdsp_idct_neon.S: Avoid unnecessary mov.

2023-07-26 Thread Reimar . Doeffinger
From: Reimar Döffinger ret can be given an argument instead. This is also consistent with how other assembler code in FFmpeg does it. --- libavcodec/aarch64/hevcdsp_idct_neon.S | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/lib

[FFmpeg-devel] [PATCH] libavformat: fix incorrect handling of incomplete AVBPrint.

2023-07-27 Thread Reimar . Doeffinger
From: Reimar Döffinger Change some internal APIs a bit to make it harder to make such mistakes. In particular, have the read chunk functions return an error when the result is incomplete. This might be less flexible, but since there has been no use-case for that so far, avoiding coding mistakes s

[FFmpeg-devel] [PATCH] [RFC] tools/patcheck: portability fixes.

2023-07-27 Thread Reimar . Doeffinger
From: Reimar Döffinger Enough to make it run on macOS. In particular: - fix "empty subexpression" errors caused by constructs like (smth|), use ? instead to make them optional - no -d option for xargs, use the more standard -0 and use tr to replace newlines with 0. Not sure if these cause is
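
The first fix above can be illustrated with POSIX extended regular expressions, assuming the "empty subexpression" errors come from the platform's regcomp-style engine rejecting an empty alternative such as (smth|). The helper name and the pattern below are made up for illustration and are not taken from tools/patcheck.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the extended regex pattern, 0 if it does not,
 * and -1 if the pattern itself fails to compile. */
static int sketch_ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Writing the optional group as "^foo(bar)?$" rather than "^foo(bar|)$" accepts the same strings ("foo" and "foobar") without relying on an empty alternative, which not every regex implementation accepts.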