On 7/11/2024 10:54 AM, Martin Storsjö wrote:
On Thu, 11 Jul 2024, Martin Storsjö wrote:
On Thu, 11 Jul 2024, James Almer wrote:
On 7/11/2024 10:08 AM, Martin Storsjö wrote:
On Wed, 10 Jul 2024, James Almer wrote:
ffmpeg | branch: master | James Almer <jamr...@gmail.com> | Wed Jul
10 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690]
| committer: James Almer
x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
This has the benefit of removing any SSE -> AVX penalty that may
happen when
the compiler emits VEX encoded instructions.
Signed-off-by: James Almer <jamr...@gmail.com>
http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
---
configure | 5 ++++-
libavutil/x86/intreadwrite.h | 20 +++++++-------------
2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/configure b/configure
index f84fefeaab..7151ed1de3 100755
--- a/configure
+++ b/configure
@@ -2314,6 +2314,7 @@ HEADERS_LIST="
INTRINSICS_LIST="
intrinsics_neon
+ intrinsics_sse
intrinsics_sse2
"
@@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
armv8_deps="aarch64"
neon_deps_any="aarch64 arm"
intrinsics_neon_deps="neon"
-intrinsics_sse2_deps="sse2"
+intrinsics_sse_deps="sse"
+intrinsics_sse2_deps="sse2 intrinsics_sse"
vfp_deps="arm"
vfpv3_deps="vfp"
setend_deps="arm"
@@ -6446,6 +6448,7 @@ elif enabled loongarch; then
fi
check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
+check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
check_cc intrinsics_sse2 emmintrin.h "__m128i test =
_mm_setzero_si128()"
check_ldflags -Wl,--as-needed
diff --git a/libavutil/x86/intreadwrite.h
b/libavutil/x86/intreadwrite.h
index 9bbef00dba..6546eb016c 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -22,29 +22,25 @@
#define AVUTIL_X86_INTREADWRITE_H
#include <stdint.h>
+#if HAVE_INTRINSICS_SSE
+#include <immintrin.h>
+#endif
This change seems to have broken builds for x86 with Clang 16 or
newer. (Clang 15 and lower seems to be fine.)
See e.g.
http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile for an example
of the error. The issue is that a clang internal intrinsics header contains "_mm_comige_sh(__m128h A,",
i.e. a parameter with the name "A" (which toolchain provided headers shouldn't use). This clashes with
libavcodec/huffuyv.h, which has a "#define A 3".
This is obviously a Clang intrinsics header bug, but we can't fix
the existing Clang 16-18 releases that are out there, so I guess
what we
can
do is change our "define A" to something more elaborate. (IIRC there
are
some similar issues with names with ncurses and/or android headers
too.)
// Martin
We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and
probably other places, so changing huffyuv.h may not be enough.
That's quite possible, but those cases may be including intreadwrite.h
before that, do it's possible it might not trigger there.
I'll see how many places need to be changed here.
huffyuvenc.c seems to be the only file that runs into the issue; it
includes put_bits.h (which brings in libavutil/intreadwrite.h) after
huffyuv.h.
Can we move the huffyuv.h include right after put_bits.h then? I
personally prefer that over changing the RGBA defines to workaround a
compiler bug.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".