Re: [FFmpeg-devel] [FFmpeg-devel, v2] gcc: Relaxing auto-vectorization limitation.

Martin Storsjö Thu, 12 Jun 2025 02:41:59 -0700

On Thu, 29 May 2025, Zhao Zhili wrote:

On May 29, 2025, at 15:03, Jiawei <jia...@iscas.ac.cn> wrote:


This patch modifies the FFmpeg build system to remove the explicit disabling
of GCC's auto-vectorization feature.

Modern GCC versions have demonstrated stable auto-vectorization capabilities
through extensive optimizations in loop analysis and SIMD code generation.
The explicit -fno-tree-vectorize flag, originally added in commit 973859f
(2009) to work around early GCC vectorization instability, is no longer
necessary for recent GCC versions.

Key improvements justifying this change:
1. Enhanced heuristics for loop vectorization cost models
2. Mature handling of alignment and memory access patterns
3. Robust fallback mechanisms for unsupported architectures

This change allows FFmpeg to benefit from automated SIMD optimizations
when built with -O3 optimization level, particularly improving
performance on x86_64 (AVX), ARM64 (SVE) and RISC-V (RVV) architectures.

[1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191

Version log:
 Only allow GCC versions >= 13 to use auto-vectorization.
For discussion, see:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250521061750.54882-1-jia...@iscas.ac.cn/

---
configure | 1 -
1 file changed, 1 deletion(-)

Signed-off-by: Jiawei <jia...@iscas.ac.cn>
---
configure | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 3730b0524c..91e3e107c2 100755
--- a/configure
+++ b/configure
@@ -7656,7 +7656,11 @@ if enabled icc; then
            disable aligned_stack
    fi
elif enabled gcc; then
-    check_optflags -fno-tree-vectorize
+    gcc_version=$($cc -dumpversion)
+    major_version=${gcc_version%%.*}
+    if [ $major_version -lt 13 ]; then
+        check_optflags -fno-tree-vectorize
+    fi
    check_cflags -Werror=format-security
    check_cflags -Werror=implicit-function-declaration
    check_cflags -Werror=missing-prototypes
--
2.43.0
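
For reference, the version gate in the hunk above can be sketched on its own as follows. This is only an illustration under stated assumptions: the helper names gcc_major and needs_no_tree_vectorize are hypothetical and do not exist in configure, the threshold of 13 is taken from the patch, and treating a non-numeric version as "old" is a conservative choice made here, not something the patch does.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of FFmpeg's configure.

# Print the major component of a version string such as "13.2.0" or "12".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed (exit status 0) when the given version is older than GCC 13,
# i.e. when the patch would still pass -fno-tree-vectorize.
needs_no_tree_vectorize() {
    major=$(gcc_major "$1")
    # Assumption: an empty or non-numeric major is treated as "old", so
    # vectorization stays disabled instead of the test failing with an error.
    case $major in
        ''|*[!0-9]*) return 0 ;;
    esac
    [ "$major" -lt 13 ]
}
```

In configure this would be used roughly as `needs_no_tree_vectorize "$($cc -dumpversion)" && check_optflags -fno-tree-vectorize`. Quoting the operand and guarding against non-numeric input matters because the unquoted `[ $major_version -lt 13 ]` in the patch reports an error, rather than taking either branch cleanly, if the version string is ever empty or not a plain number.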


It looks like the patch format is corrupted.

I’m OK with the code change. However, the commit message is misleading. As
already pointed out by multiple developers, this option doesn’t help with AVX,
SVE and RVV, because we can’t assume they are available at runtime unless the
binary is built for and run on specific hardware.

I'm also ok with the code change in itself, but I would also prefer not to
advertise or motivate the change with non-default instruction sets like
AVX, SVE and RVV. (For instruction sets in the base architecture sets,
like NEON, it can be useful though.)

It would also be good to mention previous attempts to do the same, which
was done in 2016 in cb8646af24bd8e9627cc5e1c62b049a00fe0b07b and reverted
in fd6dbc53855fbfc9a782095d0ffe11dd3a98905f. The issues that were noticed
at that point were relating to the complicated inline x86 cabac assembly,
which nearly exhausts all available registers. In
182663a58a7a099e02e76da3b0f96d63e5c26a6d (in 2023) this function was made
non-inline, so the issues with exhausting registers shouldn't affect other
functions so much. So this should be essential to mention, as to why we
hope this attempt will work better this time, compared to last time.


// Martin