https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120475
            Bug ID: 120475
           Summary: vector<bool> is 60x slower with ASan detect_stack_use_after_return=1
           Product: gcc
           Version: 13.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dani at danielbertalan dot dev
  Target Milestone: ---

When compiled with -O2 -fsanitize=address -fsanitize=undefined, the code below
runs 60 times slower if ASAN_OPTIONS=detect_stack_use_after_return=1 is set
(default since GCC 13) than with stack use after return checking disabled.

Benchmark 1: env ASAN_OPTIONS=detect_stack_use_after_return=1 ./bug
  Time (mean ± σ):     10.443 s ±  0.846 s    [User: 10.432 s, System: 0.009 s]
  Range (min … max):   10.056 s … 12.677 s    10 runs

Benchmark 2: env ASAN_OPTIONS=detect_stack_use_after_return=0 ./bug
  Time (mean ± σ):     167.9 ms ±   3.7 ms    [User: 161.7 ms, System: 5.9 ms]
  Range (min … max):   161.5 ms … 174.1 ms    18 runs

Summary
  env ASAN_OPTIONS=detect_stack_use_after_return=0 ./bug ran
   62.20 ± 5.22 times faster than env ASAN_OPTIONS=detect_stack_use_after_return=1 ./bug

GCC version: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Target: aarch64-linux-gnu

The benchmark numbers are from an AWS instance with the Graviton 2 processor

---

// Godbolt: https://godbolt.org/z/5jGc7rc9v

#include <vector>
#include <iostream>

[[gnu::noinline, gnu::noipa]]
void test(std::vector<int> &a) {
    const int M = 2e6 + 10;
    std::vector<bool> isprime(M+1, true);
    isprime[0] = isprime[1] = false;
    for (int i = 2; i <= M; i++) {
        if (isprime[i] && (long long) i*i <= M) {
            for (int j = i*i; j <= M; j+=i)
                isprime[j] = false;
        }
    }

    int ans = 1;
    std::vector<int> even, res;
    for (int i = 0; i < a.size(); i++) {
        if (a[i]%2)
            even.push_back(a[i]);
    }
    res = {a[0]};
    for (int e : a) {
        if (ans < 2 && !(e%2)) {
            for (int& j : even) {
                if (isprime[e+j]) {
                    ans = 2;
                    res.clear();
                    res.push_back(e);
                    res.push_back(j);
                    break;
                }
            }
        }
    }
    std::cout << ans << "\n";
    for (const int& e : res)
        std::cout << e << " ";
}

int main() {
    std::vector<int> v = {2,3};
    test(v);
}

perf shows that 98% of the time is spent in __asan_stack_malloc_2, which is
called from vector<bool>::operator[]. It looks like that function gets big
enough for it not to be considered for inlining under -O2.
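
To narrow this down further, a reduced sketch along the following lines might help. It is not part of the original reproducer: the names count_primes and time_sieve_ms are made up for illustration, and the std::vector<char> variant is only there as a comparison point. Whether that variant avoids the __asan_stack_malloc_2 calls is an assumption to be checked by measurement, not something established above.

```cpp
// Hypothetical reduced benchmark (not from the original report): runs only
// the sieve loop so that operator[] on the flag container dominates.
#include <chrono>
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes over [0, limit]; returns the number of primes found.
// Flags is expected to be std::vector<bool> or std::vector<char>.
template <typename Flags>
std::size_t count_primes(int limit) {
    Flags is_prime(static_cast<std::size_t>(limit) + 1, true);
    is_prime[0] = false;
    if (limit >= 1)
        is_prime[1] = false;
    for (int i = 2; static_cast<long long>(i) * i <= limit; ++i) {
        if (!is_prime[i])
            continue;
        // Every access here goes through Flags::operator[].
        for (int j = i * i; j <= limit; j += i)
            is_prime[j] = false;
    }
    std::size_t count = 0;
    for (int i = 0; i <= limit; ++i)
        if (is_prime[i])
            ++count;
    return count;
}

// Runs the sieve once, stores the prime count in primes_out, and returns
// the elapsed wall-clock time in milliseconds.
template <typename Flags>
double time_sieve_ms(int limit, std::size_t &primes_out) {
    auto start = std::chrono::steady_clock::now();
    primes_out = count_primes<Flags>(limit);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_sieve_ms<std::vector<bool>> and time_sieve_ms<std::vector<char>> from main with limit 2000010 (the M used above), building with -O2 -fsanitize=address -fsanitize=undefined, and running under both ASAN_OPTIONS settings would show whether the slowdown is specific to vector<bool>::operator[].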