https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120475

            Bug ID: 120475
           Summary: vector<bool> is 60x slower with ASan
                    detect_stack_use_after_return=1
           Product: gcc
           Version: 13.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dani at danielbertalan dot dev
  Target Milestone: ---

When compiled with -O2 -fsanitize=address -fsanitize=undefined, the code below
runs about 60 times slower when ASAN_OPTIONS=detect_stack_use_after_return=1 is
set (the default since GCC 13) than with stack-use-after-return checking
disabled.

Benchmark 1: env ASAN_OPTIONS=detect_stack_use_after_return=1 ./bug
  Time (mean ± σ):     10.443 s ±  0.846 s    [User: 10.432 s, System: 0.009 s]
  Range (min … max):   10.056 s … 12.677 s    10 runs

Benchmark 2: env ASAN_OPTIONS=detect_stack_use_after_return=0 ./bug
  Time (mean ± σ):     167.9 ms ±   3.7 ms    [User: 161.7 ms, System: 5.9 ms]
  Range (min … max):   161.5 ms … 174.1 ms    18 runs

Summary
  env ASAN_OPTIONS=detect_stack_use_after_return=0 ./bug ran
   62.20 ± 5.22 times faster than env ASAN_OPTIONS=detect_stack_use_after_return=1 ./bug

GCC version: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Target: aarch64-linux-gnu
The benchmark numbers are from an AWS instance with a Graviton 2 processor.
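
For A/B timing runs, the option can also be fixed inside the binary instead of
through the environment. This is a minimal sketch, assuming the
__asan_default_options hook of the AddressSanitizer runtime is honoured by the
libasan shipped with this GCC; the constant name kAsanDefaults is made up for
illustration and is not part of the original report:

```cpp
// Hypothetical constant: the option string to use as the built-in default.
static const char kAsanDefaults[] = "detect_stack_use_after_return=0";

// The AddressSanitizer runtime looks for this user-defined hook at startup
// and treats the returned string as default options for the process.
// Assumption: this definition is linked into the instrumented binary.
extern "C" const char *__asan_default_options() {
    return kAsanDefaults;
}
```

Adding this definition to the reproducer should make a plain ./bug run behave
like the detect_stack_use_after_return=0 benchmark, which can help when the
environment is hard to control (for example under a test harness).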

---
// Godbolt: https://godbolt.org/z/5jGc7rc9v

#include <vector>
#include <iostream>

[[gnu::noinline, gnu::noipa]] void test(std::vector<int> &a) {
    const int M = 2e6 + 10;

    std::vector<bool> isprime(M+1, true);
    isprime[0] = isprime[1] = false;
    for (int i = 2; i <= M; i++) {
        if (isprime[i] && (long long) i*i <= M) {
            for (int j = i*i; j <= M; j+=i) isprime[j] = false;
        }
    }
    int ans = 1;
    std::vector<int> even, res;
    for (int i = 0; i < a.size(); i++) {
        if (a[i]%2) even.push_back(a[i]);
    }
    res = {a[0]};

    for (int e : a) {
        if (ans < 2 && !(e%2)) {
            for (int& j : even) {
                if (isprime[e+j]) {
                    ans = 2;
                    res.clear(); res.push_back(e); res.push_back(j);
                    break;
                }
            }
        }
    }
    std::cout << ans << "\n";
    for (const int& e : res) std::cout << e << " ";
}

int main() {
    std::vector<int> v = {2,3};
    test(v);
}


perf shows that 98% of the time is spent in __asan_stack_malloc_2, which is
called from vector<bool>::operator[]. It looks like that function becomes large
enough that it is no longer considered for inlining under -O2, so each element
access ends up paying for a fake stack frame allocation.
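
To check whether the cost is specific to vector<bool>::operator[], the sieve
can be rewritten without vector<bool>. The following is only a sketch for
comparison: BitSieve and make_prime_sieve are hypothetical names that do not
appear in the original reproducer, and it has not been measured whether this
variant avoids the __asan_stack_malloc_2 calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs flags into 64-bit words by hand, so no
// std::vector<bool> proxy reference is involved in element access.
struct BitSieve {
    std::vector<std::uint64_t> words;

    // All bits start out set (including unused padding bits in the last word).
    explicit BitSieve(std::size_t n_bits)
        : words((n_bits + 63) / 64, ~std::uint64_t{0}) {}

    bool get(std::size_t i) const {
        return (words[i / 64] >> (i % 64)) & 1u;
    }
    void clear(std::size_t i) {
        words[i / 64] &= ~(std::uint64_t{1} << (i % 64));
    }
};

// Returns a sieve in which get(i) is true exactly when i is prime,
// for 0 <= i <= limit. Indices above limit must not be queried.
BitSieve make_prime_sieve(int limit) {
    assert(limit >= 1);
    const std::size_t n = static_cast<std::size_t>(limit);
    BitSieve s(n + 1);
    s.clear(0);
    s.clear(1);
    for (std::size_t i = 2; i * i <= n; ++i) {
        if (!s.get(i)) continue;
        // Multiples below i*i were already cleared by smaller primes.
        for (std::size_t j = i * i; j <= n; j += i) s.clear(j);
    }
    return s;
}
```

In test(), `std::vector<bool> isprime(M+1, true)` and its sieve loop would be
replaced by `BitSieve isprime = make_prime_sieve(M)`, and `isprime[e+j]` by
`isprime.get(e+j)`. If the slowdown disappears with this variant, that would
support the observation that the out-of-line operator[] is responsible.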
