sebpop added a comment.

I tested this change on Graviton2 (aarch64-linux) by building https://github.com/xianyi/OpenBLAS with `clang -O3 -moutline-atomics` and running `make test`: all tests pass with and without -moutline-atomics. Clang was configured to use libgcc.
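For context, `-moutline-atomics` makes the compiler call out-of-line helpers (provided by libgcc in this setup) that decide at run time whether to use LSE instructions or the base Armv8.0 load/store-exclusive sequence. The snippet below is a minimal sketch, not part of the patch, of how one could check which path a given machine would be expected to take. The names `cpu_reports_lse` and `atomics_path` are hypothetical, and the fallback value for `HWCAP_ATOMICS` is an assumption taken from the Linux arm64 hwcap header.

```cpp
#include <string>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#ifndef HWCAP_ATOMICS
// Assumed fallback: bit used for LSE in the Linux arm64 hwcap header.
#define HWCAP_ATOMICS (1 << 8)
#endif
#endif

// Hypothetical helper: true if the kernel reports LSE atomics for this CPU.
// On anything other than aarch64 Linux this sketch simply returns false.
bool cpu_reports_lse() {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
  return false;
#endif
}

// Hypothetical helper: names the code path an outlined atomic helper
// would be expected to take for a given LSE capability.
std::string atomics_path(bool has_lse) {
  return has_lse ? "LSE" : "LL/SC";
}
```

On Graviton2 I would expect `atomics_path(cpu_reports_lse())` to return "LSE"; on an Armv8.0-only core it should return "LL/SC".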
I also tested https://github.com/boostorg/boost.git with and without -moutline-atomics, and there are no new fails. Here is how I built and ran the tests for boost:

  git clone --recursive https://github.com/boostorg/boost.git $HOME/boost
  cd $HOME/boost
  mkdir usr
  ./bootstrap.sh --prefix=$HOME/boost/usr
  # in project-config.jam line 12
  # replace `using gcc ;` with `using clang : : $HOME/llvm-project/usr/bin/clang++ ;`
  ./b2 --build-type=complete --layout=versioned -a
  cd status
  ../b2 # runs all regression tests

I also looked at the performance of some atomic operations using google-benchmark on Ubuntu 20.04 c6g instance with Graviton2 (Neoverse-N1). Performance is better when using LSE instructions compared to generic armv8-a code. The overhead of -moutline-atomics is negligible compared to armv8-a+lse. clang trunk as of today produces slightly slower code than gcc-9 with and without -moutline-atomics.

  $ cat a.cc
  #include <benchmark/benchmark.h>
  #include <atomic>

  std::atomic<int> i;
  static void BM_atomic_increment(benchmark::State& state) {
    for (auto _ : state)
      benchmark::DoNotOptimize(i++);
  }
  BENCHMARK(BM_atomic_increment);

  int j;
  static void BM_atomic_fetch_add(benchmark::State& state) {
    for (auto _ : state)
      benchmark::DoNotOptimize(__atomic_fetch_add(&j, 1, __ATOMIC_SEQ_CST));
  }
  BENCHMARK(BM_atomic_fetch_add);

  int k;
  static void BM_atomic_compare_exchange(benchmark::State& state) {
    for (auto _ : state)
      benchmark::DoNotOptimize(__atomic_compare_exchange
                               (&j, &k, &k, 1, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE));
  }
  BENCHMARK(BM_atomic_compare_exchange);

  template<class T>
  struct node {
    T data;
    node* next;
    node(const T& data) : data(data), next(nullptr) {}
  };

  static void BM_std_atomic_compare_exchange(benchmark::State& state) {
    node<int>* new_node = new node<int>(42);
    std::atomic<node<int>*> head;
    for (auto _ : state)
      benchmark::DoNotOptimize(std::atomic_compare_exchange_weak_explicit
                               (&head, &new_node->next, new_node,
                                std::memory_order_release,
                                std::memory_order_relaxed));
  }
  BENCHMARK(BM_std_atomic_compare_exchange);

  BENCHMARK_MAIN();

  ---

  $ ./go.sh
  + g++ -o generic-v8 a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./generic-v8
  2020-12-06 01:06:26
  Running ./generic-v8
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
    L1 Data 64 KiB (x64)
    L1 Instruction 64 KiB (x64)
    L2 Unified 1024 KiB (x64)
    L3 Unified 32768 KiB (x1)
  Load Average: 64.36, 59.36, 36.41
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -------------------------------------------------------------------------
  Benchmark                               Time             CPU   Iterations
  -------------------------------------------------------------------------
  BM_atomic_increment                  7.21 ns         7.20 ns     97116662
  BM_atomic_fetch_add                  7.20 ns         7.20 ns     97152394
  BM_atomic_compare_exchange           7.71 ns         7.71 ns     90780423
  BM_std_atomic_compare_exchange       7.61 ns         7.61 ns     92037159

  + /home/ubuntu/llvm-project/nin/bin/clang++ -o clang-generic-v8 a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./clang-generic-v8
  2020-12-06 01:06:30
  Running ./clang-generic-v8
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
    L1 Data 64 KiB (x64)
    L1 Instruction 64 KiB (x64)
    L2 Unified 1024 KiB (x64)
    L3 Unified 32768 KiB (x1)
  Load Average: 64.57, 59.49, 36.57
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -------------------------------------------------------------------------
  Benchmark                               Time             CPU   Iterations
  -------------------------------------------------------------------------
  BM_atomic_increment                  9.21 ns         9.21 ns     75989223
  BM_atomic_fetch_add                  9.21 ns         9.21 ns     76031211
  BM_atomic_compare_exchange           7.61 ns         7.61 ns     92012620
  BM_std_atomic_compare_exchange       12.4 ns         12.4 ns     56421424

  + g++ -o lse -march=armv8-a+lse a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./lse
  2020-12-06 01:06:34
  Running ./lse
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
    L1 Data 64 KiB (x64)
    L1 Instruction 64 KiB (x64)
    L2 Unified 1024 KiB (x64)
    L3 Unified 32768 KiB (x1)
  Load Average: 64.85, 59.63, 36.74
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -------------------------------------------------------------------------
  Benchmark                               Time             CPU   Iterations
  -------------------------------------------------------------------------
  BM_atomic_increment                  5.21 ns         5.21 ns    134201945
  BM_atomic_fetch_add                  5.21 ns         5.21 ns    134438848
  BM_atomic_compare_exchange           6.80 ns         6.80 ns    102872012
  BM_std_atomic_compare_exchange       6.80 ns         6.80 ns    102864719

  + clang++ -o clang-lse -march=armv8-a+lse a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./clang-lse
  2020-12-06 01:06:38
  Running ./clang-lse
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
    L1 Data 64 KiB (x64)
    L1 Instruction 64 KiB (x64)
    L2 Unified 1024 KiB (x64)
    L3 Unified 32768 KiB (x1)
  Load Average: 64.85, 59.63, 36.74
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -------------------------------------------------------------------------
  Benchmark                               Time             CPU   Iterations
  -------------------------------------------------------------------------
  BM_atomic_increment                  7.21 ns         7.21 ns     97086511
  BM_atomic_fetch_add                  7.21 ns         7.21 ns     97152416
  BM_atomic_compare_exchange           7.20 ns         7.20 ns     97186161
  BM_std_atomic_compare_exchange       11.6 ns         11.6 ns     60302378

  + g++ -o moutline -moutline-atomics a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./moutline
  2020-12-06 01:06:41
  Running ./moutline
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
    L1 Data 64 KiB (x64)
    L1 Instruction 64 KiB (x64)
    L2 Unified 1024 KiB (x64)
    L3 Unified 32768 KiB (x1)
  Load Average: 64.94, 59.74, 36.90
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -------------------------------------------------------------------------
  Benchmark                               Time             CPU   Iterations
  -------------------------------------------------------------------------
  BM_atomic_increment                  5.60 ns         5.60 ns    124853685
  BM_atomic_fetch_add                  5.60 ns         5.60 ns    124907943
  BM_atomic_compare_exchange           7.21 ns         7.21 ns     97151664
  BM_std_atomic_compare_exchange       7.21 ns         7.21 ns     97148224

  + /home/ubuntu/llvm-project/nin/bin/clang++ -o clang-moutline -moutline-atomics a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./clang-moutline
  2020-12-06 01:06:45
  Running ./clang-moutline
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
    L1 Data 64 KiB (x64)
    L1 Instruction 64 KiB (x64)
    L2 Unified 1024 KiB (x64)
    L3 Unified 32768 KiB (x1)
  Load Average: 64.95, 59.82, 37.05
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -------------------------------------------------------------------------
  Benchmark                               Time             CPU   Iterations
  -------------------------------------------------------------------------
  BM_atomic_increment                  7.21 ns         7.21 ns     97071465
  BM_atomic_fetch_add                  7.21 ns         7.20 ns     97150580
  BM_atomic_compare_exchange           7.20 ns         7.20 ns     97164566
  BM_std_atomic_compare_exchange       11.6 ns         11.6 ns     60301778

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91157/new/

https://reviews.llvm.org/D91157

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits