[llvm-bugs] [Bug 123262] [clang++][aarch64] help optimize __builtin_mul_overflow performance

LLVM Bugs via llvm-bugs Thu, 16 Jan 2025 17:02:18 -0800

Issue	123262
Summary	[clang++][aarch64] help optimize __builtin_mul_overflow performance
Labels	clang
Assignees
Reporter	eric-yq

Hi team, I have a sample program that runs about 10 times slower when compiled with clang++ than with g++.
The main performance issue is in `__builtin_mul_overflow` under clang++.
Can you offer any suggestions? I would prefer not to use both g++ and clang++ in my CI/CD pipeline.


Compile commands and `Time taken` comparison: `( 0.22 seconds vs. 0.02 seconds. )`
```c
# Ubuntu 24.04, g++ 13.3 and clang++ 18.1.3
# Server: AWS c7g.xlarge (AWS Graviton3, Neoverse-V1)

# g++ -std=c++17 -O3 -march=armv8-a+crc testint.cpp -o testint-g++
# ./testint-g++ 
Time taken for 10000000 iterations: 0.0208047 seconds
Sum of results: 9747553088193654009

# clang++ -std=c++17 -O3 -march=armv8-a+crc testint.cpp -o testint-clang++ --rtlib=compiler-rt
# ./testint-clang++ 
Time taken for 10000000 iterations: 0.226598 seconds //// ( 0.22 seconds vs. 0.02 seconds. )
Sum of results: 18269431752893742105
```

Sample code: testint.cpp
```c
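#include <iostream>
#include <chrono>
#include <random>
#include <cstdint>
#include <limits>   // std::numeric_limits
#include <utility>  // std::pair
#include <vector>
// Define a 128-bit integer type (if the compiler supports it)
using int128_t = __int128;
// Function under benchmark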
inline bool int128_mul_overflow(int128_t a, int128_t b, volatile int128_t* c) {
    return __builtin_mul_overflow(a, b, c);
}
// Generate a random 128-bit integer
int128_t generate_random_int128() {
    static std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<uint64_t> dist(0, std::numeric_limits<uint64_t>::max());
    // Generate two 64-bit integers and combine them into one 128-bit integer
    int128_t high = static_cast<int128_t>(dist(rng));
    int128_t low = static_cast<int128_t>(dist(rng));
    return (high << 64) | low;
}
// Generate random data and store it in a vector
std::vector<std::pair<int128_t, int128_t>> generate_random_data(int count) {
    std::vector<std::pair<int128_t, int128_t>> data;
    data.reserve(count);
    for (int i = 0; i < count; ++i) {
        int128_t a = generate_random_int128();
        int128_t b = generate_random_int128();
        data.emplace_back(a, b);
    }
    return data;
}
// Benchmark function
void benchmark_int128_mul_overflow(const std::vector<std::pair<int128_t, int128_t>>& data) {
    int128_t c = 0; 
    int128_t sum = 0; // Accumulates the results
    auto start = std::chrono::high_resolution_clock::now();
    for (const auto& pair : data) {
        if (int128_mul_overflow(pair.first, pair.second, &c)) {
            sum += c; // Accumulate the result so the loop is not optimized away
        }
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Time taken for " << data.size() << " iterations: " << duration.count() << " seconds\n";
    std::cout << "Sum of results: " << static_cast<uint64_t>(sum) << "\n"; // Print the accumulated result
}
int main() {
    int iterations = 10000000; // Adjust the iteration count as needed
    auto data = generate_random_data(iterations);
    benchmark_int128_mul_overflow(data);
    return 0;
}
```
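To narrow this down, the builtin can be isolated from the timing loop and the random data. The following is only a minimal sketch, not part of the original benchmark: the names `mul_overflows`, `int128_max`, and `self_check` are hypothetical helpers introduced here, and `__attribute__((noinline))` is an assumption made so the function stays visible as a separate symbol when the generated assembly from the two compilers is compared (for example with `-S`).

```cpp
#include <cassert>

using int128_t = __int128;

// Hypothetical helper (not in the original report): returns true when
// a * b does not fit in a signed 128-bit integer.
// noinline keeps the function as a separate symbol in the assembly output.
__attribute__((noinline))
bool mul_overflows(int128_t a, int128_t b) {
    int128_t wrapped;  // receives the wrapped product; unused here
    return __builtin_mul_overflow(a, b, &wrapped);
}

// Hypothetical helper: largest signed 128-bit value, 2^127 - 1.
inline int128_t int128_max() {
    return static_cast<int128_t>(~static_cast<unsigned __int128>(0) >> 1);
}

// Sanity checks on boundary cases of the builtin.
inline void self_check() {
    const int128_t one = 1;
    assert(!mul_overflows(one << 63, one << 63));     // 2^126 fits
    assert(mul_overflows(one << 64, one << 63));      // 2^127 does not fit
    assert(!mul_overflows(-(one << 64), one << 63));  // -2^127 is the minimum, fits
    assert(mul_overflows(int128_max(), 2));           // doubling the maximum overflows
}
```

Calling `self_check()` from `main` confirms that both builds agree on the results, so any remaining difference between them is in the generated code for `mul_overflows` rather than in the rest of the benchmark.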

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
