| Field | Value |
| --- | --- |
| Issue | 144237 |
| Summary | Very poor performance in std::bernoulli_distribution |
| Labels | new issue |
| Assignees | |
| Reporter | Disservin |

Unfortunately, there is no small godbolt reproducer as of now.
While working on the C++ data loader of
https://github.com/official-stockfish/nnue-pytorch/blob/master/training_data_loader.cpp
I noticed a 2x performance difference between the latest clang and gcc.
Running a perf profile on this showed `__ieee754_logl` at the very top, which is nowhere to be seen with gcc. I assume this function somehow didn't get properly optimized?
Taking a look at the flamegraph shows it comes from the `std::bernoulli_distribution` call.
https://github.com/official-stockfish/nnue-pytorch/blob/e1f4c5fbd50b37b4f5315f5b364b502c061a8576/training_data_loader.cpp#L922


https://godbolt.org/z/6TMYYsW3e
I haven't yet been able to create a small standalone example that reproduces this. To compile the example above, get the file from godbolt and run:
`clang++ -march=native test.cpp -O3 -o loader && ./loader test77-jan2022-2tb7p.high-simple-eval-1k.min-v2.binpack`
The binpack file mentioned above can be downloaded from https://huggingface.co/datasets/official-stockfish/master-smallnet-binpacks/tree/main
If you compile with libc++ instead of libstdc++, the program is another 1.5x slower:
```
clang++-21 libc++ 10.0457s
clang++-21 libstdc++ 5.43586s
g++-15 3.56669s
```