Issue 144237
Summary Very poor performance in std::bernoulli_distribution
Labels new issue
Assignees
Reporter Disservin
    Unfortunately, there is no small standalone godbolt reproducer as of now.

While working on the C++ data loader of
https://github.com/official-stockfish/nnue-pytorch/blob/master/training_data_loader.cpp
I noticed a 2x performance difference between the latest clang and gcc.

A perf profile of this shows `__ieee754_logl` at the very top, which is nowhere to be seen with gcc. I assume this function somehow didn't get properly optimized?
The flamegraph shows that it comes from the `std::bernoulli_distribution` call.

https://github.com/official-stockfish/nnue-pytorch/blob/e1f4c5fbd50b37b4f5315f5b364b502c061a8576/training_data_loader.cpp#L922


![Image](https://github.com/user-attachments/assets/bddc9daf-6da4-4600-98dd-01c8ac746f7a)
![Image](https://github.com/user-attachments/assets/df2eae56-917e-4dbd-9684-2e694aa014d6)

https://godbolt.org/z/6TMYYsW3e

I haven't yet been able to create a small standalone example that reproduces this, so if someone wants to compile the above example, get the file from godbolt and run

`clang++ -march=native test.cpp -O3 -o loader && ./loader test77-jan2022-2tb7p.high-simple-eval-1k.min-v2.binpack` 
The binpack file mentioned above can be downloaded from https://huggingface.co/datasets/official-stockfish/master-smallnet-binpacks/tree/main
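
As a possible starting point for a smaller reproducer, here is a minimal microbenchmark sketch that times `std::bernoulli_distribution` in isolation. It is an assumption that a loop like this exercises the same code path as the data loader, and it has not been confirmed to show the slowdown. The helper names `count_successes` and `time_bernoulli`, the choice of `std::mt19937_64`, and the seed are all hypothetical and not taken from the loader.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <random>

// Hypothetical helper: draws `n` Bernoulli samples with success
// probability `p` and returns how many were true. Returning the count
// keeps the compiler from discarding the loop.
std::size_t count_successes(std::mt19937_64& rng, double p, std::size_t n) {
    std::bernoulli_distribution dist(p);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        hits += dist(rng) ? 1 : 0;
    }
    return hits;
}

// Hypothetical helper: times `n` draws and returns nanoseconds per draw.
// The engine and the fixed seed are assumptions made for repeatability.
double time_bernoulli(double p, std::size_t n) {
    std::mt19937_64 rng(12345);
    const auto start = std::chrono::steady_clock::now();
    const std::size_t hits = count_successes(rng, p, n);
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    const double per_draw = ns / static_cast<double>(n);
    std::printf("p=%.2f hits=%zu ns/draw=%.2f\n", p, hits, per_draw);
    return per_draw;
}
```

Calling `time_bernoulli(0.5, 100000000)` from `main` and building the file with the same flags as above (`-O3 -march=native`) under each compiler and standard library would show whether the gap appears outside the loader.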

If you compile with libc++ instead of libstdc++, the program will be another 1.5x slower:

```
clang++-21 libc++ 10.0457s
clang++-21 libstdc++ 5.43586s
g++-15 3.56669s
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
