RE: [C++] Indeterminate poor performance of random number generator

2021-04-22 Thread Yibo Cai
Yes, these soft-float math routines (in libm.so) make the Arm binary extremely slow. -Original Message- From: Antoine Pitrou Sent: Thursday, April 22, 2021 17:20 To: dev@arrow.apache.org Subject: Re: [C++] Indeterminate poor performance of random number generator Le 22/04/2021 à 03:38, Yibo Cai a

Re: [C++] Indeterminate poor performance of random number generator

2021-04-22 Thread Yibo Cai
On 4/22/21 9:38 AM, Yibo Cai wrote: On 4/21/21 6:07 PM, Antoine Pitrou wrote: On 21/04/2021 at 11:41, Yibo Cai wrote: On 4/21/21 5:17 PM, Antoine Pitrou wrote: On 21/04/2021 at 11:14, Yibo Cai wrote: When running benchmarks on Arm64 servers, I find some benchmarks are extremely slow w

Re: [C++] Indeterminate poor performance of random number generator

2021-04-22 Thread Antoine Pitrou
On 22/04/2021 at 03:38, Yibo Cai wrote: Both use the same libstdc++. But std::bernoulli_distribution is inlined, so the generated code does differ between clang and gcc. https://godbolt.org/z/aT84x5Yec Looks like a purely compiler-related issue. It looks like clang generates calls to logl() and __divtf3() (soft-fl

Re: [C++] Indeterminate poor performance of random number generator

2021-04-21 Thread Yibo Cai
On 4/21/21 6:07 PM, Antoine Pitrou wrote: On 21/04/2021 at 11:41, Yibo Cai wrote: On 4/21/21 5:17 PM, Antoine Pitrou wrote: On 21/04/2021 at 11:14, Yibo Cai wrote: When running benchmarks on Arm64 servers, I find some benchmarks are extremely slow when built with clang. E.g., "ModeKern

Re: [C++] Indeterminate poor performance of random number generator

2021-04-21 Thread Antoine Pitrou
On 21/04/2021 at 11:41, Yibo Cai wrote: On 4/21/21 5:17 PM, Antoine Pitrou wrote: On 21/04/2021 at 11:14, Yibo Cai wrote: When running benchmarks on Arm64 servers, I find some benchmarks are extremely slow when built with clang. E.g., "ModeKernelNarrow/1048576/1" takes 90s to finis

Re: [C++] Indeterminate poor performance of random number generator

2021-04-21 Thread Yibo Cai
On 4/21/21 5:17 PM, Antoine Pitrou wrote: On 21/04/2021 at 11:14, Yibo Cai wrote: When running benchmarks on Arm64 servers, I find some benchmarks are extremely slow when built with clang. E.g., "ModeKernelNarrow/1048576/1" takes 90s to finish. I find almost all the time is spent in gene

Re: [C++] Indeterminate poor performance of random number generator

2021-04-21 Thread Antoine Pitrou
On 21/04/2021 at 11:14, Yibo Cai wrote: When running benchmarks on Arm64 servers, I find some benchmarks are extremely slow when built with clang. E.g., "ModeKernelNarrow/1048576/1" takes 90s to finish. I find almost all the time is spent in generating random bits (preparing test data)[1]

[C++] Indeterminate poor performance of random number generator

2021-04-21 Thread Yibo Cai
When running benchmarks on Arm64 servers, I find some benchmarks are extremely slow when built with clang. E.g., "ModeKernelNarrow/1048576/1" takes 90s to finish. I find almost all the time is spent in generating random bits (preparing test data)[1], not the test itself. Below sample code is
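
For readers who want to try this locally: the sample code from the original post is cut off in this listing, so the following is NOT that code. It is only a minimal sketch of the kind of measurement the thread discusses, timing std::bernoulli_distribution while filling a buffer of random bits. The helper names GenerateBits and TimeGenerateBitsMs, the std::mt19937 engine, and the one-byte-per-bit layout are all assumptions made for illustration and may differ from what Arrow's benchmark does.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not from the thread): draws n Bernoulli(p) samples,
// stored one byte per bit, using only the standard library.
std::vector<std::uint8_t> GenerateBits(std::size_t n, double p, std::uint32_t seed) {
  assert(p >= 0.0 && p <= 1.0);
  std::mt19937 rng(seed);  // engine choice is an assumption
  std::bernoulli_distribution dist(p);
  std::vector<std::uint8_t> bits(n);
  for (auto& bit : bits) {
    bit = dist(rng) ? 1 : 0;
  }
  return bits;
}

// Hypothetical helper: wall-clock milliseconds spent inside GenerateBits.
double TimeGenerateBitsMs(std::size_t n, double p, std::uint32_t seed) {
  const auto start = std::chrono::steady_clock::now();
  const auto bits = GenerateBits(n, p, seed);
  const auto stop = std::chrono::steady_clock::now();
  // Read one element so the generated data is actually used.
  volatile std::uint8_t sink = bits.empty() ? 0 : bits.back();
  (void)sink;
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling TimeGenerateBitsMs(1048576, 0.5, 42) from a small main() and building the same file with clang++ and with g++ at the same optimization level should show whether the gap reproduces on a given machine. The size is borrowed from the benchmark name; the probability and seed are arbitrary.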