Yes, this soft-float math (in libm.so) makes the Arm binary extremely slow.
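For anyone who wants to check this on their own machine, here is a rough sketch (not code from Arrow; `long_double_mantissa_bits` and `time_divisions` are names made up for illustration). It assumes the usual Linux ABIs, where long double on AArch64 is IEEE binary128 with no hardware support, so each operation goes through a software routine such as __divtf3(). The timings are only indicative.

```cpp
#include <chrono>
#include <limits>

// Mantissa width of long double. Typically 113 on AArch64 Linux
// (binary128, emulated in software) and 64 on x86-64 Linux (x87 extended).
int long_double_mantissa_bits() {
  return std::numeric_limits<long double>::digits;
}

// Hypothetical helper: time `iters` dependent divisions in type T and
// return the elapsed wall-clock time in milliseconds.
template <typename T>
double time_divisions(int iters) {
  // volatile keeps the compiler from folding the divisor at compile time.
  volatile T divisor = static_cast<T>(1.0000001);
  T x = static_cast<T>(1e30);
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    x = x / divisor + static_cast<T>(1);
  }
  auto stop = std::chrono::steady_clock::now();
  // Store the result so the loop is not optimized away.
  volatile T sink = x;
  (void)sink;
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_divisions<double>(10000000) and time_divisions<long double>(10000000) from main() and printing both values should show a large gap on Arm64 if long double is emulated there.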
-Original Message-
From: Antoine Pitrou
Sent: Thursday, April 22, 2021 17:20
To: dev@arrow.apache.org
Subject: Re: [C++] Indeterminate poor performance of random number generator
On 22/04/2021 at 03:38, Yibo Cai wrote:
Both use the same libstdc++.
But std::bernoulli_distribution is inlined, so the generated code does differ between
clang and gcc.
https://godbolt.org/z/aT84x5Yec
It looks like a pure compiler issue.
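A minimal sketch for reproducing the comparison locally, assuming a standalone test outside Arrow: `time_bernoulli` is a hypothetical helper, and the choice of std::default_random_engine is an assumption, not necessarily what the benchmark uses. Building the same file with gcc and with clang at the same optimization level and comparing the elapsed times should show whether the inlined distribution code is the culprit.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <utility>

// Hypothetical helper (not Arrow code): draw `n` booleans with
// probability `p` of true, and return
// {number of true draws, elapsed milliseconds}.
std::pair<std::size_t, double> time_bernoulli(std::size_t n, double p,
                                              unsigned seed) {
  std::default_random_engine rng(seed);  // engine choice is an assumption
  std::bernoulli_distribution dist(p);
  std::size_t trues = 0;
  auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n; ++i) {
    // Counting the results keeps the loop from being optimized away.
    trues += dist(rng) ? 1 : 0;
  }
  auto stop = std::chrono::steady_clock::now();
  double ms =
      std::chrono::duration<double, std::milli>(stop - start).count();
  return {trues, ms};
}
```

For example, time_bernoulli(1048576, 0.5, 42) called from main() gives one timing per compiler; only the relative difference between the two builds is meaningful.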
It looks like clang generates calls to logl() and __divtf3() (soft-fl
On 4/21/21 6:07 PM, Antoine Pitrou wrote:
On 21/04/2021 at 11:41, Yibo Cai wrote:
On 4/21/21 5:17 PM, Antoine Pitrou wrote:
On 21/04/2021 at 11:14, Yibo Cai wrote:
When running benchmarks on Arm64 servers, I find that some benchmarks are extremely
slow when built with clang.
E.g., "ModeKernelNarrow/1048576/1" takes 90s to finish.
I find that almost all the time is spent generating random bits (preparing test
data)[1], not in the test itself.
Below sample code is