Re: Improving spin-lock implementation on ARM.

Tom Lane Sun, 29 Nov 2020 22:08:47 -0800

Krunal Bauskar <krunalbaus...@gmail.com> writes:
> On Mon, 30 Nov 2020 at 10:14, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> The results I posted at [1] seem to contradict this for Apple's new
>> machines.


> For the results you saw on the Mac mini, was LSE enabled by default?

Hmm, I don't know how to get Apple's clang to admit what its default
settings are ... anybody?

However, it does accept "-march=armv8-a+lse", and that seems to
not be the default, because I get different results from my spinlock-
pounding test than I did yesterday.  Abbreviating into a table:

                --- CFLAGS=-O2 ---      --- CFLAGS="-O2 -march=armv8-a+lse" ---
TPS             HEAD    CAS patch       HEAD    CAS patch
clients=1       2127    2174            2612    2722
clients=2       1816    859             892     950
clients=4       714     519             610     468
clients=8       -       -               108     185

Unfortunately, that still doesn't lead me to think that either LSE
or CAS are net wins on this hardware.  It's quite clear that LSE
makes the uncontended case a good bit faster, but the contended case
is a lot worse, so is that really a tradeoff we want?

> * If possible, I would also suggest trying higher scalability (more than 4
> clients) to check whether CAS outperforms as scalability increases.

As I said yesterday, running more than 4 processes is just going
to bring the low-performance cores into the equation, which is likely
to swamp any interesting comparison.  I did run the test with "-c 8"
today, as shown in the right-hand columns, and the results seem
to bear that out.

                        regards, tom lane
