Krunal Bauskar <krunalbaus...@gmail.com> writes:
> On Mon, 30 Nov 2020 at 10:14, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> The results I posted at [1] seem to contradict this for Apple's new
>> machines.
> For the results you saw on Mac-Mini was LSE enabled by default.

Hmm, I don't know how to get Apple's clang to admit what its default
settings are ... anybody?  However, it does accept "-march=armv8-a+lse",
and that seems to not be the default, because I get different results
from my spinlock-pounding test than I did yesterday.  Abbreviating
into a table:

             --- CFLAGS=-O2 ---     --- CFLAGS="-O2 -march=armv8-a+lse" ---
TPS          HEAD     CAS patch     HEAD     CAS patch

clients=1    2127     2174          2612     2722
clients=2    1816      859           892      950
clients=4     714      519           610      468
clients=8       -        -           108      185

Unfortunately, that still doesn't lead me to think that either LSE
or CAS are net wins on this hardware.  It's quite clear that LSE makes
the uncontended case a good bit faster, but the contended case is a lot
worse, so is that really a tradeoff we want?

> * I would also suggest if possible try with higher scalability (more than 4
> to check if with increase scalability CAS out-perform).

As I said yesterday, running more than 4 processes is just going to
bring the low-performance cores into the equation, which is likely to
swamp any interesting comparison.  I did run the test with "-c 8" today,
as shown in the right-hand columns, and the results seem to bear that
out.

			regards, tom lane
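
On the question of getting the compiler to admit its default: one possible probe, offered as a sketch rather than a confirmed answer for Apple's clang, is to test the ACLE feature macro __ARM_FEATURE_ATOMICS, which GCC and upstream clang define when LSE atomics are being targeted. The assumption here is that Apple's clang follows the same convention; the function names below are hypothetical and are not part of PostgreSQL or of the patch under discussion.

```c
#include <stdio.h>

/*
 * Returns 1 if this translation unit is being compiled with ARMv8.1 LSE
 * atomics enabled, 0 otherwise.  Relies on the ACLE feature macro
 * __ARM_FEATURE_ATOMICS; this assumes the compiler in use defines it
 * whenever LSE is targeted (by default or via -march=armv8-a+lse).
 */
static int
lse_atomics_targeted(void)
{
#ifdef __ARM_FEATURE_ATOMICS
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: write a one-line report to the given stream. */
static void
report_lse_setting(FILE *out)
{
    fprintf(out, "LSE atomics targeted: %s\n",
            lse_atomics_targeted() ? "yes" : "no");
}
```

Calling report_lse_setting(stdout) from main() and building once with CFLAGS=-O2 and once with CFLAGS="-O2 -march=armv8-a+lse" should show whether the two builds differ in this setting, provided the macro assumption holds for the compiler being tested.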