On Sat, Nov 28, 2020 at 5:36 AM Tom Lane <[email protected]> wrote:
> So at least on Apple's hardware, it seems like the CAS
> implementation might be a shade faster when uncontended,
> but it's very clearly worse when there is contention for
> the spinlock. That's interesting, because the argument
> that CAS should involve strictly less work seems valid ...
> but that's what I'm getting.
>
> It might be useful to try this on other ARM platforms,
> but I lack the energy right now (plus the only other
> thing I've got is a Raspberry Pi, which might not be
> something we particularly care about performance-wise).

I guess that might depend on how CAS and TAS are implemented. I bet
using CAS in the spinlock gives an advantage when ldxr/stxr are used,
but not when swpal/casa are used.

I found out that I can force clang to use swpal/casa by setting
"-march=armv8-a+lse". I'm going to run some experiments on a
multicore AWS Graviton2 instance with different atomic implementations.

------
Regards,
Alexander Korotkov
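
For anyone who wants to look at the generated instructions, here is a
minimal sketch of the two acquire paths being compared, written with
C11 <stdatomic.h>. It is not PostgreSQL's s_lock.h code: the names
spin_t, tas_try_lock, cas_try_lock and spin_unlock are hypothetical and
exist only for illustration, and the memory orderings are an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef atomic_int spin_t;

/*
 * TAS-style acquire: unconditionally swap in 1 and look at the old value.
 * The lock word is written even when the lock is already held.
 */
static bool tas_try_lock(spin_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/*
 * CAS-style acquire: store 1 only if the current value is 0.
 * On failure nothing is stored and 'expected' receives the current value.
 */
static bool cas_try_lock(spin_t *lock)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Release: plain store with release ordering. */
static void spin_unlock(spin_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Compiling this for aarch64 with "clang -O2 -S", once without and once
with "-march=armv8-a+lse", should show the switch from exclusive
load/store loops (the ldxr/stxr family) to single LSE instructions (the
swp/cas family). The exact instruction variants depend on the memory
ordering requested and on the compiler version.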
