Hi,

From the CPU's point of view, getting a cache line for writing is more
expensive than reading.  See Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

A full compare-and-swap grabs the cache line exclusive and causes
excessive cache-line bouncing.

gomp_mutex_lock_slow spins on __atomic_compare_exchange_n, so add a
load check and continue spinning while the cmpxchg would fail (a
standalone sketch of the pattern follows the patch).

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.

Ok for master?

libgomp/ChangeLog:

	PR libgomp/103068
	* config/linux/mutex.c (gomp_mutex_lock_slow): Continue spin
	loop when mutex is not 0 under x86 target.
	* config/linux/x86/futex.h (TARGET_X86_AVOID_CMPXCHG): Define.
---
 libgomp/config/linux/mutex.c     | 5 +++++
 libgomp/config/linux/x86/futex.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/libgomp/config/linux/mutex.c b/libgomp/config/linux/mutex.c
index 838264dc1f9..4e87566eb2b 100644
--- a/libgomp/config/linux/mutex.c
+++ b/libgomp/config/linux/mutex.c
@@ -49,6 +49,11 @@ gomp_mutex_lock_slow (gomp_mutex_t *mutex, int oldval)
 	}
       else
 	{
+#ifdef TARGET_X86_AVOID_CMPXCHG
+	  /* For x86, omit cmpxchg when atomic load shows mutex is not 0.  */
+	  if ((oldval = __atomic_load_n (mutex, MEMMODEL_RELAXED)) != 0)
+	    continue;
+#endif
 	  /* Something changed.  If now unlocked, we're good to go.  */
 	  oldval = 0;
 	  if (__atomic_compare_exchange_n (mutex, &oldval, 1, false,
diff --git a/libgomp/config/linux/x86/futex.h b/libgomp/config/linux/x86/futex.h
index e7f53399a4e..acc1d1467d7 100644
--- a/libgomp/config/linux/x86/futex.h
+++ b/libgomp/config/linux/x86/futex.h
@@ -122,3 +122,5 @@ cpu_relax (void)
 {
   __builtin_ia32_pause ();
 }
+
+#define TARGET_X86_AVOID_CMPXCHG
-- 
2.18.1
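
For reference, here is a minimal standalone sketch of the
test-and-test-and-set pattern the patch applies; the spin_t type and
the spin_lock/spin_unlock/main names are made up for illustration and
are not libgomp code.  The waiter spins on a relaxed load, which keeps
the cache line in shared state, and only attempts the cmpxchg, which
needs the line exclusive, once the lock looks free.

#include <stdbool.h>

typedef int spin_t;		/* Hypothetical lock word: 0 = free, 1 = held.  */

static void
spin_lock (spin_t *lock)
{
  for (;;)
    {
      /* Read-only check first: spinning on a plain load does not bounce
	 the cache line between waiting CPUs.  */
      if (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
	{
#if defined (__i386__) || defined (__x86_64__)
	  __builtin_ia32_pause ();	/* x86 PAUSE, like cpu_relax ().  */
#endif
	  continue;
	}

      /* Lock looks free; try to take it with a single cmpxchg.  */
      int expected = 0;
      if (__atomic_compare_exchange_n (lock, &expected, 1, false,
				       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
	return;
    }
}

static void
spin_unlock (spin_t *lock)
{
  __atomic_store_n (lock, 0, __ATOMIC_RELEASE);
}

int
main (void)
{
  static spin_t lock;		/* Demo only.  */
  spin_lock (&lock);
  /* ... critical section ...  */
  spin_unlock (&lock);
  return 0;
}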