Hi, 

From the CPU's point of view, getting a cache line for writing is more
expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare-and-swap grabs the cache line in exclusive state and
causes excessive cache line bouncing.

gomp_mutex_lock_slow spins on __atomic_compare_exchange_n, so add a
relaxed load check and keep spinning while the cmpxchg would fail anyway.
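
This is the usual test-and-test-and-set idea.  As a minimal sketch of
the pattern only (spin_lock below is a made-up helper built on GCC
atomic builtins, not the actual libgomp code):

  /* Spin on a plain load first; only attempt the cmpxchg, which needs
     the cache line in exclusive state, once the lock looks free.  */
  static inline void
  spin_lock (int *lock)
  {
    for (;;)
      {
        /* Read-only spin: the cache line can stay shared among all
           waiting CPUs.  */
        while (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
          __builtin_ia32_pause ();

        /* The lock looks free; now try to take it.  This is the only
           step that requests the line for writing.  */
        int expected = 0;
        if (__atomic_compare_exchange_n (lock, &expected, 1, false,
                                         __ATOMIC_ACQUIRE,
                                         __ATOMIC_RELAXED))
          return;
      }
  }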

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?

libgomp/ChangeLog:

        PR libgomp/103068
        * config/linux/mutex.c (gomp_mutex_lock_slow): Continue spin
        loop when mutex is not 0 on x86 targets.
        * config/linux/x86/futex.h (TARGET_X86_AVOID_CMPXCHG): Define.
---
 libgomp/config/linux/mutex.c     | 5 +++++
 libgomp/config/linux/x86/futex.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/libgomp/config/linux/mutex.c b/libgomp/config/linux/mutex.c
index 838264dc1f9..4e87566eb2b 100644
--- a/libgomp/config/linux/mutex.c
+++ b/libgomp/config/linux/mutex.c
@@ -49,6 +49,11 @@ gomp_mutex_lock_slow (gomp_mutex_t *mutex, int oldval)
        }
       else
        {
+#ifdef TARGET_X86_AVOID_CMPXCHG
+         /* For x86, omit cmpxchg when atomic load shows mutex is not 0.  */
+         if ((oldval = __atomic_load_n (mutex, MEMMODEL_RELAXED)) != 0)
+           continue;
+#endif
          /* Something changed.  If now unlocked, we're good to go.  */
          oldval = 0;
          if (__atomic_compare_exchange_n (mutex, &oldval, 1, false,
diff --git a/libgomp/config/linux/x86/futex.h b/libgomp/config/linux/x86/futex.h
index e7f53399a4e..acc1d1467d7 100644
--- a/libgomp/config/linux/x86/futex.h
+++ b/libgomp/config/linux/x86/futex.h
@@ -122,3 +122,5 @@ cpu_relax (void)
 {
   __builtin_ia32_pause ();
 }
+
+#define TARGET_X86_AVOID_CMPXCHG
-- 
2.18.1
