From: Alexander Sverdlin <alexander.sverd...@nokia.com>

On Octeon, smp_mb() translates to SYNC, while wmb+rmb translates to SYNCW
only. Implementing smp_store_release() with wmb+rmb instead of the generic
smp_mb() brings around a 10% performance gain on tight uncontended
spinlock loops.

Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
below.

On a 6-core Octeon machine:
sysbench --test=mutex --num-threads=64 --memory-scope=local run

w/o patch:      1.60s
with patch:     1.51s

Link: https://lore.kernel.org/lkml/5644d08d.4080...@caviumnetworks.com/
Signed-off-by: Alexander Sverdlin <alexander.sverd...@nokia.com>
---
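Note (not for the commit log): for readers less familiar with release
stores, below is a minimal user-space sketch of the ordering contract that
smp_store_release() has to honour. It is an illustration under stated
assumptions only: the mailbox type and both function names are
hypothetical, C11 atomics stand in for the kernel primitives, and nothing
here models the Octeon SYNC/SYNCW instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type for illustration only; not kernel code. */
struct mailbox {
	int payload;        /* plain data, written before the flag */
	atomic_bool ready;  /* flag published with release semantics */
};

/*
 * Writer side: the release store keeps every earlier access
 * (here, the payload write) ordered before the flag update.
 */
static void mailbox_publish(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, true, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store, so a
 * reader that observes ready == true also observes the payload.
 */
static bool mailbox_try_read(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}
```

In kernel terms, the writer corresponds to smp_store_release() and the
reader to smp_load_acquire(); the patch only changes which barrier
instructions back the writer side on Octeon.
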
 arch/mips/include/asm/barrier.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 49ff172..24c3f2c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -113,6 +113,15 @@ static inline void wmb(void)
                                            ".set arch=octeon\n\t"      \
                                            "syncw\n\t"                 \
                                            ".set pop" : : : "memory")
+
+#define __smp_store_release(p, v)                                      \
+do {                                                                   \
+       compiletime_assert_atomic_type(*p);                             \
+       __smp_wmb();                                                    \
+       __smp_rmb();                                                    \
+       WRITE_ONCE(*p, v);                                              \
+} while (0)
+
 #else
 #define smp_mb__before_llsc() smp_llsc_mb()
 #define __smp_mb__before_llsc() smp_llsc_mb()
-- 
2.10.2
