From: Alexander Sverdlin <alexander.sverd...@nokia.com>

On Octeon smp_mb() translates to SYNC while wmb+rmb translates to SYNCW
only. This improves performance by around 10% on tight uncontended
spinlock loops.
Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
below.

On 6-core Octeon machine:
sysbench --test=mutex --num-threads=64 --memory-scope=local run

w/o patch:	1.60s
with patch:	1.51s

Link: https://lore.kernel.org/lkml/5644d08d.4080...@caviumnetworks.com/
Signed-off-by: Alexander Sverdlin <alexander.sverd...@nokia.com>
---
 arch/mips/include/asm/barrier.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 49ff172..24c3f2c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -113,6 +113,15 @@ static inline void wmb(void)
 		".set arch=octeon\n\t" \
 		"syncw\n\t" \
 		".set pop" : : : "memory")
+
+#define __smp_store_release(p, v) \
+do { \
+	compiletime_assert_atomic_type(*p); \
+	__smp_wmb(); \
+	__smp_rmb(); \
+	WRITE_ONCE(*p, v); \
+} while (0)
+
 #else
 #define smp_mb__before_llsc() smp_llsc_mb()
 #define __smp_mb__before_llsc() smp_llsc_mb()
-- 
2.10.2