Hi Luming Yu,

Luming Yu <luming...@shingroup.cn> writes:
> ppc appears to have already supported cmpxchg-local atomic semantics
> that is defined by the kernel convention of the feature.
> Add this_cpu_cmpxchg ppc local for the performance benefit of an
> arch-specific implementation over the asm-generic C version of the
> locking API.

Implementing this has been suggested before, but it wasn't clear that
it was actually going to perform better than the generic version.

On 64-bit we have interrupt soft masking, so disabling interrupts is
relatively cheap. So the generic this_cpu_cmpxchg() using irq disable
just becomes a few loads & stores, no atomic ops required.
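To make that concrete, the generic fallback is roughly the following
(paraphrased from asm-generic/percpu.h; the details vary by kernel
version, so treat it as a sketch):

/*
 * With interrupts (soft-)disabled, a plain load/compare/store is
 * already atomic with respect to this CPU; no larx/stcx. required.
 */
#define raw_cpu_generic_cmpxchg(pcp, oval, nval)		\
({								\
	typeof(pcp) *__p = raw_cpu_ptr(&(pcp));			\
	typeof(pcp) __ret = *__p;				\
	if (__ret == (oval))					\
		*__p = nval;					\
	__ret;							\
})

#define this_cpu_generic_cmpxchg(pcp, oval, nval)		\
({								\
	typeof(pcp) __ret;					\
	unsigned long __flags;					\
	raw_local_irq_save(__flags);				\
	__ret = raw_cpu_generic_cmpxchg(pcp, oval, nval);	\
	raw_local_irq_restore(__flags);				\
	__ret;							\
})

On ppc64 the irq save/restore is just soft-mask updates in the paca,
so the whole sequence is straight-line loads & stores.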
In contrast, implementing it using __cmpxchg_local() will use
ldarx/stdcx etc., which will be more expensive. Have you done any
performance measurements?

It probably is a bit fishy that we don't mask PMU interrupts vs
this_cpu_cmpxchg() etc., but I don't think it's actually a bug given
the few places using this_cpu_cmpxchg(). Though I could be wrong about
that.

> diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
> index 8e5b7d0b851c..ceab5df6e7ab 100644
> --- a/arch/powerpc/include/asm/percpu.h
> +++ b/arch/powerpc/include/asm/percpu.h
> @@ -18,5 +18,22 @@
>  #include <asm-generic/percpu.h>
>  
>  #include <asm/paca.h>
> +#include <asm/cmpxchg.h>
> +#ifdef this_cpu_cmpxchg_1
> +#undef this_cpu_cmpxchg_1

If we need to undef then I think something has gone wrong with the
header inclusion order. The model should be that the arch defines what
it has, and the generic code provides fallbacks if the arch didn't
define anything.

> +#define this_cpu_cmpxchg_1(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 1)

I think this is unsafe vs preemption. The raw_cpu_ptr() can generate
the per-cpu address, but then the task can be preempted and moved to a
different CPU before the ldarx/stdcx do the cmpxchg. The arm64
implementation had the same bug until they fixed it; see the sketch at
the end of this mail.

> +#endif
> +#ifdef this_cpu_cmpxchg_2
> +#undef this_cpu_cmpxchg_2
> +#define this_cpu_cmpxchg_2(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 2)
> +#endif
> +#ifdef this_cpu_cmpxchg_4
> +#undef this_cpu_cmpxchg_4
> +#define this_cpu_cmpxchg_4(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 4)
> +#endif
> +#ifdef this_cpu_cmpxchg_8
> +#undef this_cpu_cmpxchg_8
> +#define this_cpu_cmpxchg_8(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 8)
> +#endif
>  
>  #endif /* _ASM_POWERPC_PERCPU_H_ */

cheers
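ps. If the arch version does turn out to be a win, the preemption
window can be closed the way arm64 did, by keeping the task on the CPU
from the address generation until the cmpxchg completes. A rough,
untested sketch, with this_cpu_cmpxchg_N standing in for the 1/2/4/8
variants:

/*
 * Sketch only: preemption must stay disabled across both the
 * raw_cpu_ptr() address generation and the ldarx/stdcx loop.
 */
#define this_cpu_cmpxchg_N(pcp, oval, nval, size)			\
({									\
	typeof(pcp) __ret;						\
	preempt_disable_notrace();					\
	__ret = (typeof(pcp))__cmpxchg_local(raw_cpu_ptr(&(pcp)),	\
					     oval, nval, size);		\
	preempt_enable_notrace();					\
	__ret;								\
})

Note that costs a preempt count increment/decrement per operation,
which eats into whatever the larx/stcx path saves over the generic
version, hence the request for numbers.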