On 2023-02-23 14:51:25, Michael Ellerman wrote:
> Hi Paul,
>
> "Paul E. McKenney" <paul...@kernel.org> writes:
> > On Wed, Feb 22, 2023 at 02:33:44PM +0530, Kautuk Consul wrote:
> >> A link from ibm.com states:
> >> "Ensures that all instructions preceding the call to __lwsync
> >> complete before any subsequent store instructions can be executed
> >> on the processor that executed the function. Also, it ensures that
> >> all load instructions preceding the call to __lwsync complete before
> >> any subsequent load instructions can be executed on the processor
> >> that executed the function. This allows you to synchronize between
> >> multiple processors with minimal performance impact, as __lwsync
> >> does not wait for confirmation from each processor."
> >>
> >> Thats why smp_rmb() and smp_wmb() are defined to lwsync.
> >> But this same understanding applies to parallel pipeline
> >> execution on each PowerPC processor.
> >> So, use the lwsync instruction for rmb() and wmb() on the PPC
> >> architectures that support it.
> >>
> >> Signed-off-by: Kautuk Consul <kcon...@linux.vnet.ibm.com>
> >> ---
> >>  arch/powerpc/include/asm/barrier.h | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> >> index b95b666f0374..e088dacc0ee8 100644
> >> --- a/arch/powerpc/include/asm/barrier.h
> >> +++ b/arch/powerpc/include/asm/barrier.h
> >> @@ -36,8 +36,15 @@
> >>   * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
> >>   */
> >>  #define __mb()   __asm__ __volatile__ ("sync" : : : "memory")
> >> +
> >> +/* The sub-arch has lwsync. */
> >> +#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> >> +#define __rmb() __asm__ __volatile__ ("lwsync" : : : "memory")
> >> +#define __wmb() __asm__ __volatile__ ("lwsync" : : : "memory")
> >
> > Hmmm...
> >
> > Does the lwsync instruction now order both cached and uncached accesses?
>
> No.
>
> > Or have there been changes so that smp_rmb() and smp_wmb() get this
> > definition, while rmb() and wmb() still get the sync instruction?
>
> No.
>
> They come from:
>
> include/asm-generic/barrier.h:#define rmb()	do { kcsan_rmb(); __rmb(); } while (0)
> include/asm-generic/barrier.h:#define wmb()	do { kcsan_wmb(); __wmb(); } while (0)
>
> > (Not seeing this, but I could easily be missing something.)
>
> You are correct, the patch is wrong because it fails to account for IO
> accesses.
>
> Kautuk, I'm not sure what motivated you to look at these barriers, was
> it just the documentation you linked to?
>
> Maybe we need some better documentation in the kernel explaining why we
> use the barriers we do?
>
> Although there's already a pretty decent comment above the definitions,
> but maybe it needs further clarification:
>
> /*
>  * Memory barrier.
>  * The sync instruction guarantees that all memory accesses initiated
>  * by this processor have been performed (with respect to all other
>  * mechanisms that access memory).  The eieio instruction is a barrier
>  * providing an ordering (separately) for (a) cacheable stores and (b)
>  * loads and stores to non-cacheable memory (e.g. I/O devices).
>  *
>  * mb() prevents loads and stores being reordered across this point.
>  * rmb() prevents loads being reordered across this point.
>  * wmb() prevents stores being reordered across this point.
>  *
>  * *mb() variants without smp_ prefix must order all types of memory
>  * operations with one another. sync is the only instruction sufficient
>  * to do this.
>  *
>  * For the smp_ barriers, ordering is for cacheable memory operations
>  * only. We have to use the sync instruction for smp_mb(), since lwsync
>  * doesn't order loads with respect to previous stores. Lwsync can be
>  * used for smp_rmb() and smp_wmb().
Sorry, I didn't change the comment. My point is: is the "sync is the only
instruction sufficient to do this" comment completely correct? As I mentioned
in my reply to Paul, I didn't find any documentation that states up-front (in
the differences between sync and lwsync) that lwsync makes a distinction
between cached and uncached accesses.

>
>
> cheers
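
To make sure I understand the objection, here is a rough sketch (my own
simplification, with made-up driver names, not actual kernel code) of why
redefining __rmb()/__wmb() would affect I/O ordering:

/*
 * Simplified sketch of the wiring quoted above: the arch supplies
 * __mb()/__rmb()/__wmb(), and asm-generic/barrier.h builds the
 * mandatory barriers from them.
 */
#define __mb()	__asm__ __volatile__ ("sync"   : : : "memory")
#define __rmb()	__asm__ __volatile__ ("lwsync" : : : "memory")	/* proposed change */
#define __wmb()	__asm__ __volatile__ ("lwsync" : : : "memory")	/* proposed change */

#define rmb()	do { kcsan_rmb(); __rmb(); } while (0)		/* asm-generic */
#define wmb()	do { kcsan_wmb(); __wmb(); } while (0)		/* asm-generic */

/*
 * Hypothetical driver pattern (struct my_desc and post_buffer() are
 * made up): wmb() must order a store to cacheable RAM before a store
 * to caching-inhibited I/O space.  sync orders both kinds of access;
 * lwsync only orders the cacheable ones, so with the change above the
 * device could observe the doorbell write before the descriptor write.
 */
struct my_desc {
	u64 addr;
	u32 valid;
};

static void post_buffer(struct my_desc *desc, u32 __iomem *doorbell,
			dma_addr_t buf)
{
	desc->addr  = buf;		/* cacheable stores */
	desc->valid = 1;

	wmb();				/* needs "sync": orders the cacheable
					 * stores above before the uncached
					 * MMIO store below */

	__raw_writel(1, doorbell);	/* caching-inhibited store, no implied
					 * barrier in the __raw_ accessor */
}

At least that is how I read the "fails to account for IO accesses" objection;
please correct me if I have it wrong.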