Hi! On Fri, Oct 25, 2024 at 11:29:38AM +1100, Michael Ellerman wrote: > [To += Mathieu] > > "Nysal Jan K.A." <ny...@linux.ibm.com> writes: > > From: "Nysal Jan K.A" <ny...@linux.ibm.com> > > > > On architectures where ARCH_HAS_SYNC_CORE_BEFORE_USERMODE > > is not selected, sync_core_before_usermode() is a no-op. > > In membarrier_mm_sync_core_before_usermode() the compiler does not > > eliminate redundant branches and the load of mm->membarrier_state > > for this case as the atomic_read() cannot be optimized away. > > I was wondering if this was caused by powerpc's arch_atomic_read() which > uses asm volatile. > > But replacing arch_atomic_read() with READ_ONCE() makes no difference, > presumably because the compiler still can't see that the READ_ONCE() is > unnecessary (which is kind of by design).
Exactly. > > GCC 12.2.1: > > ----------- > > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32) > > Function old new delta > > finish_task_switch.isra 852 820 -32 > > GCC 12 is a couple of years old, I assume GCC 14 behaves similarly? GCC 12 is still being actively developed. There will be a 12.5 probably halfway next year (and that will be the last 12.x release, yes). The GCC homepage (<https://gcc.gnu.org>) will tell you what releases are still maintained/supported, and sometimes you can derive our planned plans from there as well :-) But yes, 14 is similar (I did not test, but I feel confident making that assertion :-) ) > > static inline void membarrier_mm_sync_core_before_usermode(struct > > mm_struct *mm) > > { > > + if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE)) > > + return; > > if (current->mm != mm) > > return; > > if (likely(!(atomic_read(&mm->membarrier_state) & > > The other option would be to have a completely separate stub, eg: > > #ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE > static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct > *mm) > { > if (current->mm != mm) > return; > if (likely(!(atomic_read(&mm->membarrier_state) & > MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))) > return; > sync_core_before_usermode(); > } > #else > static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct > *mm) { } > #endif > > Not sure what folks prefer. > > In either case I think it's probably worth a short comment explaining > why it's worth the trouble (ie. that the atomic_read() prevents the > compiler from doing DCE). Since you ask, I like the proposed change (the inline one) best. But yeah, comment please! (And it is not about DCE -- just the definition of __READ_ONCE makes it directly impossible to CSE any expressions with this, it (standards- violatingly) casts the pointers to pointers to volatile, and you cannot CSE any accesses to volatile objects!) So what are the actual semantics the kernel wants from its READ_ONCE, and from its atomics in general? GCC has perfectly fine in-compiler support for such things, there is no need for making a second rate manual implementation of parts of this, when you can use a good implementation of everything instead! Segher