"Nysal Jan K.A." <ny...@linux.ibm.com> writes: > On architectures where ARCH_HAS_SYNC_CORE_BEFORE_USERMODE > is not selected, sync_core_before_usermode() is a no-op. > In membarrier_mm_sync_core_before_usermode() the compiler does not > eliminate redundant branches and load of mm->membarrier_state > for this case as the atomic_read() cannot be optimized away. > > Here's a snippet of the code generated for finish_task_switch() on powerpc > prior to this change: > > 1b786c: ld r26,2624(r30) # mm = rq->prev_mm; > ....... > 1b78c8: cmpdi cr7,r26,0 > 1b78cc: beq cr7,1b78e4 <finish_task_switch+0xd0> > 1b78d0: ld r9,2312(r13) # current > 1b78d4: ld r9,1888(r9) # current->mm > 1b78d8: cmpd cr7,r26,r9 > 1b78dc: beq cr7,1b7a70 <finish_task_switch+0x25c> > 1b78e0: hwsync > 1b78e4: cmplwi cr7,r27,128 > ....... > 1b7a70: lwz r9,176(r26) # atomic_read(&mm->membarrier_state) > 1b7a74: b 1b78e0 <finish_task_switch+0xcc>
Reviewed-by: Michael Ellerman <m...@ellerman.id.au> cheers