----- Original Message -----
> From: "Peter Zijlstra" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: [email protected], "KOSAKI Motohiro" <[email protected]>,
>     "Steven Rostedt" <[email protected]>, "Paul E. McKenney" <[email protected]>,
>     "Nicholas Miell" <[email protected]>, "Linus Torvalds" <[email protected]>,
>     "Ingo Molnar" <[email protected]>, "Alan Cox" <[email protected]>,
>     "Lai Jiangshan" <[email protected]>, "Stephen Hemminger" <[email protected]>,
>     "Andrew Morton" <[email protected]>, "Josh Triplett" <[email protected]>,
>     "Thomas Gleixner" <[email protected]>, "David Howells" <[email protected]>,
>     "Nick Piggin" <[email protected]>
> Sent: Monday, March 16, 2015 1:21:04 PM
> Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier
>     (x86) (v12)
>
> On Mon, Mar 16, 2015 at 03:43:56PM +0000, Mathieu Desnoyers wrote:
> > > On which; I absolutely hate that rq->lock thing in there. What is
> > > 'wrong' with doing a lockless compare there? Other than not actually
> > > being able to deref rq->curr of course, but we need to fix that anyhow.
> >
> > If we can make sure rq->curr deref could be done without holding the rq
> > lock, then I think all we would need is to ensure that updates to rq->curr
> > are surrounded by memory barriers. Therefore, we would have the following:
> >
> > * When a thread is scheduled out, a memory barrier would be issued before
> >   rq->curr is updated to the next thread task_struct.
> >
> > * Before a thread is scheduled in, a memory barrier needs to be issued
> >   after rq->curr is updated to the incoming thread.
>
> I'm not entirely awake atm but I'm not seeing why it would need to be
> that strict; I think the current single MB on task switch is sufficient
> because if we're in the middle of schedule, userspace isn't actually
> running.
>
> So from the point of userspace the task switch is atomic. Therefore even
> if we do not get a barrier before setting ->curr, the expedited thing
> missing us doesn't matter as userspace cannot observe the difference.

AFAIU, atomicity is not what matters here; it's memory ordering. What
guarantees that, upon entry into kernel space, all prior memory accesses
(loads and stores) are ordered before the following loads and stores? The
same applies when returning to user space: what guarantees that all prior
loads and stores are ordered before the user-space loads and stores
performed after the return?
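To make that concrete, here is a rough sketch of where the two barriers
from my proposal quoted above would sit. This is simplified pseudocode,
not the actual scheduler code: __schedule_sketch() is a made-up name, and
the real __schedule()/context_switch() path is of course far more
involved.

static void __schedule_sketch(struct rq *rq, struct task_struct *prev,
			      struct task_struct *next)
{
	/*
	 * Order prev's user-space loads/stores before the rq->curr
	 * update, so a sys_membarrier() that observes rq->curr != prev
	 * and therefore skips this CPU cannot miss accesses that prev
	 * performed in user space.
	 */
	smp_mb();

	rq->curr = next;

	/*
	 * Order the rq->curr update before next's user-space
	 * loads/stores, pairing with the barrier issued (or implied by
	 * the IPI) on the sys_membarrier() side.
	 */
	smp_mb();

	context_switch(rq, prev, next);	/* eventually runs next */
}

With those two barriers in place, a sys_membarrier() that reads rq->curr
locklessly and skips CPUs whose current task does not belong to the
calling process would still provide the required ordering for threads
scheduled out or in around that read.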
> > In order to be able to dereference rq->curr->mm without holding the
> > rq->lock, do you envision we should protect task reclaim with RCU-sched ?
>
> A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> think Oleg did mention it would still be 'interesting'. I've not yet had
> time to really think about that.

This might indeed be an "interesting" modification. :) Perhaps it could
come as an optimization later on?

By the way, I now remember why we start from the mm_cpumask and then
double-check the mm: the mm_cpumask serves as an approximation of the set
of CPUs that may be running threads of the process. Rather than grabbing
the rq lock on every CPU, we therefore only need to grab it for the CPUs
present in the mm_cpumask, along the lines of the sketch below.
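Here is a rough sketch of that iteration. It is illustrative only:
membarrier_expedited() and membarrier_ipi() are made-up names, cpu_rq()
and rq->lock are scheduler-internal, and a real implementation would need
a fallback path when the cpumask allocation fails.

static void membarrier_ipi(void *unused)
{
	smp_mb();	/* full barrier on each targeted CPU */
}

static void membarrier_expedited(struct mm_struct *mm)
{
	cpumask_var_t tmpmask;
	unsigned long flags;
	int cpu;

	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
		return;		/* sketch: a real version must not just bail */

	smp_mb();	/* order prior accesses before the IPIs */
	get_online_cpus();

	/*
	 * mm_cpumask(mm) approximates which CPUs may be running a
	 * thread of this process; confirm rq->curr->mm under the rq
	 * lock before targeting the IPI, instead of taking the rq lock
	 * on every online CPU.
	 */
	for_each_cpu(cpu, mm_cpumask(mm)) {
		struct rq *rq = cpu_rq(cpu);

		raw_spin_lock_irqsave(&rq->lock, flags);
		if (rq->curr->mm == mm)
			cpumask_set_cpu(cpu, tmpmask);
		raw_spin_unlock_irqrestore(&rq->lock, flags);
	}

	preempt_disable();
	smp_call_function_many(tmpmask, membarrier_ipi, NULL, 1);
	preempt_enable();

	put_online_cpus();
	free_cpumask_var(tmpmask);
	smp_mb();	/* order the IPIs before later accesses */
}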
Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
