On Tue, Jul 29, 2014 at 06:07:54PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 08:57:47AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 29, 2014 at 09:50:55AM +0200, Peter Zijlstra wrote:
> > > On Mon, Jul 28, 2014 at 03:56:12PM -0700, Paul E. McKenney wrote:
> > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > index bc1638b33449..a0d2f3a03566 100644
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -2762,6 +2762,7 @@ need_resched:
> > > >                 } else {
> > > >                         deactivate_task(rq, prev, DEQUEUE_SLEEP);
> > > >                         prev->on_rq = 0;
> > > > +                       rcu_note_voluntary_context_switch(prev);
> > > >  
> > > >                         /*
> > > >                          * If a worker went to sleep, notify and ask workqueue
> > > > @@ -2828,6 +2829,7 @@ asmlinkage __visible void __sched schedule(void)
> > > >         struct task_struct *tsk = current;
> > > >  
> > > >         sched_submit_work(tsk);
> > > > +       rcu_note_voluntary_context_switch(tsk);
> > > >         __schedule();
> > > >  }
> > > 
> > > Yeah, not entirely happy with that; you add two calls into one of
> > > the hottest paths of the kernel.
> > 
> > I did look into leveraging counters, but cannot remember why I decided
> > that this was a bad idea.  I guess it is time to recheck...
> > 
> > The ->nvcsw field in the task_struct structure looks promising:
> > 
> > o   Looks like it does in fact get incremented in __schedule() via
> >     the switch_count pointer.
> > 
> > o   Looks like it is unconditionally compiled in.
> > 
> > o   There are no memory barriers, but a synchronize_sched()
> >     should take care of that, given that this counter is
> >     incremented with interrupts disabled.
> 
> Well, there's obviously the actual context switch, which should imply
> an actual MB such that tasks are self-ordered even when execution
> continues on another CPU, etc.

True enough, except that the context switch appears to happen after the
->nvcsw increment, which means that it does not help RCU-tasks guarantee
that if it has seen the increment, all prior processing has completed.
There might be enough ordering prior to the increment, but I don't see
anything that I feel comfortable relying on.  Am I missing some
ordering?

> > So I should be able to snapshot the task_struct structure's ->nvcsw
> > field and avoid the added code in the fastpaths.
> > 
> > Seem plausible, or am I confused about the role of ->nvcsw?
> 
> Nope, that's the 'I scheduled to go to sleep' counter.

I am assuming that the "Nope" goes with "am I confused" rather than
"Seem plausible" -- if not, please let me know.  ;-)

> There is of course the 'polling' issue I raised in a further email...

Yep, and the other flavors of RCU go to some lengths to avoid scanning
the task_struct lists.  Steven said that updates will be rare and that
it is OK for them to have high latency and overhead.  Thus far, I am
taking him at his word.  ;-)

I considered interrupting the task_struct polling loop periodically,
and would add that if needed.  That said, doing so requires nailing
down the task_struct at which the pause is taken.  Here "nailing down"
does not simply mean "prevent from being freed", but rather "prevent
from being removed from the lists traversed by
do_each_thread()/while_each_thread()".
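
For illustration, the naive periodically-pausing loop would look
something like the sketch below; check_holdout_task() and
time_to_pause() are hypothetical stand-ins, and the comment marks
exactly where the naive version breaks:

	struct task_struct *g, *t;

	rcu_read_lock();
	do_each_thread(g, t) {
		check_holdout_task(t);		/* hypothetical */
		if (time_to_pause()) {		/* hypothetical */
			get_task_struct(t);	/* pins the memory... */
			rcu_read_unlock();
			schedule_timeout_interruptible(HZ / 10);
			rcu_read_lock();
			put_task_struct(t);
			/*
			 * ...but t might have been removed from the
			 * thread lists while we slept, in which case
			 * while_each_thread() below cannot safely
			 * resume from it.
			 */
		}
	} while_each_thread(g, t);
	rcu_read_unlock();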

Of course, if there is some easy way of doing this, please let me know!

> > > And I'm still not entirely sure why, your 0/x babbled something about
> > > trampolines, but I'm not sure I understand how those lead to this.
> > 
> > Steven Rostedt sent an email recently giving more detail.  And of course
> > now I am having trouble finding it.  Maybe he will take pity on us and
> > send along a pointer to it.  ;-)
> 
> Yah, would make good Changelog material that ;-)

;-) ;-) ;-)

                                                        Thanx, Paul
