On Tue, Aug 01, 2017 at 04:57:43PM +0800, Yafang Shao wrote:
> > And how would that happen? We only call pick_next_entity(.curr=NULL)
> > when we _know_ cfs_rq->nr_running.
>
> It crashed my machine when I did hadoop test, and after I made this change
> it works now.
> On SMP system, cfs_rq->nr_running isn't protected well, although we _know_
> cfs_rq->nr_running,
> but it is modified by other thread running on other CPU and the
> sched_entity is set NULL as well.
> Then this thread broken here as accessed the NULL pointer here.

cfs_rq->nr_running should be protected by the rq->lock. If it is not, something else is buggered.
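The invariant behind that reply can be shown in miniature. Suppose the counter and the set of queued entities are only ever changed while the same lock is held. Then any reader holding that lock who sees nr_running > 0 is guaranteed to find a non-NULL entity. Another CPU cannot empty the queue in between, because it would need the lock to do so. The following is a userspace sketch under stated assumptions, not kernel code. A pthread mutex stands in for rq->lock. `struct runqueue`, `struct entity`, `producer`, `consumer` and `run_demo` are hypothetical names. The real CFS keeps entities in an rbtree, not a singly linked list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PRODUCERS 4
#define PER_PRODUCER 1000
#define CONSUMERS 4
#define TOTAL (PRODUCERS * PER_PRODUCER)

/* Hypothetical stand-in for a queued sched_entity. */
struct entity {
    struct entity *next;
};

/* Hypothetical stand-in for a run queue: every field below is only
 * read or written with 'lock' held (the role rq->lock plays). */
struct runqueue {
    pthread_mutex_t lock;
    unsigned int nr_running;   /* number of queued entities */
    struct entity *head;       /* queued entities */
    unsigned int consumed;     /* demo bookkeeping */
};

static struct runqueue rq = { PTHREAD_MUTEX_INITIALIZER, 0, NULL, 0 };
static struct entity pool[TOTAL];

/* Enqueue PER_PRODUCER entities; count and list change together under the lock. */
static void *producer(void *arg)
{
    struct entity *base = arg;
    for (int i = 0; i < PER_PRODUCER; i++) {
        pthread_mutex_lock(&rq.lock);
        base[i].next = rq.head;
        rq.head = &base[i];
        rq.nr_running++;
        pthread_mutex_unlock(&rq.lock);
    }
    return NULL;
}

/* Pick and dequeue until every entity has been consumed. */
static void *consumer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&rq.lock);
        /* With the lock held, the count and the list always agree. */
        assert((rq.nr_running == 0) == (rq.head == NULL));
        if (rq.consumed == TOTAL) {
            pthread_mutex_unlock(&rq.lock);
            break;
        }
        if (rq.nr_running > 0) {
            struct entity *se = rq.head;  /* the "pick": cannot be NULL here */
            assert(se != NULL);
            rq.head = se->next;
            rq.nr_running--;
            rq.consumed++;
        }
        pthread_mutex_unlock(&rq.lock);
    }
    return NULL;
}

/* Run producers and consumers concurrently; returns how many entities were consumed. */
unsigned int run_demo(void)
{
    pthread_t p[PRODUCERS], c[CONSUMERS];
    for (int i = 0; i < PRODUCERS; i++)
        pthread_create(&p[i], NULL, producer, &pool[i * PER_PRODUCER]);
    for (int i = 0; i < CONSUMERS; i++)
        pthread_create(&c[i], NULL, consumer, NULL);
    for (int i = 0; i < PRODUCERS; i++)
        pthread_join(p[i], NULL);
    for (int i = 0; i < CONSUMERS; i++)
        pthread_join(c[i], NULL);
    return rq.consumed;
}
```

To run the sketch, call `run_demo()` from a `main` and link with `-pthread`. It should return `TOTAL` without tripping either assertion. Now imagine one path that modifies `nr_running` or `head` without taking `rq.lock`. The "count > 0 but entity is NULL" state then becomes observable. That is the sense in which an unprotected nr_running means something else is broken, rather than pick_next_entity() needing a NULL check.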