On Sat, Oct 08, 2016 at 06:24:38PM +0800, Wanpeng Li wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 543b2f2..03a6620 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5472,19 +5472,29 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
>   */
>  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
>  {
> -	struct sched_domain *this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
So select_idle_cpu() <- select_idle_sibling() is called from two places, both of which already hold rcu_read_lock() afaict. This would've insta-triggered an rcu-lockdep splat otherwise, I think. That is, select_task_rq_fair() has rcu_read_lock() taken when calling this, and task_numa_compare() does too.

> +	struct sched_domain *this_sd;
>  	u64 avg_idle = this_rq()->avg_idle;
> -	u64 avg_cost = this_sd->avg_scan_cost;
> +	u64 avg_cost;
>  	u64 time, cost;
>  	s64 delta;
>  	int cpu, wrap;
>
> +	rcu_read_lock();
> +	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
> +	if (!this_sd) {
> +		cpu = -1;
> +		goto unlock;
> +	}

Yes, this is the part that was missing. We need to test this_sd after the lookup. Thanks for looking at this!
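
IOW, since both callers already hold rcu_read_lock(), the extra rcu_read_lock()/unlock pair in this patch looks redundant; the essential fix is only the NULL test after the lookup. Untested sketch of that shape, with the rest of the idle-CPU scan elided:

	static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
	{
		struct sched_domain *this_sd;
		u64 avg_idle = this_rq()->avg_idle;
		u64 avg_cost;

		/*
		 * Callers hold rcu_read_lock(), so a bare rcu_dereference()
		 * is safe here; what was missing is the NULL test, since
		 * the per-cpu sd_llc pointer can be NULL.
		 */
		this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
		if (!this_sd)
			return -1;

		avg_cost = this_sd->avg_scan_cost;

		/* ... rest of the scan unchanged ... */
	}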