On Tue, Apr 30, 2019 at 12:03:18PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 27, 2019 at 11:02:46AM -0700, Paul E. McKenney wrote:
>
> > This actually passes rcutorture.  But, as Andrea noted, not klitmus.
> > After some investigation, it turned out that klitmus was creating kthreads
> > with PF_NO_SETAFFINITY, hence the failures.  But that prompted me to
> > put checks into my code: After all, rcutorture can be fooled.
> >
> > 	void synchronize_rcu(void)
> > 	{
> > 		int cpu;
> >
> > 		for_each_online_cpu(cpu) {
> > 			sched_setaffinity(current->pid, cpumask_of(cpu));
> > 			WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> > 		}
> > 	}
> >
> > This triggers fairly quickly, usually in less than a minute of rcutorture
> > testing.
> >
> > And further investigation shows that sched_setaffinity()
> > always returned 0.
>
> > Is this expected behavior?  Is there some configuration or setup that I
> > might be missing?
>
> ISTR there is hotplug involved in RCU torture? In that case, it can be
> sched_setaffinity() succeeds to place us on a CPU, which CPU hotplug
> then takes away. So when we run the WARN thingy, we'll be running on a
> different CPU than expected.

There can be CPU hotplug involved in rcutorture, but it was disabled
during this run.

> If OTOH, your loop is written like (as it really should be):
>
> 	void synchronize_rcu(void)
> 	{
> 		int cpu;
>
> 		cpus_read_lock();
> 		for_each_online_cpu(cpu) {
> 			sched_setaffinity(current->pid, cpumask_of(cpu));
> 			WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> 		}
> 		cpus_read_unlock();
> 	}
>
> Then I'm not entirely sure how we can return 0 and not run on the
> expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> to 0 are:
>
>  - if the mask didn't change
>  - if we already run inside the new mask
>  - if we migrated ourself with the stop-task
>  - if we're not in fact running
>
> That last case should never trigger in your circumstances, since @p ==
> current and current is obviously running. But for completeness, the
> wakeup of @p would do the task placement in that case.

Are there some diagnostics I could add that would help track this down,
be it my bug or yours?

							Thanx, Paul
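
For comparison, the same "pin, then check where we are running" test can
be sketched from userspace.  This is only an illustration under stated
assumptions, not the kernel code path discussed above: it assumes Linux
with glibc, it uses the sched_setaffinity(2) system call rather than the
in-kernel function, and count_affinity_mismatches() is a hypothetical
helper name.  It walks the CPUs in the caller's current affinity mask,
pins the thread to each one in turn, and counts the cases where the call
returned 0 but sched_getcpu() reports a different CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: pin the calling thread to each CPU in its current
 * affinity mask and count the cases where sched_setaffinity() returned 0
 * but the thread is not running on the requested CPU.
 * Returns the mismatch count, or -1 if the initial mask cannot be read.
 */
static int count_affinity_mismatches(void)
{
	cpu_set_t saved, one;
	int cpu, now, mismatches = 0;

	if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;
	/* A running thread always has at least one allowed CPU. */
	assert(CPU_COUNT(&saved) > 0);

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &saved))
			continue;

		CPU_ZERO(&one);
		CPU_SET(cpu, &one);

		/* A failing call is not the case of interest here; skip it. */
		if (sched_setaffinity(0, sizeof(one), &one) != 0)
			continue;

		now = sched_getcpu();
		if (now != cpu) {
			fprintf(stderr, "pinned to CPU %d, running on CPU %d\n",
				cpu, now);
			mismatches++;
		}
	}

	/* Best effort: put the original affinity mask back. */
	sched_setaffinity(0, sizeof(saved), &saved);
	return mismatches;
}
```

Calling count_affinity_mismatches() from a small test program and
printing the result gives a rough userspace cross-check.  It skips CPUs
for which the call fails (for example, a CPU that went offline in the
meantime), since the question here is specifically about a return value
of 0 combined with the wrong CPU.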