Hello, Peter!

TL;DR: If a normal !PF_NO_SETAFFINITY kthread invokes sched_setaffinity(),
and sched_setaffinity() returns 0, is it expected behavior for that
kthread to be running on some CPU other than one of the ones specified by
the in_mask argument?  All CPUs are online, and there is no CPU-hotplug
activity taking place.

                                                        Thanx, Paul

Details:

I have long shown the following "toy" synchronize_rcu() implementation:

        void synchronize_rcu(void)
        {
                int cpu;

                for_each_online_cpu(cpu)
                        run_on(cpu);
        }

I decided that if I was going to show it, I should test it.  And it
occurred to me that run_on() can be a call to sched_setaffinity():

        void synchronize_rcu(void)
        {
                int cpu;

                for_each_online_cpu(cpu)
                        sched_setaffinity(current->pid, cpumask_of(cpu));
        }

This actually passes rcutorture.  But, as Andrea noted, not klitmus.
After some investigation, it turned out that klitmus was creating kthreads
with PF_NO_SETAFFINITY, hence the failures.  But that prompted me to
put checks into my code: After all, rcutorture can be fooled.

        void synchronize_rcu(void)
        {
                int cpu;

                for_each_online_cpu(cpu) {
                        sched_setaffinity(current->pid, cpumask_of(cpu));
                        WARN_ON_ONCE(raw_smp_processor_id() != cpu);
                }
        }

This triggers fairly quickly, usually in less than a minute of rcutorture
testing.  And further investigation showed that sched_setaffinity()
always returned 0.  So I tried this hack:

        void synchronize_rcu(void)
        {
                int cpu;

                for_each_online_cpu(cpu) {
                        while (raw_smp_processor_id() != cpu)
                                sched_setaffinity(current->pid, cpumask_of(cpu));
                        WARN_ON_ONCE(raw_smp_processor_id() != cpu);
                }
        }

This never triggers, and rcutorture's grace-period rate is not significantly
affected.

Is this expected behavior?  Is there some configuration or setup that I
might be missing?
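
For comparison, a rough userspace analogue of the checked loop might look
like the sketch below.  It is not the kernel code under test and says
nothing about kthreads: it uses the glibc sched_setaffinity() wrapper and
sched_getcpu() in place of raw_smp_processor_id(), and the helper name
visit_each_allowed_cpu() is made up for illustration.  It only counts how
often the calling thread is seen on a different CPU immediately after a
successful call.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): pin the calling
 * thread to each CPU in its allowed mask in turn, and count how often it
 * is observed on some other CPU right after sched_setaffinity() succeeds.
 * Stores the number of successful pin attempts in *visited.
 * Returns the number of mismatches, or -1 if the mask cannot be read.
 */
int visit_each_allowed_cpu(int *visited)
{
        cpu_set_t allowed, one;
        int cpu, mismatches = 0;

        *visited = 0;
        if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
                return -1;

        for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
                if (!CPU_ISSET(cpu, &allowed))
                        continue;
                CPU_ZERO(&one);
                CPU_SET(cpu, &one);
                /* Skip CPUs we are not permitted to pin to. */
                if (sched_setaffinity(0, sizeof(one), &one) != 0)
                        continue;
                (*visited)++;
                /* Analogue of the WARN_ON_ONCE() check above. */
                if (sched_getcpu() != cpu)
                        mismatches++;
        }

        /* Restore the original mask so the caller is not left pinned. */
        sched_setaffinity(0, sizeof(allowed), &allowed);
        return mismatches;
}
```

Calling visit_each_allowed_cpu(&n) from main() and printing the return
value along with n is enough to try it.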
