> >>>Is the core wide lock primarily responsible for the regression? I ran > >>>upto patch > >>>12 which also has the core wide lock for tagged cgroups and also calls > >>>newidle_balance() from pick_next_task(). I don't see any regression. > >>>Of > >>>course > >>>the core sched version of pick_next_task() may be doing more but > >>>comparing with > >>>the __pick_next_task() it doesn't look too horrible. > >>On further testing and investigation, we also agree that spinlock > >>contention > >>is not the major cause for the regression, but we feel that it should be > >>one > >>of the major contributing factors to this performance loss. > >> > >> > >I finally did some code bisection and found the following lines are > >basically responsible for the regression. Commenting them out I don't see > >the regressions. Can you confirm? I am yet to figure if this is needed for > >the correctness of core scheduling and if so can we do this better? > > > >-------->8------------- > > > >diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >index fe3918c..3b3388a 100644 > >--- a/kernel/sched/core.c > >+++ b/kernel/sched/core.c > >@@ -3741,8 +3741,8 @@ pick_next_task(struct rq *rq, struct task_struct > >*prev, struct rq_flags *rf) > > * If there weren't no cookies; we don't > >need > > * to bother with the other siblings. > >*/ > >- if (i == cpu && !rq->core->core_cookie) > >- goto next_class; > >+ //if (i == cpu && !rq->core->core_cookie) > >+ //goto next_class; > > > >continue; > > } > AFAICT this condition is not needed for correctness as cookie matching will > sill be enforced. Peter any thoughts? I get the following numbers with 1 DB > and 2 DB instance. > > 1 DB instance > users baseline %idle core_sched %idle > 16 1 84 -5.5% 84 > 24 1 76 -5% 76 > 32 1 69 -0.45% 69 > > 2 DB instance > users baseline %idle core_sched %idle > 16 1 66 -23.8% 69 > 24 1 54 -3.1% 57 > 32 1 42 -21.1% 48
We tried to comment those lines and it doesn’t seem to get rid of the performance regression we are seeing. Can you elaborate a bit more about the test you are performing, what kind of resources it uses ? Can you also try to replicate our test and see if you see the same problem ? cgcreate -g cpu,cpuset:set1 cat /sys/devices/system/cpu/cpu{0,2,4,6}/topology/thread_siblings_list 0,36 2,38 4,40 6,42 echo "0,2,4,6,36,38,40,42" | sudo tee /sys/fs/cgroup/cpuset/set1/cpuset.cpus echo 0 | sudo tee /sys/fs/cgroup/cpuset/set1/cpuset.mems echo 1 | sudo tee /sys/fs/cgroup/cpu,cpuacct/set1/cpu.tag sysbench --test=fileio prepare cgexec -g cpu,cpuset:set1 sysbench --threads=4 --test=fileio \ --file-test-mode=seqwr run The reason we create a cpuset is to narrow down the investigation to just 4 cores on a highly powerful machine. It might not be needed if testing on a smaller machine. Julien