On Mon, Aug 05, 2019 at 08:55:28AM -0700, Tim Chen wrote:
> On 8/2/19 8:37 AM, Julien Desfossez wrote:
> > We tested both Aaron's and Tim's patches and here are our results.
> >
> > Test setup:
> > - 2 1-thread sysbench, one running the cpu benchmark, the other one
> >   the mem benchmark
> > - both started at the same time
> > - both are pinned on the same core (2 hardware threads)
> > - 10 30-seconds runs
> > - test script: https://paste.debian.net/plainh/834cf45c
> > - only showing the CPU events/sec (higher is better)
> > - tested 4 tag configurations:
> >   - no tag
> >   - sysbench mem untagged, sysbench cpu tagged
> >   - sysbench mem tagged, sysbench cpu untagged
> >   - both tagged with a different tag
> > - "Alone" is the sysbench CPU running alone on the core, no tag
> > - "nosmt" is both sysbench pinned on the same hardware thread, no tag
> > - "Tim's full patchset + sched" is an experiment with Tim's patchset
> >   combined with Aaron's "hack patch" to get rid of the remaining deep
> >   idle cases
> > - In all test cases, both tasks can run simultaneously (which was not
> >   the case without those patches), but the standard deviation is a
> >   pretty good indicator of the fairness/consistency.
>
> Thanks for testing the patches and giving such detailed data.
>
> I came to realize that for my scheme, the accumulated deficit of forced
> idle could be wiped out in one execution of a task on the forced idle
> cpu, with the update of the min_vruntime, even if the execution time
> could be far less than the accumulated deficit.
> That's probably one reason my scheme didn't achieve fairness.
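(For reference, my understanding of why a single short run is enough to
erase the deficit; this is a simplified sketch of my own, not code from
either patchset: min_vruntime only ratchets forward toward whatever is
running or leftmost right now, it carries no memory of how long the
runqueue sat forced idle.)

	/*
	 * Simplified paraphrase of update_min_vruntime() in
	 * kernel/sched/fair.c (the name and the plain max() are mine):
	 * min_vruntime is monotonic and follows the current/leftmost
	 * entity, so a deficit that is only represented through
	 * min_vruntime disappears as soon as the forced-idle cpu runs a
	 * task, however briefly.
	 */
	static void update_min_vruntime_sketch(struct cfs_rq *cfs_rq, u64 vruntime)
	{
		cfs_rq->min_vruntime = max(cfs_rq->min_vruntime, vruntime);
	}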
Turns out there is a typo in v3 when setting rq's core_forceidle:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26fea68f7f54..542974a8da18 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3888,7 +3888,7 @@ next_class:;
 		WARN_ON_ONCE(!rq_i->core_pick);
 
 		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
-			rq->core_forceidle = true;
+			rq_i->core_forceidle = true;
 
 		rq_i->core_pick->core_occupation = occ;

With this fixed, and together with the patch to let schedule always
happen, your latest 2 patches work well for the 10s cpuhog test I
described previously:
https://lore.kernel.org/lkml/20190725143003.GA992@aaronlu/

The overloaded workload without any cpu binding doesn't work well though;
I haven't taken a closer look yet.
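For context, that assignment sits in the loop that walks every SMT sibling
of the core after the core-wide pick. A simplified sketch of that loop as I
read the v3 code (details elided, not a verbatim copy):

	for_each_cpu(i, smt_mask) {
		/*
		 * rq_i is the sibling being visited; rq is only the cpu
		 * that entered pick_next_task().
		 */
		struct rq *rq_i = cpu_rq(i);

		WARN_ON_ONCE(!rq_i->core_pick);

		/*
		 * This sibling has runnable tasks but was handed the idle
		 * task by the core-wide pick, i.e. it is forced idle.
		 * With the typo, only rq (the picking cpu) ever got
		 * marked, so forced-idle siblings went unaccounted.
		 */
		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
			rq_i->core_forceidle = true;

		rq_i->core_pick->core_occupation = occ;
		...
	}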