On Wed, May 11, 2016 at 06:17:51AM +0200, Mike Galbraith wrote:
> > >  static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq)
> > >  {
> > > +	if (sched_feat(LB_TIP_AVG_HIGH) && cfs_rq->load.weight >
> > > +	    cfs_rq->runnable_load_avg*2)
> > > +		return cfs_rq->runnable_load_avg + min_t(unsigned long, NICE_0_LOAD,
> > > +							 cfs_rq->load.weight/2);
> > >  	return cfs_rq->runnable_load_avg;
> > >  }
> > 
> > cfs_rq->runnable_load_avg is for sure no greater than (in this case much
> > less than, maybe 1/2 of) load.weight, whereas load_avg is not necessarily
> > a rock in the gearbox that only impedes speeding up; it impedes speeding
> > down too.
> 
> Yeah, just like everything else, it'll cut both ways (why you can't
> win the sched game).  If I can believe tbench, at tasks=cpus, reducing
> lag increased utilization and reduced latency a wee bit, as did the
> reserve thing once a booboo got fixed up.
Ok, so you have a secret IDLE_RESERVE?  Good luck, and show it ;)

> Makes sense, robbing Peter to pay Paul should work out better for Paul.
> 
> NO_LB_TIP_AVG_HIGH
> Throughput 27132.9 MB/sec  96 clients  96 procs  max_latency=7.656 ms
> Throughput 28464.1 MB/sec  96 clients  96 procs  max_latency=9.905 ms
> Throughput 25369.8 MB/sec  96 clients  96 procs  max_latency=7.192 ms
> Throughput 25670.3 MB/sec  96 clients  96 procs  max_latency=5.874 ms
> Throughput 29309.3 MB/sec  96 clients  96 procs  max_latency=1.331 ms
> avg        27189          1.000                              6.391  1.000
> 
> NO_LB_TIP_AVG_HIGH IDLE_RESERVE
> Throughput 24437.5 MB/sec  96 clients  96 procs  max_latency=1.837 ms
> Throughput 29464.7 MB/sec  96 clients  96 procs  max_latency=1.594 ms
> Throughput 28023.6 MB/sec  96 clients  96 procs  max_latency=1.494 ms
> Throughput 28299.0 MB/sec  96 clients  96 procs  max_latency=10.404 ms
> Throughput 29072.1 MB/sec  96 clients  96 procs  max_latency=5.575 ms
> avg        27859          1.024                              4.180  0.654
> 
> LB_TIP_AVG_HIGH NO_IDLE_RESERVE
> Throughput 29068.1 MB/sec  96 clients  96 procs  max_latency=5.599 ms
> Throughput 26435.6 MB/sec  96 clients  96 procs  max_latency=3.703 ms
> Throughput 23930.0 MB/sec  96 clients  96 procs  max_latency=7.742 ms
> Throughput 29464.2 MB/sec  96 clients  96 procs  max_latency=1.549 ms
> Throughput 24250.9 MB/sec  96 clients  96 procs  max_latency=1.518 ms
> avg        26629          0.979                              4.022  0.629
> 
> LB_TIP_AVG_HIGH IDLE_RESERVE
> Throughput 30340.1 MB/sec  96 clients  96 procs  max_latency=1.465 ms
> Throughput 29042.9 MB/sec  96 clients  96 procs  max_latency=4.515 ms
> Throughput 26718.7 MB/sec  96 clients  96 procs  max_latency=1.822 ms
> Throughput 28694.4 MB/sec  96 clients  96 procs  max_latency=1.503 ms
> Throughput 28918.2 MB/sec  96 clients  96 procs  max_latency=7.599 ms
> avg        28742          1.057                              3.380  0.528
> 
> > But I really don't know what kind of load references select_task_rq()
> > should be using. So maybe the real issue is a mix of them, i.e., conflated
> > balancing and just wanting an idle cpu?
> 
> Depends on the goal.  For both, load lagging reality means the high
> frequency component is squelched, meaning less migration cost, but also
> higher latency due to stacking.  It's a tradeoff where Chris' "latency
> is everything" benchmark, and _maybe_ the real world load it's based
> upon, is on Peter's end of the rob Peter to pay Paul transaction.  The
> benchmark says it definitely is; the real world load may have already
> been fixed up by the select_idle_sibling() rewrite.

Obviously, load avgs are good at balancing on a larger scale over a longer
timeframe, so they should be used for comparing/balancing sched domains, not
cpus. However, that is not the case currently: the avgs are mixed into idle
cpu/core selection, so I think a better job can be done both before and after
select_idle_sibling(). For example, I don't know what the complex
wake_affine() is really doing, or for what. Am I missing something, do you
think?

Kudos to the select_idle_sibling() rewrite; like Peter said, the second-step
and even third-step scans really help, in addition to the many cleanups and
refactorings.
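To make the sd-vs-cpu split a bit more concrete, below is a rough sketch of
what I mean (not a patch against any tree; avg_based_group_target() and
llc_domain_of() are made-up placeholders for "consult the avgs" and "find the
LLC domain"): the avgs only pick the coarse target, and the per-cpu decision
is a plain idle scan.

static int select_task_rq_sketch(struct task_struct *p, int prev_cpu)
{
	struct sched_domain *llc;
	int cpu, target;

	/* Coarse placement: let the load avgs pick a target group/cpu. */
	target = avg_based_group_target(p, prev_cpu);	/* hypothetical */

	/* Fine placement: inside the LLC, ignore the avgs, just scan for idle. */
	llc = llc_domain_of(target);			/* hypothetical */
	for_each_cpu(cpu, sched_domain_span(llc)) {
		if (idle_cpu(cpu))
			return cpu;
	}

	/* Nothing idle: stick with what the avgs said. */
	return target;
}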