On Thu, 2012-09-27 at 07:47 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efa...@gmx.de> wrote:
>
> > I think the pgbench problem is more about latency for the 1 in
> > 1:N than spinlocks.
>
> So my understanding of the psql workload is that basically we've
> got a central psql proxy process that is distributing work to
> worker psql processes. If a freshly woken worker process ever
> preempts the central proxy process then it is preventing a lot
> of new work from getting distributed.
>
> Correct?
Yeah, that's my understanding of the thing, and I played with it quite
a bit in the past (only refreshed memories briefly in the present).

> So the central proxy psql process is 'much more important' to
> run than any of the worker processes - an importance that is not
> (currently) visible from the behavioral statistics the scheduler
> keeps on tasks.

Yeah. We had the adaptive waker thing, but it stopped being a winner
even on the one load it originally helped quite a lot, and it didn't
help pgbench all that much in its then form anyway, iirc.

> So the scheduler has the following problem here: a new wakee
> might be starved enough and the proxy might have run long enough
> to really justify the preemption here and now. The buddy
> statistics help avoid some of these cases - but not all and the
> difference is measurable.
>
> Yet the 'best' way for psql to run is for this proxy process to
> never be preempted. Your SCHED_BATCH experiments confirmed that.

Yes.

> The way remote CPU selection affects it is that if we ever get
> more aggressive in selecting a remote CPU then we, as a side
> effect, also reduce the chance of harmful preemption of the
> central proxy psql process.

Right.

> So in that sense sibling selection is somewhat of an indirect
> red herring: it really only helps psql indirectly by preventing
> the harmful preemption. It also, somewhat paradoxically, argues
> for suboptimal code: for example tearing apart buddies is
> beneficial in the psql workload, because it also allows the more
> important part of the buddy to run more (the proxy).

Yes, I believe preemption dominates, but it's not the only factor; you
can see that in the numbers.

> In that sense the *real* problem isn't even parallelism (although
> we obviously should improve the decisions there - and the logic
> has suffered in the past from the psql dilemma outlined above),
> but whether the scheduler can (and should) identify the central
> proxy and keep it running as much as possible, deprioritizing
> fairness, wakeup buddies, runtime overlap and cache affinity
> considerations.
>
> There's two broad solutions that I can see:
>
>  - Add a kernel solution to somehow identify 'central' processes
>    and bias them. Xorg is a similar kind of process, so it would
>    help other workloads as well. That way lie dragons, but might
>    be worth an attempt or two. We already try to do a couple of
>    robust metrics, like overlap statistics to identify buddies.

What we do now works well for X and friends I think, because there
aren't so many buddies. It might work better still though, and for the
same reasons. I've in fact [re]invented a SCHED_SERVER class a few
times, but never one that survived my own scrutiny for long. Arrr,
here there be dragons is true ;-)

>  - Let user-space occasionally identify its important (and less
>    important) tasks - say psql could mark its worker processes as
>    SCHED_BATCH and keep its central process(es) higher prio. A
>    single line of obvious code in 100 KLOCs of user-space code.
>
> Just to confirm, if you turn off all preemption via a hack
> (basically if you turn SCHED_OTHER into SCHED_BATCH), does psql
> perform and scale much better, with the quality of sibling
> selection and spreading of processes only being a secondary
> effect?

That has always been the case here. Preemption dominates. Others
should play with it too, and let their boxen speak.
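
FWIW, the user-space tweak really is about a one-liner per worker.
Here's a minimal sketch (not pgbench's actual code, the fork site and
names are made up purely for illustration) of a dispatcher staying at
SCHED_OTHER while demoting its freshly forked workers to SCHED_BATCH:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Worker demotes itself, so it can't wakeup-preempt the dispatcher. */
static void demote_to_batch(void)
{
	struct sched_param sp = { .sched_priority = 0 };

	if (sched_setscheduler(0, SCHED_BATCH, &sp))
		perror("sched_setscheduler(SCHED_BATCH)");
}

int main(void)
{
	int i;

	for (i = 0; i < 4; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			demote_to_batch();
			/* worker's request loop would go here */
			_exit(0);
		}
	}

	/* dispatcher stays SCHED_OTHER and keeps handing out work */
	return 0;
}

For a quick test without touching any source, launching the whole thing
batch (e.g. chrt -b 0 <cmd>) should show whether killing wakeup
preemption alone recovers the throughput, which is roughly what my
SCHED_BATCH runs did.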
-Mike