On 4/19/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> The cpu scheduler core is a cpu bandwidth and latency proportionator and should be nothing more or less.
Not really. The CPU scheduler is (or ought to be) what electric utilities call an economic dispatch mechanism -- a real-time controller whose goal is to service competing demands cost-effectively from a limited supply, without compromising system stability.

If you live in the 1960's, coal and nuclear (and a little bit of fig-leaf hydro) are all you have, it takes you twelve hours to bring plants on and off line, and there's no live operational control or pricing signal between you and your customers. So you're stuck running your system at projected peak + operating margin, dumping excess power as waste heat most of the time, and browning or blacking people out willy-nilly when there's excess demand. Maybe you get to trade off shedding the loads with the worst transmission efficiency against degrading the customers with the most tolerance for brownouts (or the least regulatory clout). That's life without modern economic dispatch.

If you live in 2007, natural gas and (outside the US) better control over nuclear plants give you more ability to ramp supply up and down with demand on something like a 15-minute cycle. Better yet, you can store a little energy "in the grid" to smooth out instantaneous demand fluctuations; if you're lucky, you also have enough fast-twitch hydro (thanks, Canada!) that you can run your coal and lame-ass nuclear very close to base load even when gas is expensive, and even pump water back uphill when demand dips. (Coal is nasty stuff and a worse contributor by far to radiation exposure than nuclear generation; but on current trends it's going to last a lot longer than oil and gas, and it's a lot easier to stockpile next to the generator.) Best of all, you have industrial customers who will trade you live control (within limits) over when and how much power they take in return for a lower price per unit energy. Some of them will even dump power back into the grid when you ask them to.
So now the biggest challenge in making supply and demand meet (in the short term) is to damp all the different ways that a control feedback path might result in an oscillation -- or in runaway pricing. Because there's always some asshole greedhead who will gamble with system stability in order to game the pricing mechanism. Lots of 'em, if you're in California and your legislature is so dumb, or so bought, that they let the asshole greedheads design the whole system so they can game it to the max. (But that's a whole 'nother rant.)

Embedded systems are already in 2007, and the mainline Linux scheduler frankly sucks on them, because it thinks it's back in the 1960's with a fixed supply and captive demand, pissing away "CPU bandwidth" as waste heat. Not to say it's an easy problem; even academics with a dozen publications in this area don't seem to be able to model energy usage to the nearest big O, let alone design a stable economic dispatch engine.

But it helps to acknowledge what the problem is: even in a 1960's raised-floor screaming-air-conditioners screw-the-power-bill machine room, you can't actually run a half-decent CPU flat out any more without burning it to a crisp. You can act ignorant and let the PMIC brown you out when it has to. Or you can start coping in mainline the way that organizations big enough (and smart enough) to feel the heat in their pocketbooks do in their pet kernels. (Boo on Google for not sharing, and props to IBM for doing their damnedest.)

And guess what? The system will actually get simpler, and stabler, and faster, and easier to maintain, because it'll be based on a real theory of operation with equations and things instead of a bunch of opaque, undocumented shotgun heuristics. This hypothetical economic-dispatch scheduler will still _have_ heuristics, of course -- you can't begin to model a modern CPU accurately on-line.
But they will be contained in _data_ rather than _code_, and issues of numerical stability will be separated cleanly from the rule set. You'll be able to characterize the rule set's domain of stability, given a conservative set of assumptions about the feedback paths in the system under control, with the sort of techniques they teach in the engineering schools that none of us (me included) seem to have attended. (I went to school thinking I was going to be a physicist. Wishful thinking -- but I was young and stupid. What's your excuse? ;-)

OK, it feels better to have that off my chest. Apologies to those readers -- doubtless the vast majority of LKML, including everyone else in this thread -- for whom it's irrelevant, pseudo-learned pontification with no patch attached. And my sincere thanks to Ingo, Con, and really everyone else CC'ed, without whom Linux wouldn't be as good as it is (really quite good, all things considered) and wouldn't contribute as much as it does to my own livelihood.

Cheers,
- Michael