On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote:
> Note that I talk of run queues not CPUs as I think a shift to multiple
> CPUs per run queue may be a good idea.
This observation of Peter's is the best thing to come out of this whole foofaraw. Looking at what's happening in CPU-land, I think it's going to be necessary, within a couple of years, to replace the whole idea of "CPU scheduling" with "run queue scheduling" across a complex, possibly dynamic mix of CPU-ish resources. Ergo, there's not much point in churning the mainline scheduler through a design that isn't significantly more flexible than any of those now under discussion.

For instance, there are architectures where several "CPUs" (instruction stream decoders feeding execution pipelines) share parts of a cache hierarchy ("chip-level multitasking"). On these machines, you may want to co-schedule a "real" processing task on one pipeline with a "cache warming" task on the other pipeline -- but only for tasks whose memory access patterns have been sufficiently analyzed to write the "cache warming" task code. Some other tasks may want to idle the second pipeline so they can use the full cache-to-RAM bandwidth. Yet other tasks may be genuinely CPU-intensive (or I/O bound but so context-heavy that it's not worth yielding the CPU during quick I/Os), and hence perfectly happy to run concurrently with an unrelated task on the other pipeline.

There are other architectures where several "hardware threads" fight over parts of a cache hierarchy (sometimes bizarrely described as "sharing" the cache, kind of the way most two-year-olds "share" toys). On these machines, one instruction pipeline can't help the other along cache-wise, but it sure can hurt. A scheduler designed, tested, and tuned principally on one of these architectures (hint: "hyperthreading") will probably leave a lot of performance on the floor on processors in the former category.

In the not-so-distant future, we're likely to see architectures with dynamically reconfigurable interconnect between instruction issue units and execution resources.
(This is already quite feasible on, say, Virtex4 FX devices with multiple PPC cores, or Altera FPGAs with as many Nios II cores as fit on the chip.) Restoring task context may involve not just MMU swaps and FPU instructions (with state-dependent hidden costs) but processor reconfiguration. Achieving "fairness" according to any standard that a platform integrator cares about (let alone an end user) will require a fairly detailed model of the hidden costs associated with different sorts of task switch.

So if you are interested in schedulers for some reason other than a paycheck, let the distros worry about 5% improvements on x86[_64]. Get hold of some different "hardware" -- say:

  - a Xilinx ML410 if you've got $3K to blow and want to explore reconfigurable processors;
  - a SunFire T2000 if you've got $11K and want to mess with a CMT system that's actually shipping;
  - a QEMU-simulated massively SMP x86 if you're poor but clever enough to implement funky cross-core cache effects yourself; or
  - a cycle-accurate simulator from Gaisler or Virtio if you want a real research project.

Then go explore some more interesting regions of parameter space and see what the demands on mainline Linux will look like in a few years.

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
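
The three co-scheduling cases described in the message above (a task paired with its own cache-warming helper, a task that wants the sibling pipeline idle, and a task that tolerates any neighbour) can be made concrete with a small sketch in C. Everything in it is hypothetical: `enum cosched_hint`, `struct task`, the `warmer_for` field, and `can_coschedule()` are names invented for this illustration and do not correspond to any interface in the Linux scheduler. The sketch also assumes that a task with a warmer accepts an idle sibling as a fallback, which the message does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task hint; not a real kernel interface. */
enum cosched_hint {
    COSCHED_ANY,         /* fine beside any unrelated task */
    COSCHED_WANT_IDLE,   /* wants the sibling pipeline idle (full cache-to-RAM bandwidth) */
    COSCHED_WANT_WARMER  /* wants only its own cache-warming helper on the sibling */
};

struct task {
    int id;
    enum cosched_hint hint;
    int warmer_for;      /* id of the task this one warms the cache for, or -1 */
};

/*
 * From the point of view of `self`: is `other` acceptable on the sibling
 * pipeline? `other == NULL` stands for an idle sibling.
 */
static bool sibling_ok_for(const struct task *self, const struct task *other)
{
    assert(self != NULL);

    switch (self->hint) {
    case COSCHED_WANT_IDLE:
        return other == NULL;
    case COSCHED_WANT_WARMER:
        /* Assumption: an idle sibling is an acceptable fallback. */
        return other == NULL || other->warmer_for == self->id;
    case COSCHED_ANY:
        return true;
    }
    return false;
}

/* Two tasks may share a pair of pipelines only if each accepts the other. */
static bool can_coschedule(const struct task *a, const struct task *b)
{
    if (a == NULL && b == NULL)
        return true;                     /* both pipelines idle */
    if (a == NULL)
        return sibling_ok_for(b, NULL);
    if (b == NULL)
        return sibling_ok_for(a, NULL);
    return sibling_ok_for(a, b) && sibling_ok_for(b, a);
}
```

A run-queue scheduler built along these lines would call something like `can_coschedule(candidate, running_on_sibling)` before placing a task, instead of treating every pipeline as an independent CPU. The check is deliberately symmetric, so a bandwidth-hungry task is never handed a neighbour just because that neighbour happens not to mind.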