On 4/19/07, hui Bill Huey <[EMAIL PROTECTED]> wrote:
> DSP operations, particularly with digital synthesis, tend to max out
> the CPU doing vector operations on as many processors as they can get
> hold of. In a live, performance-critical application, it's important
> to be able to deliver a protected amount of CPU to a thread doing that
> work, as well as respond to external input such as controllers, etc.
Actual fractional CPU reservation is a bit different, and is probably best handled with "container"-type infrastructure (not quite virtualization, but not quite scheduling classes either). SGI pioneered this (in "open systems" space -- IBM probably had it first, as usual) with GRIO in XFS. (That was I/O throughput reservation of course, not "CPU bandwidth" -- but IIRC IRIX had CPU reservation too). There's a more general class of techniques in which it's worth spending idle cycles speculating along paths that might or might not be taken depending on unpredictable I/O; I'd be surprised if you couldn't approximate most of the sane balancing strategies in this area within the "economic dispatch" scheduler model. (Good JIT bytecode engines more or less do this already if you let them, with a cap on JIT cache size serving as a crude CPU throttle.)
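(If "fractional reservation" sounds abstract, here's a toy userspace throttle to make it concrete -- emphatically not GRIO, not any real kernel interface; every name and number below is invented. Note that it only enforces a ceiling on CPU use; guaranteeing the floor is exactly the part that needs the scheduler's cooperation:)

/* Toy sketch of a fractional CPU "reservation" as a self-imposed throttle.
 * A worker checks its own CPU time against a budget of (SHARE * PERIOD_NS)
 * and sleeps off the rest of the period once the budget is spent.
 * Everything here is made up for illustration. */
#include <time.h>

#define PERIOD_NS   10000000LL          /* 10 ms accounting period  */
#define SHARE       0.25                /* cap at 25% of one CPU    */

static long long now_ns(clockid_t clk)
{
        struct timespec ts;
        clock_gettime(clk, &ts);
        return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

void throttled_worker(void (*do_chunk)(void))
{
        long long period_start = now_ns(CLOCK_MONOTONIC);
        long long cpu_start    = now_ns(CLOCK_THREAD_CPUTIME_ID);

        for (;;) {
                do_chunk();     /* one small unit of DSP-ish work */

                long long cpu_used = now_ns(CLOCK_THREAD_CPUTIME_ID) - cpu_start;
                long long elapsed  = now_ns(CLOCK_MONOTONIC) - period_start;

                if (elapsed >= PERIOD_NS) {
                        /* new period: refill the budget */
                        period_start += PERIOD_NS;
                        cpu_start     = now_ns(CLOCK_THREAD_CPUTIME_ID);
                } else if (cpu_used >= (long long)(SHARE * PERIOD_NS)) {
                        /* budget exhausted: give the CPU back until the
                         * period rolls over (one chunk of slop per period) */
                        struct timespec nap = { 0, PERIOD_NS - elapsed };
                        nanosleep(&nap, NULL);
                }
        }
}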
> > In practice, you probably don't want to burden desktop Linux with
> > priority inheritance where you don't have to. Priority queues with
> > algorithmically efficient decrease-key operations (Fibonacci heaps and
> > their ilk) are complicated to implement and have correspondingly high
> > constant factors. (However, a sufficiently clever heuristic for
> > assigning quasi-static task priorities would usually short-circuit the
> > priority cascade; if you can keep N small in the
> > tasks-with-unpredictable-priority queue, you can probably use a
> > simpler flavor with O(log N) decrease-key. Ask someone who knows more
> > about data structures than I do.)
>
> These are app issues, and not really something that's mutable in the
> kernel per se with regard to the -rt patch.
I don't know where the -rt patch enters in. But if you need agile reprioritization with a deep runnable queue, either under direct application control or as a side effect of priority inheritance or a related OS-enforced protocol, then you need a kernel-level data structure with a fancier interface than the classic insert/find/delete-min priority queue. From what I've read (this is not my area of expertise and I don't have Knuth handy), the relatively simple heap-based implementations of priority queues can't reprioritize an entry any more quickly than find+delete+insert, which pretty much rules them out as a basis for a scalable scheduler with priority inheritance (let alone PCP emulation).
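(To be concrete about the "simpler flavor": a binary min-heap with a position index gets you decrease-key as a single sift-up, O(log N), without the find+delete+insert dance. This is just an illustration I cooked up -- the task ids, the fixed-size arrays, all of it is made up, and it is not what the kernel or the -rt patch actually does:)

/* Binary min-heap of tasks keyed by priority, plus a position index so
 * that decrease-key (i.e. a priority boost) is a single sift-up.
 * All names and sizes invented for illustration. */
#include <assert.h>

#define MAX_TASKS 1024

struct pq {
        int heap[MAX_TASKS];    /* heap[i]   = task id                 */
        int pos[MAX_TASKS];     /* pos[task] = index of task in heap[] */
        int prio[MAX_TASKS];    /* prio[task] = current key            */
        int n;
};

static void pq_swap(struct pq *q, int i, int j)
{
        int a = q->heap[i], b = q->heap[j];
        q->heap[i] = b;  q->heap[j] = a;
        q->pos[a] = j;   q->pos[b] = i;
}

static void pq_sift_up(struct pq *q, int i)
{
        while (i > 0) {
                int parent = (i - 1) / 2;
                if (q->prio[q->heap[parent]] <= q->prio[q->heap[i]])
                        break;
                pq_swap(q, i, parent);
                i = parent;
        }
}

/* Boost 'task' to a numerically lower (more urgent) priority in O(log N):
 * no search, no delete-and-reinsert. */
void pq_decrease_key(struct pq *q, int task, int new_prio)
{
        assert(new_prio <= q->prio[task]);
        q->prio[task] = new_prio;
        pq_sift_up(q, q->pos[task]);
}

Insert is the same sift-up after appending at the end, and delete-min is a sift-down from the root; I've left those out. The only point is that reprioritizing an already-queued entry doesn't have to cost a search.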
> I have Solaris-style adaptive locks in my tree with my lockstat patch
> under -rt. I've also modified my lockstat patch to track readers
> correctly now with rwsem and the like, to see where the single-reader
> limitation in the rtmutex blows it.
Ooh, that's neat. The next time I can cook up an excuse to run a kernel that won't load this damn WiFi driver, I'll try it out. Some of the people I work with are real engineers and respect in-system instrumentation.
> So far I've seen less than 10 percent of in-kernel contention events
> actually worth spinning on, and the rest of the stats imply that the
> mutex owner in question is either preempted or blocked on something
> else.
That's a good thing; it implies that in-kernel algorithms don't take locks needlessly as a matter of cargo-cult habit. Attempting to take a lock (other than an uncontended futex, which is practically free) should almost always transfer control to the thread that has the power to deliver the information (or the free slot) that you're looking for -- or, in the case of an external data source/sink, should send you into low-power mode until time and tide give you something new to do.

Think of it as a just-in-time inventory system; if you keep too much product in stock (or too much free warehouse space), you're wasting space and harming responsiveness to a shift in demand. Once in a while you have to play Sokoban in order to service a request promptly; that's exactly the case that priority inheritance is meant to help with.

The fiddly part, on a non-real-time-no-matter-what-the-label-says system with an opaque cache architecture and mysterious hidden costs of context switching, is to minimize the lossage resulting from brutal timer- or priority-inheritance-driven preemption. Given the way people code these days -- OK, it was probably just as bad back in the day -- the only thing worse than letting the axe fall at random is to steal the CPU away the moment a contended lock is released, because the next 20 lines of code probably poke one last time at all the data structures the task had in cache right before entering the critical section. That doesn't hurt so badly on RTOS-friendly hardware -- an MMU-less system with either zero or near-infinite cache -- but it's got to make this year's Intel/AMD/whatever kotatsu stall left and right when that task gets rescheduled.
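(For anyone following along who hasn't met a Solaris-style adaptive lock, the spin-vs-block decision Bill's stats bear on looks roughly like this. This is my sketch of the idea in userspace terms, not Bill's patch and not the rtmutex; 'owner_on_cpu' is a stand-in for however a real implementation learns that the owner is currently running:)

/* Rough sketch of the spin-vs-block decision in an adaptive mutex.
 * The lock word and the futex fallback are real enough; everything
 * else is invented for illustration. */
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

struct adaptive_mutex {
        atomic_int locked;          /* 0 = free, 1 = held           */
        atomic_int owner_on_cpu;    /* set by owner while it runs   */
};

#define SPIN_LIMIT 200

void adaptive_lock(struct adaptive_mutex *m)
{
        for (;;) {
                int expected = 0;
                if (atomic_compare_exchange_weak(&m->locked, &expected, 1)) {
                        atomic_store(&m->owner_on_cpu, 1);
                        return;
                }

                /* Owner is on a CPU: it will probably release soon, so a
                 * short spin is cheaper than two context switches. */
                int spins = 0;
                while (atomic_load(&m->locked) &&
                       atomic_load(&m->owner_on_cpu) &&
                       spins++ < SPIN_LIMIT)
                        ;       /* cpu_relax(), in kernel terms */

                /* Owner preempted/blocked, or we spun too long: sleep. */
                if (atomic_load(&m->locked))
                        syscall(SYS_futex, &m->locked, FUTEX_WAIT,
                                1, NULL, NULL, 0);
        }
}

void adaptive_unlock(struct adaptive_mutex *m)
{
        atomic_store(&m->owner_on_cpu, 0);
        atomic_store(&m->locked, 0);
        syscall(SYS_futex, &m->locked, FUTEX_WAKE, 1, NULL, NULL, 0);
}

Bill's "less than 10 percent worth spinning on" figure is basically telling you how often that middle branch pays off.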
> I've been trying to get folks to try this on a larger machine than my
> 2x AMD64 box so that there is more data regarding Linux contention and
> over-scheduling in -rt.
Dave Miller, maybe? He seems to be one of the few people around here with the skills and the institutional motivation to push for scalability under the typical half-assed middle-tier application workload. Which, make no mistake, stands to benefit a lot more from a properly engineered SCHED_OTHER scheduler than any "real-time" media gadget that sells by the forklift-load at Fry's. (NUMA boxen might also benefit, but AFAICT their target applications are different enough not to count.)

And if anyone from Cavium is lurking, now would be a real good time to show up and shoulder some of the load. Heck, even Intel ought to pitch in -- the Yonah team may have saved your ass for now, but you'll be playing the 32-thread throughput-per-erg stakes soon enough.

Cheers,
- Michael