On Wed, Apr 18, 2007 at 12:55:25AM -0500, Matt Mackall wrote:
> Why are processes special? Should user A be able to get more CPU time
> for his job than user B by splitting it into N parallel jobs? Should
> we be fair per process, per user, per thread group, per session, per
> controlling terminal? Some weighted combination of the preceding?
On a side note, I think a combination of all of the above is a very good idea, plus process groups (pgrps). All the make -j loads would then come up in one pgrp of one session for one user, and hence be automatically kept isolated in their own corner by such policies. Thread bombs, forkbombs, and so on get handled too, which is good when, say, someone rudely spawns too many tasks on a compile server.

Thinking of the scheduler as a CPU bandwidth allocator, this means handing out shares of CPU bandwidth to all users on the system, which in turn hand out shares of bandwidth to all their sessions, which in turn hand out shares to all their process groups, which in turn hand out shares to all their thread groups, which in turn hand out shares to their threads.

The event handlers for the scheduler need not deal with this apart from task creation and exit and the various sorts of process ID changes (e.g. setsid(), setpgrp(), setuid(), etc.). They just determine what the scheduler sees as ->load_weight or some analogue of ->static_prio, though it is possible to do this by means of data structure organization instead of numerical prioritization.

The effective weight would probably have to be calculated on the fly by doing fixpoint arithmetic something like

	user_share(p) * session_share(p) * pgrp_share(p) * tgrp_share(p) * task_share(p)

so that readjusting the shares of aggregates doesn't have to traverse lists and remains O(1). Each of the share computations can just do some analogue of the calculation p->load_weight / rq->raw_weighted_load in fixpoint, though the precision issues with this make me queasy.

There is maybe a slight nasty point in that the ->raw_weighted_load analogue for users, or whatever the highest level chosen is, ends up being global. One might as well get users in there and omit intermediate levels, if any are to be omitted, so that the truly global state is as read-only as possible.
I suppose jacking up the fixpoint precision to 128-bit or 256-bit, all below the radix point (our maximum is 1.0, after all), until the precision issues vanish can be done, but the idea of that much number crunching in the scheduler makes me rather uncomfortable. I hope u64 or even u32 can be gotten away with as far as fixpoint goes.


-- wli