On Thu, 2007-11-01 at 22:01 +0530, Srivatsa Vaddagiri wrote:
> On Thu, Nov 01, 2007 at 01:20:08PM +0100, Peter Zijlstra wrote:
> > On Thu, 2007-11-01 at 13:03 +0100, Peter Zijlstra wrote:
> > > On Thu, 2007-11-01 at 12:58 +0100, Peter Zijlstra wrote:
> > >
> > > > > sched_slice() is about latency; its intended purpose is to ensure each
> > > > > task is run exactly once during sched_period() - which is
> > > > > sysctl_sched_latency when nr_running <= sysctl_sched_nr_latency, and
> > > > > otherwise scales latency linearly.
> > >
> > > The thing that got my brain in a twist is what to do about the non-leaf
> > > nodes; for those it seems I'm not doing the right thing - I think.
> >
> > Ok, suppose a tree like so:
> >
> > level 2        cfs_rq
> >                A         B
> >
> > level 1        cfs_rqA   cfs_rqB
> >                A0        B0 - B99
> >
> > So for the sake of determinism, we want to impose a period in which all
> > level 1 tasks will have run (at least) once.
>
> Peter,
> 	I fail to see why this requirement to "determine a period in
> which all level 1 tasks will have run (at least) once" is essential.
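(For reference, the arithmetic quoted above works roughly like the sketch
below. The names mirror the quoted description, but the numbers and the
standalone form are only illustrative assumptions, not the actual scheduler
source.)

#include <stdio.h>

/* Illustrative tunables only -- the real values are sysctls. */
#define SYSCTL_SCHED_LATENCY    20000000ULL  /* 20 ms, in ns */
#define SYSCTL_SCHED_NR_LATENCY 5UL

/*
 * The period in which every runnable task should have run once:
 * sysctl_sched_latency while nr_running <= sysctl_sched_nr_latency,
 * stretched linearly beyond that.
 */
static unsigned long long sched_period(unsigned long nr_running)
{
        unsigned long long period = SYSCTL_SCHED_LATENCY;

        if (nr_running > SYSCTL_SCHED_NR_LATENCY)
                period = period * nr_running / SYSCTL_SCHED_NR_LATENCY;

        return period;
}

/* An entity's slice is its weight-proportional share of that period. */
static unsigned long long sched_slice(unsigned long long period,
                                      unsigned long weight,
                                      unsigned long total_weight)
{
        return period * weight / total_weight;
}

int main(void)
{
        /* Leaf cfs_rqB from the example above: B0 - B99, equal weights. */
        unsigned long nr = 100;
        unsigned long long p = sched_period(nr);

        printf("period for %lu tasks: %llu ns\n", nr, p);
        printf("slice per task:       %llu ns\n", sched_slice(p, 1, nr));
        return 0;
}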
Because it gives a steady feel to things. For humans it's most essential
that things run in a predictable fashion. So not only does it matter how
much time a task gets, it also very much matters when it gets it. It
contributes to the feeling of gradual slowdown.

> I am visualizing each of the groups to be similar to Xen-like partitions
> which are given fair timeslices by the hypervisor (Linux kernel in this
> case). How each partition (group in this case) manages the allocated
> timeslice(s) to provide fairness to tasks within that partition/group
> should not (IMHO) depend on other groups and esp. how many tasks other
> groups have.

Agreed, I've realised this since my last mail: one group should not
influence another in such a fashion, so in this respect you don't want
to flatten it like I did.

> For ex: before this patch, fair time would be allocated to a group and
> its tasks as below:
>
>   A0       B0-B9     A0       B10-B19   A0       B20-B29
>  |--------|---------|--------|---------|--------|---------|-----//--|
>  0        10ms      20ms     30ms      40ms     50ms      60ms
>
> i.e. during the first 10ms allocated to group B, B0-B9 run,
> during the next 10ms allocated to group B, B10-B19 run, etc.
>
> What's wrong with this scheme?

What made me start tinkering here is that the nested level is again
distributing wall-time without being aware of the fraction it gets from
the upper levels.

So if we have two groups, A and B, and B is selected for 1/2 of the
period, then Bn will again divide the full period, even though in
reality it will only have p/2. (There is a small numeric sketch of this
effect further down.)

So I guess I need a top-down traversal, not a bottom-up traversal, to
get this fixed up... I'll ponder this.

> By letting __sched_period() be determined for each group independently,
> we are building stronger isolation between them, which is good IMO
> (imagine a rogue container that does a fork bomb).

Agreed.

> > Index: linux-2.6/kernel/sched_fair.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched_fair.c
> > +++ linux-2.6/kernel/sched_fair.c
> > @@ -341,7 +341,7 @@ static u64 sched_slice(struct cfs_rq *cf
> >  		do_div(slice, cfs_rq->load.weight);
> >  	}
> >
> > -	return slice;
> > +	return min_t(u64, sysctl_sched_latency, slice);
>
> which seems to be more or less giving what we already have w/o the
> patch?

Well, it's basically giving up on overload, admittedly not very nice.
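To put the fraction problem in numbers, here is a rough model (same
illustrative tunables as the earlier sketch, and again an assumption-laden
toy, not the real scheduler code):

#include <stdio.h>

/* Same illustrative tunables as in the earlier sketch. */
#define SYSCTL_SCHED_LATENCY    20000000ULL  /* 20 ms, in ns */
#define SYSCTL_SCHED_NR_LATENCY 5UL

static unsigned long long sched_period(unsigned long nr_running)
{
        unsigned long long period = SYSCTL_SCHED_LATENCY;

        if (nr_running > SYSCTL_SCHED_NR_LATENCY)
                period = period * nr_running / SYSCTL_SCHED_NR_LATENCY;

        return period;
}

int main(void)
{
        /* Top level: A and B with equal weight, so B owns 1/2 of the CPU. */
        double frac_b = 0.5;

        /* Leaf level: cfs_rqB sizes a period for its own 100 tasks ... */
        unsigned long nr_b = 100;
        unsigned long long period_b = sched_period(nr_b);
        unsigned long long slice_bn = period_b / nr_b;

        /*
         * ... but B only receives wall-time at frac_b of real time, so the
         * wall-clock span in which every Bn really runs once is longer.
         */
        double wall_period_b = (double)period_b / frac_b;

        printf("period cfs_rqB computes for itself:   %llu ns\n", period_b);
        printf("slice each Bn expects within it:      %llu ns\n", slice_bn);
        printf("wall-clock span all Bn actually need: %.0f ns\n",
               wall_period_b);
        return 0;
}

With those assumed numbers: B pinned to half the CPU still spreads a full
400 ms period over B0-B99, so each Bn's 4 ms slice in fact recurs only once
per roughly 800 ms of wall time - the kind of error a top-down scaling of
the period by each level's fraction would be expected to correct.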