* Ingo Molnar <[EMAIL PROTECTED]> wrote: > We'll see - people can still readily tweak these values under > CONFIG_SCHED_DEBUG=y and give us feedback, and if there's enough > demand for very-finegrained scheduling (throughput be damned), we > could introduce yet another config option or enable the runtime > tunable unconditionally. (Or maybe the best option is Peter Zijlstra's > adaptive granularity idea, that gives the best of both worlds.)
hm, glxgears smoothness regresses with the de-HZ-ification change: with an increasing background load the turning of the wheels quickly becomes visually ugly - with small ruckles instead of smooth rotation. The reason for that is that the 20 msec granularity on my testbox (which is a dual-core box, so the default 10msec turns into 20msec) turns into 40, 60, 80, 100 msec 'observed latency' for glxgears as load increases to 2x, 3x, 4x etc - and a 100 msec pause in rotation is easily perceivable to the human eye (and brain). Before that the delay curve with increasing load was 4msec/8msec/12msec etc. Due to the removal of the HZ dependency we now have upset the granularity picture anyway, so i believe we should do the adaptive granularity thing right now. That will aim for a 40msec task-observable latency, in a load-independent manner. (!) (This is an approach we couldnt even dream of with the previous, fixed-timeslice scheduler.) The code is simple (and it is all in the slowpath), it in essence boils down to this new code: +static long +sched_granularity(struct cfs_rq *cfs_rq) +{ + unsigned int gran = sysctl_sched_latency; + unsigned int nr = cfs_rq->nr_running; + + if (nr > 1) { + gran = gran/nr - gran/nr/nr; + gran = max(gran, sysctl_sched_granularity); + } + + return gran; +} IMO it is a good compromise between long slicing and short slicing: there are two values, one is the "CPU-bound task latency the scheduler aims for", the second one is a minimum granularity (to not do too many context-switches). Peter and me tested this all day with various workloads and extreme-load behavior has improved all over the place - while the server benchmarks (which want less preemption) are still fine too. The glxgear ruckles are all gone. If you do not disagree with this (it's pretty late in the game with more than 1 month spent from the kernel cycle already), please pull the latest scheduler tree from: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git Find the shortlog further below. There are 3 commits in it: adaptive granularity, a subsequent cleanup, and a lockdep sysctl bug Peter noticed while hacking on this. (the bug was introduced with the initial CFS commits but nobody noticed because the lockdep sysctls are rarely used.) The linecount increase is mostly due to the comments added to explain the "gran = lat/nr - lat/nr/nr" magic formula and due to the extra parameter. Tested on 32-bit and 64-bit x86, and with a few make randconfig build tests too. Ingo ------------------> Ingo Molnar (1): sched: cleanup, sched_granularity -> sched_min_granularity Peter Zijlstra (2): sched: fix CONFIG_SCHED_DEBUG dependency of lockdep sysctls sched: adaptive scheduler granularity include/linux/sched.h | 3 + kernel/sched.c | 16 ++++++---- kernel/sched_fair.c | 77 ++++++++++++++++++++++++++++++++++++++++++-------- kernel/sysctl.c | 33 ++++++++++++++------- 4 files changed, 99 insertions(+), 30 deletions(-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/