Peter Williams wrote:
> William Lee Irwin III wrote:
> > On Mon, Apr 16, 2007 at 11:06:56AM +1000, Peter Williams wrote:
> >> PS I no longer read LKML (due to time constraints) and would
> >> appreciate it if I could be CC'd on any e-mails suggesting
> >> scheduler changes.
> >> PPS I'm just happy to see that Ingo has finally accepted that the
> >> vanilla scheduler was badly in need of fixing and don't really care
> >> who fixes it.
> >> PPS Different schedulers for different aims (i.e. server or work
> >> station) do make a difference. E.g. the spa_svr scheduler in
> >> plugsched does about 1% better on kernbench than the next best
> >> scheduler in the bunch.
> >> PPPS Con, fairness isn't always best as humans aren't very
> >> altruistic and we need to give unfair preference to interactive
> >> tasks in order to stop the users flinging their PCs out the window.
> >> But the current scheduler doesn't do this very well and is also not
> >> very good at fairness so needs to change. But the changes need to
> >> address interactive response and fairness not just fairness.
> >
> > Kernel compiles not so useful a benchmark. SDET, OAST, AIM7, etc.
> > are better ones. I'd not bother citing kernel compile results.
>
> spa_svr actually does its best work when the system isn't fully loaded
> as the type of improvement it strives to achieve (minimizing on queue
> wait time) hasn't got much room to manoeuvre when the system is fully
> loaded. Therefore, the fact that it's 1% better even in these
> circumstances is a good result and also indicates that the overhead
> for keeping the scheduling statistics it uses for its decision making
> is well spent. Especially, when you consider that the total available
> room for improvement on this benchmark is less than 3%.
>
> To elaborate, the motivation for this scheduler was acquired from the
> observation of scheduling statistics (in particular, on queue wait
> time) on systems running at about 30% to 50% load. Theoretically, at
> these load levels there should be no such waiting but the statistics
> show that there is considerable waiting (sometimes as high as 30% to
> 50%). I put this down to "lack of serendipity" e.g. everyone sleeping
> at the same time and then trying to run at the same time would be
> complete lack of serendipity. On the other hand, if everyone is synced
> then there would be total serendipity.
>
> Obviously, from the POV of a client, time the server task spends
> waiting on the queue adds to the response time for any request that
> has been made so reduction of this time on a server is a good
> thing(tm). Equally obviously, trying to achieve this synchronization
> by asking the tasks to cooperate with each other is not a feasible
> solution and some external influence needs to be exerted and this is
> what spa_svr does -- it nudges the scheduling order of the tasks in a
> way that makes them become well synced.
>
> Unfortunately, this is not a good scheduler for an interactive system
> as it minimizes the response times for ALL tasks (and the system as a
> whole) and this can result in increased response time for some
> interactive tasks (clunkiness) which annoys interactive users. When
> you start fiddling with this scheduler to bring back "interactive
> unfairness" you kill a lot of its superior low overall wait time
> performance.
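Peter's "lack of serendipity" effect is easy to reproduce in a toy
model, by the way. Below is a throwaway single-CPU, 1ms-tick simulation
I knocked up (my own sketch, nothing to do with plugsched internals;
all names and numbers are invented): four periodic tasks each want 5ms
of CPU per 40ms period, so the CPU is 50% loaded either way, and the
only thing that differs is whether the tasks wake in phase or staggered.

/*
 * serendipity.c -- toy single-CPU model of on-queue wait at 50% load.
 * N periodic tasks each want RUN ms of CPU per PERIOD ms
 * (load = N*RUN/PERIOD = 50%).  Not plugsched code; illustration only.
 */
#include <stdio.h>

#define N       4
#define RUN     5
#define PERIOD  40
#define TICKS   400000

static unsigned long simulate(int staggered)
{
	long wake[N], left[N];
	unsigned long wait = 0;
	long t;
	int i;

	for (i = 0; i < N; i++) {
		wake[i] = staggered ? i * (PERIOD / N) : 0;
		left[i] = 0;
	}
	for (t = 0; t < TICKS; t++) {
		int cpu_busy = 0;

		for (i = 0; i < N; i++)
			if (t == wake[i])	/* task wakes wanting RUN ms */
				left[i] = RUN;
		for (i = 0; i < N; i++) {
			if (left[i] <= 0)
				continue;
			if (!cpu_busy) {
				cpu_busy = 1;	/* first runnable task gets this tick */
				if (--left[i] == 0)
					wake[i] += PERIOD;	/* rearm for the next cycle */
			} else
				wait++;		/* runnable but stuck on the queue */
		}
	}
	return wait;
}

int main(void)
{
	printf("in phase : %lu task-ms of queue wait\n", simulate(0));
	printf("staggered: %lu task-ms of queue wait\n", simulate(1));
	return 0;
}

Here it prints 300000 task-ms of wait for the in-phase case versus zero
staggered: in phase, a task sits on the queue for roughly 19% of its
cycle on average even though the CPU is idle half the time. That's the
same waiting-despite-idle-CPU that Peter's statistics show, and nudging
the phases apart is exactly the win spa_svr is chasing.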
spa_svr is my favorite, but as you mentioned it doesn't work well with
interactive tasks. So I started instrumenting its behaviour with chew.c
(attached). What I found is that the prio levels are way too coarse.
Setting max_tpt_bonus = 3 bounds this somewhat, but it was still not
enough. Looking at spa_svr_reassess_bonus and changing it to simply
adjust the prio based on avg_sleep did the trick, like this:

static void spa_svr_reassess_bonus(struct task_struct *p)
{
	/* any task with a non-trivial average sleep per cycle (still
	 * non-zero after scaling down by 1024) earns a bonus step;
	 * everything else has its bonus decayed */
	if (p->sdu.spa.avg_sleep_per_cycle >> 10)
		incr_throughput_bonus(p, 1);
	else
		decr_throughput_bonus(p);
}

This nudges a sleeper up one bonus step at a time instead of jumping
whole prio levels, which is what smoothed out the clunkiness for me.

Thanks!
--
Al
/*
 * chew.c -- busy-loops and reports whenever this process was held off
 * the CPU for longer than THRESHOLD_USEC.
 *
 * original idea by Chris Friesen. Thanks.
 */
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/resource.h>

#define THRESHOLD_USEC 2000

unsigned long long stamp(void)
{
	struct timeval tv;

	gettimeofday(&tv, 0);
	return (unsigned long long) tv.tv_usec +
	       ((unsigned long long) tv.tv_sec) * 1000000;
}

int main(void)
{
	unsigned long long thresh_ticks = THRESHOLD_USEC;
	unsigned long long cur, last, start, act, delta;
	struct timespec ts;

	sched_rr_get_interval(0, &ts);
	printf("pid %d, prio %3d, interval of %ld nsec\n",
	       getpid(), getpriority(PRIO_PROCESS, 0), ts.tv_nsec);

	start = last = stamp();
	while (1) {
		cur = stamp();
		delta = cur - last;
		if (delta > thresh_ticks) {
			act = last - start;
			printf("pid %d, prio %3d, out for %4llu ms, ran for %4llu ms, load %3llu%%\n",
			       getpid(), getpriority(PRIO_PROCESS, 0),
			       delta / 1000, act / 1000,
			       (act * 100) / (cur - start));
			start = cur = stamp();
		}
		last = cur;
	}
	return 0;
}
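In case anyone wants to play with chew.c themselves: I just build it
with gcc and run a couple of instances at different nice levels (the
levels below are only an example); each instance then reports how long
it was held off the CPU and what share of it the scheduler gave it:

$ gcc -O2 -o chew chew.c
$ ./chew & nice -n 5 ./chew &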