On Tue, Jul 18, 2017 at 11:14:57AM +0800, Li, Aubrey wrote:
> On 2017/7/18 3:23, Peter Zijlstra wrote:
> > On Fri, Jul 14, 2017 at 09:26:19AM -0700, Andi Kleen wrote:
> >>> And as said; Daniel has been working on a better predictor -- now he's
> >>> probably not used it on the network workload you're looking at, so that
> >>> might be something to consider.
> >>
> >> Deriving a better idle predictor is a bit orthogonal to fast idle.
> > 
> > No. If you want a different C state selected we need to fix the current
> > C state selector. We're not going to tinker.
> > 
> > And the predictor is probably the most fundamental part of the whole C
> > state selection logic.
> > 
> > Now I think the problem is that the current predictor goes for an
> > average idle duration. This means that we, on average, get it wrong 50%
> > of the time. For performance that's bad.
> > 
> > If you want to improve the worst case, we need to consider a cumulative
> > distribution function, and staying with the Gaussian assumption already
> > present, that would mean using:
> > 
> >   CDF(x) = 1/2 * [ 1 + erf( (x - mu) / (sigma * sqrt(2)) ) ]
> > 
> > Where, per the normal convention mu is the average and sigma^2 the
> > variance. See also:
> > 
> >   https://en.wikipedia.org/wiki/Normal_distribution
> > 
> > We then solve CDF(x) = n% to find the x for which we get it wrong n% of
> > the time (IIRC something like: 'mu - 2sigma' ends up being 5% or so).
> > 
> > This conceptually gets us better exit latency for the cases where we got
> > it wrong before, and practically pushes down the estimate which gets us
> > C1 longer.
> > 
> > Of course, this all assumes a Gaussian distribution to begin with, if we
> > get bimodal (or worse) distributions we can still get it wrong. To fix
> > that, we'd need to do something better than what we currently have.
> > 
> Maybe you are talking about applying some machine learning algorithm online
> to fit a multivariate normal distribution, :)

Nah, nothing that fancy..

Something that _could_ work and deals with arbitrary distributions is
buckets divided on the actual C-state selection boundaries and a
(cyclic) array of the N most recent idle times.

Something like so:

struct est_data {
        u8      array[64];      /* cyclic buffer of the N most recent samples */
        u8      *dist;          /* per C-state histogram of those samples */
        u8      idx;
};

DEFINE_PER_CPU(struct est_data, est_data);

void est_init(void)
{
        int size = drv->state_count;
        struct est_data *ed;
        int cpu;

        for_each_possible_cpu(cpu) {
                ed = &per_cpu(est_data, cpu);
                ed->dist = kzalloc(size, GFP_KERNEL);
                // handle errors

                /* the zeroed @array starts out as 64 samples of state 0 */
                ed->dist[0] = ARRAY_SIZE(ed->array);
        }
}

u8 est_duration_2_state(u64 duration)
{
        int i;

        /* @duration is in ns; /1024 is close enough to usec, like target_residency */
        for (i = 0; i < drv->state_count; i++) {
                if (duration/1024 < drv->states[i].target_residency)
                        return i;
        }

        return i-1;
}

void est_contemplate(u64 duration)
{
        struct est_data *ed = this_cpu_ptr(&est_data);
        int state = est_duration_2_state(duration);
        int idx = ed->idx++ % ARRAY_SIZE(ed->array);

        /* replace the oldest sample in the cyclic buffer and the histogram */
        ed->dist[ed->array[idx]]--;
        ed->array[idx] = state;
        ed->dist[ed->array[idx]]++;
}

int est_state(int pct)
{
        struct est_data *ed = this_cpu_ptr(&est_data);
        int limit = pct * ARRAY_SIZE(ed->array) / 100; /* XXX move div out of line */
        int cnt = 0, last = 0;
        int i;

        /* CDF: deepest state for which at most pct% of recent idles were too short */
        for (i = 0; i < drv->state_count; i++) {
                cnt += ed->dist[i];
                if (cnt > limit)
                        break;
                last = i;
        }

        return last;
}
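
Roughly how that could be wired up; the call sites below are made up for
illustration, only est_contemplate() and est_state() come from the sketch
above:

void example_idle_exit(u64 idle_duration_ns)
{
        /* ns in, so the duration/1024 above lands in the target_residency unit */
        est_contemplate(idle_duration_ns);
}

int example_idle_enter(void)
{
        /* deepest state that was too deep for at most ~5% of recent idles */
        return est_state(5);
}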


> Well, back to the problem: when the scheduler picks the idle thread, it does
> not look at the history, nor make a prediction. So it's possible it has to
> switch back to a task ASAP when it's going into idle (very common under some
> workloads).
> 
> That is, (idle_entry + idle_exit) > idle. If the system has multiple
> hardware idle states, then:
> 
> (idle_entry + idle_exit + HW_entry + HW_exit) > HW_sleep
> 
> So we eventually want the idle path to be lighter than what we currently have.
> A complex predictor may have high accuracy, but its cost could be high as well.

I never suggested anything complex. The current menu thing uses an
average; all I said is that if, instead of the average, you use something
lower, say 'avg - 2*stdev' (it already computes the stdev), you get an
estimate that, assuming a Gaussian distribution, is wrong on exit latency
less than ~5% of the time.
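
Concretely, something like the below (a userspace sketch with made-up
numbers, not the actual menu code; build with -lm):

#include <math.h>
#include <stdio.h>

/* the CDF quoted above: 1/2 * [ 1 + erf( (x - mu) / (sigma * sqrt(2)) ) ] */
static double gauss_cdf(double x, double mu, double sigma)
{
        return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))));
}

/* bias the predicted idle duration down by two standard deviations */
static double predict_low(double avg, double stddev)
{
        double est = avg - 2.0 * stddev;

        return est > 0.0 ? est : 0.0;   /* never predict a negative idle time */
}

int main(void)
{
        double avg = 100.0, stddev = 30.0;      /* made-up idle stats, in usec */
        double est = predict_low(avg, stddev);

        /* fraction of idle periods expected to be shorter than the estimate */
        printf("estimate %.0f usec, too short in %.1f%% of cases\n",
               est, 100.0 * gauss_cdf(est, avg, stddev));       /* ~2.3% */

        return 0;
}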

The histogram sketch further up, also a simple thing, uses a generic
distribution function, which works because it buckets on the exact
C-state boundaries we're limited to anyway.
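
For example, with the 64 most recent idles bucketed as { C0: 2, C1: 1,
C2: 10, C3: 51 } (made-up numbers), est_state(5) gets limit = 5*64/100 = 3,
accumulates 2, then 3, then hits 13 > 3 and stops, returning C1: at most
3 of the 64 recent idles (~4.7%) were too short for C1.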

Of course, the above needs to be augmented with IRQ bits etc..
