Hi, On Tue, Dec 18, 2018 at 12:50 PM Quentin Perret <quentin.per...@arm.com> wrote: > > Hi Rafael, > > I skimmed through the previous versions, but apologies if some of these > things have been discussed already :-) > > On Tuesday 18 Dec 2018 at 12:09:05 (+0100), Rafael J. Wysocki wrote: > <...> > > + * - Find an idle state on the basis of the sleep length and state > > statistics > > + * collected over time: > > + * > > + * o Find the deepest idle state whose target residency is less than or > > euqal > > s/euqal/equal > > > <...> > > +/** > > + * struct teo_idle_state - Idle state data used by the TEO cpuidle > > governor. > > + * @early_hits: "Early" CPU wakeups "matching" this state. > > + * @hits: "On time" CPU wakeups "matching" this state. > > + * @misses: CPU wakeups "missing" this state. > > + * > > + * A CPU wakeup is "matched" by a given idle state if the idle duration > > measured > > + * after the wakeup is between the target residency of that state and the > > target > > + * residnecy of the next one (or if this is the deepest available idle > > state, it > > s/residnecy/residency > > > + * "matches" a CPU wakeup when the measured idle duration is at least > > equal to > > + * its target residency). > > + * > > + * Also, from the TEO governor prespective, a CPU wakeup from idle is > > "early" if > > s/prespective/perspective
Thanks for the typo fixes (ongoing spellchecker issues here). > > + * it occurs significantly earlier than the closest expected timer event > > (that > > + * is, early enough to match an idle state shallower than the one matching > > the > > + * time till the closest timer event). Otherwise, the wakeup is "on > > time", or > > + * it is a "hit". > > + * > > + * A "miss" occurs when the given state doesn't match the wakeup, but it > > matches > > + * the time till the closest timer event used for idle state selection. > > + */ > > > <...> > > +static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device > > *dev) > > +{ > > + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); > > + unsigned int sleep_length_us = ktime_to_us(cpu_data->sleep_length_ns); > > + int i, idx_hit = -1, idx_timer = -1; > > + unsigned int measured_us; > > + > > + if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) { > > + /* > > + * One of the safety nets has triggered or this was a timer > > + * wakeup (or equivalent). > > + */ > > + measured_us = sleep_length_us; > > + } else { > > + unsigned int lat = > > drv->states[cpu_data->last_state].exit_latency; > > + > > + measured_us = ktime_to_us(cpu_data->time_span_ns); > > + /* > > + * The delay between the wakeup and the first instruction > > + * executed by the CPU is not likely to be worst-case every > > + * time, so take 1/2 of the exit latency as a very rough > > + * approximation of the average of it. > > + */ > > + if (measured_us >= lat) > > + measured_us -= lat / 2; > > + else > > + measured_us /= 2; > > I'm not sure to understand this; why not removing lat/2 in all cases and > cap at 0 ? Well, 0 is special on platforms where state 0 is a "polling" one, so it's better to avoid it. So this is the reason, but that's a matter of handling the corner case this way or another. > > + } > > > <...> > > + /* > > + * Save idle duration values corresponding to non-timer wakeups for > > + * pattern detection. > > + * > > + * If the total time span between idle state selection and the > > "reflect" > > + * callback is greater than or equal to the sleep length determined at > > + * the idle state selection time, the wakeup is likely to be due to a > > + * timer event. > > Should this paragraph go above the first 'if' of this function ? Of course I can move it. > It checks the same thing and it took me some time to understand what you > were doing while the explanation I needed was down here all along :-) Fair enough. > > + */ > > + if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) > > + measured_us = UINT_MAX; > > + > > + cpu_data->intervals[cpu_data->interval_idx++] = measured_us; > > + if (cpu_data->interval_idx > INTERVALS) > > + cpu_data->interval_idx = 0; > > +} > > > <...> > > + /* > > + * Count and sum the most recent idle duration values less > > than > > + * the target residency of the state selected so far, find the > > + * max. > > + */ > > + for (i = 0; i < INTERVALS; i++) { > > + unsigned int val = cpu_data->intervals[i]; > > + > > + if (val >= drv->states[idx].target_residency) > > + continue; > > + > > + count++; > > + sum += val; > > + } > > + > > + /* > > + * Give up unless the majority of the most recent idle > > duration > > + * values are in the interesting range. > > + */ > > + if (count > INTERVALS / 2) { > > I understand this probably works well enough in practice, but how 'robust' > is supposed to be this filtering here ? If you have, say, 2 tasks with > prime periods, then the hyper-period (which is when the pattern repeats) > can get high really quickly, and you'll never see the full extent of > that pattern with this mechanism. That's correct, but this is just about whether or not there is a reason to select an idle state shallower than the one selected already. How the pattern looks like exactly doesn't matter too much here AFAICS. > Yes, it's hard to do much better without introducing tons of complexity > in here, but how did you define INTERVALS=8 and so ? I blatantly stole this number from the menu governor. :-) > Is this something you'd like to make configurable at some point ? Maybe? > I guess this could always be done later if need be. Precisely. > > + unsigned int avg_us = div64_u64(sum, count); > > + > > + /* > > + * Avoid spending too much time in an idle state that > > + * would be too shallow. > > + */ > > + if (!(tick_nohz_tick_stopped() && avg_us < > > TICK_USEC)) { > > IIUC, this is saying we shouldn't trust the predicted sleep time if it > is shorter than the tick and the tick is stopped. What is the use case > you have in mind for this ? > > In other words, in which scenario do you expect to see a lot of high > frequency non-timer-related wake-ups and have the tick disabled ? It is mostly theoretical, but the rationale here is that the cost of using a shallow idle state with the tick stopped is potentially very high (the CPU may stay in that idle state for a very long time then), so it is better to assume misprediction in that case to avoid that cost. > > + idx = teo_find_shallower_state(drv, dev, idx, > > avg_us); > > + duration_us = avg_us; > > + } > > + } > > + } Cheers, Rafael