Hi Pavan,

On Wed, 24 Oct 2018 at 06:53, Pavan Kondeti <pkond...@codeaurora.org> wrote:
>
> Hi Vincent,
>
> Thanks for the detailed explanation.
>
> On Tue, Oct 23, 2018 at 02:15:08PM +0200, Vincent Guittot wrote:
> > Hi Pavan,
> >
> > On Tue, 23 Oct 2018 at 07:59, Pavan Kondeti <pkond...@codeaurora.org> wrote:
> > >
> > > Hi Vincent,
> > >
> > > On Fri, Oct 19, 2018 at 06:17:51PM +0200, Vincent Guittot wrote:
> > > >
> > > >  /*
> > > > + * The clock_pelt scales the time to reflect the effective amount of
> > > > + * computation done during the running delta time but then syncs back to
> > > > + * clock_task when rq is idle.
> > > > + *
> > > > + *
> > > > + * absolute time   | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> > > > + * @ max capacity  ------******---------------******---------------
> > > > + * @ half capacity ------************---------************---------
> > > > + * clock pelt      | 1| 2|    3|    4| 7| 8|    9|   10| 11|14|15|16
> > > > + *
> > > > + */
> > > > +void update_rq_clock_pelt(struct rq *rq, s64 delta)
> > > > +{
> > > > +
> > > > +	if (is_idle_task(rq->curr)) {
> > > > +		u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT;
> > > > +		u32 overload = rq->cfs.avg.util_sum + LOAD_AVG_MAX;
> > > > +		overload += rq->avg_rt.util_sum;
> > > > +		overload += rq->avg_dl.util_sum;
> > > > +
> > > > +		/*
> > > > +		 * Reflecting some stolen time makes sense only if the idle
> > > > +		 * phase would be present at max capacity. As soon as the
> > > > +		 * utilization of a rq has reached the maximum value, it is
> > > > +		 * considered as an always running rq without idle time to
> > > > +		 * steal. This potential idle time is considered as lost in
> > > > +		 * this case. We keep track of this lost idle time compared to
> > > > +		 * rq's clock_task.
> > > > +		 */
> > > > +		if (overload >= divider)
> > > > +			rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
> > > > +
> > >
> > > I am trying to understand this better. I believe we run into this scenario
> > > when the frequency is limited due to thermal/userspace constraints. Let's say
> >
> > Yes, these are the most common use cases, but this can also happen after task
> > migration or with a cpufreq governor that doesn't increase the OPP fast
> > enough for the current utilization.
> >
> > > the frequency is limited to Fmax/2. A 50% task at Fmax becomes 100% running
> > > at Fmax/2. The utilization is built up to 100% after several periods.
> > > The clock_pelt runs at 1/2 the speed of clock_task. We are losing the idle
> > > time all along. What happens when the CPU enters idle for a short duration
> > > and comes back to run this 100% utilization task?
> >
> > If you are at 100%, we only apply the short idle duration.
> >
> > >
> > > If the above block is not present, i.e. lost_idle_time is not tracked, we
> > > stretch the idle time (since clock_pelt is synced to clock_task) and the
> > > utilization is dropped. Right?
> >
> > Yes, that's what would happen. I give more details below.
> >
> > >
> > > With the above block, we don't stretch the idle time. In fact, we don't
> > > consider the idle time at all. Because,
> > >
> > > idle_time = now - last_time;
> > >
> > > idle_time = (rq->clock_pelt - rq->lost_idle_time) - last_time
> > > idle_time = (rq->clock_task - rq_clock_task + rq->clock_pelt_old) - last_time
> > > idle_time = rq->clock_pelt_old - last_time
> > >
> > > The last_time is nothing but the last snapshot of rq->clock_pelt when the
> > > task entered sleep, due to which the CPU entered idle.
> >
> > The condition for dropping this idle time is quite important. This only
> > happens when the utilization reaches the max compute capacity of the CPU.
> > Otherwise, the idle time will be fully applied.
>
> Right.
>
> rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt
>
> This not only tracks the lost idle time due to running slow but also the
> absolute/real sleep time. For example, when the slow-running 100% task
> sleeps for 100 msec, are we not ignoring the 100 msec of sleep there?
>
> For example, a task ran 323 msec at full capacity and sleeps for (1000-323)
> msec. When it wakes up, the utilization is dropped. If the same task runs
> for 626 msec at half capacity and sleeps for (1000-626), should we not
> drop the utilization by taking the (1000-626) sleep time into account? I
> understand why we don't stretch the idle time to (1000-323), but it is not
> clear to me why we completely drop the idle time.
So this should not happen. I'm going to update the way I track lost idle
time: move it out of update_rq_clock_pelt() and only do the test when
entering idle. This is even better, as it simplifies update_rq_clock_pelt()
and reduces the number of tests for lost idle time.

Thanks for spotting this.

I'm preparing a new version with this, some build fixes for !SMP, and the
cache-line alignment suggested by Peter.

Vincent