Hi Leo,

Sorry for the delay in responding...

On Saturday 21 Apr 2018 at 00:27:53 (+0800), Leo Yan wrote:
> On Fri, Apr 20, 2018 at 03:42:45PM +0100, Quentin Perret wrote:
> > Hi Leo,
> >
> > On Wednesday 18 Apr 2018 at 20:15:47 (+0800), Leo Yan wrote:
> > > Sorry I introduce mess at here to spread my questions in several
> > > replying, later will try to ask questions in one replying. Below are
> > > more questions which it's good to bring up:
> > >
> > > The code for energy computation is quite neat and simple, but I think
> > > the energy computation mixes two concepts for CPU util: one concept is
> > > the estimated CPU util which is used to select CPU OPP in schedutil,
> > > another concept is the raw CPU util according to CPU real running time;
> > > for example, cpu_util_next() predicts CPU util but this value might be
> > > much higher than cpu_util(), especially after enabled UTIL_EST feature
> > > (I have shallow understanding for UTIL_EST so correct me as needed);
> >
> > I'm not not sure to understand what you mean by higher than cpu_util()
> > here ... In which case would that happen ?
>
> After UTIL_EST feature is enabled, cpu_util_next() returns higher value
> than cpu_util(), see below code 'util = max(util, util_est);'; as
> result cpu_util_next() takes consideration for extra compensention
> introduced by UTIL_EST.
>
>         if (sched_feat(UTIL_EST)) {
>                 util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
>                 if (dst_cpu == cpu)
>                         util_est += _task_util_est(p);
>                 else
>                         util_est = max_t(long, util_est - _task_util_est(p), 0);
>                 util = max(util, util_est);
>         }

So, cpu_util() accounts for the UTIL_EST compensation:

        static inline unsigned long cpu_util(int cpu)
        {
                struct cfs_rq *cfs_rq;
                unsigned int util;

                cfs_rq = &cpu_rq(cpu)->cfs;
                util = READ_ONCE(cfs_rq->avg.util_avg);

                if (sched_feat(UTIL_EST))
                        util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));

                return min_t(unsigned long, util, capacity_orig_of(cpu));
        }

So cpu_util_next() just mimics that.

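To make the "mimics" point concrete, here is a small userspace sketch, not kernel code. Everything prefixed with toy_ is hypothetical, UTIL_EST is assumed to be enabled, and the util_avg adjustment in toy_cpu_util_next() is an assumption that simply mirrors the util_est branch quoted above. The only thing it is meant to show is that both helpers end with the same max() and the same capacity clamp.

```c
#include <stdbool.h>

/* Toy stand-ins for the per-CPU values read by cpu_util(). */
struct toy_cpu {
	unsigned long util_avg;          /* models cfs_rq->avg.util_avg */
	unsigned long util_est_enqueued; /* models cfs_rq->avg.util_est.enqueued */
	unsigned long capacity_orig;     /* models capacity_orig_of(cpu) */
};

static unsigned long toy_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long toy_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Subtract b from a without going below zero. */
static unsigned long toy_sub_floor0(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

/* Same shape as the cpu_util() quoted above, with UTIL_EST on. */
static unsigned long toy_cpu_util(const struct toy_cpu *c)
{
	unsigned long util = toy_max(c->util_avg, c->util_est_enqueued);

	return toy_min(util, c->capacity_orig);
}

/*
 * Predict what toy_cpu_util() would return once a task is added to
 * this CPU (is_dst == true) or removed from it (is_dst == false).
 * Assumption: util_avg is adjusted the same way as util_est.
 */
static unsigned long toy_cpu_util_next(const struct toy_cpu *c,
				       unsigned long task_util,
				       unsigned long task_util_est,
				       bool is_dst)
{
	unsigned long util, util_est;

	if (is_dst) {
		util = c->util_avg + task_util;
		util_est = c->util_est_enqueued + task_util_est;
	} else {
		util = toy_sub_floor0(c->util_avg, task_util);
		util_est = toy_sub_floor0(c->util_est_enqueued, task_util_est);
	}

	util = toy_max(util, util_est);
	return toy_min(util, c->capacity_orig);
}
```

In this model, a task that contributes nothing makes toy_cpu_util_next() return exactly toy_cpu_util(), and for a real task the prediction equals toy_cpu_util() evaluated on the CPU after its two signals have been updated.
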
>
> > cpu_util_next() is basically used to figure out what will be the
> > cpu_util() of CPU A after task p has been enqueued on CPU B (no matter
> > what A and B are).
>
> Same with upper description, cpu_util_next() is not the same thing
> with cpu_util(), cpu_util_next() takes consideration for extra
> compensention introduced by UTIL_EST.
>
> > > but this patch simply computes CPU capacity and energy with the single
> > > one CPU utilization value (and it will be an inflated value afte enable
> > > UTIL_EST). Is this purposed for simple implementation?
> > >
> > > IMHO, cpu_util_next() can be used to predict CPU capacity, on the other
> > > hand, should we use the CPU util without UTIL_EST capping for 'sum_util',
> > > this can be more reasonable to reflect the CPU utilization?
> >
> > Why would a decayed utilisation be a better estimate of the time that
> > a task is going to spend on a CPU ?
>
> IIUC, in the scheduler waken up path task_util() is the task utilisation
> before task sleeping, so it's not a decayed value.

I don't think this is correct. sync_entity_load_avg() is called in
select_task_rq_fair() so task_util() *is* decayed upon wakeup.

> cpu_util() is
> decayed value,

This is not necessarily correct either. As mentioned above, cpu_util()
includes the UTIL_EST compensation, so the value isn't necessarily
decayed.

> but is this just we want to reflect cpu historic
> utilisation at the recent past time? This is the reason I bring up to
> use 'cpu_util() + task_util()' as estimation.
>
> I understand this patch tries to use pre-decayed value,

No, this patch tries to estimate what will be the return value of
cpu_util() if the task is enqueued on a specific CPU. This value can be
the util_avg (decayed) or the util_est (non-decayed) depending on the
conditions.

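As a rough illustration of decayed versus non-decayed, here is another userspace sketch, again not kernel code. The toy_ names are hypothetical. It assumes the usual PELT behaviour where a contribution halves every 32 periods of roughly 1 ms, and it simplifies util_est to a plain snapshot of util_avg taken when the task goes to sleep.

```c
/* Per-period decay factor, chosen so that y^32 is about 0.5. */
#define TOY_PELT_Y 0.97857206

/* Decay a utilisation value over a number of idle periods. */
static double toy_decay(double util, unsigned int periods)
{
	unsigned int i;

	for (i = 0; i < periods; i++)
		util *= TOY_PELT_Y;

	return util;
}

struct toy_task {
	double util_avg; /* decays while the task sleeps */
	double util_est; /* simplified: snapshot taken at sleep time */
};

/* Put the task to sleep: keep the snapshot, decay the average. */
static void toy_task_sleep(struct toy_task *t, unsigned int periods)
{
	t->util_est = t->util_avg;
	t->util_avg = toy_decay(t->util_avg, periods);
}
```

With a task at util_avg = 400 sleeping for 32 periods, this model leaves util_est at 400 while util_avg drops to about 200, which is the gap that max(util, util_est) covers at wakeup.
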
> please review
> below example has issue or not:
> if one CPU's cfs_rq->avg.util_est.enqueued is quite high value, then this
> CPU enter idle state and sleep for long while, if we use
> cfs_rq->avg.util_est.enqueued to estimate CPU utilisation, this might
> have big deviation than the CPU run time if place wake task on it? On
> the other hand, cpu_util() can decay for CPU idle time...
>
> > > Furthermore, if we consider RT thread is running on CPU and connect with
> > > 'schedutil' governor, the CPU will run at maximum frequency, but we
> > > cannot say the CPU has 100% utilization. The RT thread case is not
> > > handled in this patch.
> >
> > Right, we don't account for RT tasks in the OPP prediction for now.
> > Vincent's patches to have a util_avg for RT runqueues could help us
> > do that I suppose ...
>
> Good to know this.
>
> > Thanks !
> > Quentin