Hi Peter, Vincent,

is there anything different I can do on this?

Cheers,
Patrick

On 28-Jun 15:00, Patrick Bellasi wrote:
> On 28-Jun 14:38, Peter Zijlstra wrote:
> > On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > > On 26-Jun 13:40, Vincent Guittot wrote:
> > > > Hi Patrick,
> > > >
> > > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bell...@arm.com> wrote:
> > > > >
> > > > > The estimated utilization for a task is currently defined based on:
> > > > >  - enqueued: the utilization value at the end of the last activation
> > > > >  - ewma:     an exponential moving average whose samples are the
> > > > >              enqueued values
> > > > >
> > > > > According to this definition, when a task suddenly changes its
> > > > > bandwidth requirements from small to big, the EWMA will need to
> > > > > collect multiple samples before converging up to track the new big
> > > > > utilization.
> > > > >
> > > > > Moreover, after the PELT scale invariance update [1], in the above
> > > > > scenario we can see that the utilization of the task has a
> > > > > significant drop from the first big activation to the following
> > > > > one. That's implied by the new "time-scaling"
> > > >
> > > > Could you give us more details about this? I'm not sure I understand
> > > > what changes between the 1st big activation and the following one?
> > >
> > > We are after a solution for the problem Douglas Raillard discussed at
> > > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > > slide 6 of his presentation:
> > >
> > >    http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> > >
> >
> > So I see the problem, and I don't hate the patch, but I'm still
> > struggling to understand how exactly it relates to the time-scaling
> > stuff. Afaict the fundamental problem here is layering two averages. The
> > second (EWMA in our case) will always lag/delay the input of the first
> > (PELT).
> >
> > The time-scaling thing might make matters worse, because that helps PELT
> > ramp up faster, but that is not the primary issue.
>
> Sure, we like the new time-scaling PELT which ramps up faster and, as
> long as we have idle time, it's better at predicting what the
> utilization would be if we were running at max OPP.
>
> However, the experiment above shows that:
>
>  - despite the task being a 75% task after a certain activation, it
>    takes multiple activations for PELT to actually enter that range.
>
>  - the first activation ends at 665, 10% short wrt the configured
>    utilization
>
>  - while the PELT signal converges toward the 75%, we have some pretty
>    consistent drops at wakeup time, especially after the first big
>    activation.
>
> > Or am I missing something?
>
> I'm not sure the above happens because of a problem in the new
> time-scaling PELT, I actually think it's kind of expected given the
> way we re-scale time contributions depending on the current OPPs.
>
> It's just that a drop of 375 in utilization with just 1.1ms sleep time
> looks to me more related to the time-scaling invariance than just the
> normal/expected PELT decay.
>
> Could it be an out-of-sync issue between the PELT time scaling code
> and capacity scaling code?
> Perhaps due to some OPP changes/notification going wrong?
>
> Sorry for not being much more useful on that, maybe Vincent has some
> better ideas.
>
> The only thing I've kind of convinced myself of is that an EWMA on
> util_est does not make a lot of sense for increasing utilization
> tracking.
>
> Best,
> Patrick
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi

--
#include <best/regards.h>

Patrick Bellasi
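
To put rough numbers on the two points quoted above, here is a small
back-of-the-envelope sketch. It is not kernel code and it rests on a few
assumptions: plain PELT decay is modelled as a geometric decay with a 32 ms
half-life, utilization is on the usual 0..1024 scale (so 75% is 768), the
util_est EWMA uses a weight of 1/4, and the starting utilization of 100 is
a made-up value for the "small" task. The helper names are hypothetical.

```python
# Illustrative sketch only; names and the starting value are hypothetical.

PELT_HALF_LIFE_MS = 32.0   # assumption: PELT contributions halve every 32 ms
EWMA_WEIGHT = 0.25         # assumption: util_est EWMA weight w = 1/4
MAX_UTIL = 1024            # assumption: utilization scale 0..1024


def pelt_decay(util, sleep_ms):
    """Utilization left after sleep_ms of plain geometric decay."""
    return util * 0.5 ** (sleep_ms / PELT_HALF_LIFE_MS)


def ewma_track(start, samples, weight=EWMA_WEIGHT):
    """Feed each 'enqueued' sample into an EWMA; return all intermediate values."""
    values = []
    ewma = start
    for sample in samples:
        ewma += weight * (sample - ewma)
        values.append(ewma)
    return values


# Upper bound on the drop that plain decay alone can cause over a
# 1.1 ms sleep: start from the maximum possible utilization.
max_drop = MAX_UTIL - pelt_decay(MAX_UTIL, 1.1)

# A task stepping from a small utilization (100, hypothetical) to 75%,
# assuming every later activation already ends at the full 768.
target = 0.75 * MAX_UTIL
steps = ewma_track(100, [target] * 8)

if __name__ == "__main__":
    print(f"max plain-decay drop over 1.1 ms: {max_drop:.1f}")
    for i, value in enumerate(steps, 1):
        print(f"activation {i}: ewma = {value:.0f} (target {target:.0f})")
```

Under these assumptions, plain decay over 1.1 ms accounts for a drop of
roughly 24 at most, far from 375, and the EWMA is still below 768 after
eight activations even when the enqueued samples are already at the target.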