On Tue, 14 Feb 2017 23:49:26 +0100 luca abeni <luca.ab...@santannapisa.it> wrote:
> Hi Steven, > > On Tue, 14 Feb 2017 14:28:48 -0500 > "Steven Rostedt (VMware)" <rost...@goodmis.org> wrote: > > > I was testing Daniel's changes with his test case, and tweaked it a > > little. Instead of having the runtime equal to the deadline, I > > increased the deadline ten fold. > > > > Daniel's test case had: > > > > attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms > > */ attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms */ > > attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */ > > > > To make it more interesting, I changed it to: > > > > attr.sched_runtime = 2 * 1000 * 1000; /* 2 > > ms */ attr.sched_deadline = 20 * 1000 * 1000; /* 20 ms > > */ attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */ > > > > > > The results were rather surprising. The behavior that Daniel's patch > > was fixing came back. The task started using much more than .1% of the > > CPU. More like 20%. > > > > Looking into this I found that it was due to the dl_entity_overflow() > > constantly returning true. That's because it uses the relative period > > against relative runtime vs the absolute deadline against absolute > > runtime. > > > > runtime / (deadline - t) > dl_runtime / dl_period > > I agree that there is an inconsistency here (again, using equations > from the "period=deadline" case with a relative deadline different from > period). > > I am not sure about the correct fix (wouldn't > "runtime / (deadline - t) > dl_runtime / dl_deadline" allow the task to > use a fraction of CPU time equal to dl_runtime / dl_deadline?) > > The current code is clearly wrong (as shown by Daniel), but I do not > understand how the current check can allow the task to consume more > than dl_runtime / dl_period... I need some more time to think about > this issue. > This is in dl_entity_overflow() which is called by update_dl_entity() which has this: if (dl_time_before(dl_se->deadline, rq_clock(rq)) || dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) { dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline; dl_se->runtime = pi_se->dl_runtime; } The comments in this code state: * The policy here is that we update the deadline of the entity only if: * - the current deadline is in the past, * - using the remaining runtime with the current deadline would make * the entity exceed its bandwidth. That second comment is saying that when this task woke up, if the percentage left to run will exceed its bandwidth with the rest of the system then reset its deadline and its runtime. What happens in the current logic, is that overflow() check says, when the deadline is much smaller than the period, "yeah, we're going to exceed our percentage!" so give us more, even though it wont exceed its percentage if we compared runtime with deadline. The relative-runtime / relative-period is a tiny percentage, which does not reflect the percentage that the task is allowed to have before the deadline is hit. The tasks bandwidth should be calculated by the relative-runtime / relative-deadline, as runtime <= deadline <= period, and the runtime should happen within the deadline. When the task wakes up, it currently looks at how much time is left absolute-deadline - t, and compares it to the amount of runtime left. The percentage allowed should still be compared with the percentage between relative-runtime and relative-deadline. The relative-period or even absolute-period, should have no influence in this decision. -- Steve