On Thu, 2014-04-24 at 19:14 +0200, Peter Zijlstra wrote:
> On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
> >
> > So I thought that the original rationale (commit 1bd77f2d) behind
> > updating rq->next_balance in idle_balance() is that, if we are going
> > idle (!pulled_task), we want to ensure that the next_balance gets
> > calculated without the busy_factor.
> >
> > If the rq is busy, then rq->next_balance gets updated based on
> > sd->interval * busy_factor. However, when the rq goes from "busy"
> > to idle, rq->next_balance might still have been calculated under
> > the assumption that the rq is busy. Thus, if we are going idle, we
> > would then properly update next_balance without the busy factor
> > if we update when !pulled_task.
> >
>
> Its late here and I'm confused!
>
> So the for_each_domain() loop calculates a new next_balance based on
> ->balance_interval (which has that busy_factor on, right).
>
> But if it fails to pull anything, we'll (potentially) iterate the entire
> tree up to the largest domain; and supposedly set next_balanced to the
> largest possible interval.
>
> So when we go from busy to idle (!pulled_task), we actually set
> ->next_balance to the longest interval. Whereas the commit you
> referenced says it sets it to a shorter while.
>
> Not seeing it.
So this is the way I understand that code:

In rebalance_domains(), next_balance is supposed to be set to the minimum
of all sd->last_balance + interval, so that we properly call into
rebalance_domains() if one of the domains is due for a balance.

In the domain traversals:

	if (time_after(next_balance, sd->last_balance + interval))
		next_balance = sd->last_balance + interval;

we update next_balance only if the current next_balance is after
sd->last_balance + interval, so next_balance can only ever move to a
smaller (earlier) value. Iterating up to the largest domain therefore
cannot push it out to the longest interval.

In rebalance_domains(), we have this code:

	interval = sd->balance_interval;
	if (idle != CPU_IDLE)
		interval *= sd->busy_factor;

	...

	if (time_after(next_balance, sd->last_balance + interval)) {
		next_balance = sd->last_balance + interval;

	...

	rq->next_balance = next_balance;

In the CPU_IDLE case, interval does not include the busy factor, whereas
in the !CPU_IDLE case, we multiply the interval by sd->busy_factor.

So as an example, if a CPU is not idle and we run this:

	rebalance_domains()
		interval = 1 ms;
		if (idle != CPU_IDLE)
			interval *= 64;

		next_balance = sd->last_balance + 64 ms

		rq->next_balance = next_balance

The rq->next_balance is set to a large value since the CPU is not idle.

Then, let's say the CPU goes idle 1 ms later. The rq->next_balance can
still be up to 63 ms away, because we computed it when the CPU was not
idle. Now that we are going idle, we would have to wait a long time for
the next balance.

So I believe that the initial reason why rq->next_balance was updated in
idle_balance() is that if the CPU is in the process of going idle
(!pulled_task in idle_balance()), we can reset rq->next_balance based on
interval = 1 ms, as opposed to having it remain up to 64 ms later (in
idle_balance(), interval doesn't get multiplied by sd->busy_factor).