On 08/15/2012 09:15 PM, Borislav Petkov wrote:
> On Wed, Aug 15, 2012 at 01:05:38PM +0200, Peter Zijlstra wrote:
>> On Mon, 2012-08-13 at 20:21 +0800, Alex Shi wrote:
>>> Since there is no power saving consideration in scheduler CFS, I has a
>>> very rough idea for enabling a new power saving schema in CFS.
>>
>> Adding Thomas, he always delights poking holes in power schemes.
>>
>>> It bases on the following assumption:
>>> 1, If there are many task crowd in system, just let few domain cpus
>>> running and let other cpus idle can not save power. Let all cpu take the
>>> load, finish tasks early, and then get into idle. will save more power
>>> and have better user experience.
>>
>> I'm not sure this is a valid assumption. I've had it explained to me by
>> various people that race-to-idle isn't always the best thing. It has to
>> do with the cost of switching power states and the duration of execution
>> and other such things.
>
> I think what he means here is that we might want to let all cores on
> the node (i.e., domain) finish and then power down the whole node which
> should bring much more power savings than letting a subset of the cores
> idle. Alex?
Yes, that is my assumption. If my memory serves me well, the idea came
from Suresh when he introduced the old power saving schema.

>
> [ … ]
>
>> So I'd leave the currently implemented scheme as performance, and I
>> don't think the above describes the current state.
>>
>>> } else if (schedule policy == power)
>>>	move tasks from busiest group to
>>>	idlest group until busiest is just full
>>>	of capacity.
>>>	//the busiest group can balance
>>>	//internally after next time LB,
>>
>> There's another thing we need to do, and that is collect tasks in a
>> minimal amount of power domains.
>
> Yep.
>
> Btw, what heuristic would tell here when a domain overflows and another
> needs to get woken? Combined load of the whole domain?
>
> And if I absolutely positively don't want a node to wake up, do I
> hotplug its cores off or are we going to have a way to tell the
> scheduler to overcommit the non-idle domains and spread the tasks only
> among them.

You are right. Using the least loaded non-idle group here is better
than using the idlest one.

>
> I'm thinking of short bursts here where it would be probably beneficial
> to let the tasks rather wait runnable for a while then wake up the next
> node and waste power...

True. Maybe that is the reason Peter mentioned '2*capacity'?

>
> Thanks.
>
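
To make the "power" branch discussed in this thread a bit more concrete,
here is a rough userspace sketch of one balance pass. It is only an
illustration under assumptions, not kernel code: struct group,
find_busiest(), find_target() and power_balance() are hypothetical names,
load and capacity are plain task counts, and no threshold such as
'2*capacity' is modelled. It moves the excess of the most overloaded group
to the least loaded non-idle group that still has room, and falls back to
an idle group only when no non-idle group can take the tasks.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scheduling group; not a kernel structure. */
struct group {
	unsigned int load;     /* runnable tasks currently in the group */
	unsigned int capacity; /* tasks the group can run without waiting */
};

/* Index of the group with the largest load above capacity, or -1 if none. */
static int find_busiest(const struct group *g, size_t n)
{
	int busiest = -1;
	unsigned int max_excess = 0;

	for (size_t i = 0; i < n; i++) {
		if (g[i].load > g[i].capacity &&
		    g[i].load - g[i].capacity > max_excess) {
			max_excess = g[i].load - g[i].capacity;
			busiest = (int)i;
		}
	}
	return busiest;
}

/*
 * Prefer the least loaded non-idle group that still has spare capacity.
 * Fall back to the first idle group (i.e. wake it) only if no non-idle
 * group has room. Returns -1 if nothing can take tasks.
 */
static int find_target(const struct group *g, size_t n, int src)
{
	int target = -1, idle = -1;

	for (size_t i = 0; i < n; i++) {
		if ((int)i == src || g[i].load >= g[i].capacity)
			continue;
		if (g[i].load == 0) {
			if (idle < 0)
				idle = (int)i;
			continue;
		}
		if (target < 0 || g[i].load < g[target].load)
			target = (int)i;
	}
	return target >= 0 ? target : idle;
}

/*
 * One balance pass under the assumed "power" policy: move tasks out of
 * the busiest group until it is just full, limited by the room in the
 * target group. Returns the number of tasks moved.
 */
static unsigned int power_balance(struct group *g, size_t n)
{
	int src = find_busiest(g, n);
	if (src < 0)
		return 0;

	int dst = find_target(g, n, src);
	if (dst < 0)
		return 0;

	unsigned int excess = g[src].load - g[src].capacity;
	unsigned int room = g[dst].capacity - g[dst].load;
	unsigned int moved = excess < room ? excess : room;

	g[src].load -= moved;
	g[dst].load += moved;
	return moved;
}
```

With three groups of capacity 4 and loads 6, 1 and 0, one pass moves two
tasks into the second group and leaves the idle group asleep. If the second
group were already full, the same pass would wake the idle group instead.
Whether tasks should rather wait runnable for a while in that case is
exactly the open heuristic question above, and this sketch does not try to
answer it.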