On Thu, Mar 13, 2014 at 11:06 PM, <dirk.brande...@gmail.com> wrote: > From: Dirk Brandewie <dirk.j.brande...@intel.com> > > Some drivers (intel_pstate) need to modify state on a core before it > is completely offline. The ->exit() callback is executed during the > CPU_POST_DEAD phase of the cpu offline process which is too late to > change the state of the core. > > Patch 1 adds an optional callback function to the cpufreq_driver > interface which is called by the core during the CPU_DOWN_PREPARE > phase of cpu offline in __cpufreq_remove_dev_prepare(). > > Patch 2 changes intel_pstate to use the ->exit_prepare callback to do > its cleanup during cpu offline.
Copying stuff from other mail thread here so that we can discuss on a single mail chain. On 14 March 2014 03:09, Rafael J. Wysocki <r...@rjwysocki.net> wrote: > On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote: >> On 13 March 2014 08:07, Rafael J. Wysocki <r...@rjwysocki.net> wrote: >> > On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote: >> >> >> > I see two possibilities: >> >> > 1. Move the exit() callback to __cpufreq_remove_dev_prepare(). I >> >> > don't >> >> > have a good understanding of what carnage this would cause in the >> >> > core >> >> > or other scaling drivers. >> >> > >> >> > 2. Add another callback to the cpufreq_driver interface that would be >> >> > call >> >> > from __cpufreq_remove_dev_prepare() if the callback was set. >> > >> > I prefer 2, the reason being that it pretty much is guaranteed not to break >> > anything. For the record, I'm not a fan of adding new cpufreq driver >> > callbacks, >> > but in this particular case it seems we can't really do anything better. >> >> I haven't thought a lot about which one of these two looks better, probably >> Rafael might be correct. But I can see another way out here as this is very >> much driver specific. Why can't we do a register_hotcpu_notifier() from >> pstate driver alone? > > Why would that be better than adding a new callback? Because its becoming more and more confusing. Probably we got the problem right but have wrong solutions for it. But having considered this issue in detail now, I have more inputs. All Dirk and Patrick wanted is to set core to min P-state before it gets offlined. We already have some infrastructure in core which is called this today: cpufreq_generic_suspend(). We can rework on that to get a more ideal solution for both the problems. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. - If this is highly driver specific (which doesn't look like if all we have to do is setting freq to MIN), then better have something like register_hotcpu_notifier() with priority set to -1, so that it gets called after cpufreq. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/