On 02/09/2015 11:44 PM, Steven Noonan wrote: > On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan <ste...@uplinklabs.net> wrote: >> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy >> <pre...@linux.vnet.ibm.com> wrote: >>> Hi Steven, >>> >>> On 02/09/2015 01:02 PM, Steven Noonan wrote: >>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy >>>> <pre...@linux.vnet.ibm.com> wrote: >>>>> The powerclamp driver injects idle periods to stay within the thermal >>>>> constraints. >>>>> The driver does a fake idle by spawning per-cpu threads that call the >>>>> mwait >>>>> instruction. This behavior of fake idle can confuse the other kernel >>>>> subsystems. >>>>> For instance it calls into the nohz tick handlers, which are meant to be >>>>> called >>>>> only by the idle thread. It sets the state of the tick in each cpu to >>>>> idle and >>>>> stops the tick, when there are tasks on the runqueue. As a result the >>>>> callers of >>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; >>>>> while the >>>>> former thinks that the cpu is busy, the latter thinks that it is idle. >>>>> The outcome >>>>> may be inconsistency in the scheduler/nohz states which can lead to >>>>> serious >>>>> consequences. One of them was reported on this thread: >>>>> https://lkml.org/lkml/2014/12/11/365. >>>>> >>>>> Thomas posted out a patch to disable the powerclamp driver from calling >>>>> into the >>>>> tick nohz code which has taken care of the above regression for the >>>>> moment. However >>>>> powerclamp driver as a result, will not be able to inject idle periods >>>>> due to the >>>>> presence of periodic ticks. With the current design of fake idle, we >>>>> cannot move >>>>> towards a better solution. >>>>> https://lkml.org/lkml/2014/12/18/169 >>>>> >>>>> This patch aims at removing the concept of fake idle and instead makes >>>>> the cpus >>>>> truly idle by throttling the runqueues during the idle injection periods. >>>>> The situation >>>>> is in fact very similar to throttling of cfs_rqs when they exceed their >>>>> bandwidths. >>>>> The idle injection metrics can be mapped to the bandwidth control metrics >>>>> 'quota' and >>>>> 'period' to achieve the same result. When the powerclamping is begun or >>>>> when the >>>>> clamping controls have been modified, the bandwidth for the root task >>>>> group is set. >>>>> The 'quota' will be the amount of time that the system needs to be busy >>>>> and 'period' >>>>> will be the sum of this busy duration and the idle duration as calculated >>>>> by the driver. >>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and >>>>> clamping mask since >>>>> the thread starting powerclamping will set the bandwidth and throttling >>>>> of all cpus will >>>>> automatically fall in place. None of the other cpus need be bothered >>>>> about this. This >>>>> simplifies the design of the driver. >>>>> >>>>> Of course this is only if the idle injection metrics can be conveniently >>>>> transformed >>>>> into bandwidth control metrics. There are a couple of other primary >>>>> concerns around if >>>>> doing the below two in this patch is valid. >>>>> a. This patch exports the functions to set the quota and period of task >>>>> groups. >>>>> b. This patch removes the constraint of not being able to set the root >>>>> task grp's bandwidth. >>>>> >>>>> Signed-off-by: Preeti U Murthy <pre...@linux.vnet.ibm.com> >>>> >>>> This doesn't compile. >>> >>> Thanks for reporting this! I realized that I had not compiled in the >>> powerclamp driver >>> as a module while compile testing it. I was focusing on the issues with the >>> design and >>> failed to cross verify this. Apologies for the inconvenience. >>> >>> Find the diff compile tested below. >>> >>> I also realized that clamp_cpus() that sets the bandwidth cannot be called >>> from >>> multiple places. Currently I am calling it from end_powerclamp(), when the >>> user changes the >>> idle clamping duration and from a queued timer. This will require >>> synchronization between >>> callers which is not really called for. The queued wakeup_timer alone can >>> re-evaluate the >>> clamping metrics after every throttle-unthrottle period and this should >>> suffice as far >>> as I can see. Thoughts ? >> >> Hmm, I've had two system lockups so far while running a kernel with >> intel_powerclamp loaded. Both times it slowly ground to a halt and >> processes piled up...
Hmm I see. I am not surprised because this patch is not complete yet. The idea was to gain suggestions and review around the approach first before I went ahead to making it robust. Let me ease the creases that I found and repost this patch for you to test. Thanks a lot for the testing efforts! :) Regards Preeti U Murthy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/