On 02/09/2015 11:44 PM, Steven Noonan wrote:
> On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
>> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
>> <pre...@linux.vnet.ibm.com> wrote:
>>> Hi Steven,
>>>
>>> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>>> <pre...@linux.vnet.ibm.com> wrote:
>>>>> The powerclamp driver injects idle periods to stay within the thermal 
>>>>> constraints.
>>>>> The driver does a fake idle by spawning per-cpu threads that call the 
>>>>> mwait
>>>>> instruction. This behavior of fake idle can confuse the other kernel 
>>>>> subsystems.
>>>>> For instance it calls into the nohz tick handlers, which are meant to be 
>>>>> called
>>>>> only by the idle thread. It sets the state of the tick in each cpu to 
>>>>> idle and
>>>>> stops the tick, when there are tasks on the runqueue. As a result the 
>>>>> callers of
>>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; 
>>>>> while the
>>>>> former thinks that the cpu is busy, the latter thinks that it is idle. 
>>>>> The outcome
>>>>> may be  inconsistency in the scheduler/nohz states which can lead to 
>>>>> serious
>>>>> consequences. One of them was reported on this thread:
>>>>> https://lkml.org/lkml/2014/12/11/365.
>>>>>
>>>>> Thomas posted out a patch to disable the powerclamp driver from calling 
>>>>> into the
>>>>> tick nohz code which has taken care of the above regression for the 
>>>>> moment. However
>>>>> powerclamp driver as a result, will not be able to inject idle periods 
>>>>> due to the
>>>>> presence of periodic ticks. With the current design of fake idle, we 
>>>>> cannot move
>>>>> towards a better solution.
>>>>> https://lkml.org/lkml/2014/12/18/169
>>>>>
>>>>> This patch aims at removing the concept of fake idle and instead makes 
>>>>> the cpus
>>>>> truly idle by throttling the runqueues during the idle injection periods. 
>>>>> The situation
>>>>> is in fact very similar to throttling of cfs_rqs when they exceed their 
>>>>> bandwidths.
>>>>> The idle injection metrics can be mapped to the bandwidth control metrics 
>>>>> 'quota' and
>>>>> 'period' to achieve the same result. When the powerclamping is begun or 
>>>>> when the
>>>>> clamping controls have been modified, the bandwidth for the root task 
>>>>> group is set.
>>>>> The 'quota' will be the amount of time that the system needs to be busy 
>>>>> and 'period'
>>>>> will be the sum of this busy duration and the idle duration as calculated 
>>>>> by the driver.
>>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and 
>>>>> clamping mask since
>>>>> the thread starting powerclamping will set the bandwidth and throttling 
>>>>> of all cpus will
>>>>> automatically fall in place. None of the other cpus need be bothered 
>>>>> about this. This
>>>>> simplifies the design of the driver.
>>>>>
>>>>> Of course this is only if the idle injection metrics can be conveniently 
>>>>> transformed
>>>>> into bandwidth control metrics. There are a couple of other primary 
>>>>> concerns around if
>>>>> doing the below two in this patch is valid.
>>>>> a. This patch exports the functions to set the quota and period of task 
>>>>> groups.
>>>>> b. This patch removes the constraint of not being able to set the root 
>>>>> task grp's bandwidth.
>>>>>
>>>>> Signed-off-by: Preeti U Murthy <pre...@linux.vnet.ibm.com>
>>>>
>>>> This doesn't compile.
>>>
>>> Thanks for reporting this! I realized that I had not compiled in the 
>>> powerclamp driver
>>> as a module while compile testing it. I was focusing on the issues with the 
>>> design and
>>> failed to cross verify this. Apologies for the inconvenience.
>>>
>>> Find the diff compile tested below.
>>>
>>> I also realized that clamp_cpus() that sets the bandwidth cannot be called 
>>> from
>>> multiple places. Currently I am calling it from end_powerclamp(), when the 
>>> user changes the
>>> idle clamping duration and from a queued timer. This will require 
>>> synchronization between
>>> callers which is not really called for. The queued wakeup_timer alone can 
>>> re-evaluate the
>>> clamping metrics after every throttle-unthrottle period and this should 
>>> suffice as far
>>> as I can see. Thoughts ?
>>
>> Hmm, I've had two system lockups so far while running a kernel with
>> intel_powerclamp loaded. Both times it slowly ground to a halt and
>> processes piled up...

Hmm I see. I am not surprised because this patch is not complete yet.
The idea was to gain suggestions and review around the approach first
before I went ahead to making it robust. Let me ease the creases that I
found and repost this patch for you to test. Thanks a lot for the
testing efforts! :)

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to