On Mon, 2015-04-13 at 15:49 -0700, Jason Low wrote:
> hmm, so taking a look at the patch again, it looks like we pass nohz
> balance even when the NOHZ_BALANCE_KICK is not set on the current CPU.
> We should separate the 2 conditions:
>
> if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fdae26e..d636bf7 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7620,6 +7620,16 @@ out:
> > }
> >
> > #ifdef CONFIG_NO_HZ_COMMON
> > +static inline bool nohz_kick_needed(struct rq *rq);
> >
On Fri, 2015-04-10 at 14:07 +0530, Srikar Dronamraju wrote:
> At this point, I also wanted to understand why we do
> "nohz.next_balance++" in nohz_balancer_kick()?
So this looks like something that was added to avoid
nohz_balancer_kick() getting called too frequently. Otherwise, it may
get called in
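The rate-limiting effect described in this reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_nohz`, `toy_kick_needed()`, `toy_balancer_kick()` and `toy_kicks_in_one_tick()` are hypothetical names, and ticks are plain counters with no wraparound handling.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler's nohz bookkeeping. */
struct toy_nohz {
    unsigned long next_balance; /* earliest tick at which a kick is allowed */
    unsigned long kicks;        /* number of kicks issued so far */
};

/* A kick is only considered once 'now' has reached next_balance. */
static bool toy_kick_needed(const struct toy_nohz *nohz, unsigned long now)
{
    return now >= nohz->next_balance;
}

/* Issue a kick and push next_balance one tick into the future, so that
 * further checks within the same tick are rejected. */
static void toy_balancer_kick(struct toy_nohz *nohz)
{
    nohz->next_balance++;
    nohz->kicks++;
}

/* Run 'calls' checks that all happen within the same tick 'now' and
 * return how many of them resulted in a kick. */
static unsigned long toy_kicks_in_one_tick(struct toy_nohz *nohz,
                                           unsigned long now, int calls)
{
    unsigned long before = nohz->kicks;

    for (int i = 0; i < calls; i++) {
        if (toy_kick_needed(nohz, now))
            toy_balancer_kick(nohz);
    }
    return nohz->kicks - before;
}
```

In this model, without the increment every check in the same tick would pass and produce another kick; with it, only the first one does.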
On Fri, 2015-04-10 at 14:07 +0530, Srikar Dronamraju wrote:
> > > >
> > > > #ifdef CONFIG_NO_HZ_COMMON
> > > > +static inline bool nohz_kick_needed(struct rq *rq);
> > > > +
> > > > +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> > > > +{
> > > > + clear_bit(NOHZ_BA
Hi Jason,
On 04/08/2015 05:37 AM, Jason Low wrote:
> On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
>
>> Okay, so perhaps we can also try continuing nohz load balancing if we
>> find that there are overloaded CPUs in the system.
Sorry about the delay. Ok I will test out the below patch and
* Jason Low [2015-04-08 19:39:15]:
> On Wed, 2015-04-08 at 16:42 +0530, Srikar Dronamraju wrote:
> > * Jason Low [2015-04-07 17:07:46]:
> > > @@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
> > > int nr_busy, cpu = rq->cpu;
> > > bool kick = false;
> > >
> > > - if
On Wed, 2015-04-08 at 16:42 +0530, Srikar Dronamraju wrote:
> * Jason Low [2015-04-07 17:07:46]:
> > @@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
> > int nr_busy, cpu = rq->cpu;
> > bool kick = false;
> >
> > - if (unlikely(rq->idle_balance))
> > + if (unli
On Wed, 2015-04-08 at 16:42 +0530, Srikar Dronamraju wrote:
> * Jason Low [2015-04-07 17:07:46]:
>
> > On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
> >
> > > Okay, so perhaps we can also try continuing nohz load balancing if we
> > > find that there are overloaded CPUs in the system.
> >
* Jason Low [2015-04-07 17:07:46]:
> On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
>
> > Okay, so perhaps we can also try continuing nohz load balancing if we
> > find that there are overloaded CPUs in the system.
>
> Something like the following.
>
> ---
> diff --git a/kernel/sched/fair
On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
> Okay, so perhaps we can also try continuing nohz load balancing if we
> find that there are overloaded CPUs in the system.
Something like the following.
---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdae26e..d636bf7 100644
On Sat, 2015-04-04 at 15:29 +0530, Preeti U Murthy wrote:
> Solution 1: As exists in the mainline
> Solution 2: nohz_idle_balance(); rebalance_domains() on the ILB CPU
> Solution 3: Above patch.
>
> I observe that Solution 3 is not as aggressive in spreading load as
> Solution 2. With Solution 2,
On Tue, 2015-04-07 at 12:39 -0700, Tim Chen wrote:
> How about consolidating the code for passing the
> nohz balancing and call it at both places.
> Something like below. Make the code more readable.
>
> Tim
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40667cb..16f6904 1
On Tue, 2015-04-07 at 10:42 -0700, Jason Low wrote:
> On Fri, 2015-04-03 at 15:35 -0700, Tim Chen wrote:
> > I think we can get rid of the done_balancing boolean
> > and make it a bit easier to read if we change the above code to
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >
On Fri, 2015-04-03 at 15:35 -0700, Tim Chen wrote:
> I think we can get rid of the done_balancing boolean
> and make it a bit easier to read if we change the above code to
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bcfe320..08317dc 100644
> --- a/kernel/sched/fair.c
> +++
On 04/02/2015 11:29 AM, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
>> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
>
>>> I am sorry I don't quite get this. Can you please elaborate?
>>
>> I think the scenario is that we are in nohz_idle_balanc
On Wed, 2015-04-01 at 22:59 -0700, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
>
> > > I am sorry I don't quite get this. Can you please elaborate?
> >
> > I think the scenario is that we are in
On Thu, 2015-04-02 at 10:17 +0100, Morten Rasmussen wrote:
> On Thu, Apr 02, 2015 at 06:59:07AM +0100, Jason Low wrote:
> > Also, below is an example patch.
> >
> > (Without the conversion to idle_cpu(), the check for rq->idle_balance
> > would not be accurate anymore)
> I think this should redu
On Thu, Apr 02, 2015 at 06:59:07AM +0100, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
>
> > > I am sorry I don't quite get this. Can you please elaborate?
> >
> > I think the scenario is that we
On Thu, Apr 02, 2015 at 04:30:34AM +0100, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> > >
> > > On 04/01/2015 12:24 AM, Jason Low wrote:
> > > > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murt
On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> > I am sorry I don't quite get this. Can you please elaborate?
>
> I think the scenario is that we are in nohz_idle_balance() and decide to
> bail out because we have pu
On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> >
> > On 04/01/2015 12:24 AM, Jason Low wrote:
> > > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> > >> Hi Jason,
> > >>
> > >> On 03/31/2015 12:25 AM, Jaso
Hi Morten,
On 04/01/2015 06:33 PM, Morten Rasmussen wrote:
>> Alright I see. But it is one additional wake up. And the wake up will be
>> within the cluster. We will not wake up any CPU in the neighboring
>> cluster unless there are tasks to be pulled. So, we can wake up a core
>> out of a deep id
On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> On 03/31/2015 12:25 AM, Jason Low wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fdae26e..ba8ec1a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7644,7 +7644,7 @@ static void nohz_i
On Wed, 2015-04-01 at 14:03 +0100, Morten Rasmussen wrote:
Hi Morten,
> > Alright I see. But it is one additional wake up. And the wake up will be
> > within the cluster. We will not wake up any CPU in the neighboring
> > cluster unless there are tasks to be pulled. So, we can wake up a core
> >
On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
>
> On 04/01/2015 12:24 AM, Jason Low wrote:
> > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> >> Hi Jason,
> >>
> >> On 03/31/2015 12:25 AM, Jason Low wrote:
> >>> Hi Preeti,
> >>>
> >>> I noticed that another commit 4
Hi Preeti and Jason,
On Wed, Apr 01, 2015 at 07:28:03AM +0100, Preeti U Murthy wrote:
> On 03/31/2015 11:00 PM, Jason Low wrote:
> > On Tue, 2015-03-31 at 14:28 +0530, Preeti U Murthy wrote:
> >
> >> Morten,
> >
> >> I am a bit confused about the problem you are pointing to.
> >
> >> I am unabl
On 04/01/2015 12:24 AM, Jason Low wrote:
> On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
>> Hi Jason,
>>
>> On 03/31/2015 12:25 AM, Jason Low wrote:
>>> Hi Preeti,
>>>
>>> I noticed that another commit 4a725627f21d converted the check in
>>> nohz_kick_needed() from idle_cpu() to rq->id
On 03/31/2015 11:00 PM, Jason Low wrote:
> On Tue, 2015-03-31 at 14:28 +0530, Preeti U Murthy wrote:
>
>> Morten,
>
>> I am a bit confused about the problem you are pointing to.
>
>> I am unable to see the issue. What is it that I am missing ?
>
> Hi Preeti,
>
> Here is one of the potential is
On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> Hi Jason,
>
> On 03/31/2015 12:25 AM, Jason Low wrote:
> > Hi Preeti,
> >
> > I noticed that another commit 4a725627f21d converted the check in
> > nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> > potentially outdate
On Tue, 2015-03-31 at 14:28 +0530, Preeti U Murthy wrote:
> Morten,
> I am a bit confused about the problem you are pointing to.
> I am unable to see the issue. What is it that I am missing ?
Hi Preeti,
Here is one of the potential issues that have been described from my
understanding.
In sit
On 03/30/2015 05:33 PM, Morten Rasmussen wrote:
> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
>>
>>> I agree that it is hard to predict how many additional cpus you need,
>>> but I don't think you necessarily n
Hi Jason,
On 03/31/2015 12:25 AM, Jason Low wrote:
> Hi Preeti,
>
> I noticed that another commit 4a725627f21d converted the check in
> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> potentially outdated value to be used if this cpu is able to pull tasks
> using rebalance_dom
On 03/30/2015 07:15 PM, Vincent Guittot wrote:
> On 26 March 2015 at 14:02, Preeti U Murthy wrote:
>> When a CPU is kicked to do nohz idle balancing, it wakes up to do load
>> balancing on itself, followed by load balancing on behalf of idle CPUs.
>> But it may end up with load after the load bala
Hi Preeti,
I noticed that another commit 4a725627f21d converted the check in
nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
potentially outdated value to be used if this cpu is able to pull tasks
using rebalance_domains(), and nohz_kick_needed() directly returning
false.
Would
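The staleness concern raised in this message can be shown with a small model. It is a sketch only: `struct toy_rq` and the `toy_*` helpers are hypothetical, and the model assumes that the `idle_balance` field is a snapshot recorded before the CPU pulls tasks, while the idle check is evaluated at the time of the call.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified run queue. */
struct toy_rq {
    int  nr_running;    /* tasks currently queued on this CPU */
    bool idle_balance;  /* snapshot of "CPU was idle", taken earlier */
};

/* Live check: reflects the run queue as it is right now. */
static bool toy_idle_cpu(const struct toy_rq *rq)
{
    return rq->nr_running == 0;
}

/* Record the snapshot (assumed to happen before balancing starts). */
static void toy_take_snapshot(struct toy_rq *rq)
{
    rq->idle_balance = toy_idle_cpu(rq);
}

/* Stand-in for a balancing pass that pulls n tasks onto this CPU. */
static void toy_pull_tasks(struct toy_rq *rq, int n)
{
    rq->nr_running += n;
}
```

After `toy_take_snapshot()` on an empty queue followed by `toy_pull_tasks()`, the snapshot still reports idle while `toy_idle_cpu()` does not, which is the kind of mismatch the message points at.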
On Mon, Mar 30, 2015 at 02:29:09PM +0100, Vincent Guittot wrote:
> On 30 March 2015 at 14:24, Peter Zijlstra wrote:
> > On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote:
> >> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
> >> > On Fri, Mar 27, 2015 at 05:56:51PM +
On Mon, Mar 30, 2015 at 03:29:09PM +0200, Vincent Guittot wrote:
> On 30 March 2015 at 14:24, Peter Zijlstra wrote:
> > @@ -7647,6 +7648,8 @@ static void nohz_idle_balance(struct rq *this_rq,
> > enum cpu_idle_type idle)
> > break;
> >
> > rq = cpu_rq(balan
On 26 March 2015 at 14:02, Preeti U Murthy wrote:
> When a CPU is kicked to do nohz idle balancing, it wakes up to do load
> balancing on itself, followed by load balancing on behalf of idle CPUs.
> But it may end up with load after the load balancing attempt on itself.
> This aborts nohz idle bal
On 30 March 2015 at 14:24, Peter Zijlstra wrote:
> On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote:
>> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
>> > On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
>> >
>> > > I agree that it is hard to pred
On Mon, Mar 30, 2015 at 02:24:49PM +0200, Peter Zijlstra wrote:
> @@ -7647,6 +7648,8 @@ static void nohz_idle_balance(struct rq *this_rq, enum
> cpu_idle_type idle)
> break;
>
> rq = cpu_rq(balance_cpu);
> + if (rq == this_rq)
> +
On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote:
> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
> > On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
> >
> > > I agree that it is hard to predict how many additional cpus you need,
> > > but I don'
On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
>
> > I agree that it is hard to predict how many additional cpus you need,
> > but I don't think you necessarily need that information as long as you
> > start by fi
On Mon, Mar 30, 2015 at 08:26:19AM +0100, Preeti U Murthy wrote:
> Hi Morten,
>
> On 03/27/2015 11:26 PM, Morten Rasmussen wrote:
> >
> > I agree that the current behaviour is undesirable and should be fixed,
> > but IMHO waking up all idle cpus can not be justified. It is only one
> > additional
On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
> I agree that it is hard to predict how many additional cpus you need,
> but I don't think you necessarily need that information as long as you
> start by filling up the cpu that was kicked to do the
> nohz_idle_balance() first.
>
Hi Morten,
On 03/27/2015 11:26 PM, Morten Rasmussen wrote:
>
> I agree that the current behaviour is undesirable and should be fixed,
> but IMHO waking up all idle cpus can not be justified. It is only one
> additional cpu though with your patch so it isn't quite that bad.
>
> I agree that it is
On Fri, Mar 27, 2015 at 04:46:30PM +0000, Preeti U Murthy wrote:
> Hi Morten,
>
> On 03/27/2015 08:08 PM, Morten Rasmussen wrote:
> > Hi Preeti,
> >
> > On Thu, Mar 26, 2015 at 01:02:44PM +0000, Preeti U Murthy wrote:
> >> Fix this, by checking if a CPU was woken up to do nohz idle load
> >> bala
Hi Morten,
On 03/27/2015 08:08 PM, Morten Rasmussen wrote:
> Hi Preeti,
>
> On Thu, Mar 26, 2015 at 01:02:44PM +0000, Preeti U Murthy wrote:
>> Fix this, by checking if a CPU was woken up to do nohz idle load
>> balancing, before it does load balancing upon itself. This way we allow
>> idle CPUs
Hi Wanpeng, Jason,
On 03/27/2015 10:37 AM, Jason Low wrote:
> On Fri, 2015-03-27 at 10:12 +0800, Wanpeng Li wrote:
>> Hi Preeti,
>> On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
>>>
>>> 1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
>>> load balancing [
Hi Preeti,
On Thu, Mar 26, 2015 at 01:02:44PM +0000, Preeti U Murthy wrote:
> Fix this, by checking if a CPU was woken up to do nohz idle load
> balancing, before it does load balancing upon itself. This way we allow
> idle CPUs across the system to do load balancing which results in
> quicker spr
> When a CPU is kicked to do nohz idle balancing, it wakes up to do load
> balancing on itself, followed by load balancing on behalf of idle CPUs.
> But it may end up with load after the load balancing attempt on itself.
> This aborts nohz idle balancing. As a result several idle CPUs are left
> wi
balancing in the presence of idle CPUs
When a CPU is kicked to do nohz idle balancing, it wakes up to do load
balancing on itself, followed by load balancing on behalf of idle CPUs.
But it may end up with load after the load balancing attempt on itself.
This aborts nohz idle balancing. As a result
Hi Srikar,
On Fri, Mar 27, 2015 at 11:09:07AM +0530, Srikar Dronamraju wrote:
>
>Yes, the need_resched() in nohz_idle_balance() would exit the
>nohz_idle_balance if it has something to run. However I wonder if we
>should move the need_resched check out of the for loop. i.e the
>need_resched check s
On Thu, Mar 26, 2015 at 10:07:21PM -0700, Jason Low wrote:
>On Fri, 2015-03-27 at 10:12 +0800, Wanpeng Li wrote:
>> Hi Preeti,
>> On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
>> >
>> >1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
>> >load balancing [Gi
* Jason Low [2015-03-26 22:07:21]:
> On Fri, 2015-03-27 at 10:12 +0800, Wanpeng Li wrote:
> > Hi Preeti,
> > On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
> > >
> > >1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
> > >load balancing [Given the experime
On Fri, 2015-03-27 at 10:12 +0800, Wanpeng Li wrote:
> Hi Preeti,
> On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
> >
> >1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
> >load balancing [Given the experiment, up to 6 CPUs per core could be
> >potentially
On Fri, 2015-03-27 at 10:03 +0530, Preeti U Murthy wrote:
> Hi Wanpeng
>
> On 03/27/2015 07:42 AM, Wanpeng Li wrote:
> > Hi Preeti,
> > On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
> >>
> >> 1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
> >> load bala
Hi Preeti,
On Fri, Mar 27, 2015 at 10:03:21AM +0530, Preeti U Murthy wrote:
>is set to CPU_NOT_IDLE.
>
>""
>idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
>
>And,
>
>When nohz_idle_balance() is called, the state of idle of ILB CPU is
>checked before proceeding with load balancing on idle CPUs.
>
>
Hi Wanpeng
On 03/27/2015 07:42 AM, Wanpeng Li wrote:
> Hi Preeti,
> On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
>>
>> 1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
>> load balancing [Given the experiment, up to 6 CPUs per core could be
>> potentially
Hi Preeti,
On Thu, Mar 26, 2015 at 06:32:44PM +0530, Preeti U Murthy wrote:
>
>1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
>load balancing [Given the experiment, up to 6 CPUs per core could be
>potentially idle in this domain.]
>
>2. However the ILB CPU would call load_b
On Thu, 2015-03-26 at 18:32 +0530, Preeti U Murthy wrote:
> kernel/sched/fair.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bcfe320..8b6d0d5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @
When a CPU is kicked to do nohz idle balancing, it wakes up to do load
balancing on itself, followed by load balancing on behalf of idle CPUs.
But it may end up with load after the load balancing attempt on itself.
This aborts nohz idle balancing. As a result several idle CPUs are left
without task
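The abort scenario in this description can be sketched as a toy simulation. Everything here is an assumption made for illustration: `toy_ilb_run()` and `TOY_NR_CPUS` are hypothetical, a CPU counts as idle when its task count is zero, and the kicked CPU stops balancing for others as soon as it has work of its own. The `self_first = false` ordering is included only as a contrast case.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/*
 * Returns how many idle CPUs the kicked (ILB) CPU managed to balance on
 * behalf of. 'pulled_by_self' is the number of tasks the ILB CPU picks up
 * when it balances itself.
 */
static int toy_ilb_run(int nr_running[TOY_NR_CPUS], int ilb_cpu,
                       bool self_first, int pulled_by_self)
{
    int balanced = 0;

    if (self_first)
        nr_running[ilb_cpu] += pulled_by_self; /* balance on itself first */

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (cpu == ilb_cpu)
            continue;
        if (nr_running[ilb_cpu] > 0)
            break;              /* ILB CPU is no longer idle: abort */
        if (nr_running[cpu] == 0)
            balanced++;         /* idle CPU gets a balancing pass */
    }

    if (!self_first)
        nr_running[ilb_cpu] += pulled_by_self; /* balance on itself last */

    return balanced;
}
```

With one busy CPU and three idle ones, the self-first ordering balances on behalf of no idle CPU in this model once the ILB CPU has pulled a task, whereas the contrast ordering visits the remaining idle CPUs before the ILB CPU takes work.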
On Thu, Mar 26, 2015 at 01:28:33PM +0530, Preeti U Murthy wrote:
> When a CPU is kicked to do nohz idle balancing, it wakes up to do load
> balancing
> on itself, followed by load balancing on behalf of idle CPUs. But it may end
> up with load after the load balancing attempt on itself. This abort
When a CPU is kicked to do nohz idle balancing, it wakes up to do load balancing
on itself, followed by load balancing on behalf of idle CPUs. But it may end
up with load after the load balancing attempt on itself. This aborts nohz
idle balancing. As a result several idle CPUs are left without task
63 matches