Hi Tejun,
On 4 May 2017 at 19:43, Tejun Heo wrote:
> Hello,
>
> On Thu, May 04, 2017 at 10:19:46AM +0200, Vincent Guittot wrote:
>> > schbench inside a cgroup and have some base load, it is actually
>> > expected to show worse latency. You need to give higher weight to the
>> > cgroup matching t
Hello, Vincent.
On Thu, May 04, 2017 at 09:02:39PM +0200, Vincent Guittot wrote:
> In the trace I have uploaded, you will see that regressions happen
> whereas there are no other runnable threads around, so it's not a matter
> of background activities that disturb schbench
Understood, yeah, I'm al
Hello,
On Thu, May 04, 2017 at 10:19:46AM +0200, Vincent Guittot wrote:
> > schbench inside a cgroup and have some base load, it is actually
> > expected to show worse latency. You need to give higher weight to the
> > cgroup matching the number of active threads (to be accurate, scaled
> > by du
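
The advice quoted above can be made concrete with a small, purely illustrative helper (a sketch, not a kernel or cgroup API; the function name, the duty-cycle parameter and the 1024 base, which matches the default cpu.shares, are assumptions):

#include <stdio.h>

/* Hypothetical helper: scale the cgroup's cpu.shares by the number of
 * active schbench threads and their duty cycle, relative to one fully
 * busy task at the default weight of 1024. */
static unsigned long suggested_cpu_shares(unsigned int nr_active_threads,
					  double duty_cycle)
{
	const unsigned long base_shares = 1024;	/* default cpu.shares */

	return (unsigned long)(base_shares * nr_active_threads * duty_cycle);
}

int main(void)
{
	/* e.g. 16 message threads at ~30% duty cycle -> ~4915 */
	printf("%lu\n", suggested_cpu_shares(16, 0.30));
	return 0;
}
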
On 3 May 2017 at 23:49, Tejun Heo wrote:
> On Wed, May 03, 2017 at 03:09:38PM +0200, Peter Zijlstra wrote:
>> On Wed, May 03, 2017 at 12:37:37PM +0200, Vincent Guittot wrote:
>> > On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>>
>> > > Of course, it could be I overlooked something, in which case,
On Wed, May 03, 2017 at 03:09:38PM +0200, Peter Zijlstra wrote:
> On Wed, May 03, 2017 at 12:37:37PM +0200, Vincent Guittot wrote:
> > On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>
> > > Of course, it could be I overlooked something, in which case, please
> > > tell :-)
> >
> > That's mainly b
On Wed, May 03, 2017 at 12:37:37PM +0200, Vincent Guittot wrote:
> On 3 May 2017 at 11:37, Peter Zijlstra wrote:
> > Of course, it could be I overlooked something, in which case, please
> > tell :-)
>
> That's mainly based on the regression I see on my platform. I haven't
> found the root cause o
On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>
> On Wed, May 03, 2017 at 09:34:51AM +0200, Vincent Guittot wrote:
>
> > We use load_avg for calculating a stable share and we want to use it
> > more and more. So breaking it because it's easier doesn't seem to be
> > the right way to do it, IMHO
>
>
On Wed, May 03, 2017 at 09:34:51AM +0200, Vincent Guittot wrote:
> We use load_avg for calculating a stable share and we want to use it
> more and more. So breaking it because it's easier doesn't seem to be
> the right way to do it, IMHO
So afaict we calculate group se->load.weight (aka shares, see
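
Peter's truncated sentence refers to the group-share calculation; a minimal sketch of the usual approximation (illustrative only, not the kernel's calc_cfs_shares() implementation, which uses fixed-point arithmetic and extra guards): the group entity's weight on a CPU is the task group's shares scaled by that CPU's fraction of the group's total load.

/* Simplified sketch of the approximation described above. */
static unsigned long group_se_weight(unsigned long tg_shares,
				     unsigned long this_cfs_rq_load,
				     unsigned long tg_total_load)
{
	if (!tg_total_load)
		return tg_shares;

	return tg_shares * this_cfs_rq_load / tg_total_load;
}
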
On 3 May 2017 at 09:25, Vincent Guittot wrote:
> On 2 May 2017 at 22:56, Tejun Heo wrote:
>> Hello, Vincent.
>>
>> On Tue, May 02, 2017 at 08:56:52AM +0200, Vincent Guittot wrote:
>>> On 28 April 2017 at 18:14, Tejun Heo wrote:
>>> > I'll follow up in the other subthread but there really is fund
Hi Tejun,
On 2 May 2017 at 23:50, Tejun Heo wrote:
> Hello,
>
> On Tue, May 02, 2017 at 09:18:53AM +0200, Vincent Guittot wrote:
>> > dbg_odd: odd: dst=28 idle=2 brk=32 lbtgt=0-31 type=2
>> > dbg_odd_dump: A: grp=1,17 w=2 avg=7.247 grp=8.337 sum=8.337 pertask=2.779
>> > dbg_odd_dump: A: gcap=1
On 2 May 2017 at 22:56, Tejun Heo wrote:
> Hello, Vincent.
>
> On Tue, May 02, 2017 at 08:56:52AM +0200, Vincent Guittot wrote:
>> On 28 April 2017 at 18:14, Tejun Heo wrote:
>> > I'll follow up in the other subthread but there really is fundamental
>> > difference in how we calculate runnable_av
Hello, Vincent.
On Tue, May 02, 2017 at 03:26:12PM +0200, Vincent Guittot wrote:
> > IMHO, we should rather improve load-balance selection. I'm going to
> > add smarter group selection in load_balance. That's something we
> > should have already done, but it was difficult without load/util_avg
> >
Hello,
On Mon, May 01, 2017 at 05:56:13PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 28, 2017 at 04:33:47PM -0400, Tejun Heo wrote:
> > I'm attaching the debug patch. With your change (avg instead of
> > runnable_avg), the following trace shows why it's wrong.
>
> Ah, OK. So you really want runn
Hello,
On Tue, May 02, 2017 at 09:18:53AM +0200, Vincent Guittot wrote:
> > dbg_odd: odd: dst=28 idle=2 brk=32 lbtgt=0-31 type=2
> > dbg_odd_dump: A: grp=1,17 w=2 avg=7.247 grp=8.337 sum=8.337 pertask=2.779
> > dbg_odd_dump: A: gcap=1.150 gutil=1.095 run=3 idle=0 gwt=2 type=2 nocap=1
> > dbg_o
Hello, Vincent.
On Tue, May 02, 2017 at 08:56:52AM +0200, Vincent Guittot wrote:
> On 28 April 2017 at 18:14, Tejun Heo wrote:
> > I'll follow up in the other subthread but there really is fundamental
> > difference in how we calculate runnable_avg w/ and w/o cgroups.
> > Independent of whether we
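
One hedged, toy illustration of how grouping changes the load seen at the root (assumed default weights; a simplification, not Tejun's exact argument): two always-running tasks attached directly to the root contribute two NICE_0 weights, while the same two tasks inside one default cgroup are represented at the root by a single group entity whose contribution is bounded by the group's shares.

#include <stdio.h>

int main(void)
{
	const unsigned long nice0_weight = 1024;	/* per-task weight */
	const unsigned long cgroup_shares = 1024;	/* default cpu.shares */
	unsigned int nr_tasks = 2;

	unsigned long root_load_flat = nr_tasks * nice0_weight;
	unsigned long root_load_grouped = cgroup_shares;	/* one group se */

	printf("flat: %lu, grouped: %lu\n", root_load_flat, root_load_grouped);
	return 0;
}
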
Hi Tejun,
On Tuesday 02 May 2017 at 09:18:53 (+0200), Vincent Guittot wrote:
> On 28 April 2017 at 22:33, Tejun Heo wrote:
> > Hello, Vincent.
> >
> > On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
> >> On 27 April 2017 at 00:52, Tejun Heo wrote:
> >> > Hello,
> >> >
> >> > O
On 28 April 2017 at 22:33, Tejun Heo wrote:
> Hello, Vincent.
>
> On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
>> On 27 April 2017 at 00:52, Tejun Heo wrote:
>> > Hello,
>> >
>> > On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
>> >> On 24 April 2017 at 22:14,
On 28 April 2017 at 18:14, Tejun Heo wrote:
> Hello, Vincent.
>
>>
>> The only point of runnable_load_avg is to be null when a cfs_rq is
>> idle whereas load_avg is not; it is not meant to be higher than load_avg. The
>> root cause is that load_balance only looks at "load" but not at the number of
>> tasks cur
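
The property Vincent describes, as a toy model (plain doubles rather than the kernel's fixed-point PELT code): a sleeping entity stops contributing to runnable_load_avg the moment it is dequeued, while its contribution stays in load_avg and only decays over time.

#include <stdio.h>

struct toy_cfs_rq {
	double load_avg;		/* keeps blocked entities, decays slowly */
	double runnable_load_avg;	/* only currently enqueued entities */
};

/* Entity goes to sleep: it leaves runnable_load_avg immediately;
 * load_avg is untouched here and merely decays in later updates. */
static void toy_dequeue_sleeper(struct toy_cfs_rq *cfs_rq, double se_load)
{
	cfs_rq->runnable_load_avg -= se_load;
}

int main(void)
{
	struct toy_cfs_rq rq = { .load_avg = 1024, .runnable_load_avg = 1024 };

	toy_dequeue_sleeper(&rq, 1024);
	printf("load_avg=%.0f runnable_load_avg=%.0f\n",
	       rq.load_avg, rq.runnable_load_avg);
	return 0;
}
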
On Fri, Apr 28, 2017 at 04:33:47PM -0400, Tejun Heo wrote:
> I'm attaching the debug patch. With your change (avg instead of
> runnable_avg), the following trace shows why it's wrong.
Ah, OK. So you really want runnable_avg (and I understand why), which is
rather unfortunate, since we have everyt
Here's the debug patch.
The debug condition triggers when the load balancer picks a group with no
CPU running more than one schbench thread over a group that has one.
/sys/module/fair/parameters/dbg_odd_cnt: resettable counter
/sys/module/fair/parameters/dbg_odd_nth: dump group states on Nth
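
For reference, a hedged sketch of how knobs like these could be wired up (not the actual debug patch, only an illustration of the module_param() mechanism that makes them appear under /sys/module/<name>/parameters/):

#include <linux/module.h>
#include <linux/printk.h>

static unsigned long dbg_odd_cnt;	/* odd picks seen so far; writable to reset */
static unsigned long dbg_odd_nth;	/* dump on every Nth odd pick; 0 = off */
module_param(dbg_odd_cnt, ulong, 0644);
module_param(dbg_odd_nth, ulong, 0644);

/* Called from the load balancer when an "odd" group pick is detected. */
static void dbg_odd(int dst_cpu)
{
	if (!dbg_odd_nth)
		return;
	if ((++dbg_odd_cnt % dbg_odd_nth) == 0)
		pr_info("dbg_odd: odd pick, dst=%d\n", dst_cpu);
}
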
Hello, Vincent.
On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
> On 27 April 2017 at 00:52, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
> >> On 24 April 2017 at 22:14, Tejun Heo wrote:
> >> Can the problem be on the load
Hello, Vincent.
On Thu, Apr 27, 2017 at 10:28:01AM +0200, Vincent Guittot wrote:
> On 27 April 2017 at 02:30, Tejun Heo wrote:
> > Hello, Vincent.
> >
> > On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
> >> > This is from the follow-up patch. I was confused. Because we don't
>
On 27 April 2017 at 00:52, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
>> On 24 April 2017 at 22:14, Tejun Heo wrote:
>> Can the problem be on the load balance side instead? And more
>> precisely in the wakeup path?
>> After looking at the tra
On 27 April 2017 at 02:30, Tejun Heo wrote:
> Hello, Vincent.
>
> On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
>> > This is from the follow-up patch. I was confused. Because we don't
>> > propagate decays, we still should decay the runnable_load_avg;
>> > otherwise, we end up
Hello, Vincent.
On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
> > This is from the follow-up patch. I was confused. Because we don't
> > propagate decays, we still should decay the runnable_load_avg;
> > otherwise, we end up accumulating errors in the counter. I'll drop
> > t
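
The decay being discussed is PELT's geometric decay; as a toy illustration (assumed constants: contributions are scaled by y per 1 ms period with y^32 = 1/2, i.e. they halve about every 32 ms; plain floating point here, whereas the kernel uses fixed-point lookup tables):

#include <math.h>
#include <stdio.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* y^32 == 0.5 */
	double contrib = 1024.0;		/* initial load contribution */

	for (int ms = 0; ms <= 96; ms += 32)
		printf("after %2d ms: %6.1f\n", ms, contrib * pow(y, ms));

	return 0;
}
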
Hello,
On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
> On 24 April 2017 at 22:14, Tejun Heo wrote:
> Can the problem be on the load balance side instead? And more
> precisely in the wakeup path?
> After looking at the trace, it seems that task placement happens at
> wake up
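
As a hedged toy of a wakeup-time decision (loosely inspired by wake_affine(); the constant and the exact comparison are illustrative assumptions, not the kernel's code), this is roughly where a skewed load signal would also bite at wake up:

#include <stdbool.h>
#include <stdio.h>

/* Pull the waking task next to the waker only if the waker's CPU would
 * still look no more loaded than the task's previous CPU, with a ~25%
 * allowance in the spirit of imbalance_pct. */
static bool toy_wake_affine(double this_cpu_load, double prev_cpu_load,
			    double task_load)
{
	return (this_cpu_load + task_load) <= prev_cpu_load * 1.25;
}

int main(void)
{
	printf("%d\n", toy_wake_affine(1024.0, 2048.0, 1024.0));	/* 1: pull */
	return 0;
}
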
On 24 April 2017 at 22:14, Tejun Heo wrote:
> We noticed that with cgroup CPU controller in use, the scheduling
>
> Note the drastic increase in p99 scheduling latency. After
> investigation, it turned out that the update_sd_lb_stats(), which is
> used by load_balance() to pick the most loaded gr
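
A hedged toy of the path quoted above, loosely modeled on load_balance() -> find_busiest_group() -> update_sd_lb_stats() (simplified, not the kernel code): per-group load is accumulated and the group with the highest average load becomes the pull source, so a group whose load signal understates how many tasks are stacked on its CPUs can lose this comparison even when it should win.

#include <stddef.h>

struct toy_sg_stats {
	double group_load;		/* summed runnable load of the group's CPUs */
	unsigned int group_weight;	/* number of CPUs in the group */
};

/* Pick the candidate group with the highest average load per CPU. */
static const struct toy_sg_stats *
toy_find_busiest_group(const struct toy_sg_stats *groups, size_t nr_groups)
{
	const struct toy_sg_stats *busiest = NULL;
	double busiest_avg = 0.0;
	size_t i;

	for (i = 0; i < nr_groups; i++) {
		double avg = groups[i].group_load / groups[i].group_weight;

		if (!busiest || avg > busiest_avg) {
			busiest = &groups[i];
			busiest_avg = avg;
		}
	}

	return busiest;
}
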
On 25 April 2017 at 23:08, Tejun Heo wrote:
> On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
>> > I have run a quick test with your patches and schbench on my platform.
>> > I haven't been able to reproduce your regression but my platform is
>> > quite different from yours (only 8 core
On 04/25/2017 04:49 PM, Tejun Heo wrote:
> On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
> Will try that too. I can't see why HT would change it because I see
> single CPU queues misevaluated. Just in case, you need to tune the
> test params so that it doesn't load the machine too much
On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
> > I have run a quick test with your patches and schbench on my platform.
> > I haven't been able to reproduce your regression but my platform is
> > quite different from yours (only 8 cores without SMT)
> > But most importantly, the paren
On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
> Will try that too. I can't see why HT would change it because I see
> single CPU queues misevaluated. Just in case, you need to tune the
> test params so that it doesn't load the machine too much and that
> there are some non-CPU intens
Hello,
On Tue, Apr 25, 2017 at 02:59:18PM +0200, Vincent Guittot wrote:
> >> So you are changing the purpose of propagate_entity_load_avg which
> >> aims to propagate load_avg/util_avg changes only when a task migrates
> >> and you also want to propagate the enqueue/dequeue in the parent
> >> cfs_r
On 25 April 2017 at 11:05, Vincent Guittot wrote:
> On 25 April 2017 at 10:46, Vincent Guittot wrote:
>> On 24 April 2017 at 22:14, Tejun Heo wrote:
>>> We noticed that with cgroup CPU controller in use, the scheduling
>>> latency gets wonky regardless of nesting level or weight
>>> configuratio
On 25 April 2017 at 10:46, Vincent Guittot wrote:
> On 24 April 2017 at 22:14, Tejun Heo wrote:
>> We noticed that with cgroup CPU controller in use, the scheduling
>> latency gets wonky regardless of nesting level or weight
>> configuration. This is easily reproducible with Chris Mason's
>> sch
On 24 April 2017 at 22:14, Tejun Heo wrote:
> We noticed that with cgroup CPU controller in use, the scheduling
> latency gets wonky regardless of nesting level or weight
> configuration. This is easily reproducible with Chris Mason's
> schbench[1].
>
> All tests are run on a single socket, 16 co
We noticed that with cgroup CPU controller in use, the scheduling
latency gets wonky regardless of nesting level or weight
configuration. This is easily reproducible with Chris Mason's
schbench[1].
All tests are run on a single socket, 16 cores, 32 threads machine.
While the machine is mostly idl