Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-30 Thread Ingo Molnar
* Vincent Guittot wrote:

> On Sun, 30 Dec 2018 at 13:04, Ingo Molnar wrote:
> >
> >
> > * Vincent Guittot wrote:
> >
> > > > Reported-by: Zhipeng Xie
> > > > Cc: Bin Li
> > > > Cc: [4.10+]
> > > > Fixes: 9c2791f936ef (sched/fair: Fix hierarchical order in
> > > > rq->leaf_cfs_rq_li

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-30 Thread Vincent Guittot
On Sun, 30 Dec 2018 at 13:04, Ingo Molnar wrote:
>
>
> * Vincent Guittot wrote:
>
> > > Reported-by: Zhipeng Xie
> > > Cc: Bin Li
> > > Cc: [4.10+]
> > > Fixes: 9c2791f936ef (sched/fair: Fix hierarchical order in
> > > rq->leaf_cfs_rq_list)
> >
> > If it only happens in update_blocked_

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-30 Thread Ingo Molnar
* Vincent Guittot wrote:

> > Reported-by: Zhipeng Xie
> > Cc: Bin Li
> > Cc: [4.10+]
> > Fixes: 9c2791f936ef (sched/fair: Fix hierarchical order in
> > rq->leaf_cfs_rq_list)
>
> If it only happens in update_blocked_averages(), the del leaf has been added
> by:
> a9e7f6544b9c (sched

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Vincent Guittot
On Fri, 28 Dec 2018 at 18:46, Tejun Heo wrote:
>
> On Fri, Dec 28, 2018 at 06:25:37PM +0100, Vincent Guittot wrote:
> > > done without extra space as long as each node has the parent pointer,
> > > which they do. Is the dedicated list an optimization?
> >
> > It prevents to parse and walk all tas

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Tejun Heo
On Fri, Dec 28, 2018 at 06:25:37PM +0100, Vincent Guittot wrote:
> > done without extra space as long as each node has the parent pointer,
> > which they do. Is the dedicated list an optimization?
>
> It prevents to parse and walk all task group struct every time.
> Instead, you just have to foll

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Vincent Guittot
On Fri, 28 Dec 2018 at 17:54, Tejun Heo wrote:
>
> Hello,
>
> On Fri, Dec 28, 2018 at 10:30:07AM +0100, Vincent Guittot wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d1907506318a..88b9118b5191 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Tejun Heo
Hello,

On Fri, Dec 28, 2018 at 10:30:07AM +0100, Vincent Guittot wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d1907506318a..88b9118b5191 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7698,7 +7698,8 @@ static void update_blocked_averages(i

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Sargun Dhillon
>
> But the lock should not be released during the build of a branch and
> tmp_alone_branch must always points to rq->leaf_cfs_rq_list at the end
> and before the lock is released
>
> I think that there is a bigger problem with commit a9e7f6544b9c and
> cfs_rq throttling:
> Let take the example of

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Xiezhipeng (EulerOS)
Hi Tejun,

On Fri, Dec 28, 2018 10:03 AM, Tejun Heo wrote:
>
> On Thu, Dec 27, 2018 at 05:53:52PM -0800, Tejun Heo wrote:
> > Vincent knows that part way better than me but I think the safest way
> > would be doing the optimization removal iff tmp_alone_branch is
> > already pointing to leaf_cfs_r

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-28 Thread Vincent Guittot
On Fri, 28 Dec 2018 at 03:02, Tejun Heo wrote:
>
> On Thu, Dec 27, 2018 at 05:53:52PM -0800, Tejun Heo wrote:
> > Vincent knows that part way better than me but I think the safest way
> > would be doing the optimization removal iff tmp_alone_branch is
> > already pointing to leaf_cfs_rq_list. IIU

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Sargun Dhillon
On Thu, Dec 27, 2018 at 9:02 PM Tejun Heo wrote:
>
> On Thu, Dec 27, 2018 at 05:53:52PM -0800, Tejun Heo wrote:
> > Vincent knows that part way better than me but I think the safest way
> > would be doing the optimization removal iff tmp_alone_branch is
> > already pointing to leaf_cfs_rq_list. I

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Xie XiuQi
Hi Tejun,

On 2018/12/28 10:02, Tejun Heo wrote:
> On Thu, Dec 27, 2018 at 05:53:52PM -0800, Tejun Heo wrote:
>> Vincent knows that part way better than me but I think the safest way
>> would be doing the optimization removal iff tmp_alone_branch is
>> already pointing to leaf_cfs_rq_list. IIUC, i

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Tejun Heo
On Thu, Dec 27, 2018 at 05:53:52PM -0800, Tejun Heo wrote:
> Vincent knows that part way better than me but I think the safest way
> would be doing the optimization removal iff tmp_alone_branch is
> already pointing to leaf_cfs_rq_list. IIUC, it's pointing to
> something else only while a branch i
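
[Editor's sketch] The guard suggested in the excerpt above (remove a leaf only when tmp_alone_branch already points at the head of leaf_cfs_rq_list) can be illustrated with a toy user-space list. This is not the kernel code and not the patch under discussion: struct toy_rq, struct toy_node and every toy_* helper are hypothetical names made up for illustration, and only the two field names are borrowed from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy circular doubly linked list, loosely modeled on a kernel-style list head. */
struct toy_node {
	struct toy_node *prev, *next;
};

static void toy_list_init(struct toy_node *n)
{
	n->prev = n->next = n;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void toy_list_del(struct toy_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);	/* leave the removed node self-linked */
}

/* Hypothetical stand-in for a run queue; only the two fields named in the thread. */
struct toy_rq {
	struct toy_node leaf_cfs_rq_list;	/* list head */
	struct toy_node *tmp_alone_branch;	/* == &leaf_cfs_rq_list when no branch is being built */
};

static void toy_rq_init(struct toy_rq *rq)
{
	toy_list_init(&rq->leaf_cfs_rq_list);
	rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
}

/*
 * Remove 'leaf' only if no branch is currently being built, i.e. only if
 * tmp_alone_branch points back at the list head. Returns true if removed.
 */
static bool toy_del_leaf_if_safe(struct toy_rq *rq, struct toy_node *leaf)
{
	assert(rq != NULL && leaf != NULL);
	if (rq->tmp_alone_branch != &rq->leaf_cfs_rq_list)
		return false;	/* removal could leave tmp_alone_branch dangling */
	toy_list_del(leaf);
	return true;
}
```

In this toy model, skipping the removal while tmp_alone_branch points into the list means the pointer can never be left referring to an unlinked node. Whether that is sufficient in the real scheduler is exactly what the thread is debating.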

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Tejun Heo
Hello,

On Thu, Dec 27, 2018 at 05:36:47PM -0800, Linus Torvalds wrote:
> > Unless I'm totally confused, which is definitely possible, I don't
> > think there's a race condition and the only bug is the
> > tmp_alone_branch pointer getting dangled, which maybe doesn't happen
> > all that much?
> >

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Linus Torvalds
On Thu, Dec 27, 2018 at 5:15 PM Tejun Heo wrote:
>
> I'm pretty sure enqueue_entity() *has* to be called with rq lock.
> unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
> distribute_cfs_runtime() and unthrottle_offline_cfs_rqs. The first
> two grabs the rq_lock just around the calls an

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Tejun Heo
Happy holidays, everyone.

(cc'ing Rik, who has been looking at the scheduler code a lot lately)

On Thu, Dec 27, 2018 at 10:15:17AM -0800, Linus Torvalds wrote:
> [ goes off and looks ]
>
> Oh. unthrottle_cfs_rq -> enqueue_entity -> list_add_leaf_cfs_rq()
> doesn't actually seem to hold the rq lo

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Linus Torvalds
On Thu, Dec 27, 2018 at 1:09 PM Sargun Dhillon wrote:
>
> This appears to be broken since October on 4.18.5. We've only noticed
> it recently with a workload which does ridiculously parallel compiles
> in cgroups that are rapidly churned.

Yeah, that's probably unusual enough that people will have

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Sargun Dhillon
On Thu, Dec 27, 2018 at 1:15 PM Linus Torvalds wrote:
>
> On Thu, Dec 27, 2018 at 9:02 AM Vincent Guittot
> wrote:
> >
> > In the original behavior, the cs_rq was removed from the list only
> > when the cgroup was removed.
> > patch a9e7f6544b9c (sched/fair: Fix O(nr_cgroups) in load balance
> >

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Linus Torvalds
On Thu, Dec 27, 2018 at 9:02 AM Vincent Guittot wrote:
>
> In the original behavior, the cs_rq was removed from the list only
> when the cgroup was removed.
> patch a9e7f6544b9c (sched/fair: Fix O(nr_cgroups) in load balance
> path) has added an optimization which remove the cfs_rq when there
> we

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Vincent Guittot
On Thu, 27 Dec 2018 at 17:40, Sargun Dhillon wrote:
>
> On Thu, Dec 27, 2018 at 5:23 AM Vincent Guittot
> wrote:
> >
> > Adding Sargun and Dimitry who faced similar problem
> > Adding Tejun
> >
> > On Thu, 27 Dec 2018 at 11:21, Vincent Guittot
> > wrote:
> > >
> > > On Thursday 27 Dec 2018 at 10:

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Sargun Dhillon
On Thu, Dec 27, 2018 at 5:23 AM Vincent Guittot wrote:
>
> Adding Sargun and Dimitry who faced similar problem
> Adding Tejun
>
> On Thu, 27 Dec 2018 at 11:21, Vincent Guittot
> wrote:
> >
> > On Thursday 27 Dec 2018 at 10:21:53 (+0100), Vincent Guittot wrote:
> > > Hi Xie,
> > >
> > > On Thu,

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Vincent Guittot
Adding Sargun and Dimitry who faced similar problem
Adding Tejun

On Thu, 27 Dec 2018 at 11:21, Vincent Guittot wrote:
>
> On Thursday 27 Dec 2018 at 10:21:53 (+0100), Vincent Guittot wrote:
> > Hi Xie,
> >
> > On Thu, 27 Dec 2018 at 03:57, Xie XiuQi wrote:
> > >
> > > Zhepeng Xie report a bug,

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Vincent Guittot
On Thursday 27 Dec 2018 at 10:21:53 (+0100), Vincent Guittot wrote:
> Hi Xie,
>
> On Thu, 27 Dec 2018 at 03:57, Xie XiuQi wrote:
> >
> > Zhepeng Xie report a bug, there is a infinity loop in
> > update_blocked_averages().
> >
> > PID: 14233 TASK: 800b2de08fc0 CPU: 1 COMMAND: "docker"
>

Re: [PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-27 Thread Vincent Guittot
Hi Xie,

On Thu, 27 Dec 2018 at 03:57, Xie XiuQi wrote:
>
> Zhepeng Xie report a bug, there is a infinity loop in
> update_blocked_averages().
>
> PID: 14233 TASK: 800b2de08fc0 CPU: 1 COMMAND: "docker"
> #0 [2213b9d0] update_blocked_averages at 0811e4a8
> #1 [2213

[PATCH] sched: fix infinity loop in update_blocked_averages

2018-12-26 Thread Xie XiuQi
Zhepeng Xie report a bug, there is a infinity loop in
update_blocked_averages().

PID: 14233 TASK: 800b2de08fc0 CPU: 1 COMMAND: "docker"
 #0 [2213b9d0] update_blocked_averages at 0811e4a8
 #1 [2213ba60] pick_next_task_fair at 0812a3b4
 #2 [2213baf0] _