Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-24 Thread Tetsuo Handa
Hugh Dickins wrote: > On Thu, 20 Jul 2017, Tetsuo Handa wrote: > > Hugh Dickins wrote: > > > You probably won't welcome getting into alternatives at this late stage; > > > but after hacking around it one way or another because of its pointless > > > lockups, I lost patience with that too_many_isola

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-24 Thread Hugh Dickins
On Thu, 20 Jul 2017, Michal Hocko wrote: > On Wed 19-07-17 18:54:40, Hugh Dickins wrote: > [...] > > You probably won't welcome getting into alternatives at this late stage; > > but after hacking around it one way or another because of its pointless > > lockups, I lost patience with that too_many_i

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-24 Thread Hugh Dickins
On Thu, 20 Jul 2017, Tetsuo Handa wrote: > Hugh Dickins wrote: > > You probably won't welcome getting into alternatives at this late stage; > > but after hacking around it one way or another because of its pointless > > lockups, I lost patience with that too_many_isolated() loop a few months > > ba

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-23 Thread Michal Hocko
On Fri 21-07-17 16:01:04, Andrew Morton wrote: > On Thu, 20 Jul 2017 08:56:26 +0200 Michal Hocko wrote: > > > > > > --- a/mm/vmscan.c > > > > +++ b/mm/vmscan.c > > > > @@ -1713,9 +1713,15 @@ shrink_inactive_list(unsigned long nr_to_scan, > > > > struct lruvec *lruvec, > > > > int file =

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-21 Thread Andrew Morton
On Thu, 20 Jul 2017 08:56:26 +0200 Michal Hocko wrote: > > > > --- a/mm/vmscan.c > > > +++ b/mm/vmscan.c > > > @@ -1713,9 +1713,15 @@ shrink_inactive_list(unsigned long nr_to_scan, > > > struct lruvec *lruvec, > > > int file = is_file_lru(lru); > > > struct pglist_data *pgdat = lruvec_pgdat(

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-20 Thread Michal Hocko
On Wed 19-07-17 18:54:40, Hugh Dickins wrote: [...] > You probably won't welcome getting into alternatives at this late stage; > but after hacking around it one way or another because of its pointless > lockups, I lost patience with that too_many_isolated() loop a few months > back (on realizing th

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-20 Thread Tetsuo Handa
Hugh Dickins wrote: > You probably won't welcome getting into alternatives at this late stage; > but after hacking around it one way or another because of its pointless > lockups, I lost patience with that too_many_isolated() loop a few months > back (on realizing the enormous number of pages that

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-19 Thread Michal Hocko
On Wed 19-07-17 15:20:14, Andrew Morton wrote: > On Mon, 10 Jul 2017 09:48:42 +0200 Michal Hocko wrote: > > > From: Michal Hocko > > > > Tetsuo Handa has reported [1][2][3] that direct reclaimers might get stuck > > in too_many_isolated loop basically for ever because the last few pages > > on t

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-19 Thread Hugh Dickins
On Mon, 10 Jul 2017, Michal Hocko wrote: > From: Michal Hocko > > Tetsuo Handa has reported [1][2][3] that direct reclaimers might get stuck > in too_many_isolated loop basically for ever because the last few pages > on the LRU lists are isolated by the kswapd which is stuck on fs locks > when do

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-19 Thread Andrew Morton
On Mon, 10 Jul 2017 09:48:42 +0200 Michal Hocko wrote: > From: Michal Hocko > > Tetsuo Handa has reported [1][2][3] that direct reclaimers might get stuck > in too_many_isolated loop basically for ever because the last few pages > on the LRU lists are isolated by the kswapd which is stuck on fs

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-10 Thread Michal Hocko
On Mon 10-07-17 12:58:59, Johannes Weiner wrote: > On Mon, Jul 10, 2017 at 09:58:03AM -0400, Rik van Riel wrote: > > On Mon, 2017-07-10 at 09:48 +0200, Michal Hocko wrote: > > > > > Johannes and Rik had some concerns that this could lead to premature > > > OOM kills. I agree with them that we need

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-10 Thread Johannes Weiner
On Mon, Jul 10, 2017 at 09:58:03AM -0400, Rik van Riel wrote: > On Mon, 2017-07-10 at 09:48 +0200, Michal Hocko wrote: > > > Johannes and Rik had some concerns that this could lead to premature > > OOM kills. I agree with them that we need a better throttling > > mechanism. Until now we didn't giv

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-10 Thread Rik van Riel
On Mon, 2017-07-10 at 09:48 +0200, Michal Hocko wrote: > Johannes and Rik had some concerns that this could lead to premature > OOM kills. I agree with them that we need a better throttling > mechanism. Until now we didn't give the issue described above a high > priority because it usually require

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-10 Thread Vlastimil Babka
On 07/10/2017 09:48 AM, Michal Hocko wrote: > From: Michal Hocko > > Tetsuo Handa has reported [1][2][3] that direct reclaimers might get stuck > in too_many_isolated loop basically for ever because the last few pages > on the LRU lists are isolated by the kswapd which is stuck on fs locks > when

[PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-10 Thread Michal Hocko
From: Michal Hocko Tetsuo Handa has reported [1][2][3] that direct reclaimers might get stuck in too_many_isolated loop basically for ever because the last few pages on the LRU lists are isolated by the kswapd which is stuck on fs locks when doing the pageout or slab reclaim. This in turn means th

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-06 Thread Tetsuo Handa
Michal Hocko wrote: > On Sat 01-07-17 20:43:56, Tetsuo Handa wrote: > > Michal Hocko wrote: > [...] > > > It is really hard to pursue this half solution when there is no clear > > > indication it helps in your testing. So could you try to test with only > > > this patch on top of the current linux-

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-05 Thread Michal Hocko
On Sat 01-07-17 20:43:56, Tetsuo Handa wrote: > Michal Hocko wrote: [...] > > It is really hard to pursue this half solution when there is no clear > > indication it helps in your testing. So could you try to test with only > > this patch on top of the current linux-next tree (or Linus tree) and se

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-05 Thread Michal Hocko
[this is getting tangent again and I will not respond any further if this turns into yet another flame] On Sat 01-07-17 20:43:56, Tetsuo Handa wrote: > Michal Hocko wrote: > > I really do appreciate your testing because it uncovers corner cases > > most people do not test for and we can actually ma

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-07-01 Thread Tetsuo Handa
Michal Hocko wrote: > I really do appreciate your testing because it uncovers corner cases > most people do not test for and we can actually make the code better in > the end. That statement does not get to my heart at all. Collision between your approach and my approach is wasting both your time

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-06-30 Thread Michal Hocko
On Sat 01-07-17 00:59:56, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Fri 30-06-17 09:14:22, Tetsuo Handa wrote: > > [...] > > > Ping? Ping? When are we going to apply this patch or watchdog patch? > > > This problem occurs with not so insane stress like shown below. > > > I can't test almost

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-06-30 Thread Tetsuo Handa
Michal Hocko wrote: > On Fri 30-06-17 09:14:22, Tetsuo Handa wrote: > [...] > > Ping? Ping? When are we going to apply this patch or watchdog patch? > > This problem occurs with not so insane stress like shown below. > > I can't test almost OOM situation because test likely falls into either > > pr

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-06-30 Thread Michal Hocko
On Fri 30-06-17 09:14:22, Tetsuo Handa wrote: [...] > Ping? Ping? When are we going to apply this patch or watchdog patch? > This problem occurs with not so insane stress like shown below. > I can't test almost OOM situation because test likely falls into either > printk() v.s. oom_lock lockup prob

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-06-29 Thread Tetsuo Handa
Tetsuo Handa wrote: > Michal Hocko wrote: > > On Thu 09-03-17 13:05:40, Johannes Weiner wrote: > > > On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > > > > It only does this to some extent. If reclaim made > > > > no progress, for example due to immediately bailing > > > > out becau

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-04-24 Thread Stanislaw Gruszka
On Mon, Apr 24, 2017 at 10:06:32PM +0900, Tetsuo Handa wrote: > Stanislaw Gruszka wrote: > > On Sun, Apr 23, 2017 at 07:24:21PM +0900, Tetsuo Handa wrote: > > > On 2017/03/10 20:44, Tetsuo Handa wrote: > > > > Michal Hocko wrote: > > > >> I am definitely not against. There is no reason to rush the

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-04-24 Thread Tetsuo Handa
Stanislaw Gruszka wrote: > On Sun, Apr 23, 2017 at 07:24:21PM +0900, Tetsuo Handa wrote: > > On 2017/03/10 20:44, Tetsuo Handa wrote: > > > Michal Hocko wrote: > > >> I am definitely not against. There is no reason to rush the patch in. > > > > > > I don't hurry if we can check using watchdog whet

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-04-24 Thread Stanislaw Gruszka
On Sun, Apr 23, 2017 at 07:24:21PM +0900, Tetsuo Handa wrote: > On 2017/03/10 20:44, Tetsuo Handa wrote: > > Michal Hocko wrote: > >> On Thu 09-03-17 13:05:40, Johannes Weiner wrote: > >>> On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > It only does this to some extent. If rec

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-04-23 Thread Tetsuo Handa
On 2017/03/10 20:44, Tetsuo Handa wrote: > Michal Hocko wrote: >> On Thu 09-03-17 13:05:40, Johannes Weiner wrote: >>> On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: It only does this to some extent. If reclaim made no progress, for example due to immediately bailing

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-21 Thread Tetsuo Handa
On 2017/03/10 20:44, Tetsuo Handa wrote: > Michal Hocko wrote: >> On Thu 09-03-17 13:05:40, Johannes Weiner wrote: It may be OK, I just do not understand all the implications. I like the general direction your patch takes the code in, but I would like to understand it better...

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-10 Thread Tetsuo Handa
Michal Hocko wrote: > On Thu 09-03-17 13:05:40, Johannes Weiner wrote: > > On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > > > It only does this to some extent. If reclaim made > > > no progress, for example due to immediately bailing > > > out because the number of already isolate

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-10 Thread Michal Hocko
On Thu 09-03-17 17:18:00, Rik van Riel wrote: > On Thu, 2017-03-09 at 13:05 -0500, Johannes Weiner wrote: > > On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > > > > > > It only does this to some extent.  If reclaim made > > > no progress, for example due to immediately bailing > > >

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-10 Thread Michal Hocko
On Thu 09-03-17 13:05:40, Johannes Weiner wrote: > On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > > It only does this to some extent.  If reclaim made > > no progress, for example due to immediately bailing > > out because the number of already isolated pages is > > too high (due t

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-09 Thread Rik van Riel
On Thu, 2017-03-09 at 13:05 -0500, Johannes Weiner wrote: > On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > > > > It only does this to some extent.  If reclaim made > > no progress, for example due to immediately bailing > > out because the number of already isolated pages is > > t

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-09 Thread Johannes Weiner
On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote: > It only does this to some extent.  If reclaim made > no progress, for example due to immediately bailing > out because the number of already isolated pages is > too high (due to many parallel reclaimers), the code > could hit the "no_p

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-09 Thread Michal Hocko
On Thu 09-03-17 09:16:25, Rik van Riel wrote: > On Thu, 2017-03-09 at 10:12 +0100, Michal Hocko wrote: > > On Wed 08-03-17 10:54:57, Rik van Riel wrote: > > > > In fact, false OOM kills with that kind of workload is > > > how we ended up getting the "too many isolated" logic > > > in the first pla

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-09 Thread Mel Gorman
On Tue, Mar 07, 2017 at 02:30:57PM +0100, Michal Hocko wrote: > From: Michal Hocko > > Tetsuo Handa has reported [1][2] that direct reclaimers might get stuck > in too_many_isolated loop basically for ever because the last few pages > on the LRU lists are isolated by the kswapd which is stuck on

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-09 Thread Rik van Riel
On Thu, 2017-03-09 at 10:12 +0100, Michal Hocko wrote: > On Wed 08-03-17 10:54:57, Rik van Riel wrote: > > In fact, false OOM kills with that kind of workload is > > how we ended up getting the "too many isolated" logic > > in the first place. > Right, but the retry logic was considerably differen

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-09 Thread Michal Hocko
On Wed 08-03-17 10:54:57, Rik van Riel wrote: > On Wed, 2017-03-08 at 10:21 +0100, Michal Hocko wrote: > > > > Could that create problems if we have many concurrent > > > reclaimers? > > > > As the changelog mentions it might cause a premature oom killer > > invocation theoretically. We could eas

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-08 Thread Rik van Riel
On Wed, 2017-03-08 at 10:21 +0100, Michal Hocko wrote: > > Could that create problems if we have many concurrent > > reclaimers? > > As the changelog mentions it might cause a premature oom killer > invocation theoretically. We could easily see that from the oom > report > by checking isolated co

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-08 Thread Michal Hocko
On Tue 07-03-17 14:52:36, Rik van Riel wrote: > On Tue, 2017-03-07 at 14:30 +0100, Michal Hocko wrote: > > From: Michal Hocko > > > > Tetsuo Handa has reported [1][2] that direct reclaimers might get > > stuck > > in too_many_isolated loop basically for ever because the last few > > pages > > on

Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-07 Thread Rik van Riel
On Tue, 2017-03-07 at 14:30 +0100, Michal Hocko wrote: > From: Michal Hocko > > Tetsuo Handa has reported [1][2] that direct reclaimers might get > stuck > in too_many_isolated loop basically for ever because the last few > pages > on the LRU lists are isolated by the kswapd which is stuck on fs

[PATCH] mm, vmscan: do not loop on too_many_isolated for ever

2017-03-07 Thread Michal Hocko
From: Michal Hocko Tetsuo Handa has reported [1][2] that direct reclaimers might get stuck in too_many_isolated loop basically for ever because the last few pages on the LRU lists are isolated by the kswapd which is stuck on fs locks when doing the pageout or slab reclaim. This in turn means that