On Fri, Jan 20, 2017 at 02:42:24PM +0800, Hillf Danton wrote:
> > @@ -1603,16 +1603,16 @@ int isolate_lru_page(struct page *page)
> >   * the LRU list will go small and be scanned faster than necessary, 
> > leading to
> >   * unnecessary swapping, thrashing and OOM.
> >   */
> > -static int too_many_isolated(struct pglist_data *pgdat, int file,
> > +static bool safe_to_isolate(struct pglist_data *pgdat, int file,
> >             struct scan_control *sc)
> 
> I prefer the current function name.
> 

The restructure is to work with the workqueue api.

> >  {
> >     unsigned long inactive, isolated;
> > 
> >     if (current_is_kswapd())
> > -           return 0;
> > +           return true;
> > 
> > -   if (!sane_reclaim(sc))
> > -           return 0;
> > +   if (sane_reclaim(sc))
> > +           return true;
> 
> We only need a one-line change.

It's bool so the conversion is made to bool while it's being changed
anyway.

> > 
> >     if (file) {
> >             inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
> > @@ -1630,7 +1630,7 @@ static int too_many_isolated(struct pglist_data 
> > *pgdat, int file,
> >     if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
> >             inactive >>= 3;
> > 
> > -   return isolated > inactive;
> > +   return isolated < inactive;
> >  }
> > 
> >  static noinline_for_stack void
> > @@ -1719,12 +1719,28 @@ shrink_inactive_list(unsigned long nr_to_scan, 
> > struct lruvec *lruvec,
> >     struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >     struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> > 
> > -   while (unlikely(too_many_isolated(pgdat, file, sc))) {
> > -           congestion_wait(BLK_RW_ASYNC, HZ/10);
> > +   while (!safe_to_isolate(pgdat, file, sc)) {
> > +           long ret;
> > +
> > +           ret = wait_event_interruptible_timeout(pgdat->isolated_wait,
> > +                   safe_to_isolate(pgdat, file, sc), HZ/10);
> > 
> >             /* We are about to die and free our memory. Return now. */
> > -           if (fatal_signal_pending(current))
> > -                   return SWAP_CLUSTER_MAX;
> > +           if (fatal_signal_pending(current)) {
> > +                   nr_reclaimed = SWAP_CLUSTER_MAX;
> > +                   goto out;
> > +           }
> > +
> > +           /*
> > +            * If we reached the timeout, this is direct reclaim, and
> > +            * pages cannot be isolated then return. If the situation
> 
> Please add something that we would rather shrink slab than go
> another round of nap.
> 

That's not necessarily true or even a good idea. It could result in
excessive slab shrinking that is no longer in proportion to LRU scanning
and increased contention within shrinkers.

> > +            * persists for a long time then it'll eventually reach
> > +            * the no_progress limit in should_reclaim_retry and consider
> > +            * going OOM. In this case, do not wake the isolated_wait
> > +            * queue as the wakee will still not be able to make progress.
> > +            */
> > +           if (!ret && !current_is_kswapd() && !safe_to_isolate(pgdat, 
> > file, sc))
> > +                   return 0;
> >     }
> > 
> >     lru_add_drain();
> > @@ -1839,6 +1855,10 @@ shrink_inactive_list(unsigned long nr_to_scan, 
> > struct lruvec *lruvec,
> >                     stat.nr_activate, stat.nr_ref_keep,
> >                     stat.nr_unmap_fail,
> >                     sc->priority, file);
> > +
> > +out:
> > +   if (waitqueue_active(&pgdat->isolated_wait))
> > +           wake_up(&pgdat->isolated_wait);
> >     return nr_reclaimed;
> >  }
> > 
> Is it also needed to check isolated_wait active before kswapd 
> takes nap?
> 

No because this is where pages were isolated and there is no putback
event that would justify waking the queue. There is a race between
waitqueue_active() and going to sleep that we rely on the timeout to
recover from.

-- 
Mel Gorman
SUSE Labs

Reply via email to