On Tue, Jul 12, 2016 at 12:18:05PM +0100, Mel Gorman wrote: > On Tue, Jul 12, 2016 at 09:06:04PM +1000, Balbir Singh wrote: > > > diff --git a/Documentation/cgroup-v1/memory.txt > > > b/Documentation/cgroup-v1/memory.txt > > > index b14abf217239..946e69103cdd 100644 > > > --- a/Documentation/cgroup-v1/memory.txt > > > +++ b/Documentation/cgroup-v1/memory.txt > > > @@ -267,11 +267,11 @@ When oom event notifier is registered, event will > > > be delivered. > > > Other lock order is following: > > > PG_locked. > > > mm->page_table_lock > > > - zone->lru_lock > > > + zone_lru_lock > > > > zone_lru_lock is a little confusing, can't we just call it > > node_lru_lock? > > > > It's a matter of perspective. People familiar with the VM already expect > a zone lock so will be looking for it. I can do a rename if you insist > but it may not actually help.
I don't want to insist, but zone_ in the name can be confusing, as to leading us to think that the lru_lock is still in the zone If the rest of the reviewers are fine with, we don't need to rename > > > > @@ -496,7 +496,6 @@ struct zone { > > > /* Write-intensive fields used by page reclaim */ > > > > > > /* Fields commonly accessed by the page reclaim scanner */ > > > - spinlock_t lru_lock; > > > struct lruvec lruvec; > > > > > > /* > > > @@ -690,6 +689,9 @@ typedef struct pglist_data { > > > /* Number of pages migrated during the rate limiting time interval */ > > > unsigned long numabalancing_migrate_nr_pages; > > > #endif > > > + /* Write-intensive fields used by page reclaim */ > > > + ZONE_PADDING(_pad1_)a > > > > I thought this was to have zone->lock and zone->lru_lock in different > > cachelines, do we still need the padding here? > > > > The zone padding current keeps the page lock wait tables, page allocator > lists, compaction and vmstats on separate cache lines. They're still > fine. > > The node padding may not be necessary. It currently ensures that zonelists > and numa balancing are separate from the LRU lock but there is no guarantee > the current arrangement is optimal. It would depend on both the kernel > config and the workload but it may be necessary in the future to split > node into read-mostly sections and then different write-intensive sections > similar to what has happened to struct zone in the past. > Fair enough Balbir Singh.