On Wed, 14 Jun 2017 22:04:55 -0400
Jerome Glisse <jgli...@redhat.com> wrote:

> On Thu, Jun 15, 2017 at 11:41:59AM +1000, Balbir Singh wrote:
> > On Wed, 14 Jun 2017 16:11:43 -0400
> > Jérôme Glisse <jgli...@redhat.com> wrote:
> >   
> > > HMM pages (private or public device pages) are ZONE_DEVICE pages and
> > > thus need special handling when it comes to lru or refcount. This
> > > patch makes sure that memcontrol properly handles them when it faces
> > > them. Those pages are used like regular pages in a process address
> > > space, either as anonymous pages or as file backed pages. So from the
> > > memcg point of view we want to handle them like regular pages, for
> > > now at least.
> > > 
> > > Signed-off-by: Jérôme Glisse <jgli...@redhat.com>
> > > Cc: Johannes Weiner <han...@cmpxchg.org>
> > > Cc: Michal Hocko <mho...@kernel.org>
> > > Cc: Vladimir Davydov <vdavydov....@gmail.com>
> > > Cc: cgro...@vger.kernel.org
> > > ---
> > >  kernel/memremap.c |  2 ++
> > >  mm/memcontrol.c   | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  2 files changed, 55 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/memremap.c b/kernel/memremap.c
> > > index da74775..584984c 100644
> > > --- a/kernel/memremap.c
> > > +++ b/kernel/memremap.c
> > > @@ -479,6 +479,8 @@ void put_zone_device_private_or_public_page(struct page *page)
> > >           __ClearPageActive(page);
> > >           __ClearPageWaiters(page);
> > >  
> > > +         mem_cgroup_uncharge(page);
> > > +  
> > 
> > A zone device page could have a mem_cgroup charge if
> > 
> > 1. The old page was charged to a cgroup and the new ZONE_DEVICE page then
> > inherits the charge that we need to drop here.
> > 
> > And it should not be charged if
> > 
> > 2. The driver allowed mmap based allocation (these pages are not on the LRU).
> > 
> > Since put_zone_device_private_or_public_page() is called from
> > release_pages(), I think the assumption is that 2 is not a problem? I've
> > not tested the mmap bits yet.
> 
> Well that is one of the big questions. Do we care about memory cgroups despite
> the page not being on the LRU and thus not being reclaimable through the usual
> path?
> 
> I believe we do want to keep charging ZONE_DEVICE pages against the memory
> cgroup so that userspace limits are enforced. This is important especially for
> device private memory when migrating back to system memory due to a CPU page
> fault. We do not want the migration back to fail because of the memory cgroup
> limit.
> 
> Hence why I do want to charge ZONE_DEVICE pages just like regular pages. If we
> have people that run into OOM because of this then we can start thinking about
> how to account those pages slightly differently inside the memory cgroup.
> 
> For now I believe we do want this patch.
> 

Yes, we do need the patch. I was trying to check whether we'll end up trying
to uncharge a page that is not charged, just double checking.

> 
> [...]
> 
> > > @@ -4610,6 +4637,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
> > >            */
> > >           if (page->mem_cgroup == mc.from) {
> > >                   ret = MC_TARGET_PAGE;
> > > +                 if (is_device_private_page(page) ||
> > > +                     is_device_public_page(page))
> > > +                         ret = MC_TARGET_DEVICE;
> > >                   if (target)
> > >                           target->page = page;
> > >           }
> > > @@ -4669,6 +4699,11 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> > >  
> > >   ptl = pmd_trans_huge_lock(pmd, vma);
> > >   if (ptl) {
> > > +         /*
> > > +          * Note their can not be MC_TARGET_DEVICE for now as we do not  
> >                         there  
> > > +          * support transparent huge page with MEMORY_DEVICE_PUBLIC or
> > > +          * MEMORY_DEVICE_PRIVATE but this might change.  
> > 
> > I am trying to remind myself why THP and MEMORY_DEVICE_* pages don't work
> > well together today; the driver could allocate a THP-sized set of pages and
> > migrate it. There are patches to do THP migration, not upstream yet. Could
> > you remind me of any other limitations?
> 
> No, there is nothing that would be problematic AFAICT. Persistent memory
> already uses huge pages so we should be in the clear. But I would rather
> enable that as a separate patchset altogether and have proper testing
> specifically for such a scenario.

Agreed.
Balbir Singh.
