On 25.03.2014 [13:25:30 -0500], Christoph Lameter wrote:
> On Tue, 25 Mar 2014, Nishanth Aravamudan wrote:
> 
> > On power, very early, we find the 16G pages (gpages in the powerpc arch
> > code) in the device-tree:
> >
> > early_setup ->
> >     early_init_mmu ->
> >             htab_initialize ->
> >                     htab_init_page_sizes ->
> >                             htab_dt_scan_hugepage_blocks ->
> >                                     memblock_reserve
> >                                             which marks the memory
> >                                             as reserved
> >                                     add_gpage
> >                                             which saves the address
> >                                             off so future calls for
> >                                             alloc_bootmem_huge_page()
> >
> > hugetlb_init ->
> >             hugetlb_init_hstates ->
> >                     hugetlb_hstate_alloc_pages ->
> >                             alloc_bootmem_huge_page
> >
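(To make that call chain a bit more concrete: the gpage bookkeeping is
roughly the following -- a simplified sketch from memory of what
arch/powerpc/mm/hugetlbpage.c does, not the literal upstream code:)

/* Simplified sketch -- see arch/powerpc/mm/hugetlbpage.c for the real code. */
#define MAX_NUMBER_GPAGES       1024

static u64 gpage_freearray[MAX_NUMBER_GPAGES];
static unsigned nr_gpages;

/* Device-tree scan path: remember each reserved 16G block for later. */
void add_gpage(u64 addr, u64 page_size, unsigned long number_of_pages)
{
        while (addr && number_of_pages && nr_gpages < MAX_NUMBER_GPAGES) {
                gpage_freearray[nr_gpages++] = addr;
                addr += page_size;
                number_of_pages--;
        }
}

/* hugetlb_hstate_alloc_pages() ends up here: hand back one saved block. */
int alloc_bootmem_huge_page(struct hstate *hstate)
{
        struct huge_bootmem_page *m;

        if (nr_gpages == 0)
                return 0;
        m = phys_to_virt(gpage_freearray[--nr_gpages]);
        gpage_freearray[nr_gpages] = 0;
        list_add(&m->list, &huge_boot_pages);
        m->hstate = hstate;
        return 1;
}
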
> > > Not sure if I understand that correctly.
> >
> > Basically this is present memory that is "reserved" for the 16GB usage
> > per the LPAR configuration. We honor that configuration in Linux based
> > upon the contents of the device-tree. It just so happens in the
> > configuration from my original e-mail that a consequence of this is that
> > a NUMA node has memory (topologically), but none of that memory is free,
> > nor will it ever be free.
> 
> Well dont do that

I appreciate the help you're offering, but that's really not an option.
The customer/user has configured the system this way precisely so they
can leverage the gigantic pages, and *almost* everything seems to work
fine except for the case I mentioned in my original e-mail. I guess we
could allocate fewer 16GB pages when doing so would exhaust a NUMA node,
but ... I think the underlying mapping would still be a 16GB one, so it
would not be accurate from a performance perspective (although it should
perform better).
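
If we did try that, I imagine it would look something like the below in
htab_dt_scan_hugepage_blocks() -- purely hypothetical and untested, and
the node_would_be_exhausted() helper is made up for illustration; at
that point in boot we'd have to work the answer out from the
memblock/device-tree data ourselves:

        /*
         * Hypothetical: refuse to reserve a 16G block when doing so
         * would leave its node with no free memory at all.
         * node_would_be_exhausted() does not exist today.
         */
        if (node_would_be_exhausted(phys_addr, block_size * expected_pages)) {
                pr_warn("Skipping 16G pages at 0x%lx: node would have no "
                        "free memory\n", phys_addr);
                return 0;
        }
        memblock_reserve(phys_addr, block_size * expected_pages);
        add_gpage(phys_addr, block_size, expected_pages);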

> > Perhaps, in this case, we could just remove that node from the N_MEMORY
> > mask? Memory allocations will never succeed from the node, and we can
> > never free these 16GB pages. It is really not any different than a
> > memoryless node *except* when you are using the 16GB pages.
> 
> That looks to be the correct way to handle things. Maybe mark the node as
> offline or somehow not present so that the kernel ignores it.

Ok, I'll consider these options. Thanks!
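
For reference, I think the N_MEMORY option would amount to something
like this, run once we know a node's memory is entirely consumed by
reserved 16G pages (the hook point and the helper name are guesses on
my part, not existing code):

/*
 * Sketch only: stop advertising a node whose memory is fully reserved
 * for 16G gpages as a node with usable memory. The node keeps its
 * CPUs; it simply never offers page allocations.
 */
static void __init hide_gpage_only_node(int nid)
{
        node_clear_state(nid, N_MEMORY);
        /* More drastic alternative, per Christoph: node_set_offline(nid); */
}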

-Nish
