Re: [PATCH v1 1/3] mm: pass meminit_context to __free_pages_core()

2024-06-09 Thread Oscar Salvador
On Fri, Jun 07, 2024 at 11:09:36AM +0200, David Hildenbrand wrote:
> In preparation for further changes, let's teach __free_pages_core()
> about the differences of memory hotplug handling.
> 
> Move the memory hotplug specific handling from generic_online_page() to
> __free_pages_core(), use adjust_managed_page_count() on the memory
> hotplug path, and spell out why memory freed via memblock
> cannot currently use adjust_managed_page_count().
> 
> Signed-off-by: David Hildenbrand 

All looks good, but I am puzzled by something.

> +	} else {
> +		/* memblock adjusts totalram_pages() ahead of time. */
> +		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
> +	}

You say that memblock adjusts totalram_pages ahead of time, and I guess
you mean in memblock_free_all():

 pages = free_low_memory_core_early();
 totalram_pages_add(pages);

but that is not ahead of time; it looks like it is updating __after__
sending them to the buddy?
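
To make the ordering question concrete, here is a small userspace model
of the two accounting paths. It is only a sketch: the *_model names and
the plain long counters are stand-ins invented for illustration, not the
kernel's code, and the loop over chunks replaces
free_low_memory_core_early(). It shows that on the memblock path the zone
counter moves while pages are freed, and totalram only catches up once
everything has been freed.

```c
#include <assert.h>

/* Userspace model only: names mirror the kernel, bodies are simplified. */
enum meminit_context { MEMINIT_EARLY, MEMINIT_HOTPLUG };

struct zone_model {
	long managed_pages;	/* stands in for zone->managed_pages */
};

static long totalram_model;	/* stands in for the totalram_pages() counter */

/* Counter handling modelled on the quoted hunk of __free_pages_core(). */
static void free_pages_core_model(struct zone_model *zone, long nr_pages,
				  enum meminit_context context)
{
	if (context == MEMINIT_HOTPLUG) {
		/* adjust_managed_page_count(): both counters move together. */
		zone->managed_pages += nr_pages;
		totalram_model += nr_pages;
	} else {
		/* memblock path: only the zone counter is touched here. */
		zone->managed_pages += nr_pages;
	}
	/* ...the pages would be handed to the buddy at this point... */
}

/*
 * Model of memblock_free_all(): free everything first, then add the
 * total to totalram. Returns totalram as observed right after the pages
 * were freed but before the final add.
 */
static long memblock_free_all_model(struct zone_model *zone,
				    const long *chunks, int nr_chunks)
{
	long pages = 0, seen_before_add;
	int i;

	for (i = 0; i < nr_chunks; i++) {
		free_pages_core_model(zone, chunks[i], MEMINIT_EARLY);
		pages += chunks[i];
	}
	seen_before_add = totalram_model;
	totalram_model += pages;	/* totalram_pages_add(pages) */
	assert(totalram_model == seen_before_add + pages);
	return seen_before_add;
}
```

Calling memblock_free_all_model() on a zeroed zone with chunks {4, 8, 16}
returns the totalram value from before the add, while managed_pages has
already grown by the full amount.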


-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()

2024-06-09 Thread Oscar Salvador
On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
> We currently initialize the memmap such that PG_reserved is set and the
> refcount of the page is 1. In virtio-mem code, we have to manually clear
> that PG_reserved flag to make memory offlining with partially hotplugged
> memory blocks possible: has_unmovable_pages() would otherwise bail out on
> such pages.
> 
> We want to avoid PG_reserved where possible and move to typed pages
> instead. Further, we want to further enlighten memory offlining code about
> PG_offline: offline pages in an online memory section. One example is
> handling managed page count adjustments in a cleaner way during memory
> offlining.
> 
> So let's initialize the pages with PG_offline instead of PG_reserved.
> generic_online_page()->__free_pages_core() will now clear that flag before
> handing that memory to the buddy.
> 
> Note that the page refcount is still 1 and would forbid offlining of such
> memory except when special care is taken during GOING_OFFLINE as
> currently only implemented by virtio-mem.
> 
> With this change, we can now get non-PageReserved() pages in the XEN
> balloon list. From what I can tell, that can already happen via
> decrease_reservation(), so that should be fine.
> 
> HV-balloon should not really observe a change: partial online memory
> blocks still cannot get surprise-offlined, because the refcount of these
> PageOffline() pages is 1.
> 
> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
> hotplugged pages are now PageOffline() instead of PageReserved() before
> they are handed over to the buddy.
> 
> We'll leave the ZONE_DEVICE case alone for now.
> 
> Signed-off-by: David Hildenbrand 

> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 27e3be75edcf7..0254059efcbe1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
>  /*
>   * Associate the pfn range with the given zone, initializing the memmaps
>   * and resizing the pgdat/zone data to span the added pages. After this
> - * call, all affected pages are PG_reserved.
> + * call, all affected pages are PageOffline().
>   *
>   * All aligned pageblocks are initialized to the specified migratetype
>   * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
> @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>  
>  	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>  
> -	for (i = 0; i < nr_pages; i++)
> -		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
> +	for (i = 0; i < nr_pages; i++) {
> +		struct page *page = pfn_to_page(pfn + i);
> +
> +		__ClearPageOffline(page);
> +		SetPageVmemmapSelfHosted(page);

So, refresh my memory here please.
AFAIR, those VmemmapSelfHosted pages were marked Reserved before, but now,
memmap_init_range() will not mark them reserved anymore.
I do not think that is ok? I am worried about walkers getting this wrong.

We usually skip PageReserved pages in walkers because they are pages we
cannot deal with for those purposes, but with this change, we will leak
PageVmemmapSelfHosted pages, and I am not sure whether we are ready for that.

Moreover, boot memmap pages are marked as PageReserved, which would now
be inconsistent with those added during hotplug operations.

All in all, I feel uneasy about this change.
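
To spell out the walker concern, here is a minimal userspace sketch. The
flag bits, struct page_model and both helpers are hypothetical stand-ins
made up for this illustration; real walkers differ in what they check.
The only point is that a walker filtering on the reserved bit alone
starts visiting a self-hosted vmemmap page once that bit is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model, not the kernel's page-flags layout. */
#define MODEL_PG_RESERVED		(1u << 0)
#define MODEL_PG_VMEMMAP_SELF_HOSTED	(1u << 1)

struct page_model {
	unsigned int flags;
};

/* A typical walker policy: leave reserved pages alone. */
static bool walker_skips(const struct page_model *page)
{
	return (page->flags & MODEL_PG_RESERVED) != 0;
}

/* Count the pages such a walker would actually look at. */
static size_t walker_visited(const struct page_model *pages, size_t nr)
{
	size_t i, visited = 0;

	assert(pages != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (!walker_skips(&pages[i]))
			visited++;
	return visited;
}
```

With one self-hosted vmemmap page and one ordinary page, the walker
visits one page while the vmemmap page is reserved and two pages once the
reserved bit is no longer set.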

-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v1 3/3] mm/memory_hotplug: skip adjust_managed_page_count() for PageOffline() pages when offlining

2024-06-09 Thread Oscar Salvador
On Fri, Jun 07, 2024 at 11:09:38AM +0200, David Hildenbrand wrote:
> We currently have a hack for virtio-mem in place to handle memory
> offlining with PageOffline pages for which we already adjusted the
> managed page count.
> 
> Let's enlighten memory offlining code so we can get rid of that hack,
> and document the situation.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v1 1/3] mm: pass meminit_context to __free_pages_core()

2024-06-10 Thread Oscar Salvador
On Mon, Jun 10, 2024 at 10:38:05AM +0200, David Hildenbrand wrote:
> On 10.06.24 06:03, Oscar Salvador wrote:
> > On Fri, Jun 07, 2024 at 11:09:36AM +0200, David Hildenbrand wrote:
> > > In preparation for further changes, let's teach __free_pages_core()
> > > about the differences of memory hotplug handling.
> > > 
> > > Move the memory hotplug specific handling from generic_online_page() to
> > > __free_pages_core(), use adjust_managed_page_count() on the memory
> > > hotplug path, and spell out why memory freed via memblock
> > > cannot currently use adjust_managed_page_count().
> > > 
> > > Signed-off-by: David Hildenbrand 
> > 
> > All looks good but I am puzzled with something.
> > 
> > > +	} else {
> > > +		/* memblock adjusts totalram_pages() ahead of time. */
> > > +		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
> > > +	}
> > 
> > You say that memblock adjusts totalram_pages ahead of time, and I guess
> > you mean in memblock_free_all()
> 
> And memblock_free_late(), which uses atomic_long_inc().

Ah yes.

 
> Right (it's suboptimal, but not really problematic so far. Hopefully Wei can
> clean it up and move it in here as well)

That would be great.

> For the time being
> 
> "/* memblock adjusts totalram_pages() manually. */"

Yes, I think that is better ;-)

Thanks!
 

-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()

2024-06-11 Thread Oscar Salvador
On Mon, Jun 10, 2024 at 10:56:02AM +0200, David Hildenbrand wrote:
> There are fortunately not that many left.
> 
> I'd even say marking them (vmemmap) reserved is more wrong than right: note
> that ordinary vmemmap pages after memory hotplug are not reserved! Only
> bootmem should be reserved.

Ok, that is a very good point that I missed.
I thought that hotplugged vmemmap pages (not self-hosted) were marked as
Reserved, which is why I thought this would be inconsistent.
But if that is not the case, I think we are safe, as the kernel can
already encounter vmemmap pages that are not reserved and it deals with
them somehow.

> Let's take a look at the relevant core-mm ones (arch stuff is mostly just for MMIO
> remapping)
> 
... 
> Any PageReserved user that I am missing, or why we should handle these
> vmemmap pages differently than the ones allocated during ordinary memory
> hotplug?

No, I cannot think of a reason why normal vmemmap pages should behave
differently than self-hosted ones.

I was also confused because I thought that after this change
pfn_to_online_page() would be different for self-hosted vmemmap pages,
because I thought that somehow we relied on PageOffline(), but it is not
the case.

> In the future, we might want to consider using a dedicated page type for
> them, so we can stop using a bit that doesn't allow to reliably identify
> them. (we should mark all vmemmap with that type then)

Yes, an all-vmemmap page type would be a good thing, so we do not have
to special-case.

Just one last thing.
Now self-hosted vmemmap pages will have PageOffline cleared, and it
will stay cleared after the memory block they belong to has gone
offline, which is ok because those vmemmap pages lie around until the
chunk of memory gets removed.
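
As a sanity check of that lifecycle, a tiny userspace model follows. It
is a sketch under assumptions: struct vmemmap_page_model and the three
helpers are invented names, and treating block offlining as a no-op for
these flags reflects my reading of the discussion above, not a statement
about every code path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state; the kernel keeps this in page flags/types. */
struct vmemmap_page_model {
	bool offline;		/* models PageOffline() */
	bool self_hosted;	/* models PageVmemmapSelfHosted() */
};

/* After the patch, memmap init leaves every hotplugged page PageOffline(). */
static void init_memmap_model(struct vmemmap_page_model *p)
{
	p->offline = true;
	p->self_hosted = false;
}

/* Loop body of the quoted mhp_init_memmap_on_memory() hunk. */
static void mark_self_hosted_model(struct vmemmap_page_model *p)
{
	assert(p->offline);	/* expected state right after memmap init */
	p->offline = false;	/* __ClearPageOffline(page) */
	p->self_hosted = true;	/* SetPageVmemmapSelfHosted(page) */
}

/* Assumption: offlining the memory block leaves these flags untouched. */
static void offline_block_model(const struct vmemmap_page_model *p)
{
	(void)p;
}
```

Running init, mark, then offline on one page leaves it with offline
cleared and self_hosted set, which is the state described above until
the memory is removed.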

Ok, just wanted to convince myself that there will be no surprises.

Thanks David for clarifying.
 

-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()

2024-06-11 Thread Oscar Salvador
On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
> We currently initialize the memmap such that PG_reserved is set and the
> refcount of the page is 1. In virtio-mem code, we have to manually clear
> that PG_reserved flag to make memory offlining with partially hotplugged
> memory blocks possible: has_unmovable_pages() would otherwise bail out on
> such pages.
> 
> We want to avoid PG_reserved where possible and move to typed pages
> instead. Further, we want to further enlighten memory offlining code about
> PG_offline: offline pages in an online memory section. One example is
> handling managed page count adjustments in a cleaner way during memory
> offlining.
> 
> So let's initialize the pages with PG_offline instead of PG_reserved.
> generic_online_page()->__free_pages_core() will now clear that flag before
> handing that memory to the buddy.
> 
> Note that the page refcount is still 1 and would forbid offlining of such
> memory except when special care is taken during GOING_OFFLINE as
> currently only implemented by virtio-mem.
> 
> With this change, we can now get non-PageReserved() pages in the XEN
> balloon list. From what I can tell, that can already happen via
> decrease_reservation(), so that should be fine.
> 
> HV-balloon should not really observe a change: partial online memory
> blocks still cannot get surprise-offlined, because the refcount of these
> PageOffline() pages is 1.
> 
> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
> hotplugged pages are now PageOffline() instead of PageReserved() before
> they are handed over to the buddy.
> 
> We'll leave the ZONE_DEVICE case alone for now.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: Oscar Salvador  # for the generic memory-hotplug bits


-- 
Oscar Salvador
SUSE Labs