Re: [Xen-devel] [PATCH v1] xen/balloon: Fix mapping PG_offline pages to user space
On Thu, Mar 14, 2019 at 04:40:25PM +0100, David Hildenbrand wrote:
> @@ -646,6 +647,7 @@ void free_xenballooned_pages(int nr_pages, struct page **pages)
>
>  	for (i = 0; i < nr_pages; i++) {
>  		if (pages[i])
> +			__SetPageOffline(pages[i]);
>  			balloon_append(pages[i]);

didn't you forget {} there? ;-)

>  	}
>
> --
> 2.17.2

--
Oscar Salvador
SUSE L3

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
On Tue, Aug 21, 2018 at 12:44:12PM +0200, David Hildenbrand wrote:
> This is the same approach as in the first RFC, but this time without
> exporting device_hotplug_lock (requested by Greg) and with some more
> details and documentation regarding locking. Tested only on x86 so far.

Hi David,

I would like to review this but I am on vacation, so I will not be able
to get to it soon. I plan to do it once I am back.

Thanks

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v3] memory_hotplug: Free pages as higher order
On Thu, Sep 27, 2018 at 12:28:50PM +0530, Arun KS wrote:
> +		__free_pages_boot_core(page, order);

I am not sure, but if we are going to use that function from the
memory-hotplug code, we might want to rename that function to something
more generic? The word "boot" suggests that this is only called from
the boot stage.

And what about the prefetch operations? I saw that you removed them in
your previous patch and that had some benefits [1]. Should we remove
them here as well?

[1] https://patchwork.kernel.org/patch/10613359/

Thanks

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v3 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
On Thu, Sep 27, 2018 at 11:25:51AM +0200, David Hildenbrand wrote:
> Reviewed-by: Pavel Tatashin
> Reviewed-by: Rashmica Gupta
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v3 6/6] memory-hotplug.txt: Add some details about locking internals
On Thu, Sep 27, 2018 at 11:25:54AM +0200, David Hildenbrand wrote:
> Cc: Jonathan Corbet
> Cc: Michal Hocko
> Cc: Andrew Morton
> Reviewed-by: Pavel Tatashin
> Reviewed-by: Rashmica Gupta
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v3 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
On Thu, Sep 27, 2018 at 11:25:49AM +0200, David Hildenbrand wrote:
> Reviewed-by: Pavel Tatashin
> Reviewed-by: Rafael J. Wysocki
> Reviewed-by: Rashmica Gupta
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v3 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
On Thu, Sep 27, 2018 at 11:25:50AM +0200, David Hildenbrand wrote:
> Reviewed-by: Pavel Tatashin
> Reviewed-by: Rafael J. Wysocki
> Reviewed-by: Rashmica Gupta
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v5 1/2] memory_hotplug: Free pages as higher order
On Fri, Oct 05, 2018 at 01:40:05PM +0530, Arun KS wrote:
> When free pages are done with higher order, time spend on
> coalescing pages by buddy allocator can be reduced. With
> section size of 256MB, hot add latency of a single section
> shows improvement from 50-60 ms to less than 1 ms, hence
> improving the hot add latency by 60%. Modify external
> providers of online callback to align with the change.
>
> Signed-off-by: Arun KS

Looks good to me.

Reviewed-by: Oscar Salvador

Just one thing below:

> @@ -1331,7 +1331,7 @@ void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
>  {
>  	if (early_page_uninitialised(pfn))
>  		return;
> -	return __free_pages_boot_core(page, order);
> +	return __free_pages_core(page, order);

__free_pages_core is void, so I guess we do not need that return there.
Probably the generated code is the same, though.

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v5 1/2] memory_hotplug: Free pages as higher order
On Fri, Oct 05, 2018 at 01:40:05PM +0530, Arun KS wrote:
> When free pages are done with higher order, time spend on
> coalescing pages by buddy allocator can be reduced. With
> section size of 256MB, hot add latency of a single section
> shows improvement from 50-60 ms to less than 1 ms, hence
> improving the hot add latency by 60%. Modify external
> providers of online callback to align with the change.

Hi Arun,

out of curiosity: could you please explain how exactly you measured the
speed improvement?

Thanks

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v5 1/2] memory_hotplug: Free pages as higher order
On Wed, Oct 10, 2018 at 04:21:16PM +0530, Arun KS wrote:
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e379e85..2416136 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -690,9 +690,13 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  	void *arg)
>  {
>  	unsigned long onlined_pages = *(unsigned long *)arg;
> +	u64 t1, t2;
>
> +	t1 = local_clock();
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		onlined_pages = online_pages_blocks(start_pfn, nr_pages);
> +	t2 = local_clock();
> +	trace_printk("time spend = %llu us\n", (t2-t1)/(1000));
>
>  	online_mem_sections(start_pfn, start_pfn + nr_pages);

Thanks ;-)

--
Oscar Salvador
SUSE L3
Re: [Xen-devel] [PATCH v1] mm/memory_hotplug: drop "online" parameter from add_memory_resource()
On Fri, 2018-11-23 at 13:37 +0100, David Hildenbrand wrote:
> Signed-off-by: David Hildenbrand

Thanks ;-)

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [PATCH RFC 1/4] mm/page_alloc: convert "report" flag of __free_one_page() to a proper flag
On Wed, Sep 16, 2020 at 08:34:08PM +0200, David Hildenbrand wrote:
> Let's prepare for additional flags and avoid long parameter lists of bools.
> Follow-up patches will also make use of the flags in __free_pages_ok(),
> however, I wasn't able to come up with a better name for the type - should
> be good enough for internal purposes.
>
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Babka
> Cc: Wei Yang
> Cc: Oscar Salvador
> Cc: Mike Rapoport
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [PATCH RFC 2/4] mm/page_alloc: place pages to tail in __putback_isolated_page()
On Wed, Sep 16, 2020 at 08:34:09PM +0200, David Hildenbrand wrote:
> __putback_isolated_page() already documents that pages will be placed to
> the tail of the freelist - this is, however, not the case for
> "order >= MAX_ORDER - 2" (see buddy_merge_likely()) - which should be
> the case for all existing users.
>
> This change affects two users:
> - free page reporting
> - page isolation, when undoing the isolation.
>
> This behavior is desireable for pages that haven't really been touched
> lately, so exactly the two users that don't actually read/write page
> content, but rather move untouched pages.
>
> The new behavior is especially desirable for memory onlining, where we
> allow allocation of newly onlined pages via undo_isolate_page_range()
> in online_pages(). Right now, we always place them to the head of the
> free list, resulting in undesireable behavior: Assume we add
> individual memory chunks via add_memory() and online them right away to
> the NORMAL zone. We create a dependency chain of unmovable allocations
> e.g., via the memmap. The memmap of the next chunk will be placed onto
> previous chunks - if the last block cannot get offlined+removed, all
> dependent ones cannot get offlined+removed. While this can already be
> observed with individual DIMMs, it's more of an issue for virtio-mem
> (and I suspect also ppc DLPAR).
>
> Note: If we observe a degradation due to the changed page isolation
> behavior (which I doubt), we can always make this configurable by the
> instance triggering undo of isolation (e.g., alloc_contig_range(),
> memory onlining, memory offlining).
>
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Babka
> Cc: Wei Yang
> Cc: Oscar Salvador
> Cc: Mike Rapoport
> Cc: Scott Cheloha
> Cc: Michael Ellerman
> Signed-off-by: David Hildenbrand

LGTM, the only thing is the shuffle_zone topic that Wei and Vlastimil
raised. It feels a bit odd that it takes precedence over something we
explicitly demanded.

With the comment Vlastimil suggested:

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [PATCH RFC 3/4] mm/page_alloc: always move pages to the tail of the freelist in unset_migratetype_isolate()
On Wed, Sep 16, 2020 at 08:34:10PM +0200, David Hildenbrand wrote:
> Page isolation doesn't actually touch the pages, it simply isolates
> pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
>
> We already place pages to the tail of the freelists when undoing
> isolation via __putback_isolated_page(), let's do it in any case
> (e.g., if order == pageblock_order) and document the behavior.
>
> This change results in all pages getting onlined via online_pages() to
> be placed to the tail of the freelist.
>
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Babka
> Cc: Wei Yang
> Cc: Oscar Salvador
> Cc: Mike Rapoport
> Cc: Scott Cheloha
> Cc: Michael Ellerman
> Signed-off-by: David Hildenbrand

LGTM. I feel the same way about the move_freepages_block_tail()
wrappers; I think we are better off without them.

Reviewed-by: Oscar Salvador

Thanks

--
Oscar Salvador
SUSE L3
Re: [PATCH RFC 4/4] mm/page_alloc: place pages to tail in __free_pages_core()
On Wed, Sep 16, 2020 at 08:34:11PM +0200, David Hildenbrand wrote:
> @@ -1523,7 +1524,13 @@ void __free_pages_core(struct page *page, unsigned int order)
>
>  	atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
>  	set_page_refcounted(page);
> -	__free_pages(page, order);
> +
> +	/*
> +	 * Bypass PCP and place fresh pages right to the tail, primarily
> +	 * relevant for memory onlining.
> +	 */
> +	page_ref_dec(page);
> +	__free_pages_ok(page, order, FOP_TO_TAIL);

Sorry, I must be missing something obvious, but I am a bit confused
here. I get the part about placing them at the tail so rmqueue_bulk()
won't find them, but I do not get why we decrement the page's refcount.
IIUC, its refcount will be 0, but why do we want that?

Another thing, a bit unrelated... we mess with the page's refcount
three times (twice before this patch). Why do we have this dance in
place?

Thanks

--
Oscar Salvador
SUSE L3
Re: [PATCH RFC 4/4] mm/page_alloc: place pages to tail in __free_pages_core()
On Mon, Sep 28, 2020 at 10:36:00AM +0200, David Hildenbrand wrote:
> Hi Oscar!

Hi David :-)

> Old code:
>
> set_page_refcounted(): sets the refcount to 1.
> __free_pages()
> -> put_page_testzero(): sets it to 0
> -> free_the_page()->__free_pages_ok()
>
> New code:
>
> set_page_refcounted(): sets the refcount to 1.
> page_ref_dec(page): sets it to 0
> __free_pages_ok():

Bleh, I misread the patch; somehow I managed not to see that you
replaced __free_pages with __free_pages_ok.

To be honest, now that we do not need the page's refcount to be 1 for
put_page_testzero() to trigger (and since you are decrementing it
anyway), I think it would be much clearer for those two to be gone.

But no strong feelings, so:

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [PATCH v1] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE
On Tue, Jan 26, 2021 at 12:58:29PM +0100, David Hildenbrand wrote:
> Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and
> "mhp_flags". As discussed recently [1], "mhp" is our internal
> acronym for memory hotplug now.
>
> [1] https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c3...@redhat.com/
>
> Cc: Andrew Morton
> Cc: "K. Y. Srinivasan"
> Cc: Haiyang Zhang
> Cc: Stephen Hemminger
> Cc: Wei Liu
> Cc: "Michael S. Tsirkin"
> Cc: Jason Wang
> Cc: Boris Ostrovsky
> Cc: Juergen Gross
> Cc: Stefano Stabellini
> Cc: Pankaj Gupta
> Cc: Michal Hocko
> Cc: Oscar Salvador
> Cc: Anshuman Khandual
> Cc: Wei Yang
> Cc: linux-hyp...@vger.kernel.org
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE L3
Re: [RFC PATCH 00/30] Code tagging framework and applications
On Tue, Aug 30, 2022 at 02:48:49PM -0700, Suren Baghdasaryan wrote:
> ===
> Code tagging framework
> ===
> Code tag is a structure identifying a specific location in the source code
> which is generated at compile time and can be embedded in an application-
> specific structure. Several applications of code tagging are included in
> this RFC, such as memory allocation tracking, dynamic fault injection,
> latency tracking and improved error code reporting.
> Basically, it takes the old trick of "define a special elf section for
> objects of a given type so that we can iterate over them at runtime" and
> creates a proper library for it.
>
> ===
> Memory allocation tracking
> ===
> The goal for using codetags for memory allocation tracking is to minimize
> performance and memory overhead. By recording only the call count and
> allocation size, the required operations are kept at the minimum while
> collecting statistics for every allocation in the codebase. With that
> information, if users are interested in mode detailed context for a
> specific allocation, they can enable more in-depth context tracking,
> which includes capturing the pid, tgid, task name, allocation size,
> timestamp and call stack for every allocation at the specified code
> location.
> Memory allocation tracking is implemented in two parts:
>
> part1: instruments page and slab allocators to record call count and total
> memory allocated at every allocation in the source code. Every time an
> allocation is performed by an instrumented allocator, the codetag at that
> location increments its call and size counters. Every time the memory is
> freed these counters are decremented. To decrement the counters upon free,
> allocated object needs a reference to its codetag. Page allocators use
> page_ext to record this reference while slab allocators use memcg_data of
> the slab page.
> The data is exposed to the user space via a read-only debugfs file called
> alloc_tags.

Hi Suren,

I just posted a patch [1], and reading through your changelog and
seeing your PoC, I think we have some kind of overlap. My patchset aims
to give you the stacktrace <-> relationship information, and it is
achieved by a small amount of extra code, mostly in page_owner.c and
lib/stackdepot.

Of course, your work seems to be more complete wrt. the information you
get. I CCed you in case you want to have a look.

[1] https://lkml.org/lkml/2022/9/1/36

Thanks

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 1/3] mm: pass meminit_context to __free_pages_core()
On Fri, Jun 07, 2024 at 11:09:36AM +0200, David Hildenbrand wrote:
> In preparation for further changes, let's teach __free_pages_core()
> about the differences of memory hotplug handling.
>
> Move the memory hotplug specific handling from generic_online_page() to
> __free_pages_core(), use adjust_managed_page_count() on the memory
> hotplug path, and spell out why memory freed via memblock
> cannot currently use adjust_managed_page_count().
>
> Signed-off-by: David Hildenbrand

All looks good, but I am puzzled by something.

> +	} else {
> +		/* memblock adjusts totalram_pages() ahead of time. */
> +		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
> +	}

You say that memblock adjusts totalram_pages ahead of time, and I guess
you mean in memblock_free_all():

 pages = free_low_memory_core_early()
 totalram_pages_add(pages);

but that is not ahead of time; it looks like it is updating it
__after__ sending the pages to the buddy?

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
> We currently initialize the memmap such that PG_reserved is set and the
> refcount of the page is 1. In virtio-mem code, we have to manually clear
> that PG_reserved flag to make memory offlining with partially hotplugged
> memory blocks possible: has_unmovable_pages() would otherwise bail out on
> such pages.
>
> We want to avoid PG_reserved where possible and move to typed pages
> instead. Further, we want to further enlighten memory offlining code about
> PG_offline: offline pages in an online memory section. One example is
> handling managed page count adjustments in a cleaner way during memory
> offlining.
>
> So let's initialize the pages with PG_offline instead of PG_reserved.
> generic_online_page()->__free_pages_core() will now clear that flag before
> handing that memory to the buddy.
>
> Note that the page refcount is still 1 and would forbid offlining of such
> memory except when special care is take during GOING_OFFLINE as
> currently only implemented by virtio-mem.
>
> With this change, we can now get non-PageReserved() pages in the XEN
> balloon list. From what I can tell, that can already happen via
> decrease_reservation(), so that should be fine.
>
> HV-balloon should not really observe a change: partial online memory
> blocks still cannot get surprise-offlined, because the refcount of these
> PageOffline() pages is 1.
>
> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
> hotplugged pages are now PageOffline() instead of PageReserved() before
> they are handed over to the buddy.
>
> We'll leave the ZONE_DEVICE case alone for now.
>
> Signed-off-by: David Hildenbrand

> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 27e3be75edcf7..0254059efcbe1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
>  /*
>   * Associate the pfn range with the given zone, initializing the memmaps
>   * and resizing the pgdat/zone data to span the added pages. After this
> - * call, all affected pages are PG_reserved.
> + * call, all affected pages are PageOffline().
>   *
>   * All aligned pageblocks are initialized to the specified migratetype
>   * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
> @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>
>  	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>
> -	for (i = 0; i < nr_pages; i++)
> -		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
> +	for (i = 0; i < nr_pages; i++) {
> +		struct page *page = pfn_to_page(pfn + i);
> +
> +		__ClearPageOffline(page);
> +		SetPageVmemmapSelfHosted(page);

So, refresh my memory here, please. AFAIR, those VmemmapSelfHosted
pages were marked Reserved before, but now memmap_init_range() will not
mark them reserved anymore. I do not think that is ok?

I am worried about walkers getting this wrong. We usually skip
PageReserved pages in walkers because they are pages we cannot deal
with for those purposes, but with this change we will leak
PageVmemmapSelfHosted, and I am not sure whether we are ready for that.

Moreover, boot memmap pages are marked as PageReserved, which would now
be inconsistent with those added during hotplug operations.

All in all, I feel uneasy about this change.

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 3/3] mm/memory_hotplug: skip adjust_managed_page_count() for PageOffline() pages when offlining
On Fri, Jun 07, 2024 at 11:09:38AM +0200, David Hildenbrand wrote:
> We currently have a hack for virtio-mem in place to handle memory
> offlining with PageOffline pages for which we already adjusted the
> managed page count.
>
> Let's enlighten memory offlining code so we can get rid of that hack,
> and document the situation.
>
> Signed-off-by: David Hildenbrand

Acked-by: Oscar Salvador

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 1/3] mm: pass meminit_context to __free_pages_core()
On Mon, Jun 10, 2024 at 10:38:05AM +0200, David Hildenbrand wrote:
> On 10.06.24 06:03, Oscar Salvador wrote:
> > On Fri, Jun 07, 2024 at 11:09:36AM +0200, David Hildenbrand wrote:
> > > In preparation for further changes, let's teach __free_pages_core()
> > > about the differences of memory hotplug handling.
> > >
> > > Move the memory hotplug specific handling from generic_online_page() to
> > > __free_pages_core(), use adjust_managed_page_count() on the memory
> > > hotplug path, and spell out why memory freed via memblock
> > > cannot currently use adjust_managed_page_count().
> > >
> > > Signed-off-by: David Hildenbrand
> >
> > All looks good but I am puzzled with something.
> >
> > > +	} else {
> > > +		/* memblock adjusts totalram_pages() ahead of time. */
> > > +		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
> > > +	}
> >
> > You say that memblock adjusts totalram_pages ahead of time, and I guess
> > you mean in memblock_free_all()
>
> And memblock_free_late(), which uses atomic_long_inc().

Ah, yes.

> Right (it's suboptimal, but not really problematic so far. Hopefully Wei
> can clean it up and move it in here as well)

That would be great.

> For the time being
>
> "/* memblock adjusts totalram_pages() manually. */"

Yes, I think that is better ;-)

Thanks!

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
On Mon, Jun 10, 2024 at 10:56:02AM +0200, David Hildenbrand wrote:
> There are fortunately not that many left.
>
> I'd even say marking them (vmemmap) reserved is more wrong than right:
> note that ordinary vmemmap pages after memory hotplug are not
> reserved! Only bootmem should be reserved.

Ok, that is a very good point that I missed. I thought that
hotplugged-vmemmap pages (not self-hosted) were marked as Reserved,
which is why I thought this would be inconsistent.

But then, if that is the case, I think we are safe, as the kernel can
already encounter vmemmap pages that are not reserved and deals with
them somehow.

> Let's take a look at the relevant core-mm ones (arch stuff is mostly
> just for MMIO remapping)
> ...
> Any PageReserved user that I am missing, or why we should handle these
> vmemmap pages differently than the ones allocated during ordinary
> memory hotplug?

No, I cannot think of a reason why normal vmemmap pages should behave
differently than self-hosted ones.

I was also confused because I thought that after this change
pfn_to_online_page() would be different for self-hosted vmemmap pages,
because I thought that somehow we relied on PageOffline(), but that is
not the case.

> In the future, we might want to consider using a dedicated page type
> for them, so we can stop using a bit that doesn't allow to reliably
> identify them. (we should mark all vmemmap with that type then)

Yes, a type for all vmemmap pages would be a good thing, so we do not
have to special-case.

Just one last thing. Now self-hosted vmemmap pages will have
PageOffline cleared, and that will remain so after the memory block
they belong to has gone offline, which is ok because those vmemmap
pages lie around until the chunk of memory gets removed.

Ok, just wanted to convince myself that there will be no surprises.

Thanks, David, for clarifying.

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
> We currently initialize the memmap such that PG_reserved is set and the
> refcount of the page is 1. In virtio-mem code, we have to manually clear
> that PG_reserved flag to make memory offlining with partially hotplugged
> memory blocks possible: has_unmovable_pages() would otherwise bail out on
> such pages.
>
> We want to avoid PG_reserved where possible and move to typed pages
> instead. Further, we want to further enlighten memory offlining code about
> PG_offline: offline pages in an online memory section. One example is
> handling managed page count adjustments in a cleaner way during memory
> offlining.
>
> So let's initialize the pages with PG_offline instead of PG_reserved.
> generic_online_page()->__free_pages_core() will now clear that flag before
> handing that memory to the buddy.
>
> Note that the page refcount is still 1 and would forbid offlining of such
> memory except when special care is take during GOING_OFFLINE as
> currently only implemented by virtio-mem.
>
> With this change, we can now get non-PageReserved() pages in the XEN
> balloon list. From what I can tell, that can already happen via
> decrease_reservation(), so that should be fine.
>
> HV-balloon should not really observe a change: partial online memory
> blocks still cannot get surprise-offlined, because the refcount of these
> PageOffline() pages is 1.
>
> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
> hotplugged pages are now PageOffline() instead of PageReserved() before
> they are handed over to the buddy.
>
> We'll leave the ZONE_DEVICE case alone for now.
>
> Signed-off-by: David Hildenbrand

Acked-by: Oscar Salvador # for the generic memory-hotplug bits

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 7/9] mm/memory: factor out common code from vm_normal_page_*()
On Tue, Jul 15, 2025 at 03:23:48PM +0200, David Hildenbrand wrote:
> Let's reduce the code duplication and factor out the non-pte/pmd related
> magic into vm_normal_page_pfn().
>
> To keep it simpler, check the pfn against both zero folios. We could
> optimize this, but as it's only for the !CONFIG_ARCH_HAS_PTE_SPECIAL
> case, it's not a compelling micro-optimization.
>
> With CONFIG_ARCH_HAS_PTE_SPECIAL we don't have to check anything else,
> really.
>
> It's a good question if we can even hit the !CONFIG_ARCH_HAS_PTE_SPECIAL
> scenario in the PMD case in practice: but doesn't really matter, as
> it's now all unified in vm_normal_page_pfn().
>
> Add kerneldoc for all involved functions.
>
> No functional change intended.
>
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 8/9] mm: introduce and use vm_normal_page_pud()
On Tue, Jul 15, 2025 at 03:23:49PM +0200, David Hildenbrand wrote:
> Let's introduce vm_normal_page_pud(), which ends up being fairly simple
> because of our new common helpers and there not being a PUD-sized zero
> folio.
>
> Use vm_normal_page_pud() in folio_walk_start() to resolve a TODO,
> structuring the code like the other (pmd/pte) cases. Defer
> introducing vm_normal_folio_pud() until really used.
>
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE Labs
Re: [PATCH v1 9/9] mm: rename vm_ops->find_special_page() to vm_ops->find_normal_page()
On Tue, Jul 15, 2025 at 03:23:50PM +0200, David Hildenbrand wrote:
> ... and hide it behind a kconfig option. There is really no need for
> any !xen code to perform this check.
>
> The naming is a bit off: we want to find the "normal" page when a PTE
> was marked "special". So it's really not "finding a special" page.
>
> Improve the documentation, and add a comment in the code where XEN ends
> up performing the pte_mkspecial() through a hypercall. More details can
> be found in commit 923b2919e2c3 ("xen/gntdev: mark userspace PTEs as
> special on x86 PV guests").
>
> Cc: David Vrabel
> Signed-off-by: David Hildenbrand

Reviewed-by: Oscar Salvador

--
Oscar Salvador
SUSE Labs