On 5/20/25 18:42, David Hildenbrand wrote: > A long-term goal is supporting frozen PageOffline pages, and later > PageOffline pages that don't have a refcount at all. Some more work for
Looking forward to that :) > that is needed -- in particular around non-folio page migration and > memory ballooning drivers -- but let's start by handling PageOffline pages > that can be skipped during memory offlining differently. > > Note that PageOffline is used to mark pages that are logically offline > in an otherwise online memory block (e.g., 128 MiB). If a memory > block is offline, the memmap is considered compeltely uninitialized > and stale (see pfn_to_online_page()). > > Let's introduce a PageOffline specific page flag (PG_offline_skippable) > that for now reuses PG_owner_2. In the memdesc future, it will be one of > a small number of per-memdesc flags stored alongside the type. > > By setting PG_offline_skippable, a driver indicates that it can > restore the PageOffline state of these specific pages when re-onlining a > memory block: it knows that these pages are supposed to be PageOffline() > without the information in the vmemmap, so it can filter them out and > not expose them to the buddy -> they stay PageOffline(). > > While PG_offline_offlineable might be clearer, it is also super > confusing. Alternatives (PG_offline_sticky?) also don't quite feel right. > So let's use "skippable" for now. > > The flag is not supposed to be used for movable PageOffline pages as > used for balloon compaction; movable PageOffline() pages can simply be > migrated during the memory offlining stage, turning the migration > destination page PageOffline() and turning the migration source page > into a free buddy page. > > Let's convert the single user from our MEM_GOING_OFFLINE approach > to the new PG_offline_skippable approach: virtio-mem. Fortunately, > this simplifies the code quite a lot. The only corner case we have to > take care of is when force-unloading the virtio-mem driver: we have to > prevent partially-plugged memory blocks from getting offlined by > clearing PG_offline_skippable again. > > What if someone decides to grab a reference on these pages although they > really shouldn't? After all, we'll now keep the refcount at 1 (until we > can properly stop using the refcount completely). > > Well, less worse things will happen than would currently: currently, > if someone would grab a reference to these pages, in MEM_GOING_OFFLINE > we would run into the > if (WARN_ON(!page_ref_dec_and_test(page))) > dump_page(page, "fake-offline page referenced"); > > And once that unexpected reference would get dropped, we would end up > freeing that page to the buddy: ouch. > > Now, we'll allow for offlining that memory, and when that unexpected > reference would get dropped, we would not end up freeing that page to > the buddy. Once we have frozen PageOffline() pages, it will all get a > lot cleaner. Hmm, a question on that later in the code (assuming I identified the right place). > Note that we didn't see the existing WARN_ON so far, because nobody > should ever be referencing such pages. It's mostly a speculative refcount increase from a pfn walker, such as compaction scanner, that can happen due to its inherent raciness. > An alternative might be to have another callback chain from memory hotplug > code, where a driver that owns that page could agree to skip the > PageOffline() page. However, we would have to repeatedly issue these > callbacks for individual PageOffline() pages, which does not sound > compelling. As we have spare bits, let's use this simpler approach for > now. > > Acked-by: Zi Yan <z...@nvidia.com> > Signed-off-by: David Hildenbrand <da...@redhat.com> Acked-by: Vlastimil Babka <vba...@suse.cz> # page allocator I'll leave hotplug to the experts :) <snip> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index f6482223e28a2..7e4c41e46a911 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -7023,12 +7023,12 @@ unsigned long __offline_isolated_pages(unsigned long > start_pfn, > continue; > } > /* > - * At this point all remaining PageOffline() pages have a > - * reference count of 0 and can simply be skipped. > + * At this point all remaining PageOffline() pages must be > + * "skippable" and have exactly one reference. > */ > if (PageOffline(page)) { > - BUG_ON(page_count(page)); > - BUG_ON(PageBuddy(page)); > + WARN_ON_ONCE(!PageOfflineSkippable(page)); > + WARN_ON_ONCE(page_count(page) != 1); So is this the part where an unexpected speculative refcount might be detected? Should be harmless then as it will then decrease the refcount from e.g. 2 to 1 and nothing will happen right. That's assuming that once we pass __offline_isolated_pages(), the following actions wont modify the refcount or the struct page won't be zeroed, or removed completely (vmemmap). Probably something already prevents that... > already_offline++; > pfn++; > continue; > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index b2fc5266e3d26..debd898b794ea 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -121,16 +121,11 @@ static struct page *has_unmovable_pages(unsigned long > start_pfn, unsigned long e > continue; > > /* > - * We treat all PageOffline() pages as movable when offlining > - * to give drivers a chance to decrement their reference count > - * in MEM_GOING_OFFLINE in order to indicate that these pages > - * can be offlined as there are no direct references anymore. > - * For actually unmovable PageOffline() where the driver does > - * not support this, we will fail later when trying to actually > - * move these pages that still have a reference count > 0. > - * (false negatives in this function only) > + * PageOffline() pages that are marked as "skippable" > + * are treated like movable pages for memory offlining. > */ > - if ((flags & MEMORY_OFFLINE) && PageOffline(page)) > + if ((flags & MEMORY_OFFLINE) && PageOffline(page) && > + PageOfflineSkippable(page)) > continue; > > if (__PageMovable(page) || PageLRU(page)) > @@ -577,11 +572,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, > unsigned long end_pfn, > /* A HWPoisoned page cannot be also PageBuddy */ > pfn++; > else if ((flags & MEMORY_OFFLINE) && PageOffline(page) && > - !page_count(page)) > + PageOfflineSkippable(page)) > /* > - * The responsible driver agreed to skip PageOffline() > - * pages when offlining memory by dropping its > - * reference in MEM_GOING_OFFLINE. > + * If the page is a skippable PageOffline() page, > + * we can offline the memory block, as the driver will > + * re-discover them when re-onlining the memory. > */ > pfn++; > else