On Wed, Apr 02, 2025 at 07:16:27PM +0000, Pratyush Yadav wrote:
> Hi Changyuan,
>
> On Wed, Mar 19 2025, Changyuan Lyu wrote:
>
> > From: "Mike Rapoport (Microsoft)" <r...@kernel.org>
> >
> > Introduce APIs allowing KHO users to preserve memory across kexec and
> > get access to that memory after boot of the kexeced kernel:
> >
> >   kho_preserve_folio() - record a folio to be preserved over kexec
> >   kho_restore_folio()  - recreate the folio from the preserved memory
> >   kho_preserve_phys()  - record a physically contiguous range to be
> >                          preserved over kexec
> >   kho_restore_phys()   - recreate order-0 pages corresponding to the
> >                          preserved physical range
> >
> > The memory preservations are tracked by two levels of xarrays to
> > manage chunks of per-order 512-byte bitmaps. For instance, the entire
> > 1G order of a 1TB x86 system would fit inside a single 512-byte
> > bitmap. For order-0 allocations each bitmap will cover 16M of address
> > space. Thus, for 16G of memory at most 512K of bitmap memory will be
> > needed for order 0.
> >
> > At serialization time all bitmaps are recorded in a linked list of
> > pages for the next kernel to process, and the physical address of the
> > list is recorded in the KHO FDT.
> >
> > The next kernel then processes that list, reserves the memory ranges,
> > and later, when a user requests a folio or a physical range, KHO
> > restores the corresponding memory map entries.
> >
> > Suggested-by: Jason Gunthorpe <j...@nvidia.com>
> > Signed-off-by: Mike Rapoport (Microsoft) <r...@kernel.org>
> > Co-developed-by: Changyuan Lyu <changyu...@google.com>
> > Signed-off-by: Changyuan Lyu <changyu...@google.com>
> > ---
> >  include/linux/kexec_handover.h |  38 +++
> >  kernel/kexec_handover.c        | 486 ++++++++++++++++++++++++++++++++-
> >  2 files changed, 522 insertions(+), 2 deletions(-)
> [...]
> > +int kho_preserve_phys(phys_addr_t phys, size_t size)
> > +{
> > +	unsigned long pfn = PHYS_PFN(phys), end_pfn = PHYS_PFN(phys + size);
> > +	unsigned int order = ilog2(end_pfn - pfn);
>
> This caught my eye when playing around with the code. It does not put
> any limit on the order, so it can exceed NR_PAGE_ORDERS.

I don't see a problem with this.

> Also, when initializing the page after KHO, we pass the order directly
> to prep_compound_page() without sanity checking it. The next kernel
> might not support all the orders the current one supports. Perhaps
> something to fix?

And this needs to be fixed, and we should refuse to create folios larger
than MAX_ORDER.

> > +	unsigned long failed_pfn;
> > +	int err = 0;
> > +
> > +	if (!kho_enable)
> > +		return -EOPNOTSUPP;
> > +
> > +	down_read(&kho_out.tree_lock);
> > +	if (kho_out.fdt) {
> > +		err = -EBUSY;
> > +		goto unlock;
> > +	}
> > +
> > +	for (; pfn < end_pfn;
> > +	     pfn += (1 << order), order = ilog2(end_pfn - pfn)) {
> > +		err = __kho_preserve(&kho_mem_track, pfn, order);
> > +		if (err) {
> > +			failed_pfn = pfn;
> > +			break;
> > +		}
> > +	}
> [...]
> > +struct folio *kho_restore_folio(phys_addr_t phys)
> > +{
> > +	struct page *page = pfn_to_online_page(PHYS_PFN(phys));
> > +	unsigned long order;
> > +
> > +	if (!page)
> > +		return NULL;
> > +
> > +	order = page->private;
> > +	if (order)
> > +		prep_compound_page(page, order);
> > +	else
> > +		kho_restore_page(page);
> > +
> > +	return page_folio(page);
> > +}
> [...]
>
> --
> Regards,
> Pratyush Yadav

--
Sincerely yours,
Mike.
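
For the sanity check discussed above, here is a minimal sketch of what
refusing over-sized orders in kho_restore_folio() might look like. The
placement of the check, and whether the limit is spelled MAX_ORDER or
MAX_PAGE_ORDER in a given tree, are assumptions rather than the actual
follow-up patch:

/* A sketch, not the actual fix: refuse to recreate a folio whose order
 * the current kernel cannot represent, instead of passing it straight
 * to prep_compound_page(). */
struct folio *kho_restore_folio(phys_addr_t phys)
{
	struct page *page = pfn_to_online_page(PHYS_PFN(phys));
	unsigned long order;

	if (!page)
		return NULL;

	order = page->private;
	if (order) {
		/* The previous kernel may have supported larger orders
		 * than this one does. */
		if (order > MAX_ORDER)
			return NULL;
		prep_compound_page(page, order);
	} else {
		kho_restore_page(page);
	}

	return page_folio(page);
}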
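
As an aside, the bitmap sizing described in the changelog can be checked
with a few lines of arithmetic. Below is a standalone sketch, assuming
4K base pages (PAGE_SHIFT == 12) and x86's 1G pages being order 18; the
macro and variable names are illustrative, not taken from the patch:

#include <stdio.h>

#define PAGE_SHIFT	12			/* 4K base pages */
#define BITMAP_BYTES	512
#define BITS_PER_BITMAP	(BITMAP_BYTES * 8)	/* 4096 bits */

int main(void)
{
	/* One bit in an order-N bitmap tracks one order-N page, so a
	 * single 512-byte bitmap covers 4096 * 2^N * 4K of memory. */
	unsigned int orders[] = { 0, 18 };

	for (unsigned int i = 0; i < 2; i++) {
		unsigned long long span =
			(unsigned long long)BITS_PER_BITMAP <<
				(orders[i] + PAGE_SHIFT);

		printf("order %2u: one bitmap covers %llu MiB\n",
		       orders[i], span >> 20);
	}

	/* order 0:  16 MiB per bitmap, so 16 GiB of memory needs
	 *           1024 bitmaps = 512 KiB of bitmap memory;
	 * order 18: one bitmap spans 4 TiB, so all 1G pages of a
	 *           1 TiB machine fit in a single bitmap. */
	return 0;
}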