Michael Ellerman <m...@ellerman.id.au> writes:
> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm....@kernel.org>
> writes:
>> From: Nathan Lynch <nath...@linux.ibm.com>
>>
>> Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has
>> been tied to the cooperative memory overcommit (CMO) platform feature,
>> the flag also is treated by the PowerVM hypervisor as a hint that the
>> page contents need not be copied to the destination during a live
>> partition migration.
>>
>> Use the "ibm,migratable-partition" root node property to determine
>> whether this partition/guest can be migrated. Mark freed pages unused
>> if so (or if CMO is in use, as before).
>>
>> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com>
>> ---
>> Several things yet to improve here:
>>
>> * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled
>>   from CONFIG_PPC_SMLPAR.
>>
>> * powerpc's arch_free_page() could be made to use a static key if
>>   justified.
>>
>> * I have not yet measured the overhead this introduces, nor have I
>>   measured the benefit to a live migration.
>>
>> To date, I have smoke tested it by doing a live migration and
>> performing a build on a kernel with the change, to ensure it doesn't
>> introduce obvious memory corruption or anything. It hasn't blown up
>> yet :-)
>>
>> This will be a possibly significant behavior change in that we will be
>> flagging pages unused where we typically did not before. Until now,
>> having CMO enabled was the only way to do this, and I don't think that
>> feature is used all that much?
>
> Yeah AFAIK it has to be explicitly configured and enabled via the HMC,
> so doesn't get much testing or usage.
>
>> Posting this as RFC to see if there are any major concerns.
>
> My worry is that this will add overhead for everyone in normal usage, an
> hcall per freed set of pages, whereas the benefit is only seen when a
> migration happens.
>
> But that does depend on how often arch_free_page() gets called in normal
> usage, which I don't know offhand.

Yes, and as I said in my followup yesterday:

>> for this to be safe, powerpc/pseries needs to implement
>> arch_alloc_page() to undo setting the "unused" flag.

So, perhaps more significantly, we'd also incur an hcall per
arch_alloc_page() with the most straightforward implementation that
doesn't eat data (unlike this version!). Nevertheless I'll plan on doing
that for the next iteration to see if I can measure the overhead and
benefit, with the expectation that we'll ultimately need a more
sophisticated design.
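
As a rough illustration of the trade-off described in this thread, here is a
userspace toy model. It is not kernel code and issues no real hcalls. Every
name in it (model_state, model_free_page(), model_alloc_page(), and so on) is
hypothetical and invented for this sketch, and it assumes one simulated hcall
per order-0 page. It only counts how many hcalls each policy would cost and
how many in-use pages would still carry the "unused" hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 8 /* arbitrary size for the toy model */

/* Hypothetical stand-in for guest + hypervisor state; not a kernel structure. */
struct model_state {
	bool in_use[MODEL_NPAGES]; /* page currently allocated in the guest */
	bool unused[MODEL_NPAGES]; /* page carries the "unused" hint */
	unsigned long hcalls;      /* simulated hcalls issued so far */
	bool migratable;           /* models "ibm,migratable-partition" present */
	bool cmo;                  /* models CMO being enabled */
};

/* Simulated hcall: set or clear the "unused" hint and count the call. */
static void model_hcall(struct model_state *s, size_t pfn, bool unused)
{
	assert(pfn < MODEL_NPAGES);
	s->unused[pfn] = unused;
	s->hcalls++;
}

/* Models the free hook: flag the page unused if CMO or migration applies. */
static void model_free_page(struct model_state *s, size_t pfn)
{
	assert(pfn < MODEL_NPAGES);
	s->in_use[pfn] = false;
	if (s->cmo || s->migratable)
		model_hcall(s, pfn, true);
}

/*
 * Models the alloc hook. With undo == false the hint is left in place,
 * which is the unsafe behaviour. With undo == true the hint is cleared
 * unconditionally, costing one more simulated hcall per allocation.
 */
static void model_alloc_page(struct model_state *s, size_t pfn, bool undo)
{
	assert(pfn < MODEL_NPAGES);
	s->in_use[pfn] = true;
	if (undo && (s->cmo || s->migratable))
		model_hcall(s, pfn, false);
}

/* Pages that are in use but still hinted unused: their data could be lost. */
static size_t model_pages_at_risk(const struct model_state *s)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_NPAGES; i++)
		if (s->in_use[i] && s->unused[i])
			n++;
	return n;
}
```

In this model, a free followed by an alloc without the undo step costs one
simulated hcall and leaves one page at risk; with the undo step the same pair
costs two simulated hcalls and leaves none. Real overhead and benefit would
still have to be measured on actual hardware, as noted above.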