* David Gibson (da...@gibson.dropbear.id.au) wrote: > On Fri, Oct 03, 2014 at 06:47:37PM +0100, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > > > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > > --- > > arch_init.c | 11 ++++ > > include/migration/migration.h | 1 + > > include/migration/postcopy-ram.h | 12 +++++ > > migration.c | 1 + > > postcopy-ram.c | 110 > > ++++++++++++++++++++++++++++++++++++++- > > savevm.c | 4 ++ > > 6 files changed, 138 insertions(+), 1 deletion(-) > > > > diff --git a/arch_init.c b/arch_init.c > > index 030d189..4a03171 100644 > > --- a/arch_init.c > > +++ b/arch_init.c > > @@ -1345,6 +1345,17 @@ void ram_handle_compressed(void *host, uint8_t ch, > > uint64_t size) > > } > > } > > > > +/* > > + * Allocate data structures etc needed by incoming migration with > > postcopy-ram > > + * postcopy-ram's similarly names postcopy_ram_incoming_init does the work > > + */ > > +int ram_postcopy_incoming_init(MigrationIncomingState *mis) > > +{ > > + size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS; > > + > > + return postcopy_ram_incoming_init(mis, ram_pages); > > +} > > Um.. yeah. I'm sure ram_postcopy_incoming_init versus > postcopy_ram_incoming_init won't get confusing o_O.
agreed; that's why I put the comments on. My problem here is that: 1) last_ram_offset() comes from code that's poisoned so it can't be built in a target independent file 2) I'd managed so far (with a couple of hacks) to keep postcopy_ram.c target independent. 3) ram_ is the prefix for external names in arch_init.c 4) postcopy_ram_ is the prefix for external names in postcopy_ram.c If I threw in the towel and made postcopy_ram target dependent it would remove the need for that wrapper; it might be the best bet. (Other naming suggestions also welcome) > [snip] > > +/* > > + * Setup an area of RAM so that it *can* be used for postcopy later; this > > + * must be done right at the start prior to pre-copy. > > + * opaque should be the MIS. > > + */ > > +static int init_area(const char *block_name, void *host_addr, > > + ram_addr_t offset, ram_addr_t length, void *opaque) > > +{ > > + MigrationIncomingState *mis = opaque; > > + > > + DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)", > > + block_name, host_addr, offset, length, length); > > + /* > > + * We need the whole of RAM to be truly empty for postcopy, so things > > + * like ROMs and any data tables built during init must be zero'd > > + * - we're going to get the copy from the source anyway. > > + */ > > + if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - > > 1))) { > > + return -1; > > + } > > + > > + /* > > + * We also need the area to be normal 4k pages, not huge pages > > + * (otherwise we can't be sure we can use remap_anon_pages to put > > + * a 4k page in later). THP might come along and map a 2MB page > > + * and when it's partially accessed in precopy it might not break > > + * it down, but leave a 2MB zero'd page. > > + */ > > + if (madvise(host_addr, length, MADV_NOHUGEPAGE)) { > > + perror("init_area: NOHUGEPAGE"); > > + return -1; > > + } > > I'm assuming this is because remap_anon_pages() can't automatically > split a THP itself. It's not immediately obvious to me why it can't > though. No, I think this restriction stems from two things: 1) remap_anon_pages not allowing us to map into an area that's already got a page present - it's a good protection mechanism against us doing something stupid and receiving a page that we already received and the destination is busy accessing. 2) We wouldn't want THP to decide to convert a page that we'd only partially received into a HP because we wouldn't then receive userfault messages for it. (Although it might be best to check with Andrea). (1) might disappear with the modifications to replace remap_anon_pages that Andrea is working on. > Also.. what effect will this have on an actual hugetlbfs memory > region? If there's code to handle that case I haven't spotted it yet. I wouldn't expect this code to work with hugetlbfs mappings. > > + > > + return 0; > > +} > > + > > +/* > > + * At the end of migration, undo the effects of init_area > > + * opaque should be the MIS. > > + */ > > +static int cleanup_area(const char *block_name, void *host_addr, > > + ram_addr_t offset, ram_addr_t length, void *opaque) > > +{ > > + /* Turn off userfault here as well? */ > > This comment appears to be obsoleted by the code below. Thanks; I've squashed it. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK