* Juan Quintela (quint...@redhat.com) wrote: > "Dr. David Alan Gilbert (git)" <dgilb...@redhat.com> wrote: > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > > > Where postcopy is preceeded by a period of precopy, the destination will > > have received pages that may have been dirtied on the source after the > > page was sent. The destination must throw these pages away before > > starting it's CPUs. > > > > Maintain a 'sentmap' of pages that have already been sent. > > Calculate list of sent & dirty pages > > Provide helpers on the destination side to discard these. > > > > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > > Reviewed-by: Amit Shah <amit.s...@redhat.com> > > > Hi
(I'm going to reply to this mail in a few separate mails as I get to them) > > /* Flag set once the migration has been asked to enter postcopy */ > > bool start_postcopy; > > > This is from a previous patch, but .... > > Change the name of the variable or the comment? From the comment it > sholud be "in_postcopy", no? We have to be careful to differentiate between two separate things: 1) The user has issued 'migrate_start_postcopy' - that sets this 'start_postcopy' flag 2) The non-postcopiable data has dropped below the limit and we've now been able to take notice of 'start_postcopy' and actually enter postcopy. I think 'in_postcopy' would imply (2); while 'start_postcopy' matches the command that's been issued. > > +struct PostcopyDiscardState { > > + const char *name; > > Iht is not obvious to me what name means here. I assume ram block name, > change it to block_name, ramblock? Now ramblock_name. > > > + * returns: 0 on success. > > + */ > > +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start, > > + size_t length) > > +{ > > + trace_postcopy_ram_discard_range(start, length); > > + if (madvise(start, length, MADV_DONTNEED)) { > > + error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno)); > > + return -1; > > + } > > + > > + return 0; > > +} > > + > > #else > > /* No target OS support, stubs just fail */ > > bool postcopy_ram_supported_by_host(void) > > @@ -153,5 +192,95 @@ bool postcopy_ram_supported_by_host(void) > > return false; > > } > > > > +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start, > > + size_t length) > > +{ > > + assert(0); > > I will assume that just returning -1 would work here. > > But yes, I understand that this code shouldn't be reach ... Yes, it really shouldn't happen if the previous code that says postcopy isn't supported has been obeyed; I'm happy to change it if you want. > > +} > > #endif > > > > +/* > > ------------------------------------------------------------------------- */ > > + > > +/** > > + * postcopy_discard_send_init: Called at the start of each RAMBlock before > > + * asking to discard individual ranges. > > + * > > + * @ms: The current migration state. > > + * @offset: the bitmap offset of the named RAMBlock in the migration > > + * bitmap. > > + * @name: RAMBlock that discards will operate on. > > + * > > + * returns: a new PDS. > > + */ > > +PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms, > > + unsigned long offset, > > + const char *name) > > +{ > > + PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState)); > > Why are we using here g_try_malloc instead of g_malloc()? Even > g_malloc0()? > > Specially when we don't check if res is NULL on return. Please change. Eek yes; I've gone with malloc0. > > > > + > > + if (res) { > > + res->name = name; > > + res->cur_entry = 0; > > + res->nsentwords = 0; > > + res->nsentcmds = 0; > > With the zero variant, this three can be removed. Done. > > > + res->offset = offset; > > + } > > + > > + return res; > > +} > > > -/* Called with rcu_read_lock() to protect migration_bitmap */ > > +/* Called with rcu_read_lock() to protect migration_bitmap > > + * mr: The region to search for dirty pages in > > Haha, you forgot to update the comment when you moved the function to > use ram blocks O:-) Oops, fixed :-) (Rest of the patch another time) Dave > > @@ -662,6 +672,24 @@ static int save_zero_page(QEMUFile *f, RAMBlock > > *block, ram_addr_t offset, > > } > > > > /** > > + * ram_find_block_by_id: Find a ramblock by name. > > + * > > + * Returns: The RAMBlock with matching ID, or NULL. > > + */ > > +static RAMBlock *ram_find_block_by_id(const char *id) > > +{ > > + RAMBlock *block; > > + > > + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > > + if (!strcmp(id, block->idstr)) { > > + return block; > > + } > > + } > > + > > + return NULL; > > +} > > We don't have this function already..... > > Once here, could we split it in its own patch and use it in ram_load? > > > QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > if (!strncmp(id, block->idstr, sizeof(id))) { > if (length != block->used_length) { > Error *local_err = NULL; > > ret = qemu_ram_resize(block->offset, length, > &local_err); > if (local_err) { > error_report_err(local_err); > } > } > ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, > block->idstr); > break; > } > } > > if (!block) { > error_report("Unknown ramblock \"%s\", cannot " > "accept migration", id); > ret = -EINVAL; > } > > > We could also use it in: > > host_from_stream_offset > > > > +/* **** functions for postcopy ***** */ > > + > > +/* > > + * Callback from postcopy_each_ram_send_discard for each RAMBlock > > + * start,end: Indexes into the bitmap for the first and last bit > > + * representing the named block > > + */ > > +static int postcopy_send_discard_bm_ram(MigrationState *ms, > > + PostcopyDiscardState *pds, > > + unsigned long start, unsigned long > > end) > > +{ > > + unsigned long current; > > + > > + for (current = start; current <= end; ) { > > + unsigned long set = find_next_bit(ms->sentmap, end + 1, current); > > + > > + if (set <= end) { > > + unsigned long zero = find_next_zero_bit(ms->sentmap, > > + end + 1, set + 1); > > + > > + if (zero > end) { > > + zero = end + 1; > > + } > > + postcopy_discard_send_range(ms, pds, set, zero - 1); > > + current = zero + 1; > > + } else { > > + current = set; > > + } > > + } > > I think I undrestood the logic here at the end.... > > But if we change the meaning of postcopy_discard_send_range() from > (begin, end), to (begin, length), I think everything goes clearer, no? > > if (set <= end) { > unsigned long zero = find_next_zero_bit(ms->sentmap, > end + 1, set + 1); > unsigned long length; > > if (zero > end) { > length = end - set; > } else { > lenght = zero - 1 - set; > current = zero + 1; > } > postcopy_discard_send_range(ms, pds, set, len); > } else { > current = set; > } > } > > Y would clame that if we call one zero, the other would be called one. > Or change to set/unset, but that is just me. Yes, I haven't tested, and > it is possible that there is a off-by-one somewhere... > > > Looking at postocpy_eand_ram_send_discard, I even think that it would be > a good idea to pass length to this function. > > > +/* > > + * Transmit the set of pages to be discarded after precopy to the target > > + * these are pages that: > > + * a) Have been previously transmitted but are now dirty again > > + * b) Pages that have never been transmitted, this ensures that > > + * any pages on the destination that have been mapped by background > > + * tasks get discarded (transparent huge pages is the specific > > concern) > > + * Hopefully this is pretty sparse > > + */ > > +int ram_postcopy_send_discard_bitmap(MigrationState *ms) > > +{ > > + int ret; > > + > > + rcu_read_lock(); > > + > > + /* This should be our last sync, the src is now paused */ > > + migration_bitmap_sync(); > > + > > + /* > > + * Update the sentmap to be sentmap = ~sentmap | dirty > > + */ > > + bitmap_complement(ms->sentmap, ms->sentmap, > > + last_ram_offset() >> TARGET_PAGE_BITS); > > + > > + bitmap_or(ms->sentmap, ms->sentmap, migration_bitmap, > > + last_ram_offset() >> TARGET_PAGE_BITS); > > This bitmaps are really big, I don't know how long would take to do this > operations here, but I think that we can avoid (at least) the > bitmap_complement. We can change the bitmap name to notsentbitmap, init > it to one and clear it each time that we sent a page, no? > > We can also do the bitmap_or() at migration_sync_bitmap() time, at that > point, we shouldn't be on the critical path? > > Later, Juan. -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK