On Wed, Mar 14, 2018 at 07:42:59PM +0000, Dr. David Alan Gilbert wrote: > * Michael S. Tsirkin (m...@redhat.com) wrote: > > On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote: > > > > + used_len = block->used_length - offset; > > > > + addr += used_len; > > > > + } > > > > + > > > > + start = offset >> TARGET_PAGE_BITS; > > > > + npages = used_len >> TARGET_PAGE_BITS; > > > > + ram_state->migration_dirty_pages -= > > > > + bitmap_count_one_with_offset(block->bmap, start, > > > > npages); > > > > + bitmap_clear(block->bmap, start, npages); > > > > > > If this is happening while the migration is running, this isn't safe - > > > the migration code could clear a bit at about the same point this > > > happens, so that the count returned by bitmap_count_one_with_offset > > > wouldn't match the word that was cleared by bitmap_clear. > > > > > > The only way I can see to fix it is to run over the range using > > > bitmap_test_and_clear_atomic, using the return value to decrement > > > the number of dirty pages. > > > But you also need to be careful with the update of the > > > migration_dirty_pages value itself, because that's also being read > > > by the migration thread. > > > > > > Dave > > > > I see that there's migration_bitmap_sync but it does not seem to be > > Do you mean bitmap_mutex?
Yes. Sorry. > > taken on all paths. E.g. migration_bitmap_clear_dirty and > > migration_bitmap_find_dirty are called without that lock sometimes. > > Thoughts? > > Hmm, that doesn't seem to protect much at all! It looks like it was > originally added to handle hotplug causing the bitmaps to be resized; > that extension code was removed in 66103a5 so that lock can probably go. > > I don't see how the lock would help us though; the migration thread is > scanning it most of the time so would have to have the lock held > most of the time. > > Dave > > > -- > > MST > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK