* Michael S. Tsirkin (m...@redhat.com) wrote: > On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote: > > > + used_len = block->used_length - offset; > > > + addr += used_len; > > > + } > > > + > > > + start = offset >> TARGET_PAGE_BITS; > > > + npages = used_len >> TARGET_PAGE_BITS; > > > + ram_state->migration_dirty_pages -= > > > + bitmap_count_one_with_offset(block->bmap, start, > > > npages); > > > + bitmap_clear(block->bmap, start, npages); > > > > If this is happening while the migration is running, this isn't safe - > > the migration code could clear a bit at about the same point this > > happens, so that the count returned by bitmap_count_one_with_offset > > wouldn't match the word that was cleared by bitmap_clear. > > > > The only way I can see to fix it is to run over the range using > > bitmap_test_and_clear_atomic, using the return value to decrement > > the number of dirty pages. > > But you also need to be careful with the update of the > > migration_dirty_pages value itself, because that's also being read > > by the migration thread. > > > > Dave > > I see that there's migration_bitmap_sync but it does not seem to be
Do you mean bitmap_mutex? > taken on all paths. E.g. migration_bitmap_clear_dirty and > migration_bitmap_find_dirty are called without that lock sometimes. > Thoughts? Hmm, that doesn't seem to protect much at all! It looks like it was originally added to handle hotplug causing the bitmaps to be resized; that extension code was removed in 66103a5 so that lock can probably go. I don't see how the lock would help us though; the migration thread is scanning it most of the time so would have to have the lock held most of the time. Dave > -- > MST -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK