On Thu, Jul 29, 2021 at 06:19:31PM +0200, David Hildenbrand wrote:
> On 29.07.21 18:12, Peter Xu wrote:
> > On Thu, Jul 29, 2021 at 10:14:47AM +0200, David Hildenbrand wrote:
> > > > > > > > The thing is I still think this extra operation during
> > > > > > > > sync() can be ignored by simply clear dirty log during
> > > > > > > > bitmap init, then.. why not? :)
> > > > > > >
> > > > > > > I guess clearing the dirty log (especially in KVM) might be
> > > > > > > more expensive.
> > > > > >
> > > > > > If we send one ioctl per cb that'll be expensive for sure. I
> > > > > > think it'll be fine if we send one clear ioctl to kvm,
> > > > > > summarizing the whole bitmap to clear.
> > > > > >
> > > > > > The other thing is imho having overhead during bitmap init is
> > > > > > always better than having that during sync(). :)
> > > > >
> > > > > Oh, right, so you're saying, after we set the dirty bmap to all
> > > > > ones and excluded the discarded parts, setting the respective
> > > > > bits to 0, we simply issue clearing of the whole area?
> > > > >
> > > > > For now I assumed we would have to clear per cb.
> > > >
> > > > Hmm when I replied I thought we can pass in a bitmap to
> > > > ->log_clear() but I just remembered memory API actually hides the
> > > > bitmap interface..
> > > >
> > > > Reset the whole region works, but it'll slow down migration starts,
> > > > more importantly that'll be with mmu write lock so we will lose
> > > > most clear-log benefit for the initial round of migration and stuck
> > > > the guest #pf at the meantime...
> > > >
> > > > Let's try do that in cb()s as you mentioned; I think that'll still
> > > > be okay, because so far the clear log block size is much larger
> > > > (1gb), 1tb is worst case 1000 ioctls during bitmap init, slightly
> > > > better than 250k calls during sync(), maybe? :)
> > >
> > > Just to get it right, what you propose is calling
> > > migration_clear_memory_region_dirty_bitmap_range() from each cb().
> >
> > Right. We can provide a more complicated memory api for passing in
> > bitmap but I think that can be an overkill and tricky.
> >
> > > Due to the clear_bmap, we will end up clearing each chunk (e.g.,
> > > 1GB) at most once.
> > >
> > > But if our layout is fragmented, we can actually end up clearing all
> > > chunks (1024 ioctls for 1TB), resulting in a slower migration start.
> > >
> > > Any gut feeling how much slower migration start could be with
> > > largish (e.g., 1 TiB) regions?
> >
> > I had a vague memory of KVM_GET_DIRTY_LOG that I used to measure which
> > took ~10ms for 1g guest mem, supposing that's mostly used to protect
> > the pages or clearing dirties in the EPT pgtables. Then the worst case
> > is ~1 second for 1tb.
> >
> > But note that it's still during setup phase, so we should expect to
> > see a somehow large setup time and longer period that migration stays
> > in SETUP state, but I think it's fine. Reasons:
> >
> >   - We don't care too much about guest dirtying pages during the setup
> >     process because we haven't migrated anything yet, meanwhile we
> >     should not block any other thread either (e.g., we don't hold BQL).
> >
> >   - We don't block guest execution too. Unlike KVM_GET_DIRTY_LOG
> >     without CLEAR we won't hold the mmu lock for a huge long time but
> >     do it only in 1g chunk, so guest page faults can still be serviced.
> >     It'll be affected somehow since we'll still run with the mmu write
> >     lock critical sections for each single ioctl(), but we do that for
> >     1gb each time so we frequently yield it.
>
> Please note that we are holding the iothread lock while setting up the
> bitmaps + syncing the dirty log. I'll have to make sure that that code
> runs outside of the BQL, otherwise we'll block guest execution.
Oh right.

> In the meantime I adjusted the code but it does the clearing under the
> iothread lock, which should not be what we want ... I'll have a look.

Thanks; if it takes more changes than expected we can still start from
something simple, IMHO, by taking the bql and yielding it frequently.

In the meantime, I found two things in ram_init_bitmaps() that I'm not
sure we still need:

  1. Do we need WITH_RCU_READ_LOCK_GUARD() if we already hold both the
     bql and the ramlist lock?  (small question)

  2. Do we need migration_bitmap_sync_precopy() even if the dirty bmap
     is all 1's?  (bigger question)

I feel like those can be dropped.  Dave/Juan?

-- 
Peter Xu
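PS: for reference, roughly the shape the per-cb clearing discussed above
could take. This is only a sketch meant to live in migration/ram.c: the
callback name, the use of rb->bmap, and the exact signature of
migration_clear_memory_region_dirty_bitmap_range() are assumptions here,
not the actual patch.

/*
 * Hypothetical per-section callback, run once for each discarded range
 * while the dirty bitmap is being initialized (all bits start as 1):
 * clear the KVM dirty log for the range and drop the bits locally.
 */
static int dirty_bitmap_clear_section(MemoryRegionSection *section,
                                      void *opaque)
{
    const hwaddr offset = section->offset_within_region;
    const hwaddr size = int128_get64(section->size);
    const unsigned long start = offset >> TARGET_PAGE_BITS;
    const unsigned long npages = size >> TARGET_PAGE_BITS;
    RAMBlock *rb = section->mr->ram_block;

    /*
     * Thanks to rb->clear_bmap, each clear-log chunk (e.g. 1GB) is
     * cleared via ioctl at most once, no matter how many sections
     * touch it, so fragmentation stays bounded by the chunk count.
     */
    migration_clear_memory_region_dirty_bitmap_range(rb, start, npages);

    /* Exclude the discarded range from the RAMBlock dirty bitmap too. */
    bitmap_clear(rb->bmap, start, npages);

    return 0;
}

The cb would presumably be driven by a replay-discarded style helper of
the RamDiscardManager during bitmap init; naming of that helper is also
just a guess here.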
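And the ram_init_bitmaps() shape the two questions refer to, reproduced
roughly from memory; the exact code in the tree may differ:

static void ram_init_bitmaps(RAMState *rs)
{
    /*
     * The BQL is held across the whole setup, including the initial
     * dirty log sync below -- the two calls in question.
     */
    qemu_mutex_lock_iothread();
    qemu_mutex_lock_ramlist();

    WITH_RCU_READ_LOCK_GUARD() {
        ram_list_init_bitmaps();
        memory_global_dirty_log_start();
        migration_bitmap_sync_precopy(rs);
    }
    qemu_mutex_unlock_ramlist();
    qemu_mutex_unlock_iothread();
}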