On Wed, Jul 07, 2021 at 02:22:32PM -0700, Alexander Duyck wrote:
> On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <pet...@redhat.com> wrote:
> >
> > On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
> > > On 07.07.21 20:02, Peter Xu wrote:
> > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > > As it never worked properly, let's disable it via the postcopy
> > > > > notifier on the destination. Trying to set
> > > > > "migrate_set_capability postcopy-ram on" on the destination now
> > > > > results in "virtio-balloon: 'free-page-hint' does not support
> > > > > postcopy Error: Postcopy is not supported".
> > > >
> > > > Would it be possible to do this in reversed order? Say, dynamically
> > > > disable free-page-hinting if postcopy capability is set when
> > > > migration starts? Perhaps it can also be re-enabled automatically
> > > > when migration completes?
> > >
> > > I remember that this might be quite racy. We would have to make sure
> > > that no hinting happens before we enable the capability.
> > >
> > > As soon as we messed with the dirty bitmap (during precopy), postcopy
> > > is no longer safe. As noted in the patch, the only runtime alternative
> > > is to disable postcopy as soon as we actually do clear a bit.
> > > Alternatively, we could ignore any hints if the postcopy capability
> > > was enabled.
> >
> > Logically migration capabilities are applied at VM starts, and these
> > capabilities should be constant during migration (I didn't check if
> > there's a hard requirement; easy to add that if we want to assure it),
> > and in most cases for the lifecycle of the vm.
>
> Would it make sense to maybe just look at adding a postcopy value to
> the PrecopyNotifyData that you could populate with
> migration_in_postcopy() in precopy_notify()?
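
For illustration, the suggestion above could look roughly like the
standalone sketch below. It is not QEMU code: SketchNotifyData,
SketchNotifyReason, SketchHintState and sketch_hint_notify() are
hypothetical stand-ins, the handling of the notify reasons is simplified,
and which predicate fills the "postcopy" field is deliberately left to the
caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the hinting state in the balloon. */
typedef enum {
    SKETCH_HINT_STOPPED,
    SKETCH_HINT_RUNNING
} SketchHintState;

/* Hypothetical subset of the reasons a precopy notifier can be called with. */
typedef enum {
    SKETCH_NOTIFY_SETUP,
    SKETCH_NOTIFY_AFTER_BITMAP_SYNC,
    SKETCH_NOTIFY_CLEANUP
} SketchNotifyReason;

/* Stand-in for the notify data, extended with the proposed field. */
typedef struct {
    SketchNotifyReason reason;
    bool postcopy; /* assumption: filled in by the notifying side */
} SketchNotifyData;

/*
 * Stand-in for the balloon's notifier. Returns the hinting state that
 * should be in effect after the notification has been handled.
 */
static SketchHintState sketch_hint_notify(SketchHintState cur,
                                          const SketchNotifyData *data)
{
    assert(data != NULL);

    if (data->postcopy) {
        /* Never start hinting, and stop it if it is already running. */
        return SKETCH_HINT_STOPPED;
    }

    switch (data->reason) {
    case SKETCH_NOTIFY_SETUP:
        return cur;                  /* nothing to change yet */
    case SKETCH_NOTIFY_AFTER_BITMAP_SYNC:
        return SKETCH_HINT_RUNNING;  /* hinting may (re)start */
    case SKETCH_NOTIFY_CLEANUP:
        return SKETCH_HINT_STOPPED;  /* migration is over */
    }
    return cur;
}
```

With the field set, the handler reports the stopped state whatever the
reason is, which is the "shut it down or don't start it" behaviour the
suggestion describes.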

Should we check migrate_postcopy_ram() rather than migration_in_postcopy()?
It's the precopy phase that's dropping the dirty bits and can potentially
hang a postcopy vcpu, afaiu.

>
> Then all you would need to do is check for that value and if it is set
> you shut down the page hinting or don't start it since I suspect it
> wouldn't likely add any value anyway since I would think flagging
> unused pages doesn't add much value in a postcopy environment anyway.
>
> > >
> > > Whatever we do, we have to make sure that a user cannot trick the
> > > system into an inconsistent state. Like enabling hinting, starting
> > > migration, then enabling the postcopy capability and kicking of
> > > postcopy. I did not check if we allow for that, though.
> >
> > We could turn free page hinting off when migration starts with
> > postcopy-ram=on, then re-enable it after migration finishes. That looks
> > very safe to me. And I don't even worry on user trying to mess it up -
> > as that only put their own VM at risk; that's mostly fine to me.
>
> We wouldn't necessarily even need to really turn it off, just don't
> start it. I wonder if we couldn't just get away with adding a check to
> the existing virtio_balloon_free_page_hint_notify to see if we are in
> the postcopy state there and just shut things down or not start them.

This makes me wonder whether qemu_guest_free_page_hint() should be called at
all on destination host when incoming postcopy migration is in progress.
Right now the check migration_is_setup_or_active() should return true on
destination host, however I am not sure if that's necessary as we don't
track dirty at all there.

--
Peter Xu