On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote: > On 07.07.21 21:19, Michael S. Tsirkin wrote: > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: > > > On 07.07.21 21:05, Michael S. Tsirkin wrote: > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > > > Postcopy never worked properly with 'free-page-hint=on', as there are > > > > > at least two issues: > > > > > > > > > > 1) With postcopy, the guest will never receive a > > > > > VIRTIO_BALLOON_CMD_ID_DONE > > > > > and consequently won't release free pages back to the OS once > > > > > migration finishes. > > > > > > > > > > The issue is that for postcopy, we won't do a final bitmap sync > > > > > while > > > > > the guest is stopped on the source and > > > > > virtio_balloon_free_page_hint_notify() will only call > > > > > virtio_balloon_free_page_done() on the source during > > > > > PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated > > > > > to > > > > > the destination. > > > > > > > > > > 2) Once the VM touches a page on the destination that has been > > > > > excluded > > > > > from migration on the source via qemu_guest_free_page_hint() > > > > > while > > > > > postcopy is active, that thread will stall until postcopy > > > > > finishes > > > > > and all threads are woken up. (with older Linux kernels that > > > > > won't > > > > > retry faults when woken up via userfaultfd, we might actually > > > > > get a > > > > > SEGFAULT) > > > > > > > > > > The issue is that the source will refuse to migrate any pages > > > > > that > > > > > are not marked as dirty in the dirty bmap -- for example, > > > > > because the > > > > > page might just have been sent. Consequently, the faulting > > > > > thread will > > > > > stall, waiting for the page to be migrated -- which could take > > > > > quite > > > > > a while and result in guest OS issues. > > > > > > > > OK so if source gets a request for a page which is not dirty > > > > it does not respond immediately? Why not just teach it to > > > > respond? It would seem that if destination wants a page we > > > > should just give it to the destination ... > > > > > > The source does not know if a page has already been sent (e.g., via the > > > background migration thread that moves all data over) vs. the page has not > > > been send because the page was hinted. This is the part where we'd need > > > additional tracking on the source to actually know that. > > > > > > We must not send a page twice, otherwise bad things can happen when > > > placing > > > pages that already have been migrated, because that scenario can easily > > > happen with ordinary postcopy (page has already been sent and we're > > > dealing > > > with a stale request from the destination). > > > > OK let me get this straight > > > > A. source sends page > > B. destination requests page > > C. destination changes page > > D. source sends page > > E. destination overwrites page > > > > this is what you are worried about right? > > IIRC E. is with recent kernels: > > E. placing the page fails with -EEXIST and postcopy migration fails > > However, the man page (man ioctl_userfaultfd) doesn't describe what is > actually supposed to happen when double-placing. Could be that it's > "undefined behavior". > > I did not try, though. > > > This is how it works today: > > A. source sends page and marks it clean > B. destination requests page > C. destination receives page and places it > D. source ignores request as page is clean
If it's actually -EEXIST then we could just resend it and teach destination to ignore -EEXIST errors right? Will actually make things a bit more robust: destination handles its own consistency instead of relying on source. > > > > the fix is to mark page clean in A. > > then in D to not send page if it's clean? > > > > And the problem with hinting is this: > > > > A. page is marked clean > > B. destination requests page > > C. destination changes page > > D. source sends page <- does not happen, page is clean! > > E. destination overwrites page > > Simplified it's > > A. page is marked clean by hinting code > B. destination requests page > D. source ignores request as page is clean > E. destination stalls until postcopy unregisters uffd > > > Some thoughts > > 1. We do have a a recv bitmap where we track received pages on the > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check > it when placing pages. > > 2. Changing the migration behavior unconditionally on the source will break > migration to old QEMU binaries that cannot handle this change. We can always make this depend on new machine types. > 3. I think the current behavior is in place to make debugging easier. If > only a single instance of a page will ever be migrated from source to > destination, there cannot be silent data corruption. Further, we avoid > migrating unnecessarily pages twice. > Likely does not matter much for performance, it seems unlikely that the race is all that common. > Maybe Dave and Peter can spot any flaws in my understanding. > > -- > Thanks, > > David / dhildenb