On Fri, Nov 21, 2014 at 11:48:03AM +0800, zhanghailiang wrote:
> Hi David,
>
> When i migrated VM in postcopy way when configuring VM with '-realtime
> mlock=on' option,
> It failed, and reports "postcopy_ram_hosttest: remap_anon_pages not
> available: File exists" in destination,
>
> Is it a bug of userfaultfd API?

It's not userfaultfd related, but it is remap_anon_pages related (in
the future mcopy_atomic or an equivalent userfaultfd cmd) and
MADV_DONTNEED related.

If the destination qemu starts with mlockall(MCL_CURRENT|MCL_FUTURE),
-EEXIST saves the day by noticing that all the not yet transferred
pages were already present in the destination (as allocated zero
pages). We can't trigger non-present faults (in userfaultfd) if the
dst starts with mlockall.

Furthermore, if precopy has been run before postcopy (currently that's
always the case, as there's no way to specify the number of precopy
passes to run before starting postcopy... which in turn would allow
specifying zero passes), the bitmap with the re-dirtied pages must be
transferred to the destination before postcopy can start, and
MADV_DONTNEED has to be used to zap those re-dirtied pages. But
MADV_DONTNEED will fail too, with -EINVAL, well before postcopy
starts, if mlockall is set on the destination qemu. If you didn't fail
at -EINVAL in the destination MADV_DONTNEED, there probably wasn't any
re-dirtied page.

remap_anon_pages is extremely strict (unlike the vma-mangling mremap,
which would just silently zap the dst range vma if it existed), so it
cannot overwrite the guest memory and you get EEXIST. The strictness
was intentional, to eliminate the risk of any memory corruption if
userland hits a bug like in this case. But it should have failed
earlier, with MADV_DONTNEED returning -EINVAL, if there was any
re-dirtied page between the last precopy pass and postcopy (I assume
the guest was idle?).

In short, I think that to fix this qemu should call mlockall in the
destination only after postcopy is complete. There's no way to lock
the memory in the destination if the memory still resides in the
source, so some userfault may have to happen (and if userfaults
happen, it means we're not mlocked yet).
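
To make the MADV_DONTNEED point concrete, here is a minimal standalone
sketch (not qemu code) of the behaviour described above. The helper
name probe_dontneed is made up for illustration, and it uses mlock()
on a single page as a stand-in for mlockall(), on the assumption that
both leave the range locked in the same way as far as madvise() is
concerned. It assumes Linux.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not qemu code): map one anonymous page,
 * optionally mlock it, then try to zap it with MADV_DONTNEED.
 *
 * Returns 0 if madvise succeeded, the errno value if madvise failed,
 * or -1 if the setup (mmap/mlock) itself failed, for example because
 * of RLIMIT_MEMLOCK.
 */
int probe_dontneed(int lock)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = (size_t)page;
    int ret = 0;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0xAB; /* touch the page so it is really allocated */

    /* Stand-in for the destination having called mlockall. */
    if (lock && mlock(p, len) != 0) {
        munmap(p, len);
        return -1;
    }

    /* This is the zap that the re-dirtied pages would need. */
    if (madvise(p, len, MADV_DONTNEED) != 0)
        ret = errno;

    munmap(p, len); /* munmap also drops the lock on the range */
    return ret;
}
```

Provided the mlock is allowed by the memlock rlimit, I'd expect
probe_dontneed(0) to return 0 and probe_dontneed(1) to return EINVAL,
which is the failure the destination would hit on any re-dirtied page.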