* David Gibson (da...@gibson.dropbear.id.au) wrote: > On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > > > userfaultfd is a Linux syscall that gives an fd that receives a stream > > of notifications of accesses to pages marked as MADV_USERFAULT, and > > allows the program to acknowledge those stalls and tell the accessing > > thread to carry on. > > > > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > > [snip] > > /* > > + * Tell the kernel that we've now got some memory it previously asked for. > > + * Note: We're not allowed to ack a page which wasn't requested. > > + */ > > +static int ack_userfault(MigrationIncomingState *mis, void *start, size_t > > len) > > +{ > > + uint64_t tmp[2]; > > + > > + /* > > + * Kernel wants the range that's now safe to access > > + * Note it always takes 64bit values, even on a 32bit host. > > + */ > > + tmp[0] = (uint64_t)(uintptr_t)start; > > + tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len; > > + > > + if (write(mis->userfault_fd, tmp, 16) != 16) { > > + int e = errno; > > Is an EOF (i.e. write() returns 0) ever possible here? If so errno > may not have a meaningful value.
I don't think so; I think any !=16 case is an error; however if I understand correctly the safe thing to do is for me to do an errno = 0 before the call. > > > + if (e == ENOENT) { > > + /* Kernel said it wasn't waiting - one case where this can > > + * happen is where two threads triggered the userfault > > + * and we receive the page and ack it just after we received > > + * the 2nd request and that ends up deciding it should ack it > > + * We could optimise it out, but it's rare. > > + */ > > + /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, > > len); */ > > + return 0; > > + } > > + error_report("postcopy_ram: Failed to notify kernel for %p/%zx > > (%d)", > > + start, len, e); > > + return -errno; Hmm, and made that return -e > > +/* > > * Handle faults detected by the USERFAULT markings > > */ > > static void *postcopy_ram_fault_thread(void *opaque) > > { > > MigrationIncomingState *mis = (MigrationIncomingState *)opaque; > > + void *hostaddr; > > + int ret; > > + size_t hostpagesize = getpagesize(); > > + RAMBlock *rb = NULL; > > + RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */ > > > > - fprintf(stderr, "postcopy_ram_fault_thread\n"); > > - /* TODO: In later patch */ > > + DPRINTF("%s", __func__); > > qemu_sem_post(&mis->fault_thread_sem); > > - while (1) { > > - /* TODO: In later patch */ > > - } > > + while (true) { > > + PostcopyPMIState old_state, tmp_state; > > + ram_addr_t rb_offset; > > + ram_addr_t in_raspace; > > + unsigned long bitmap_index; > > + struct pollfd pfd[2]; > > + > > + /* > > + * We're mainly waiting for the kernel to give us a faulting HVA, > > + * however we can be told to quit via userfault_quit_fd which is > > + * an eventfd > > + */ > > + pfd[0].fd = mis->userfault_fd; > > + pfd[0].events = POLLIN; > > + pfd[0].revents = 0; > > + pfd[1].fd = mis->userfault_quit_fd; > > + pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */ > > + pfd[1].revents = 0; > > + > > + if (poll(pfd, 2, -1 /* Wait forever */) == -1) { > > + perror("userfault poll"); > > + break; > > + } > > > > + if (pfd[1].revents) { > > + DPRINTF("%s got quit event", __func__); > > + break; > > I don't see any cleanup path in the userfault thread. So wouldn't it > be simpler to just pthread_cancel() it rather than using an extra fd > for quit notifications. But it does call functions that take locks (both the pmi and the return path qemu-file), so I don't feel comfortable just cancelling the thread. I guess I could do a pthread_set_cancelstate around the top of the loop to only allow it to cancel there; is that any better than the fd? > > @@ -612,11 +814,12 @@ int postcopy_place_page(MigrationIncomingState *mis, > > void *host, void *from, > > > > if (syscall(__NR_remap_anon_pages, host, from, hps, 0) != > > getpagesize()) { > > + int e = errno; > > perror("remap_anon_pages in postcopy_place_page"); > > fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from, > > postcopy_pmi_get_state(mis, bitmap_offset)); > > > > - return -errno; > > + return -e; > > Unrelated change, should probably be folded into the patch which added > this code. Thanks, fixed. Dave > > > } > > > > tmp_state = postcopy_pmi_get_state(mis, bitmap_offset); > > @@ -629,7 +832,10 @@ int postcopy_place_page(MigrationIncomingState *mis, > > void *host, void *from, > > > > > > if (old_state == POSTCOPY_PMI_REQUESTED) { > > - /* TODO: Notify kernel */ > > + /* Send the kernel the host address that should now be accessible > > */ > > + DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p", > > + __func__, bitmap_offset, host); > > + return ack_userfault(mis, host, hps); > > } > > > > return 0; > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ > _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK