On Mon, Jan 05, 2015 at 05:13:50PM +0000, Dr. David Alan Gilbert wrote: > * David Gibson (da...@gibson.dropbear.id.au) wrote: > > On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) > > wrote: > > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > > > > > userfaultfd is a Linux syscall that gives an fd that receives a stream > > > of notifications of accesses to pages marked as MADV_USERFAULT, and > > > allows the program to acknowledge those stalls and tell the accessing > > > thread to carry on. > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > > > > [snip] > > > /* > > > + * Tell the kernel that we've now got some memory it previously asked > > > for. > > > + * Note: We're not allowed to ack a page which wasn't requested. > > > + */ > > > +static int ack_userfault(MigrationIncomingState *mis, void *start, > > > size_t len) > > > +{ > > > + uint64_t tmp[2]; > > > + > > > + /* > > > + * Kernel wants the range that's now safe to access > > > + * Note it always takes 64bit values, even on a 32bit host. > > > + */ > > > + tmp[0] = (uint64_t)(uintptr_t)start; > > > + tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len; > > > + > > > + if (write(mis->userfault_fd, tmp, 16) != 16) { > > > + int e = errno; > > > > Is an EOF (i.e. write() returns 0) ever possible here? If so errno > > may not have a meaningful value. > > I don't think so; I think any !=16 case is an error; however if I understand > correctly the safe thing to do is for me to do an > > errno = 0 > > before the call.
Either that, or handle unexpected EOF / short write as a different case. > > > > > > + if (e == ENOENT) { > > > + /* Kernel said it wasn't waiting - one case where this can > > > + * happen is where two threads triggered the userfault > > > + * and we receive the page and ack it just after we received > > > + * the 2nd request and that ends up deciding it should ack it > > > + * We could optimise it out, but it's rare. > > > + */ > > > + /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, > > > len); */ > > > + return 0; > > > + } > > > + error_report("postcopy_ram: Failed to notify kernel for %p/%zx > > > (%d)", > > > + start, len, e); > > > + return -errno; > > Hmm, and made that return -e Ah, yes, otherwise it's very likely that error_report() will clobber the value. > > > +/* > > > * Handle faults detected by the USERFAULT markings > > > */ > > > static void *postcopy_ram_fault_thread(void *opaque) > > > { > > > MigrationIncomingState *mis = (MigrationIncomingState *)opaque; > > > + void *hostaddr; > > > + int ret; > > > + size_t hostpagesize = getpagesize(); > > > + RAMBlock *rb = NULL; > > > + RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */ > > > > > > - fprintf(stderr, "postcopy_ram_fault_thread\n"); > > > - /* TODO: In later patch */ > > > + DPRINTF("%s", __func__); > > > qemu_sem_post(&mis->fault_thread_sem); > > > - while (1) { > > > - /* TODO: In later patch */ > > > - } > > > + while (true) { > > > + PostcopyPMIState old_state, tmp_state; > > > + ram_addr_t rb_offset; > > > + ram_addr_t in_raspace; > > > + unsigned long bitmap_index; > > > + struct pollfd pfd[2]; > > > + > > > + /* > > > + * We're mainly waiting for the kernel to give us a faulting HVA, > > > + * however we can be told to quit via userfault_quit_fd which is > > > + * an eventfd > > > + */ > > > + pfd[0].fd = mis->userfault_fd; > > > + pfd[0].events = POLLIN; > > > + pfd[0].revents = 0; > > > + pfd[1].fd = mis->userfault_quit_fd; > > > + pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */ > > > + pfd[1].revents = 0; > > > + > > > + if (poll(pfd, 2, -1 /* Wait forever */) == -1) { > > > + perror("userfault poll"); > > > + break; > > > + } > > > > > > + if (pfd[1].revents) { > > > + DPRINTF("%s got quit event", __func__); > > > + break; > > > > I don't see any cleanup path in the userfault thread. So wouldn't it > > be simpler to just pthread_cancel() it rather than using an extra fd > > for quit notifications. > > But it does call functions that take locks (both the pmi and the > return path qemu-file), so I don't feel comfortable just cancelling the > thread. Ah, good point. Use of an event restrict the points at which the thread can exit, which is significant. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
pgpv8PJ5yxiwk.pgp
Description: PGP signature