On Wed, Dec 15, 2021 at 10:35:31AM -0500, Jagannathan Raman wrote: > @@ -114,6 +118,62 @@ static void vfu_object_set_device(Object *obj, const > char *str, Error **errp) > vfu_object_init_ctx(o, errp); > } > > +static void vfu_object_ctx_run(void *opaque) > +{ > + VfuObject *o = opaque; > + int ret = -1; > + > + while (ret != 0) { > + ret = vfu_run_ctx(o->vfu_ctx); > + if (ret < 0) { > + if (errno == EINTR) { > + continue; > + } else if (errno == ENOTCONN) { > + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); > + o->vfu_poll_fd = -1; > + object_unparent(OBJECT(o)); > + break;
If nothing else logs a message then I think that should be done here so users know why their vfio-user server object disappeared. > + } else { > + error_setg(&error_abort, "vfu: Failed to run device %s - %s", > + o->device, strerror(errno)); error_abort is equivalent to assuming !o->daemon. In the case where the user doesn't want to automatically shut down the process we need to log a message without aborting. > + break; Indentation is off. > + } > + } > + } > +} > + > +static void vfu_object_attach_ctx(void *opaque) > +{ > + VfuObject *o = opaque; > + GPollFD pfds[1]; > + int ret; > + > + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); > + > + pfds[0].fd = o->vfu_poll_fd; > + pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR; > + > +retry_attach: > + ret = vfu_attach_ctx(o->vfu_ctx); > + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) { > + qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS); > + goto retry_attach; This can block the thread indefinitely. Other events like monitor commands are not handled in this loop. Please make this asynchronous (set an fd handler and return from this function so we can try again later). The vfu_attach_ctx() implementation synchronously negotiates the vfio-user connection :(. That's a shame because even if accept(2) is handled asynchronously, the negotiation can still block. It would be cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to avoid blocking. Is that possible? If libvfio-user can't make vfu_attach_ctx() fully async then it may be possible to spawn a thread just for the blocking vfu_attach_ctx() call and report the result back to the event loop (e.g. using EventNotifier). That adds a bunch of code to work around a blocking API though, so I guess we can leave the blocking part if necessary. At the very minimum, please make EAGAIN/EWOULDBLOCK async as mentioned above and add a comment explaining the situation with the partially-async vfu_attach_ctx() API so it's clear that this can block (that way it's clear that you're aware of the issue and this isn't by accident). > + } else if (ret < 0) { > + error_setg(&error_abort, > + "vfu: Failed to attach device %s to context - %s", > + o->device, strerror(errno)); error_abort assumes !o->daemon. Please handle the o->daemon == true case by logging an error without aborting. > + return; > + } > + > + o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx); > + if (o->vfu_poll_fd < 0) { > + error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->device); Same here. > @@ -208,6 +284,8 @@ static void vfu_object_init(Object *obj) > TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE); > return; > } > + > + o->vfu_poll_fd = -1; > } This must call qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL) when o->vfu_poll_fd != -1 to avoid leaving a dangling fd handler callback registered.
signature.asc
Description: PGP signature