> -----Original Message----- > From: Jag Raman <jag.ra...@oracle.com> > Sent: 17 December 2021 18:00 > To: Stefan Hajnoczi <stefa...@redhat.com>; John Levon > <john.le...@nutanix.com>; Thanos Makatos <thanos.maka...@nutanix.com> > Cc: qemu-devel <qemu-devel@nongnu.org>; Alex Williamson > <alex.william...@redhat.com>; Marc-André Lureau > <marcandre.lur...@gmail.com>; Philippe Mathieu-Daudé > <phi...@redhat.com>; pbonz...@redhat.com; alex.ben...@linaro.org; > th...@redhat.com; cr...@redhat.com; waine...@redhat.com; > bl...@redhat.com; Elena Ufimtseva <elena.ufimts...@oracle.com>; John > Levon <john.le...@nutanix.com>; John Johnson > <john.g.john...@oracle.com>; Thanos Makatos > <thanos.maka...@nutanix.com>; Swapnil Ingle <swapnil.in...@nutanix.com> > Subject: Re: [PATCH v4 07/14] vfio-user: run vfio-user context > > > > > On Dec 16, 2021, at 6:17 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote: > > > > On Wed, Dec 15, 2021 at 10:35:31AM -0500, Jagannathan Raman wrote: > >> @@ -114,6 +118,62 @@ static void vfu_object_set_device(Object *obj, > const char *str, Error **errp) > >> vfu_object_init_ctx(o, errp); > >> } > >> > >> +static void vfu_object_ctx_run(void *opaque) > >> +{ > >> + VfuObject *o = opaque; > >> + int ret = -1; > >> + > >> + while (ret != 0) { > >> + ret = vfu_run_ctx(o->vfu_ctx); > >> + if (ret < 0) { > >> + if (errno == EINTR) { > >> + continue; > >> + } else if (errno == ENOTCONN) { > >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); > >> + o->vfu_poll_fd = -1; > >> + object_unparent(OBJECT(o)); > >> + break; > > > > If nothing else logs a message then I think that should be done here so > > users know why their vfio-user server object disappeared. > > Sure will do. > > Do you prefer a trace, or a message to the console? Trace makes sense to me. > Presently, the client could unplug the vfio-user device which would trigger > the > deletion of this object. This process could happen quietly. > > > > >> + } else { > >> + error_setg(&error_abort, "vfu: Failed to run device %s - > >> %s", > >> + o->device, strerror(errno)); > > > > error_abort is equivalent to assuming !o->daemon. In the case where the > > user doesn't want to automatically shut down the process we need to log > > a message without aborting. > > OK, makes sense. > > > > >> + break; > > > > Indentation is off. > > > >> + } > >> + } > >> + } > >> +} > >> + > >> +static void vfu_object_attach_ctx(void *opaque) > >> +{ > >> + VfuObject *o = opaque; > >> + GPollFD pfds[1]; > >> + int ret; > >> + > >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); > >> + > >> + pfds[0].fd = o->vfu_poll_fd; > >> + pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR; > >> + > >> +retry_attach: > >> + ret = vfu_attach_ctx(o->vfu_ctx); > >> + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) { > >> + qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS); > >> + goto retry_attach; > > > > This can block the thread indefinitely. Other events like monitor > > commands are not handled in this loop. Please make this asynchronous > > (set an fd handler and return from this function so we can try again > > later). > > > > The vfu_attach_ctx() implementation synchronously negotiates the > > vfio-user connection :(. That's a shame because even if accept(2) is > > handled asynchronously, the negotiation can still block. It would be > > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to > > avoid blocking. Is that possible? > > Thanos / John, > > Any thoughts on this?
I'm discussing this with John and FYI there are other places where libvfio-user can block, e.g. sending a response or receiving a command. Is it just the negotiation you want it to be asynchronous or _all_ libvfio-user operations? Making libvfio-user fully asynchronous might require a substantial API rewrite. > > > > > If libvfio-user can't make vfu_attach_ctx() fully async then it may be > > possible to spawn a thread just for the blocking vfu_attach_ctx() call > > and report the result back to the event loop (e.g. using EventNotifier). > > That adds a bunch of code to work around a blocking API though, so I > > guess we can leave the blocking part if necessary. > > > > At the very minimum, please make EAGAIN/EWOULDBLOCK async as > mentioned > > above and add a comment explaining the situation with the > > partially-async vfu_attach_ctx() API so it's clear that this can block > > (that way it's clear that you're aware of the issue and this isn't by > > accident). > > Sure, we could make the attach async at QEMU depending on how the > library prefers to do this. > > > > >> + } else if (ret < 0) { > >> + error_setg(&error_abort, > >> + "vfu: Failed to attach device %s to context - %s", > >> + o->device, strerror(errno)); > > > > error_abort assumes !o->daemon. Please handle the o->daemon == true > > case by logging an error without aborting. > > > >> + return; > >> + } > >> + > >> + o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx); > >> + if (o->vfu_poll_fd < 0) { > >> + error_setg(&error_abort, "vfu: Failed to get poll fd %s", > >> o->device); > > > > Same here. > > > >> @@ -208,6 +284,8 @@ static void vfu_object_init(Object *obj) > >> TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE); > >> return; > >> } > >> + > >> + o->vfu_poll_fd = -1; > >> } > > > > This must call qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL) > > when o->vfu_poll_fd != -1 to avoid leaving a dangling fd handler > > callback registered. > > This is during the init phase, and the FD handlers are not set. Do you mean > to add this at finalize? > > I agree it’s good to explicitly add this at finalize. But vfu_destroy_ctx() > should > trigger a ENOTCONN, which would do it anyway. > > Thank you! > -- > Jag