> On Dec 16, 2021, at 6:17 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote: > > On Wed, Dec 15, 2021 at 10:35:31AM -0500, Jagannathan Raman wrote: >> @@ -114,6 +118,62 @@ static void vfu_object_set_device(Object *obj, const >> char *str, Error **errp) >> vfu_object_init_ctx(o, errp); >> } >> >> +static void vfu_object_ctx_run(void *opaque) >> +{ >> + VfuObject *o = opaque; >> + int ret = -1; >> + >> + while (ret != 0) { >> + ret = vfu_run_ctx(o->vfu_ctx); >> + if (ret < 0) { >> + if (errno == EINTR) { >> + continue; >> + } else if (errno == ENOTCONN) { >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); >> + o->vfu_poll_fd = -1; >> + object_unparent(OBJECT(o)); >> + break; > > If nothing else logs a message then I think that should be done here so > users know why their vfio-user server object disappeared.
Sure will do. Do you prefer a trace, or a message to the console? Trace makes sense to me. Presently, the client could unplug the vfio-user device which would trigger the deletion of this object. This process could happen quietly. > >> + } else { >> + error_setg(&error_abort, "vfu: Failed to run device %s - >> %s", >> + o->device, strerror(errno)); > > error_abort is equivalent to assuming !o->daemon. In the case where the > user doesn't want to automatically shut down the process we need to log > a message without aborting. OK, makes sense. > >> + break; > > Indentation is off. > >> + } >> + } >> + } >> +} >> + >> +static void vfu_object_attach_ctx(void *opaque) >> +{ >> + VfuObject *o = opaque; >> + GPollFD pfds[1]; >> + int ret; >> + >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); >> + >> + pfds[0].fd = o->vfu_poll_fd; >> + pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR; >> + >> +retry_attach: >> + ret = vfu_attach_ctx(o->vfu_ctx); >> + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) { >> + qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS); >> + goto retry_attach; > > This can block the thread indefinitely. Other events like monitor > commands are not handled in this loop. Please make this asynchronous > (set an fd handler and return from this function so we can try again > later). > > The vfu_attach_ctx() implementation synchronously negotiates the > vfio-user connection :(. That's a shame because even if accept(2) is > handled asynchronously, the negotiation can still block. It would be > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to > avoid blocking. Is that possible? Thanos / John, Any thoughts on this? > > If libvfio-user can't make vfu_attach_ctx() fully async then it may be > possible to spawn a thread just for the blocking vfu_attach_ctx() call > and report the result back to the event loop (e.g. using EventNotifier). > That adds a bunch of code to work around a blocking API though, so I > guess we can leave the blocking part if necessary. > > At the very minimum, please make EAGAIN/EWOULDBLOCK async as mentioned > above and add a comment explaining the situation with the > partially-async vfu_attach_ctx() API so it's clear that this can block > (that way it's clear that you're aware of the issue and this isn't by > accident). Sure, we could make the attach async at QEMU depending on how the library prefers to do this. > >> + } else if (ret < 0) { >> + error_setg(&error_abort, >> + "vfu: Failed to attach device %s to context - %s", >> + o->device, strerror(errno)); > > error_abort assumes !o->daemon. Please handle the o->daemon == true > case by logging an error without aborting. > >> + return; >> + } >> + >> + o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx); >> + if (o->vfu_poll_fd < 0) { >> + error_setg(&error_abort, "vfu: Failed to get poll fd %s", >> o->device); > > Same here. > >> @@ -208,6 +284,8 @@ static void vfu_object_init(Object *obj) >> TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE); >> return; >> } >> + >> + o->vfu_poll_fd = -1; >> } > > This must call qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL) > when o->vfu_poll_fd != -1 to avoid leaving a dangling fd handler > callback registered. This is during the init phase, and the FD handlers are not set. Do you mean to add this at finalize? I agree it’s good to explicitly add this at finalize. But vfu_destroy_ctx() should trigger a ENOTCONN, which would do it anyway. Thank you! -- Jag