> -----Original Message-----
> From: Jag Raman <jag.ra...@oracle.com>
> Sent: 17 December 2021 18:00
> To: Stefan Hajnoczi <stefa...@redhat.com>; John Levon
> <john.le...@nutanix.com>; Thanos Makatos <thanos.maka...@nutanix.com>
> Cc: qemu-devel <qemu-devel@nongnu.org>; Alex Williamson
> <alex.william...@redhat.com>; Marc-André Lureau
> <marcandre.lur...@gmail.com>; Philippe Mathieu-Daudé
> <phi...@redhat.com>; pbonz...@redhat.com; alex.ben...@linaro.org;
> th...@redhat.com; cr...@redhat.com; waine...@redhat.com;
> bl...@redhat.com; Elena Ufimtseva <elena.ufimts...@oracle.com>; John
> Levon <john.le...@nutanix.com>; John Johnson
> <john.g.john...@oracle.com>; Thanos Makatos
> <thanos.maka...@nutanix.com>; Swapnil Ingle <swapnil.in...@nutanix.com>
> Subject: Re: [PATCH v4 07/14] vfio-user: run vfio-user context
> 
> 
> 
> > On Dec 16, 2021, at 6:17 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >
> > On Wed, Dec 15, 2021 at 10:35:31AM -0500, Jagannathan Raman wrote:
> >> @@ -114,6 +118,62 @@ static void vfu_object_set_device(Object *obj,
> const char *str, Error **errp)
> >>     vfu_object_init_ctx(o, errp);
> >> }
> >>
> >> +static void vfu_object_ctx_run(void *opaque)
> >> +{
> >> +    VfuObject *o = opaque;
> >> +    int ret = -1;
> >> +
> >> +    while (ret != 0) {
> >> +        ret = vfu_run_ctx(o->vfu_ctx);
> >> +        if (ret < 0) {
> >> +            if (errno == EINTR) {
> >> +                continue;
> >> +            } else if (errno == ENOTCONN) {
> >> +                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> >> +                o->vfu_poll_fd = -1;
> >> +                object_unparent(OBJECT(o));
> >> +                break;
> >
> > If nothing else logs a message then I think that should be done here so
> > users know why their vfio-user server object disappeared.
> 
> Sure will do.
> 
> Do you prefer a trace, or a message to the console? Trace makes sense to me.
> Presently, the client could unplug the vfio-user device which would trigger 
> the
> deletion of this object. This process could happen quietly.
> 
> >
> >> +            } else {
> >> +                error_setg(&error_abort, "vfu: Failed to run device %s - 
> >> %s",
> >> +                           o->device, strerror(errno));
> >
> > error_abort is equivalent to assuming !o->daemon. In the case where the
> > user doesn't want to automatically shut down the process we need to log
> > a message without aborting.
> 
> OK, makes sense.
> 
> >
> >> +                 break;
> >
> > Indentation is off.
> >
> >> +            }
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +static void vfu_object_attach_ctx(void *opaque)
> >> +{
> >> +    VfuObject *o = opaque;
> >> +    GPollFD pfds[1];
> >> +    int ret;
> >> +
> >> +    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> >> +
> >> +    pfds[0].fd = o->vfu_poll_fd;
> >> +    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> >> +
> >> +retry_attach:
> >> +    ret = vfu_attach_ctx(o->vfu_ctx);
> >> +    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
> >> +        qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
> >> +        goto retry_attach;
> >
> > This can block the thread indefinitely. Other events like monitor
> > commands are not handled in this loop. Please make this asynchronous
> > (set an fd handler and return from this function so we can try again
> > later).
> >
> > The vfu_attach_ctx() implementation synchronously negotiates the
> > vfio-user connection :(. That's a shame because even if accept(2) is
> > handled asynchronously, the negotiation can still block. It would be
> > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to
> > avoid blocking. Is that possible?
> 
> Thanos / John,
> 
>     Any thoughts on this?

I'm discussing this with John and FYI there are other places where libvfio-user 
can block, e.g. sending a response or receiving a command. Is it just the 
negotiation you want it to be asynchronous or _all_ libvfio-user operations? 
Making libvfio-user fully asynchronous might require a substantial API rewrite.

> 
> >
> > If libvfio-user can't make vfu_attach_ctx() fully async then it may be
> > possible to spawn a thread just for the blocking vfu_attach_ctx() call
> > and report the result back to the event loop (e.g. using EventNotifier).
> > That adds a bunch of code to work around a blocking API though, so I
> > guess we can leave the blocking part if necessary.
> >
> > At the very minimum, please make EAGAIN/EWOULDBLOCK async as
> mentioned
> > above and add a comment explaining the situation with the
> > partially-async vfu_attach_ctx() API so it's clear that this can block
> > (that way it's clear that you're aware of the issue and this isn't by
> > accident).
> 
> Sure, we could make the attach async at QEMU depending on how the
> library prefers to do this.
> 
> >
> >> +    } else if (ret < 0) {
> >> +        error_setg(&error_abort,
> >> +                   "vfu: Failed to attach device %s to context - %s",
> >> +                   o->device, strerror(errno));
> >
> > error_abort assumes !o->daemon. Please handle the o->daemon == true
> > case by logging an error without aborting.
> >
> >> +        return;
> >> +    }
> >> +
> >> +    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
> >> +    if (o->vfu_poll_fd < 0) {
> >> +        error_setg(&error_abort, "vfu: Failed to get poll fd %s", 
> >> o->device);
> >
> > Same here.
> >
> >> @@ -208,6 +284,8 @@ static void vfu_object_init(Object *obj)
> >>                    TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
> >>         return;
> >>     }
> >> +
> >> +    o->vfu_poll_fd = -1;
> >> }
> >
> > This must call qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL)
> > when o->vfu_poll_fd != -1 to avoid leaving a dangling fd handler
> > callback registered.
> 
> This is during the init phase, and the FD handlers are not set. Do you mean
> to add this at finalize?
> 
> I agree it’s good to explicitly add this at finalize. But vfu_destroy_ctx() 
> should
> trigger a ENOTCONN, which would do it anyway.
> 
> Thank you!
> --
> Jag

Reply via email to