On Thu, Jul 26, 2018 at 02:45:39PM -0600, Alex Williamson wrote: > On Mon, 23 Jul 2018 12:59:56 +0800 > Tiwei Bie <tiwei....@intel.com> wrote: > [...] > > > > +static int vhost_user_slave_handle_vfio_group(struct vhost_dev *dev, > > + int *fd) > > +{ > > + struct vhost_user *u = dev->opaque; > > + VhostUserState *user = u->user; > > + VirtIODevice *vdev = dev->vdev; > > + int groupfd = fd[0]; > > + VFIOGroup *group; > > + > > + if (!virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_VFIO_GROUP) || > > + vdev == NULL) { > > + return -1; > > + } > > + > > + if (user->vfio_group) { > > + vfio_put_group(user->vfio_group); > > + user->vfio_group = NULL; > > + } > > +
There should be a below check here (I missed it in this patch, sorry..): if (groupfd < 0) { return 0; } > > + group = vfio_get_group_from_fd(groupfd, vdev->dma_as, NULL); > > + if (group == NULL) { > > + return -1; > > + } > > + > > + if (group->fd != groupfd) { > > + close(groupfd); > > + } > > + > > + user->vfio_group = group; > > + fd[0] = -1; > > + > > + return 0; > > +} > > This all looks very sketchy, we're reaching into vfio internal state > and arbitrarily releasing data structures, reusing existing ones, and > maybe creating new ones. We know that a vfio group only allows a > single open, right? Yeah, exactly! When the vhost-user backend process has opened a VFIO group fd, QEMU won't be able to open this group file via open() any more. So vhost-user backend process will share the VFIO group fd to QEMU via the UNIX socket. And that's why we're introducing a new API: vfio_get_group_from_fd(groupfd, ...); for vfio/common.c to get (setup) a VFIOGroup structure. (Because in this case, vfio/common.c can't open this group file via open(), and what we have is a VFIO group fd shared by another process via UNIX socket). And ... > So if you have a groupfd, vfio will never have > that same group opened already. ... if the same vhost-user backend process shares the same VFIO group fd more than one times via different vhost-user slave channels to different vhost-user frontends in the same QEMU process, the same VFIOGroup could exist already. This case could happen when e.g. there are two vhost accelerator devices in the same VFIO group, and they are managed by the same vhost-user backend process, and used to accelerate two different virtio devices in QEMU. In this case, the same VFIO group fd could be sent twice. If we need to support this case, more changes in vfio/common.c will be needed, e.g. 1) add a refcnt in VFIOGroup to support getting and putting a VFIOGroup structure multiple times, 2) add a lock to make vfio_get_group*() and vfio_put_group() thread-safe. > Is that the goal? It's not obvious in > the code. I don't really understand what vhost goes on to do with this > group, The goal is that, we have a VFIO group opened by an external vhost-user backend process, this process will manage the VFIO device, but want QEMU to manage the VFIO group, including: 1 - program the DMA mappings for this VFIO group based on the front-end device's dma_as in QEMU. 2 - add this VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD). Vhost-user in QEMU as the vhost-user frontend just wants hw/vfio to do above things after receiving a VFIO group fd from the vhost-user backend process. Vhost-user will just ask hw/vfio to get (setup) a VFIOGroup by calling vfio_get_group_from_fd() or put this VFIOGroup by calling vfio_put_group() when it's not needed anymore, and won't do anything else. Vhost-user will only deal with the VFIOGroups allocated by itself, and won't influence any other VFIOGroups used by the VFIO passthru devices (-device vfio-pci). Because the same VFIO group file can only be opened by one process via open(). > but I'm pretty sure the existing state machine in vfio isn't > designed for it. Thanks, Yeah, more changes (e.g. refcnt, ..) in hw/vfio are needed to make it work robustly. Hmm.. Is above idea (e.g. how vhost-user is going to use VFIOGroup and how we are going to extend hw/vfio) acceptable to you? Thanks! Best regards, Tiwei Bie