On Mon, 27 Mar 2023 02:34:58 -0700
Yi Liu <yi.l....@intel.com> wrote:

> This is a preparation for vfio device cdev as cdev gives userspace the
> capability to open device cdev fd and management stack (e.g. libvirt)
> could pass the device fd to the actual user (e.g. QEMU). As a result,
> the actual user has no idea about the device's bus:devfn information.
> This is a problem when user uses VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to
> know the hot reset affected device scope as this ioctl returns bus:devfn
> info. For the fd passing usage, the acutal user cannot map the bus:devfn
> to the devices it has opened via the fd passed from management stack. So
> a new ioctl is required.
> 
> This new ioctl reports the list of iommufd dev_id that is opened by the
> user. If there is affected device that is not bound to vfio driver or
> opened by another user, this command shall fail with -EPERM. For the
> noiommu mode in the vfio device cdev path, this shall fail as no dev_id
> would be generated, hence nothing to report.
> 
> This ioctl is useless to the users that open vfio group as such users
> have no idea about the iommufd dev_id and it can use the existing
> VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. The user that uses the traditional
> mode vfio group/container would be failed if invoking this ioctl. But
> the user that uses the iommufd compat mode vfio group/container shall
> succeed. This is harmless as long as user cannot make use of it and
> should use VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.


So VFIO_DEVICE_GET_PCI_HOT_RESET_INFO reports a group and bdf, but
VFIO_DEVICE_GET_PCI_HOT_RESET_*GROUP*_INFO is meant for the non-group,
cdev use case and returns a dev_id rather than a group???

Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
isn't used, why do we need a new ioctl vs defining
VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  In fact, we could define
vfio_dependent_device as:

struct vfio_pci_dependent_device {
        union {
                __u32   group_id;
                __u32   dev_id;
        }
        __u16   segment;
        __u8    bus;
        __u8    devfn;
};

If the user calls with the above flag, dev_id is valid, otherwise
group_id.  Perhaps segment:buus:devfn could still be filled in with a
NULL/invalid dev_id if the user doesn't have permissions for the device
so that debugging from userspace isn't so opaque.  Thanks,

Alex
 
> Signed-off-by: Yi Liu <yi.l....@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 98 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h        | 47 +++++++++++++++
>  2 files changed, 145 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> b/drivers/vfio/pci/vfio_pci_core.c
> index 19f5b075d70a..45edf4e9b98b 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct 
> vfio_pci_core_device *vdev,
>       return ret;
>  }
>  
> +static struct pci_dev *
> +vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
> +
> +static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
> +     struct vfio_pci_core_device *vdev,
> +     struct vfio_pci_hot_reset_group_info __user *arg)
> +{
> +     unsigned long minsz =
> +             offsetofend(struct vfio_pci_hot_reset_group_info, count);
> +     struct vfio_pci_hot_reset_group_info hdr;
> +     struct iommufd_ctx *iommufd, *cur_iommufd;
> +     u32 count = 0, index = 0, *devices = NULL;
> +     struct vfio_pci_core_device *cur;
> +     bool slot = false;
> +     int ret = 0;
> +
> +     if (copy_from_user(&hdr, arg, minsz))
> +             return -EFAULT;
> +
> +     if (hdr.argsz < minsz)
> +             return -EINVAL;
> +
> +     hdr.flags = 0;
> +
> +     /* Can we do a slot or bus reset or neither? */
> +     if (!pci_probe_reset_slot(vdev->pdev->slot))
> +             slot = true;
> +     else if (pci_probe_reset_bus(vdev->pdev->bus))
> +             return -ENODEV;
> +
> +     mutex_lock(&vdev->vdev.dev_set->lock);
> +     if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {
> +             ret = -EPERM;
> +             goto out_unlock;
> +     }
> +
> +     iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> +     if (!iommufd) {
> +             ret = -EPERM;
> +             goto out_unlock;
> +     }
> +
> +     /* How many devices are affected? */
> +     ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
> +                                         &count, slot);
> +     if (ret)
> +             goto out_unlock;
> +
> +     WARN_ON(!count); /* Should always be at least one */
> +
> +     /*
> +      * If there's enough space, fill it now, otherwise return -ENOSPC and
> +      * the number of devices affected.
> +      */
> +     if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
> +             ret = -ENOSPC;
> +             hdr.count = count;
> +             goto reset_info_exit;
> +     }
> +
> +     devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> +     if (!devices) {
> +             ret = -ENOMEM;
> +             goto reset_info_exit;
> +     }
> +
> +     list_for_each_entry(cur, &vdev->vdev.dev_set->device_list, 
> vdev.dev_set_list) {
> +             cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> +             if (cur->vdev.open_count) {
> +                     if (cur_iommufd != iommufd) {
> +                             ret = -EPERM;
> +                             break;
> +                     }
> +                     ret = vfio_iommufd_physical_devid(&cur->vdev, 
> &devices[index]);
> +                     if (ret)
> +                             break;
> +                     index++;
> +             }
> +     }
> +
> +reset_info_exit:
> +     if (copy_to_user(arg, &hdr, minsz))
> +             ret = -EFAULT;
> +
> +     if (!ret) {
> +             if (copy_to_user(&arg->devices, devices,
> +                              hdr.count * sizeof(*devices)))
> +                     ret = -EFAULT;
> +     }
> +
> +     kfree(devices);
> +out_unlock:
> +     mutex_unlock(&vdev->vdev.dev_set->lock);
> +     return ret;
> +}
> +
>  static int vfio_pci_ioctl_get_pci_hot_reset_info(
>       struct vfio_pci_core_device *vdev,
>       struct vfio_pci_hot_reset_info __user *arg)
> @@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, 
> unsigned int cmd,
>               return vfio_pci_ioctl_get_irq_info(vdev, uarg);
>       case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
>               return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
> +     case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
> +             return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev, uarg);
>       case VFIO_DEVICE_GET_REGION_INFO:
>               return vfio_pci_ioctl_get_region_info(vdev, uarg);
>       case VFIO_DEVICE_IOEVENTFD:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 25432ef213ee..61b801dfd40b 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
>  
>  #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO   _IO(VFIO_TYPE, VFIO_BASE + 12)
>  
> +/**
> + * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 
> 12,
> + *                                               struct 
> vfio_pci_hot_reset_group_info)
> + *
> + * This is used in the vfio device cdev mode.  It returns the list of
> + * affected devices (represented by iommufd dev_id) when hot reset is
> + * issued on the current device with which this ioctl is invoked.  It
> + * only includes the devices that are opened by the current user in the
> + * time of this command is invoked.  This list may change when user opens
> + * new device or close opened device, hence user should re-invoke it
> + * after open/close devices.  This command has no guarantee on the result
> + * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected device can
> + * be by other users in the window between the two ioctls.  If the affected
> + * devices are opened by multiple users, the VFIO_DEVICE_PCI_HOT_RESET
> + * shall fail, detail can check the description of VFIO_DEVICE_PCI_HOT_RESET.
> + *
> + * For the users that open vfio group/container, this ioctl is useless as
> + * they have no idea about the iommufd dev_id returned by this ioctl.  For
> + * the users of the traditional mode vfio group/container, this ioctl will
> + * fail as this mode does not use iommufd hence no dev_id to report back.
> + * For the users of the iommufd compat mode vfio group/container, this ioctl
> + * would succeed as this mode uses iommufd as container fd.  But such users
> + * still have no idea about the iommufd dev_id as the dev_id is only stored
> + * in kernel in this mode.  For the users of the vfio group/container, the
> + * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know the hot reset
> + * affected devices.
> + *
> + * Return: 0 on success, -errno on failure:
> + *   -enospc = insufficient buffer;
> + *   -enodev = unsupported for device;
> + *   -eperm = no permission for device, this error comes:
> + *            - when there are affected devices that are opened but
> + *              not bound to the same iommufd with the current device
> + *              with which this ioctl is invoked,
> + *            - there are affected devices that are not bound to vfio
> + *              driver yet.
> + *            - no valid iommufd is bound (e.g. noiommu mode)
> + */
> +struct vfio_pci_hot_reset_group_info {
> +     __u32   argsz;
> +     __u32   flags;
> +     __u32   count;
> +     __u32   devices[];
> +};
> +
> +#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO     _IO(VFIO_TYPE, 
> VFIO_BASE + 18)
> +
>  /**
>   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>   *                               struct vfio_pci_hot_reset)

Reply via email to