On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: > On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: > > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > > > So far there were 2 limitations enforced on an emulated PHB > > > regarding VFIO: > > > 1) only one IOMMU group per IOMMU container was allowed and > > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; > > > 2) only one IOMMU container per PHB was allowed as a group > > > can only be attached to one container. > > > > > > However these are not really necessary so we are removing them here. > > > > > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > > > not to receive it. Instead of getting a container from a group ID, > > > this calls ioctl() for every container attached to the PHB address space. > > > This allows multiple containers on the same PHB which means multiple > > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > > > to be passed to every container separately which will affect > > > the performance. For 32bit devices it is still recommended to put > > > every group to a separate PHB. > > > > > > Since the existing VFIO code is already trying to share a container for > > > multiple groups, just removing a group id from > > > the vfio_container_do_ioctl() allows the existing code to share > > > a container if it is supported by the host kernel. > > > > > > This replaces the check for a group id to be set correctly with > > > the check that it is not set. > > > > > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > > > members is accessed here. > > > > > > Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> > > > --- > > > hw/ppc/spapr_pci.c | 10 +++++----- > > > hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > > > hw/vfio/common.c | 22 ++++++---------------- > > > include/hw/vfio/vfio.h | 3 +-- > > > 4 files changed, 18 insertions(+), 34 deletions(-) > > > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > > index 4e289cb..1b7559d 100644 > > > --- a/hw/ppc/spapr_pci.c > > > +++ b/hw/ppc/spapr_pci.c > > > @@ -810,11 +810,6 @@ static int > > > spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) > > > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, > > > sphb); > > > > > > if (sphb->vfio_num) { > > > - if (sphb->iommugroupid == -1) { > > > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); > > > - return -1; > > > - } > > > - > > > ret = spapr_phb_vfio_dma_capabilities_update(sphb); > > > if (ret) { > > > error_report("Unable to get DMA32 info from VFIO"); > > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, > > > Error **errp) > > > PCIBus *bus; > > > uint64_t msi_window_size = 4096; > > > > > > + if ((sphb->iommugroupid != -1) && > > > + object_dynamic_cast(OBJECT(sphb), > > > TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { > > > + error_report("Warning: iommugroupid is deprecated and will be > > > ignored"); > > > + } > > > + > > > if (sphb->index != (uint32_t)-1) { > > > hwaddr windows_base; > > > > > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > > > index f94d8a4..48137d5 100644 > > > --- a/hw/ppc/spapr_pci_vfio.c > > > +++ b/hw/ppc/spapr_pci_vfio.c > > > @@ -35,7 +35,7 @@ int > > > spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) > > > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; > > > int ret; > > > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > + ret = vfio_container_ioctl(&sphb->iommu_as, > > > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > > > if (ret) { > > > return ret; > > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) > > > .op = VFIO_EEH_PE_ENABLE > > > }; > > > > > > - vfio_container_ioctl(&sphb->iommu_as, > > > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > > > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > } > > > > > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_HW_ERROR; > > > } > > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, > > > int *state) > > > int ret; > > > > > > op.op = VFIO_EEH_PE_GET_STATE; > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int > > > option) > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_HW_ERROR; > > > } > > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) > > > int ret; > > > > > > op.op = VFIO_EEH_PE_CONFIGURE; > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > index 85ee9b0..a00644e 100644 > > > --- a/hw/vfio/common.c > > > +++ b/hw/vfio/common.c > > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > > > close(vbasedev->fd); > > > } > > > > > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > > > - int req, void *param) > > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void > > > *param) > > > { > > > - VFIOGroup *group; > > > VFIOContainer *container; > > > int ret = -1; > > > + VFIOAddressSpace *space = vfio_get_address_space(as); > > > > > > - group = vfio_get_group(groupid, as); > > > - if (!group) { > > > - error_report("vfio: group %d not registered", groupid); > > > - return ret; > > > - } > > > - > > > - container = group->container; > > > - if (group->container) { > > > + QLIST_FOREACH(container, &space->containers, next) { > > > ret = ioctl(container->fd, req, param); > > > if (ret < 0) { > > > error_report("vfio: failed to ioctl %d to container: ret=%d, > > > %s", > > > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); > > > + return -errno; > > > } > > > } > > > > This highlights how terrible this ioctl passthrough interface is; what's > > a caller supposed to do on error? Here we don't have visibility into > > the ioctl being called in order to do any unwind on error. The caller > > doesn't have the context to unwind only the failed containers. Is > > returning errno here really sufficient for anyone to figure out how to > > proceed or should we just cut our loses and abort()? What's the least > > horrible way we can realistically handle a failure here? Thanks, > > It seemed like I asked this before and it turns out that I did: > > https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html > > "Gavin says yes" is not really a supportable solution when I stumbled on > it again and David doesn't know why it's safe either (nor does it answer > my question of how does this work). At a minimum, the reasoning why > this is safe for the ioctls we currently allow here needs to be spelled > out in a comment. We could easily make the mistake of adding another > ioctl to the passthrough for which those assumptions are not valid. We > may even want to abort if it's not one of the ioctls we've evaluated so > we don't overlook this problem later. Thanks,
Yeah, you're right. This is a mess. What we need to do here is to make vfio_container_ioctl() take a VFIOContainer object as the name suggests. The the callers will need to either use it on a specific container, or iterate across all the containers in the VFIOAddressSpace as appropriate for the specific operation. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
pgps60Dn9Gg9k.pgp
Description: PGP signature