Hi Jean, > -----Original Message----- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Friday, July 07, 2017 8:50 PM > To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric > <eric.au...@redhat.com>; eric.auger....@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > On 07/07/17 12:36, Bharat Bhushan wrote: > >>> In this proposal, QEMU reserves a iova-range for guest (not host) and > guest > >> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While > this > >> does not change host interface and it will continue to use host reserved > >> mapping for actual interrupt generation, no? > >> But then userspace needs to provide IOMMU_RESV_MSI range to guest > >> kernel, right? What would be the proposed manner? > > > > Just an opinion, we can define feature > (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command > (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call > during initialization and store the value. This value will just replace > MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. Rest will remain same > in virtio-iommu driver. > > Yes I had something similar in mind, although more generic since we'll > need to get other bits of information from the device in future extensions > (fault handling, page table formats and dynamic reserves of memory for > SVM), and maybe also for finding out per-address-space page granularity > (see my reply of patch 3/8). These are per-endpoint properties that cannot > be advertise in the virtio config space. > > *** > > So I propose to add a per-endpoint probing mechanism on the request > queue:
What is per-endpoint? Is it "per-pci/platform-device"? > > * The device advertises a new command VIRTIO_IOMMU_T_PROBE with > feature > bit VIRTIO_IOMMU_F_PROBE. > * When this feature is advertised, the device sets probe_size field in the > the config space. Probably I did not get how virtio-iommu device emulation decides value of "probe_size", can you share more info? > * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send an > VIRTIO_IOMMU_T_PROBE request for each new endpoint. > * The driver allocates a device-writeable buffer of probe_size (plus > framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. > * The device fills the buffer with various information. > > struct virtio_iommu_req_probe { > /* device-readable */ > struct virtio_iommu_req_head head; > le32 device; > le32 flags; > > /* maybe also le32 content_size, but it must be equal to > probe_size */ Can you please describe why we need to pass size of "probe_size" in probe request? > > /* device-writeable */ > u8 content[]; I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device? > struct virtio_iommu_req_tail tail; > }; > > I'm still struggling with the content and layout of the probe request, and > would appreciate any feedback. To be easily extended, I think it should > contain a list of fields of variable size: > > |0 15|16 31|32 N| > | type | length | values | > > 'length' might be made optional if it can be deduced from type, but might > make driver-side parsing more robust. > > The probe could either be done for each endpoint, or for each address > space. I much prefer endpoint because it is the smallest granularity. The > driver can then decide what endpoints to put together in the same address > space based on their individual capabilities. The specification would > described how each endpoint property is combined when endpoints are put > in > the same address space. For example, take the minimum of all PASID size, > the maximum of all page granularities, combine doorbell addresses, etc. > > If we did the probe on address spaces instead, the driver would have to > re-send a probe request each time a new endpoint is attached to an > existing address space, to see if it is still capable of page table > handover or if the driver just combined a VFIO and an emulated endpoint by > accident. > > *** > > Using this framework, the device can declare doorbell regions by adding > one or more RESV fields into the probe buffer: > > /* 'type' */ > #define VIRTIO_IOMMU_PROBE_T_RESV 0x1 > > /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */ > struct virtio_iommu_probe_resv { > le64 gpa; > le64 size; > > #define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1 > u8 type; type is 16 bit above? > }; > > Such a region would be subject to the following rules: > > * Driver should not use any IOVA declared as RESV_MSI in a mapping. > * Device should leave any transaction matching a RESV_MSI region pass > through untranslated. > * If the device does not advertise any RESV region, then the driver should > assume that MSI doorbells, like any other GPA, must be mapped with an > arbitrary IOVA in order for the endpoint to access them. > * Given that the driver *should* perform a probe request if available, and > it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this > field tells the guest how it should handle MSI doorbells, and whether it > should map the address via MAP requests or not. > > Does this make sense and did I overlook something? Overall it looks good to me. Do you have plans to implements this in virtio-iommu driver and kvmtool? Thanks -Bharat > > Thanks, > Jean