Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

Bharat Bhushan Mon, 10 Jul 2017 22:55:51 -0700

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, July 07, 2017 8:50 PM
> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric
> <eric.au...@redhat.com>; eric.auger....@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 07/07/17 12:36, Bharat Bhushan wrote:
> >>> In this proposal, QEMU reserves a iova-range for guest (not host) and
> guest
> >> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While
> this
> >> does not change host interface and it will continue to use host reserved
> >> mapping for actual interrupt generation, no?
> >> But then userspace needs to provide IOMMU_RESV_MSI range to guest
> >> kernel, right? What would be the proposed manner?
> >
> > Just an opinion, we can define feature
> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
> (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call
> during initialization and store the value. This value will just replace
> MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. Rest will remain same
> in virtio-iommu driver.
> 
> Yes I had something similar in mind, although more generic since we'll
> need to get other bits of information from the device in future extensions
> (fault handling, page table formats and dynamic reserves of memory for
> SVM), and maybe also for finding out per-address-space page granularity
> (see my reply of patch 3/8). These are per-endpoint properties that cannot
> be advertise in the virtio config space.
> 
>                                  ***
> 
> So I propose to add a per-endpoint probing mechanism on the request
> queue:


What is per-endpoint? Is it "per-pci/platform-device"?

> 
> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
> feature
> bit VIRTIO_IOMMU_F_PROBE.
> * When this feature is advertised, the device sets probe_size field in the
> the config space.

Probably I did not get how virtio-iommu device emulation decides value of 
"probe_size", can you share more info?

> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send an
> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> * The driver allocates a device-writeable buffer of probe_size (plus
> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> * The device fills the buffer with various information.
> 
> struct virtio_iommu_req_probe {
>       /* device-readable */
>       struct virtio_iommu_req_head head;
>       le32 device;
>       le32 flags;
> 
>       /* maybe also le32 content_size, but it must be equal to
>          probe_size */

Can you please describe why we need to pass size of "probe_size" in probe 
request?

> 
>       /* device-writeable */
>       u8 content[];

I assume content_size above is the size of array "content[]" and max value can 
be equal to probe_size advertised by device?

>       struct virtio_iommu_req_tail tail;
> };
> 
> I'm still struggling with the content and layout of the probe request, and
> would appreciate any feedback. To be easily extended, I think it should
> contain a list of fields of variable size:
> 
>       |0           15|16           31|32               N|
>       |     type     |    length     |      values      |
> 
> 'length' might be made optional if it can be deduced from type, but might
> make driver-side parsing more robust.
> 
> The probe could either be done for each endpoint, or for each address
> space. I much prefer endpoint because it is the smallest granularity. The
> driver can then decide what endpoints to put together in the same address
> space based on their individual capabilities. The specification would
> described how each endpoint property is combined when endpoints are put
> in
> the same address space. For example, take the minimum of all PASID size,
> the maximum of all page granularities, combine doorbell addresses, etc.
> 
> If we did the probe on address spaces instead, the driver would have to
> re-send a probe request each time a new endpoint is attached to an
> existing address space, to see if it is still capable of page table
> handover or if the driver just combined a VFIO and an emulated endpoint by
> accident.
> 
>                                  ***
> 
> Using this framework, the device can declare doorbell regions by adding
> one or more RESV fields into the probe buffer:
> 
> /* 'type' */
> #define VIRTIO_IOMMU_PROBE_T_RESV     0x1
> 
> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
> struct virtio_iommu_probe_resv {
>       le64 gpa;
>       le64 size;
> 
> #define VIRTIO_IOMMU_PROBE_RESV_MSI   0x1
>       u8 type;

type is 16 bit above?

> };
> 
> Such a region would be subject to the following rules:
> 
> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
> * Device should leave any transaction matching a RESV_MSI region pass
> through untranslated.
> * If the device does not advertise any RESV region, then the driver should
> assume that MSI doorbells, like any other GPA, must be mapped with an
> arbitrary IOVA in order for the endpoint to access them.
> * Given that the driver *should* perform a probe request if available, and
> it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this
> field tells the guest how it should handle MSI doorbells, and whether it
> should map the address via MAP requests or not.
> 
> Does this make sense and did I overlook something?

Overall it looks good to me. Do you have plans to implements this in 
virtio-iommu driver and kvmtool?

Thanks
-Bharat

> 
> Thanks,
> Jean

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

Reply via email to