> -----Original Message----- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Tuesday, July 11, 2017 6:21 PM > To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric > <eric.au...@redhat.com>; eric.auger....@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > On 11/07/17 06:54, Bharat Bhushan wrote: > > Hi Jean, > > > >> -----Original Message----- > >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > >> Sent: Friday, July 07, 2017 8:50 PM > >> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric > >> <eric.au...@redhat.com>; eric.auger....@gmail.com; > >> peter.mayd...@linaro.org; alex.william...@redhat.com; > m...@redhat.com; > >> qemu-...@nongnu.org; qemu-devel@nongnu.org > >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > >> robin.mur...@arm.com; christoffer.d...@linaro.org > >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> On 07/07/17 12:36, Bharat Bhushan wrote: > >>>>> In this proposal, QEMU reserves a iova-range for guest (not host) > >>>>> and > >> guest > >>>> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). > >>>> While > >> this > >>>> does not change host interface and it will continue to use host > >>>> reserved mapping for actual interrupt generation, no? > >>>> But then userspace needs to provide IOMMU_RESV_MSI range to > guest > >>>> kernel, right? What would be the proposed manner? > >>> > >>> Just an opinion, we can define feature > >> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a > command > >> (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call > >> during initialization and store the value. This value will just > >> replace MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. Rest will > >> remain same in virtio-iommu driver. > >> > >> Yes I had something similar in mind, although more generic since > >> we'll need to get other bits of information from the device in future > >> extensions (fault handling, page table formats and dynamic reserves > >> of memory for SVM), and maybe also for finding out per-address-space > >> page granularity (see my reply of patch 3/8). These are per-endpoint > >> properties that cannot be advertise in the virtio config space. > >> > >> *** > >> > >> So I propose to add a per-endpoint probing mechanism on the request > >> queue: > > > > What is per-endpoint? Is it "per-pci/platform-device"? > > Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm > now using the term "endpoint" to easily differentiate from the virtio-iommu > device ("the device"). > > >> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with > >> feature bit VIRTIO_IOMMU_F_PROBE. > >> * When this feature is advertised, the device sets probe_size field > >> in the the config space. > > > > Probably I did not get how virtio-iommu device emulation decides value of > "probe_size", can you share more info? > > The size of the virtio_iommu_req_probe structure is variable, and depends > what fields the device implements. So the device initially computes the size > it > needs to fill virtio_iommu_req_probe, describes it in probe_size, and the > driver allocates that many bytes for virtio_iommu_req_probe.content[] > > >> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send > an > >> VIRTIO_IOMMU_T_PROBE request for each new endpoint. > >> * The driver allocates a device-writeable buffer of probe_size (plus > >> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. > >> * The device fills the buffer with various information. > >> > >> struct virtio_iommu_req_probe { > >> /* device-readable */ > >> struct virtio_iommu_req_head head; > >> le32 device; > >> le32 flags; > >> > >> /* maybe also le32 content_size, but it must be equal to > >> probe_size */ > > > > Can you please describe why we need to pass size of "probe_size" in probe > request? > > We don't. I don't think we should add this 'content_size' field unless there > is > a compelling reason to do so. > > >> > >> /* device-writeable */ > >> u8 content[]; > > > > I assume content_size above is the size of array "content[]" and max value > can be equal to probe_size advertised by device? > > probe_size is exactly the size of array content[]. The driver must allocate a > buffer of this size (plus the space needed for head, device, flags and tail). > > Then the device is free to leave parts of content[] empty. Field 'type' 0 > will be > reserved and mark the end of the array. > > >> struct virtio_iommu_req_tail tail; > >> }; > >> > >> I'm still struggling with the content and layout of the probe > >> request, and would appreciate any feedback. To be easily extended, I > >> think it should contain a list of fields of variable size: > >> > >> |0 15|16 31|32 N| > >> | type | length | values | > >> > >> 'length' might be made optional if it can be deduced from type, but > >> might make driver-side parsing more robust. > >> > >> The probe could either be done for each endpoint, or for each address > >> space. I much prefer endpoint because it is the smallest granularity. > >> The driver can then decide what endpoints to put together in the same > >> address space based on their individual capabilities. The > >> specification would described how each endpoint property is combined > >> when endpoints are put in the same address space. For example, take > >> the minimum of all PASID size, the maximum of all page granularities, > >> combine doorbell addresses, etc. > >> > >> If we did the probe on address spaces instead, the driver would have > >> to re-send a probe request each time a new endpoint is attached to an > >> existing address space, to see if it is still capable of page table > >> handover or if the driver just combined a VFIO and an emulated > >> endpoint by accident. > >> > >> *** > >> > >> Using this framework, the device can declare doorbell regions by > >> adding one or more RESV fields into the probe buffer: > >> > >> /* 'type' */ > >> #define VIRTIO_IOMMU_PROBE_T_RESV 0x1 > >> > >> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */ > >> struct virtio_iommu_probe_resv { > >> le64 gpa; > >> le64 size; > >> > >> #define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1 > >> u8 type;
To be sure I am understanding it correctly, Is this "type" in struct virtio_iommu_req_head? Thanks -Bharat > > > > type is 16 bit above? > > Ah, the naming isn't great. This is not the same as above, and could be called > 'subtype' to avoid confusion. The above 16-bit type describes the field type, > e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems > easy to reach more than 255 kinds of endpoint properties, but > 65535 should do. > > This subtype describes which kind of resv region is described in the > structure. > For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we > could for example add resv regions that the driver should never use or that it > should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in > Linux). I think 8 bits should be enough to contain any future types, unless we > make this a bitfield. For identity-map, there may be an additional flags field > describing the protection. > > >> }; > >> > >> Such a region would be subject to the following rules: > >> > >> * Driver should not use any IOVA declared as RESV_MSI in a mapping. > >> * Device should leave any transaction matching a RESV_MSI region pass > >> through untranslated. > >> * If the device does not advertise any RESV region, then the driver > >> should assume that MSI doorbells, like any other GPA, must be mapped > >> with an arbitrary IOVA in order for the endpoint to access them. > >> * Given that the driver *should* perform a probe request if > >> available, and it *should* understand the > VIRTIO_IOMMU_PROBE_T_RESV > >> field, then this field tells the guest how it should handle MSI > >> doorbells, and whether it should map the address via MAP requests or > not. > >> > >> Does this make sense and did I overlook something? > > > > Overall it looks good to me. Do you have plans to implements this in virtio- > iommu driver and kvmtool? > > Yes, if there is no objection I'll try to formalize it and implement it right > away. > > Thanks, > Jean