On 12/07/17 04:50, Bharat Bhushan wrote: [...] >> The size of the virtio_iommu_req_probe structure is variable, and depends >> what fields the device implements. So the device initially computes the size >> it >> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the >> driver allocates that many bytes for virtio_iommu_req_probe.content[] >> >>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send >> an >>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint. >>>> * The driver allocates a device-writeable buffer of probe_size (plus >>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. >>>> * The device fills the buffer with various information. >>>> >>>> struct virtio_iommu_req_probe { >>>> /* device-readable */ >>>> struct virtio_iommu_req_head head; >>>> le32 device; >>>> le32 flags; >>>> >>>> /* maybe also le32 content_size, but it must be equal to >>>> probe_size */ >>> >>> Can you please describe why we need to pass size of "probe_size" in probe >> request? >> >> We don't. I don't think we should add this 'content_size' field unless there >> is >> a compelling reason to do so. >> >>>> >>>> /* device-writeable */ >>>> u8 content[]; >>> >>> I assume content_size above is the size of array "content[]" and max value >> can be equal to probe_size advertised by device? >> >> probe_size is exactly the size of array content[]. The driver must allocate a >> buffer of this size (plus the space needed for head, device, flags and tail). >> >> Then the device is free to leave parts of content[] empty. Field 'type' 0 >> will be >> reserved and mark the end of the array. >> >>>> struct virtio_iommu_req_tail tail; >>>> }; >>>> >>>> I'm still struggling with the content and layout of the probe >>>> request, and would appreciate any feedback. To be easily extended, I >>>> think it should contain a list of fields of variable size: >>>> >>>> |0 15|16 31|32 N| >>>> | type | length | values | >>>> >>>> 'length' might be made optional if it can be deduced from type, but >>>> might make driver-side parsing more robust. >>>> >>>> The probe could either be done for each endpoint, or for each address >>>> space. I much prefer endpoint because it is the smallest granularity. >>>> The driver can then decide what endpoints to put together in the same >>>> address space based on their individual capabilities. The >>>> specification would described how each endpoint property is combined >>>> when endpoints are put in the same address space. For example, take >>>> the minimum of all PASID size, the maximum of all page granularities, >>>> combine doorbell addresses, etc. >>>> >>>> If we did the probe on address spaces instead, the driver would have >>>> to re-send a probe request each time a new endpoint is attached to an >>>> existing address space, to see if it is still capable of page table >>>> handover or if the driver just combined a VFIO and an emulated >>>> endpoint by accident. >>>> >>>> *** >>>> >>>> Using this framework, the device can declare doorbell regions by >>>> adding one or more RESV fields into the probe buffer: >>>> >>>> /* 'type' */ >>>> #define VIRTIO_IOMMU_PROBE_T_RESV 0x1 >>>> >>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */ >>>> struct virtio_iommu_probe_resv { >>>> le64 gpa; >>>> le64 size; >>>> >>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1 >>>> u8 type; > > To be sure I am understanding it correctly, Is this "type" in struct > virtio_iommu_req_head?
No, virtio_iommu_req_head::type is the request type (ATTACH/DETACH/MAP/UNMAP/PROBE). Then virtio_iommu_probe_property::type is the property type (only RESV for the moment). And this is virtio_iommu_probe_resv::type, which is the type of the resv region (MSI). I renamed it to 'subtype' below, but I think it still is pretty confusing. I did a number of changes to structures and naming when trying to integrate it to the specification: * Added 64 bytes of padding in virtio_iommu_req_probe, so that future extensions can add fields in the device-readable part. * renamed "RESV" to "RESV_MEM". * The resv_mem property now looks like this: struct virtio_iommu_probe_resv_mem { u8 subtype; u8 padding[3]; le32 flags; le64 addr; le64 size; }; * subtype for MSI doorbells is now VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS (because transactions to this region bypass the IOMMU). 'flags' contain a hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this region is used for MSIs. Here is an example of a probe request returning an MSI doorbell property. 31 7 0 +---------------------------------+ | 0 | type | <- request type = PROBE (5) +---------------------------------+ | device | +---------------------------------+ : : : (64B padding) : : : +---------------------------------+ ^ | length = 24 | type = 1 | <- property type = RESV_MEM (1) | +---------------------------------+ | | 0 |subtype | <- RESV_MEM subtype = BYPASS (1) p| +---------------------------------+ r| | flags = MSI | o| +---------------------------------+ b| | addr = 0xfee00000 | e| | | _| +---------------------------------+ s| | size = 0x00100000 | i| | | z| +---------------------------------+ e| | length | type | <- another property may start | : : here v : ... : +---------------------------------+ | 0 | status | <- request tail +---------------------------------+ I'll try to send the next version of the spec out as soon as possible. Thanks, Jean > Thanks > -Bharat > >>> >>> type is 16 bit above? >> >> Ah, the naming isn't great. This is not the same as above, and could be >> called >> 'subtype' to avoid confusion. The above 16-bit type describes the field type, >> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems >> easy to reach more than 255 kinds of endpoint properties, but >> 65535 should do. >> >> This subtype describes which kind of resv region is described in the >> structure. >> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we >> could for example add resv regions that the driver should never use or that >> it >> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in >> Linux). I think 8 bits should be enough to contain any future types, unless >> we >> make this a bitfield. For identity-map, there may be an additional flags >> field >> describing the protection. >> >>>> }; >>>> >>>> Such a region would be subject to the following rules: >>>> >>>> * Driver should not use any IOVA declared as RESV_MSI in a mapping. >>>> * Device should leave any transaction matching a RESV_MSI region pass >>>> through untranslated. >>>> * If the device does not advertise any RESV region, then the driver >>>> should assume that MSI doorbells, like any other GPA, must be mapped >>>> with an arbitrary IOVA in order for the endpoint to access them. >>>> * Given that the driver *should* perform a probe request if >>>> available, and it *should* understand the >> VIRTIO_IOMMU_PROBE_T_RESV >>>> field, then this field tells the guest how it should handle MSI >>>> doorbells, and whether it should map the address via MAP requests or >> not. >>>> >>>> Does this make sense and did I overlook something? >>> >>> Overall it looks good to me. Do you have plans to implements this in virtio- >> iommu driver and kvmtool? >> >> Yes, if there is no objection I'll try to formalize it and implement it >> right away. >> >> Thanks, >> Jean