Hi Alex,

On 06/10/2016 22:42, Alex Williamson wrote:
> On Thu, 6 Oct 2016 14:20:40 -0600
> Alex Williamson <alex.william...@redhat.com> wrote:
> 
>> On Thu,  6 Oct 2016 08:45:31 +0000
>> Eric Auger <eric.au...@redhat.com> wrote:
>>
>>> This patch allows userspace to retrieve the MSI geometry. The
>>> implementation is based on capability chains, which are now also added to
>>> VFIO_IOMMU_GET_INFO.
>>>
>>> The returned info comprises:
>>> - whether the MSI IOVAs are constrained to a reserved range (x86 case) and,
>>>   if so, the start/end of that aperture,
>>> - or whether the IOVA aperture needs to be set by userspace. In that
>>>   case, the size and alignment of the IOVA window to be provided are
>>>   returned.
>>>
>>> When userspace must provide the IOVA aperture, we currently report
>>> a size/alignment based on all the doorbells registered by the host kernel.
>>> This may exceed the actual needs.
>>>
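For reference, a minimal userspace sketch of how the MSI geometry capability could be located in the buffer returned by VFIO_IOMMU_GET_INFO. The structures are local mirrors of the layout proposed in this patch rather than an installed <linux/vfio.h>, find_cap() is a hypothetical helper, and the walk assumes (as vfio_info_cap_shift() arranges) that each next field is an offset from the start of the info buffer, with 0 ending the chain. It also assumes the buffer and offsets are 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the layout proposed in this patch (assumption: not
 * taken from an installed <linux/vfio.h>). */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;		/* offset from start of info buffer, 0 = end */
};

struct type1_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova_pgsizes;
	uint32_t resv;
	uint32_t cap_offset;
};

#define INFO_CAPS		(1u << 1)
#define CAP_MSI_GEOMETRY	1
#define MSI_GEOMETRY_RESERVED	(1u << 0)

struct cap_msi_geometry {
	struct cap_header header;
	uint32_t flags;
	uint32_t order;
	uint64_t size;
	uint64_t aperture_start;
	uint64_t aperture_end;
};

/* Hypothetical helper: return the first capability with the given id, or
 * NULL if caps are not advertised, the id is absent, or the chain would
 * leave the buffer. */
static const struct cap_header *find_cap(const void *buf, size_t buflen,
					 uint16_t id)
{
	const struct type1_info *info = buf;
	uint32_t off;

	if (buflen < sizeof(*info) || !(info->flags & INFO_CAPS))
		return NULL;

	for (off = info->cap_offset; off; ) {
		const struct cap_header *hdr;

		/* The whole header must lie inside the buffer. */
		if (off > buflen - sizeof(*hdr))
			return NULL;
		hdr = (const struct cap_header *)((const char *)buf + off);
		if (hdr->id == id)
			return hdr;
		/* next == 0 ends the chain; assuming offsets strictly
		 * increase also rules out loops. */
		if (hdr->next <= off)
			return NULL;
		off = hdr->next;
	}
	return NULL;
}
```

Before casting the result to struct cap_msi_geometry, a caller should also check that the full structure fits within argsz, then test MSI_GEOMETRY_RESERVED to decide whether aperture_start/aperture_end or order/size are meaningful.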
>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>
>>> ---
>>> v10 -> v11:
>>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
>>>
>>> v9 -> v10:
>>> - move cap_offset after iova_pgsizes
>>> - replace __u64 alignment by __u32 order
>>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
>>>   fix alignment
>>> - call msi-doorbell API to compute the size/alignment
>>>
>>> v8 -> v9:
>>> - use iommu_msi_supported flag instead of programmable
>>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>>   capability chain, reporting the MSI geometry
>>>
>>> v7 -> v8:
>>> - use iommu_domain_msi_geometry
>>>
>>> v6 -> v7:
>>> - remove the computation of the number of IOVA pages to be provisioned.
>>>   This number depends on the domain/group/device topology, which can
>>>   change dynamically. Instead, rely on an arbitrary max depending
>>>   on the system
>>>
>>> v4 -> v5:
>>> - move msi_info and ret declaration within the conditional code
>>>
>>> v3 -> v4:
>>> - replace former vfio_domains_require_msi_mapping by
>>>   more complex computation of MSI mapping requirements, especially the
>>>   number of pages to be provided by the user-space.
>>> - reword patch title
>>>
>>> RFC v1 -> v1:
>>> - derived from
>>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>>> - renamed allow_msi_reconfig into require_msi_mapping
>>> - fixed VFIO_IOMMU_GET_INFO
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>>>  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
>>>  2 files changed, 108 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index dc3ee5d..ce5e7eb 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -38,6 +38,8 @@
>>>  #include <linux/workqueue.h>
>>>  #include <linux/dma-iommu.h>
>>>  #include <linux/msi-doorbell.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>>  
>>>  #define DRIVER_VERSION  "0.2"
>>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.william...@redhat.com>"
>>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>     return ret;
>>>  }
>>>  
>>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>>> +                                struct vfio_info_cap *caps)
>>> +{
>>> +   struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>>> +   unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
>>> +   struct iommu_domain_msi_geometry msi_geometry;
>>> +   struct vfio_info_cap_header *header;
>>> +   struct vfio_domain *d;
>>> +   bool reserved;
>>> +   size_t size;
>>> +
>>> +   mutex_lock(&iommu->lock);
>>> +   /* All domains have same require_msi_map property, pick first */
>>> +   d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>>> +   iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>>> +                         &msi_geometry);
>>> +   reserved = !msi_geometry.iommu_msi_supported;
>>> +
>>> +   mutex_unlock(&iommu->lock);
>>> +
>>> +   size = sizeof(*vfio_msi_geometry);
>>> +   header = vfio_info_cap_add(caps, size,
>>> +                              VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>>> +
>>> +   if (IS_ERR(header))
>>> +           return PTR_ERR(header);
>>> +
>>> +   vfio_msi_geometry = container_of(header,
>>> +                           struct vfio_iommu_type1_info_cap_msi_geometry,
>>> +                           header);
>>> +
>>> +   vfio_msi_geometry->flags = reserved;  
>>
>> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
>>
>>> +   if (reserved) {
>>> +           vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>>> +           vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;  
>>
>> But maybe nobody has set these, did you intend to use
>> iommu_domain_msi_aperture_valid(), which you defined early on but never
>> used?
>>
>>> +           return 0;
>>> +   }
>>> +
>>> +   vfio_msi_geometry->order = order;  
>>
>> I'm tempted to suggest that a user could do the same math on their own
>> since we provide the supported bitmap already... could it ever not be
>> the same? 
>>
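A sketch of the userspace side of that math, assuming only the iova_pgsizes bitmap already returned by VFIO_IOMMU_GET_INFO: the lowest set bit gives the same value the patch computes with __ffs(vfio_pgsize_bitmap(iommu)). The helper names are hypothetical, and the page count is passed in because msi_doorbell_calc_pages() is internal to this kernel series.

```c
#include <stdint.h>

/* Index of the lowest set bit of iova_pgsizes, i.e. the order of the
 * smallest supported IOMMU page, mirroring __ffs() in the patch.
 * Returns -1 for an empty bitmap, where __ffs() is undefined. */
static int pgsizes_min_order(uint64_t pgsizes)
{
	int order = 0;

	if (!pgsizes)
		return -1;
	while (!(pgsizes & 1)) {
		pgsizes >>= 1;
		order++;
	}
	return order;
}

/* Window size as computed in the patch: npages * (1 << order).
 * npages stands in for msi_doorbell_calc_pages(order). */
static uint64_t msi_window_size(uint64_t npages, int order)
{
	return npages * ((uint64_t)1 << order);
}
```

For example, a bitmap advertising 4K, 2M and 1G pages (0x40201000) yields order 12. The page count is the part userspace cannot derive by itself, since it depends on the doorbells registered by the host kernel.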
>>> +   /*
>>> +    * we compute a system-wide requirement based on all the registered
>>> +    * doorbells
>>> +    */
>>> +   vfio_msi_geometry->size =
>>> +           msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
>>> +
>>> +   return 0;
>>> +}
>>> +
>>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>                                unsigned int cmd, unsigned long arg)
>>>  {
>>> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>             }
>>>     } else if (cmd == VFIO_IOMMU_GET_INFO) {
>>>             struct vfio_iommu_type1_info info;
>>> +           struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>> +           int ret;
>>>  
>>> -           minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>> +           minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
>>>  
>>>             if (copy_from_user(&info, (void __user *)arg, minsz))
>>>                     return -EFAULT;
>>> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>  
>>>             info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>  
>>> +           ret = compute_msi_geometry_caps(iommu, &caps);
>>> +           if (ret)
>>> +                   return ret;
>>> +
>>> +           if (caps.size) {
>>> +                   info.flags |= VFIO_IOMMU_INFO_CAPS;
>>> +                   if (info.argsz < sizeof(info) + caps.size) {
>>> +                           info.argsz = sizeof(info) + caps.size;
>>> +                           info.cap_offset = 0;
>>> +                   } else {
>>> +                           vfio_info_cap_shift(&caps, sizeof(info));
>>> +                           if (copy_to_user((void __user *)arg +
>>> +                                           sizeof(info), caps.buf,
>>> +                                           caps.size)) {
>>> +                                   kfree(caps.buf);
>>> +                                   return -EFAULT;
>>> +                           }
>>> +                           info.cap_offset = sizeof(info);
>>> +                   }
>>> +
>>> +                   kfree(caps.buf);
>>> +           }
>>> +
>>>             return copy_to_user((void __user *)arg, &info, minsz) ?
>>>                     -EFAULT : 0;
>>>  
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 4a9dbc2..8dae013 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
>>>     __u32   argsz;
>>>     __u32   flags;
>>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
>>> -   __u64   iova_pgsizes;           /* Bitmap of supported page sizes */
>>> +#define VFIO_IOMMU_INFO_CAPS       (1 << 1)        /* Info supports caps */
>>> +   __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
>>> +   __u32   __resv;
>>> +   __u32   cap_offset;     /* Offset within info struct of first cap */
>>> +};  
>>
>> I understand the padding, but not the ordering.  Why not end with
>> padding?
>>
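Comparing the two orderings with offsetof() makes the difference concrete. The struct names are illustrative, and OFFSETOFEND() is a local stand-in for the kernel's offsetofend(). Both layouts are 24 bytes because the explicit 32-bit word fills the tail; only the position of cap_offset, and therefore the minsz computed in the ioctl, changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's offsetofend(). */
#define OFFSETOFEND(T, m) (offsetof(T, m) + sizeof(((T *)0)->m))

/* Ordering as posted: reserved word before cap_offset. */
struct info_as_posted {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova_pgsizes;
	uint32_t resv;
	uint32_t cap_offset;
};

/* Ordering suggested in review: cap_offset first, padding last. */
struct info_pad_last {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova_pgsizes;
	uint32_t cap_offset;
	uint32_t pad;
};
```

With padding last, minsz = offsetofend(..., cap_offset) is 20 instead of 24, and the spare word sits at the end, where a later __u32 field could take its place.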
>>> +
>>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY     1
>>> +
>>> +/*
>>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
>>> + *   this is typically the case on x86 host. The userspace is not allowed
>>> + *   to map userspace memory at IOVAs intersecting this range using
>>> + *   VFIO_IOMMU_MAP_DMA.
>>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>>> + *   in that case the userspace must provide an IOVA window characterized by
>>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>>> + */
>>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>>> +   struct vfio_info_cap_header header;
>>> +   __u32 flags;
>>> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
>>> +   /* not reserved */
>>> +   __u32 order; /* iommu page order used for aperture alignment*/
>>> +   __u64 size; /* IOVA aperture size (bytes) the userspace must provide */
>>> +   /* reserved */
>>> +   __u64 aperture_start;
>>> +   __u64 aperture_end;  
>>
>> Should these be a union?  We never set them both.  Should the !reserved
>> case have a flag as well, so the user can positively identify what's
>> being provided?
> 
> Actually, is there really any need to fit both of these within the same
> structure?  Part of the idea of the capability chains is we can create
> a capability for each new thing we want to describe.  So, we could
> simply define a generic reserved IOVA range capability with a 'start'
> and 'end' and then another capability to define MSI mapping
> requirements.  Thanks,
Yes, your suggested approach makes sense to me.
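If I read the suggestion correctly, the split would look roughly like this. The capability IDs, struct names and dma_map_hits_resv() are made up for illustration and are not part of any posted patch; the end address is treated as inclusive, matching the [@aperture_start, @aperture_end] wording in the current uapi comment.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_RESV_IOVA_RANGE	1	/* hypothetical */
#define CAP_MSI_MAPPING		2	/* hypothetical */

struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;
};

/* Generic: an IOVA range userspace must not use with VFIO_IOMMU_MAP_DMA. */
struct cap_resv_iova_range {
	struct cap_header header;
	uint64_t start;
	uint64_t end;		/* inclusive */
};

/* MSI-specific: size/alignment of the window userspace must provide. */
struct cap_msi_mapping {
	struct cap_header header;
	uint32_t order;
	uint32_t pad;
	uint64_t size;
};

/* True if the mapping [iova, iova + size) intersects [start, end]. */
static bool dma_map_hits_resv(const struct cap_resv_iova_range *r,
			      uint64_t iova, uint64_t size)
{
	uint64_t last;

	if (!size)
		return false;
	/* Clamp instead of wrapping past the top of the IOVA space. */
	last = (size - 1 > UINT64_MAX - iova) ? UINT64_MAX : iova + size - 1;
	return iova <= r->end && last >= r->start;
}
```

With that split, the reserved-range check stays generic (userspace just keeps VFIO_IOMMU_MAP_DMA out of the range), and only platforms that need a user-provided MSI window would report the second capability.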

One reason I proceeded that way is that we already mix things at the
iommu.h level. Personally, I would have preferred to separate them:
1) add a new IOMMU_CAP_TRANSLATE_MSI capability to enum iommu_cap;
2) rename iommu_msi_supported into a "programmable" bool reporting
whether the aperture is reserved or programmable.

In the early releases I think it was as above, but we gradually moved
to a mixed description.

What do you think?

Thank you for the whole review!

Eric

> 
> Alex
>  
>>>  };
>>>  
>>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>> @@ -503,6 +531,8 @@ struct vfio_iommu_type1_info {
>>>   * IOVA region that will be used on some platforms to map the host MSI frames.
>>>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>>>   * IOVA region stays until the container is closed.
>>> + * The requirement for provisioning such reserved IOVA range can be checked by
>>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>>>   */
>>>  struct vfio_iommu_type1_dma_map {
>>>     __u32   argsz;  
>>
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
