On Thu, May 15, 2025 at 06:30:27AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicol...@nvidia.com>
> > Sent: Friday, May 9, 2025 11:03 AM
> > 
> > +
> > +/**
> > + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> > + * @size: sizeof(struct iommu_hw_queue_alloc)
> > + * @flags: Must be 0
> > + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> > + * @type: One of enum iommu_hw_queue_type
> > + * @index: The logical index to the HW queue per virtual IOMMU for a
> > multi-queue
> > + *         model
> 
> I'm thinking of an alternative way w/o having the user assign the
> index or allowing the driver to poke at object dependency (next patch).
> 
> Let's say the index is internally assigned by the driver, so this cmd
> is just for allocating a hw queue and it's up to the driver to decide
> the allocation policy, e.g. in ascending order.
> 
> Introduce a new flag in viommu_ops to indicate to core that the
> new hw queue should hold a reference to the previous hw queue.
> 
> core maintains a last_queue field in viommu. Upon successful return
> from @hw_queue_alloc() the core increments the users refcnt of
> last_queue, records the dependency in the iommufd_hw_queue struct,
> and updates viommu->last_queue.
> 
> Then the destroy order is naturally guaranteed.

I have thought about that too. It's nice that the core can easily
maintain the dependency for the driver. 

But there would still need to be an out_index to mark each dynamically
allocated queue, so the VMM would know where it should map the queue.

For example, if the VMM wants to allocate a queue at its own index=1
without allocating index=0 first, the kernel cannot fail that, as the
VMM doesn't provide the index. The only option left for the kernel
would be to output the allocated queue with index=0 and then hope that
the VMM validates it, which doesn't sound safe...
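
FWIW, here is roughly how I pictured your proposal (just a sketch for
this discussion; the viommu_ops flag, the last_hw_queue/prev fields and
the out_index output below are all made-up names, not from this series):

	hw_queue = viommu->ops->hw_queue_alloc(viommu, cmd->type,
					       cmd->base_addr, cmd->length);
	if (IS_ERR(hw_queue)) {
		rc = PTR_ERR(hw_queue);
		goto out_unpin;
	}

	if ((viommu->ops->flags & IOMMUFD_VIOMMU_FLAG_QUEUE_DEPENDS) &&
	    viommu->last_hw_queue) {
		/* The new queue pins the previous one, keeping destroy order */
		refcount_inc(&viommu->last_hw_queue->obj.users);
		hw_queue->prev = viommu->last_hw_queue;
	}
	viommu->last_hw_queue = hw_queue;

	/* VMM still has to read back which slot the queue landed in */
	cmd->out_index = hw_queue->index;

So the index turns into an output instead of an input, and the VMM can
only learn it after the allocation.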

> > + * @out_hw_queue_id: The ID of the new HW queue
> > + * @base_addr: Base address of the queue memory in guest physical
> > address space
> > + * @length: Length of the queue memory in the guest physical address
> > space
> 
> length is agnostic to address space.

Ack.

* @length: Length of the queue memory

> > +int iommufd_hw_queue_alloc_ioctl(struct iommufd_ucmd *ucmd)
> > +{
> > +   struct iommu_hw_queue_alloc *cmd = ucmd->cmd;
> > +   struct iommufd_hw_queue *hw_queue;
> > +   struct iommufd_hwpt_paging *hwpt;
> > +   struct iommufd_viommu *viommu;
> > +   struct page **pages;
> > +   int max_npages, i;
> > +   u64 end;
> > +   int rc;
> > +
> > +   if (cmd->flags || cmd->type == IOMMU_HW_QUEUE_TYPE_DEFAULT)
> > +           return -EOPNOTSUPP;
> > +   if (!cmd->base_addr || !cmd->length)
> > +           return -EINVAL;
> > +   if (check_add_overflow(cmd->base_addr, cmd->length - 1, &end))
> > +           return -EOVERFLOW;
> > +
> > +   max_npages = DIV_ROUND_UP(cmd->length, PAGE_SIZE);
> 
> what about [base_addr, base_addr+length) spanning two pages but
> 'length' is smaller than the size of one page? 

Ah, right! Probably should be something like:

        offset = cmd->base_addr - PAGE_ALIGN_DOWN(cmd->base_addr);
        max_npages = DIV_ROUND_UP(offset + cmd->length, PAGE_SIZE);
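
E.g. with PAGE_SIZE=0x1000, base_addr=0x1f80 and length=0x100, the
range [0x1f80, 0x2080) touches two pages, yet DIV_ROUND_UP(0x100,
0x1000) only gives 1; with the in-page offset added in,
DIV_ROUND_UP(0xf80 + 0x100, 0x1000) correctly gives 2.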

> > +   pages = kcalloc(max_npages, sizeof(*pages), GFP_KERNEL);
> > +   if (!pages)
> > +           return -ENOMEM;
> 
> this could be moved to right before iopt_pin_pages().

Ack.
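
I.e. something like this (just a sketch of the reordering, reusing the
existing out_put_viommu label since the viommu would already be held at
that point):

	/* Allocate the pages array only right before pinning */
	pages = kcalloc(max_npages, sizeof(*pages), GFP_KERNEL);
	if (!pages) {
		rc = -ENOMEM;
		goto out_put_viommu;
	}
	/* ...immediately followed by the iopt_pin_pages() call */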

> > +
> > +   viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> > +   if (IS_ERR(viommu)) {
> > +           rc = PTR_ERR(viommu);
> > +           goto out_free;
> > +   }
> > +   hwpt = viommu->hwpt;
> > +
> > +   if (!viommu->ops || !viommu->ops->hw_queue_alloc) {
> > +           rc = -EOPNOTSUPP;
> > +           goto out_put_viommu;
> > +   }
> > +
> > +   /* Quick test on the base address */
> > +   if (!iommu_iova_to_phys(hwpt->common.domain, cmd->base_addr))
> > {
> > +           rc = -ENXIO;
> > +           goto out_put_viommu;
> > +   }
> 
> this check is redundant. Actually it's not future-proof either, as it
> assumes that the S2 must be pinned before the user attempts to call
> this cmd. If one day iommufd supports an unpinned S2 (e.g. when a
> device is 100% PRI faultable), this path will be broken.
 
OK. Let's drop it.

> > +   hw_queue = viommu->ops->hw_queue_alloc(viommu, cmd->type,
> > cmd->index,
> > +                                          cmd->base_addr, cmd->length);
> > +   if (IS_ERR(hw_queue)) {
> > +           rc = PTR_ERR(hw_queue);
> > +           goto out_unpin;
> > +   }
> > +
> > +   hw_queue->viommu = viommu;
> > +   refcount_inc(&viommu->obj.users);
> > +   hw_queue->ictx = ucmd->ictx;
> 
> viommu/ictx are already set by iommufd_hw_queue_alloc().

OK. We'd need to be careful if someday a hw_queue is allocated in the
core without going through iommufd_hw_queue_alloc(). Maybe I can put a
note here.
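
Something like:

	/*
	 * Note: iommufd_hw_queue_alloc() already sets hw_queue->viommu and
	 * hw_queue->ictx. A future hw_queue that is allocated in the core
	 * without going through that helper must set both fields itself.
	 */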

Thanks
Nicolin
