On Thu, May 15, 2025 at 06:30:27AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicol...@nvidia.com>
> > Sent: Friday, May 9, 2025 11:03 AM
> >
> > +
> > +/**
> > + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> > + * @size: sizeof(struct iommu_hw_queue_alloc)
> > + * @flags: Must be 0
> > + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> > + * @type: One of enum iommu_hw_queue_type
> > + * @index: The logical index to the HW queue per virtual IOMMU for a
> > + *         multi-queue model
>
> I'm thinking of an alternative way, w/o having the user assign the index
> and w/o allowing the driver to poke at the object dependency (next patch).
>
> Let's say the index is internally assigned by the driver, so this cmd is
> just for allocating a hw queue and it's up to the driver to decide the
> allocation policy, e.g. in ascending order.
>
> Introduce a new flag in viommu_ops to indicate to the core that the
> new hw queue should hold a reference to the previous hw queue.
>
> The core maintains a last_queue field in the viommu. Upon successful
> return from @hw_queue_alloc() the core increments the users refcnt of
> last_queue, records the dependency in the iommufd_hw_queue struct,
> and updates viommu->last_queue.
>
> Then the destroy order is naturally guaranteed.
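Just to make sure I read the proposal right, a rough sketch of that
core-side flow is below. None of these names exist in iommufd today;
last_queue, prev_queue and the ops flag are only placeholders to
illustrate the idea:

/* Illustrative only: the fields/flags below are not in iommufd today */
struct iommufd_hw_queue {
	struct iommufd_object obj;
	struct iommufd_viommu *viommu;
	/* dependency recorded by the core, not by the driver */
	struct iommufd_hw_queue *prev_queue;
	/* ... */
};

/* Called by the core after a successful @hw_queue_alloc() */
static void iommufd_hw_queue_chain(struct iommufd_viommu *viommu,
				   struct iommufd_hw_queue *hw_queue)
{
	/* Only chain if the driver set the new "depends on previous" flag */
	if ((viommu->ops->flags & IOMMUFD_VIOMMU_HW_QUEUE_DEPENDS) &&
	    viommu->last_queue) {
		refcount_inc(&viommu->last_queue->obj.users);
		hw_queue->prev_queue = viommu->last_queue;
	}
	viommu->last_queue = hw_queue;
}

/* On destroy, drop the reference so teardown stays ordered */
static void iommufd_hw_queue_unchain(struct iommufd_hw_queue *hw_queue)
{
	if (hw_queue->prev_queue)
		refcount_dec(&hw_queue->prev_queue->obj.users);
}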
I have thought about that too. It's nice that the core can easily maintain
the dependency for the driver. But there would still need to be an out_index
to mark each dynamically allocated queue, so the VMM would know where it
should map the queue. For example, if the VMM wants to allocate a queue at
its own index=1 without allocating index=0 first, the kernel cannot fail
that since the VMM doesn't provide the index. The only way left for the
kernel would be to output the allocated queue with index=0 and then hope
the VMM validates it, which doesn't sound safe..

> > + * @out_hw_queue_id: The ID of the new HW queue
> > + * @base_addr: Base address of the queue memory in guest physical
> > + *             address space
> > + * @length: Length of the queue memory in the guest physical address
> > + *          space
>
> length is agnostic to address space.

Ack.
 * @length: Length of the queue memory

> > +int iommufd_hw_queue_alloc_ioctl(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_hw_queue_alloc *cmd = ucmd->cmd;
> > +	struct iommufd_hw_queue *hw_queue;
> > +	struct iommufd_hwpt_paging *hwpt;
> > +	struct iommufd_viommu *viommu;
> > +	struct page **pages;
> > +	int max_npages, i;
> > +	u64 end;
> > +	int rc;
> > +
> > +	if (cmd->flags || cmd->type == IOMMU_HW_QUEUE_TYPE_DEFAULT)
> > +		return -EOPNOTSUPP;
> > +	if (!cmd->base_addr || !cmd->length)
> > +		return -EINVAL;
> > +	if (check_add_overflow(cmd->base_addr, cmd->length - 1, &end))
> > +		return -EOVERFLOW;
> > +
> > +	max_npages = DIV_ROUND_UP(cmd->length, PAGE_SIZE);
>
> what about [base_addr, base_addr+length) spanning two pages but
> 'length' is smaller than the size of one page?

Ah, right! Probably should be something like (see the P.S. below for a
quick numeric check):

	offset = cmd->base_addr - PAGE_ALIGN_DOWN(cmd->base_addr);
	max_npages = DIV_ROUND_UP(offset + cmd->length, PAGE_SIZE);

> > +	pages = kcalloc(max_npages, sizeof(*pages), GFP_KERNEL);
> > +	if (!pages)
> > +		return -ENOMEM;
>
> this could be moved to right before iopt_pin_pages().

Ack.

> > +
> > +	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> > +	if (IS_ERR(viommu)) {
> > +		rc = PTR_ERR(viommu);
> > +		goto out_free;
> > +	}
> > +	hwpt = viommu->hwpt;
> > +
> > +	if (!viommu->ops || !viommu->ops->hw_queue_alloc) {
> > +		rc = -EOPNOTSUPP;
> > +		goto out_put_viommu;
> > +	}
> > +
> > +	/* Quick test on the base address */
> > +	if (!iommu_iova_to_phys(hwpt->common.domain, cmd->base_addr)) {
> > +		rc = -ENXIO;
> > +		goto out_put_viommu;
> > +	}
>
> this check is redundant. Actually it's not future-proof, as it assumes
> that S2 must be pinned before the user attempts to call this cmd. But
> what if one day iommufd supports unpinned S2 (if a device is 100% PRI
> faultable)? Then this path will be broken.

OK. Let's drop it.

> > +	hw_queue = viommu->ops->hw_queue_alloc(viommu, cmd->type, cmd->index,
> > +					       cmd->base_addr, cmd->length);
> > +	if (IS_ERR(hw_queue)) {
> > +		rc = PTR_ERR(hw_queue);
> > +		goto out_unpin;
> > +	}
> > +
> > +	hw_queue->viommu = viommu;
> > +	refcount_inc(&viommu->obj.users);
> > +	hw_queue->ictx = ucmd->ictx;
>
> viommu/ictx are already set by iommufd_hw_queue_alloc().

OK. We'd need to be careful if someday there is a core-allocated hw_queue
that doesn't call iommufd_hw_queue_alloc(). Maybe I can put a note here.

Thanks
Nicolin
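P.S. A quick numeric check of the revised max_npages math, with made-up
numbers (PAGE_SIZE = 0x1000): base_addr = 0x1ff0 and length = 0x20 span
two pages even though length < PAGE_SIZE. The old formula gives
DIV_ROUND_UP(0x20, 0x1000) = 1, while offset = 0xff0 and
DIV_ROUND_UP(0xff0 + 0x20, 0x1000) = 2, which matches the actual span.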