[PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space
This series implements delivery of IO page faults to user space through
the IOMMUFD framework. One feasible use case is nested translation, a
hardware feature that supports two-stage translation tables for the
IOMMU: the second-stage translation table is managed by the host VMM,
while the first-stage translation table is owned by user space. This
allows user space to control the IOMMU mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through IOMMUFD. This allows user space to
implement its own IO page fault handling policies.

A user space application that is capable of handling IO page faults
should allocate a fault object and bind that fault object to any domain
for which it is willing to handle the generated faults. On a successful
return of fault object allocation, the user can retrieve and respond to
page faults by reading from or writing to the returned file descriptor
(FD).

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

This series is based on the page fault handling framework refactoring in
the IOMMU core [1].

The series and related patches are available on GitHub: [2]

[1] https://lore.kernel.org/linux-iommu/20240122054308.23901-1-baolu...@linux.intel.com/
[2] https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v3

Best regards,
baolu

Change log:

v3:
 - Add iopf domain attach/detach/replace interfaces to manage the
   reference counters of hwpt and device, ensuring that both can only be
   destroyed after all outstanding IOPFs have been responded to.
 - Relocate the fault handling file descriptor from hwpt to a fault
   object so that a single fault handling object can be used across
   multiple domains.
 - Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/20231026024930.382898-1-baolu...@linux.intel.com/
 - Move all iommu refactoring patches into a separate series discussed
   in a different thread. The latest patch series [v6] is available at
   https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu...@linux.intel.com/
 - We discussed the timeout of pending page fault messages and agreed
   that we shouldn't apply any timeout policy for page fault handling in
   user space.
   https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
 - Jason suggested adopting a simple file descriptor interface for
   reading and responding to I/O page requests, so that user space
   applications can improve performance using io_uring.
   https://lore.kernel.org/linux-iommu/zjwjd1ajeem6p...@ziepe.ca/

v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu...@linux.intel.com/

Lu Baolu (8):
  iommu: Add iopf domain attach/detach/replace interface
  iommu/sva: Use iopf domain attach/detach interface
  iommufd: Add fault and response message definitions
  iommufd: Add iommufd fault object
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd: IOPF-capable hw page table attach/detach/replace
  iommufd/selftest: Add IOPF support for mock device
  iommufd/selftest: Add coverage for IOPF test

 include/linux/iommu.h                         |  40 +-
 drivers/iommu/iommufd/iommufd_private.h       |  41 ++
 drivers/iommu/iommufd/iommufd_test.h          |   8 +
 include/uapi/linux/iommufd.h                  |  91
 tools/testing/selftests/iommu/iommufd_utils.h |  83 +++-
 drivers/iommu/io-pgfault.c                    | 215 --
 drivers/iommu/iommu-sva.c                     |  48 ++-
 drivers/iommu/iommufd/device.c                |  16 +-
 drivers/iommu/iommufd/fault.c                 | 391 ++
 drivers/iommu/iommufd/hw_pagetable.c          |  36 +-
 drivers/iommu/iommufd/main.c                  |   6 +
 drivers/iommu/iommufd/selftest.c              |  63 +++
 tools/testing/selftests/iommu/iommufd.c       |  17 +
 .../selftests/iommu/iommufd_fail_nth.c        |   2 +-
 drivers/iommu/iommufd/Makefile                |   1 +
 15 files changed, 1001 insertions(+), 57 deletions(-)
 create mode 100644 drivers/iommu/iommufd/fault.c
--
2.34.1
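For clarity, a minimal sketch of the intended user space flow follows.
It uses only the uAPI added in patches 3/8 - 5/8 (iommu_fault_alloc,
iommu_hwpt_alloc with a fault_id, and the pgfault/page_response
messages); error handling, device attachment, and the choice of
paging vs. nested hwpt are omitted, so treat it as illustrative rather
than a complete application:

#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/iommufd.h>

/* Illustrative sketch: allocate a fault object, associate it with an
 * iopf-capable hw_pagetable, and note where faults are read/answered.
 * Assumes 'iommufd' is an open /dev/iommu FD and 'dev_id'/'pt_id' are
 * valid iommufd object IDs. */
static int setup_iopf_hwpt(int iommufd, __u32 dev_id, __u32 pt_id, __u32 *hwpt_id)
{
        struct iommu_fault_alloc fault = { .size = sizeof(fault) };
        struct iommu_hwpt_alloc hwpt = {
                .size = sizeof(hwpt),
                .flags = IOMMU_HWPT_FAULT_ID_VALID,
                .dev_id = dev_id,
                .pt_id = pt_id,         /* IOAS or parent HWPT as appropriate */
                .data_type = IOMMU_HWPT_DATA_NONE,
        };

        /* 1. Allocate a fault object; the kernel returns an object ID and an FD. */
        if (ioctl(iommufd, IOMMU_FAULT_ALLOC, &fault))
                return -1;

        /* 2. Associate it with the hw_pagetable at allocation time. */
        hwpt.fault_id = fault.out_fault_id;
        if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &hwpt))
                return -1;
        *hwpt_id = hwpt.out_hwpt_id;

        /* 3. Faults are then read from fault.out_fault_fd as struct
         *    iommu_hwpt_pgfault and answered by writing struct
         *    iommu_hwpt_page_response back to the same FD (patches 3/8, 4/8). */
        return fault.out_fault_fd;
}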
[PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
There is a slight difference between iopf domains and non-iopf domains. In the latter, references to domains occur between attach and detach; While in the former, due to the existence of asynchronous iopf handling paths, references to the domain may occur after detach, which leads to potential UAF issues. Introduce iopf-specific domain attach/detach/replace interface where the caller provides an attach cookie. This cookie can only be freed after all outstanding iopf groups are handled and the domain is detached from the RID or PASID. The presence of this attach cookie indicates that a domain has been attached to the RID or PASID and won't be released until all outstanding iopf groups are handled. The cookie data structure also includes a private field for storing a caller-specific pointer that will be passed back to its page fault handler. This field provides flexibility for various uses. For example, the IOMMUFD could use it to store the iommufd_device pointer, so that it could easily retrieve the dev_id of the device that triggered the fault. Signed-off-by: Lu Baolu --- include/linux/iommu.h | 36 + drivers/iommu/io-pgfault.c | 158 + 2 files changed, 194 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 1ccad10e8164..6d85be23952a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -120,6 +120,16 @@ struct iommu_page_response { u32 code; }; +struct iopf_attach_cookie { + struct iommu_domain *domain; + struct device *dev; + unsigned int pasid; + refcount_t users; + + void *private; + void (*release)(struct iopf_attach_cookie *cookie); +}; + struct iopf_fault { struct iommu_fault fault; /* node for pending lists */ @@ -699,6 +709,7 @@ struct iommu_fault_param { struct device *dev; struct iopf_queue *queue; struct list_head queue_list; + struct xarray pasid_cookie; struct list_head partial; struct list_head faults; @@ -1552,6 +1563,12 @@ void iopf_free_group(struct iopf_group *group); void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt); void iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status); +int iopf_domain_attach(struct iommu_domain *domain, struct device *dev, + ioasid_t pasid, struct iopf_attach_cookie *cookie); +void iopf_domain_detach(struct iommu_domain *domain, struct device *dev, + ioasid_t pasid); +int iopf_domain_replace(struct iommu_domain *domain, struct device *dev, + ioasid_t pasid, struct iopf_attach_cookie *cookie); #else static inline int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev) @@ -1596,5 +1613,24 @@ static inline void iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status) { } + +static inline int iopf_domain_attach(struct iommu_domain *domain, +struct device *dev, ioasid_t pasid, +struct iopf_attach_cookie *cookie) +{ + return -ENODEV; +} + +static inline void iopf_domain_detach(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ +} + +static inline int iopf_domain_replace(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid, + struct iopf_attach_cookie *cookie) +{ + return -ENODEV; +} #endif /* CONFIG_IOMMU_IOPF */ #endif /* __LINUX_IOMMU_H */ diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index b64229dab976..f7ce41573799 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -39,6 +39,103 @@ static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param) kfree_rcu(fault_param, rcu); } +/* Get the domain attachment cookie for pasid of 
a device. */ +static struct iopf_attach_cookie __maybe_unused * +iopf_pasid_cookie_get(struct device *dev, ioasid_t pasid) +{ + struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev); + struct iopf_attach_cookie *curr; + + if (!iopf_param) + return ERR_PTR(-ENODEV); + + xa_lock(&iopf_param->pasid_cookie); + curr = xa_load(&iopf_param->pasid_cookie, pasid); + if (curr && !refcount_inc_not_zero(&curr->users)) + curr = ERR_PTR(-EINVAL); + xa_unlock(&iopf_param->pasid_cookie); + + iopf_put_dev_fault_param(iopf_param); + + return curr; +} + +/* Put the domain attachment cookie. */ +static void iopf_pasid_cookie_put(struct iopf_attach_cookie *cookie) +{ + if (cookie && refcount_dec_and_test(&cookie->users)) + cookie->release(cookie); +} + +/* + * Set the domain attachment cookie for pasid of a devi
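For illustration, a caller-side sketch of the cookie-based attach
interface follows. It uses only the structures and prototypes added
above; since the cookie-set helper is not fully shown here, the
assumption that the iopf core initializes cookie->domain/dev/pasid and
the users refcount is marked explicitly in the comments:

#include <linux/iommu.h>
#include <linux/slab.h>

static void example_cookie_release(struct iopf_attach_cookie *cookie)
{
        /* Runs only after the domain is detached from the RID/PASID and
         * every outstanding iopf group holding a reference has been
         * responded to. */
        kfree(cookie);
}

/* Illustrative only: the caller provides ->private and ->release; the
 * remaining cookie fields are assumed to be filled in by the iopf core. */
static int example_iopf_attach(struct iommu_domain *domain, struct device *dev,
                               ioasid_t pasid, void *handler_data)
{
        struct iopf_attach_cookie *cookie;
        int ret;

        cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
        if (!cookie)
                return -ENOMEM;

        cookie->private = handler_data; /* handed back to the page fault handler */
        cookie->release = example_cookie_release;

        ret = iopf_domain_attach(domain, dev, pasid, cookie);
        if (ret)
                kfree(cookie);

        return ret;
}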
[PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface
The iommu sva implementation relies on iopf handling. Allocate an attachment cookie and use the iopf domain attach/detach interface. The SVA domain is guaranteed to be released after all outstanding page faults are handled. In the fault delivering path, the attachment cookie is retrieved, instead of the domain. This ensures that the page fault is forwarded only if an iopf-capable domain is attached, and the domain will only be released after all outstanding faults are handled. Signed-off-by: Lu Baolu --- include/linux/iommu.h | 2 +- drivers/iommu/io-pgfault.c | 59 +++--- drivers/iommu/iommu-sva.c | 48 --- 3 files changed, 68 insertions(+), 41 deletions(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6d85be23952a..511dc7b4bdb2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -142,9 +142,9 @@ struct iopf_group { /* list node for iommu_fault_param::faults */ struct list_head pending_node; struct work_struct work; - struct iommu_domain *domain; /* The device's fault data parameter. */ struct iommu_fault_param *fault_param; + struct iopf_attach_cookie *cookie; }; /** diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index f7ce41573799..2567d8c04e46 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -40,7 +40,7 @@ static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param) } /* Get the domain attachment cookie for pasid of a device. */ -static struct iopf_attach_cookie __maybe_unused * +static struct iopf_attach_cookie * iopf_pasid_cookie_get(struct device *dev, ioasid_t pasid) { struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev); @@ -147,6 +147,7 @@ static void __iopf_free_group(struct iopf_group *group) /* Pair with iommu_report_device_fault(). */ iopf_put_dev_fault_param(group->fault_param); + iopf_pasid_cookie_put(group->cookie); } void iopf_free_group(struct iopf_group *group) @@ -156,30 +157,6 @@ void iopf_free_group(struct iopf_group *group) } EXPORT_SYMBOL_GPL(iopf_free_group); -static struct iommu_domain *get_domain_for_iopf(struct device *dev, - struct iommu_fault *fault) -{ - struct iommu_domain *domain; - - if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) { - domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0); - if (IS_ERR(domain)) - domain = NULL; - } else { - domain = iommu_get_domain_for_dev(dev); - } - - if (!domain || !domain->iopf_handler) { - dev_warn_ratelimited(dev, - "iopf (pasid %d) without domain attached or handler installed\n", -fault->prm.pasid); - - return NULL; - } - - return domain; -} - /* Non-last request of a group. Postpone until the last one. 
*/ static int report_partial_fault(struct iommu_fault_param *fault_param, struct iommu_fault *fault) @@ -199,10 +176,20 @@ static int report_partial_fault(struct iommu_fault_param *fault_param, return 0; } +static ioasid_t fault_to_pasid(struct iommu_fault *fault) +{ + if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) + return fault->prm.pasid; + + return IOMMU_NO_PASID; +} + static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param, struct iopf_fault *evt, struct iopf_group *abort_group) { + ioasid_t pasid = fault_to_pasid(&evt->fault); + struct iopf_attach_cookie *cookie; struct iopf_fault *iopf, *next; struct iopf_group *group; @@ -215,7 +202,23 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param, group = abort_group; } + cookie = iopf_pasid_cookie_get(iopf_param->dev, pasid); + if (!cookie && pasid != IOMMU_NO_PASID) + cookie = iopf_pasid_cookie_get(iopf_param->dev, IOMMU_NO_PASID); + if (IS_ERR(cookie) || !cookie) { + /* +* The PASID of this device was not attached by an I/O-capable +* domain. Ask the caller to abort handling of this fault. +* Otherwise, the reference count will be switched to the new +* iopf group and will be released in iopf_free_group(). +*/ + kfree(group); + group = abort_group; + cookie = NULL; + } + group->fault_param = iopf_param; + group->cookie = cookie; group->last_fault.fault = evt->fault; INIT_LIST_HEAD(&group->faults); INIT_LIST_HEAD(&group->pending_node); @@ -305,15 +308,11 @@ void
[PATCH v3 3/8] iommufd: Add fault and response message definitions
iommu_hwpt_pgfaults represent fault messages that the userspace can retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group, with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last iommu_hwpt_pgfault. An iommu_hwpt_page_response is a response message that the userspace should send to the kernel after finishing handling a group of fault messages. The @dev_id, @pasid, and @grpid fields in the message identify an outstanding iopf group for a device. The @addr field, which matches the fault address of the last fault in the group, will be used by the kernel for a sanity check. Signed-off-by: Lu Baolu --- include/uapi/linux/iommufd.h | 67 1 file changed, 67 insertions(+) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 1dfeaa2e649e..d59e839ae49e 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -692,4 +692,71 @@ struct iommu_hwpt_invalidate { __u32 __reserved; }; #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) + +/** + * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault + * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is + * valid. + * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group. + */ +enum iommu_hwpt_pgfault_flags { + IOMMU_PGFAULT_FLAGS_PASID_VALID = (1 << 0), + IOMMU_PGFAULT_FLAGS_LAST_PAGE = (1 << 1), +}; + +/** + * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault + * @IOMMU_PGFAULT_PERM_READ: request for read permission + * @IOMMU_PGFAULT_PERM_WRITE: request for write permission + * @IOMMU_PGFAULT_PERM_EXEC: request for execute permission + * @IOMMU_PGFAULT_PERM_PRIV: request for privileged permission + */ +enum iommu_hwpt_pgfault_perm { + IOMMU_PGFAULT_PERM_READ = (1 << 0), + IOMMU_PGFAULT_PERM_WRITE= (1 << 1), + IOMMU_PGFAULT_PERM_EXEC = (1 << 2), + IOMMU_PGFAULT_PERM_PRIV = (1 << 3), +}; + +/** + * struct iommu_hwpt_pgfault - iommu page fault data + * @size: sizeof(struct iommu_hwpt_pgfault) + * @flags: Combination of enum iommu_hwpt_pgfault_flags + * @dev_id: id of the originated device + * @pasid: Process Address Space ID + * @grpid: Page Request Group Index + * @perm: Combination of enum iommu_hwpt_pgfault_perm + * @addr: page address + */ +struct iommu_hwpt_pgfault { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 pasid; + __u32 grpid; + __u32 perm; + __u64 addr; +}; + +/** + * struct iommu_hwpt_page_response - IOMMU page fault response + * @size: sizeof(struct iommu_hwpt_page_response) + * @flags: Must be set to 0 + * @dev_id: device ID of target device for the response + * @pasid: Process Address Space ID + * @grpid: Page Request Group Index + * @code: response code. The supported codes include: + *0: Successful; 1: Response Failure; 2: Invalid Request. + * @addr: The fault address. Must match the addr field of the + *last iommu_hwpt_pgfault of a reported iopf group. + */ +struct iommu_hwpt_page_response { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 pasid; + __u32 grpid; + __u32 code; + __u64 addr; +}; #endif -- 2.34.1
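To make the group semantics concrete, here is an illustrative user
space sketch of consuming one fault group and sending the single
response. It relies only on the message layout defined above; the fault
FD it reads from is introduced in the next patch:

#include <unistd.h>
#include <linux/iommufd.h>

/* Illustrative sketch: read iommu_hwpt_pgfault messages until the one
 * flagged IOMMU_PGFAULT_FLAGS_LAST_PAGE, then send one response whose
 * @addr matches that last fault. How the faults are resolved is up to
 * the application and elided here. */
static int consume_fault_group(int fault_fd)
{
        struct iommu_hwpt_pgfault pgfault;
        struct iommu_hwpt_page_response resp = { .size = sizeof(resp) };

        do {
                if (read(fault_fd, &pgfault, sizeof(pgfault)) != sizeof(pgfault))
                        return -1;
                /* ... resolve pgfault.addr according to pgfault.perm ... */
        } while (!(pgfault.flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE));

        resp.dev_id = pgfault.dev_id;
        resp.pasid  = pgfault.pasid;
        resp.grpid  = pgfault.grpid;
        resp.addr   = pgfault.addr;     /* must match the last fault's address */
        resp.code   = 0;                /* 0: Successful */

        return write(fault_fd, &resp, sizeof(resp)) == sizeof(resp) ? 0 : -1;
}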
[PATCH v3 4/8] iommufd: Add iommufd fault object
An iommufd fault object provides an interface for delivering I/O page faults to user space. These objects are created and destroyed by user space, and they can be associated with or dissociated from hardware page table objects during page table allocation or destruction. User space interacts with the fault object through a file interface. This interface offers a straightforward and efficient way for user space to handle page faults. It allows user space to read fault messages sequentially and respond to them by writing to the same file. The file interface supports reading messages in poll mode, so it's recommended that user space applications use io_uring to enhance read and write efficiency. A fault object can be associated with any iopf-capable iommufd_hw_pgtable during the pgtable's allocation. All I/O page faults triggered by devices when accessing the I/O addresses of an iommufd_hw_pgtable are routed through the fault object to user space. Similarly, user space's responses to these page faults are routed back to the iommu device driver through the same fault object. Signed-off-by: Lu Baolu --- include/linux/iommu.h | 2 + drivers/iommu/iommufd/iommufd_private.h | 23 +++ include/uapi/linux/iommufd.h| 18 ++ drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/fault.c | 255 drivers/iommu/iommufd/main.c| 6 + drivers/iommu/iommufd/Makefile | 1 + 7 files changed, 306 insertions(+) create mode 100644 drivers/iommu/iommufd/fault.c diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 511dc7b4bdb2..4372648ac22e 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -145,6 +145,8 @@ struct iopf_group { /* The device's fault data parameter. */ struct iommu_fault_param *fault_param; struct iopf_attach_cookie *cookie; + /* Used by handler provider to hook the group on its own lists. */ + struct list_head node; }; /** diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 991f864d1f9b..52d83e888bd0 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -128,6 +128,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_HWPT_NESTED, IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, + IOMMUFD_OBJ_FAULT, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif @@ -395,6 +396,8 @@ struct iommufd_device { /* always the physical device */ struct device *dev; bool enforce_cache_coherency; + /* outstanding faults awaiting response indexed by fault group id */ + struct xarray faults; }; static inline struct iommufd_device * @@ -426,6 +429,26 @@ void iopt_remove_access(struct io_pagetable *iopt, u32 iopt_access_list_id); void iommufd_access_destroy_object(struct iommufd_object *obj); +/* + * An iommufd_fault object represents an interface to deliver I/O page faults + * to the user space. These objects are created/destroyed by the user space and + * associated with hardware page table objects during page-table allocation. + */ +struct iommufd_fault { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + + /* The lists of outstanding faults protected by below mutex. 
*/ + struct mutex mutex; + struct list_head deliver; + struct list_head response; + + struct wait_queue_head wait_queue; +}; + +int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); +void iommufd_fault_destroy(struct iommufd_object *obj); + #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index d59e839ae49e..c32d62b02306 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -50,6 +50,7 @@ enum { IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING, IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP, IOMMUFD_CMD_HWPT_INVALIDATE, + IOMMUFD_CMD_FAULT_ALLOC, }; /** @@ -759,4 +760,21 @@ struct iommu_hwpt_page_response { __u32 code; __u64 addr; }; + +/** + * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_ALLOC) + * @size: sizeof(struct iommu_fault_alloc) + * @flags: Must be 0 + * @out_fault_id: The ID of the new FAULT + * @out_fault_fd: The fd of the new FAULT + * + * Explicitly allocate a fault handling object. + */ +struct iommu_fault_alloc { + __u32 size; + __u32 flags; + __u32 out_fault_id; + __u32 out_fault_fd; +}; +#define IOMMU_FAULT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_ALLOC) #endif diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 873630c111c1..d70913ee8fdf 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct
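Since the file interface supports poll, a straightforward event loop
could look like the sketch below; consume_fault_group() refers to the
sketch shown under patch 3/8, and the loop is illustrative rather than
part of the patch (io_uring, as suggested in the cover letter, would
replace it in a performance-sensitive application):

#include <poll.h>

/* Illustrative sketch: block until the fault FD becomes readable, then
 * drain one fault group at a time. */
static void fault_event_loop(int fault_fd)
{
        struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };

        while (poll(&pfd, 1, -1) > 0) {
                if (pfd.revents & POLLIN)
                        consume_fault_group(fault_fd);
        }
}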
[PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
When allocating a user iommufd_hw_pagetable, the user space is allowed to associate a fault object with the hw_pagetable by specifying the fault object ID in the page table allocation data and setting the IOMMU_HWPT_FAULT_ID_VALID flag bit. On a successful return of hwpt allocation, the user can retrieve and respond to page faults by reading and writing the file interface of the fault object. Once a fault object has been associated with a hwpt, the hwpt is iopf-capable, indicated by fault_capable set to true. Attaching, detaching, or replacing an iopf-capable hwpt to an RID or PASID will differ from those that are not iopf-capable. The implementation of these will be introduced in the next patch. Signed-off-by: Lu Baolu --- drivers/iommu/iommufd/iommufd_private.h | 11 include/uapi/linux/iommufd.h| 6 + drivers/iommu/iommufd/fault.c | 14 ++ drivers/iommu/iommufd/hw_pagetable.c| 36 +++-- 4 files changed, 59 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 52d83e888bd0..2780bed0c6b1 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -293,6 +293,8 @@ int iommufd_check_iova_range(struct io_pagetable *iopt, struct iommufd_hw_pagetable { struct iommufd_object obj; struct iommu_domain *domain; + struct iommufd_fault *fault; + bool fault_capable : 1; }; struct iommufd_hwpt_paging { @@ -446,8 +448,17 @@ struct iommufd_fault { struct wait_queue_head wait_queue; }; +static inline struct iommufd_fault * +iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_FAULT), + struct iommufd_fault, obj); +} + int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); void iommufd_fault_destroy(struct iommufd_object *obj); +int iommufd_fault_iopf_handler(struct iopf_group *group); #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index c32d62b02306..7481cdd57027 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -357,10 +357,13 @@ struct iommu_vfio_ioas { *the parent HWPT in a nesting configuration. * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is * enforced on device attachment + * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is + * valid. */ enum iommufd_hwpt_alloc_flags { IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0, IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1, + IOMMU_HWPT_FAULT_ID_VALID = 1 << 2, }; /** @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type { * @__reserved: Must be 0 * @data_type: One of enum iommu_hwpt_data_type * @data_len: Length of the type specific data + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of + *IOMMU_HWPT_FAULT_ID_VALID is set. * @data_uptr: User pointer to the type specific data * * Explicitly allocate a hardware page table object. 
This is the same object @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc { __u32 __reserved; __u32 data_type; __u32 data_len; + __u32 fault_id; __aligned_u64 data_uptr; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC) diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 9844a85feeb2..e752d1c49dde 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -253,3 +253,17 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) return rc; } + +int iommufd_fault_iopf_handler(struct iopf_group *group) +{ + struct iommufd_hw_pagetable *hwpt = group->cookie->domain->fault_data; + struct iommufd_fault *fault = hwpt->fault; + + mutex_lock(&fault->mutex); + list_add_tail(&group->node, &fault->deliver); + mutex_unlock(&fault->mutex); + + wake_up_interruptible(&fault->wait_queue); + + return 0; +} diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 3f3f1fa1a0a9..2703d5aea4f5 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -8,6 +8,15 @@ #include "../iommu-priv.h" #include "iommufd_private.h" +static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt) +{ + if (hwpt->domain) + iommu_domain_free(hwpt->domain); + + if (hwpt->fault) + iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj); +} + void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) { struct iommufd_hwpt_paging *hwpt_paging = @@ -22,9 +31,7 @@ void iommufd_hwpt_paging_de
[PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace
The iopf-capable hw page table attach/detach/replace should use the iommu iopf-specific interfaces. The pointer to iommufd_device is stored in the private field of the attachment cookie, so that it can be easily retrieved in the fault handling paths. The references to iommufd_device and iommufd_hw_pagetable objects are held until the cookie is released, which happens after the hw_pagetable is detached from the device and all outstanding iopf's are responded to. This guarantees that both the device and hw_pagetable are valid before domain detachment and outstanding faults are handled. The iopf-capable hw page tables can only be attached to devices that support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an iopf-capable hw_pagetable to the device, the IOPF feature is enabled on the device. Similarly, after the last iopf-capable hwpt is detached from the device, the IOPF feature is disabled on the device. The current implementation allows a replacement between iopf-capable and non-iopf-capable hw page tables. This matches the nested translation use case, where a parent domain is attached by default and can then be replaced with a nested user domain with iopf support. Signed-off-by: Lu Baolu --- drivers/iommu/iommufd/iommufd_private.h | 7 ++ drivers/iommu/iommufd/device.c | 15 ++- drivers/iommu/iommufd/fault.c | 122 3 files changed, 141 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 2780bed0c6b1..9844a1289c01 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -398,6 +398,7 @@ struct iommufd_device { /* always the physical device */ struct device *dev; bool enforce_cache_coherency; + bool iopf_enabled; /* outstanding faults awaiting response indexed by fault group id */ struct xarray faults; }; @@ -459,6 +460,12 @@ iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id) int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); void iommufd_fault_destroy(struct iommufd_object *obj); int iommufd_fault_iopf_handler(struct iopf_group *group); +int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev); +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, +struct iommufd_device *idev); +int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt, +struct iommufd_device *idev); #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index d70913ee8fdf..c4737e876ebc 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -377,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, * attachment. 
*/ if (list_empty(&idev->igroup->device_list)) { - rc = iommu_attach_group(hwpt->domain, idev->igroup->group); + if (hwpt->fault_capable) + rc = iommufd_fault_domain_attach_dev(hwpt, idev); + else + rc = iommu_attach_group(hwpt->domain, idev->igroup->group); if (rc) goto err_unresv; idev->igroup->hwpt = hwpt; @@ -403,7 +406,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) mutex_lock(&idev->igroup->lock); list_del(&idev->group_item); if (list_empty(&idev->igroup->device_list)) { - iommu_detach_group(hwpt->domain, idev->igroup->group); + if (hwpt->fault_capable) + iommufd_fault_domain_detach_dev(hwpt, idev); + else + iommu_detach_group(hwpt->domain, idev->igroup->group); idev->igroup->hwpt = NULL; } if (hwpt_is_paging(hwpt)) @@ -498,7 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev, goto err_unlock; } - rc = iommu_group_replace_domain(igroup->group, hwpt->domain); + if (old_hwpt->fault_capable || hwpt->fault_capable) + rc = iommufd_fault_domain_replace_dev(hwpt, idev); + else + rc = iommu_group_replace_domain(igroup->group, hwpt->domain); if (rc) goto err_unresv; diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index e752d1c49dde..a4a49f3cd4c2 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -267,3 +267,125 @@ int iommufd_fault_iopf_handler(struct iopf_group *group) return 0; } + +static void release_attach_cookie(struct iopf_attach_cookie *cookie) +{ + struct iommufd_hw_pagetable *hwpt = cookie->domain->fault_data; + struct iommufd_device *i
[PATCH v3 7/8] iommufd/selftest: Add IOPF support for mock device
Extend the selftest mock device to support generating and responding to an IOPF. Also add an ioctl interface to userspace applications to trigger the IOPF on the mock device. This would allow userspace applications to test the IOMMUFD's handling of IOPFs without having to rely on any real hardware. Signed-off-by: Lu Baolu --- drivers/iommu/iommufd/iommufd_test.h | 8 drivers/iommu/iommufd/selftest.c | 63 2 files changed, 71 insertions(+) diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 482d4059f5db..ff9dcd812618 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -22,6 +22,7 @@ enum { IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS, IOMMU_TEST_OP_DIRTY, IOMMU_TEST_OP_MD_CHECK_IOTLB, + IOMMU_TEST_OP_TRIGGER_IOPF, }; enum { @@ -126,6 +127,13 @@ struct iommu_test_cmd { __u32 id; __u32 iotlb; } check_iotlb; + struct { + __u32 dev_id; + __u32 pasid; + __u32 grpid; + __u32 perm; + __u64 addr; + } trigger_iopf; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 2fb2597e069f..2ca226f88856 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -445,6 +445,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap) return false; } +static struct iopf_queue *mock_iommu_iopf_queue; + static struct iommu_device mock_iommu_device = { }; @@ -455,6 +457,29 @@ static struct iommu_device *mock_probe_device(struct device *dev) return &mock_iommu_device; } +static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt, + struct iommu_page_response *msg) +{ +} + +static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat) +{ + if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue) + return -ENODEV; + + return iopf_queue_add_device(mock_iommu_iopf_queue, dev); +} + +static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat) +{ + if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue) + return -ENODEV; + + iopf_queue_remove_device(mock_iommu_iopf_queue, dev); + + return 0; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -470,6 +495,9 @@ static const struct iommu_ops mock_ops = { .capable = mock_domain_capable, .device_group = generic_device_group, .probe_device = mock_probe_device, + .page_response = mock_domain_page_response, + .dev_enable_feat = mock_dev_enable_feat, + .dev_disable_feat = mock_dev_disable_feat, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free, @@ -1333,6 +1361,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id, return rc; } +static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd, +struct iommu_test_cmd *cmd) +{ + struct iopf_fault event = { }; + struct iommufd_device *idev; + + idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + + event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; + if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID) + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; + event.fault.type = IOMMU_FAULT_PAGE_REQ; + event.fault.prm.addr = cmd->trigger_iopf.addr; + event.fault.prm.pasid = cmd->trigger_iopf.pasid; + event.fault.prm.grpid = cmd->trigger_iopf.grpid; + event.fault.prm.perm = cmd->trigger_iopf.perm; + + iommu_report_device_fault(idev->dev, &event); + iommufd_put_object(ucmd->ictx, &idev->obj); + + return 0; +} + void 
iommufd_selftest_destroy(struct iommufd_object *obj) { struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); @@ -1408,6 +1461,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd) cmd->dirty.page_size, u64_to_user_ptr(cmd->dirty.uptr), cmd->dirty.flags); + case IOMMU_TEST_OP_TRIGGER_IOPF: + return iommufd_test_trigger_iopf(ucmd, cmd); default: return -EOPNOTSUPP; } @@ -1449,6 +1504,9 @@ int __init iommufd_test_init(void) &iommufd_mock_bus_type.nb); if (rc) goto err_sysfs; + + mock_iom
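For reference, triggering a fault from user space is expected to look
roughly like the sketch below. It uses the IOMMU_TEST_OP_TRIGGER_IOPF op
and the iommu_test_cmd fields added above; the IOMMU_TEST_CMD ioctl
number, the perm encoding, and the concrete pasid/grpid/addr values are
assumptions made only for illustration:

#include <sys/ioctl.h>
#include <linux/iommufd.h>
#include "iommufd_test.h"       /* drivers/iommu/iommufd/iommufd_test.h */

/* Illustrative sketch: inject one page request on the mock device. */
static int trigger_mock_iopf(int iommufd, __u32 dev_id)
{
        struct iommu_test_cmd cmd = {
                .size = sizeof(cmd),
                .op = IOMMU_TEST_OP_TRIGGER_IOPF,
                .trigger_iopf = {
                        .dev_id = dev_id,
                        .pasid = 0x1,           /* arbitrary example */
                        .grpid = 0x2,           /* arbitrary example */
                        .perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
                        .addr = 0xdeadbeef000ULL, /* arbitrary example */
                },
        };

        return ioctl(iommufd, IOMMU_TEST_CMD, &cmd);
}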
[PATCH v3 8/8] iommufd/selftest: Add coverage for IOPF test
Extend the selftest tool to add coverage of testing IOPF handling. This would include the following tests: - Allocating and destorying an iommufd fault object. - Allocating and destroying an IOPF-capable HWPT. - Attaching/detaching/replacing an IOPF-capable HWPT on a device. - Triggering an IOPF on the mock device. - Retrieving and responding to the IOPF through the file interface. Signed-off-by: Lu Baolu --- tools/testing/selftests/iommu/iommufd_utils.h | 83 +-- tools/testing/selftests/iommu/iommufd.c | 17 .../selftests/iommu/iommufd_fail_nth.c| 2 +- 3 files changed, 96 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index c646264aa41f..bf6027f2a16d 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \ pt_id, NULL)) -static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, +static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id, __u32 flags, __u32 *hwpt_id, __u32 data_type, void *data, size_t data_len) { @@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, .data_type = data_type, .data_len = data_len, .data_uptr = (uint64_t)data, + .fault_id = ft_id, }; int ret; @@ -177,24 +178,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, } #define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \ 0)) #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc( \ -self->fd, device_id, pt_id, flags, \ +self->fd, device_id, pt_id, 0, flags, \ hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0)) #define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \ data_type, data, data_len)\ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, data_type, data, data_len)) #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \ data_type, data, data_len)\ EXPECT_ERRNO(_errno, \ -_test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ +_test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, data_type, data, data_len)) +#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id, \ + data_type, data, data_len) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \ + flags, hwpt_id, data_type, data, \ + data_len)) + #define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \ ({ \ struct iommu_test_cmd test_cmd = { \ @@ -673,3 +680,69 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data, #define test_cmd_get_hw_capabilities(device_id, caps, mask) \ ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps)) + +static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd) +{ + struct iommu_fault_alloc cmd = { + .size = sizeof(cmd), + }; + int ret; + + ret = ioctl(fd, IOMMU_FAULT_ALLOC, &cmd); + if (ret) + return ret; + *fault_id = cmd.out_fault_id; + *fault_fd = cmd.out_fault_fd; + return 0; +} + +#define 
test_ioctl_fault_alloc(fault_id, fault_fd) \ + ({ \ + ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \ +