Patches 1-7 are
Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>
See my separate comments on patches 8 and 9.
Regards,
Felix
On 2025-07-24 22:43, Zhu Lingshan wrote:
Currently kfd manages kfd_process in a one-context-per-program manner:
each user space program owns exactly one kfd context (kfd_process).
This model works fine for most programs, but it is a poor fit for a
hypervisor like QEMU, because all programs in the guest user space
share that single kfd context. This is problematic, including but not
limited to:
As illustrated in Figure 1, all guest user space programs share the
same fd of /dev/kfd, the same kfd_process and the same PASID, and
therefore the same GPU_VM address space. As a result, the IOVA ranges
of the guest user space programs are not isolated from each other, and
the programs can attack each other through GPU DMA.
+--------------------------------------------------------------+
|                                                              |
|  +-----------+  +-----------+  +-----------+  +-----------+  |
|  |           |  |           |  |           |  |           |  |
|  | Program 1 |  | Program 2 |  | Program 3 |  | Program N |  |
|  |           |  |           |  |           |  |           |  |
|  +-----+-----+  +-----+-----+  +-----+-----+  +-----+-----+  |
|        |              |              |              |        |
|        |              |              |              |  Guest |
+--------+--------------+--------------+--------------+--------+
         |              |              |              |
         |          +---+--------------+---+          |
         |          |   file descriptor    |          |
         +----------+     of /dev/kfd      +----------+
                    |    opened by QEMU    |
                    +----------+-----------+   User Space
                               |               QEMU
-------------------------------+--------------------------------
                               |               Kernel Space
                               |               KFD Module
                      +--------+--------+
                      |                 |
                      |   kfd_process   |<------ The only one KFD context
                      |                 |
                      +--------+--------+
                               |
                      +--------+--------+
                      |      PASID      |
                      +--------+--------+
                               |
                      +--------+--------+
                      |     GPU_VM      |
                      +-----------------+
                            Figure 1
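To make the sharing concrete, here is a toy user-space model of the
current behaviour. It is not the KFD code: all toy_* names are
hypothetical, and the only property carried over is that the context
is looked up by the owning mm. Since every guest program reaches KFD
through QEMU's mm, all of them resolve to the same context and hence
the same PASID.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of today's lookup: one context per mm.
 * Hypothetical names, not kernel code. */
#define TOY_MAX_CTX 16

struct toy_mm { int id; };           /* stands in for one process's address space (e.g. QEMU) */

struct toy_kfd_process {
    const struct toy_mm *mm;         /* owning address space, used as the lookup key */
    unsigned int pasid;              /* selects the GPU_VM this context uses */
};

static struct toy_kfd_process toy_table[TOY_MAX_CTX];
static size_t toy_nr_ctx;
static unsigned int toy_next_pasid = 1;

/* Return the context of mm, creating it on first use. A second request
 * from the same mm gets the same context back, so also the same PASID. */
static struct toy_kfd_process *toy_get_process(const struct toy_mm *mm)
{
    assert(mm != NULL);
    for (size_t i = 0; i < toy_nr_ctx; i++)
        if (toy_table[i].mm == mm)
            return &toy_table[i];
    if (toy_nr_ctx == TOY_MAX_CTX)
        return NULL;                 /* toy table full */
    toy_table[toy_nr_ctx].mm = mm;
    toy_table[toy_nr_ctx].pasid = toy_next_pasid++;
    return &toy_table[toy_nr_ctx++];
}
```

Two guest programs that both go through QEMU's mm get back the same
toy_kfd_process pointer, so nothing separates their GPU address spaces.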
This series implements a multiple-contexts solution:
- Allow each program to create and hold multiple contexts (kfd processes).
- Each context has its own fd of /dev/kfd and an exclusive kfd_process,
  which is a secondary kfd context, so that its own PASID/GPU_VM
  isolates its IOVA address space.
Therefore, the guest programs cannot attack each other through GPU DMA.
The design is illustrated in Figure 2 below:
+---------------------------------------------------------------------------------------+
| +------------------------------------------------------------------+                  |
| |                                                                  |                  |
| | +-----------+    +-----------+    +-----------+    +-----------+ |                  |
| | |           |    |           |    |           |    |           | |                  |
| | | Program 1 |    | Program 2 |    | Program 3 |    | Program N | |                  |
| | |           |    |           |    |           |    |           | |                  |
| | +-----+-----+    +-----+-----+    +-----+-----+    +-----+-----+ |                  |
| |       |                |                |                |       |                  |
| |       |                |                |                | Guest |                  |
| +-------+----------------+----------------+----------------+-------+                  |
|         |                |                |                |                          |
|         |                |                |                |            QEMU          |
+---------+----------------+----------------+----------------+----------------+---------+
          |                |                |                |                |
     +----+----+      +----+----+      +----+----+      +----+----+      +----+----+
     |         |      |         |      |         |      |         |      |         |
     |  FD 1   |      |  FD 2   |      |  FD 3   |      |  FD 4   |      | Primary |
     |         |      |         |      |         |      |         |      |   FD    |
     +----+----+      +----+----+      +----+----+      +----+----+      +----+----+
          |                |                |                |                |   User Space
----------+----------------+----------------+----------------+----------------+----------
          |                |                |                |                |   Kernel Space
+---------+----------------+----------------+----------------+----------------+---------+  KFD Module
|  +------+------+  +------+------+  +------+------+  +------+------+  +------+------+  |
|  |  Secondary  |  |  Secondary  |  |  Secondary  |  |  Secondary  |  |   Primary   |  |
|  |kfd_process 1|  |kfd_process 2|  |kfd_process 3|  |kfd_process 4|  | kfd_process |  |
|  +------+------+  +------+------+  +------+------+  +------+------+  +------+------+  |
|         |                |                |                |                |         |
|  +------+------+  +------+------+  +------+------+  +------+------+  +------+------+  |
|  |    PASID    |  |    PASID    |  |    PASID    |  |    PASID    |  |    PASID    |  |
|  +------+------+  +------+------+  +------+------+  +------+------+  +------+------+  |
|         |                |                |                |                |         |
|  +------+------+  +------+------+  +------+------+  +------+------+  +------+------+  |
|  |   GPU_VM    |  |   GPU_VM    |  |   GPU_VM    |  |   GPU_VM    |  |   GPU_VM    |  |
|  +-------------+  +-------------+  +-------------+  +-------------+  +-------------+  |
+---------------------------------------------------------------------------------------+
                                        Figure 2
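A similarly hedged toy model of the proposed scheme follows. It uses
hypothetical toy_* names and is not the code in this series. The
model has four properties. The first context created for an mm is
marked primary. Every further context is a secondary one with its own
PASID. A lookup by mm still returns the primary. Closing a secondary's
fd releases only that context. Leaving the primary untouched on close
is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of primary/secondary contexts.
 * Hypothetical names, not the code in this series. */
#define TOY_MAX_CTX 16

struct toy_mm { int id; };

struct toy_ctx {
    const struct toy_mm *mm;   /* owning address space (QEMU) */
    bool in_use;
    bool primary;              /* first context created for this mm */
    unsigned int pasid;        /* private PASID, hence a private GPU_VM */
};

static struct toy_ctx toy_ctxs[TOY_MAX_CTX];
static unsigned int toy_next_pasid = 1;

/* Lookup by mm only ever returns the primary context. */
static struct toy_ctx *toy_find_by_mm(const struct toy_mm *mm)
{
    for (size_t i = 0; i < TOY_MAX_CTX; i++)
        if (toy_ctxs[i].in_use && toy_ctxs[i].primary && toy_ctxs[i].mm == mm)
            return &toy_ctxs[i];
    return NULL;
}

/* Create a new context with its own PASID. It is primary only if the mm
 * has no primary yet; otherwise it is a secondary context. */
static struct toy_ctx *toy_create_ctx(const struct toy_mm *mm)
{
    bool is_primary = toy_find_by_mm(mm) == NULL;

    for (size_t i = 0; i < TOY_MAX_CTX; i++) {
        if (!toy_ctxs[i].in_use) {
            toy_ctxs[i] = (struct toy_ctx){
                .mm = mm, .in_use = true, .primary = is_primary,
                .pasid = toy_next_pasid++,
            };
            return &toy_ctxs[i];
        }
    }
    return NULL;               /* toy table full */
}

/* Closing the fd of a secondary context releases just that context.
 * Assumption of this sketch: the primary is not released by this path.
 * Returns true if the context was released. */
static bool toy_close_ctx(struct toy_ctx *ctx)
{
    assert(ctx != NULL && ctx->in_use);
    if (ctx->primary)
        return false;
    ctx->in_use = false;
    return true;
}
```

Here each guest program would sit behind its own secondary context,
so two programs never share a PASID. Existing callers that look up by
mm still see the same primary context they did before.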
Zhu Lingshan (9):
amdkfd: enlarge the hashtable of kfd_process
amdkfd: mark the first kfd_process as the primary one
amdkfd: find_process_by_mm always return the primary context
amdkfd: Introduce kfd_create_process_sysfs as a separate function
amdkfd: destroy kfd secondary contexts through fd close
amdkfd: process svm ioctl only on the primary kfd process
amdkfd: process USERPTR allocation only on the primary kfd process
amdkfd: identify a secondary kfd process by its id
amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 62 ++++++-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 14 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 204 +++++++++++++++++------
include/uapi/linux/kfd_ioctl.h | 8 +-
4 files changed, 231 insertions(+), 57 deletions(-)