[PATCH] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-01 Thread Rajneesh Bhardwaj
In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm

[Patch v2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-07 Thread Rajneesh Bhardwaj
In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse Signed-off-by: Rajneesh Bhardwaj --- * Found a bug in

[PATCH 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-09 Thread Rajneesh Bhardwaj
In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse Signed-off-by: Rajneesh Bhardwaj --- * Incorporate

[PATCH 2/2] drm/amdgpu: Fix implicit assumption in gfx11 debug flags

2024-02-09 Thread Rajneesh Bhardwaj
Gfx11 debug flags mask is currently set with an implicit assumption that no other mqd update flags exist. This needs to be fixed now that the previous patch introduces the UPDATE_FLAG_IS_GWS flag. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 4 ++-- 1

[Patch v2 1/2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-13 Thread Rajneesh Bhardwaj
In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Suggested-by: Joseph Greathouse Signed-off-by: Rajneesh Bhardwaj --- * Change the

[Patch v2 2/2] drm/amdgpu: Fix implicit assumption in gfx11 debug flags

2024-02-13 Thread Rajneesh Bhardwaj
Gfx11 debug flags mask is currently set with an implicit assumption that no other mqd update flags exist. This needs to be fixed now that the previous patch introduces the UPDATE_FLAG_IS_GWS flag. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd
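
The kind of implicit assumption this patch describes can be sketched in a few lines of Python. This is only an illustration, not the driver code: `UPDATE_FLAG_IS_GWS` is the one name taken from the patch description, while the other flag names, all bit values, and both helper functions are hypothetical.

```python
# Hypothetical bit layout. Only UPDATE_FLAG_IS_GWS is named in the patch text;
# the debug flag names and every numeric value here are assumptions.
UPDATE_FLAG_DBG_WA_ENABLE = 1 << 0
UPDATE_FLAG_DBG_WA_DISABLE = 1 << 1
UPDATE_FLAG_IS_GWS = 1 << 2


def dbg_enable_requested_strict(update_flag: int) -> bool:
    """Equality test: only correct while no other flag can ever be set."""
    return update_flag == UPDATE_FLAG_DBG_WA_ENABLE


def dbg_enable_requested_masked(update_flag: int) -> bool:
    """Bitwise test: ignores unrelated bits such as UPDATE_FLAG_IS_GWS."""
    return (update_flag & UPDATE_FLAG_DBG_WA_ENABLE) != 0
```

With a combined value such as `UPDATE_FLAG_DBG_WA_ENABLE | UPDATE_FLAG_IS_GWS`, the equality test misses the debug request while the masked test still sees it, which is why a comparison written before a new flag existed may need revisiting.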

[PATCH] drm/ttm: Implement strict NUMA pool allocations

2024-03-22 Thread Rajneesh Bhardwaj
Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++- drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 10 +- drivers/gpu/drm/ttm/ttm_device.c

[PATCH] drm/ttm: Make ttm shrinkers NUMA aware

2024-04-08 Thread Rajneesh Bhardwaj
Otherwise the nid is always passed as 0 during memory reclaim so make TTM shrinkers NUMA aware. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c

[PATCH 1/2] drm/amdgpu: Disable compute partition switch under SRIOV

2024-06-21 Thread Rajneesh Bhardwaj
Do not allow the compute partition mode switch from the guest driver. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index

[PATCH 2/2] drm/amdgpu: Don't warn for compute mode switch under SRIOV

2024-06-21 Thread Rajneesh Bhardwaj
Under an SRIOV environment, the compute partition mode is set up by the host driver, so the state machine's cached copy might differ when doing the transition for the first time. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 3 ++- 1 file changed, 2 insertions(+), 1

[Patch v2 1/2] drm/amdgpu: Disable compute partition switch under SRIOV

2024-06-22 Thread Rajneesh Bhardwaj
Do not allow the compute partition mode switch from the guest driver but still allow the query for current_compute_partition. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 5 + drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 9 ++--- 2 files changed, 7

[Patch v2 2/2] drm/amdgpu: Don't warn for compute mode switch under SRIOV

2024-06-22 Thread Rajneesh Bhardwaj
Under an SRIOV environment, the compute partition mode is set up by the host driver, so the state machine's cached copy might differ when doing the transition for the first time. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 3 ++- 1 file changed, 2 insertions(+), 1

[PATCH 2/2] drm/ttm: Make ttm shrinkers NUMA aware

2024-07-02 Thread Rajneesh Bhardwaj
Otherwise the nid is always passed as 0 during memory reclaim so make TTM shrinkers NUMA aware. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c

[PATCH 1/2] drm/ttm: Allow direct reclaim to allocate local memory

2024-07-02 Thread Rajneesh Bhardwaj
(HUGEPAGE) but also offers a performance improvement. Accessing remote pages suffers due to bandwidth limitations and could be avoided if memory becomes defragmented and in most cases without using manual compaction. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file

[Patch v2] drm/ttm: Allow direct reclaim to allocate local memory

2024-07-08 Thread Rajneesh Bhardwaj
compaction is disabled. (https://tinyurl.com/4f32f7rs) Cc: Dave Airlie Cc: Vlastimil Babka Cc: Daniel Vetter Reviewed-by: Christian König Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm

[PATCH] drm/amdgpu: Update CGCG settings for GFXIP 9.4.3

2024-04-21 Thread Rajneesh Bhardwaj
Tune coarse grain clock gating idle threshold and rlc idle timeout to achieve better kernel launch latency. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Make CPX mode auto default in NPS4

2024-05-22 Thread Rajneesh Bhardwaj
On GFXIP9.4.3, make CPX mode the default compute mode if the node is set up in NPS4 memory partition mode. This change applies only to dGPU; on APU, continue to use TPX mode. Cc: Lijo Lazar Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 2 +- 1 file

[Patch v2] drm/ttm: Schedule delayed_delete worker closer

2023-11-08 Thread Rajneesh Bhardwaj
-off-by: Rajneesh Bhardwaj --- Changes in v2: - Absorbed the feedback provided by Christian in the commit message and the comment. drivers/gpu/drm/ttm/ttm_bo.c | 8 +++- drivers/gpu/drm/ttm/ttm_device.c | 3 ++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers

[Patch v3] drm/ttm: Schedule delayed_delete worker closer

2023-11-11 Thread Rajneesh Bhardwaj
-off-by: Rajneesh Bhardwaj --- Changes in v3: * Use WQ_UNBOUND to address the warning reported by CI pipeline. drivers/gpu/drm/ttm/ttm_bo.c | 8 +++- drivers/gpu/drm/ttm/ttm_device.c | 6 -- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c

[PATCH] drm/ttm: set max_active to recommended default

2023-11-11 Thread Rajneesh Bhardwaj
To maximize per-CPU execution contexts for the work items, use the recommended setting, i.e. WQ_DFL_ACTIVE (256). There is no apparent reason to throttle to 16 during process teardown. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_device.c | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-08 Thread Rajneesh Bhardwaj
pings in the child which confuse CRIU when it mmaps on restore. Having this flag set for the render node VMAs helps. VMAs mapped via KFD already take care of this so this is needed only for the render nodes. Cc: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- dr

[Patch v2] drm/amdgpu: Don't inherit GEM object VMAs in child process

2021-12-10 Thread Rajneesh Bhardwaj
D BOs only. Cc: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- Changes in v2: * Addressed Christian's concerns for user space impact * Further reduced the scope to KFD BOs only drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 +++ 1 file changed, 3

[Patch v4 00/24] CHECKPOINT RESTORE WITH ROCm

2021-12-22 Thread Rajneesh Bhardwaj
mdkfd: CRIU checkpoint and restore queue mqds drm/amdkfd: CRIU checkpoint and restore queue control stack drm/amdkfd: CRIU checkpoint and restore events drm/amdkfd: CRIU implement gpu_id remapping Rajneesh Bhardwaj (15): x86/configs: CRIU update debug rock defconfig x86/configs: Add

[Patch v4 01/24] x86/configs: CRIU update debug rock defconfig

2021-12-22 Thread Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file changed, 34 insertions(+), 19 deletions(-) diff --git

[Patch v4 07/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-12-22 Thread Rajneesh Bhardwaj
process i.e. criu_resume ioctl is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it's called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj

[Patch v4 03/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-12-22 Thread Rajneesh Bhardwaj
ached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesigned by David Yat Sin) Suggested-by: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

[Patch v4 10/24] drm/amdkfd: CRIU restore queue ids

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd
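
Restoring a queue under the id it had at dump time amounts to an allocator that can hand out a specific id on request. The Python sketch below is a toy model under that assumption; the class name, method names, and error handling are hypothetical and do not reflect the KFD implementation.

```python
from typing import Optional, Set


class QueueIdAllocator:
    """Toy id allocator: hypothetical model, not the KFD queue manager."""

    def __init__(self, max_ids: int) -> None:
        self._max_ids = max_ids
        self._used: Set[int] = set()

    def allocate(self, requested: Optional[int] = None) -> int:
        """Return the lowest free id, or exactly `requested` on restore."""
        if requested is None:
            # Normal creation path: pick the first unused id.
            for qid in range(self._max_ids):
                if qid not in self._used:
                    self._used.add(qid)
                    return qid
            raise RuntimeError("no free queue id")
        # Restore path: the id recorded at dump time must be valid and free.
        if not 0 <= requested < self._max_ids or requested in self._used:
            raise ValueError(f"queue id {requested} is not available")
        self._used.add(requested)
        return requested
```

On the restore path the caller passes the id saved in the checkpoint image, so user space that still holds the old queue id keeps working after restore.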

[Patch v4 09/24] drm/amdkfd: CRIU add queues support

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 110 - drivers/gpu/drm/amd/amdkfd/kf

[Patch v4 04/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-12-22 Thread Rajneesh Bhardwaj
Also the pid of a process inside a container might be different than its global pid so return the ns pid. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 + dr
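
The difference between a global pid and a namespace (ns) pid can be observed from user space through the `NSpid:` field of `/proc/<pid>/status`, which lists the process id in each nested PID namespace, outermost first. The Python sketch below parses that field from status text; the helper names are hypothetical, and this is a user-space illustration rather than what the ioctl does internally.

```python
from typing import List, Optional


def parse_nspid(status_text: str) -> List[int]:
    """Return the pids listed on the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def innermost_pid(status_text: str) -> Optional[int]:
    """Pid as seen from inside the innermost namespace (e.g. a container)."""
    pids = parse_nspid(status_text)
    return pids[-1] if pids else None
```

For a containerized process the line may hold two or more values, for example a large global pid followed by a small pid that is only meaningful inside the container.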

[Patch v4 13/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manage

[Patch v4 05/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2021-12-22 Thread Rajneesh Bhardwaj
. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 3 +- 4

[Patch v4 16/24] drm/amdkfd: CRIU implement gpu_id remapping

2021-12-22 Thread Rajneesh Bhardwaj
during the user ioctl's. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 465 -- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 45 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 + drivers/gpu

[Patch v4 06/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-12-22 Thread Rajneesh Bhardwaj
values to newly created BOs. This also adds the minimal gpu mapping support for a single gpu checkpoint restore use case. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++- 1 file changed, 297 insertions

[Patch v4 08/24] drm/amdkfd: CRIU Implement KFD unpause operation

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Introduce the UNPAUSE op. After the CRIU amdgpu plugin performs a PROCESS_INFO op, the queues will stay in an evicted state. Once the plugin is done draining BO contents, it is safe to perform an UNPAUSE op for the queues to resume. Signed-off-by: David Yat Sin --- drivers/gpu/

[Patch v4 17/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2021-12-22 Thread Rajneesh Bhardwaj
command submissions. With sDMA, we see a huge improvement in checkpoint and restore operations compared to the generic PCI-based access via the host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71

[Patch v4 15/24] drm/amdkfd: CRIU checkpoint and restore events

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 272 --- drivers/gpu/dr

[Patch v4 21/24] drm/amdkfd: CRIU Discover svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
extending the PROCESS_INFO op of the CRIU IOCTL to discover the svm ranges in the target process; future patches bring in the required support for checkpoint and restore of SVM ranges. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++-- drivers

[Patch v4 12/24] drm/amdkfd: CRIU restore queue doorbell id

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +-- 1 file changed, 41 insertions(+), 19 deletions(-)

[Patch v4 19/24] drm/amdkfd: CRIU allow external mm for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
ned-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 88360f23eb61..7c92116153fe 100644 --- a/drivers/gpu/drm/

[Patch v4 18/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2021-12-22 Thread Rajneesh Bhardwaj
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically used for querying the current xnack mode without modifying it. Signed-off-by: Rajneesh Bha
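
The convention described here, where a negative request only queries the xnack mode and a non-negative request changes it, can be modeled as follows. This is a hedged Python sketch: the function names and return shapes are hypothetical and only mirror the behavior stated in the patch description.

```python
from typing import Tuple


def set_xnack_mode(current: bool, requested: int) -> Tuple[bool, bool]:
    """Return (mode_after_call, was_modified).

    A negative `requested` only queries the current mode; zero or a
    positive value disables or enables xnack respectively.
    """
    if requested < 0:
        return current, False
    return bool(requested), True


def xnack_value_for_checkpoint(current: bool) -> int:
    """Value to save in a checkpoint: always 0 or 1, never a query value."""
    return 1 if current else 0
```

Because a checkpoint records actual state, only 0 or 1 is ever saved; the negative query value has no meaning on restore.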

[Patch v4 22/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2021-12-22 Thread Rajneesh Bhardwaj
possible values for the max possible attribute types. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++ 3 files changed, 108 insertions

[Patch v4 11/24] drm/amdkfd: CRIU restore sdma id for queues

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +-

[Patch v4 20/24] drm/amdkfd: use user_gpu_id for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
Currently the SVM ranges use actual_gpu_id but with Checkpoint Restore support it's possible that the SVM ranges can be resumed on another node where the actual_gpu_id may not be the same as the original (user_gpu_id) gpu id. So modify svm code to use user_gpu_id. Signed-off-by: Rajneesh Bhardwaj
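
The split between `user_gpu_id` and `actual_gpu_id` can be pictured as a per-process translation table that is filled in at restore time. The Python sketch below is a simplified model; the class, its methods, and the example id values are hypothetical, and only the two id names come from the patch description.

```python
from typing import Dict


class GpuIdMap:
    """Hypothetical per-process table: user_gpu_id <-> actual_gpu_id."""

    def __init__(self) -> None:
        self._user_to_actual: Dict[int, int] = {}

    def add(self, user_gpu_id: int, actual_gpu_id: int) -> None:
        """Record which real device backs an id the application knows."""
        if user_gpu_id in self._user_to_actual:
            raise ValueError(f"user_gpu_id {user_gpu_id:#x} already mapped")
        self._user_to_actual[user_gpu_id] = actual_gpu_id

    def actual(self, user_gpu_id: int) -> int:
        """Translate an id passed in by user space to the real device id."""
        return self._user_to_actual[user_gpu_id]

    def user(self, actual_gpu_id: int) -> int:
        """Translate a real device id back to the id user space expects."""
        for user_id, actual_id in self._user_to_actual.items():
            if actual_id == actual_gpu_id:
                return user_id
        raise KeyError(actual_gpu_id)
```

Before any checkpoint the two ids would typically coincide; after a restore on a different node, the application keeps using the id it saw originally while lookups resolve to the device actually present.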

[Patch v4 23/24] drm/amdkfd: CRIU prepare for svm resume

2021-12-22 Thread Rajneesh Bhardwaj
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an imminent CRIU resume phase. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[Patch v4 24/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2021-12-22 Thread Rajneesh Bhardwaj
In CRIU resume stage, resume all the shared virtual memory ranges from the data stored inside the resuming kfd process during CRIU restore phase. Also setup xnack mode and free up the resources. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 + drivers

[Patch v4 02/24] x86/configs: Add rock-rel_defconfig for amd-feature-criu branch

2021-12-22 Thread Rajneesh Bhardwaj
- Add rock-rel_defconfig for release builds. Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-rel_defconfig | 4927 +++ 1 file changed, 4927 insertions(+) create mode 100644 arch/x86/configs/rock-rel_defconfig diff --git a/arch/x86/configs/rock-rel_defconfig

[Patch v4 14/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2

[Patch v5 00/24] CHECKPOINT RESTORE WITH ROCm

2022-02-03 Thread Rajneesh Bhardwaj
ement gpu_id remapping Rajneesh Bhardwaj (15): x86/configs: CRIU update debug rock defconfig drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs drm/amdkfd: CRIU Implement KFD process_info ioctl drm/amdkfd: CRIU Implement KFD checkpoint ioctl drm/amdkfd: CRIU Implement KFD restore ioctl drm/a

[Patch v5 01/24] x86/configs: CRIU update debug rock defconfig

2022-02-03 Thread Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file changed, 34 insertions

[Patch v5 03/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2022-02-03 Thread Rajneesh Bhardwaj
umper process. Also the pid of a process inside a container might be different than its global pid so return the ns pid. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 +++- 1 file changed, 55 insertions(+), 1 del

[Patch v5 04/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2022-02-03 Thread Rajneesh Bhardwaj
. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 11 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 + drivers

[Patch v5 06/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2022-02-03 Thread Rajneesh Bhardwaj
process i.e. criu_resume ioctl op is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it's called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh

[Patch v5 02/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2022-02-03 Thread Rajneesh Bhardwaj
and CAP_CHECKPOINT_RESTORE capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesigned by David Yat Sin) Suggested-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[Patch v5 09/24] drm/amdkfd: CRIU restore queue ids

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd

[Patch v5 05/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2022-02-03 Thread Rajneesh Bhardwaj
-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++- 1 file changed, 297 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 17a937b7139f

[Patch v5 07/24] drm/amdkfd: CRIU Implement KFD unpause operation

2022-02-03 Thread Rajneesh Bhardwaj
: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 37 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 + 3 files changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b

[Patch v5 10/24] drm/amdkfd: CRIU restore sdma id for queues

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd

[Patch v5 11/24] drm/amdkfd: CRIU restore queue doorbell id

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +-- 1 file changed, 41 insertions(+), 19 deletions(-)

[Patch v5 08/24] drm/amdkfd: CRIU add queues support

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

[Patch v5 21/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2022-02-03 Thread Rajneesh Bhardwaj
possible values for the max possible attribute types. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++ 3 files changed, 108 insertions

[Patch v5 17/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2022-02-03 Thread Rajneesh Bhardwaj
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically used for querying the current xnack mode without modifying it. Signed-off-by: Rajneesh Bha

[Patch v5 22/24] drm/amdkfd: CRIU prepare for svm resume

2022-02-03 Thread Rajneesh Bhardwaj
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an imminent CRIU resume phase. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[Patch v5 12/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- ..

[Patch v5 24/24] drm/amdkfd: Bump up KFD API version for CRIU

2022-02-03 Thread Rajneesh Bhardwaj
- Change KFD minor version to 7 for CRIU Proposed userspace changes: https://github.com/RadeonOpenCompute/criu Signed-off-by: Rajneesh Bhardwaj --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include

[Patch v5 16/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2022-02-03 Thread Rajneesh Bhardwaj
command submissions. With sDMA, we see a huge improvement in checkpoint and restore operations compared to the generic PCI-based access via the host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71

[Patch v5 14/24] drm/amdkfd: CRIU checkpoint and restore events

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c

[Patch v5 19/24] drm/amdkfd: use user_gpu_id for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
Currently the SVM ranges use actual_gpu_id but with Checkpoint Restore support it's possible that the SVM ranges can be resumed on another node where the actual_gpu_id may not be the same as the original (user_gpu_id) gpu id. So modify svm code to use user_gpu_id. Signed-off-by: Rajneesh Bhardwaj

[Patch v5 15/24] drm/amdkfd: CRIU implement gpu_id remapping

2022-02-03 Thread Rajneesh Bhardwaj
during the user ioctl's. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 468 -- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 45 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 + drivers/gpu

[Patch v5 20/24] drm/amdkfd: CRIU Discover svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
extending the PROCESS_INFO op of the CRIU IOCTL to discover the svm ranges in the target process; future patches bring in the required support for checkpoint and restore of SVM ranges. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++-- drivers

[Patch v5 13/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2

[Patch v5 23/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2022-02-03 Thread Rajneesh Bhardwaj
the flags during restore as there might be some default flags set when the prange is created. Also handle the invalid PREFETCH attribute values saved during checkpoint by replacing them with another dummy KFD_IOCTL_SVM_ATTR_SET_FLAGS attribute. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm

[Patch v5 18/24] drm/amdkfd: CRIU allow external mm for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
ned-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index ffec25e642e2..d34508f5e88b 100644 --- a/drivers/gpu/drm/

[PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-03 Thread Rajneesh Bhardwaj
0004 R14: 7fbf689593a0 R15: 7fbcc402d820 Cc: Christian König Cc: Felix Kuehling Cc: Alex Deucher Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled") Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++- 1 file changed

[PATCH] drm/amdkfd: CRIU fix extra whitespace and block comment warnings

2022-02-10 Thread Rajneesh Bhardwaj
Fix checkpatch reported warning for a quoted line and block line comments. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: Fix prototype warning for get_process_num_bos

2022-02-10 Thread Rajneesh Bhardwaj
Fix the warning: no previous prototype for 'get_process_num_bos' [-Wmissing-prototypes] Reported-by: kernel test robot Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/

[PATCH 0/4] General cleanup and SPDX update

2022-02-11 Thread Rajneesh Bhardwaj
There are a bunch of warnings from checkpatch and kerneldoc style issues that this cleanup series tries to address; it also updates the SPDX License header for all the KFD files within the amdgpu driver. Rajneesh Bhardwaj (4): drm/amdgpu: Fix some kerneldoc warnings drm/amdkfd: update SPDX

[PATCH 1/4] drm/amdgpu: Fix some kerneldoc warnings

2022-02-11 Thread Rajneesh Bhardwaj
Fix a few kerneldoc warnings and one typo. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 4/4] drm/amdgpu: Fix a kerneldoc warning

2022-02-11 Thread Rajneesh Bhardwaj
Add missing parameters to fix a kerneldoc warning Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

[PATCH 3/4] drm/amdkfd: Fix leftover errors and warnings

2022-02-11 Thread Rajneesh Bhardwaj
A bunch of errors and warnings have been left over in KFD over the years; attempt to fix the errors and most of the warnings reported by the checkpatch tool. A few warnings still remain which may be false positives, so ignore them for now. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

[PATCH 2/4] drm/amdkfd: update SPDX license header

2022-02-11 Thread Rajneesh Bhardwaj
Update the SPDX License header for all the KFD files. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_crat.h | 3

[PATCH] drm/amdgpu: restrict bo mapping within gpu address limits

2020-06-02 Thread Rajneesh Bhardwaj
max_pfn range. This restricts the range to map bo within practical limits of cpu and gpu for shared virtual memory access. Reviewed-by: Oak Zeng Reviewed-by: Christian König Reviewed-by: Hawking Zhang Acked-by: Alex Deucher Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: fix kernel-doc and cleanup

2020-07-13 Thread Rajneesh Bhardwaj
- fix some styling issues - fixes for kernel-doc type Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 61 +++ 1 file changed, 25 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd

[PATCH 1/3] drm/amdgpu: Rework KFD memory max limits

2023-09-29 Thread Rajneesh Bhardwaj
To allow bigger allocations, especially on systems such as GFXIP 9.4.3 that use GTT memory for VRAM allocations, relax the limits to maximize ROCm allocations. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- 1 file changed, 8 insertions(+), 2

[PATCH 2/3] drm/amdgpu: Initialize acpi mem ranges after TTM

2023-09-29 Thread Rajneesh Bhardwaj
Move ttm init before acpi mem range init so we can use ttm_pages_limit to override vram size for GFXIP 9.4.3. The vram size override change will be introduced in a future commit. Acked-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10

[PATCH 3/3] drm/amdgpu: Use ttm_pages_limit to override vram reporting

2023-09-29 Thread Rajneesh Bhardwaj
On GFXIP9.4.3 APU, allow the memory reporting as per the ttm pages limit in NPS1 mode. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b

[Patch v2 2/2] drm/amdgpu: Use ttm_pages_limit to override vram reporting

2023-10-02 Thread Rajneesh Bhardwaj
On GFXIP9.4.3 APU, allow the memory reporting as per the ttm pages limit in NPS1 mode. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 - drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 9 + 2 files changed, 17 insertions(+), 9

[Patch v2 1/2] drm/amdgpu: Rework KFD memory max limits

2023-10-02 Thread Rajneesh Bhardwaj
To allow bigger allocations, especially on systems such as GFXIP 9.4.3 that use GTT memory for VRAM allocations, relax the limits to maximize ROCm allocations. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 -- 1

[PATCH] drm/ttm: Schedule delayed_delete worker closer

2023-11-07 Thread Rajneesh Bhardwaj
etc. This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD APU platforms such as GFXIP9.4.3. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_bo.c | 10 +- drivers/gpu/drm/ttm/ttm_device.c | 3 ++- 2 files changed, 11 insertions(+), 2 deletions(-) diff

[PATCH] drm/amdkfd: Fix CRIU restore op due to doorbell offset

2022-09-07 Thread Rajneesh Bhardwaj
ommit 15bcfbc55b57 ("drm/amdkfd: Allocate doorbells only when needed")' Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 4 +++- drivers/gpu/drm/amd/amdkfd/kfd_process

[Patch v2] drm/amdkfd: Fix CRIU restore op due to doorbell offset

2022-09-07 Thread Rajneesh Bhardwaj
ommit 15bcfbc55b57 ("drm/amdkfd: Allocate doorbells only when needed")' Signed-off-by: Rajneesh Bhardwaj --- Changes in v2: * Addressed review feedback from Felix drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 ++ drivers/gpu/drm/amd/amdkfd/kfd_doorbel

[PATCH] drm/amdgpu: Refactor code to handle non coherent and uncached

2022-07-18 Thread Rajneesh Bhardwaj
This simplifies existing coherence handling for Arcturus and Aldebaran to account for !coherent && uncached scenarios. Cc: Joseph Greathouse Cc: Alex Deucher Signed-off-by: Rajneesh Bhardwaj --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 53 +-- 1 file cha

[PATCH] drm/amdgpu: Avoid direct cast to amdgpu_ttm_tt

2022-07-26 Thread Rajneesh Bhardwaj
For type safety, use container_of() instead of implicit cast from struct ttm_tt to struct amdgpu_ttm_tt. Cc: Christian König Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 44 ++--- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git

[Patch v2] drm/amdgpu: Avoid direct cast to amdgpu_ttm_tt

2022-07-27 Thread Rajneesh Bhardwaj
For type safety, use container_of() instead of implicit cast from struct ttm_tt to struct amdgpu_ttm_tt. Cc: Christian König Signed-off-by: Rajneesh Bhardwaj --- Changes in v2: * Fixed a bug that Felix pointed out in v1 by updating the macro definition drivers/gpu/drm/amd/amdgpu
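
The container_of() idea in the two entries above can be sketched in userspace C. This is only an illustration under stated assumptions: the macro is a simplified stand-in for the kernel's container_of() (which adds type checking), and base_tt, wrapper_tt and to_wrapper() are hypothetical names, not the real ttm_tt or amdgpu_ttm_tt definitions or the macro from the patch.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types for illustration only. */
struct base_tt {
	unsigned long num_pages;
};

struct wrapper_tt {
	int flags;            /* first member, so a plain cast from base would be wrong */
	struct base_tt base;  /* embedded base object */
};

/* Recover the enclosing wrapper from a pointer to its embedded base. */
static struct wrapper_tt *to_wrapper(struct base_tt *tt)
{
	return container_of(tt, struct wrapper_tt, base);
}
```

A direct cast from the base pointer is only correct while the base object is the first member of the wrapper; the offset-based lookup keeps working if the layout changes.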

[PATCH] drm/amdgpu: Fix the kerneldoc description

2022-11-05 Thread Rajneesh Bhardwaj
amdgpu_ttm_tt_set_userptr() is also called by the KFD as part of initializing the user pages for userptr BOs and also while initializing the GPUVM for a KFD process so update the function description. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 +++-- 1 file

[PATCH] drm/ttm: Use init_on_free to early release TTM BOs

2023-07-05 Thread Rajneesh Bhardwaj
Early release TTM BOs when the kernel default setting is init_on_free to wipe out and reinitialize system memory chunks. This could potentially optimize performance when an application does a lot of malloc/free style allocations with unified system memory. Signed-off-by: Rajneesh Bhardwaj

[Patch v2] drm/ttm: Use init_on_free to delay release TTM BOs

2023-07-07 Thread Rajneesh Bhardwaj
. Reviewed-by: Christian König . Signed-off-by: Rajneesh Bhardwaj --- Changes in v2: - Updated commit message as per Christian's feedback drivers/gpu/drm/ttm/ttm_bo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 326a3d1

[PATCH] drm/amdgpu: Rework memory limits to allow big allocations

2023-08-21 Thread Rajneesh Bhardwaj
, leave 1GB exclusively outside ROCm allocations i.e. on 16GB system, >14 GB can be used by ROCm still leaving some memory for other system applications and on 128GB systems (e.g. GFXIP 9.4.3 APU in NPS1 mode) nearly >120GB can be used by ROCm. Signed-off-by: Rajneesh Bhardwaj --- .../gpu/d

[PATCH] drm/amdgpu: Hide xcp partition sysfs under SRIOV

2023-08-24 Thread Rajneesh Bhardwaj
XCP partitions should not be visible for the VF for GFXIP 9.4.3. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu/gmc: Fix spelling mistake.

2020-04-15 Thread Rajneesh Bhardwaj
Fixes a minor typo in the file. Reviewed-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b

[Patch v1 2/5] drm/amdgpu: Fix missing error check in suspend

2020-01-27 Thread Rajneesh Bhardwaj
amdgpu_device_suspend might return an error code since it can be called from both runtime and system suspend flows. Add the missing return code in case of a failure. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +++ 1 file changed, 3 insertions(+) diff --git
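
The pattern this entry describes, returning the error from a suspend step instead of carrying on, might look roughly like the sketch below. All names here (device_suspend, pm_suspend_cb, the fail and powered_down parameters) are hypothetical and are not taken from amdgpu_drv.c.

```c
#include <errno.h>

/* Hypothetical stand-in for a device suspend routine that can fail. */
static int device_suspend(int fail)
{
	return fail ? -EBUSY : 0;
}

/* Hypothetical PM callback: stop and report the error if suspend fails. */
static int pm_suspend_cb(int fail, int *powered_down)
{
	int ret = device_suspend(fail);

	if (ret)
		return ret;   /* propagate the failure to the caller */

	*powered_down = 1;    /* only reached when suspend succeeded */
	return 0;
}
```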
