On 2025-07-22 13:47, David Yat Sin wrote:
Add support for checkpoint/restore for SDMA queues of type
KFD_QUEUE_TYPE_SDMA_BY_ENG_ID.

Signed-off-by: David Yat Sin <david.yat...@amd.com>
---
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  | 1 +
  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 9 +++++++++
  2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 67694bcd9464..837da09b5bec 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1261,6 +1261,7 @@ struct kfd_criu_queue_priv_data {
        uint32_t doorbell_id;
        uint32_t gws;
        uint32_t sdma_id;
+       uint32_t sdma_engine_id;
        uint32_t eop_ring_buffer_size;
        uint32_t ctx_save_restore_area_size;
        uint32_t ctl_stack_size;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index c643e0ccec52..fe4c48930aad 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -846,6 +846,14 @@ static int criu_checkpoint_queue(struct kfd_process_device 
*pdd,
q_data->sdma_id = q->sdma_id; + if ((q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+                q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+               && q->properties.sdma_engine_id) {
+               q_data->type = KFD_QUEUE_TYPE_SDMA_BY_ENG_ID;
+       }
+
+       q_data->sdma_engine_id = q->properties.sdma_engine_id;

Is this really needed? Isn't the sdma engine ID implied by the sdma_id? That should be sufficient to ensure that SDMA queues are restored on the same engine.

I think we should never see KFD_QUEUE_TYPE_SDMA_BY_ENG_ID when we take a CRIU checkpoint because that gets replaced by either KFD_QUEUE_TYPE_SDMA or KFD_QUEUE_TYPE_SDMA_XGMI in allocate_sdma_queue.

Regards,
  Felix


+
        q_data->eop_ring_buffer_address =
                q->properties.eop_ring_buffer_address;
@@ -972,6 +980,7 @@ static void set_queue_properties_from_criu(struct queue_properties *qp,
        qp->queue_size = q_data->q_size;
        qp->read_ptr = (uint32_t *) q_data->read_ptr_addr;
        qp->write_ptr = (uint32_t *) q_data->write_ptr_addr;
+       qp->sdma_engine_id = q_data->sdma_engine_id;
        qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address;
        qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size;
        qp->ctx_save_restore_area_address = 
q_data->ctx_save_restore_area_address;

Reply via email to