On 2023-11-22 17:35, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:
Add queue remapping to force the waves in any running
processes to complete a CWSR trap.

Please add an explanation why this is needed.

[JZ] Even after the profiling-enabled bits are turned off, the CWSR trap handlers for some kernels in this process may still be running. Remapping the queues forces the waves in any running processes to complete a CWSR trap, which ensures that PC sampling is completely stopped for this process. I will add this explanation later.




Signed-off-by: James Zhu <james....@amd.com>
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++++++++++
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +++++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c          |  3 +++
  3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm)
      return debug_map_and_unlock(dqm);
  }
  +void remap_queue(struct device_queue_manager *dqm,
+                enum kfd_unmap_queues_filter filter,
+                uint32_t filter_param,
+                uint32_t grace_period)

Not sure if you need the filter and grace period parameters in this function. What's the point of exposing that to callers who just want to trigger a CWSR? You could also change the function name to reflect the purpose of the function, rather than the implementation.
[JZ] I just wanted to create a general function in case others need it. I am fine with removing the filter_param/grace_period parameters.
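
For illustration only, a minimal sketch of what the narrower helper discussed above might look like. The name kfd_dqm_force_cwsr is hypothetical, and returning the status of execute_queues_cpsch() is an assumption (the posted patch returns void). The struct, the lock helpers and execute_queues_cpsch() are mocked here so the snippet compiles outside the kernel tree; the real definitions live in kfd_device_queue_manager.[ch].

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions so the sketch builds on its own (not the kernel's). */
enum kfd_unmap_queues_filter { KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES };
#define USE_DEFAULT_GRACE_PERIOD 0xffffffffu /* placeholder value for this mock */

struct device_queue_manager {
	bool enable_mes; /* mock of dqm->dev->kfd->shared_resources.enable_mes */
	int lock_depth;  /* mock lock state, so balanced locking can be checked */
	int remap_count; /* number of unmap/map cycles the mock has performed */
};

static void dqm_lock(struct device_queue_manager *dqm)   { dqm->lock_depth++; }
static void dqm_unlock(struct device_queue_manager *dqm) { dqm->lock_depth--; }

/* Mock of execute_queues_cpsch(): only records that a remap was requested. */
static int execute_queues_cpsch(struct device_queue_manager *dqm,
				enum kfd_unmap_queues_filter filter,
				uint32_t filter_param, uint32_t grace_period)
{
	(void)filter;
	(void)filter_param;
	(void)grace_period;
	dqm->remap_count++;
	return 0;
}

/*
 * Hypothetical helper: callers only say "force a CWSR"; the filter and
 * grace period stay an implementation detail of the queue manager.
 */
int kfd_dqm_force_cwsr(struct device_queue_manager *dqm)
{
	int r = 0;

	dqm_lock(dqm);
	/* As in the posted patch, nothing is done when MES is enabled. */
	if (!dqm->enable_mes)
		r = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES,
					 0, USE_DEFAULT_GRACE_PERIOD);
	dqm_unlock(dqm);

	return r;
}
```

With a helper of this shape, the call in kfd_pc_sample_stop() would presumably shrink to a single argument, pdd->dev->dqm.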

Regards,
  Felix


+{
+    dqm_lock(dqm);
+    if (!dqm->dev->kfd->shared_resources.enable_mes)
+        execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+    dqm_unlock(dqm);
+}
+
  #if defined(CONFIG_DEBUG_FS)
    static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
  int debug_map_and_unlock(struct device_queue_manager *dqm);
  int debug_refresh_runlist(struct device_queue_manager *dqm);
  +void remap_queue(struct device_queue_manager *dqm,
+                enum kfd_unmap_queues_filter filter,
+                uint32_t filter_param,
+                uint32_t grace_period);
+
  static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
  {
      return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index e8f0559b618e..66670cdb813a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
  #include "kfd_priv.h"
  #include "amdgpu_amdkfd.h"
  #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
    struct supported_pc_sample_info {
      uint32_t ip_version;
@@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+        remap_queue(pdd->dev->dqm,
+            KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
            mutex_lock(&pdd->dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
