Taking a look at this again, it's not an indexing issue, it's a placement 
problem.  I don't think this solution will work if we expect to pstate switch 
all gpus.

>From amdgpu_kms.c,  amdgpu_register_gpu_instance comes after 
>amdgpu_device_init.  This means we'll never reach mgpu_info.num_gpu == 
>adev->gmc.xgmi.num_physical_nodes until in this code path.

From: Kim, Jonathan
Sent: Tuesday, November 5, 2019 12:07 PM
To: Strawbridge, Michael <michael.strawbri...@amd.com>; Quan, Evan 
<evan.q...@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole 
hive initialized

+                       for (i = 0; i < mgpu_info.num_gpu; i++) {
Also while num_physical_nodes should be used here instead.  It doesn't make 
sense to have a pre-condition check i.e. (if (mgpu_info.num_dgpu == 
adev->gmc.xgmi.num_physical_nodes - 1) ) against the total number of nodes 
while the loop checks for the current num gpus from mgpu_info.  This will fail 
to set the last node to low pstate for the same 0-indexing issue once the 
pre-condition passes.

+                               gpu_instance = &(mgpu_info.gpu_ins[i]);
+                               if (gpu_instance->adev->flags & AMD_IS_APU)
+                                       continue;
+

From: Kim, Jonathan
Sent: Tuesday, November 5, 2019 11:42 AM
To: Strawbridge, Michael 
<michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>>; Quan, Evan 
<evan.q...@amd.com<mailto:evan.q...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole 
hive initialized

Yes that's correct.  This should fix the issue.

Jon

From: Strawbridge, Michael 
<michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>>
Sent: Tuesday, November 5, 2019 11:40 AM
To: Kim, Jonathan <jonathan....@amd.com<mailto:jonathan....@amd.com>>; Quan, 
Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole 
hive initialized

Hi Jon,

Does that mean we can simply use this instead?

+               if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes - 
1) {

Thanks,
Michael
________________________________
From: Kim, Jonathan <jonathan....@amd.com<mailto:jonathan....@amd.com>>
Sent: 05 November 2019 11:32 AM
To: Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Cc: Strawbridge, Michael 
<michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>>
Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole 
hive initialized

Please see inline.

Jon

-----Original Message-----
From: Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>>
Sent: Tuesday, November 5, 2019 5:24 AM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Kim, Jonathan <jonathan....@amd.com<mailto:jonathan....@amd.com>>; 
Strawbridge, Michael 
<michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>>; Quan, Evan 
<evan.q...@amd.com<mailto:evan.q...@amd.com>>
Subject: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive 
initialized

P-state switch should be performed after all devices from the hive get 
initialized.

Change-Id: Ifc7cac9ef0cf250447d2a412da35d601e2ac79ec
Signed-off-by: Evan Quan <evan.q...@amd.com<mailto:evan.q...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 ++++++++++++++++------
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e6ce949670e5..2d72d206cead 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2090,6 +2090,7 @@ static int amdgpu_device_enable_mgpu_fan_boost(void)
  */
 static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)  {
+       struct amdgpu_gpu_instance *gpu_instance;
         int i = 0, r;

         for (i = 0; i < adev->num_ip_blocks; i++) { @@ -2115,6 +2116,40 @@ 
static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
         if (r)
                 DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);

+
+       if (adev->gmc.xgmi.num_physical_nodes > 1) {
+               mutex_lock(&mgpu_info.mutex);
+
+               /*
+                * Reset device p-state to low as this was booted with high.
+                *
+                * This should be performed only after all devices from the same
+                * hive get initialized.
+                *
+                * However, it's unknown how many device in the hive in advance.
+                * As this is counted one by one during devices initializations.
+                *
+                * So, we wait for all XGMI interlinked devices initialized.
+                * This may bring some delays as those devices may come from
+                * different hives. But that should be OK.
+                */
+               if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes) {
[JK] This condition will never succeed.  mgpu_info.num_dgpu is 0-indexed while 
num_physical_nodes give total nodes.

+                       for (i = 0; i < mgpu_info.num_gpu; i++) {
+                               gpu_instance = &(mgpu_info.gpu_ins[i]);
+                               if (gpu_instance->adev->flags & AMD_IS_APU)
+                                       continue;
+
+                               r = amdgpu_xgmi_set_pstate(gpu_instance->adev, 
0);
+                               if (r) {
+                                       DRM_ERROR("pstate setting failed 
(%d).\n", r);
+                                       break;
+                               }
+                       }
+               }
+
+               mutex_unlock(&mgpu_info.mutex);
+       }
+
         return 0;
 }

@@ -2227,18 +2262,6 @@ static void 
amdgpu_device_delayed_init_work_handler(struct work_struct *work)
         r = amdgpu_ib_ring_tests(adev);
         if (r)
                 DRM_ERROR("ib ring test failed (%d).\n", r);
-
-       /*
-        * set to low pstate by default
-        * This should be performed after all devices from
-        * XGMI finish their initializations. Thus it's moved
-        * to here.
-        * The time delay is 2S. TODO: confirm whether that
-        * is enough for all possible XGMI setups.
-        */
-       r = amdgpu_xgmi_set_pstate(adev, 0);
-       if (r)
-               DRM_ERROR("pstate setting failed (%d).\n", r);
 }

 static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
--
2.23.0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to