Taking a look at this again, it's not an indexing issue, it's a placement problem. I don't think this solution will work if we expect to pstate switch all gpus.
>From amdgpu_kms.c, amdgpu_register_gpu_instance comes after >amdgpu_device_init. This means we'll never reach mgpu_info.num_gpu == >adev->gmc.xgmi.num_physical_nodes until in this code path. From: Kim, Jonathan Sent: Tuesday, November 5, 2019 12:07 PM To: Strawbridge, Michael <michael.strawbri...@amd.com>; Quan, Evan <evan.q...@amd.com>; amd-gfx@lists.freedesktop.org Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized + for (i = 0; i < mgpu_info.num_gpu; i++) { Also while num_physical_nodes should be used here instead. It doesn't make sense to have a pre-condition check i.e. (if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes - 1) ) against the total number of nodes while the loop checks for the current num gpus from mgpu_info. This will fail to set the last node to low pstate for the same 0-indexing issue once the pre-condition passes. + gpu_instance = &(mgpu_info.gpu_ins[i]); + if (gpu_instance->adev->flags & AMD_IS_APU) + continue; + From: Kim, Jonathan Sent: Tuesday, November 5, 2019 11:42 AM To: Strawbridge, Michael <michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>>; Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized Yes that's correct. This should fix the issue. Jon From: Strawbridge, Michael <michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>> Sent: Tuesday, November 5, 2019 11:40 AM To: Kim, Jonathan <jonathan....@amd.com<mailto:jonathan....@amd.com>>; Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> Subject: Re: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized Hi Jon, Does that mean we can simply use this instead? + if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes - 1) { Thanks, Michael ________________________________ From: Kim, Jonathan <jonathan....@amd.com<mailto:jonathan....@amd.com>> Sent: 05 November 2019 11:32 AM To: Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>> Cc: Strawbridge, Michael <michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>> Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized Please see inline. Jon -----Original Message----- From: Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>> Sent: Tuesday, November 5, 2019 5:24 AM To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> Cc: Kim, Jonathan <jonathan....@amd.com<mailto:jonathan....@amd.com>>; Strawbridge, Michael <michael.strawbri...@amd.com<mailto:michael.strawbri...@amd.com>>; Quan, Evan <evan.q...@amd.com<mailto:evan.q...@amd.com>> Subject: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized P-state switch should be performed after all devices from the hive get initialized. Change-Id: Ifc7cac9ef0cf250447d2a412da35d601e2ac79ec Signed-off-by: Evan Quan <evan.q...@amd.com<mailto:evan.q...@amd.com>> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 ++++++++++++++++------ 1 file changed, 35 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e6ce949670e5..2d72d206cead 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2090,6 +2090,7 @@ static int amdgpu_device_enable_mgpu_fan_boost(void) */ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev) { + struct amdgpu_gpu_instance *gpu_instance; int i = 0, r; for (i = 0; i < adev->num_ip_blocks; i++) { @@ -2115,6 +2116,40 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev) if (r) DRM_ERROR("enable mgpu fan boost failed (%d).\n", r); + + if (adev->gmc.xgmi.num_physical_nodes > 1) { + mutex_lock(&mgpu_info.mutex); + + /* + * Reset device p-state to low as this was booted with high. + * + * This should be performed only after all devices from the same + * hive get initialized. + * + * However, it's unknown how many device in the hive in advance. + * As this is counted one by one during devices initializations. + * + * So, we wait for all XGMI interlinked devices initialized. + * This may bring some delays as those devices may come from + * different hives. But that should be OK. + */ + if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes) { [JK] This condition will never succeed. mgpu_info.num_dgpu is 0-indexed while num_physical_nodes give total nodes. + for (i = 0; i < mgpu_info.num_gpu; i++) { + gpu_instance = &(mgpu_info.gpu_ins[i]); + if (gpu_instance->adev->flags & AMD_IS_APU) + continue; + + r = amdgpu_xgmi_set_pstate(gpu_instance->adev, 0); + if (r) { + DRM_ERROR("pstate setting failed (%d).\n", r); + break; + } + } + } + + mutex_unlock(&mgpu_info.mutex); + } + return 0; } @@ -2227,18 +2262,6 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work) r = amdgpu_ib_ring_tests(adev); if (r) DRM_ERROR("ib ring test failed (%d).\n", r); - - /* - * set to low pstate by default - * This should be performed after all devices from - * XGMI finish their initializations. Thus it's moved - * to here. - * The time delay is 2S. TODO: confirm whether that - * is enough for all possible XGMI setups. - */ - r = amdgpu_xgmi_set_pstate(adev, 0); - if (r) - DRM_ERROR("pstate setting failed (%d).\n", r); } static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work) -- 2.23.0
_______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx