On 12/14/2017 03:52 AM, Christian König wrote:
Am 14.12.2017 um 08:19 schrieb Liu, Monk:Problem with this is that amdgpu_check_soft_reset will not be called, this function which prints which IP block was hung even when later we opt not to recover.I suggest instead to add a bool force_reset parameter to amdgpu_gpu_recover which will override amdgpu_gpu_recovery and we can set it to true from amdgpu_debugfs_gpu_recover only.[ML] why you need "check_soft_reset" be called ? I think soft reset checking is useless totally ... because with TDR feature, the only thing canTell you if GPU hang is time out warning.And that's exactly why we call it, we just want the warning in the logs. Christian.
Just a ping for any more comments or a RB. Thanks, Andrey
For soft checking, it only shows you if some IP is busy or not, but busy may not prove the engine is hang , it may just busy ....BR Monk -----Original Message----- From: Grodzovsky, Andrey Sent: 2017年12月13日 20:53To: Koenig, Christian <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.orgCc: Liu, Monk <monk....@amd.com>; mar...@gmail.com Subject: Re: [PATCH] drm/amdgpu: Add gpu_recovery parameter On 12/13/2017 07:20 AM, Christian König wrote:Problem with this is that amdgpu_check_soft_reset will not be called, this function which prints which IP block was hung even when later we opt not to recover. I suggest instead to add a bool force_reset parameter to amdgpu_gpu_recover which will override amdgpu_gpu_recovery and we can set it to true from amdgpu_debugfs_gpu_recover only.Am 12.12.2017 um 20:16 schrieb Andrey Grodzovsky:Add new parameter to control GPU recovery procedure. Retire old way of disabling GPU recovery by setting lockup_timeout == 0 and set default for lockup_timeout to 10s. Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++ drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++++++-- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 3735500..26abe03 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -126,6 +126,7 @@ extern int amdgpu_param_buf_per_se; extern int amdgpu_job_hang_limit; extern int amdgpu_lbpw; extern int amdgpu_compute_multipipe; +extern int amdgpu_gpu_recovery; #ifdef CONFIG_DRM_AMDGPU_SI extern int amdgpu_si_support; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 8d03baa..d84b57a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3030,6 +3030,11 @@ int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job) return 0; } + if (!amdgpu_gpu_recovery) { + DRM_INFO("GPU recovery disabled.\n"); + return 0; + } +Please move this check into the caller of amdgpu_gpu_recover(). This way we can still trigger a GPU recovery manually or from the hypervisor under SRIOV. Christian.Thanks, Andreydev_info(adev->dev, "GPU reset begin!\n"); mutex_lock(&adev->lock_reset); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 0b039bd..5c612e9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -90,7 +90,7 @@ int amdgpu_disp_priority = 0; int amdgpu_hw_i2c = 0; int amdgpu_pcie_gen2 = -1; int amdgpu_msi = -1; -int amdgpu_lockup_timeout = 0; +int amdgpu_lockup_timeout = 10000; int amdgpu_dpm = -1; int amdgpu_fw_load_type = -1; int amdgpu_aspm = -1; @@ -128,6 +128,7 @@ int amdgpu_param_buf_per_se = 0; int amdgpu_job_hang_limit = 0; int amdgpu_lbpw = -1; int amdgpu_compute_multipipe = -1; +int amdgpu_gpu_recovery = 1; MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes"); module_param_named(vramlimit, amdgpu_vram_limit, int, 0600); @@ -165,7 +166,7 @@ module_param_named(pcie_gen2, amdgpu_pcie_gen2, int, 0444); MODULE_PARM_DESC(msi, "MSI support (1 = enable, 0 = disable, -1 = auto)"); module_param_named(msi, amdgpu_msi, int, 0444); -MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default 0 = disable)"); +MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default 10000)"); module_param_named(lockup_timeout, amdgpu_lockup_timeout, int, 0444); MODULE_PARM_DESC(dpm, "DPM support (1 = enable, 0 = disable, -1 = auto)"); @@ -280,6 +281,9 @@ module_param_named(lbpw, amdgpu_lbpw, int, 0444); MODULE_PARM_DESC(compute_multipipe, "Force compute queues to be spread across pipes (1 = enable, 0 = disable, -1 = auto)"); module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444); +MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable (default) , 0 = disable"); +module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444); + #ifdef CONFIG_DRM_AMDGPU_SI #if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)_______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx_______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx