Re: [PATCH v2] drm/amdgpu: Skip the coredump collection on reset during driver reload

Lazar, Lijo Tue, 16 Apr 2024 21:31:25 -0700

On 4/17/2024 9:43 AM, Ahmad Rehman wrote:
> In a passthrough environment, the driver triggers a mode-1 reset on
> reload. The reset triggers core dump collection, which is a delayed
> task and prevents the driver from unloading until it completes. Since
> we do not need to collect data in the "reset on reload" case, we can
> skip core dump collection.
> 
> v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well.
> 
> Signed-off-by: Ahmad Rehman <ahmad.reh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 1 +
>  3 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1b2e177bc2d6..c718982cffa8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5357,7 +5357,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>       /* Try reset handler method first */
>       tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
>                                   reset_list);
> -     amdgpu_reset_reg_dumps(tmp_adev);
> +     
> +     if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags))
> +             amdgpu_reset_reg_dumps(tmp_adev);

The result of this test_bit() check could be saved in a local bool and
reused for the second check below. Anyway,

Reviewed-by: Lijo Lazar <lijo.la...@amd.com>

Thanks,
Lijo

>  
>       reset_context->reset_device_list = device_list_handle;
>       r = amdgpu_reset_perform_reset(tmp_adev, reset_context);
> @@ -5430,7 +5432,8 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>  
>                               vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
>  
> -                             amdgpu_coredump(tmp_adev, vram_lost, reset_context);
> +                             if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags))
> +                                     amdgpu_coredump(tmp_adev, vram_lost, reset_context);
>  
>                               if (vram_lost) {
>                                       DRM_INFO("VRAM is lost due to GPU reset!\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6ea893ad9a36..c512f70b8272 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2481,6 +2481,7 @@ static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work)
>  
>       /* Use a common context, just need to make sure full reset is done */
>       set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags);
> +     set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags);
>       r = amdgpu_do_asic_reset(&device_list, &reset_context);
>  
>       if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> index 66125d43cf21..b11d190ece53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> @@ -32,6 +32,7 @@ enum AMDGPU_RESET_FLAGS {
>  
>       AMDGPU_NEED_FULL_RESET = 0,
>       AMDGPU_SKIP_HW_RESET = 1,
> +     AMDGPU_SKIP_COREDUMP = 2,
>  };
>  
>  struct amdgpu_reset_context {
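
For illustration only, the review comment about saving the check to a bool
could look roughly like the userspace sketch below. It is not the driver
code: struct reset_context_sketch, sketch_set_bit(), sketch_test_bit(),
do_asic_reset_sketch() and the two counters are hypothetical stand-ins for
struct amdgpu_reset_context, the kernel's set_bit()/test_bit(),
amdgpu_do_asic_reset(), amdgpu_reset_reg_dumps() and amdgpu_coredump().
Only the flag names and values are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag bit positions, mirroring enum AMDGPU_RESET_FLAGS in the patch. */
enum reset_flags_sketch {
	NEED_FULL_RESET = 0,
	SKIP_HW_RESET = 1,
	SKIP_COREDUMP = 2,
};

/* Hypothetical, reduced stand-in for struct amdgpu_reset_context. */
struct reset_context_sketch {
	unsigned long flags;
};

/* Non-atomic userspace stand-ins for the kernel's set_bit()/test_bit(). */
static void sketch_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static bool sketch_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Counters standing in for the register dump and coredump calls. */
static int reg_dump_calls;
static int coredump_calls;

/* Reads the flag once into a local bool and reuses it at both call sites. */
static int do_asic_reset_sketch(struct reset_context_sketch *ctx)
{
	bool skip_coredump;

	assert(ctx != NULL);
	skip_coredump = sketch_test_bit(SKIP_COREDUMP, &ctx->flags);

	if (!skip_coredump)
		reg_dump_calls++;	/* stands in for the register dump */

	/* ... the actual reset would happen here ... */

	if (!skip_coredump)
		coredump_calls++;	/* stands in for the coredump */

	return 0;
}
```

A caller on the reload path would set SKIP_COREDUMP on its context before
calling the reset function, as the patch does in
amdgpu_drv_delayed_reset_work_handler(); contexts that leave the bit clear
keep collecting both dumps.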