Any comments?

I believe this is a nice stability improvement. With this patch, VM
faults no longer take down the whole GPU with an interrupt storm. With
KFD we can recover without a GPU reset in many cases just by unmapping
the offending process' queues.

Regards,
  Felix


On 17-07-03 05:11 PM, Felix Kuehling wrote:
> From: Jay Cornwall <jay.cornw...@amd.com>
>
> A subset of VM fault types currently send retry XNACK to the client.
> This causes a storm of interrupts from the VM to the host.
>
> Until the storm is throttled by other means, send no-retry XNACK for
> all fault types instead. This does not change behavior for the client,
> which will stall indefinitely with the current configuration in any
> case. It improves system stability under GC or MMHUB faults.
>
> Signed-off-by: Jay Cornwall <jay.cornw...@amd.com>
> Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> index a42f483..f957b18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>               tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>                               PAGE_TABLE_BLOCK_SIZE,
>                               adev->vm_manager.block_size - 9);
> +             /* Send no-retry XNACK on fault to suppress VM fault storm. */
> +             tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
> +                                 RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>               WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>               WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>               WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> index 01918dc..b760018 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>               tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>                               PAGE_TABLE_BLOCK_SIZE,
>                               adev->vm_manager.block_size - 9);
> +             /* Send no-retry XNACK on fault to suppress VM fault storm. */
> +             tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
> +                                 RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>               WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>               WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>               WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
