From: Trigger Huang
The current dev coredump implementation sometimes cannot fully satisfy
customer's requirements due to:
1, dev coredump is called in the GPU reset function, so if GPU reset is
disabled, the dev coredump is also disabled
2, When a job timeout happens, the dump GPU status will be ha
From: Trigger Huang
Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status.
V2: This will skip printing vram_lost as the GPU reset has not
happened yet (Alex)
V3: Unconditionally call the core dump as we care about all the reset
functions(soft-recove
From: Trigger Huang
The vm lost status can only be obtained after a GPU reset occurs, but
sometimes a dev core dump can happen before the GPU reset. So a new
argument is added to tell the dev core dump implementation whether to
skip printing the vram_lost status in the dump.
And this patch is al
From: Trigger Huang
The current dev coredump implementation sometimes cannot fully satisfy
customer's requirements due to:
1, dev coredump is called in the GPU reset function, so if GPU reset is
disabled, the dev coredump is also disabled
2, When a job timeout happens, the dump GPU status will be ha
From: Trigger Huang
The vm lost status can only be obtained after a GPU reset occurs, but
sometimes a dev core dump can happen before the GPU reset. So a new
argument is added to tell the dev core dump implementation whether to
skip printing the vram_lost status in the dump.
And this patch is al
From: Trigger Huang
Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status.
V2: This will skip printing vram_lost as the GPU reset has not
happened yet (Alex)
V3: Unconditionally call the core dump as we care about all the reset
functions(soft-recove
From: Trigger Huang
Currently the funcs variable of a gfx software ring is not set. So
if it is accessed anywhere, erroneous logic will be executed.
For example, if we want to call some callbacks in funcs of
a gfx software ring, such as per-ring reset, the call will fail because
funcs is NUL
From: Trigger Huang
Add a new, separate parameter to control the GPU coredump procedure. This can
be used to decouple the coredump procedure from the GPU recovery procedure.
Signed-off-by: Trigger Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8
From: Trigger Huang
Put the IP dump and the registration with dev_coredumpm together
Signed-off-by: Trigger Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 73 ++
2 files changed, 75 insertions(+)
diff --git a/drivers/gpu/drm/amd
From: Trigger Huang
Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status. For other code paths that
need to do the coredump, keep the original logic unchanged, except:
1, All the coredump operations will be under the control of the parameter
amdgpu_gpu_c
From: Trigger Huang
The current dev coredump implementation sometimes cannot fully satisfy
customer's requirements due to:
1, The enablement of dev coredump is under the control of gpu_recovery.
Customers cannot do a dev coredump with gpu_recovery disabled
2, When a job timeout happens, the dump G
From: Trigger Huang
The current dev coredump implementation sometimes cannot fully satisfy
customer's requirements due to:
1, dev coredump is under the control of gpu_recovery. Consider the
following application scenarios:
1), Customer may need to do the core dump with gpu_recover
From: Trigger Huang
Add a new, separate parameter to control the GPU coredump procedure. This can
be used to decouple the coredump procedure from the GPU recovery procedure.
V2: enable gpu_coredump by default (Alex)
Signed-off-by: Trigger Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/g
From: Trigger Huang
Do the dev core dump if gpu_coredump is enabled
Signed-off-by: Trigger Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgp
From: Trigger Huang
Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status.
V2: This will skip printing vram_lost as the GPU reset has not
happened yet (Alex)
Signed-off-by: Trigger Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 64 +++
From: Trigger Huang
The vm lost status can only be obtained after a GPU reset occurs, but
sometimes a dev core dump can happen before the GPU reset. So a new
argument is added to tell the dev core dump implementation whether to
skip printing the vram_lost status in the dump.
And this patch is al
16 matches