[PATCH Review 1/1] drm/amdkfd: Use mode1 reset for GFX v9.4.4

2024-07-07 Thread Stanley.Yang
GFX v9.4.4 uses mode1 reset to handle poison consumption. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_in

RE: [PATCH Review 1/1] drm/amdkfd: Use mode1 reset for GFX v9.4.4

2024-07-07 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking -----Original Message----- From: Yang, Stanley Sent: Monday, July 8, 2024 10:02 To: amd-gfx@lists.freedesktop.org; Zhang, Hawking Cc: Yang, Stanley Subject: [PATCH Review 1/1] drm/amdkfd: Use

RE: [PATCH 1/4] drm/amdgpu: refine amdgpu ras event id core code

2024-07-07 Thread Zhang, Hawking
We will need to change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL to differentiate the upcoming correctable error ISR event. And please also align the terminology in code to change recovery_event to fatal_error_event, i.e., amdgpu_ras_get

[PATCH v4 3/4] drm/amdgpu: add ras POSION_CONSUMPTION event id support

2024-07-07 Thread Yang Wang
add amdgpu ras POSION_CONSUMPTION event id support. Signed-off-by: Yang Wang Reviewed-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 15

[PATCH v4 4/4] drm/amdgpu: add ras event state device attribute support

2024-07-07 Thread Yang Wang
add amdgpu ras 'event_state' sysfs device attribute support Signed-off-by: Yang Wang Reviewed-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 56 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 7 +++- 2 files changed, 58 insertions(+), 5 deletions(-) diff --git

[PATCH v4 1/4] drm/amdgpu: refine amdgpu ras event id core code

2024-07-07 Thread Yang Wang
v1: - use unified event id to manage ras events - add a new function amdgpu_ras_query_error_status_with_event() to accept event type as parameter. v2: add a warn log to show the location of function failure when calling amdgpu_ras_mark_event(). (Tao Zhou) v3: change RAS_EVENT_TYPE_ISR to RAS_EV

[PATCH v4 2/4] drm/amdgpu: add ras POSION_CREATION event id support

2024-07-07 Thread Yang Wang
add amdgpu ras POSION_CREATION event id support. Signed-off-by: Yang Wang Reviewed-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 + 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd

RE: [PATCH v4 3/4] drm/amdgpu: add ras POSION_CONSUMPTION event id support

2024-07-07 Thread Zhang, Hawking
amdgpu_ras_get_fatal_error_event(struct amdgpu_device if (amdgpu_ras_intr_triggered()) return RAS_EVENT_TYPE_FATAL; else - return RAS_EVENT_TYPE_INVALID; + return RAS_EVENT_TYPE_POI
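
The quoted hunk is cut off in this listing, so the following is only a minimal sketch of the pre-patch ("-") branch it shows: return the fatal event type when a RAS interrupt has been triggered, otherwise the invalid type. Everything prefixed with mock_ is a hypothetical stand-in rather than driver code, the enum values are assumptions, and the replacement return value on the "+" line is truncated in the source, so it is deliberately not reproduced.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's event-type enum. Only the two
 * names visible in the quoted hunk appear here; the numeric values are
 * assumptions chosen for illustration. */
enum ras_event_type {
    RAS_EVENT_TYPE_INVALID = -1,
    RAS_EVENT_TYPE_FATAL = 0,
};

/* Hypothetical stand-in for struct amdgpu_device. */
struct mock_device {
    bool ras_intr_triggered;
};

/* Hypothetical stand-in for amdgpu_ras_intr_triggered(); it takes the
 * mock device so the sketch has no global state. */
static bool mock_ras_intr_triggered(const struct mock_device *dev)
{
    return dev->ras_intr_triggered;
}

/* Mirrors the pre-patch logic of the quoted hunk: FATAL if a RAS
 * interrupt was triggered, INVALID otherwise. */
static enum ras_event_type
mock_get_fatal_error_event(const struct mock_device *dev)
{
    if (mock_ras_intr_triggered(dev))
        return RAS_EVENT_TYPE_FATAL;
    return RAS_EVENT_TYPE_INVALID;
}
```

A caller would pass a device whose interrupt flag is set and expect RAS_EVENT_TYPE_FATAL back; with the flag clear, the sketch falls through to RAS_EVENT_TYPE_INVALID, which is the branch the reply proposes to change.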

RE: [PATCH v4 3/4] drm/amdgpu: add ras POSION_CONSUMPTION event id support

2024-07-07 Thread Wang, Yang(Kevin)
-----Original Message----- From: Zhang, Hawking Sent: Monday, July 8, 2024 1:06 PM To: Wang, Yang(Kevin); amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao Subject: RE: [PATCH v4 3/4] drm/amdgpu: add ras POSION_CONSUMPTION event id support [

Re: Proposal to add CRIU support to DRM render nodes

2024-07-07 Thread 周春明(日月)
Hi Felix, While reading about the CRIU plugin you introduced at https://github.com/checkpoint-restore/criu/tree/criu-dev/plugins/amdgpu , I came across the sentence "ROCm manages memory in the form of buffer objects (BOs). We are also working on