Re: [PATCH] drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring reset

2025-07-11 Thread André Almeida
On 04/07/2025 00:06, André Almeida wrote: When a ring reset happens, amdgpu calls drm_dev_wedged_event() using struct amdgpu_task_info *ti as one of the arguments. After using *ti, a call to amdgpu_vm_put_task_info(ti) is required to correctly track its lifetime. However, it's called f

[PATCH] drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring reset

2025-07-04 Thread André Almeida
s function if *ti isn't used. Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Reported-by: Dave Airlie Closes: https://lore.kernel.org/dri-devel/CAPM=9tz0rQP8VZWKWyuF8kUMqRScxqoa6aVdwWw9=5yyxyy...@mail.gmail.com/ Signed-off-by: André Almeida --- drivers/gpu/dr

Re: [PATCH 0/2] drm: amdgpu: Fix includes of <linux/export.h>

2025-06-25 Thread André Almeida
Hi Alex, On 16/06/2025 03:59, Christian König wrote: Acked-by: Christian König for the series. Can you add this series to amd-staging-drm-next? Thanks! On 6/13/25 20:26, André Almeida wrote: Commit 7d95680d64ac ("scripts/misc-check: check unnecessary #include <linux/export.h> when W=1")

Re: [PATCH v9 6/6] drm/amdgpu: Make use of drm_wedge_task_info

2025-06-18 Thread André Almeida
Hi Christian, On 18/06/2025 04:29, Christian König wrote: On 6/17/25 15:22, André Almeida wrote: On 17/06/2025 10:07, Christian König wrote: On 6/17/25 14:49, André Almeida wrote: To notify userspace about which task (if any) made the device get in a wedge state, make use of

Re: [PATCH v9 6/6] drm/amdgpu: Make use of drm_wedge_task_info

2025-06-18 Thread André Almeida
On 17/06/2025 10:07, Christian König wrote: On 6/17/25 14:49, André Almeida wrote: To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida Reviewed

Re: [PATCH v9 6/6] drm/amdgpu: Make use of drm_wedge_task_info

2025-06-18 Thread André Almeida
On 17/06/2025 10:22, André Almeida wrote: On 17/06/2025 10:07, Christian König wrote: On 6/17/25 14:49, André Almeida wrote: To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and

[PATCH v8 3/6] drm: Create a task info option for wedge events

2025-06-17 Thread André Almeida
ife easier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v8: Code style changes (Raag) v7: - Change `char *comm` to `char comm[TASK_COMM_LEN]` v6: -

[PATCH v8 2/6] drm: amdgpu: Create amdgpu_vm_print_task_info()

2025-06-17 Thread André Almeida
To avoid repetitive code in amdgpu, create a function that prints the content of struct amdgpu_task_info. Signed-off-by: André Almeida --- v8: drop the inline v7: new patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 + drivers/gpu

[PATCH v9 2/6] drm: amdgpu: Create amdgpu_vm_print_task_info()

2025-06-17 Thread André Almeida
To avoid repetitive code in amdgpu, create a function that prints the content of struct amdgpu_task_info. Signed-off-by: André Almeida --- v8: drop the inline v7: new patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 + drivers/gpu

[PATCH v9 6/6] drm/amdgpu: Make use of drm_wedge_task_info

2025-06-17 Thread André Almeida
To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida --- v8: - Drop check before calling amdgpu_vm_put_task_info() - Drop local variable `info` v7

[PATCH v9 5/6] drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info

2025-06-17 Thread André Almeida
To avoid a cast when calling drm_dev_wedged_event(), replace pid and task name inside of struct amdgpu_task_info with struct drm_wedge_task_info. Signed-off-by: André Almeida Reviewed-by: Christian König --- v7: New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers
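
The idea in the patch above, embedding the generic descriptor inside the driver record so its address can be passed without a cast, can be illustrated with a small userspace sketch. Every name here (`wedge_task_info`, `driver_task_info`, `format_wedge_task`, `COMM_LEN`) is a hypothetical stand-in, not the kernel's definition, and the field layout is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define COMM_LEN 16 /* assumption: chosen to mirror the kernel's TASK_COMM_LEN */

/* Hypothetical generic descriptor consumed by a common event helper. */
struct wedge_task_info {
	int pid;
	char comm[COMM_LEN];
};

/* Hypothetical driver-side record that embeds the generic descriptor. */
struct driver_task_info {
	struct wedge_task_info task; /* embedded, so &ti->task needs no cast */
	int tgid;
};

/* Fill the record; the task name is truncated to fit comm[]. */
void driver_task_info_init(struct driver_task_info *ti, int pid, const char *comm)
{
	memset(ti, 0, sizeof(*ti));
	ti->task.pid = pid;
	ti->tgid = pid;
	snprintf(ti->task.comm, sizeof(ti->task.comm), "%s", comm);
}

/* Stand-in for a common helper: NULL means "no task information". */
int format_wedge_task(const struct wedge_task_info *info, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (!info)
		return snprintf(buf, len, "task=unknown");
	return snprintf(buf, len, "task=%s pid=%d", info->comm, info->pid);
}
```

A caller would pass `&ti.task` directly to the helper, which is the cast-free call shape the patch description refers to; a fixed-size `comm` array also avoids any question about who owns the name string.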

[PATCH v9 0/6] drm: Create a task info option for wedge events

2025-06-17 Thread André Almeida
ength v3: - Make comm_string and pid_string empty when there's no app info - Change "app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documen

[PATCH v8 4/6] drm/doc: Add a section about "Task information" for the wedge API

2025-06-17 Thread André Almeida
Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app invo

[PATCH v8 1/6] drm: amdgpu: Allow NULL pointers at amdgpu_vm_put_task_info()

2025-06-17 Thread André Almeida
Allow NULL pointers at amdgpu_vm_put_task_info() as it is common practice for "put" or "free" functions. This avoids an extra check for NULL for callers. Signed-off-by: André Almeida --- v8: New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 +++ 1 file changed, 3 inserti

[PATCH v8 5/6] drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info

2025-06-17 Thread André Almeida
To avoid a cast when calling drm_dev_wedged_event(), replace pid and task name inside of struct amdgpu_task_info with struct drm_wedge_task_info. Signed-off-by: André Almeida Reviewed-by: Christian König --- v7: New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers

[PATCH v9 1/6] drm: amdgpu: Allow NULL pointers at amdgpu_vm_put_task_info()

2025-06-17 Thread André Almeida
Allow NULL pointers at amdgpu_vm_put_task_info() as it is common practice for "put" or "free" functions. This avoids an extra check for NULL for callers. Signed-off-by: André Almeida --- v9: use if (task) instead of if (ZERO_OR_NULL_PTR(task)) v8: New patch --- driver
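
The NULL-tolerant "put" convention described above can be sketched in plain userspace C. This is only an illustration of the pattern, not the amdgpu code: `task_info`, `task_info_get_new()`, `task_info_put()` and `live_objects` are hypothetical names, and the kernel's real reference counting is replaced here by a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object; a stand-in, not struct amdgpu_task_info. */
struct task_info {
	int refcount;
	int pid;
};

int live_objects; /* number of live allocations, kept only so the sketch can be checked */

/* Allocate an object holding one reference; returns NULL on allocation failure. */
struct task_info *task_info_get_new(int pid)
{
	struct task_info *ti = calloc(1, sizeof(*ti));

	if (!ti)
		return NULL;
	ti->refcount = 1;
	ti->pid = pid;
	live_objects++;
	return ti;
}

/* Drop one reference. NULL is accepted and ignored, in the spirit of free(NULL). */
void task_info_put(struct task_info *ti)
{
	if (!ti)
		return;
	assert(ti->refcount > 0);
	if (--ti->refcount == 0) {
		free(ti);
		live_objects--;
	}
}
```

With this shape, a caller whose lookup may have returned NULL can call `task_info_put(ti)` unconditionally on its exit path instead of guarding each call site.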

[PATCH v9 3/6] drm: Create a task info option for wedge events

2025-06-17 Thread André Almeida
ife easier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v8: Code style changes (Raag) v7: - Change `char *comm` to `char comm[TASK_COMM_LEN]` v6: -

[PATCH v9 4/6] drm/doc: Add a section about "Task information" for the wedge API

2025-06-17 Thread André Almeida
Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app invo

[PATCH v8 0/6] drm: Create a task info option for wedge events

2025-06-17 Thread André Almeida
"app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Almeida (6): drm: amdgpu: Allow NULL pointers at amdgpu_vm_put_task_info() drm: amdgpu

[PATCH v8 6/6] drm/amdgpu: Make use of drm_wedge_task_info

2025-06-17 Thread André Almeida
To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida --- v8: - Drop check before calling amdgpu_vm_put_task_info() - Drop local variable `info` v7

[PATCH v7 1/5] drm: amdgpu: Create amdgpu_vm_print_task_info()

2025-06-16 Thread André Almeida
To avoid repetitive code in amdgpu, create a function that prints the content of struct amdgpu_task_info. Signed-off-by: André Almeida --- v7: new patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 + drivers/gpu/drm/amd/amdgpu

[PATCH v7 0/5] drm: Create a task info option for wedge events

2025-06-16 Thread André Almeida
's no app info - Change "app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Almeida (5): drm: amdgpu: Create amdgpu_vm_print_

[PATCH 2/2] drm/amd: Include <linux/export.h> when needed

2025-06-16 Thread André Almeida
Fix the following compile time warning when building with W=1: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 + drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 1 + 2 files changed, 2 insertions

[PATCH v7 4/5] drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info

2025-06-16 Thread André Almeida
To avoid a cast when calling drm_dev_wedged_event(), replace pid and task name inside of struct amdgpu_task_info with struct drm_wedge_task_info. Signed-off-by: André Almeida --- v7: New patch --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amd: Do not include <linux/export.h> when unused

2025-06-16 Thread André Almeida
Fix the following compile time warning when building with W=1: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 1 - drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - drivers/gpu/drm/amd/amdkfd

[PATCH v7 2/5] drm: Create a task info option for wedge events

2025-06-16 Thread André Almeida
ife easier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v7: - Change `char *comm` to `char comm[TASK_COMM_LEN]` v6: - s/cause/involved - drop strin

[PATCH 0/2] drm: amdgpu: Fix includes of <linux/export.h>

2025-06-16 Thread André Almeida
w commits. See also: https://lore.kernel.org/dri-devel/20250612121633.229222-1-tzimmerm...@suse.de/ André Almeida (2): drm/amd: Do not include <linux/export.h> when unused drm/amd: Include <linux/export.h> when needed drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +

[PATCH v7 3/5] drm/doc: Add a section about "Task information" for the wedge API

2025-06-16 Thread André Almeida
Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app invo

[PATCH v7 5/5] drm/amdgpu: Make use of drm_wedge_task_info

2025-06-16 Thread André Almeida
To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida --- v7: - Remove struct cast, now we can use `info = &ti->task` - Fix struct lifetim

[PATCH v6 3/3] drm/amdgpu: Make use of drm_wedge_task_info

2025-05-22 Thread André Almeida
To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu

[PATCH v6 2/3] drm/doc: Add a section about "Task information" for the wedge API

2025-05-22 Thread André Almeida
Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app invo

Re: [PATCH v5 1/3] drm: Create a task info option for wedge events

2025-05-22 Thread André Almeida
On 21/05/2025 06:11, Raag Jadav wrote: On Tue, May 20, 2025 at 01:32:41PM -0300, André Almeida wrote: When a device gets wedged, it might be caused by a guilty application. For userspace, knowing which task was the cause can be useful for some situations, like for implementing a policy, logs

[PATCH v6 0/3] drm: Create a task info option for wedge events

2025-05-22 Thread André Almeida
that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Almeida (3): drm: Create a task info option for wedge events drm/doc: Add a section

[PATCH v6 1/3] drm: Create a task info option for wedge events

2025-05-22 Thread André Almeida
ier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Signed-off-by: André Almeida --- v6: - s/app/task in a comment - add PID >= 0 check v5: - s/app/task for struct and commit message as well - move defines

[PATCH v5 3/3] drm/amdgpu: Make use of drm_wedge_task_info

2025-05-21 Thread André Almeida
To notify userspace about which task (if any) made the device get in a wedge state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu

[PATCH v5 2/3] drm/doc: Add a section about "Task information" for the wedge API

2025-05-21 Thread André Almeida
Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app invo

[PATCH v5 1/3] drm: Create a task info option for wedge events

2025-05-21 Thread André Almeida
ier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Signed-off-by: André Almeida --- v5: - s/app/task for struct and commit message as well - move defines to drm_drv.c - validates if comm is not NULL and it'

[PATCH v5 0/3] drm: Create a tas info option for wedge events

2025-05-21 Thread André Almeida
ore information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Almeida (3): drm: Create a task info option for wedge events drm/doc: Add a section about "Task information" for the wedge API drm/amdgpu: Make

[PATCH v4 2/3] drm/doc: Add a section about "App information" for the wedge API

2025-05-20 Thread André Almeida
Add a section about "App information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v4: - Change APP to TASK v3: - Change "app that caused ..." to "app involved ..." - Clarify that devcoredump ha

[PATCH v4 3/3] drm/amdgpu: Make use of drm_wedge_app_info

2025-05-20 Thread André Almeida
To notify userspace about which app (if any) made the device get in a wedge state, make use of drm_wedge_app_info parameter, filling it with the app PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu

[PATCH v4 1/3] drm: Create an app info option for wedge events

2025-05-20 Thread André Almeida
o notify what's the app's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Reviewed-by: Krzysztof Karas Signed-off-by: André Almeida --- v4: s/APP/TASK v3: Make comm_string and pid_string empty when there's no app info --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.

[PATCH v4 0/3] drm: Create an app info option for wedge events

2025-05-20 Thread André Almeida
- Make comm_string and pid_string empty when there's no app info - Change "app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Al

Re: [PATCH v2 1/3] drm: Create an app info option for wedge events

2025-05-13 Thread André Almeida
Hi Krzysztof, Thanks for the feedback. Em 12/05/2025 03:08, Krzysztof Karas escreveu: Hi André, [...] @@ -582,6 +584,14 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method) drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?

[PATCH v3 0/3] drm: Create an app info option for wedge events

2025-05-13 Thread André Almeida
Change "app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Almeida (3): drm: Create an app info option for wedge events drm/doc: Add a sectio

[PATCH v3 2/3] drm/doc: Add a section about "App information" for the wedge API

2025-05-13 Thread André Almeida
Add a section about "App information" for the wedge API. Signed-off-by: André Almeida --- v3: - Change "app that caused ..." to "app involved ..." - Clarify that devcoredump have more information about what happened - Update that PID and APP will b

[PATCH v3 3/3] drm/amdgpu: Make use of drm_wedge_app_info

2025-05-13 Thread André Almeida
To notify userspace about which app (if any) made the device get in a wedge state, make use of drm_wedge_app_info parameter, filling it with the app PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu

[PATCH v3 1/3] drm: Create an app info option for wedge events

2025-05-13 Thread André Almeida
tify what's the app's name in the user event. Acked-by: Rodrigo Vivi (for i915 and xe) Signed-off-by: André Almeida --- v3: Make comm_string and pid_string empty when there's no app info --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job

[PATCH v2 3/3] drm/amdgpu: Make use of drm_wedge_app_info

2025-05-12 Thread André Almeida
To notify userspace about which app (if any) made the device get in a wedge state, make use of drm_wedge_app_info parameter, filling it with the app PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu

[PATCH v2 0/3] drm: Create an app info option for wedge events

2025-05-12 Thread André Almeida
and debug_disable_gpu_ring_reset to test both wedge event paths in the driver. To trigger a ring timeout, I've used this app: https://gitlab.freedesktop.org/andrealmeid/gpu-timeout Thanks! Changelog: v2: - Rebased on top of drm/drm-next - Added new patch for documentation André Alme

[PATCH v2 2/3] drm/doc: Add a section about "App information" for the wedge API

2025-05-12 Thread André Almeida
Add a section about "App information" for the wedge API. Signed-off-by: André Almeida --- Documentation/gpu/drm-uapi.rst | 15 +++ 1 file changed, 15 insertions(+) diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst index 69f72e71a96e..826abe265

[PATCH v2 1/3] drm: Create an app info option for wedge events

2025-05-12 Thread André Almeida
the life easier also notify what's the app's name in the user event. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- drivers/gpu/drm/drm_drv.c | 16 +--- drivers/gpu/

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-13 Thread André Almeida
On 12/03/2025 07:06, Raag Jadav wrote: On Tue, Mar 11, 2025 at 07:09:45PM +0200, Raag Jadav wrote: On Mon, Mar 10, 2025 at 06:27:53PM -0300, André Almeida wrote: On 01/03/2025 02:53, Raag Jadav wrote: On Fri, Feb 28, 2025 at 06:54:12PM -0300, André Almeida wrote: Hi Raag, On 2/28/25

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-11 Thread André Almeida
On 01/03/2025 02:53, Raag Jadav wrote: On Fri, Feb 28, 2025 at 06:54:12PM -0300, André Almeida wrote: Hi Raag, On 2/28/25 11:20, Raag Jadav wrote: Cc: Lucas On Fri, Feb 28, 2025 at 09:13:52AM -0300, André Almeida wrote: When a device gets wedged, it might be caused by a guilty application

Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-03-11 Thread André Almeida
On 01/03/2025 03:04, Raag Jadav wrote: On Fri, Feb 28, 2025 at 06:49:43PM -0300, André Almeida wrote: Hi Raag, On 2/28/25 11:58, Raag Jadav wrote: On Fri, Feb 28, 2025 at 09:13:53AM -0300, André Almeida wrote: To notify userspace about which app (if any) made the device get in a wedge

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-03 Thread André Almeida
Hi Raag, On 2/28/25 11:20, Raag Jadav wrote: Cc: Lucas On Fri, Feb 28, 2025 at 09:13:52AM -0300, André Almeida wrote: When a device gets wedged, it might be caused by a guilty application. For userspace, knowing which app was the cause can be useful for some situations, like for implementing a

Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-03-03 Thread André Almeida
Hi Raag, On 2/28/25 11:58, Raag Jadav wrote: On Fri, Feb 28, 2025 at 09:13:53AM -0300, André Almeida wrote: To notify userspace about which app (if any) made the device get in a wedge state, make use of drm_wedge_app_info parameter, filling it with the app PID and name. Signed-off-by: André

[PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-02-28 Thread André Almeida
To notify userspace about which app (if any) made the device get in a wedge state, make use of drm_wedge_app_info parameter, filling it with the app PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm: Create an app info option for wedge events

2025-02-28 Thread André Almeida
the life easier also notify what's the app's name in the user event. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- drivers/gpu/drm/drm_drv.c | 16 +--- drivers/gpu/

[PATCH 0/2] drm: Create an app info option for wedge events

2025-02-28 Thread André Almeida
drealm...@igalia.com/ For testing, I've used amdgpu's debug_mask options debug_disable_soft_recovery and debug_disable_gpu_ring_reset to test both wedge event paths in the driver. To trigger a ring timeout, I used this app: https://gitlab.freedesktop.org/andrealmeid/gpu-timeout Thanks! And

[PATCH] drm/amdgpu: Create a debug option to disable ring reset

2025-02-26 Thread André Almeida
combined. This option is useful for testing and debugging purposes when one wants to test the full reset from userspace. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5

[PATCH v3] drm/amdgpu: Trigger a wedged event for ring reset

2025-02-25 Thread André Almeida
Instead of only triggering a wedged event for complete GPU resets, trigger for ring resets. Regardless of the reset, it's useful for userspace to know that it happened because the kernel will reject further submissions from that app. Signed-off-by: André Almeida --- v3: do only for ring r

[PATCH v2 2/3] drm/amdgpu: Log after a successful ring reset

2025-02-21 Thread André Almeida
When a ring reset happens, the kernel log shows only "amdgpu: Starting ring reset", but when it finishes nothing appears in the log. Explicitly write in the log that the reset has finished correctly. Reviewed-by: Christian König Signed-off-by: André Almeida --- drivers/gpu/drm/

[PATCH v2 3/3] drm/amdgpu: Trigger a wedged event for every type of reset

2025-02-21 Thread André Almeida
Instead of only triggering a wedged event for complete GPU resets, trigger for all types, like soft resets and ring resets. Regardless of the reset, it's useful for userspace to know that it happened because the kernel will reject further submissions from that app. Signed-off-by: André Al

[PATCH v2 1/3] drm/amdgpu: Log the creation of a coredump file

2025-02-21 Thread André Almeida
-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c index 824f9da5b6ce..7b50741dc097 100644 --- a/drivers/gpu/drm/amd

[PATCH v2 0/3] drm/amdgpu: Small reset improvements

2025-02-21 Thread André Almeida
This series does some small improvements to GPU reset information collection. v2: Keep the wedge event in amdgpu_device_gpu_recover() and add an extra check to avoid triggering two events. André Almeida (3): drm/amdgpu: Log the creation of a coredump file drm/amdgpu: Log after a

[PATCH 3/3] drm/amdgpu: Trigger a wedged event for every type of reset

2025-02-20 Thread André Almeida
Instead of only triggering a wedged event for complete GPU resets, trigger for all types, like soft resets and ring resets. Regardless of the reset, it's useful for userspace to know that it happened because the kernel will reject further submissions from that app. Signed-off-by: André Al

[PATCH 1/3] drm/amdgpu: Log the creation of a coredump file

2025-02-20 Thread André Almeida
After a GPU reset happens, the driver creates a coredump file. However, the user might not be aware of it. Log the file creation so the user can find more information about the device and add the file to bug reports. This is similar to what the xe driver does. Signed-off-by: André Almeida

[PATCH 2/3] drm/amdgpu: Log after a successful ring reset

2025-02-20 Thread André Almeida
When a ring reset happens, the kernel log shows only "amdgpu: Starting ring reset", but when it finishes nothing appears in the log. Explicitly write in the log that the reset has finished correctly. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 + 1 fi

[PATCH 0/3] drm/amdgpu: Small reset improvements

2025-02-20 Thread André Almeida
This series does some small improvements to GPU reset information collection. André Almeida (3): drm/amdgpu: Log the creation of a coredump file drm/amdgpu: Log after a successful ring reset drm/amdgpu: Trigger a wedged event for every type of reset .../gpu/drm/amd/amdgpu

Re: Rework and fix queue reset for gfx7-gfx10

2025-02-05 Thread André Almeida
Hi Christian, On 04/02/2025 11:31, Christian König wrote: Hi guys, I finally found time to work on queue reset a bit more and also gave it some more testing. How are you testing this series?

Re: [PATCH v12 2/2] drm/amdgpu: Enable async flip on overlay planes

2025-01-30 Thread André Almeida
On 29/01/2025 11:36, Xaver Hugl wrote: On Mon, 27 Jan 2025 at 21:00, André Almeida wrote: amdgpu can handle async flips on overlay planes, so allow it for atomic async checks. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 10

[PATCH v12 2/2] drm/amdgpu: Enable async flip on overlay planes

2025-01-28 Thread André Almeida
amdgpu can handle async flips on overlay planes, so allow it for atomic async checks. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm

[PATCH v12 0/2] drm/atomic: Ease async flip restrictions

2025-01-28 Thread André Almeida
0240806135300.114469-1-andrealm...@igalia.com/ - Complete rewrite --- André Almeida (2): drm/atomic: Let drivers decide which planes to async flip drm/amdgpu: Enable async flip on overlay planes .../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c| 10 +++--- drivers/gpu/drm

[PATCH v12 1/2] drm/atomic: Let drivers decide which planes to async flip

2025-01-28 Thread André Almeida
update from a full page flip. In order to prevent regressions and such, we keep the current policy: we skip the driver check for the primary plane, because it is always allowed to do async flips on it. Signed-off-by: André Almeida Reviewed-by: Dmitry Baryshkov Reviewed-by: Christopher Snowhill

Re: [PATCH v11 2/2] drm/amdgpu: Enable async flip on overlay planes

2025-01-16 Thread André Almeida
Hey Harry, Gentle ping on this one :) On 12/12/2024 16:19, André Almeida wrote: amdgpu can handle async flips on overlay planes, so allow it for atomic async checks. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 11 +++ 1 file

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2024-12-17 Thread André Almeida
On 17/12/2024 05:42, Raag Jadav wrote: On Thu, Dec 12, 2024 at 03:50:29PM -0300, André Almeida wrote: On 28/11/2024 12:37, Raag Jadav wrote: Add documentation for device wedged event in a new 'Device wedging' chapter. The describes basic definitions, prerequisites an

[PATCH v3 0/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
it depend on `r` value. André Almeida (1): drm/amdgpu: Use device wedged event drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 1 file changed, 4 insertions(+) -- 2.47.1

[PATCH v3 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma Signed-off-by: André Almeida ---

Re: [PATCH v2 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
On 16/12/2024 12:27, Christian König wrote: On 16.12.24 at 16:02, André Almeida wrote: Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method i

[PATCH v2 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma Signed-off-by: André Almeida --

[PATCH v2 0/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
This patch requires [1] to be applied. Raag, if you are sending a v11 of your work you can carry this patch as well if you think it makes sense. [1] https://lore.kernel.org/dri-devel/20241128153707.1294347-1-raag.ja...@intel.com/ Changelog v2: only report if reset succeeded, make it depend on `r` value

Re: [PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-16 Thread André Almeida
On 16/12/2024 07:38, Lazar, Lijo wrote: On 12/16/2024 3:48 PM, Christian König wrote: On 13.12.24 at 16:56, André Almeida wrote: On 13/12/2024 11:36, Raag Jadav wrote: On Fri, Dec 13, 2024 at 11:15:31AM -0300, André Almeida wrote: Hi Christian, On 13/12/2024 04:34, Christian

Re: [PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-16 Thread André Almeida
On 16/12/2024 10:10, Christian König wrote: On 16.12.24 at 14:04, André Almeida wrote: On 16/12/2024 07:38, Lazar, Lijo wrote: On 12/16/2024 3:48 PM, Christian König wrote: On 13.12.24 at 16:56, André Almeida wrote: On 13/12/2024 11:36, Raag Jadav wrote: On Fri, Dec 13, 2024

Re: [PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-16 Thread André Almeida
On 13/12/2024 11:36, Raag Jadav wrote: On Fri, Dec 13, 2024 at 11:15:31AM -0300, André Almeida wrote: Hi Christian, On 13/12/2024 04:34, Christian König wrote: On 12.12.24 at 20:09, André Almeida wrote: Use DRM's device wedged event to notify userspace that a reset had hap

Re: [PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-16 Thread André Almeida
Hi Christian, On 13/12/2024 04:34, Christian König wrote: On 12.12.24 at 20:09, André Almeida wrote: Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. Signed-off-by: André Almeida --- driver

[PATCH v11 2/2] drm/amdgpu: Enable async flip on overlay planes

2024-12-13 Thread André Almeida
amdgpu can handle async flips on overlay planes, so allow it for atomic async checks. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm

[PATCH v11 0/2] drm/atomic: Ease async flip restrictions

2024-12-13 Thread André Almeida
from. v9: https://lore.kernel.org/all/20241101-tonyk-async_flip-v9-0-681814efb...@igalia.com/ - Rebased on top of 6.12-rc1 (drm/drm-next) v8: https://lore.kernel.org/lkml/20240806135300.114469-1-andrealm...@igalia.com/ - Complete rewrite --- André Almeida (2): drm/atomic: Let drivers decide w

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-13 Thread André Almeida
"device wedged, needs recovery\n"); and maybe a note like this: else drm_info(dev, "device reseted, but managed to recover\n"); Either way, this patch is: Reviewed-by: André Almeida

[PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-13 Thread André Almeida
Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/a

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2024-12-13 Thread André Almeida
his information can be +collected and added to user bug reports. + With those changes applied: Reviewed-by: André Almeida

[PATCH 0/1] drm/amdgpu: Use device wedged event

2024-12-13 Thread André Almeida
This patch requires [1] to be applied. Raag, if you are sending a v11 of your work you can carry this patch as well if you think it makes sense. [1] https://lore.kernel.org/dri-devel/20241128153707.1294347-1-raag.ja...@intel.com/ André Almeida (1): drm/amdgpu: Use device wedged event drivers/g

[PATCH v11 1/2] drm/atomic: Let drivers decide which planes to async flip

2024-12-13 Thread André Almeida
update from a full page flip. In order to prevent regressions and such, we keep the current policy: we skip the driver check for the primary plane, because it is always allowed to do async flips on it. Signed-off-by: André Almeida Reviewed-by: Dmitry Baryshkov Reviewed-by: Christopher Snowhill

Re: [PATCH v10 0/2] drm/atomic: Ease async flip restrictions

2024-12-12 Thread André Almeida
Hi Dmitry, On 11/12/2024 16:35, Dmitry Baryshkov wrote: On Wed, Dec 11, 2024 at 12:25:07AM -0300, André Almeida wrote: Hi, The goal of this work is to find a nice way to allow amdgpu to perform async page flips in the overlay plane as well, not only on the primary one. Currently, when

[PATCH v10 2/2] drm/amdgpu: Enable async flip on overlay planes

2024-12-11 Thread André Almeida
amdgpu can handle async flips on overlay planes, so allow it for atomic async checks. Signed-off-by: André Almeida --- Changes from v8: - Use new parameter 'flip' --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletion

[PATCH v10 0/2] drm/atomic: Ease async flip restrictions

2024-12-11 Thread André Almeida
00.114469-1-andrealm...@igalia.com/ - Rebased on top of 6.12-rc1 (drm/drm-next) v7: https://lore.kernel.org/dri-devel/20240618030024.500532-1-andrealm...@igalia.com/ - Complete rewrite --- André Almeida (2): drm/atomic: Let drivers decide which planes to async flip drm/amdgpu: Enable async fl

[PATCH v10 1/2] drm/atomic: Let drivers decide which planes to async flip

2024-12-11 Thread André Almeida
update from a full page flip. In order to prevent regressions and such, we keep the current policy: we skip the driver check for the primary plane, because it is always allowed to do async flips on it. Signed-off-by: André Almeida --- Changes from v9: - Add a 'flip' flag to indicate

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-02 Thread André Almeida
Hi Raag, On 28/11/2024 12:37, Raag Jadav wrote: Introduce device wedged event, which notifies userspace of 'wedged' (hanged/unusable) state of the DRM device through a uevent. This is useful especially in cases where the device is no longer operating as expected and has become unrecoverable f

Re: [PATCH RESEND v9 2/2] drm/amdgpu: Enable async flip on overlay planes

2024-11-13 Thread André Almeida
Hi Harry, thanks for the reply! On 11/11/2024 18:10, Harry Wentland wrote: On 2024-11-01 14:23, André Almeida wrote: amdgpu can handle async flips on overlay planes, so allow it for atomic async checks. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/display/amdgpu_dm

Re: [PATCH RESEND v9 1/2] drm/atomic: Let drivers decide which planes to async flip

2024-11-04 Thread André Almeida
Hi Christopher, On 03/11/2024 03:36, Christopher Snowhill wrote: On Fri Nov 1, 2024 at 11:23 AM PDT, André Almeida wrote: Currently, DRM atomic uAPI allows only primary planes to be flipped asynchronously. However, each driver might be able to perform async flips in other different plane
