Re: [PATCH v7 1/5] drm: Introduce device wedged event

2024-10-19 Thread Raag Jadav
On Fri, Oct 18, 2024 at 12:58:09PM +0200, Christian König wrote: > On 17.10.24 at 18:43, Rodrigo Vivi wrote: > > On Thu, Oct 17, 2024 at 09:59:10AM +0200, Christian König wrote: > > > > > Purpose of this implementation is to provide drivers a generic way to > > > > > recover with the help of users

Re: [PATCH v7 1/5] drm: Introduce device wedged event

2024-10-19 Thread Raag Jadav
On Fri, Oct 18, 2024 at 02:54:38PM +0200, Christian König wrote: > On 18.10.24 at 14:46, Raag Jadav wrote: > > > As far as I can see this makes the enum how to recover the device > > > superfluous because you will most likely always need a bus reset to get > > > out

Re: [PATCH v7 1/5] drm: Introduce device wedged event

2024-10-17 Thread Raag Jadav
On Mon, Sep 30, 2024 at 01:08:41PM +0530, Raag Jadav wrote: > Introduce device wedged event, which will notify userspace of wedged > (hanged/unusable) state of the DRM device through a uevent. This is > useful especially in cases where the device is no longer operating as > expected

[PATCH v8 1/4] drm: Introduce device wedged event

2024-10-25 Thread Raag Jadav
Handle invalid method cases v8: Allow sending multiple methods with uevent (Lucas, Michal) static_assert() globally (Andy) Signed-off-by: Raag Jadav --- drivers/gpu/drm/drm_drv.c | 51 +++ include/drm/drm_device.h | 7 ++ include/drm/drm_drv.h

[PATCH v8 3/4] drm/xe: Use device wedged event

2024-10-25 Thread Raag Jadav
v2: Change authorship to Himal (Aravind) Add uevent for all device wedged cases (Aravind) v3: Generic re-implementation in DRM subsystem (Lucas) v4: Change authorship to Raag (Aravind) Signed-off-by: Raag Jadav --- drivers/gpu/drm/xe/xe_device.c | 9 +++-- 1 file changed, 7 insert

[PATCH v8 4/4] drm/i915: Use device wedged event

2024-10-25 Thread Raag Jadav
Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. With this in place, userspace will be notified of wedged device on gt reset failure. Signed-off-by: Raag Jadav --- drivers/gpu/drm/i915/gt/intel_reset.c | 3

[PATCH v8 0/4] Introduce DRM device wedged event

2024-10-25 Thread Raag Jadav
functions (Andy, Jani) Aesthetic adjustments (Andy) Handle invalid method cases Add documentation to drm-uapi.rst (Sima) v8: Drop sysfs and allow sending multiple methods with uevent (Lucas, Michal) Improve documentation (Christian, Rodrigo) static_assert() globally (Andy) Raag Jadav

[PATCH v8 2/4] drm/doc: Document device wedged event

2024-10-25 Thread Raag Jadav
Add documentation for device wedged event in a new 'Device wedging' chapter. This describes basic definitions and consumer expectations along with an example. v8: Improve documentation (Christian, Rodrigo) Signed-off-by: Raag Jadav --- Documentation/gpu/drm-uap

Re: [PATCH v8 1/4] drm: Introduce device wedged event

2024-10-26 Thread Raag Jadav
On Fri, Oct 25, 2024 at 05:45:59PM +0300, Andy Shevchenko wrote: > On Fri, Oct 25, 2024 at 12:08:50PM +0300, Jani Nikula wrote: > > On Fri, 25 Oct 2024, Raag Jadav wrote: > > ... > > > > +/* > > > + * Available recovery methods for wedged device.

Re: [PATCH v8 2/4] drm/doc: Document device wedged event

2024-11-01 Thread Raag Jadav
On Tue, Oct 29, 2024 at 10:51:34AM +0100, Christian König wrote: > On 25.10.24 at 10:48, Raag Jadav wrote: > > Add documentation for device wedged event in a new 'Device wedging' > > chapter. This describes basic definitions and consumer expectations > > along with a

[PATCH v10 3/4] drm/xe: Use device wedged event

2024-11-29 Thread Raag Jadav
v2: Change authorship to Himal (Aravind) Add uevent for all device wedged cases (Aravind) v3: Generic re-implementation in DRM subsystem (Lucas) v4: Change authorship to Raag (Aravind) Signed-off-by: Raag Jadav Reviewed-by: Aravind Iddamsetty --- drivers/gpu/drm/xe/xe_device.c | 9 +++

[PATCH v10 2/4] drm/doc: Document device wedged event

2024-11-29 Thread Raag Jadav
leanup and consumer prerequisites (Christian, Aravind) Signed-off-by: Raag Jadav Reviewed-by: Christian König --- Documentation/gpu/drm-uapi.rst | 112 - 1 file changed, 109 insertions(+), 3 deletions(-) diff --git a/Documentation/gpu/drm-uapi.rst b/Documentati

[PATCH v10 4/4] drm/i915: Use device wedged event

2024-11-29 Thread Raag Jadav
Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. With this in place, userspace will be notified of wedged device on gt reset failure. Signed-off-by: Raag Jadav Reviewed-by: Aravind Iddamsetty --- drivers/gpu

[PATCH v10 0/4] Introduce DRM device wedged event

2024-11-29 Thread Raag Jadav
(Andy) v9: Document prerequisites section (Christian) Provide 'none' method for reset cases (Christian) Provide recovery opts using switch cases v10: Clarify mmap cleanup and consumer prerequisites (Christian, Aravind) Raag Jadav (4): drm: Introduce device wedged event

[PATCH v10 1/4] drm: Introduce device wedged event

2024-11-29 Thread Raag Jadav
globally (Andy) v9: Provide 'none' method for reset cases (Christian) Provide recovery opts using switch cases Signed-off-by: Raag Jadav --- drivers/gpu/drm/drm_drv.c | 66 +++ include/drm/drm_device.h | 8 + include/drm/drm_drv.h | 1 +

Re: [PATCH v9 1/4] drm: Introduce device wedged event

2024-11-22 Thread Raag Jadav
On Mon, Nov 18, 2024 at 08:26:37PM +0530, Aravind Iddamsetty wrote: > On 15/11/24 10:37, Raag Jadav wrote: > > Introduce device wedged event, which notifies userspace of 'wedged' > > (hanged/unusable) state of the DRM device through a uevent. This is > > useful espec

Re: [PATCH v9 3/4] drm/xe: Use device wedged event

2024-11-20 Thread Raag Jadav
On Tue, Nov 19, 2024 at 10:25:10AM +0530, Ghimiray, Himal Prasad wrote: > On 15-11-2024 10:37, Raag Jadav wrote: > > This was previously attempted as xe specific reset uevent but dropped > > in commit 77a0d4d1cea2 ("drm/xe/uapi: Remove reset uevent for now")

[PATCH v9 2/4] drm/doc: Document device wedged event

2024-11-15 Thread Raag Jadav
Add documentation for device wedged event in a new 'Device wedging' chapter. This describes basic definitions and consumer expectations along with an example. v8: Improve documentation (Christian, Rodrigo) v9: Add prerequisites section (Christian) Signed-off-by: Raag Jadav --- Documen

[PATCH v9 1/4] drm: Introduce device wedged event

2024-11-15 Thread Raag Jadav
Aesthetic adjustments (Andy) Handle invalid method cases v8: Allow sending multiple methods with uevent (Lucas, Michal) static_assert() globally (Andy) v9: Provide 'none' method for reset cases (Christian) Provide recovery opts using switch cases Signed-off-by: Raag Jadav

[PATCH v9 0/4] Introduce DRM device wedged event

2024-11-15 Thread Raag Jadav
: Document prerequisites section (Christian) Provide 'none' method for reset cases (Christian) Provide recovery opts using switch cases Raag Jadav (4): drm: Introduce device wedged event drm/doc: Document device wedged event drm/xe: Use device wedged event drm/i915: Use device we

[PATCH v9 3/4] drm/xe: Use device wedged event

2024-11-15 Thread Raag Jadav
v2: Change authorship to Himal (Aravind) Add uevent for all device wedged cases (Aravind) v3: Generic re-implementation in DRM subsystem (Lucas) v4: Change authorship to Raag (Aravind) Signed-off-by: Raag Jadav --- drivers/gpu/drm/xe/xe_device.c | 9 +++-- 1 file changed, 7 insert

[PATCH v9 4/4] drm/i915: Use device wedged event

2024-11-15 Thread Raag Jadav
Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. With this in place, userspace will be notified of wedged device on gt reset failure. Signed-off-by: Raag Jadav --- drivers/gpu/drm/i915/gt/intel_reset.c | 3

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-02 Thread Raag Jadav
On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote: > Hi Raag, > > On 28/11/2024 12:37, Raag Jadav wrote: > > Introduce device wedged event, which notifies userspace of 'wedged' > > (hanged/unusable) state of the DRM device through a uevent. This is

Re: [PATCH v9 1/4] drm: Introduce device wedged event

2024-11-25 Thread Raag Jadav
On Fri, Nov 22, 2024 at 11:09:32AM +0100, Christian König wrote: > On 22.11.24 at 08:07, Raag Jadav wrote: > > On Mon, Nov 18, 2024 at 08:26:37PM +0530, Aravind Iddamsetty wrote: > > > On 15/11/24 10:37, Raag Jadav wrote: > > > > Introduce device wedged event, which

Re: [PATCH v9 1/4] drm: Introduce device wedged event

2024-11-26 Thread Raag Jadav
On Mon, Nov 25, 2024 at 10:32:42AM +0100, Christian König wrote: > On 22.11.24 at 17:02, Raag Jadav wrote: > > On Fri, Nov 22, 2024 at 11:09:32AM +0100, Christian König wrote: > > > On 22.11.24 at 08:07, Raag Jadav wrote: > > > > On Mon, Nov 18, 2024 at 08:26:37PM +

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-03 Thread Raag Jadav
On Mon, Dec 02, 2024 at 10:07:59AM +0200, Raag Jadav wrote: > On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote: > > Hi Raag, > > > > On 28/11/2024 12:37, Raag Jadav wrote: > > > Introduce device wedged event, which notifies userspace of 'wedge

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-04 Thread Raag Jadav
+ misc maintainers On Tue, Dec 03, 2024 at 11:18:00AM +0100, Christian König wrote: > On 03.12.24 at 06:00, Raag Jadav wrote: > > On Mon, Dec 02, 2024 at 10:07:59AM +0200, Raag Jadav wrote: > > > On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote: > > > >

[PATCH v12 3/5] drm/xe: Use device wedged event

2025-02-04 Thread Raag Jadav
v2: Change authorship to Himal (Aravind) Add uevent for all device wedged cases (Aravind) v3: Generic implementation in DRM subsystem (Lucas) v4: Change authorship to Raag (Aravind) Signed-off-by: Raag Jadav Reviewed-by: Aravind Iddamsetty --- drivers/gpu/drm/xe/xe_device.c | 7 ++- 1 fi

[PATCH v12 1/5] drm: Introduce device wedged event

2025-02-04 Thread Raag Jadav
djustments (Andy) Handle invalid recovery method v8: Allow sending multiple methods with uevent (Lucas, Michal) static_assert() globally (Andy) v9: Provide 'none' method for device reset (Christian) Provide recovery opts using switch c

[PATCH v12 2/5] drm/doc: Document device wedged event

2025-02-04 Thread Raag Jadav
leanup and consumer prerequisites (Christian, Aravind) v11: Reference wedged event in device reset chapter (André) v12: Refine consumer expectations and terminologies (Xaver, Pekka) Signed-off-by: Raag Jadav Reviewed-by: Christian König Reviewed-by: André Almeida --- Documentation/gpu/drm-uapi

[PATCH v12 5/5] drm/amdgpu: Use device wedged event

2025-02-04 Thread Raag Jadav
From: André Almeida Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma Signed-off-by: André Alme

[PATCH v12 0/5] Introduce DRM device wedged event

2025-02-04 Thread Raag Jadav
et chapter (André) Wedged event support for amdgpu (André) v12: Refine consumer expectations and terminologies (Xaver, Pekka) André Almeida (1): drm/amdgpu: Use device wedged event Raag Jadav (4): drm: Introduce device wedged event drm/doc: Document device wedged event drm/xe: Use dev

[PATCH v12 4/5] drm/i915: Use device wedged event

2025-02-04 Thread Raag Jadav
Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. With this in place, userspace will be notified of wedged device on gt reset failure. Signed-off-by: Raag Jadav Reviewed-by: Aravind Iddamsetty --- drivers/gpu

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-12 Thread Raag Jadav
On Wed, Dec 11, 2024 at 06:14:12PM +0100, Maxime Ripard wrote: > On Wed, Dec 04, 2024 at 01:17:17PM +0200, Raag Jadav wrote: > > + misc maintainers > > > > On Tue, Dec 03, 2024 at 11:18:00AM +0100, Christian König wrote: > > > On 03.12.24 at 06:00, Raag Jadav wrote

Re: [PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-16 Thread Raag Jadav
2/16/2024 3:48 PM, Christian König wrote: > > > > > On 13.12.24 at 16:56, André Almeida wrote: > > > > > > On 13/12/2024 11:36, Raag Jadav wrote: > > > > > > > On Fri, Dec 13, 2024 at 11:15:31AM -0300, André Almeida wrote: > > > >

Re: [PATCH 1/1] drm/amdgpu: Use device wedged event

2024-12-16 Thread Raag Jadav
On Fri, Dec 13, 2024 at 11:15:31AM -0300, André Almeida wrote: > Hi Christian, > > On 13/12/2024 04:34, Christian König wrote: > > On 12.12.24 at 20:09, André Almeida wrote: > > > Use DRM's device wedged event to notify userspace that a reset had > > > happened. For now, only use `none` method

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-16 Thread Raag Jadav
On Thu, Dec 12, 2024 at 03:31:01PM -0300, André Almeida wrote: > Hi Raag, > > Thank you for your patch. > > On 28/11/2024 12:37, Raag Jadav wrote: > > [...] > > > +int drm_dev_wedged_event(struct drm_device *dev, unsigned long method) > > +{

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2024-12-17 Thread Raag Jadav
On Thu, Dec 12, 2024 at 03:50:29PM -0300, André Almeida wrote: > On 28/11/2024 12:37, Raag Jadav wrote: > > Add documentation for device wedged event in a new 'Device wedging' > > chapter. This describes basic definitions, prerequisites and consumer > > exp

Re: [PATCH v10 0/4] Introduce DRM device wedged event

2025-01-23 Thread Raag Jadav
On Tue, Jan 21, 2025 at 01:59:47AM +0100, Xaver Hugl wrote: > Hi, > > I experimented with using this in KWin, and > https://invent.kde.org/plasma/kwin/-/merge_requests/7027/diffs?commit_id=6da40f1b9e2bc94615a436de4778880cee16f940 > makes it fall back to a software renderer when a rebind is require

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2025-01-23 Thread Raag Jadav
On Tue, Jan 21, 2025 at 02:14:56AM +0100, Xaver Hugl wrote: > > +It is the responsibility of the consumer to make sure that the device or > > +its resources are not in use by any process before attempting recovery. > I'm not convinced this is actually doable in practice, outside of > killing all ap

[PATCH v11 1/5] drm: Introduce device wedged event

2025-01-24 Thread Raag Jadav
tatic_assert() globally (Andy) v9: Provide 'none' method for device reset (Christian) Provide recovery opts using switch cases v11: Log device reset (André) Signed-off-by: Raag Jadav Reviewed-by: André Almeida --- drivers/gpu/drm/drm_drv.c | 68

[PATCH v11 2/5] drm/doc: Document device wedged event

2025-01-24 Thread Raag Jadav
leanup and consumer prerequisites (Christian, Aravind) v11: Reference wedged event in device reset section (André) Signed-off-by: Raag Jadav Reviewed-by: Christian König Reviewed-by: André Almeida --- Documentation/gpu/drm-uapi.rst | 112 - 1 file changed, 109 insert

[PATCH v11 5/5] drm/amdgpu: Use device wedged event

2025-01-24 Thread Raag Jadav
From: André Almeida Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma Signed-off-by: André Alme

[PATCH v11 3/5] drm/xe: Use device wedged event

2025-01-24 Thread Raag Jadav
v2: Change authorship to Himal (Aravind) Add uevent for all device wedged cases (Aravind) v3: Generic implementation in DRM subsystem (Lucas) v4: Change authorship to Raag (Aravind) Signed-off-by: Raag Jadav Reviewed-by: Aravind Iddamsetty --- drivers/gpu/drm/xe/xe_device.c | 7 ++- 1 fi

[PATCH v11 4/5] drm/i915: Use device wedged event

2025-01-24 Thread Raag Jadav
Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. With this in place, userspace will be notified of wedged device on gt reset failure. Signed-off-by: Raag Jadav Reviewed-by: Aravind Iddamsetty --- drivers/gpu

[PATCH v11 0/5] Introduce DRM device wedged event

2025-01-24 Thread Raag Jadav
et section (André) Wedged event support for amdgpu (André) André Almeida (1): drm/amdgpu: Use device wedged event Raag Jadav (4): drm: Introduce device wedged event drm/doc: Document device wedged event drm/xe: Use device wedged event drm/i915: Use device wedged event Documentati

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2025-01-28 Thread Raag Jadav
On Mon, Jan 27, 2025 at 12:23:28PM +0200, Pekka Paalanen wrote: > On Wed, 22 Jan 2025 07:22:25 +0200 > Raag Jadav wrote: > > > On Tue, Jan 21, 2025 at 02:14:56AM +0100, Xaver Hugl wrote: > > > > +It is the responsibility of the consumer to make sure that the devi

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2025-01-29 Thread Raag Jadav
On Tue, Jan 28, 2025 at 01:38:09PM +0200, Pekka Paalanen wrote: > On Tue, 28 Jan 2025 11:37:53 +0200 > Raag Jadav wrote: > > > On Mon, Jan 27, 2025 at 12:23:28PM +0200, Pekka Paalanen wrote: > > > On Wed, 22 Jan 2025 07:22:25 +0200 > > > Raag Jadav wrote: >

Re: [PATCH v12 0/5] Introduce DRM device wedged event

2025-02-12 Thread Raag Jadav
On Tue, Feb 04, 2025 at 12:35:23PM +0530, Raag Jadav wrote: > This series introduces device wedged event in DRM subsystem and uses it > in xe, i915 and amdgpu drivers. Detailed description in commit message. > > This was earlier attempted as xe specific uevent in v1 and v2 on [1].

Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-03-03 Thread Raag Jadav
On Fri, Feb 28, 2025 at 09:13:53AM -0300, André Almeida wrote: > To notify userspace about which app (if any) made the device get in a > wedge state, make use of drm_wedge_app_info parameter, filling it with > the app PID and name. > > Signed-off-by: André Almeida > --- > drivers/gpu/drm/amd/amd

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-03 Thread Raag Jadav
On Fri, Feb 28, 2025 at 06:54:12PM -0300, André Almeida wrote: > Hi Raag, > > On 2/28/25 11:20, Raag Jadav wrote: > > Cc: Lucas > > > > On Fri, Feb 28, 2025 at 09:13:52AM -0300, André Almeida wrote: > > > When a device gets wedged, it might be caused by a gui

Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-03-03 Thread Raag Jadav
On Fri, Feb 28, 2025 at 06:49:43PM -0300, André Almeida wrote: > Hi Raag, > > On 2/28/25 11:58, Raag Jadav wrote: > > On Fri, Feb 28, 2025 at 09:13:53AM -0300, André Almeida wrote: > > > To notify userspace about which app (if any) made the device get in a >

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-03 Thread Raag Jadav
Cc: Lucas On Fri, Feb 28, 2025 at 09:13:52AM -0300, André Almeida wrote: > When a device gets wedged, it might be caused by a guilty application. > For userspace, knowing which app was the cause can be useful for some > situations, like for implementing a policy, logs or for giving a chance > for t

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-13 Thread Raag Jadav
On Wed, Mar 12, 2025 at 06:59:33PM -0300, André Almeida wrote: > On 12/03/2025 07:06, Raag Jadav wrote: > > On Tue, Mar 11, 2025 at 07:09:45PM +0200, Raag Jadav wrote: > > > On Mon, Mar 10, 2025 at 06:27:53PM -0300, André Almeida wrote: > > > > On 01/03/2

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-12 Thread Raag Jadav
On Mon, Mar 10, 2025 at 06:27:53PM -0300, André Almeida wrote: > On 01/03/2025 02:53, Raag Jadav wrote: > > On Fri, Feb 28, 2025 at 06:54:12PM -0300, André Almeida wrote: > > > Hi Raag, > > > > > > On 2/28/25 11:20, Raag Jadav wrote: > > > > Cc

Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-03-12 Thread Raag Jadav
On Wed, Mar 12, 2025 at 09:25:08AM +0100, Christian König wrote: >On 11.03.25 at 18:13, Raag Jadav wrote: >> On Mon, Mar 10, 2025 at 06:03:27PM -0400, Alex Deucher wrote: >>> On Mon, Mar 10, 2025 at 5:54 PM André Almeida >>> wrote: >>>> On 01/03/2025 0

Re: [PATCH 1/2] drm: Create an app info option for wedge events

2025-03-12 Thread Raag Jadav
On Tue, Mar 11, 2025 at 07:09:45PM +0200, Raag Jadav wrote: > On Mon, Mar 10, 2025 at 06:27:53PM -0300, André Almeida wrote: > > On 01/03/2025 02:53, Raag Jadav wrote: > > > On Fri, Feb 28, 2025 at 06:54:12PM -0300, André Almeida wrote: > > > > Hi Raag, > >

Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info

2025-03-12 Thread Raag Jadav
On Mon, Mar 10, 2025 at 06:03:27PM -0400, Alex Deucher wrote: > On Mon, Mar 10, 2025 at 5:54 PM André Almeida wrote: > > > > On 01/03/2025 03:04, Raag Jadav wrote: > > > On Fri, Feb 28, 2025 at 06:49:43PM -0300, André Almeida wrote: > > >> Hi Raag, >