Hi Raag
On 7/12/2024 5:53 PM, Raag Jadav wrote:
Add hwmon support for fan1_input attribute, which will expose fan speed
in RPM. With this in place, we can monitor fan speed using the lm-sensors tool.
$ sensors
i915-pci-0300
Adapter: PCI adapter
in0: 653.00 mV
fan1: 3833 RPM
power1:
On 7/23/2024 3:53 PM, Raag Jadav wrote:
On Mon, Jul 22, 2024 at 04:20:51PM +0530, Riana Tauro wrote:
Hi Raag
On 7/12/2024 5:53 PM, Raag Jadav wrote:
Add hwmon support for fan1_input attribute, which will expose fan speed
in RPM. With this in place we can monitor fan speed using lm-sensors
: N/A (max = 43.00 W)
energy1: 32.02 kJ
v2:
- Add mutex protection
- Handle overflow
- Add ABI documentation
- Aesthetic adjustments
Signed-off-by: Raag Jadav
Please add the reviewer's name in front of each comment in the version history.
With that,
Reviewed-by: Riana Tauro
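The attribute added by the quoted patch follows the standard hwmon sysfs ABI, where `fanN_input` holds the fan speed in RPM as a decimal integer. As a sketch of how a tool could read it without lm-sensors: the helper names below are illustrative, not part of any existing library, and the hwmon index in the path (`hwmon<N>`) is system-specific, so the caller passes the full path in.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a hwmon fanN_input attribute (decimal RPM, usually
 * newline-terminated). Returns 0 on success, -EINVAL on malformed input. */
static int parse_fan_rpm(const char *buf, long *rpm)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf || val < 0)
		return -EINVAL;
	/* Sysfs attributes normally end with a single newline. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	*rpm = val;
	return 0;
}

/* Read the fan speed from a path such as /sys/class/hwmon/hwmon<N>/fan1_input.
 * Returns 0 on success or a negative errno value on failure. */
static int read_fan_rpm(const char *path, long *rpm)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);
	return parse_fan_rpm(buf, rpm);
}
```

With the output shown above, `parse_fan_rpm("3833\n", &rpm)` stores 3833, matching the `fan1` line reported by `sensors`.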
Hi Himal
On 6/3/2025 3:01 PM, Ghimiray, Himal Prasad wrote:
On 03-06-2025 13:44, Riana Tauro wrote:
Add support to handle CSC firmware reported errors. When CSC firmware
errors are encountered, an error interrupt is received by the GFX device as
an MSI interrupt.
Device Source control
Hi Raag
On 6/4/2025 4:13 PM, Raag Jadav wrote:
On Tue, Jun 03, 2025 at 01:43:57PM +0530, Riana Tauro wrote:
A device is declared wedged when it is non-recoverable from
the driver context. Some firmware errors can also cause
the device to enter this state and the only method to recover
from
Hi Raag
Thank you for the review comments
On 6/6/2025 8:42 PM, Raag Jadav wrote:
On Tue, Jun 03, 2025 at 01:43:58PM +0530, Riana Tauro wrote:
Add a helper function to set the recovery method. The recovery
method has to be set before declaring the device wedged and sending the
drm wedged uevent
Hi Christian
On 6/24/2025 5:56 PM, Christian König wrote:
On 23.06.25 12:01, Riana Tauro wrote:
A device is declared wedged when it is non-recoverable from
the driver context.
Well, not quite.
I took this from the document below. Should it be changed?
https://www.kernel.org/doc/html/v6.16
: 50875, 53073, 53074, 53075, 53076
Riana Tauro (4):
drm: Add a firmware flash method to device wedged uevent
drm/xe: Add a helper function to set recovery method
drm/xe: Add support to handle hardware errors
drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors
Documentation/gpu
A device is declared wedged when it is non-recoverable from
the driver context. Some firmware errors can also cause
the device to enter this state and the only method to recover
from this would be to do a firmware flash.
Signed-off-by: Riana Tauro
---
Documentation/gpu/drm-uapi.rst | 6
warm reset
Add basic support to handle these errors
Bspec: 50875, 53073, 53074, 53075, 53076
Co-developed-by: Himal Prasad Ghimiray
Signed-off-by: Himal Prasad Ghimiray
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/Makefile| 1 +
drivers/gpu/drm/xe/regs/xe_hw_error_regs.h
Add a helper function to set the recovery method. The recovery
method has to be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, the default unbind/re-bind method
will be set.
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_device.c | 30
and userspace is
notified with a drm uevent
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/regs/xe_gsc_regs.h | 2 +
drivers/gpu/drm/xe/regs/xe_hw_error_regs.h | 7 ++-
drivers/gpu/drm/xe/xe_device_types.h | 3 +
drivers/gpu/drm/xe/xe_hw_error.c | 65
On 7/15/2025 10:28 PM, Summers, Stuart wrote:
On Tue, 2025-07-15 at 22:09 +0530, Riana Tauro wrote:
Hi Stuart
On 7/15/2025 7:40 PM, Summers, Stuart wrote:
On Tue, 2025-07-15 at 16:17 +0530, Riana Tauro wrote:
Add a debugfs fault handler to trigger the CSC error handler that
wedges the device
ls to commit message (Sima, Rodrigo, Raag)
add an example to the documentation (by Raag)
Cc: André Almeida
Cc: Christian König
Cc: David Airlie
Co-developed-by: Raag Jadav
Signed-off-by: Raag Jadav
Signed-off-by: Riana Tauro
---
Documentation/gpu/drm-uapi.rst | 41 +
able runtime survivability mode when csc errors are reported
Rev4: refactor survivability code
Rev5: Add more documentation
add user friendly logs
remove checks for BMG if not necessary
fix other review comments
Riana Tauro (9):
drm: Add a vendor-specific recovery method to
Userspace should be notified after setting the device as wedged.
Reorder the function calls to mark the GTs as wedged before sending the uevent.
Cc: Matthew Brost
Suggested-by: Raag Jadav
Signed-off-by: Riana Tauro
Reviewed-by: Matthew Brost
---
drivers/gpu/drm/xe/xe_device.c | 12
1 file
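The reordering requested in this patch can be illustrated with a small userspace model that records each step of declaring a device wedged and lets a test check that the uevent comes last. The action names, `struct wedge_trace`, and `trace_declare_wedged()` are hypothetical stand-ins for the driver's real calls.

```c
/* Hypothetical actions recorded while a device is declared wedged. */
enum wedge_action {
	ACT_DEVICE_WEDGED,
	ACT_GT_WEDGED,
	ACT_UEVENT,
};

#define MAX_ACTIONS 16

struct wedge_trace {
	enum wedge_action act[MAX_ACTIONS];
	int n;
};

static void trace_record(struct wedge_trace *t, enum wedge_action a)
{
	if (t->n < MAX_ACTIONS)
		t->act[t->n++] = a;
}

/* Model of the patched ordering: mark the device and every GT wedged first,
 * and only then notify userspace, as the commit message requires. */
static void trace_declare_wedged(struct wedge_trace *t, int num_gts)
{
	trace_record(t, ACT_DEVICE_WEDGED);
	for (int i = 0; i < num_gts; i++)
		trace_record(t, ACT_GT_WEDGED);
	trace_record(t, ACT_UEVENT);
}
```

For a device with two GTs, the trace is device, GT, GT, uevent, so anything reacting to the uevent runs after all wedging steps have completed.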
The patches in this series refactor the boot survivability code to
allow adding runtime survivability
Refactor existing code to separate both the modes
This patch renames the functions and separates init and enable
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_device.c
Add a helper function to set the recovery method. The recovery
method has to be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, the default unbind/re-bind method
will be set.
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_device.c | 26
Signed-off-by: Riana Tauro
Reviewed-by: Umesh Nerlige Ramappa
---
drivers/gpu/drm/xe/Makefile| 1 +
drivers/gpu/drm/xe/regs/xe_hw_error_regs.h | 15 +++
drivers/gpu/drm/xe/regs/xe_irq_regs.h | 1 +
drivers/gpu/drm/xe/xe_hw_error.c | 106
vendor recovery method with
runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message
Signed-off-by: Riana Tauro
Reviewed-by: Umesh Nerlige Ramappa
---
drivers/gpu/drm/xe/regs/xe_gsc_regs.h | 2 +
drivers
Add documentation for the vendor-specific device wedged recovery method
and runtime survivability.
v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
Signed-off-by: Riana Tauro
---
Documentation/gpu/xe/index.rst | 1 +
Documentation/gpu/xe/xe_device.rst
Add a debugfs fault handler to trigger the CSC error handler that
wedges the device and sends drm uevent
v2: add debugfs only for bmg (Umesh)
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_debugfs.c | 3 +++
drivers/gpu/drm/xe/xe_hw_error.c | 11 +++
2 files changed, 14 insertions
that device is in survivability mode
/sys/bus/pci/devices//survivability_mode
v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_survivability_mode.c| 43 ++-
drivers/gpu/drm/xe/xe_survivability_mode.h| 1
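The commit message above describes a sysfs attribute under the PCI device directory that indicates the device is in survivability mode. A userspace check might look like the sketch below. It assumes the `survivability_mode` attribute is only present while the device is in survivability mode, and it takes the PCI address (BDF) as a parameter; the helper names and the example address are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Build the sysfs path of the survivability_mode attribute for the PCI
 * device `bdf` (an illustrative value would be "0000:03:00.0").
 * Returns false if the buffer is too small to hold the full path. */
static bool survivability_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/survivability_mode", bdf);

	/* snprintf returns the length it wanted to write; reject truncation. */
	return n > 0 && (size_t)n < len;
}

/* Assumption: the attribute exists only while the device is in
 * survivability mode, so its presence is treated as the indicator. */
static bool in_survivability_mode(const char *bdf)
{
	char path[256];

	if (!survivability_path(path, sizeof(path), bdf))
		return false;
	return access(path, F_OK) == 0;
}
```

If the attribute turns out to be present in both states, a tool would need to read its contents instead of only checking for existence.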
Hi Sima
On 7/9/2025 7:11 PM, Simona Vetter wrote:
On Wed, Jul 09, 2025 at 04:50:13PM +0530, Riana Tauro wrote:
Certain errors can cause the device to be wedged and may
require a vendor-specific recovery method to restore normal
operation.
Add a recovery method 'WEDGED=vendor-specific
umentation (Raag)
Cc: André Almeida
Cc: Christian König
Cc: David Airlie
Cc:
Suggested-by: Raag Jadav
Signed-off-by: Riana Tauro
---
Documentation/gpu/drm-uapi.rst | 9 +
drivers/gpu/drm/drm_drv.c | 2 ++
include/drm/drm_device.h | 4
3 files changed, 11 insertions(+), 4
at 12:52:05PM -0400, Rodrigo Vivi wrote:
On Wed, Jul 09, 2025 at 05:18:54PM +0300, Raag Jadav wrote:
On Wed, Jul 09, 2025 at 04:09:20PM +0200, Christian König wrote:
On 09.07.25 15:41, Simona Vetter wrote:
On Wed, Jul 09, 2025 at 04:50:13PM +0530, Riana Tauro wrote:
Certain errors can cause the
, 2025 at 04:09:20PM +0200, Christian König wrote:
On 09.07.25 15:41, Simona Vetter wrote:
On Wed, Jul 09, 2025 at 04:50:13PM +0530, Riana Tauro wrote:
Certain errors can cause the device to be wedged and may
require a vendor-specific recovery method to restore normal
operation.
Add a recovery method
On 7/3/2025 12:10 PM, Raag Jadav wrote:
On Thu, Jul 03, 2025 at 10:50:53AM +0530, Riana Tauro wrote:
On 7/3/2025 9:36 AM, Raag Jadav wrote:
On Wed, Jul 02, 2025 at 07:41:11PM +0530, Riana Tauro wrote:
Certain errors can cause the device to be wedged and may
require a vendor-specific
On 7/3/2025 9:36 AM, Raag Jadav wrote:
On Wed, Jul 02, 2025 at 07:41:11PM +0530, Riana Tauro wrote:
Certain errors can cause the device to be wedged and may
require a vendor-specific recovery method to restore normal
operation.
Add a recovery method 'WEDGED=vendor-specific' for s
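On the userspace side, the recovery method arrives in the `WEDGED=` property of the drm uevent. Assuming the comma-separated list format described in Documentation/gpu/drm-uapi.rst (a value such as `WEDGED=rebind,bus-reset`), a listener could test for the proposed `vendor-specific` method as sketched below. The helper name is hypothetical, and receiving the uevent itself (through udev or a netlink socket) is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the uevent property string `env` is a WEDGED= entry whose
 * comma-separated method list contains exactly `method`. */
static bool wedged_has_method(const char *env, const char *method)
{
	static const char prefix[] = "WEDGED=";
	size_t mlen = strlen(method);
	const char *p;

	if (strncmp(env, prefix, sizeof(prefix) - 1) != 0)
		return false;
	p = env + sizeof(prefix) - 1;

	while (*p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		/* Compare whole list entries only, so "bus" does not match "bus-reset". */
		if (len == mlen && strncmp(p, method, len) == 0)
			return true;
		if (!comma)
			break;
		p = comma + 1;
	}
	return false;
}
```

A tool that sees `wedged_has_method(env, "vendor-specific")` return true would then consult driver documentation for the vendor's recovery procedure rather than attempting a rebind.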
lmeida
Cc: Christian König
Cc: David Airlie
Cc:
Suggested-by: Raag Jadav
Signed-off-by: Riana Tauro
---
Documentation/gpu/drm-uapi.rst | 5 -
drivers/gpu/drm/drm_drv.c | 2 ++
include/drm/drm_device.h | 4
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/Documen
On 7/1/2025 9:32 PM, Raag Jadav wrote:
On Tue, Jul 01, 2025 at 04:35:42PM +0200, Christian König wrote:
On 01.07.25 16:23, Raag Jadav wrote:
On Tue, Jul 01, 2025 at 05:11:24PM +0530, Riana Tauro wrote:
On 7/1/2025 5:07 PM, Riana Tauro wrote:
On 6/30/2025 11:03 PM, Rodrigo Vivi wrote:
On
Hi Stuart
On 7/15/2025 7:40 PM, Summers, Stuart wrote:
On Tue, 2025-07-15 at 16:17 +0530, Riana Tauro wrote:
Add a debugfs fault handler to trigger the CSC error handler that
wedges the device and sends drm uevent
v2: add debugfs only for bmg (Umesh)
Signed-off-by: Riana Tauro
---
drivers/gpu
Hi Stuart
On 7/15/2025 7:38 PM, Summers, Stuart wrote:
On Tue, 2025-07-15 at 16:17 +0530, Riana Tauro wrote:
The GFX device reports two classes of errors: uncorrectable and
correctable. Depending on severity, uncorrectable errors are
further
classified as Non-Fatal or Fatal.
Correctable and Non
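The error classes described above form a small hierarchy: an error is either correctable or uncorrectable, and uncorrectable errors are further split by severity into non-fatal and fatal. A minimal sketch of that classification follows; the enum and function names are hypothetical, and decoding the two properties from the hardware error registers is not modeled.

```c
#include <stdbool.h>

/* Hypothetical severity classes mirroring the description above. */
enum hw_error_class {
	HW_ERR_CORRECTABLE,
	HW_ERR_UNCORRECTABLE_NONFATAL,
	HW_ERR_UNCORRECTABLE_FATAL,
};

/* Classify an error from two decoded properties. The `fatal` flag only
 * matters for uncorrectable errors. */
static enum hw_error_class classify_hw_error(bool uncorrectable, bool fatal)
{
	if (!uncorrectable)
		return HW_ERR_CORRECTABLE;
	return fatal ? HW_ERR_UNCORRECTABLE_FATAL : HW_ERR_UNCORRECTABLE_NONFATAL;
}

/* Human-readable name, using the terms from the commit message. */
static const char *hw_error_class_name(enum hw_error_class c)
{
	switch (c) {
	case HW_ERR_CORRECTABLE:
		return "Correctable";
	case HW_ERR_UNCORRECTABLE_NONFATAL:
		return "Non-Fatal";
	case HW_ERR_UNCORRECTABLE_FATAL:
		return "Fatal";
	}
	return "Unknown";
}
```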
Hi Rodrigo/Christian
On 6/30/2025 11:03 PM, Rodrigo Vivi wrote:
On Mon, Jun 30, 2025 at 10:29:10AM +0200, Christian König wrote:
On 27.06.25 23:38, Rodrigo Vivi wrote:
Or at least print a big warning into the system log?
I mean a firmware update is usually something which the system administr
On 7/1/2025 5:07 PM, Riana Tauro wrote:
Hi Rodrigo/Christian
On 6/30/2025 11:03 PM, Rodrigo Vivi wrote:
On Mon, Jun 30, 2025 at 10:29:10AM +0200, Christian König wrote:
On 27.06.25 23:38, Rodrigo Vivi wrote:
Or at least print a big warning into the system log?
I mean a firmware update is
On 7/23/2025 7:30 PM, Raag Jadav wrote:
On Tue, Jul 15, 2025 at 04:17:24PM +0530, Riana Tauro wrote:
The patches in these series refactor the boot survivability code to
allow adding runtime survivability
Refactor existing code to separate both the modes
Punctuation, please!
This patch
On 7/23/2025 7:38 PM, Raag Jadav wrote:
On Tue, Jul 15, 2025 at 04:17:25PM +0530, Riana Tauro wrote:
Certain runtime firmware errors can cause the device to be in an unusable
state requiring a firmware flash to restore normal operation.
Runtime Survivability Mode indicates firmware flash is
On 7/23/2025 7:04 PM, Raag Jadav wrote:
On Tue, Jul 15, 2025 at 04:17:26PM +0530, Riana Tauro wrote:
Add documentation for the vendor-specific device wedged recovery method
and runtime survivability.
...
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index
maintained through
SBR
Add basic support to handle these errors
Bspec: 50875, 53073, 53074, 53075, 53076
v2: Format commit message (Umesh)
v3: fix documentation (Stuart)
Cc: Stuart Summers
Co-developed-by: Himal Prasad Ghimiray
Signed-off-by: Himal Prasad Ghimiray
Signed-off-by: Riana Tauro
: use vendor recovery method with
runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message
Signed-off-by: Riana Tauro
Reviewed-by: Umesh Nerlige Ramappa
---
drivers/gpu/drm/xe/regs/xe_gsc_regs.h | 2
Add a debugfs fault handler to trigger the CSC error handler that
wedges the device and enables runtime survivability mode
v2: add debugfs only for bmg (Umesh)
v3: do not use csc_fault attribute if debugfs is not enabled
Cc: Lucas De Marchi
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe
Add documentation for the vendor-specific device wedged recovery method
and runtime survivability.
v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
Signed-off-by: Riana Tauro
---
Documentation/gpu/xe/index.rst | 1
The patches in this series refactor the boot survivability code to
allow adding runtime survivability.
Refactor the existing code to separate the two modes.
This patch renames the functions and separates init and enable.
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_device.c
that device is in survivability mode
/sys/bus/pci/devices//survivability_mode
v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)
Signed-off-by: Riana Tauro
---
drivers/gpu/drm/xe/xe_survivability_mode.c| 43 ++-
drivers/gpu/drm/xe/xe_survivability_mode.h| 1
mode when csc errors are reported
Rev4: refactor survivability code
Rev5: Add more documentation
add user friendly logs
remove checks for BMG if not necessary
fix other review comments
Rev6: Use consistent words
revert to include BMG checks
Riana Tauro (9):
drm: Add a vendo
Add a helper function to set the recovery method. The recovery
method has to be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, the default unbind/re-bind method
will be set.
v2: fix documentation (Raag)
Signed-off-by: Riana Tauro
Reviewed-by: Raag Jadav
ore details to commit message (Sima, Rodrigo, Raag)
add an example script to the documentation (Raag)
v4: use consistent naming (Raag)
Cc: André Almeida
Cc: Christian König
Cc: David Airlie
Co-developed-by: Raag Jadav
Signed-off-by: Raag Jadav
Signed-off-by: Ria
Userspace should be notified after setting the device as wedged.
Reorder the function calls to mark the GTs as wedged before sending the uevent.
Cc: Matthew Brost
Suggested-by: Raag Jadav
Signed-off-by: Riana Tauro
Reviewed-by: Matthew Brost
---
drivers/gpu/drm/xe/xe_device.c | 12
1 file
On 7/24/2025 9:48 PM, Rodrigo Vivi wrote:
On Thu, Jul 24, 2025 at 08:04:30PM +0530, Riana Tauro wrote:
This patch addresses the need for a recovery method (firmware flash
on firmware errors) introduced in the later patches of Xe KMD. Whenever
Xe KMD detects a firmware error, a drm device
mode when csc errors are reported
Rev4: refactor survivability code
Rev5: Add more documentation
add user friendly logs
remove checks for BMG if not necessary
fix other review comments
Rev6: Use consistent words
revert to include BMG checks
Rev7: fix cosmetic changes
Riana
Add documentation for the vendor-specific device wedged recovery method
and runtime survivability.
v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
Signed-off-by: Riana Tauro
Reviewed-by: Rodrigo Vivi
Reviewed-by: Raag Jadav
Add a helper function to set the recovery method. The recovery
method can be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, the default unbind/re-bind method
will be set.
v2: fix documentation (Raag)
Signed-off-by: Riana Tauro
Reviewed-by: Raag Jadav
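The behavior this helper describes can be modeled in a few lines: a method is recorded before the device is declared wedged, and unbind/re-bind is used when nothing was recorded. The flag values, `struct model_dev`, and both function names are hypothetical, not the driver's or DRM's definitions; the driver's actual types and default method may differ.

```c
#include <stdbool.h>

/* Hypothetical recovery-method flags, for illustration only. */
#define MODEL_RECOVERY_REBIND          (1u << 0)
#define MODEL_RECOVERY_VENDOR_SPECIFIC (1u << 1)

struct model_dev {
	unsigned int wedged_method; /* 0 means no method has been set */
	bool wedged;
};

/* Record the method the wedged uevent should advertise. Intended to be
 * called before model_declare_wedged(). */
static void model_set_wedged_method(struct model_dev *dev, unsigned int method)
{
	dev->wedged_method = method;
}

/* Declare the device wedged and return the method the uevent would carry,
 * falling back to unbind/re-bind when no method was set beforehand. */
static unsigned int model_declare_wedged(struct model_dev *dev)
{
	if (!dev->wedged_method)
		model_set_wedged_method(dev, MODEL_RECOVERY_REBIND);
	dev->wedged = true;
	return dev->wedged_method;
}
```

In this model, an error path that needs a firmware flash would call `model_set_wedged_method(dev, MODEL_RECOVERY_VENDOR_SPECIFIC)` first; any path that skips that step still reports the rebind default.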
Refactor survivability mode code to support both boot
and runtime survivability.
Signed-off-by: Riana Tauro
Reviewed-by: Raag Jadav
---
drivers/gpu/drm/xe/xe_device.c| 2 +-
drivers/gpu/drm/xe/xe_heci_gsc.c | 2 +-
drivers/gpu/drm/xe/xe_pci.c
Userspace should be notified after setting the device as wedged.
Reorder the function calls to mark the GTs as wedged before sending the uevent.
Cc: Matthew Brost
Suggested-by: Raag Jadav
Signed-off-by: Riana Tauro
Reviewed-by: Matthew Brost
---
drivers/gpu/drm/xe/xe_device.c | 12
1 file
that device is in survivability mode
/sys/bus/pci/devices//survivability_mode
v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)
Signed-off-by: Riana Tauro
Reviewed-by: Raag Jadav
---
drivers/gpu/drm/xe/xe_survivability_mode.c| 43 ++-
drivers/gpu/drm/xe
add an example script to the documentation (Raag)
v4: use consistent naming (Raag)
v5: fix commit message
Cc: André Almeida
Cc: Christian König
Cc: David Airlie
Cc: Simona Vetter
Co-developed-by: Raag Jadav
Signed-off-by: Raag Jadav
Signed-off-by: Riana Tauro
Reviewed-by: Rodr
maintained through SBR.
Add basic support to handle these errors.
Bspec: 50875, 53073, 53074, 53075, 53076
v2: Format commit message (Umesh)
v3: fix documentation (Stuart)
Cc: Stuart Summers
Co-developed-by: Himal Prasad Ghimiray
Signed-off-by: Himal Prasad Ghimiray
Signed-off-by: Riana Tauro
: use vendor recovery method with
runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message
Signed-off-by: Riana Tauro
Reviewed-by: Umesh Nerlige Ramappa
---
drivers/gpu/drm/xe/regs/xe_gsc_regs.h | 2
Add a debugfs fault handler to trigger the CSC error handler that
wedges the device and enables runtime survivability mode.
v2: add debugfs only for bmg (Umesh)
v3: do not use csc_fault attribute if debugfs is not enabled
Cc: Lucas De Marchi
Signed-off-by: Riana Tauro
Reviewed-by: Raag Jadav
On 7/31/2025 6:31 PM, Maxime Ripard wrote:
On Thu, Jul 31, 2025 at 04:43:46PM +0530, Riana Tauro wrote:
Hi Maxime
On 7/31/2025 3:02 PM, Maxime Ripard wrote:
Hi,
On Wed, Jul 30, 2025 at 07:33:01PM +0530, Riana Tauro wrote:
On 7/28/2025 3:57 PM, Riana Tauro wrote:
Address the need for a
Hi Maxime
On 7/31/2025 3:02 PM, Maxime Ripard wrote:
Hi,
On Wed, Jul 30, 2025 at 07:33:01PM +0530, Riana Tauro wrote:
On 7/28/2025 3:57 PM, Riana Tauro wrote:
Address the need for a recovery method (firmware flash on firmware errors)
introduced in the later patches of Xe KMD.
Whenever Xe KMD
On 7/28/2025 3:57 PM, Riana Tauro wrote:
Address the need for a recovery method (firmware flash on firmware errors)
introduced in the later patches of Xe KMD.
Whenever Xe KMD detects a firmware error, a firmware flash is required to
recover the device to normal operation.
The initial