[PATCH] drm/i915/uc: Include requested frequency in slow firmware load messages

2024-12-20 Thread John . C . Harrison
From: John Harrison To aid debug of sporadic issues, include the requested frequency in the debug message as well as the actual frequency. That way we know for certain that the clamping is not because the driver forgot to ask. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- dri

[PATCH] drm/i915: Add debug print about hw config table size

2024-12-20 Thread John . C . Harrison
From: John Harrison Add debug info to help investigate a very rare bug: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13385 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/g

[PATCH v6 2/3] drm/xe: Move the coredump registration to the worker thread

2024-11-28 Thread John . C . Harrison
From: John Harrison Adding lockdep checking to the coredump code showed that there was an existing violation. The dev_coredumpm_timeout() call is used to register the dump with the base coredump subsystem. However, that makes multiple memory allocations, only some of which use the GFP_ flags pass

[PATCH v9 07/11] drm/print: Introduce drm_line_printer

2024-10-02 Thread John . C . Harrison
From: Michal Wajdeczko This drm printer wrapper can be used to increase the robustness of the captured output generated by any other drm_printer to make sure we didn't lose any intermediate lines of the output by adding line numbers to each output line. Helpful for capturing some crash data. v2:
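
The idea of prefixing every output line with a running number, so that a dropped line shows up as a gap, can be sketched in user-space C. This is only an illustration: `struct line_printer`, `lp_printf` and the "series.counter: " prefix format are hypothetical names chosen for this sketch and are not claimed to match the drm_line_printer API in the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper state: an output buffer plus a line counter. */
struct line_printer {
    char *buf;            /* destination buffer */
    size_t size;          /* total size of buf in bytes */
    size_t used;          /* bytes written so far, excluding the NUL */
    unsigned int series;  /* identifies one capture (assumption) */
    unsigned int counter; /* incremented for every line printed */
};

/* Advance 'used' by the snprintf result, clamping on truncation. */
static void lp_advance(struct line_printer *lp, int n)
{
    size_t room;

    assert(lp->size > lp->used);
    room = lp->size - lp->used;
    if (n < 0)
        return;
    lp->used += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Print one line, prefixed with "series.counter: ". */
static void lp_printf(struct line_printer *lp, const char *fmt, ...)
{
    va_list args;

    lp->counter++;
    lp_advance(lp, snprintf(lp->buf + lp->used, lp->size - lp->used,
                            "%u.%u: ", lp->series, lp->counter));
    va_start(args, fmt);
    lp_advance(lp, vsnprintf(lp->buf + lp->used, lp->size - lp->used,
                             fmt, args));
    va_end(args);
}
```

With such a wrapper, a reader of the captured text can tell that a line went missing whenever the counter skips a value.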

[PATCH v8 10/11] drm/xe/guc: Add GuC log to devcoredump captures

2024-09-19 Thread John . C . Harrison
From: John Harrison Include the GuC log in devcoredump captures because they can be useful with debugging certain types of bug. v2: Fix kerneldoc v3: Drop module parameter as now using more compact ascii85 encoding rather than hexdump (although still not compressed) (review feedback from Matthew

[PATCH v8 09/11] drm/xe/guc: Dump entire CTB on errors

2024-09-19 Thread John . C . Harrison
From: John Harrison The dump of the CT buffers was only showing the unprocessed data which is not generally useful for saying why a hang occurred - because it was probably caused by the commands that were just processed. So save and dump the entire buffer but in a more compact dump format. Also z

[PATCH v8 04/11] drm/xe/devcoredump: Add ASCII85 dump helper function

2024-09-19 Thread John . C . Harrison
From: John Harrison There is a need to include the GuC log and other large binary objects in core dumps and via dmesg. So add a helper for dumping to a printer function via conversion to ASCII85 encoding. Another issue with dumping such a large buffer is that it can be slow, especially if dumpin
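
For readers unfamiliar with ASCII85, the encoding turns each 32-bit word into five printable characters (base 85, offset from '!'), with an all-zero word commonly shortened to a single 'z'. The following is a minimal user-space sketch of that scheme; the function names `a85_encode_word` and `a85_encode_buf` are hypothetical and this is not the helper added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one 32-bit word; 'out' must hold at least 6 bytes. */
static const char *a85_encode_word(uint32_t in, char out[6])
{
    if (in == 0)
        return "z"; /* shorthand for an all-zero word */

    out[5] = '\0';
    for (int i = 4; i >= 0; i--) {
        out[i] = (char)('!' + in % 85); /* least significant digit last */
        in /= 85;
    }
    return out;
}

/*
 * Encode 'count' words into 'dst' as one NUL-terminated string.
 * Returns the number of characters written, excluding the NUL.
 * 'dst_len' must be at least 5 * count + 1 (worst case, no zero words).
 */
static size_t a85_encode_buf(const uint32_t *src, size_t count,
                             char *dst, size_t dst_len)
{
    size_t used = 0;
    char tmp[6];

    assert(dst_len >= count * 5 + 1);
    for (size_t i = 0; i < count; i++) {
        const char *enc = a85_encode_word(src[i], tmp);
        size_t n = strlen(enc);

        memcpy(dst + used, enc, n);
        used += n;
    }
    dst[used] = '\0';
    return used;
}
```

Five characters per four bytes is noticeably more compact than a hex dump, which needs eight characters for the same four bytes.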

[PATCH v8 02/11] drm/xe/devcoredump: Use drm_puts and already cached local variables

2024-09-19 Thread John . C . Harrison
From: John Harrison There are a bunch of calls to drm_printf with static strings. Switch them to drm_puts instead. There are also a bunch of 'coredump->snapshot.XXX' references when 'coredump->snapshot' has already been cached locally as 'ss'. So use 'ss->XXX' instead. Signed-off-by: John Harris

[PATCH v8 01/11] drm/xe/guc: Remove spurious line feed in debug print

2024-09-19 Thread John . C . Harrison
From: John Harrison Including line feeds at the start of a debug print messes up the output when sent to dmesg. The break appears between all the useful prefix information and the actual string being printed. In this case, each block of data has a very clear start line and an extra delimiter is r

[PATCH v8 03/11] drm/xe/devcoredump: Improve section headings and add tile info

2024-09-19 Thread John . C . Harrison
From: John Harrison The xe_guc_exec_queue_snapshot is not really a GuC internal thing and is definitely not a GuC CT thing. So give it its own section heading. The snapshot itself is really a capture of the submission backend's internal state. Although all it currently prints out is the submissio

[PATCH v8 07/11] drm/print: Introduce drm_line_printer

2024-09-19 Thread John . C . Harrison
From: Michal Wajdeczko This drm printer wrapper can be used to increase the robustness of the captured output generated by any other drm_printer to make sure we didn't lost any intermediate lines of the output by adding line numbers to each output line. Helpful for capturing some crash data. v2:

[PATCH v8 06/11] drm/xe/guc: Use a two stage dump for GuC logs and add more info

2024-09-19 Thread John . C . Harrison
From: John Harrison Split the GuC log dump into a two stage snapshot and print mechanism. This allows the log to be captured at the point of an error (which may be in a restricted context) and then dump it out later (from a regular context such as a worker function or a sysfs file handler). Also

[PATCH v8 08/11] drm/xe/guc: Dead CT helper

2024-09-19 Thread John . C . Harrison
From: John Harrison Add a worker function helper for asynchronously dumping state when an internal/fatal error is detected in CT processing. Being asynchronous is required to avoid deadlocks and scheduling-while-atomic or process-stalled-for-too-long issues. Also check for a bunch more error cond

[PATCH v8 11/11] drm/xe/guc: Add a helper function for dumping GuC log to dmesg

2024-09-19 Thread John . C . Harrison
From: John Harrison Create a helper function that can be used to dump the GuC log to dmesg in a manner that is reliable for extraction and decode. The intention is that calls to this can be added by developers when debugging specific issues that require a GuC log but do not allow easy capture of

[PATCH v8 05/11] drm/xe/guc: Copy GuC log prior to dumping

2024-09-19 Thread John . C . Harrison
From: John Harrison Add an extra stage to the GuC log print to copy the log buffer into regular host memory first, rather than printing the live GPU buffer object directly. Doing so helps prevent inconsistencies due to the log being updated as it is being dumped. It also allows the use of the ASC

[PATCH v8 00/11] drm/xe/guc: Improve GuC log dumping and add to devcoredump

2024-09-19 Thread John . C . Harrison
From: John Harrison There is a debug mechanism for dumping the GuC log as an ASCII hex stream via dmesg. This is extremely useful for situations where it is not possible to query the log from debugfs (self tests, bugs that cause the driver to fail to load, system hangs, etc.). However, dumping via

[PATCH v2] drm/i915/guc: Enable PXP GuC autoteardown flow

2024-09-06 Thread John . C . Harrison
From: Juston Li This feature flag enables GuC autoteardown which allows for a grace period before session teardown. Also add a HAS_PXP() helper to share with the other place that wants to check. Signed-off-by: Juston Li Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c

[PATCH] drm/i915/uc: Include requested frequency in slow firmware load messages

2024-08-30 Thread John . C . Harrison
From: John Harrison To aid debug of sporadic issues, include the requested frequency in the debug message as well as the actual frequency. That way we know for certain that the clamping is not because the driver forgot to ask. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_g

[PATCH] drm/i915/guc: Fix missing enable of Wa_14019159160 on ARL

2024-08-08 Thread John . C . Harrison
From: John Harrison The previous update to enable the workaround on ARL only changed two out of three places where the w/a needs to be enabled. That meant the GuC side was operational but not the KMD side. And as the KMD side is the trigger, it meant the w/a was not actually active. So fix that.

[PATCH] drm/i915/dg2: Enable Wa_14019159160 for DG2

2024-08-05 Thread John . C . Harrison
From: John Harrison The context switch hold out workaround also applies to DG2. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 3 ++- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/dr

[PATCH] drm/i915: ARL requires a newer GSC firmware

2024-08-01 Thread John . C . Harrison
From: John Harrison ARL and MTL share a single GSC firmware blob. However, ARL requires a newer version of it. So differentiate the PCI ids for ARL from MTL and create ARL as a sub-platform of MTL. That way, all the existing workarounds and such still treat ARL as MTL exactly as before. H

[PATCH 0/3] [CI] Extend Wa14019159160 and enable for ARL and DG2

2024-06-21 Thread John . C . Harrison
From: John Harrison The context switch out workaround requires an extra piece on top. Also, it applies to more platforms. Signed-off-by: John Harrison John Harrison (3): drm/i915/arl: Enable Wa_14019159160 for ARL drm/i915/guc: Extend w/a 14019159160 drm/i915/dg2: Enable Wa_14019159160

[PATCH 2/3] drm/i915/guc: Extend w/a 14019159160

2024-06-21 Thread John . C . Harrison
From: John Harrison There is a new part to an existing workaround, so enable that piece as well. v2: Extend even further. v3: Drop DG2 as there are CI failures still to resolve. Also re-order the parameters to a function to reduce excessive line wrapping. Signed-off-by: John Harrison --- driv

[PATCH 3/3] drm/i915/dg2: Enable Wa_14019159160 for DG2

2024-06-21 Thread John . C . Harrison
From: John Harrison The context switch hold out workaround also applies to DG2. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 3 ++- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/dr

[PATCH 1/3] drm/i915/arl: Enable Wa_14019159160 for ARL

2024-06-21 Thread John . C . Harrison
From: John Harrison The context switch out workaround also applies to ARL. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 2 +- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/

[PATCH 0/2] Extend Wa14019159160 and enable for ARL

2024-06-21 Thread John . C . Harrison
From: John Harrison The context switch out workaround requires an extra piece on top. Also, it applies to more platforms. Signed-off-by: John Harrison John Harrison (2): drm/i915/arl: Enable Wa_14019159160 for ARL drm/i915/guc: Extend w/a 14019159160 drivers/gpu/drm/i915/gt/uc/abi/guc_k

[PATCH 2/2] drm/i915/guc: Extend w/a 14019159160

2024-06-21 Thread John . C . Harrison
From: John Harrison There is a new part to an existing workaround, so enable that piece as well. v2: Extend even further. v3: Drop DG2 as there are CI failures still to resolve. Also re-order the parameters to a function to reduce excessive line wrapping. Signed-off-by: John Harrison --- driv

[PATCH 1/2] drm/i915/arl: Enable Wa_14019159160 for ARL

2024-06-21 Thread John . C . Harrison
From: John Harrison The context switch out workaround also applies to ARL. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 2 +- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/

[PATCH v2] drm/i915/guc: Enable w/a 16021333562 for DG2, MTL and ARL

2024-05-28 Thread John . C . Harrison
From: John Harrison Enable another workaround that is implemented inside the GuC. v2: Use the correct Gen12 w/a id rather than the Xe version (review feedback from Matthew R) also extend to include ARL. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 1 + dri

[PATCH] drm/i915/guc: Enable w/a 14019882105 for DG2 and MTL

2024-05-24 Thread John . C . Harrison
From: John Harrison Enable another workaround that is implemented inside the GuC. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 32 --- 2 files changed, 21 insertions(+), 12 deletions(-) d

[PATCH] drm/i915/guc: Fix the fix for reset lock confusion

2024-03-29 Thread John . C . Harrison
From: John Harrison The previous fix for the circular lock splat about the busyness worker wasn't quite complete. Even though the reset-in-progress flag is cleared at the start of intel_uc_reset_finish, the entire function is still inside the reset mutex lock. Not sure why the patch appeared to

[PATCH] drm/i915/guc: Update w/a 14019159160

2024-03-07 Thread John . C . Harrison
From: John Harrison An existing workaround has been extended in both platforms affected and implementation complexity. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 3 ++- drivers/gpu/drm/i915/gt/uc/intel_guc.c| 3 ++- drivers/gpu/drm/i915/gt/uc/int

[PATCH v3 3/3] drm/i915/guc: Enable Wa_14019159160

2024-02-23 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/

[PATCH v3 0/3] Enable Wa_14019159160 and Wa_16019325821 for MTL

2024-02-23 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a by

[PATCH v3 2/3] drm/i915/guc: Add support for w/a KLVs

2024-02-23 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h
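
A KLV (key-length-value) list avoids the 32-entry ceiling of a flags word because each entry carries its own identifier and optional payload. The sketch below assumes a layout of one header dword (key in the upper 16 bits, payload length in dwords in the lower 16 bits) followed by the payload; the layout, the `klv_append` helper and the key values used with it are assumptions made for illustration, not the ABI defined by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KLV_KEY_SHIFT 16 /* assumed: key lives in bits 31:16 */

/*
 * Append one KLV entry at 'offset' (in dwords) and return the new offset.
 * 'len' is the payload length in dwords; 'values' may be NULL when len is 0,
 * which models a pure on/off enable flag.
 */
static size_t klv_append(uint32_t *buf, size_t capacity, size_t offset,
                         uint16_t key, const uint32_t *values, uint16_t len)
{
    assert(offset + 1 + len <= capacity);

    buf[offset++] = ((uint32_t)key << KLV_KEY_SHIFT) | len;
    for (uint16_t i = 0; i < len; i++)
        buf[offset++] = values[i];

    return offset;
}
```

A builder would start at offset 0 and call `klv_append` once per enabled workaround, so adding another one costs a new key rather than one of a fixed set of bits.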

[PATCH v3 1/3] drm/i915: Enable Wa_16019325821

2024-02-23 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++---

[PATCH] drm/i915/guc: Correct capture of EIR register on hang

2024-02-23 Thread John . C . Harrison
From: John Harrison The EIR register (0x20B0) was being included in the engine class list for render and compute as the absolute register address. However, it is actually a ring register available on all engines at an offset of (base) + 0xB0. As it was included as an RCS engine but with the absol
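
The difference between an absolute address and a ring-relative offset can be shown with a little arithmetic. In this sketch, `ring_eir` and `RING_EIR_OFFSET` are hypothetical names; the only values taken from the text above are the 0xB0 offset and the 0x20B0 absolute address, which together imply a render ring base of 0x2000.

```c
#include <assert.h>
#include <stdint.h>

#define RING_EIR_OFFSET 0xB0u /* offset of EIR from an engine's ring base */

/* Compute the EIR address for an engine from its ring base. */
static uint32_t ring_eir(uint32_t ring_base)
{
    /* guard against wrap-around of the 32-bit address */
    assert(ring_base <= UINT32_MAX - RING_EIR_OFFSET);
    return ring_base + RING_EIR_OFFSET;
}
```

Listing the register as a ring offset lets the same table entry resolve to the right address on every engine, whereas the absolute value 0x20B0 is only correct for the engine whose base is 0x2000.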

[PATCH v3] drm/i915/guc: Simplify/extend platform check for Wa_14018913170

2024-02-23 Thread John . C . Harrison
From: John Harrison The above w/a is required for every platform that the i915 driver supports. It is fixed on the latest platforms but they are only supported by Xe instead of i915. So just remove the platform check completely and keep the code simple. v2: Add extra comment (review feedback fro

[PATCH v3] drm/i915/guc: Simplify/extend platform check for Wa_14018913170

2024-02-16 Thread John . C . Harrison
From: John Harrison The above w/a is required for every platform that the i915 driver supports. It is fixed on the latest platforms but they are only supported by Xe instead of i915. So just remove the platform check completely and keep the code simple. Signed-off-by: John Harrison --- drivers

[PATCH] drm/i915/gt: Restart the heartbeat timer when forcing a pulse

2024-01-10 Thread John . C . Harrison
From: John Harrison The context persistence code does things like send super high priority heartbeat pulses to ensure any leaked context can still be pre-empted and thus isn't a total denial of service but only a minor denial of service. Unfortunately, it wasn't bothering to restart the heartbeat

[PATCH v3 2/3] drm/i915/guc: Add support for w/a KLVs

2024-01-04 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h

[PATCH v3 3/3] drm/i915/guc: Enable Wa_14019159160

2024-01-04 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/

[PATCH v3 1/3] drm/i915: Enable Wa_16019325821

2024-01-04 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++---

[PATCH v3 0/3] Enable Wa_14019159160 and Wa_16019325821 for MTL

2024-01-04 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a by

[PATCH] drm/i915/huc: Allow for very slow HuC loading

2024-01-02 Thread John . C . Harrison
From: John Harrison A failure to load the HuC is occasionally observed where the cause is believed to be a low GT frequency leading to very long load times. So a) increase the timeout so that the user still gets a working system even in the case of slow load. And b) report the frequency during t

[PATCH v3 2/3] drm/i915/guc: Add support for w/a KLVs

2023-12-20 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h

[PATCH v3 0/3] Enable Wa_14019159160 and Wa_16019325821 for MTL

2023-12-20 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a by

[PATCH v3 1/3] drm/i915: Enable Wa_16019325821

2023-12-20 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++---

[PATCH v3 3/3] drm/i915/guc: Enable Wa_14019159160

2023-12-20 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/

[PATCH] drm/i915/guc: Avoid circular locking issue on busyness flush

2023-12-19 Thread John . C . Harrison
From: John Harrison Avoid the following lockdep complaint: <4> [298.856498] == <4> [298.856500] WARNING: possible circular locking dependency detected <4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G N <4> [298.856505] --

[PATCH v2 2/2] drm/i915/guc: Add a selftest for FAST_REQUEST errors

2023-11-13 Thread John . C . Harrison
From: John Harrison There is a mechanism for reporting errors from fire and forget H2G messages. This is the only way to find out about almost any error in the GuC backend submission path. So it would be useful to know that it is working. v2: Fix some dumb over-complications and a couple of typo

[PATCH v2 0/2] Selftest for FAST_REQUEST feature

2023-11-13 Thread John . C . Harrison
From: John Harrison Add a selftest to verify that the FAST_REQUEST mechanism (getting errors back from fire-and-forget H2G commands) is functional. Also fix up a potential false positive in the GuC hang selftest. v2: Fix some dumb over-complications and typos - review feedback from Daniele. Si

[PATCH v2 1/2] drm/i915/guc: Fix for potential false positives in GuC hang selftest

2023-11-13 Thread John . C . Harrison
From: John Harrison Noticed that the hangcheck selftest is submitting a non-preemptible spinner. That means that even if the GuC does not die, the heartbeat will still kick in and trigger a reset. Which is rather defeating the purpose of the test - to verify that the heartbeat will kick in if the

[PATCH 1/2] drm/i915/guc: Don't double enable a context

2023-11-09 Thread John . C . Harrison
From: John Harrison If a context is blocked, unblocked and submitted repeatedly in rapid succession, the driver can end up trying to enable the context while the previous enable request is still in flight. This can lead to much confusion in the state tracking. Prevent that by checking the pending

[PATCH 0/2] Don't send double context enable/disable requests

2023-11-09 Thread John . C . Harrison
From: John Harrison The driver could sometimes send context enable/disable requests when a previous request was still pending. This is not allowed. So stop doing it. Signed-off-by: John Harrison John Harrison (2): drm/i915/guc: Don't double enable a context drm/i915/guc: Don't disable a c

[PATCH 2/2] drm/i915/guc: Don't disable a context whose enable is still pending

2023-11-09 Thread John . C . Harrison
From: John Harrison Various processes involve requesting GuC to disable a given context. However context enable/disable is an asynchronous process in the GuC. Thus, it is possible the previous enable request is still being processed when the disable request is triggered. Having both enable and di

[PATCH 1/2] drm/i915/guc: Fix for potential false positives in GuC hang selftest

2023-11-06 Thread John . C . Harrison
From: John Harrison Noticed that the hangcheck selftest is submitting a non-preemptible spinner. That means that even if the GuC does not die, the heartbeat will still kick in and trigger a reset. Which is rather defeating the purpose of the test - to verify that the heartbeat will kick in if the

[PATCH 2/2] drm/i915/guc: Add a selftest for FAST_REQUEST errors

2023-11-06 Thread John . C . Harrison
From: John Harrison There is a mechanism for reporting errors from fire and forget H2G messages. This is the only way to find out about almost any error in the GuC backend submission path. So it would be useful to know that it is working. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/g

[PATCH 0/2] Selftest for FAST_REQUEST feature

2023-11-06 Thread John . C . Harrison
From: John Harrison Add a selftest to verify that the FAST_REQUEST mechanism (getting errors back from fire-and-forget H2G commands) is functional. Also fix up a potential false positive in the GuC hang selftest. Signed-off-by: John Harrison John Harrison (2): drm/i915/guc: Fix for potenti

[PATCH v2 4/4] drm/i915/mtl: Add module parameter override for Wa_16019325821/Wa_14019159160

2023-10-27 Thread John . C . Harrison
From: John Harrison These w/a's can have significant performance implications for any workload which uses both RCS and CCS. On the other hand, the hang itself is only seen in one or two very specific workloads. So add a module parameter to control whether the w/a's are enabled or not and default t

[PATCH v2 3/4] drm/i915/guc: Enable Wa_14019159160

2023-10-27 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 ++

[PATCH v2 1/4] drm/i915: Enable Wa_16019325821

2023-10-27 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++ drivers/gpu/drm/i915/gt/

[PATCH v2 2/4] drm/i915/guc: Add support for w/a KLVs

2023-10-27 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 + drivers/gpu/drm

[PATCH v2 0/4] Enable Wa_14019159160 and Wa_16019325821 for MTL

2023-10-27 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a by

[PATCH 2/2] drm/i915: More use of GT specific print helpers

2023-10-09 Thread John . C . Harrison
From: John Harrison Update a bunch of GT related print messages in non-GT files to use the GT specific helpers. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 8 +++- drivers/gpu/drm/i915/i915_driver.c| 3 ++- drivers/gpu/drm/i915/i915_perf.c

[PATCH 1/2] drm/i915/gt: More use of GT specific print helpers

2023-10-09 Thread John . C . Harrison
From: John Harrison A bunch of print messages got missed in the update to using sub-system specific helpers. So update those. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 29 + drivers/gpu/drm/i915/gt/intel_gsc.c | 11 driv

[PATCH 0/2] More print message helper updates

2023-10-09 Thread John . C . Harrison
From: John Harrison There was an update a while back to use sub-system specific print helpers that implicitly add sub-system specific information to the print. It seems a bunch of GT related messages got missed in that update. So update them now. Signed-off-by: John Harrison John Harrison (2)

[PATCH] drm/i915/guc: Update 'recommended' version to 70.12.1 for DG2/ADL-S/ADL-P/MTL

2023-10-06 Thread John . C . Harrison
From: John Harrison The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 8 1 file changed, 4 insertions(+), 4 deleti

[PATCH] drm/i915/guc: Enable WA 14018913170

2023-10-05 Thread John . C . Harrison
From: Daniele Ceraolo Spurio The GuC handles the WA, the KMD just needs to set the flag to enable it on the appropriate platforms. Signed-off-by: John Harrison Signed-off-by: Daniele Ceraolo Spurio Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 6 ++ driv

[PATCH 1/3] drm/i915/guc: Support new and improved engine busyness

2023-09-22 Thread John . C . Harrison
From: John Harrison The GuC has been extended to support a much more friendly engine busyness interface. So partition the old interface into a 'busy_v1' space and add 'busy_v2' support alongside. And if v2 is available, use that in preference to v1. Note that v2 provides extra features over and a

[PATCH 3/3] drm/i915/mtl: Add counters for engine busyness ticks

2023-09-22 Thread John . C . Harrison
From: Umesh Nerlige Ramappa In new version of GuC engine busyness, GuC provides engine busyness ticks as a 64 bit counter. Add a new counter to relay this value to the user as is. Signed-off-by: Umesh Nerlige Ramappa Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine.h

[PATCH 0/3] Engine busyness v2

2023-09-22 Thread John . C . Harrison
From: John Harrison The latest GuC implements a new and improved scheme for tracking engine busyness. So make use of it. Note that this change comes along with a new set of PMU counters. The old counters have a fundamental problem that they are defined in terms of wall time but the sampling is n

[PATCH 2/3] drm/i915/mtl: Add a PMU counter for total active ticks

2023-09-22 Thread John . C . Harrison
From: Umesh Nerlige Ramappa Current engine busyness interface exposed by GuC has a few issues: - The busyness of active engine is calculated using 2 values provided by GuC and is prone to race between CPU reading those values and GuC updating them. Any sort of HW synchronization would be at

[PATCH] drm/i915/guc: Suppress 'ignoring reset notification' message

2023-09-21 Thread John . C . Harrison
From: John Harrison If an active context has been banned (e.g. Ctrl+C killed) then it is likely to be reset as part of evicting it from the hardware. That results in an 'ignoring context reset notification: banned = 1' message at info level. This confuses/concerns people and makes them think somet

[PATCH 3/4] drm/i915/guc: Add support for w/a KLVs

2023-09-15 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 3 + drivers/gpu/drm

[PATCH 4/4] drm/i915/guc: Enable Wa_14019159160

2023-09-15 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 +++

[PATCH 2/4] drm/i915: Enable Wa_16019325821

2023-09-15 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++ drivers/gpu/drm/i915/gt/

[PATCH 1/4] drm/i915/guc: Update 'recommended' version to 70.11.0 for DG2/ADL-P/MTL

2023-09-15 Thread John . C . Harrison
From: John Harrison The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletion

[PATCH 0/4] Enable Wa_14019159160 and Wa_16019325821 for MTL

2023-09-15 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. Signed-off-by: John Harrison John Harrison (4): drm/i915/guc: Update 'recommended' version to 70.11.0 for DG2/ADL-P/MTL drm/i915: Enable Wa_16019325821 drm/i915/guc: Add support for w

[PATCH 1/2] drm/i915/guc: Update 'recommended' version to 70.11.0 for DG2/ADL-P/MTL

2023-09-14 Thread John . C . Harrison
From: John Harrison The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletion

[PATCH 2/2] drm/i915/guc: Enable WA 14018913170

2023-09-14 Thread John . C . Harrison
From: Daniele Ceraolo Spurio The GuC handles the WA, the KMD just needs to set the flag to enable it on the appropriate platforms. Signed-off-by: John Harrison Signed-off-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 6 ++ drivers/gpu/drm/i915/gt/uc/intel_gu

[PATCH 0/2] Enable Wa_14018913170 on DG2/MTL/PVD

2023-09-14 Thread John . C . Harrison
From: John Harrison Enable a WA on the latest platforms. Also update the recommended GuC version for those platforms to the latest available. Further patches will follow to make use of other features in the latest GuC firmware, but the w/a at least requires something newer than what was previousl

[PATCH v2] drm/i915/guc: Force a reset on internal GuC error

2023-08-15 Thread John . C . Harrison
From: John Harrison If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually kick

[PATCH] drm/i915/guc: Fix potential null pointer deref in GuC 'steal id' test

2023-08-02 Thread John . C . Harrison
From: John Harrison It was noticed that if the very first 'stealing' request failed to create for some reason then the 'steal all ids' loop would immediately exit with 'last' still being NULL. The test would attempt to continue but using a null pointer. Fix that by aborting the test if it fails t

[PATCH] drm/i915/guc: Force a reset on internal GuC error

2023-06-05 Thread John . C . Harrison
From: John Harrison If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually kick

[PATCH] drm/i915/guc: Remove some obsolete definitions

2023-05-31 Thread John . C . Harrison
From: John Harrison There were a bunch of defines and structures left over from an API update a very long time ago. Remove them. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 33 - 1 file changed, 33 deletions(-) diff --git a/drivers/gpu/dr

[PATCH 0/3] Use FAST_REQUEST mechanism for non-blocking H2G calls

2023-05-26 Thread John . C . Harrison
From: John Harrison The GuC interface supports a mechanism for returning errors against non-blocking H2G calls. This is called FAST_REQUEST. Given that the call is asynchronous, matching the returned error up is difficult. However, getting any error at all back is better than no error. If any su

[PATCH 3/3] drm/i915/guc: Track all sent actions to GuC

2023-05-26 Thread John . C . Harrison
From: Michal Wajdeczko For easier debug of any unexpected error responses from GuC that might be related to non-blocking fast requests, track action code (and stack if under DEBUG_GUC config) for every H2G request. Signed-off-by: Michal Wajdeczko Signed-off-by: John Harrison --- drivers/gpu/d

[PATCH 2/3] drm/i915/guc: Update log for unsolicited CTB response

2023-05-26 Thread John . C . Harrison
From: Michal Wajdeczko Instead of printing message fence twice, include HXG header of the unexpected message and its len. Signed-off-by: Michal Wajdeczko Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff

[PATCH 1/3] drm/i915/guc: Use FAST_REQUEST for non-blocking H2G calls

2023-05-26 Thread John . C . Harrison
From: Michal Wajdeczko In addition to the already defined REQUEST HXG message format, which is used when sender expects some confirmation or data, HXG protocol includes definition of the FAST REQUEST message, that may be used when sender does not expect any useful data to be returned. Using this

[PATCH] drm/i915/guc: Fix confused register capture list creation

2023-05-11 Thread John . C . Harrison
From: John Harrison The GuC has a completely separate engine class enum when referring to register capture lists, which combines render and compute. The driver was using the 'normal' GuC specific engine class enum instead. That meant that it thought it was defining a capture list for compute engi

[PATCH] drm/i915/guc: Fix probe injection CI failures after recent change

2023-05-10 Thread John . C . Harrison
From: John Harrison A recent change bumped a 'notice' message up to 'error' level for debug builds to help trap incorrect configurations in CI systems. Unfortunately, the error condition in question is triggered by the error injection probe test. So change the message again to be 'probe error' le

[PATCH 1/2] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-05-04 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The K
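
The rule described here (major must match, minor and patch are only compared to report an older file) can be sketched as below. This is an illustration under stated assumptions, not the driver's implementation: struct fw_version and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version triple; field names are illustrative only. */
struct fw_version {
    uint16_t major;
    uint16_t minor;
    uint16_t patch;
};

/* Strict requirement: only the major component has to match. */
static bool fw_is_compatible(const struct fw_version *found,
                             const struct fw_version *wanted)
{
    return found->major == wanted->major;
}

/*
 * Advisory check, meaningful once the major versions match:
 * true if the found file is older in minor, or equal in minor and older in patch.
 */
static bool fw_is_older(const struct fw_version *found,
                        const struct fw_version *wanted)
{
    if (found->minor != wanted->minor)
        return found->minor < wanted->minor;
    return found->patch < wanted->patch;
}
```

Comparing the patch level only when the minor versions are equal means a newer minor release is never reported as old just because its patch number is lower.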

[PATCH 0/2] Update MTL GuC firmware

2023-05-04 Thread John . C . Harrison
From: John Harrison Update MTL to the latest GuC release and switch to using reduced version file names. Also, pull in a patch from an earlier series that is waiting to merge to prevent merge conflicts later. Signed-off-by: John Harrison John Harrison (2): drm/i915/uc: Track patch level ver

[PATCH 2/2] drm/i915/mtl: Update GuC firmware version for MTL to 70.6.6

2023-05-04 Thread John . C . Harrison
From: John Harrison Also switch to using reduced version file naming as it is no longer such a work-in-progress and likely to change. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i9

[PATCH v3 6/6] drm/i915/uc: Make unexpected firmware versions an error in debug builds

2023-05-02 Thread John . C . Harrison
From: John Harrison If the DEBUG_GEM config option is set then escalate the 'unexpected firmware version' message from a notice to an error. This will ensure that the CI system treats such occurrences as a failure and logs a bug about it (or fails the pre-merge testing). Signed-off-by: John Harri

[PATCH v3 5/6] drm/i915/uc: Reject duplicate entries in firmware table

2023-05-02 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject
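
A duplicate check of the kind described can be sketched as a one-time pass over the table. This is only a sketch: the entry layout (struct fw_entry) and the function names are hypothetical, and the real table's fields and validation rules are not shown in the message above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table entry; the real driver's entry layout is not shown here. */
struct fw_entry {
    int platform;
    int rev;
    int major;
    int minor;
};

static bool fw_entry_equal(const struct fw_entry *a, const struct fw_entry *b)
{
    return a->platform == b->platform && a->rev == b->rev &&
           a->major == b->major && a->minor == b->minor;
}

/* Returns true if no two entries in the table are identical. */
static bool fw_table_is_valid(const struct fw_entry *table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (fw_entry_equal(&table[i], &table[j]))
                return false;
    return true;
}
```

The quadratic scan is acceptable for a small static table that is validated once at init time; a loader that retries "the next entry" after a failure can then never be handed the same entry twice.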

[PATCH v3 4/6] drm/i915/uc: Enhancements to firmware table validation

2023-05-02 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob, which is unnecessary. So pull it out into a separate function that is only called once per blob type at init time. Also, drop the CONFIG_SELFTEST r
