[Intel-gfx] [PATCH 1/1] drm/i915/guc: Enable compute scheduling on DG2

2022-09-22 Thread John . C . Harrison
From: John Harrison DG2 has issues. To work around one of these the GuC must schedule apps in an exclusive manner across both RCS and CCS. That is, if a context from app X is running on RCS then all CCS engines must sit idle even if there are contexts from apps Y, Z, ... waiting to run. A certain

[Intel-gfx] [PATCH 0/1] DG2 fix for CCS starvation

2022-09-22 Thread John . C . Harrison
From: John Harrison Enable CCS/RCS arbitration scheduling in GuC to prevent CCS starvation on DG2. Signed-off-by: John Harrison John Harrison (1): drm/i915/guc: Enable compute scheduling on DG2 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_a

[Intel-gfx] [PATCH v4 0/4] Improve anti-pre-emption w/a for compute workloads

2022-09-28 Thread John . C . Harrison
From: John Harrison Compute workloads are inherently not pre-emptible on current hardware. Thus the pre-emption timeout was disabled as a workaround to prevent unwanted resets. Instead, the hang detection was left to the heartbeat and its (longer) timeout. This is undesirable with GuC submission

[Intel-gfx] [PATCH v4 1/4] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-09-28 Thread John . C . Harrison
From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly lowers the point at which 32-bit overflow occurs. On current platforms, the worst case is approximately 110 seconds. Rather than allowing the user to set higher values and

[Intel-gfx] [PATCH v4 2/4] drm/i915: Fix compute pre-emption w/a to apply to compute engines

2022-09-28 Thread John . C . Harrison
From: John Harrison An earlier patch added support for compute engines. However, it missed enabling the anti-pre-emption w/a for the new engine class. So move the 'compute capable' flag earlier and use it for the pre-emption w/a test. Fixes: c674c5b9342e ("drm/i915/xehp: CCS should use RCS setup

[Intel-gfx] [PATCH v4 3/4] drm/i915: Make the heartbeat play nice with long pre-emption timeouts

2022-09-28 Thread John . C . Harrison
From: John Harrison Compute workloads are inherently not pre-emptible for long periods on current hardware. As a workaround for this, the pre-emption timeout for compute capable engines was disabled. This is undesirable with GuC submission as it prevents per engine reset of hung contexts. Hence t

[Intel-gfx] [PATCH v4 4/4] drm/i915: Improve long running compute w/a for GuC submission

2022-09-28 Thread John . C . Harrison
From: John Harrison A workaround was added to the driver to allow compute workloads to run 'forever' by disabling pre-emption on the RCS engine for Gen12. It is not totally unbounded, as the heartbeat will kick in eventually and cause a reset of the hung engine. However, this does not work well in

[Intel-gfx] [PATCH v5 0/4] Improve anti-pre-emption w/a for compute workloads

2022-10-06 Thread John . C . Harrison
From: John Harrison Compute workloads are inherently not pre-emptible on current hardware. Thus the pre-emption timeout was disabled as a workaround to prevent unwanted resets. Instead, the hang detection was left to the heartbeat and its (longer) timeout. This is undesirable with GuC submission

[Intel-gfx] [PATCH v5 1/4] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-10-06 Thread John . C . Harrison
From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly lowers the point at which 32-bit overflow occurs. On current platforms, the worst case is approximately 110 seconds. Rather than allowing the user to set higher values and

[Intel-gfx] [PATCH v5 3/4] drm/i915: Make the heartbeat play nice with long pre-emption timeouts

2022-10-06 Thread John . C . Harrison
From: John Harrison Compute workloads are inherently not pre-emptible for long periods on current hardware. As a workaround for this, the pre-emption timeout for compute capable engines was disabled. This is undesirable with GuC submission as it prevents per engine reset of hung contexts. Hence t

[Intel-gfx] [PATCH v5 2/4] drm/i915: Fix compute pre-emption w/a to apply to compute engines

2022-10-06 Thread John . C . Harrison
From: John Harrison An earlier patch added support for compute engines. However, it missed enabling the anti-pre-emption w/a for the new engine class. So move the 'compute capable' flag earlier and use it for the pre-emption w/a test. Fixes: c674c5b9342e ("drm/i915/xehp: CCS should use RCS setup

[Intel-gfx] [PATCH v5 4/4] drm/i915: Improve long running compute w/a for GuC submission

2022-10-06 Thread John . C . Harrison
From: John Harrison A workaround was added to the driver to allow compute workloads to run 'forever' by disabling pre-emption on the RCS engine for Gen12. It is not totally unbounded, as the heartbeat will kick in eventually and cause a reset of the hung engine. However, this does not work well in

[Intel-gfx] [PATCH v2 1/3] drm/i915/uc: Rationalise delimiters in filename macros

2022-11-23 Thread John . C . Harrison
From: John Harrison The way delimiters (underscores and dots) were added to the UC filenames was different for different types of delimiter. Rationalise them to all be done the same way - implicitly in the concatenation macro rather than explicitly in the file name prefix. Signed-off-by: John Ha

[Intel-gfx] [PATCH v2 3/3] drm/i915/guc: Use GuC submission API version number

2022-11-23 Thread John . C . Harrison
From: John Harrison The GuC firmware includes an extra version number to specify the submission API level. So use that rather than the main firmware version number for submission related checks. Also, while it is guaranteed that GuC version number components are only 8-bits in size, other firmwa

[Intel-gfx] [PATCH v2 0/3] More GuC firmware version improvements

2022-11-23 Thread John . C . Harrison
From: John Harrison Start using the 'submission API version' for deciding which GuC API to use in the submission code. Correct version number manipulation code to support full 32bit major/minor/patch components, except for GuC which is guaranteed to be 8bit safe. Other minor code clean ups arou

[Intel-gfx] [PATCH v2 2/3] drm/i915/uc: More refactoring of UC version numbers

2022-11-23 Thread John . C . Harrison
From: John Harrison As a precursor to a coming change (for adding a GuC submission API version), abstract the UC version number into its own private structure separate to the firmware filename. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/int

[Intel-gfx] [PATCH 0/2] Allow error capture without a request / on reset failure

2022-11-29 Thread John . C . Harrison
From: John Harrison It is technically possible to get a hung context without a valid request. In such a situation, try to provide as much information in the error capture as possible rather than just aborting and capturing nothing. Similarly, in the case of an engine reset failure the GuC is not

[Intel-gfx] [PATCH 1/2] drm/i915: Allow error capture without a request

2022-11-29 Thread John . C . Harrison
From: John Harrison There was a report of error captures occurring without any hung context being indicated despite the capture being initiated by a 'hung context notification' from GuC. The problem was not reproducible. However, it can happen if the context in question has no active r

[Intel-gfx] [PATCH 2/2] drm/i915/guc: Look for a guilty context when an engine reset fails

2022-11-29 Thread John . C . Harrison
From: John Harrison Engine resets are supposed to never happen. But in the case when one does (due to unknown reasons that normally come down to a missing w/a), it is useful to get as much information out of the system as possible. Given that the GuC effectively dies in such a situation, it is no

[Intel-gfx] [PATCH v3 1/3] drm/i915/uc: Rationalise delimiters in filename macros

2022-11-29 Thread John . C . Harrison
From: John Harrison The way delimiters (underscores and dots) were added to the UC filenames was different for different types of delimiter. Rationalise them to all be done the same way - implicitly in the concatenation macro rather than explicitly in the file name prefix. Signed-off-by: John Ha

[Intel-gfx] [PATCH v3 0/3] More GuC firmware version improvements

2022-11-29 Thread John . C . Harrison
From: John Harrison Start using the 'submission API version' for deciding which GuC API to use in the submission code. Correct version number manipulation code to support full 32bit major/minor/patch components, except for GuC which is guaranteed to be 8bit safe. Other minor code clean ups arou

[Intel-gfx] [PATCH v3 2/3] drm/i915/uc: More refactoring of UC version numbers

2022-11-29 Thread John . C . Harrison
From: John Harrison As a precursor to a coming change (for adding a GuC submission API version), abstract the UC version number into its own private structure separate to the firmware filename. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/int

[Intel-gfx] [PATCH v3 3/3] drm/i915/guc: Use GuC submission API version number

2022-11-29 Thread John . C . Harrison
From: John Harrison The GuC firmware includes an extra version number to specify the submission API level. So use that rather than the main firmware version number for submission related checks. Also, while it is guaranteed that GuC version number components are only 8-bits in size, other firmwa

[Intel-gfx] [PATCH 0/3] Fixes for various UC related issues

2022-12-19 Thread John . C . Harrison
From: John Harrison Fix a bunch of assorted issues with firmware loading and GuC initialisation. Signed-off-by: John Harrison John Harrison (3): drm/i915/guc: Fix missing return code checks in submission init drm/i915/guc: Fix a static analysis warning drm/i915/uc: Fix two issues with ov

[Intel-gfx] [PATCH 3/3] drm/i915/uc: Fix two issues with over-size firmware files

2022-12-19 Thread John . C . Harrison
From: John Harrison In the case where a firmware file is too large (e.g. someone downloaded a web page ASCII dump from github...), the firmware object is released but the pointer is not zeroed. If no other firmware file was found then release would be called again leading to a double kfree. Also,

[Intel-gfx] [PATCH 1/3] drm/i915/guc: Fix missing return code checks in submission init

2022-12-19 Thread John . C . Harrison
From: John Harrison The CI results for the 'fast request' patch set (enables error return codes for fire-and-forget H2G messages) hit an issue with the KMD sending context submission requests on an invalid context. That was caused by a fault injection probe failing the context creation of a kerne

[Intel-gfx] [PATCH 2/3] drm/i915/guc: Fix a static analysis warning

2022-12-19 Thread John . C . Harrison
From: John Harrison A static analyser was complaining about not checking for null pointers. However, the location of the complaint can only be reached in the first place if said pointer is non-null. Basically, if we are using a v69 GuC then the descriptor pool is guaranteed to be allocated at star

[Intel-gfx] [PATCH 0/3] Fixes for various UC related issues

2022-12-21 Thread John . C . Harrison
From: John Harrison Fix a bunch of assorted issues with firmware loading and GuC initialisation. Signed-off-by: John Harrison John Harrison (3): drm/i915/guc: Fix missing return code checks in submission init drm/i915/guc: Fix a static analysis warning drm/i915/uc: Fix two issues with ov

[Intel-gfx] [PATCH 1/3] drm/i915/guc: Fix missing return code checks in submission init

2022-12-21 Thread John . C . Harrison
From: John Harrison The CI results for the 'fast request' patch set (enables error return codes for fire-and-forget H2G messages) hit an issue with the KMD sending context submission requests on an invalid context. That was caused by a fault injection probe failing the context creation of a kerne

[Intel-gfx] [PATCH 3/3] drm/i915/uc: Fix two issues with over-size firmware files

2022-12-21 Thread John . C . Harrison
From: John Harrison In the case where a firmware file is too large (e.g. someone downloaded a web page ASCII dump from github...), the firmware object is released but the pointer is not zeroed. If no other firmware file was found then release would be called again leading to a double kfree. Also,

[Intel-gfx] [PATCH 2/3] drm/i915/guc: Fix a static analysis warning

2022-12-21 Thread John . C . Harrison
From: John Harrison A static analyser was complaining about not checking for null pointers. However, the location of the complaint can only be reached in the first place if said pointer is non-null. Basically, if we are using a v69 GuC then the descriptor pool is guaranteed to be allocated at star

[Intel-gfx] [PATCH 0/2] Fix for two GuC issues

2022-10-28 Thread John . C . Harrison
From: John Harrison Fix for a deadlock issue between the GuC busyness stats worker and GT resets. Also fix kernel contexts not getting the correct scheduling priority at start of day. Signed-off-by: John Harrison John Harrison (2): drm/i915/guc: Properly initialise kernel contexts drm/i91

[Intel-gfx] [PATCH 2/2] drm/i915/guc: Don't deadlock busyness stats vs reset

2022-10-28 Thread John . C . Harrison
From: John Harrison The engine busyness stats has a worker function to do things like 64bit extend the 32bit hardware counters. The GuC's reset prepare function flushes out this worker function to ensure no corruption happens during the reset. Unfortunately, the worker function has an infinite wai

[Intel-gfx] [PATCH 1/2] drm/i915/guc: Properly initialise kernel contexts

2022-10-28 Thread John . C . Harrison
From: John Harrison If a context has already been registered prior to first submission then context init code was not being called. The noticeable effect of that was the scheduling priority was left at zero (meaning super high priority) instead of being set to normal. This would occur with kernel

[Intel-gfx] [PATCH] drm/i915/guc: Remove excessive line feeds in state dumps

2022-10-31 Thread John . C . Harrison
From: John Harrison Some of the GuC state dump messages were adding extra line feeds. When printing via a DRM printer to dmesg, for example, that messes up the log formatting as it loses any prefixing from the printer. Given that the extra line feeds are just in the middle of random bits of GuC s

[Intel-gfx] [PATCH i-g-t] tests/sysfs: Update timeslice/preemption for new range limits

2022-10-31 Thread John . C . Harrison
From: John Harrison GuC submission imposes new range limits on certain scheduling parameters. The idempotent sections of the timeslice duration and pre-emption timeout tests were exceeding those limits and so would fail. Reduce the excessively large value (654s) to one which does not overflow (54

[Intel-gfx] [PATCH v3 2/3] drm/i915/guc: Clean up of register capture search

2023-03-10 Thread John . C . Harrison
From: John Harrison The comparison in the search for a matching register capture node was not the most readable. It was also assuming that a zero GuC id means invalid, which it does not. So remove one invalid term, one redundant term and re-format to keep each term on a single line, and only one

[Intel-gfx] [PATCH v3 3/3] drm/i915: Include timeline seqno in error capture

2023-03-10 Thread John . C . Harrison
From: John Harrison The seqno value actually written out to memory is no longer in the regular HWSP. Instead, it is now in its own private timeline buffer. Thus, it is no longer visible in an error capture. So, explicitly read the value and include that in the capture. v2: %d -> %u (Alan) Signe

[Intel-gfx] [PATCH v3 1/3] drm/i915/guc: Fix missing ecodes

2023-03-10 Thread John . C . Harrison
From: John Harrison Error captures are tagged with an 'ecode'. This is a pseudo-unique magic number that is meant to distinguish similar-seeming bugs with different underlying signatures. It is a combination of two ring state registers. Unfortunately, the register state being used is only valid i

[Intel-gfx] [PATCH v3 0/3] More error capture improvements

2023-03-10 Thread John . C . Harrison
From: John Harrison Ecodes got lost with the switch to GuC based register lists. Put them back. Seqno values got lost with the switch to per context timelines. Put those back too. v2: Rework the timeline patch to just read the single seqno value rather than copying the entire object (Daniele) v

[Intel-gfx] [PATCH 4.14.y] drm/i915: Don't use BAR mappings for ring buffers with LLC

2023-03-13 Thread John . C . Harrison
From: John Harrison Direction from hardware is that ring buffers should never be mapped via the BAR on systems with LLC. There are too many caching pitfalls due to the way BAR accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes: 9d80841ea4c9 ("drm/i915: A

[Intel-gfx] [PATCH 4.19.y] drm/i915: Don't use BAR mappings for ring buffers with LLC

2023-03-13 Thread John . C . Harrison
From: John Harrison Direction from hardware is that ring buffers should never be mapped via the BAR on systems with LLC. There are too many caching pitfalls due to the way BAR accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes: 9d80841ea4c9 ("drm/i915: A

[Intel-gfx] [PATCH 5.4.y] drm/i915: Don't use BAR mappings for ring buffers with LLC

2023-03-13 Thread John . C . Harrison
From: John Harrison Direction from hardware is that ring buffers should never be mapped via the BAR on systems with LLC. There are too many caching pitfalls due to the way BAR accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes: 9d80841ea4c9 ("drm/i915: A

[Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading

2023-03-16 Thread John . C . Harrison
From: John Harrison A failure to load the GuC is occasionally observed where the GuC log actually showed that the GuC had loaded just fine. The implication being that the load just took ever so slightly longer than the 200ms timeout. Given that the actual time should be tens of milliseconds at th

[Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling

2023-03-16 Thread John . C . Harrison
From: John Harrison Add more decoding of the GuC load failures. Also include information about GT frequency to see if timeouts are due to a failure to boost the clocks. Finally, increase the timeout to accommodate situations where the clock boost does fail. v2: Reduce timeout in release builds,

[Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Improve GuC load error reporting

2023-03-16 Thread John . C . Harrison
From: John Harrison There are multiple ways in which the GuC load can fail. The driver was reporting the status register as is, but not everyone can read the matrix unfiltered. So add decoding of the common error cases. Also, remove the comment about interrupt based load completion checking bein

[Intel-gfx] [PATCH 4.14.y] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-03-17 Thread John . C . Harrison
From: John Harrison Direction from hardware is that stolen memory should never be used for ring buffer allocations on platforms with LLC. There are too many caching pitfalls due to the way stolen memory accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes:

[Intel-gfx] [PATCH 4.19.y] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-03-17 Thread John . C . Harrison
From: John Harrison Direction from hardware is that stolen memory should never be used for ring buffer allocations on platforms with LLC. There are too many caching pitfalls due to the way stolen memory accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes:

[Intel-gfx] [PATCH 5.4.y] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-03-17 Thread John . C . Harrison
From: John Harrison Direction from hardware is that stolen memory should never be used for ring buffer allocations on platforms with LLC. There are too many caching pitfalls due to the way stolen memory accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes:

[Intel-gfx] [PATCH 5.10.y] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-03-17 Thread John . C . Harrison
From: John Harrison Direction from hardware is that stolen memory should never be used for ring buffer allocations on platforms with LLC. There are too many caching pitfalls due to the way stolen memory accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes:

[Intel-gfx] [PATCH 5.15.y] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-03-17 Thread John . C . Harrison
From: John Harrison Direction from hardware is that stolen memory should never be used for ring buffer allocations on platforms with LLC. There are too many caching pitfalls due to the way stolen memory accesses are routed. So it is safest to just not use it. Signed-off-by: John Harrison Fixes:

[Intel-gfx] [CI] PR for new GuC v70.6.4 for MTL

2023-03-28 Thread John . C . Harrison
The following changes since commit bcdcfbcf0a8f24a914b8c163906e6ce93d7f8897: linux-firmware: Update firmware file for Intel Bluetooth AX101 (2023-03-20 08:34:27 -0400) are available in the Git repository at: git://anongit.freedesktop.org/drm/drm-firmware mtl_guc_70.6.4 for you to fetch cha

[Intel-gfx] [CI] PR for new GuC v70.6.5 for MTL

2023-03-31 Thread John . C . Harrison
The following changes since commit bcdcfbcf0a8f24a914b8c163906e6ce93d7f8897: linux-firmware: Update firmware file for Intel Bluetooth AX101 (2023-03-20 08:34:27 -0400) are available in the Git repository at: git://anongit.freedesktop.org/drm/drm-firmware mtl_guc_70.6.5 for you to fetch cha

[Intel-gfx] [CI] drm/i915/mtl: Define GuC firmware version for MTL

2023-03-31 Thread John . C . Harrison
From: John Harrison First release of GuC for Meteorlake. NB: As this is still pre-release and likely to change, use explicit versioning for now. The official, full release will use reduced version naming. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 + 1 file

[Intel-gfx] [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-03 Thread John . C . Harrison
From: John Harrison A pair of pre-Gen12 registers were being included in the Gen12 capture list. GuC was rejecting those as being invalid and logging errors about them. So, stop doing it. Signed-off-by: John Harrison Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state ca

[Intel-gfx] [PATCH] drm/i915: Don't wait forever in drop_caches

2022-11-01 Thread John . C . Harrison
From: John Harrison At the end of each test, IGT does a drop caches call via sysfs with special flags set. One of the possible paths waits for idle with an infinite timeout. That causes problems for debugging issues when CI catches a "can't go idle" test failure. Best case, the CI system times ou

[Intel-gfx] [PATCH v2 0/2] Fix for two GuC issues

2022-11-02 Thread John . C . Harrison
From: John Harrison Fix for a deadlock issue between the GuC busyness stats worker and GT resets. Also fix kernel contexts not getting the correct scheduling priority at start of day. v2: Rename existing uses of _trylock rather than adding a _noretry version. Also improve the comment a bit. Sig

[Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Don't deadlock busyness stats vs reset

2022-11-02 Thread John . C . Harrison
From: John Harrison The engine busyness stats has a worker function to do things like 64bit extend the 32bit hardware counters. The GuC's reset prepare function flushes out this worker function to ensure no corruption happens during the reset. Unfortunately, the worker function has an infinite wai

[Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Properly initialise kernel contexts

2022-11-02 Thread John . C . Harrison
From: John Harrison If a context has already been registered prior to first submission then context init code was not being called. The noticeable effect of that was the scheduling priority was left at zero (meaning super high priority) instead of being set to normal. This would occur with kernel

[Intel-gfx] [PATCH] drm/i915: Don't wait forever in drop_caches

2022-11-02 Thread John . C . Harrison
From: John Harrison At the end of each test, IGT does a drop caches call via debugfs with special flags set. One of the possible paths waits for idle with an infinite timeout. That causes problems for debugging issues when CI catches a "can't go idle" test failure. Best case, the CI system times

[Intel-gfx] [PATCH 1/2] drm/i915/gt: Add GT oriented dmesg output

2022-11-04 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH 2/2] drm/i915/uc: Update the gt/uc code to use GT_ERR and friends

2022-11-04 Thread John . C . Harrison
From: John Harrison Use the new GT oriented output message helpers where possible. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c| 25 +++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 9 +- .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 50 -- dr

[Intel-gfx] [PATCH 0/2] Add GT oriented dmesg output

2022-11-04 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v2 0/5] Add module oriented dmesg output

2022-11-17 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v2 1/5] drm/i915/gt: Start adding module oriented dmesg output

2022-11-17 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v2 5/5] drm/i915/uc: Update the gt/uc code to use gt_err and friends

2022-11-17 Thread John . C . Harrison
From: John Harrison Use the new module oriented output message helpers where possible. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc.c| 108 +++ drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 98 ++-- 2 files changed, 99 insertions(+)

[Intel-gfx] [PATCH v2 4/5] drm/i915/guc: Add GuC CT specific debug print wrappers

2022-11-17 Thread John . C . Harrison
From: John Harrison Re-work the existing GuC CT printers and extend as required to match the new wrapping scheme. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 222 +++--- 1 file changed, 113 insertions(+), 109 deletions(-) diff --git a/drivers/g

[Intel-gfx] [PATCH v2 2/5] drm/i915/huc: Add HuC specific debug print wrappers

2022-11-17 Thread John . C . Harrison
From: John Harrison Create a set of HuC printers and start using them. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 31 ++ drivers/gpu/drm/i915/gt/uc/intel_huc.h | 23 +++ 2 files changed, 35 insertions(+), 19 deletions(-) d

[Intel-gfx] [PATCH v2 3/5] drm/i915/guc: Add GuC specific debug print wrappers

2022-11-17 Thread John . C . Harrison
From: John Harrison Create a set of GuC printers and start using them. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c| 32 -- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 35 +++ drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 8 +-- .../gpu

[Intel-gfx] [PATCH 0/3] More GuC firmware version improvements

2022-11-22 Thread John . C . Harrison
From: John Harrison Start using the 'submission API version' for deciding which GuC API to use in the submission code. Correct version number manipulation code to support full 32bit major/minor/patch components, except for GuC which is guaranteed to be 8bit safe. Other minor code clean ups arou

[Intel-gfx] [PATCH 3/3] drm/i915/guc: Use GuC submission API version number

2022-11-22 Thread John . C . Harrison
From: John Harrison The GuC firmware includes an extra version number to specify the submission API level. So use that rather than the main firmware version number for submission related checks. Also, while it is guaranteed that GuC version number components are only 8-bits in size, other firmwa

[Intel-gfx] [PATCH 1/3] drm/i915/uc: Rationalise delimiters in filename macros

2022-11-22 Thread John . C . Harrison
From: John Harrison The way delimiters (underscores and dots) were added to the UC filenames was different for different types of delimiter. Rationalise them to all be done the same way - implicitly in the concatenation macro rather than explicitly in the file name prefix. Signed-off-by: John Ha

[Intel-gfx] [PATCH 2/3] drm/i915/uc: More refactoring of UC version numbers

2022-11-22 Thread John . C . Harrison
From: John Harrison As a precursor to a coming change (for adding a GuC submission API version), abstract the UC version number into its own private structure separate to the firmware filename. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc.c| 6 +- drivers/gpu/drm/i

[Intel-gfx] [PATCH] drm/i915: Allow error capture without a request

2022-11-22 Thread John . C . Harrison
From: John Harrison There was a report of error captures occurring without any hung context being indicated despite the capture being initiated by a 'hung context notification' from GuC. The problem was not reproducible. However, it can happen if the context in question has no active r

[Intel-gfx] [PATCH] drm/i915/uc: Fix table order verification to check all FW types

2022-11-22 Thread John . C . Harrison
From: John Harrison It was noticed that the table order verification step was only being run once rather than once per firmware type. Fix that. Note that the long term plan is to convert this code to be a mock selftest. It is already only compiled in when selftests are enabled. And the work invo

[Intel-gfx] [PATCH v3 0/5] Add module oriented dmesg output

2022-11-23 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v3 1/5] drm/i915/gt: Start adding module oriented dmesg output

2022-11-23 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v3 3/5] drm/i915/guc: Add GuC specific debug print wrappers

2022-11-23 Thread John . C . Harrison
From: John Harrison Create a set of GuC printers and start using them. v2: Tweaks to output messages. (review feedback from Michal W). Split definitions to separate header (review feedback from Jani). Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.c| 33 +---

[Intel-gfx] [PATCH v3 5/5] drm/i915/uc: Update the gt/uc code to use gt_err and friends

2022-11-23 Thread John . C . Harrison
From: John Harrison Use the new module oriented output message helpers where possible. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc.c| 110 +++ drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 99 ++-- 2 files changed, 102 insertions(+

[Intel-gfx] [PATCH v3 4/5] drm/i915/guc: Add GuC CT specific debug print wrappers

2022-11-23 Thread John . C . Harrison
From: John Harrison Re-work the existing GuC CT printers and extend as required to match the new wrapping scheme. v2: Improve probe_error definition (review feedback from MichalW). Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 218 +++--- 1 file

[Intel-gfx] [PATCH v3 2/5] drm/i915/huc: Add HuC specific debug print wrappers

2022-11-23 Thread John . C . Harrison
From: John Harrison Create a set of HuC printers and start using them. v2: Minor tweaks (review feedback from MichalW). Split definitions into separate header (review feedback from Jani). Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 32

[Intel-gfx] [PATCH v3 0/1] Add module oriented dmesg output

2023-01-09 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v3 1/1] drm/i915/gt: Start adding module oriented dmesg output

2023-01-09 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v4 0/1] Add module oriented dmesg output

2023-01-11 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH v4 1/1] drm/i915/gt: Start adding module oriented dmesg output

2023-01-11 Thread John . C . Harrison
From: John Harrison When trying to analyse bug reports from CI, customers, etc. it can be difficult to work out exactly what is happening on which GT in a multi-GT system. So add GT oriented debug/error message wrappers. If used instead of the drm_ equivalents, you get the same output but with a

[Intel-gfx] [PATCH 1/2] drm/i915/guc: Improve clean up of busyness stats worker

2023-01-11 Thread John . C . Harrison
From: John Harrison The stats worker thread management was mis-matched between enable/disable call sites. Fix those up. Also, abstract the cancel code into a helper function rather than replicating in multiple places. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_submission

[Intel-gfx] [PATCH 0/2] Clean up some GuC related failure paths

2023-01-11 Thread John . C . Harrison
From: John Harrison Improve failure code handling during GuC initialisation. Signed-off-by: John Harrison John Harrison (2): drm/i915/guc: Improve clean up of busyness stats worker drm/i915/guc: Fix missing return code checks in submission init .../gpu/drm/i915/gt/uc/intel_guc_submission

[Intel-gfx] [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init

2023-01-11 Thread John . C . Harrison
From: John Harrison The CI results for the 'fast request' patch set (enables error return codes for fire-and-forget H2G messages) hit an issue with the KMD sending context submission requests on an invalid context. That was caused by a fault injection probe failing the context creation of a kerne

[Intel-gfx] [PATCH 0/4] Allow error capture without a request / on reset failure

2023-01-11 Thread John . C . Harrison
From: John Harrison It is technically possible to get a hung context without a valid request. In such a situation, try to provide as much information in the error capture as possible rather than just aborting and capturing nothing. Similarly, in the case of an engine reset failure the GuC is not

[Intel-gfx] [PATCH 2/4] drm/i915: Allow error capture of a pending request

2023-01-11 Thread John . C . Harrison
From: John Harrison A hang situation has been observed where the only requests on the context were either completed or not yet started according to the breadcrumbs. However, the register state claimed a batch was (maybe) in progress. So, allow capture of the pending request on the grounds that t

[Intel-gfx] [PATCH 1/4] drm/i915: Allow error capture without a request

2023-01-11 Thread John . C . Harrison
From: John Harrison There was a report of error captures occurring without any hung context being indicated despite the capture being initiated by a 'hung context notification' from GuC. The problem was not reproducible. However, it can happen if the context in question has no active r

[Intel-gfx] [PATCH 3/4] drm/i915/guc: Look for a guilty context when an engine reset fails

2023-01-11 Thread John . C . Harrison
From: John Harrison Engine resets are supposed to never fail. But in the case when one does (due to unknown reasons that normally come down to a missing w/a), it is useful to get as much information out of the system as possible. Given that the GuC effectively dies in such a situation, it is not

[Intel-gfx] [PATCH 4/4] drm/i915/guc: Add a debug print on GuC triggered reset

2023-01-11 Thread John . C . Harrison
From: John Harrison For understanding bug reports, it can be useful to have an explicit dmesg print when a reset notification is received from GuC. As opposed to simply inferring that this happened from other messages. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_submi

[Intel-gfx] [PATCH v2 0/5] Allow error capture without a request / on reset failure

2023-01-17 Thread John . C . Harrison
From: John Harrison It is technically possible to get a hung context without a valid request. In such a situation, try to provide as much information in the error capture as possible rather than just aborting and capturing nothing. Similarly, in the case of an engine reset failure the GuC is not

[Intel-gfx] [PATCH v2 1/5] drm/i915: Fix request locking during error capture & debugfs dump

2023-01-17 Thread John . C . Harrison
From: John Harrison When GuC support was added to error capture, the locking around the request object was broken. Fix it up. The context based search manages the spinlocking around the search internally. So it needs to grab the reference count internally as well. The execlist only request based
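The locking rule described above is that a search which takes the lock internally must also take the reference internally, before the lock is dropped. Otherwise the object could be freed between the search and its use. A minimal C sketch of that general pattern follows, not the i915 code. A pthread mutex stands in for the kernel spinlock and a C11 atomic stands in for the request's reference count. All type and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request object with an atomic reference count. */
struct request {
	atomic_int refcount;
	int active;              /* non-zero if considered in flight */
	struct request *next;
};

/* Hypothetical context owning a list of requests under a lock. */
struct context {
	pthread_mutex_t lock;    /* stands in for the kernel spinlock */
	struct request *requests;
};

static void request_get(struct request *rq)
{
	atomic_fetch_add(&rq->refcount, 1);
}

/* Returns true when the last reference was dropped (freeing omitted). */
static bool request_put(struct request *rq)
{
	int old = atomic_fetch_sub(&rq->refcount, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Search under the lock and take the reference *before* unlocking, so the
 * returned request stays valid. The caller must drop it with request_put().
 */
static struct request *context_find_active_request_get(struct context *ce)
{
	struct request *rq, *found = NULL;

	pthread_mutex_lock(&ce->lock);
	for (rq = ce->requests; rq; rq = rq->next) {
		if (rq->active) {
			request_get(rq);
			found = rq;
			break;
		}
	}
	pthread_mutex_unlock(&ce->lock);
	return found;
}
```

Because the search owns the locking, only the search can safely take the reference. A caller that took it after the search returned would already be outside the lock.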

[Intel-gfx] [PATCH v2 2/5] drm/i915: Allow error capture without a request

2023-01-17 Thread John . C . Harrison
From: John Harrison There was a report of error captures occurring without any hung context being indicated despite the capture being initiated by a 'hung context notification' from GuC. The problem was not reproducible. However, it can happen if the context in question has no active r

[Intel-gfx] [PATCH v2 4/5] drm/i915/guc: Look for a guilty context when an engine reset fails

2023-01-17 Thread John . C . Harrison
From: John Harrison Engine resets are supposed to never fail. But in the case when one does (due to unknown reasons that normally come down to a missing w/a), it is useful to get as much information out of the system as possible. Given that the GuC effectively dies in such a situation, it is not

[Intel-gfx] [PATCH v2 5/5] drm/i915/guc: Add a debug print on GuC triggered reset

2023-01-17 Thread John . C . Harrison
From: John Harrison For understanding bug reports, it can be useful to have an explicit dmesg print when a reset notification is received from GuC. As opposed to simply inferring that this happened from other messages. Signed-off-by: John Harrison Reviewed-by: Tvrtko Ursulin --- drivers/gpu/d

[Intel-gfx] [PATCH v2 3/5] drm/i915: Allow error capture of a pending request

2023-01-17 Thread John . C . Harrison
From: John Harrison A hang situation has been observed where the only requests on the context were either completed or not yet started according to the breadcrumbs. However, the register state claimed a batch was (maybe) in progress. So, allow capture of the pending request on the grounds that t
