[Intel-gfx] [PATCH 3/4] drm/i915/guc: Flag an error if an engine reset fails

2021-12-10 Thread John . C . Harrison
From: John Harrison If GuC encounters an error during engine reset, the i915 driver promotes to full GT reset. This includes an info message about why the reset is happening. However, that is not treated as a failure by any of the CI systems because resets are an expected occurrence during testin

[Intel-gfx] [PATCH 1/4] drm/i915/guc: Speed up GuC log dumps

2021-12-10 Thread John . C . Harrison
From: John Harrison Add support for telling the debugfs interface the size of the GuC log dump in advance. Without that, the underlying framework keeps calling the 'show' function with larger and larger buffer allocations until it fits. That means reading the log from graphics memory many times -
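To illustrate why knowing the size in advance matters, here is a minimal sketch of a generic grow-and-retry dump loop. It is not i915 or debugfs code: `fake_show`, `dump_log`, the 4 KiB starting size, and the doubling policy are all assumptions made for illustration. Each call to `fake_show` stands in for one full read of the log from graphics memory.

```c
#include <stddef.h>

/* Hypothetical model: every 'show' attempt is one complete read of the log. */
struct dump_stats {
	unsigned int show_calls; /* how many times the dump callback ran */
	size_t final_buf;        /* buffer size that finally held the output */
};

/* Stand-in for the 'show' callback: it succeeds only when the buffer can
 * hold the whole log, otherwise the caller must retry with more space. */
static int fake_show(size_t buf_size, size_t log_size)
{
	return buf_size >= log_size ? 0 : -1;
}

/* Retry with a doubled buffer until the output fits (assumed policy). */
struct dump_stats dump_log(size_t log_size, size_t initial_buf)
{
	struct dump_stats st = { 0, initial_buf ? initial_buf : 1 };

	for (;;) {
		st.show_calls++;
		if (fake_show(st.final_buf, log_size) == 0)
			return st;
		st.final_buf *= 2; /* overflow: throw the work away and start over */
	}
}
```

Under these assumptions, a 1 MiB log dumped from a 4 KiB starting buffer takes nine attempts, whereas passing the real size as `initial_buf` takes one.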

[Intel-gfx] [PATCH 4/4] drm/i915: Improve long running OCL w/a for GuC submission

2021-12-10 Thread John . C . Harrison
From: John Harrison A workaround was added to the driver to allow OpenCL workloads to run 'forever' by disabling pre-emption on the RCS engine for Gen12. It is not totally unbound as the heartbeat will kick in eventually and cause a reset of the hung engine. However, this does not work well in G

[Intel-gfx] [PATCH 2/4] drm/i915/guc: Increase GuC log size for CONFIG_DEBUG_GEM

2021-12-10 Thread John . C . Harrison
From: John Harrison Lots of testing is done with the DEBUG_GEM config option enabled but not the DEBUG_GUC option. That means we only get teeny-tiny GuC logs which are not hugely useful. Enabling full DEBUG_GUC also spews lots of other detailed output that is not generally desired. However, bigge

[Intel-gfx] [PATCH i-g-t 00/11] Fixes for i915_hangman and gem_exec_capture

2021-12-13 Thread John . C . Harrison
From: John Harrison Fix a bunch of issues with i915_hangman and gem_exec_capture with the ultimate aim of making them pass on GuC enabled platforms. Signed-off-by: John Harrison John Harrison (11): tests/i915/i915_hangman: Add descriptions lib/hang: Fix igt_require_hang_ring to work with

[Intel-gfx] [PATCH i-g-t 02/11] lib/hang: Fix igt_require_hang_ring to work with all engines

2021-12-13 Thread John . C . Harrison
From: John Harrison The above function was checking for valid rings via the old interface. The new scheme is to check for engines on contexts as there are now more engines than could be supported. Signed-off-by: John Harrison --- lib/igt_gt.c | 6 +++--- lib/igt_gt.h

[Intel-gfx] [PATCH i-g-t 04/11] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2021-12-13 Thread John . C . Harrison
From: John Harrison Although the hangman test was ensuring that *some* reset functionality was enabled, it did not differentiate what kind. The infrastructure required to choose between per engine reset or full GT reset was recently added. So update this test to use it as well. Signed-off-by: Jo

[Intel-gfx] [PATCH i-g-t 01/11] tests/i915/i915_hangman: Add descriptions

2021-12-13 Thread John . C . Harrison
From: John Harrison Added descriptions of the various sub-tests and the test as a whole. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c index 4c18c2

[Intel-gfx] [PATCH i-g-t 08/11] lib/store: Refactor common store code into helper function

2021-12-13 Thread John . C . Harrison
From: John Harrison A lot of tests use almost identical code for creating a batch buffer which does a single write to memory. This patch collects two such instances into a common helper function. Unfortunately, the other instances are all subtly different enough to make it not so trivial to try t

[Intel-gfx] [PATCH i-g-t 03/11] tests/i915/i915_hangman: Update capture test to use engine structure

2021-12-13 Thread John . C . Harrison
From: John Harrison The capture test was still using old style ring_id and ring_name (derived from the engine structure at the higher level). Update it to just take the engine structure directly. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 10 +- 1 file changed, 5 inse

[Intel-gfx] [PATCH i-g-t 11/11] tests/i915/gem_exec_fence: Configure correct context

2021-12-13 Thread John . C . Harrison
From: John Harrison The update to use intel_ctx_t missed a line that configures the context to allow hanging. Fix that. Fixes: 09c36188b23f83ef9a7b5414e2a10100adc4291f Signed-off-by: John Harrison --- tests/i915/gem_exec_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --gi

[Intel-gfx] [PATCH i-g-t 07/11] tests/i915/i915_hangman: Add alive-ness test after error capture

2021-12-13 Thread John . C . Harrison
From: John Harrison Added an extra step to the i915_hangman tests to check that the system is still alive after the hang and recovery. This submits a simple batch to each engine which does a write to memory and checks that the write occurred. Signed-off-by: John Harrison --- tests/i915/i915_

[Intel-gfx] [PATCH i-g-t 05/11] tests/i915/i915_hangman: Add uevent test & fix detector

2021-12-13 Thread John . C . Harrison
From: John Harrison Some of the IGT framework relies on receiving a uevent when a hang occurs. So add a test that this actually works. While testing this, noticed that hangs could sometimes be missed because the uevent was (presumably) still in flight by the time the handler was de-registered. So

[Intel-gfx] [PATCH i-g-t 10/11] tests/i915/i915_hangman: Run background task on all engines

2021-12-13 Thread John . C . Harrison
From: John Harrison As opposed to only on the non-target engines. This means that there is some other workload present for the scheduler to switch between and so detect the hang immediately. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 10 ++ 1 file changed, 6 insertions

[Intel-gfx] [PATCH i-g-t 06/11] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2021-12-13 Thread John . C . Harrison
From: John Harrison The hangman framework sets up a context that is valid for all engines and has things like banning disabled. The 'unterminated' test then ignores it and uses the default context. Fix that. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 1 + 1 file changed, 1 in

[Intel-gfx] [PATCH i-g-t 09/11] tests/i915/i915_hangman: Remove reliance on context persistance

2021-12-13 Thread John . C . Harrison
From: John Harrison The hang test was relying on context persistence for no particular reason. That is, it would set a bunch of background spinners running then immediately destroy the active contexts but expect the spinners to keep spinning. With the current implementation of context persistence

[Intel-gfx] [CI] PR for new GuC v69.0.3

2021-12-15 Thread John . C . Harrison
The following changes since commit b0e898fbaf377c99a36aac6fdeb7250003648ca4: linux-firmware: Update firmware file for Intel Bluetooth 9462 (2021-11-23 12:31:45 -0500) are available in the Git repository at: ssh://git.freedesktop.org/git/drm/drm-firmware guc_v69.0.3 for you to fetch changes

[Intel-gfx] [PATCH] drm/i915/guc: Check for wedged before doing stuff

2021-12-15 Thread John . C . Harrison
From: John Harrison A fault injection probe test hit a BUG_ON in a GuC error path. It showed that the GuC code could potentially attempt to do many things when the device is actually wedged. So, add a check in to prevent that. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_g

[Intel-gfx] [CI] PR for new GuC v69.0.3

2021-12-20 Thread John . C . Harrison
The following changes since commit b0e898fbaf377c99a36aac6fdeb7250003648ca4: linux-firmware: Update firmware file for Intel Bluetooth 9462 (2021-11-23 12:31:45 -0500) are available in the Git repository at: git://anongit.freedesktop.org/drm/drm-firmware guc_v69.0.3 for you to fetch changes

[Intel-gfx] [CI] PR for new GuC v69.0.3

2021-12-20 Thread John . C . Harrison
The following changes since commit b0e898fbaf377c99a36aac6fdeb7250003648ca4: linux-firmware: Update firmware file for Intel Bluetooth 9462 (2021-11-23 12:31:45 -0500) are available in the Git repository at: git://anongit.freedesktop.org/drm/drm-firmware guc_v69.0.3 for you to fetch changes

[Intel-gfx] [PATCH 1/3] drm/i915/guc: Temporarily bump the GuC load timeout

2021-12-20 Thread John . C . Harrison
From: John Harrison There is a known (but exceedingly unlikely) race condition where the asynchronous frequency management code could reduce the GT clock while a GuC reload is in progress (during a full GT reset). A fix is in progress but there are complex locking issues to be resolved. In the me

[Intel-gfx] [PATCH 0/3] Update to GuC version 69.0.3

2021-12-20 Thread John . C . Harrison
From: John Harrison Update to the latest GuC version. This includes a suite of interface changes and new features with corresponding i915 side changes. Signed-off-by: John Harrison John Harrison (3): drm/i915/guc: Temporarily bump the GuC load timeout drm/i915/guc: Update to GuC version

[Intel-gfx] [PATCH 3/3] drm/i915/guc: Improve GuC loading status check/error reports

2021-12-20 Thread John . C . Harrison
From: John Harrison If the GuC fails to load, it is useful to know what firmware file / version was attempted. So move the version info report to before the load attempt rather than only after a successful load. If the GuC does fail to load, then make the error messages visible rather than being

[Intel-gfx] [PATCH 2/3] drm/i915/guc: Update to GuC version 69.0.3

2021-12-20 Thread John . C . Harrison
From: John Harrison Update to the latest GuC release. The latest GuC firmware introduces a number of interface changes: GuC may return NO_RESPONSE_RETRY message for requests sent over CTB. Add support for this reply and try resending the request again as a new CTB message. A KLV (key-length-va
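The retry behaviour described above (resend the request as a new message when the reply asks for a retry) can be sketched as follows. This is only an illustration: `enum reply`, `send_fn`, `send_with_retry`, `flaky_send`, the fence counter, and the retry limit are hypothetical names and assumptions, not the i915 CTB interface.

```c
#include <stddef.h>

/* Hypothetical reply codes; only the "retry" case mirrors the description. */
enum reply { REPLY_SUCCESS, REPLY_RETRY, REPLY_ERROR };

/* Hypothetical transport hook: send one message, return its reply. */
typedef enum reply (*send_fn)(void *ctx, unsigned int fence);

/* Resend the request as a brand-new message (fresh fence value) each time
 * the other side answers "retry". Returns 0 on success, -1 on a hard error,
 * -2 if the retry budget runs out. */
int send_with_retry(send_fn send, void *ctx, unsigned int *next_fence,
		    unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		unsigned int fence = (*next_fence)++; /* new message, new fence */
		enum reply r = send(ctx, fence);

		if (r == REPLY_SUCCESS)
			return 0;
		if (r == REPLY_ERROR)
			return -1;
		/* REPLY_RETRY: loop round and send again */
	}
	return -2;
}

/* Example backend for experiments: asks for a retry a fixed number of times
 * (the counter lives in ctx) and then succeeds. */
enum reply flaky_send(void *ctx, unsigned int fence)
{
	int *retries_left = ctx;

	(void)fence;
	if (*retries_left > 0) {
		(*retries_left)--;
		return REPLY_RETRY;
	}
	return REPLY_SUCCESS;
}
```

Bounding the number of attempts is a design choice of this sketch; the snippet above does not say whether the driver limits retries.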

[Intel-gfx] [PATCH 0/3] Update to GuC version 69.0.3

2021-12-21 Thread John . C . Harrison
From: John Harrison Update to the latest GuC version. This includes a suite of interface changes and new features with corresponding i915 side changes. Signed-off-by: John Harrison John Harrison (3): drm/i915/guc: Temporarily bump the GuC load timeout drm/i915/guc: Update to GuC version

[Intel-gfx] [PATCH 1/3] drm/i915/guc: Temporarily bump the GuC load timeout

2021-12-21 Thread John . C . Harrison
From: John Harrison There is a known (but exceedingly unlikely) race condition where the asynchronous frequency management code could reduce the GT clock while a GuC reload is in progress (during a full GT reset). A fix is in progress but there are complex locking issues to be resolved. In the me

[Intel-gfx] [PATCH 2/3] drm/i915/guc: Update to GuC version 69.0.3

2021-12-21 Thread John . C . Harrison
From: John Harrison Update to the latest GuC release. The latest GuC firmware introduces a number of interface changes: GuC may return NO_RESPONSE_RETRY message for requests sent over CTB. Add support for this reply and try resending the request again as a new CTB message. A KLV (key-length-va

[Intel-gfx] [PATCH 3/3] drm/i915/guc: Improve GuC loading status check/error reports

2021-12-21 Thread John . C . Harrison
From: John Harrison If the GuC fails to load, it is useful to know what firmware file / version was attempted. So move the version info report to before the load attempt rather than only after a successful load. If the GuC does fail to load, then make the error messages visible rather than being

[Intel-gfx] [PATCH v2] drm/i915/guc: Check for wedged before doing stuff

2021-12-21 Thread John . C . Harrison
From: John Harrison A fault injection probe test hit a BUG_ON in a GuC error path. It showed that the GuC code could potentially attempt to do many things when the device is actually wedged. So, add a check in to prevent that. v2: Use intel_gt_is_wedged instead of testing bits directly in the Gu

[Intel-gfx] [PATCH] drm/i915/guc: Report error on invalid reset notification

2021-12-22 Thread John . C . Harrison
From: John Harrison Don't silently drop reset notifications from the GuC. It might not be safe to do an error capture but we still want some kind of report that the reset happened. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 + 1 file changed, 5 i

[Intel-gfx] [PATCH v2 0/3] Update to GuC version 69.0.3

2022-01-06 Thread John . C . Harrison
From: John Harrison Update to the latest GuC version. This includes a suite of interface changes and new features with corresponding i915 side changes. v2: Rebased to latest tree. Signed-off-by: John Harrison John Harrison (3): drm/i915/guc: Temporarily bump the GuC load timeout drm/i91

[Intel-gfx] [PATCH v2 1/3] drm/i915/guc: Temporarily bump the GuC load timeout

2022-01-06 Thread John . C . Harrison
From: John Harrison There is a known (but exceedingly unlikely) race condition where the asynchronous frequency management code could reduce the GT clock while a GuC reload is in progress (during a full GT reset). A fix is in progress but there are complex locking issues to be resolved. In the me

[Intel-gfx] [PATCH v2 2/3] drm/i915/guc: Update to GuC version 69.0.3

2022-01-06 Thread John . C . Harrison
From: John Harrison Update to the latest GuC release. The latest GuC firmware introduces a number of interface changes: GuC may return NO_RESPONSE_RETRY message for requests sent over CTB. Add support for this reply and try resending the request again as a new CTB message. A KLV (key-length-va

[Intel-gfx] [PATCH v2 3/3] drm/i915/guc: Improve GuC loading status check/error reports

2022-01-06 Thread John . C . Harrison
From: John Harrison If the GuC fails to load, it is useful to know what firmware file / version was attempted. So move the version info report to before the load attempt rather than only after a successful load. If the GuC does fail to load, then make the error messages visible rather than being

[Intel-gfx] [PATCH] drm/i915/guc: Don't error on reset of banned context

2022-01-06 Thread John . C . Harrison
From: John Harrison There is a race (already documented in the code) whereby a context can be (re-)queued for submission at the same time as it is being banned due to a hang and reset. That leads to a hang/reset report from GuC for a context which i915 thinks is already banned. While the race is

[Intel-gfx] [PATCH v2 i-g-t 01/15] tests/i915/i915_hangman: Add descriptions

2022-01-13 Thread John . C . Harrison
From: John Harrison Added descriptions of the various sub-tests and the test as a whole. v2: Added missing linefeed (spotted by Petri) Signed-off-by: John Harrison Reviewed-by: Petri Latvala --- tests/i915/i915_hangman.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff

[Intel-gfx] [PATCH v2 i-g-t 00/15] Fixes for i915_hangman and gem_exec_capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Fix a bunch of issues with i915_hangman and gem_exec_capture with the ultimate aim of making them pass on GuC enabled platforms. v2: Fixes to the store code. Add engine properties management. Signed-off-by: John Harrison John Harrison (15): tests/i915/i915_hangman: Add

[Intel-gfx] [PATCH v2 i-g-t 04/15] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2022-01-13 Thread John . C . Harrison
From: John Harrison Although the hangman test was ensuring that *some* reset functionality was enabled, it did not differentiate what kind. The infrastructure required to choose between per engine reset or full GT reset was recently added. So update this test to use it as well. Signed-off-by: Jo

[Intel-gfx] [PATCH v2 i-g-t 05/15] tests/i915/i915_hangman: Add uevent test & fix detector

2022-01-13 Thread John . C . Harrison
From: John Harrison Some of the IGT framework relies on receiving a uevent when a hang occurs. So add a test that this actually works. While testing this, noticed that hangs could sometimes be missed because the uevent was (presumably) still in flight by the time the handler was de-registered. So

[Intel-gfx] [PATCH v2 i-g-t 13/15] lib/i915: Add helper for non-destructive engine property updates

2022-01-13 Thread John . C . Harrison
From: John Harrison Various tests want to configure engine properties such as pre-emption timeout and heartbeat interval. Some don't bother to restore the original values again afterwards. So, add a helper to make it easier to do this. Signed-off-by: John Harrison --- lib/i915/gem_engine_topol
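The save/modify/restore pattern that such a helper enables might look roughly like the sketch below. The structure and function names (`engine_props`, `saved_props`, `props_save`, `props_restore`) are hypothetical and operate on an in-memory struct; they are not the IGT helper API, which is cut off in the snippet above.

```c
#include <stdbool.h>

/* Hypothetical in-memory stand-in for per-engine tunables. */
struct engine_props {
	int preempt_timeout_ms;
	int heartbeat_interval_ms;
};

/* Snapshot of the original values plus a flag saying whether it is usable. */
struct saved_props {
	struct engine_props original;
	bool valid;
};

/* Record the current values before a test changes anything. */
void props_save(const struct engine_props *live, struct saved_props *saved)
{
	saved->original = *live;
	saved->valid = true;
}

/* Put the recorded values back; a second call without a new save is a no-op. */
void props_restore(struct engine_props *live, struct saved_props *saved)
{
	if (!saved->valid)
		return;
	*live = saved->original;
	saved->valid = false;
}
```

A test would call `props_save`, adjust the timeouts it needs, and call `props_restore` on every exit path so later tests see the original settings.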

[Intel-gfx] [PATCH v2 i-g-t 10/15] tests/i915/i915_hangman: Run background task on all engines

2022-01-13 Thread John . C . Harrison
From: John Harrison As opposed to only on the non-target engines. This means that there is some other workload present for the scheduler to switch between and so detect the hang immediately. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 10 ++ 1 file changed, 6 insertions

[Intel-gfx] [PATCH v2 i-g-t 02/15] lib/hang: Fix igt_require_hang_ring to work with all engines

2022-01-13 Thread John . C . Harrison
From: John Harrison The above function was checking for valid rings via the old interface. The new scheme is to check for engines on contexts as there are now more engines than could be supported. Signed-off-by: John Harrison --- lib/igt_gt.c | 6 +++--- lib/igt_gt.h

[Intel-gfx] [PATCH v2 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-13 Thread John . C . Harrison
From: John Harrison The capture test was still using old style ring_id and ring_name (derived from the engine structure at the higher level). Update it to just take the engine structure directly. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 10 +- 1 file changed, 5 inse

[Intel-gfx] [PATCH v2 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread John . C . Harrison
From: John Harrison The hang test was relying on context persistence for no particular reason. That is, it would set a bunch of background spinners running then immediately destroy the active contexts but expect the spinners to keep spinning. With the current implementation of context persistence

[Intel-gfx] [PATCH v2 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-13 Thread John . C . Harrison
From: John Harrison The hangman framework sets up a context that is valid for all engines and has things like banning disabled. The 'unterminated' test then ignores it and uses the default context. Fix that. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 1 + 1 file changed, 1 in

[Intel-gfx] [PATCH v2 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison
From: John Harrison A lot of tests use almost identical code for creating a batch buffer which does a single write to memory and another is about to be added. Instead, move the most generic version into a common helper function. Unfortunately, the other instances are all subtly different enough t

[Intel-gfx] [PATCH v2 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread John . C . Harrison
From: John Harrison Some platforms have very long timeouts configured for some engines. Some have them disabled completely. That makes for a very slow (or broken) hangman test. So explicitly configure the engines to have reasonable settings first. Signed-off-by: John Harrison --- tests/i915/i9

[Intel-gfx] [PATCH v2 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison
From: John Harrison The global context used by all the subtests for causing hangs is marked as unbannable. However, some of the subtests set background spinners running on all engines using a freshly created context. If there is a test failure for any reason, all of those spinners can be killed o

[Intel-gfx] [PATCH v2 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread John . C . Harrison
From: John Harrison The update to use intel_ctx_t missed a line that configures the context to allow hanging. Fix that. Fixes: 09c36188b23f83ef9a7b5414e2a10100adc4291f Signed-off-by: John Harrison --- tests/i915/gem_exec_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --gi

[Intel-gfx] [PATCH v2 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread John . C . Harrison
From: John Harrison The test was updating some engine properties but not restoring them afterwards. That would leave the system in a non-default state which could potentially affect subsequent tests. Fix it by using the new save/restore engine properties helper functions. Signed-off-by: John Harr

[Intel-gfx] [PATCH v2 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Added an extra step to the i915_hangman tests to check that the system is still alive after the hang and recovery. This submits a simple batch to each engine which does a write to memory and checks that the write occurred. Signed-off-by: John Harrison --- tests/i915/i915_

[Intel-gfx] [PATCH v3 i-g-t 00/15] Fixes for i915_hangman and gem_exec_capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Fix a bunch of issues with i915_hangman and gem_exec_capture with the ultimate aim of making them pass on GuC enabled platforms. v2: Fixes to the store code. Add engine properties management. v3: Fix for platforms without pre-emption. Signed-off-by: John Harrison John Har

[Intel-gfx] [PATCH v3 i-g-t 02/15] lib/hang: Fix igt_require_hang_ring to work with all engines

2022-01-13 Thread John . C . Harrison
From: John Harrison The above function was checking for valid rings via the old interface. The new scheme is to check for engines on contexts as there are now more engines than could be supported. Signed-off-by: John Harrison --- lib/igt_gt.c | 6 +++--- lib/igt_gt.h

[Intel-gfx] [PATCH v3 i-g-t 05/15] tests/i915/i915_hangman: Add uevent test & fix detector

2022-01-13 Thread John . C . Harrison
From: John Harrison Some of the IGT framework relies on receiving a uevent when a hang occurs. So add a test that this actually works. While testing this, noticed that hangs could sometimes be missed because the uevent was (presumably) still in flight by the time the handler was de-registered. So

[Intel-gfx] [PATCH v3 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Added an extra step to the i915_hangman tests to check that the system is still alive after the hang and recovery. This submits a simple batch to each engine which does a write to memory and checks that the write occurred. Signed-off-by: John Harrison --- tests/i915/i915_

[Intel-gfx] [PATCH v3 i-g-t 01/15] tests/i915/i915_hangman: Add descriptions

2022-01-13 Thread John . C . Harrison
From: John Harrison Added descriptions of the various sub-tests and the test as a whole. v2: Added missing linefeed (spotted by Petri) Signed-off-by: John Harrison Reviewed-by: Petri Latvala --- tests/i915/i915_hangman.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff

[Intel-gfx] [PATCH v3 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-13 Thread John . C . Harrison
From: John Harrison The hangman framework sets up a context that is valid for all engines and has things like banning disabled. The 'unterminated' test then ignores it and uses the default context. Fix that. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 1 + 1 file changed, 1 in

[Intel-gfx] [PATCH v3 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread John . C . Harrison
From: John Harrison The test was updating some engine properties but not restoring them afterwards. That would leave the system in a non-default state which could potentially affect subsequent tests. Fix it by using the new save/restore engine properties helper functions. Signed-off-by: John Harr

[Intel-gfx] [PATCH v3 i-g-t 04/15] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2022-01-13 Thread John . C . Harrison
From: John Harrison Although the hangman test was ensuring that *some* reset functionality was enabled, it did not differentiate what kind. The infrastructure required to choose between per engine reset or full GT reset was recently added. So update this test to use it as well. Signed-off-by: Jo

[Intel-gfx] [PATCH v3 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-13 Thread John . C . Harrison
From: John Harrison The capture test was still using old style ring_id and ring_name (derived from the engine structure at the higher level). Update it to just take the engine structure directly. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 10 +- 1 file changed, 5 inse

[Intel-gfx] [PATCH v3 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread John . C . Harrison
From: John Harrison The hang test was relying on context persistence for no particular reason. That is, it would set a bunch of background spinners running then immediately destroy the active contexts but expect the spinners to keep spinning. With the current implementation of context persistence

[Intel-gfx] [PATCH v3 i-g-t 10/15] tests/i915/i915_hangman: Run background task on all engines

2022-01-13 Thread John . C . Harrison
From: John Harrison As opposed to only on the non-target engines. This means that there is some other workload present for the scheduler to switch between and so detect the hang immediately. Signed-off-by: John Harrison --- tests/i915/i915_hangman.c | 10 ++ 1 file changed, 6 insertions

[Intel-gfx] [PATCH v3 i-g-t 13/15] lib/i915: Add helper for non-destructive engine property updates

2022-01-13 Thread John . C . Harrison
From: John Harrison Various tests want to configure engine properties such as pre-emption timeout and heartbeat interval. Some don't bother to restore the original values again afterwards. So, add a helper to make it easier to do this. v2: Fix for platforms with no pre-emption capability. Signe

[Intel-gfx] [PATCH v3 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread John . C . Harrison
From: John Harrison The update to use intel_ctx_t missed a line that configures the context to allow hanging. Fix that. Fixes: 09c36188b23f83ef9a7b5414e2a10100adc4291f Signed-off-by: John Harrison --- tests/i915/gem_exec_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --gi

[Intel-gfx] [PATCH v3 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison
From: John Harrison The global context used by all the subtests for causing hangs is marked as unbannable. However, some of the subtests set background spinners running on all engines using a freshly created context. If there is a test failure for any reason, all of those spinners can be killed o

[Intel-gfx] [PATCH v3 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread John . C . Harrison
From: John Harrison Some platforms have very long timeouts configured for some engines. Some have them disabled completely. That makes for a very slow (or broken) hangman test. So explicitly configure the engines to have reasonable settings first. Signed-off-by: John Harrison --- tests/i915/i9

[Intel-gfx] [PATCH v3 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison
From: John Harrison A lot of tests use almost identical code for creating a batch buffer which does a single write to memory and another is about to be added. Instead, move the most generic version into a common helper function. Unfortunately, the other instances are all subtly different enough t

[Intel-gfx] [PATCH i-g-t] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison
From: John Harrison A lot of tests use almost identical code for creating a batch buffer which does a single write to memory and another is about to be added. Instead, move the most generic version into a common helper function. Unfortunately, the other instances are all subtly different enough t

[Intel-gfx] [PATCH i-g-t] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison
From: John Harrison The global context used by all the subtests for causing hangs is marked as unbannable. However, some of the subtests set background spinners running on all engines using a freshly created context. If there is a test failure for any reason, all of those spinners can be killed o

[Intel-gfx] [PATCH i-g-t] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Added an extra step to the i915_hangman tests to check that the system is still alive after the hang and recovery. This submits a simple batch to each engine which does a write to memory and checks that the write occurred. v2: Use _device_coherent instead of _wc for mapping

[Intel-gfx] [PATCH v4 i-g-t 00/15] Fixes for i915_hangman and gem_exec_capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Fix a bunch of issues with i915_hangman and gem_exec_capture with the ultimate aim of making them pass on GuC enabled platforms. v2: Fixes to the store code. Add engine properties management. v3: Fix for platforms without pre-emption. v4: Simplify anti-ban code, support >32bi

[Intel-gfx] [PATCH v4 i-g-t 01/15] tests/i915/i915_hangman: Add descriptions

2022-01-13 Thread John . C . Harrison
From: John Harrison Added descriptions of the various sub-tests and the test as a whole. v2: Added missing linefeed (spotted by Petri) Signed-off-by: John Harrison Reviewed-by: Petri Latvala --- tests/i915/i915_hangman.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff

[Intel-gfx] [PATCH v4 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison
From: John Harrison A lot of tests use almost identical code for creating a batch buffer which does a single write to memory and another is about to be added. Instead, move the most generic version into a common helper function. Unfortunately, the other instances are all subtly different enough t

[Intel-gfx] [PATCH v4 i-g-t 02/15] lib/hang: Fix igt_require_hang_ring to work with all engines

2022-01-13 Thread John . C . Harrison
From: John Harrison The above function was checking for valid rings via the old interface. The new scheme is to check for engines on contexts as there are now more engines than could be supported. Signed-off-by: John Harrison --- lib/igt_gt.c | 6 +++--- lib/igt_gt.h

[Intel-gfx] [PATCH v4 i-g-t 05/15] tests/i915/i915_hangman: Add uevent test & fix detector

2022-01-13 Thread John . C . Harrison
From: John Harrison Some of the IGT framework relies on receiving a uevent when a hang occurs. So add a test that this actually works. While testing this, noticed that hangs could sometimes be missed because the uevent was (presumably) still in flight by the time the handler was de-registered. So

[Intel-gfx] [PATCH v4 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-13 Thread John . C . Harrison
From: John Harrison The hangman framework sets up a context that is valid for all engines and has things like banning disabled. The 'unterminated' test then ignores it and uses the default context. Fix that. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/i915_hangman.c

[Intel-gfx] [PATCH v4 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-13 Thread John . C . Harrison
From: John Harrison The capture test was still using old style ring_id and ring_name (derived from the engine structure at the higher level). Update it to just take the engine structure directly. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/i915_hangman.c | 10 +-

[Intel-gfx] [PATCH v4 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread John . C . Harrison
From: John Harrison Some platforms have very long timeouts configured for some engines. Some have them disabled completely. That makes for a very slow (or broken) hangman test. So explicitly configure the engines to have reasonable settings first. Signed-off-by: John Harrison Reviewed-by: Matth

[Intel-gfx] [PATCH v4 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread John . C . Harrison
From: John Harrison The update to use intel_ctx_t missed a line that configures the context to allow hanging. Fix that. Fixes: 09c36188b ("tests/i915/gem_exec_fence: Convert to intel_ctx_t (v2)") Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/gem_exec_fence.c | 2 +- 1

[Intel-gfx] [PATCH v4 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison
From: John Harrison The global context used by all the subtests for causing hangs is marked as unbannable. However, some of the subtests set background spinners running on all engines using a freshly created context. If there is a test failure for any reason, all of those spinners can be killed o

[Intel-gfx] [PATCH v4 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison
From: John Harrison Added an extra step to the i915_hangman tests to check that the system is still alive after the hang and recovery. This submits a simple batch to each engine which does a write to memory and checks that the write occurred. v2: Use _device_coherent instead of _wc for mapping

[Intel-gfx] [PATCH v4 i-g-t 10/15] tests/i915/i915_hangman: Run background task on all engines

2022-01-13 Thread John . C . Harrison
From: John Harrison As opposed to only on the non-target engines. This means that there is some other workload present for the scheduler to switch between and so detect the hang immediately. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/i915_hangman.c | 10 ++

[Intel-gfx] [PATCH v4 i-g-t 04/15] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2022-01-13 Thread John . C . Harrison
From: John Harrison Although the hangman test was ensuring that *some* reset functionality was enabled, it did not differentiate what kind. The infrastructure required to choose between per engine reset or full GT reset was recently added. So update this test to use it as well. Signed-off-by: Jo

[Intel-gfx] [PATCH v4 i-g-t 13/15] lib/i915: Add helper for non-destructive engine property updates

2022-01-13 Thread John . C . Harrison
From: John Harrison Various tests want to configure engine properties such as pre-emption timeout and heartbeat interval. Some don't bother to restore the original values again afterwards. So, add a helper to make it easier to do this. v2: Fix for platforms with no pre-emption capability. Signe

[Intel-gfx] [PATCH v4 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread John . C . Harrison
From: John Harrison The test was updating some engine properties but not restoring them afterwards. That would leave the system in a non-default state which could potentially affect subsequent tests. Fix it by using the new save/restore engine properties helper functions. Signed-off-by: John Harr

[Intel-gfx] [PATCH v4 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread John . C . Harrison
From: John Harrison The hang test was relying on context persistence for no particular reason. That is, it would set a bunch of background spinners running then immediately destroy the active contexts but expect the spinners to keep spinning. With the current implementation of context persistence

[Intel-gfx] [PATCH v5 i-g-t 02/15] lib/hang: Fix igt_require_hang_ring to work with all engines

2022-01-14 Thread John . C . Harrison
From: John Harrison The above function was checking for valid rings via the old interface. The new scheme is to check for engines on contexts as there are now more engines than could be supported. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- lib/igt_gt.c | 6 +++---

[Intel-gfx] [PATCH v5 i-g-t 00/15] Fixes for i915_hangman, gem_exec_capture and gem_exec_fence

2022-01-14 Thread John . C . Harrison
From: John Harrison Fix a bunch of issues with i915_hangman and gem_exec_capture with the ultimate aim of making them pass on GuC enabled platforms. v2: Fixes to the store code. Add engine properties management. v3: Fix for platforms without pre-emption. v4: Simplify anti-ban code, support >32bi

[Intel-gfx] [PATCH v5 i-g-t 05/15] tests/i915/i915_hangman: Add uevent test & fix detector

2022-01-14 Thread John . C . Harrison
From: John Harrison Some of the IGT framework relies on receiving a uevent when a hang occurs. So add a test that this actually works. While testing this, noticed that hangs could sometimes be missed because the uevent was (presumably) still in flight by the time the handler was de-registered. So

[Intel-gfx] [PATCH v5 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-14 Thread John . C . Harrison
From: John Harrison The hang test was relying on context persistence for no particular reason. That is, it would set a bunch of background spinners running then immediately destroy the active contexts but expect the spinners to keep spinning. With the current implementation of context persistence

[Intel-gfx] [PATCH v5 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-14 Thread John . C . Harrison
From: John Harrison The global context used by all the subtests for causing hangs is marked as unbannable. However, some of the subtests set background spinners running on all engines using a freshly created context. If there is a test failure for any reason, all of those spinners can be killed o

[Intel-gfx] [PATCH v5 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-14 Thread John . C . Harrison
From: John Harrison The hangman framework sets up a context that is valid for all engines and has things like banning disabled. The 'unterminated' test then ignores it and uses the default context. Fix that. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/i915_hangman.c

[Intel-gfx] [PATCH v5 i-g-t 01/15] tests/i915/i915_hangman: Add descriptions

2022-01-14 Thread John . C . Harrison
From: John Harrison Added descriptions of the various sub-tests and the test as a whole. v2: Added missing linefeed (spotted by Petri) Signed-off-by: John Harrison Reviewed-by: Petri Latvala --- tests/i915/i915_hangman.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff

[Intel-gfx] [PATCH v5 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-14 Thread John . C . Harrison
From: John Harrison Some platforms have very long timeouts configured for some engines. Some have them disabled completely. That makes for a very slow (or broken) hangman test. So explicitly configure the engines to have reasonable settings first. Signed-off-by: John Harrison Reviewed-by: Matth

[Intel-gfx] [PATCH v5 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-14 Thread John . C . Harrison
From: John Harrison The capture test was still using old style ring_id and ring_name (derived from the engine structure at the higher level). Update it to just take the engine structure directly. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/i915_hangman.c | 10 +-

[Intel-gfx] [PATCH v5 i-g-t 04/15] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2022-01-14 Thread John . C . Harrison
From: John Harrison Although the hangman test was ensuring that *some* reset functionality was enabled, it did not differentiate what kind. The infrastructure required to choose between per engine reset or full GT reset was recently added. So update this test to use it as well. Signed-off-by: Jo

[Intel-gfx] [PATCH v5 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-14 Thread John . C . Harrison
From: John Harrison The test was updating some engine properties but not restoring them afterwards. That would leave the system in a non-default state which could potentially affect subsequent tests. Fix it by using the new save/restore engine properties helper functions. v2: Don't restore too so

[Intel-gfx] [PATCH v5 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-14 Thread John . C . Harrison
From: John Harrison A lot of tests use almost identical code for creating a batch buffer which does a single write to memory and another is about to be added. Instead, move the most generic version into a common helper function. Unfortunately, the other instances are all subtly different enough t

[Intel-gfx] [PATCH v5 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-14 Thread John . C . Harrison
From: John Harrison The update to use intel_ctx_t missed a line that configures the context to allow hanging. Fix that. Fixes: 09c36188b ("tests/i915/gem_exec_fence: Convert to intel_ctx_t (v2)") Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- tests/i915/gem_exec_fence.c | 2 +- 1

[Intel-gfx] [PATCH v5 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-14 Thread John . C . Harrison
From: John Harrison Added an extra step to the i915_hangman tests to check that the system is still alive after the hang and recovery. This submits a simple batch to each engine which does a write to memory and checks that the write occurred. v2: Use _device_coherent instead of _wc for mapping
