[Intel-gfx] [PATCH v4 33/38] drm/i915: GPU priority bumping to prevent starvation

2016-01-11 Thread John . C . Harrison
From: John Harrison If a high priority task were to continuously submit batch buffers to the driver, it could starve out any lower priority task from getting any GPU time at all. To prevent this, the priority of a queued batch buffer is bumped each time it does not get submitted to the hardware.
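
The bumping idea described above can be sketched in a few lines of C. This is only an illustration, not the patch's code: the `queued_batch` structure, the `submit_next()` helper, the bump step of 1 and the cap of 1023 are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical queue entry; not the driver's actual structure. */
struct queued_batch {
	int priority;   /* higher value is submitted sooner */
	int submitted;  /* non-zero once handed to the hardware */
};

#define PRIORITY_BUMP 1    /* assumed increment per missed submission */
#define PRIORITY_MAX  1023 /* assumed upper bound for the example */

/*
 * Pick the highest-priority pending entry, mark it as submitted and bump
 * the priority of every entry that was passed over. Returns the index of
 * the chosen entry, or -1 if nothing is pending.
 */
int submit_next(struct queued_batch *queue, size_t count)
{
	int best = -1;

	for (size_t i = 0; i < count; i++) {
		if (queue[i].submitted)
			continue;
		if (best < 0 || queue[i].priority > queue[best].priority)
			best = (int)i;
	}
	if (best < 0)
		return -1;

	queue[best].submitted = 1;

	/* Everything still waiting moves one step closer to the front. */
	for (size_t i = 0; i < count; i++) {
		if (!queue[i].submitted && queue[i].priority < PRIORITY_MAX)
			queue[i].priority += PRIORITY_BUMP;
	}
	return best;
}
```

With this scheme a low priority entry that keeps losing out gains one step per round, so a steady stream of high priority work cannot hold it back indefinitely.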

[Intel-gfx] [PATCH v4 17/38] drm/i915: Hook scheduler node clean up into retire requests

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler keeps its own lock on various DRM objects in order to guarantee safe access long after the original execbuff IOCTL has completed. This is especially important when pre-emption is enabled as the batch buffer might need to be submitted to the hardware multiple time

[Intel-gfx] [PATCH v4 06/38] drm/i915: Re-instate request->uniq because it is extremely useful

2016-01-11 Thread John . C . Harrison
From: John Harrison The seqno value cannot always be used when debugging issues via trace points. This is because it can be reset back to start, especially during TDR type tests. Also, when the scheduler arrives the seqno is only valid while a given request is executing on the hardware. While the

[Intel-gfx] [PATCH v4 38/38] drm/i915: Allow scheduler to manage inter-ring object synchronisation

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler has always tracked batch buffer dependencies based on DRM object usage. This means that it will not submit a batch on one ring that has outstanding dependencies still executing on other rings. This is exactly the same synchronisation performed by i915_gem_object_

[Intel-gfx] [PATCH v4 15/38] drm/i915: Keep the reserved space mechanism happy

2016-01-11 Thread John . C . Harrison
From: John Harrison Ring space is reserved when constructing a request to ensure that the subsequent 'add_request()' call cannot fail due to waiting for space on a busy or broken GPU. However, the scheduler jumps in to the middle of the execbuffer process between request creation and request subm

[Intel-gfx] [PATCH v4 05/38] drm/i915: Cache request pointer in *_submission_final()

2016-01-11 Thread John . C . Harrison
From: Dave Gordon Keep a local copy of the request pointer in the _final() functions rather than dereferencing the params block repeatedly. v3: New patch in series. For: VIZ-1587 Signed-off-by: Dave Gordon Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +

[Intel-gfx] [PATCH v4 03/38] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler decouples the submission of batch buffers to the driver from their submission to the hardware. This basically means splitting the execbuffer() function in half. This change rearranges some code ready for the split to occur. For: VIZ-1587 Signed-off-by: John Harr

[Intel-gfx] [PATCH v4 23/38] drm/i915: Defer seqno allocation until actual hardware submission time

2016-01-11 Thread John . C . Harrison
From: John Harrison The seqno value is now only used for the final test for completion of a request. It is no longer used to track the request through the software stack. Thus it is no longer necessary to allocate the seqno immediately with the request. Instead, it can be done lazily and left unt

[Intel-gfx] [PATCH] igt/gem_ctx_param_basic: Updated to support scheduler priority interface

2016-01-11 Thread John . C . Harrison
From: John Harrison The GPU scheduler has added an execution priority level to the context object. There is an IOCTL interface to allow user apps/libraries to set this priority. This patch updates the context parameter IOCTL test to include the new interface. For: VIZ-1587 Signed-off-by: John Har

[Intel-gfx] [PATCH v4 08/38] drm/i915: Prepare retire_requests to handle out-of-order seqnos

2016-01-11 Thread John . C . Harrison
From: John Harrison A major point of the GPU scheduler is that it re-orders batch buffers after they have been submitted to the driver. This leads to requests completing out of order. In turn, this means that the retire processing can no longer assume that all completed entries are at the front o

[Intel-gfx] [PATCH v4 10/38] drm/i915: Force MMIO flips when scheduler enabled

2016-01-11 Thread John . C . Harrison
From: John Harrison MMIO flips are the preferred mechanism now but more importantly, pipe based flips cause issues for the scheduler. Specifically, submitting work to the rings around the side of the scheduler could cause that work to be lost if the scheduler generates a pre-emption event on that

[Intel-gfx] [PATCH v4 14/38] drm/i915: Redirect execbuffer_final() via scheduler

2016-01-11 Thread John . C . Harrison
From: John Harrison Updated the execbuffer() code to pass the packaged up batch buffer information to the scheduler rather than calling execbuffer_final() directly. The scheduler queue() code is currently a stub which simply chains on to _final() immediately. For: VIZ-1587 Signed-off-by: John Ha

[Intel-gfx] [PATCH v4 25/38] drm/i915: Added trace points to scheduler

2016-01-11 Thread John . C . Harrison
From: John Harrison Added trace points to the scheduler to track all the various events, node state transitions and other interesting things that occur. v2: Updated for new request completion tracking implementation. v3: Updated for changes to node kill code. v4: Wrapped some long lines to kee

[Intel-gfx] [PATCH v4 35/38] drm/i915: Enable GPU scheduler by default

2016-01-11 Thread John . C . Harrison
From: John Harrison Now that all the scheduler patches have been applied, it is safe to enable. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_params.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_params.c b/driver

[Intel-gfx] [PATCH v4 29/38] drm/i915: Add early exit to execbuff_final() if insufficient ring space

2016-01-11 Thread John . C . Harrison
From: John Harrison One of the major purposes of the GPU scheduler is to avoid stalling the CPU when the GPU is busy and unable to accept more work. This change adds support to the ring submission code to allow a ring space check to be performed before attempting to submit a batch buffer to the h

[Intel-gfx] [PATCH v4 02/38] drm/i915: Explicit power enable during deferred context initialisation

2016-01-11 Thread John . C . Harrison
From: John Harrison A later patch in this series re-organises the batch buffer submission code. Part of that is to reduce the scope of a pm_get/put pair. Specifically, they previously wrapped the entire submission path from the very start to the very end, now they only wrap the actual hardware su

[Intel-gfx] [PATCH v4 32/38] drm/i915: Add scheduler support functions for TDR

2016-01-11 Thread John . C . Harrison
From: John Harrison The TDR code needs to know what the scheduler is up to in order to work out whether a ring is really hung or not. v4: Removed some unnecessary braces to keep the style checker happy. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_scheduler.c | 30

[Intel-gfx] [PATCH v4 24/38] drm/i915: Added immediate submission override to scheduler

2016-01-11 Thread John . C . Harrison
From: John Harrison To aid with debugging issues related to the scheduler, it can be useful to ensure that all batch buffers are submitted immediately rather than queued until later. This change adds an override flag via the module parameter to force instant submission. For: VIZ-1587 Signed-off-

[Intel-gfx] [PATCH v4 13/38] drm/i915: Added deferred work handler for scheduler

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler needs to do interrupt triggered work that is too complex to do in the interrupt handler. Thus it requires a deferred work handler to process such tasks asynchronously. v2: Updated to reduce mutex lock usage. The lock is now only held for the minimum time within

[Intel-gfx] [PATCH v4 20/38] drm/i915: Added scheduler flush calls to ring throttle and idle functions

2016-01-11 Thread John . C . Harrison
From: John Harrison When requesting that all GPU work is completed, it is now necessary to get the scheduler involved in order to flush out work that is queued and not yet submitted. v2: Updated to add support for flushing the scheduler queue by time stamp rather than just doing a blanket flush. v

[Intel-gfx] [PATCH v4 26/38] drm/i915: Added scheduler queue throttling by DRM file handle

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler decouples the submission of batch buffers to the driver from their subsequent submission to the hardware. This means that an application which is continuously submitting buffers as fast as it can could potentially flood the driver. To prevent this, the driver now

[Intel-gfx] [PATCH v4 34/38] drm/i915: Scheduler state dump via debugfs

2016-01-11 Thread John . C . Harrison
From: John Harrison Added a facility for triggering the scheduler state dump via a debugfs entry. v2: New patch in series. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_debugfs.c | 33 + drivers/gpu/drm/i915/i915_scheduler.c | 9 ++

[Intel-gfx] [PATCH v4 11/38] drm/i915: Added scheduler hook when closing DRM file handles

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler decouples the submission of batch buffers to the driver from the submission of batch buffers to the hardware. Thus it is possible for an application to close its DRM file handle while there is still work outstanding. That means the scheduler needs to know about file

[Intel-gfx] [PATCH v4 07/38] drm/i915: Start of GPU scheduler

2016-01-11 Thread John . C . Harrison
From: John Harrison Initial creation of scheduler source files. Note that this patch implements most of the scheduler functionality but does not hook it in to the driver yet. It also leaves the scheduler code in 'pass through' mode so that even when it is hooked in, it will not actually do very m

[Intel-gfx] [PATCH v4 28/38] drm/i915: Added debug state dump facilities to scheduler

2016-01-11 Thread John . C . Harrison
From: John Harrison When debugging batch buffer submission issues, it is useful to be able to see what the current state of the scheduler is. This change adds functions for decoding the internal scheduler state and reporting it. v3: Updated a debug message with the new state_str() function. v4:

[Intel-gfx] [PATCH v4 12/38] drm/i915: Added scheduler hook into i915_gem_request_notify()

2016-01-11 Thread John . C . Harrison
From: John Harrison The scheduler needs to know when requests have completed so that it can keep its own internal state up to date and can submit new requests to the hardware from its queue. v2: Updated due to changes in request handling. The operation is now reversed from before. Rather than th

[Intel-gfx] [PATCH v4 36/38] drm/i915: Add scheduling priority to per-context parameters

2016-01-11 Thread John . C . Harrison
From: Dave Gordon Added an interface for user land applications/libraries/services to set their GPU scheduler priority. This extends the existing context parameter IOCTL interface to add a scheduler priority parameter. The range is +/-1023 with +ve numbers meaning higher priority. Only system pro

[Intel-gfx] [PATCH v4 30/38] drm/i915: Added scheduler statistic reporting to debugfs

2016-01-11 Thread John . C . Harrison
From: John Harrison It is useful to know what the scheduler is doing for both debugging and performance analysis purposes. This change adds a bunch of counters and such that keep track of various scheduler operations (batches submitted, completed, flush requests, etc.). The data can then be read

[Intel-gfx] [PATCH v4 09/38] drm/i915: Disable hardware semaphores when GPU scheduler is enabled

2016-01-11 Thread John . C . Harrison
From: John Harrison Hardware semaphores require seqno values to be continuously incrementing. However, the scheduler's reordering of batch buffers means that the seqno values going through the hardware could be out of order. Thus semaphores cannot be used. On the other hand, the scheduler super

[Intel-gfx] [PATCH v4 22/38] drm/i915: Support for 'unflushed' ring idle

2016-01-11 Thread John . C . Harrison
From: John Harrison When the seqno wraps around zero, the entire GPU is forced to be idle for some reason (possibly only to work around issues with hardware semaphores but no-one seems too sure!). This causes a problem if the force idle occurs at an inopportune moment such as in the middle of sub

[Intel-gfx] [PATCH v4 37/38] drm/i915: Add support for retro-actively banning batch buffers

2016-01-11 Thread John . C . Harrison
From: John Harrison If a given context submits too many hanging batch buffers then it will be banned and no further batch buffers will be accepted for it. However, it is possible that a large number of buffers may already have been accepted and are sat in the scheduler waiting to be executed. Thi

[Intel-gfx] [PATCH v4 31/38] drm/i915: Added seqno values to scheduler status dump

2016-01-11 Thread John . C . Harrison
From: John Harrison It is useful to be able to see what seqnos have actually popped out of the hardware when viewing the scheduler status. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_scheduler.c | 10 ++ drivers/gpu/drm/i915/i915_scheduler.h | 1 + 2 files

[Intel-gfx] [PATCH v4 21/38] drm/i915: Added a module parameter for allowing scheduler overrides

2016-01-11 Thread John . C . Harrison
From: John Harrison It can be useful to be able to disable certain features (e.g. the entire scheduler) via a module parameter for debugging purposes. A parameter has the advantage of not being a compile time switch but without implying that it can be changed dynamically at runtime. For: VIZ-158

[Intel-gfx] [PATCH v4 27/38] drm/i915: Added debugfs interface to scheduler tuning parameters

2016-01-11 Thread John . C . Harrison
From: John Harrison There are various parameters within the scheduler which can be tuned to improve performance, reduce memory footprint, etc. This change adds support for altering these via debugfs. v2: Updated for priorities now being signed values. For: VIZ-1587 Signed-off-by: John Harrison

[Intel-gfx] [PATCH] drm/i915: Fix for reserved space WARN_ON when ring begin fails

2016-01-13 Thread John . C . Harrison
From: John Harrison The reserved space code was not cleaning up properly in the case where the intel_ring_begin() call failed. This led to WARN_ONs firing about a double reserve call when running the gem_reset_stats IGT test. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/intel_lrc.c

[Intel-gfx] [RFC 2/9] staging/android/sync: add sync_fence_create_dma

2016-01-13 Thread John . C . Harrison
From: Maarten Lankhorst This allows users of dma fences to create an Android fence. v0.2: Added kerneldoc. (Tvrtko Ursulin). v0.4: Updated comments from review feedback by Maarten. Signed-off-by: Maarten Lankhorst Signed-off-by: Tvrtko Ursulin Cc: Maarten Lankhorst Cc: Daniel Vetter Cc: Jes

[Intel-gfx] [RFC 9/9] drm/i915: Add sync support to the scheduler statistics and status dump

2016-01-13 Thread John . C . Harrison
From: John Harrison There are useful statistics and debug information about fences that can be returned via the scheduler's existing reporting mechanisms (sysfs and debug output). These changes were previously part of the patches that originally added those mechanisms. However, as the sync framew

[Intel-gfx] [RFC 0/9] Add native sync support to i915 driver

2016-01-13 Thread John . C . Harrison
From: John Harrison This patch set was originally part of the struct fence and scheduler patch sets. However, it relies on de-staging the sync framework and that is now being done by another group. Hence these patches had to be split out into a separate series that can be merged after the de-stag

[Intel-gfx] [RFC 1/9] staging/android/sync: Support sync points created from dma-fences

2016-01-13 Thread John . C . Harrison
From: Maarten Lankhorst Debug output assumes all sync points are built on top of Android sync points and when we start creating them from dma-fences will NULL ptr deref unless taught about this. v0.4: Corrected patch ownership. v0.5: Removed redundant braces to keep style checker happy Signed-

[Intel-gfx] [RFC 3/9] staging/android/sync: Move sync framework out of staging

2016-01-13 Thread John . C . Harrison
From: John Harrison The sync framework is now used by the i915 driver. Therefore it can be moved out of staging and into the regular tree. Also, the public interfaces can actually be made public and exported. v0.3: New patch for series. Signed-off-by: John Harrison Signed-off-by: Geoff Miller

[Intel-gfx] [RFC 5/9] android/sync: Fix reversed sense of signaled fence

2016-01-13 Thread John . C . Harrison
From: Peter Lawthers In the 3.14 kernel, a signaled fence was indicated by the status field == 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates error, and status > 0 indicates active. This patch wraps the check for a signaled fence in a function so that callers no longer needs t
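
As a rough illustration of the two status conventions described above, the check can be hidden behind small helpers. The function and enum names below are hypothetical and are not taken from the patch; only the numeric conventions come from the description.

```c
#include <stdbool.h>

/* Hypothetical classification of a fence status value. */
enum fence_state {
	FENCE_SIGNALED,
	FENCE_ACTIVE,
	FENCE_ERROR,
};

/* 4.x convention as described: 0 = signaled, < 0 = error, > 0 = active. */
enum fence_state classify_status_4x(int status)
{
	if (status == 0)
		return FENCE_SIGNALED;
	return status < 0 ? FENCE_ERROR : FENCE_ACTIVE;
}

/* Callers ask this question instead of comparing raw status values. */
bool is_signaled_4x(int status)
{
	return classify_status_4x(status) == FENCE_SIGNALED;
}

/* 3.14 convention as described: a status of 1 meant signaled. */
bool is_signaled_3_14(int status)
{
	return status == 1;
}
```

Keeping the comparison in one place means a change of convention between kernel versions touches a single helper rather than every caller.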

[Intel-gfx] [RFC 8/9] drm/i915: Connecting execbuff fences to scheduler

2016-01-13 Thread John . C . Harrison
From: John Harrison The scheduler now supports sync framework fences being associated with batch buffers. The execbuff IOCTL allows such fences to be passed in from user land. This patch wires the two together so that the IOCTL no longer needs to stall on the fence immediately. Instead the stall

[Intel-gfx] [RFC 4/9] android/sync: Improved debug dump to dmesg

2016-01-13 Thread John . C . Harrison
From: John Harrison The sync code has a facility for dumping current state information via debugfs. It also has a way to re-use the same code for dumping to the kernel log on an internal error. However, the redirection was rather clunky and split the output across multiple prints at arbitrary bou

[Intel-gfx] [RFC 7/9] drm/i915: Add sync wait support to scheduler

2016-01-13 Thread John . C . Harrison
From: John Harrison There is a sync framework to allow work for multiple independent systems to be synchronised with each other but without stalling the CPU whether in the application or the driver. This patch adds support for this framework to the GPU scheduler. Batch buffers can now have sync

[Intel-gfx] [RFC 6/9] drm/i915: Add sync framework support to execbuff IOCTL

2016-01-13 Thread John . C . Harrison
From: John Harrison Various projects desire a mechanism for managing dependencies between work items asynchronously. This can also include work items across completely different and independent systems. For example, an application wants to retrieve a frame from a video in device, using it for rende

[Intel-gfx] [RFC] igt/gem_exec_fence: New test for sync/fence interface

2016-01-19 Thread John . C . Harrison
From: John Harrison Note, this is a work in progress. It is being posted now as there is work going on to change the debugging interface used by this test. So it would be useful to get some comments on whether the proposed changes will cause a problem for this test or whether the test itself shou

[Intel-gfx] [RFC 2/2] drm/i915: Avoid stalling on GuC send mutex lock

2017-11-17 Thread John . C . Harrison
From: John Harrison There is a mutex_lock in the GuC send action code path to ensure serialised access to the host-to-GuC mechanism. Acquiring the lock apparently sees random stalls of around 6ms. That is even when the lock is definitely not acquired by any other thread. In the case of sending pr

[Intel-gfx] [RFC 1/2] drm/i915: Extend GuC action fast spin time

2017-11-17 Thread John . C . Harrison
From: John Harrison The 'request pre-emption' GuC command seems to be slower than other commands. It typically takes 20-30us on a GP-MRB system (BXT). That means that the super-fast busy-spin wait in the GuC send action code hits the 10us time out. It then drops through to the more system friendl

[Intel-gfx] [RFC 0/2] Excessive latency in GuC send action

2017-11-17 Thread John . C . Harrison
From: John Harrison While working on a customer project, it was noticed that the time taken to issue a pre-emption request to the GuC would vary quite significantly. The correct case was low microseconds but the worst case was tens of milliseconds. Two separate issues were identified as causing t

[Intel-gfx] [PATCH 0/1] GuC submission vs request signaling race

2017-11-28 Thread John . C . Harrison
From: John Harrison Back in the days of 4.11, there was a nested_enable_signaling() function as part of the GuC submission path. It contained a BUG_ON to ensure that the request being processed had not already been signaled. However, there was a race condition that causes that BUG_ON to be hit. W

[Intel-gfx] [PATCH 1/1] drm/i915: Fix for nested_enable_signaling BUG_ON

2017-11-28 Thread John . C . Harrison
From: John Harrison The call to enable signaling was occurring after the request had been sent to the GuC for execution on the hardware. That means that it is possible for the request to actually complete before the code to enable signaling has executed. Potentially that means the request could b

[Intel-gfx] [PATCH i-g-t 3/4] scripts/trace.pl: Calculate stats only after all munging

2018-01-19 Thread John . C . Harrison
From: John Harrison There are various statistics being calculated multiple times in multiple places while the log file is being read in. Some of these are then re-calculated when the database is munged to correct various issues with the logs. This patch consolidates the calculations into a separa

[Intel-gfx] [PATCH i-g-t 2/4] scripts/trace.pl: Sort order

2018-01-19 Thread John . C . Harrison
From: John Harrison Add an extra level to the database key sort so that the ordering is deterministic. If the time stamp matches, it now compares the key itself as well (context/seqno). This makes it much easier to determine if a change has actually broken anything. Previously back to back runs wi
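
The tie-break described above is a general technique and can be sketched in C with `qsort()`. The script itself is Perl, so this is not its code; the `trace_entry` structure and the function names are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical record: a time stamp plus the context/seqno key. */
struct trace_entry {
	unsigned long long timestamp;
	unsigned int context;
	unsigned int seqno;
};

/*
 * Order by time stamp first. If the time stamps match, fall back to the
 * key (context, then seqno) so that equal time stamps always sort the
 * same way from run to run.
 */
int compare_entries(const void *a, const void *b)
{
	const struct trace_entry *x = a;
	const struct trace_entry *y = b;

	if (x->timestamp != y->timestamp)
		return x->timestamp < y->timestamp ? -1 : 1;
	if (x->context != y->context)
		return x->context < y->context ? -1 : 1;
	if (x->seqno != y->seqno)
		return x->seqno < y->seqno ? -1 : 1;
	return 0;
}

void sort_entries(struct trace_entry *entries, size_t count)
{
	qsort(entries, count, sizeof(*entries), compare_entries);
}
```

Because the comparator only returns 0 for entries that are identical in every field, the result does not depend on how the sort implementation treats ties.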

[Intel-gfx] [PATCH i-g-t 1/4] scripts/trace.pl: More hash key optimisations

2018-01-19 Thread John . C . Harrison
From: John Harrison Cache the key count value rather than querying the hash every time. Also assert that the database does not magically change size after the fixups. Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- scripts/trace.pl | 9 ++--- 1 file changed, 6 insertions(+), 3 deletio

[Intel-gfx] [PATCH i-g-t 0/4] scripts/trace.pl: Re-order calculations and fixups

2018-01-19 Thread John . C . Harrison
From: John Harrison The trace.pl script calculates a bunch of statistics. It also re-generates some timestamp values to correct issues with the log being processed. These operations were all mixed up together thus some were done multiple times (with different results each time). Whereas some stat

[Intel-gfx] [PATCH i-g-t 4/4] scripts/trace.pl: Simplify 'end' & 'notify' generation

2018-01-19 Thread John . C . Harrison
From: John Harrison Delay the auto-generation of end/notify values until the point where everything is known. As opposed to potentially generating them multiple times with differing values. Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- scripts/trace.pl | 31 ++---

[Intel-gfx] [PATCH] drm/i915/guc: Fix potential null pointer deref in GuC 'steal id' test

2023-08-02 Thread John . C . Harrison
From: John Harrison It was noticed that if the very first 'stealing' request failed to create for some reason then the 'steal all ids' loop would immediately exit with 'last' still being NULL. The test would attempt to continue but using a null pointer. Fix that by aborting the test if it fails t

[Intel-gfx] [PATCH v2] drm/i915/guc: Force a reset on internal GuC error

2023-08-15 Thread John . C . Harrison
From: John Harrison If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually kick

[Intel-gfx] [PATCH 3/5] drm/i915/guc: Capture list clean up - 2

2023-04-06 Thread John . C . Harrison
From: John Harrison Don't use 'xe_lp*' prefixes for register lists that are common with Gen8. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 30 +-- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/inte

[Intel-gfx] [PATCH 0/5] Improvements to GuC error capture list processing

2023-04-06 Thread John . C . Harrison
From: John Harrison The GuC error capture list creation was including Gen8 registers on Xe platforms. While fixing that, it was noticed that there were other issues. The platform naming was wrong, the naming of lists was misleading, the steered register code was duplicated and steered registers w

[Intel-gfx] [PATCH 2/5] drm/i915/guc: Capture list clean up - 1

2023-04-06 Thread John . C . Harrison
From: John Harrison Remove 99% duplicated steered register list code. Also, include the pre-Xe steered registers in the pre-Xe list generation. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 112 +- 1 file changed, 29 insertions(+), 83 deletion

[Intel-gfx] [PATCH 4/5] drm/i915/guc: Capture list clean up - 3

2023-04-06 Thread John . C . Harrison
From: John Harrison Fix Xe_LP name. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 44 +-- 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ca

[Intel-gfx] [PATCH 1/5] drm/i915/guc: Don't capture Gen8 regs on Xe devices

2023-04-06 Thread John . C . Harrison
From: John Harrison A pair of pre-Xe registers were being included in the Xe capture list. GuC was rejecting those as being invalid and logging errors about them. So, stop doing it. Signed-off-by: John Harrison Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.

[Intel-gfx] [PATCH 5/5] drm/i915/guc: Capture list clean up - 4

2023-04-06 Thread John . C . Harrison
From: John Harrison Don't use GEN9 as a prefix for register lists that contain all GEN8 registers. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_g

[Intel-gfx] [PATCH 0/2] Add support for dumping error captures via kernel logging

2023-04-10 Thread John . C . Harrison
From: John Harrison Sometimes, the only effective way to debug an issue is to dump all the interesting information at the point of failure. So add support for doing that. Signed-off-by: John Harrison John Harrison (2): drm/i915: Dump error capture to kernel log drm/i915/guc: Dump error ca

[Intel-gfx] [PATCH 2/2] drm/i915/guc: Dump error capture to dmesg on CTB error

2023-04-10 Thread John . C . Harrison
From: John Harrison In the past, there have been sporadic CTB failures which proved hard to reproduce manually. The most effective solution was to dump the GuC log at the point of failure and let the CI system do the repro. It is preferable not to dump the GuC log via dmesg for all issues as it i

[Intel-gfx] [PATCH 1/2] drm/i915: Dump error capture to kernel log

2023-04-10 Thread John . C . Harrison
From: John Harrison This is useful for getting debug information out in certain situations, such as failing kernel selftests and CI runs that don't log error captures. It is especially useful for things like retrieving GuC logs as GuC operation can't be tracked by adding printk or ftrace entries.

[Intel-gfx] [PATCH] drm/i915/guc: Fix error capture for virtual engines

2023-04-14 Thread John . C . Harrison
From: John Harrison GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No regi

[Intel-gfx] [PATCH 0/5] Improvements to uc firmware management

2023-04-14 Thread John . C . Harrison
From: John Harrison Enhance the firmware table verification code to catch more potential errors and to generally improve the code itself. Track patch level version even on reduced version files to allow user notification of missing bug fixes. Detect another immediate failure case when loading G

[Intel-gfx] [PATCH 1/5] drm/i915/guc: Decode another GuC load failure case

2023-04-14 Thread John . C . Harrison
From: John Harrison Explain another potential firmware failure mode and early exit the long wait if hit. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++ 2 files changed, 7 insertions(+) diff --

[Intel-gfx] [PATCH 3/5] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-04-14 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The K

[Intel-gfx] [PATCH 5/5] drm/i915/uc: Reject duplicate entries in firmware table

2023-04-14 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject d
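
A duplicate check of the kind described above might look like the following sketch. It does not reproduce the driver's table or its validation code: the `fw_entry` fields and the `fw_table_has_duplicates()` name are hypothetical, and the choice of which fields make two entries "the same" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical firmware table entry; the field set is an assumption. */
struct fw_entry {
	unsigned int platform;
	unsigned int major;
	unsigned int minor;
};

static bool fw_entry_equal(const struct fw_entry *a, const struct fw_entry *b)
{
	return a->platform == b->platform &&
	       a->major == b->major &&
	       a->minor == b->minor;
}

/*
 * Return true if any two entries in the table are identical. A simple
 * pairwise scan is enough for a small static table checked once.
 */
bool fw_table_has_duplicates(const struct fw_entry *table, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		for (size_t j = i + 1; j < count; j++) {
			if (fw_entry_equal(&table[i], &table[j]))
				return true;
		}
	}
	return false;
}
```

Running such a check once, before the table is ever scanned for a blob to load, lets the validation code reject a bad table up front.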

[Intel-gfx] [PATCH 2/5] drm/i915/guc: Print status register when waiting for GuC to load

2023-04-14 Thread John . C . Harrison
From: John Harrison If the GuC load is taking an excessively long time, the wait loop currently prints the GT frequency. Extend that to include the GuC status as well so we can see if the GuC is actually making progress or not. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_

[Intel-gfx] [PATCH 4/5] drm/i915/uc: Split firmware table validation to a separate function

2023-04-14 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob. Which is unnecessary. Potentially, it should be a selftest. But either way, the first step is pulling it out into a separate function that can be ca

[Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Dump error capture to dmesg on CTB error

2023-04-18 Thread John . C . Harrison
From: John Harrison In the past, there have been sporadic CTB failures which proved hard to reproduce manually. The most effective solution was to dump the GuC log at the point of failure and let the CI system do the repro. It is preferable not to dump the GuC log via dmesg for all issues as it i

[Intel-gfx] [PATCH v2 0/2] Add support for dumping error captures via kernel logging

2023-04-18 Thread John . C . Harrison
From: John Harrison Sometimes, the only effective way to debug an issue is to dump all the interesting information at the point of failure. So add support for doing that. v2: Extra CONFIG wrapping (review feedback from Rodrigo) Signed-off-by: John Harrison John Harrison (2): drm/i915: Dump

[Intel-gfx] [PATCH v2 1/2] drm/i915: Dump error capture to kernel log

2023-04-18 Thread John . C . Harrison
From: John Harrison This is useful for getting debug information out in certain situations, such as failing kernel selftests and CI runs that don't log error captures. It is especially useful for things like retrieving GuC logs as GuC operation can't be tracked by adding printk or ftrace entries.

[Intel-gfx] [PATCH 2/6] drm/i915/guc: Print status register when waiting for GuC to load

2023-04-20 Thread John . C . Harrison
From: John Harrison If the GuC load is taking an excessively long time, the wait loop currently prints the GT frequency. Extend that to include the GuC status as well so we can see if the GuC is actually making progress or not. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio -

[Intel-gfx] [PATCH 0/6] Improvements to uc firmware management

2023-04-20 Thread John . C . Harrison
From: John Harrison Enhance the firmware table verification code to catch more potential errors and to generally improve the code itself. Track patch level version even on reduced version files to allow user notification of missing bug fixes. Detect another immediate failure case when loading G

[Intel-gfx] [PATCH 3/6] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-04-20 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The K

[Intel-gfx] [PATCH 1/6] drm/i915/guc: Decode another GuC load failure case

2023-04-20 Thread John . C . Harrison
From: John Harrison Explain another potential firmware failure mode and early exit the long wait if hit. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++ 2 fi

[Intel-gfx] [PATCH 5/6] drm/i915/uc: Reject duplicate entries in firmware table

2023-04-20 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject

[Intel-gfx] [PATCH 6/6] drm/i915/uc: Make unexpected firmware versions an error in debug builds

2023-04-20 Thread John . C . Harrison
From: John Harrison If the DEBUG_GEM config option is set then escalate the 'unexpected firmware version' message from a notice to an error. This will ensure that the CI system treats such occurrences as a failure and logs a bug about it (or fails the pre-merge testing). Signed-off-by: John Harri

[Intel-gfx] [PATCH 4/6] drm/i915/uc: Enhancements to firmware table validation

2023-04-20 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob, which is unnecessary. So pull it out into a separate function that is only called once per blob type at init time. Also, drop the CONFIG_SELFTEST r

[Intel-gfx] [PATCH] drm/i915/guc: Actually return an error if GuC version range check fails

2023-04-21 Thread John . C . Harrison
From: John Harrison Dan Carpenter pointed out that 'err' was not being set in the case where the GuC firmware version range check fails. Fix that. Note that while this is a bug fix for a previous patch (see Fixes tag below), it is an exceedingly low risk bug. The range check is asserting that the

[Intel-gfx] [PATCH i-g-t 2/2] tools/intel_error_decode: Correctly name the GuC CT buffer

2023-04-25 Thread John . C . Harrison
From: John Harrison The buffer decoding code doesn't cope well with unknown buffers. So add an entry for the GuC CTB so that it gets decoded correctly. Signed-off-by: John Harrison --- tools/intel_error_decode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/intel_error_decode.c b/t

[Intel-gfx] [PATCH i-g-t 0/2] Update intel_error_decode for Gen12

2023-04-25 Thread John . C . Harrison
From: John Harrison The error capture decoder was reporting invalid errors in batch buffers and getting confused about the presence of the GuC CTB. So fix those up. Signed-off-by: John Harrison John Harrison (2): lib/intel_decode: Decode Gen12 ring/batch instructions correctly tools/inte

[Intel-gfx] [PATCH i-g-t 1/2] lib/intel_decode: Decode Gen12 ring/batch instructions correctly

2023-04-25 Thread John . C . Harrison
From: John Harrison Some MI_ instructions have changed (or are just new) for Gen12. So update the decoder code to match. Signed-off-by: John Harrison --- lib/i915/intel_decode.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/lib/i915/intel_decode.c b/lib/i

[Intel-gfx] [PATCH 6/6] drm/i915/guc: Capture list clean up - 5

2023-04-26 Thread John . C . Harrison
From: John Harrison Rename the 'default_' register list prefix to 'gen8_' as that is the more accurate name. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/g

[Intel-gfx] [PATCH v2 4/4] drm/i915/guc: Fix error capture for virtual engines

2023-04-28 Thread John . C . Harrison
From: John Harrison GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No regi

[Intel-gfx] [PATCH v2 2/4] drm/i915/guc: Consolidate duplicated capture list code

2023-04-28 Thread John . C . Harrison
From: John Harrison Remove 99% duplicated steered register list code. Also, include the pre-Xe steered registers in the pre-Xe list generation. Signed-off-by: John Harrison Reviewed-by: Alan Previn --- .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 112 +- 1 file changed, 29

[Intel-gfx] [PATCH v2 1/4] drm/i915/guc: Don't capture Gen8 regs on Xe devices

2023-04-28 Thread John . C . Harrison
From: John Harrison A pair of pre-Xe registers were being included in the Xe capture list. GuC was rejecting those as being invalid and logging errors about them. So, stop doing it. Signed-off-by: John Harrison Reviewed-by: Alan Previn Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for

[Intel-gfx] [PATCH v2 0/4] Improvements to GuC error capture

2023-04-28 Thread John . C . Harrison
From: John Harrison The GuC error capture list creation was including Gen8 registers on Xe platforms. While fixing that, it was noticed that there were other issues. The platform naming was wrong, the naming of lists was misleading, the steered register code was duplicated and steered registers w

[Intel-gfx] [PATCH v2 3/4] drm/i915/guc: Capture list naming clean up

2023-04-28 Thread John . C . Harrison
From: John Harrison Don't use 'xe_lp*' prefixes for register lists that are common with Gen8. Don't add Xe only GSC registers to pre-Xe devices that don't even have a GSC engine. Fix Xe_LP name. Don't use GEN9 as a prefix for register lists that contain all GEN8 registers. Rename the 'default

[Intel-gfx] [PATCH v3 1/6] drm/i915/guc: Decode another GuC load failure case

2023-05-02 Thread John . C . Harrison
From: John Harrison Explain another potential firmware failure mode and early exit the long wait if hit. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++ 2 fi

[Intel-gfx] [PATCH v3 0/6] Improvements to uc firmware management

2023-05-02 Thread John . C . Harrison
From: John Harrison Enhance the firmware table verification code to catch more potential errors and to generally improve the code itself. Track patch level version even on reduced version files to allow user notification of missing bug fixes. Detect another immediate failure case when loading G

[Intel-gfx] [PATCH v3 3/6] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-05-02 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The K

[Intel-gfx] [PATCH v3 4/6] drm/i915/uc: Enhancements to firmware table validation

2023-05-02 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob, which is unnecessary. So pull it out into a separate function that is only called once per blob type at init time. Also, drop the CONFIG_SELFTEST r

[Intel-gfx] [PATCH v3 5/6] drm/i915/uc: Reject duplicate entries in firmware table

2023-05-02 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject

[Intel-gfx] [PATCH v3 2/6] drm/i915/guc: Print status register when waiting for GuC to load

2023-05-02 Thread John . C . Harrison
From: John Harrison If the GuC load is taking an excessively long time, the wait loop currently prints the GT frequency. Extend that to include the GuC status as well so we can see if the GuC is actually making progress or not. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio -
