From: John Harrison
One of the major purposes of the GPU scheduler is to avoid stalling
the CPU when the GPU is busy and unable to accept more work. This
change adds support to the ring submission code to allow a ring space
check to be performed before attempting to submit a batch buffer to
the h
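As a rough illustration of the ring-space check described above, here is a minimal sketch using hypothetical names and a simplified ring layout; it is not the patch's own code, only the shape of a non-blocking "is there room?" probe the submission path could call before emitting a batch, so a full ring reports busy instead of stalling the CPU.

#include <stdbool.h>
#include <stdint.h>

struct sketch_ring {
	uint32_t size;	/* total ring size in bytes */
	uint32_t head;	/* hardware read offset */
	uint32_t tail;	/* software write offset */
};

/* Bytes still free between tail and head, accounting for wrap-around. */
static uint32_t ring_space_free(const struct sketch_ring *ring)
{
	uint32_t space = ring->head - (ring->tail + 8);

	if ((int32_t)space < 0)
		space += ring->size;
	return space;
}

/* Non-blocking check: can 'request_bytes' be written without waiting? */
static bool ring_has_space(const struct sketch_ring *ring, uint32_t request_bytes)
{
	return ring_space_free(ring) >= request_bytes;
}

The scheduler can call such a check first and simply keep the batch on its own queue when it returns false.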
From: John Harrison
When the watchdog resets the GPU, all interrupts get disabled despite
the reference count remaining. As the scheduler probably had
interrupts enabled during the reset (it would have been waiting for
the bad batch to complete), it must be poked to tell it that the
interrupt has
From: John Harrison
It can be useful to be able to disable the GPU scheduler via a module
parameter for debugging purposes.
v5: Converted from a multi-feature 'overrides' mask to a single
'enable' boolean. Further features (e.g. pre-emption) will now be
separate 'enable' booleans added later. [C
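A hedged sketch of how such a boot-time switch is typically exposed; the parameter name 'enable_scheduler' and the default are assumptions, not taken from the patch itself.

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical knob: read-only at runtime, set on the kernel command line. */
static bool sketch_enable_scheduler = true;
module_param_named(enable_scheduler, sketch_enable_scheduler, bool, 0400);
MODULE_PARM_DESC(enable_scheduler, "Enable the GPU scheduler (default: true)");

With a parameter like this, booting with i915.enable_scheduler=0 would fall back to the old direct-submission path for debugging.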
From: John Harrison
The scheduler can cause batch buffers, and hence requests, to be
submitted to the ring out of order and asynchronously to their
submission to the driver. Thus at the point of waiting for the
completion of a given request, it is not even guaranteed that the
request has actually
From: John Harrison
The scheduler needs to track interdependencies between batch buffers.
These are calculated by analysing the object lists of the buffers and
looking for commonality. The scheduler also needs to keep those
buffers locked long after the initial IOCTL call has returned to user
lan
From: John Harrison
Added a facility for triggering the scheduler state dump via a debugfs
entry.
v2: New patch in series.
For: VIZ-1587
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/i915_debugfs.c | 33 +
drivers/gpu/drm/i915/i915_scheduler.c | 9 ++
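A sketch of the debugfs trigger pattern described above, with hypothetical names throughout: a write-only file whose write handler kicks off the state dump. This only shows the mechanism, not the patch's actual file layout.

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/module.h>

static void sketch_scheduler_dump(void)
{
	pr_info("scheduler: state dump requested via debugfs\n");
	/* A real implementation would walk the scheduler queues here. */
}

static ssize_t sketch_dump_write(struct file *file, const char __user *buf,
				 size_t len, loff_t *ppos)
{
	sketch_scheduler_dump();
	return len;	/* consume the write; the content is ignored */
}

static const struct file_operations sketch_dump_fops = {
	.owner = THIS_MODULE,
	.write = sketch_dump_write,
};

static void sketch_register_dump_entry(struct dentry *parent)
{
	debugfs_create_file("scheduler_dump", 0200, parent, NULL,
			    &sketch_dump_fops);
}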
From: John Harrison
The scheduler decouples the submission of batch buffers to the driver
from their subsequent submission to the hardware. This means that an
application which is continuously submitting buffers as fast as it can
could potentially flood the driver. To prevent this, the driver now
From: John Harrison
When the seqno wraps around zero, the entire GPU is forced to be idle
for some reason (possibly only to work around issues with hardware
semaphores but no-one seems too sure!). This causes a problem if the
force idle occurs at an inopportune moment such as in the middle of
sub
From: John Harrison
The scheduler keeps its own lock on various DRM objects in order to
guarantee safe access long after the original execbuff IOCTL has
completed. This is especially important when pre-emption is enabled as
the batch buffer might need to be submitted to the hardware multiple
time
From: John Harrison
GPU page faults can now require scheduler operation in order to
complete. For example, in order to free up sufficient memory to handle
the fault the handler must wait for a batch buffer to complete that
has not even been sent to the hardware yet. Thus EAGAIN no longer
means a
From: John Harrison
The scheduler needs to do interrupt triggered work that is too complex
to do in the interrupt handler. Thus it requires a deferred work
handler to process such tasks asynchronously.
v2: Updated to reduce mutex lock usage. The lock is now only held for
the minimum time within
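The deferred-work pattern described here can be sketched as follows, using hypothetical names: the interrupt handler only queues work, and the heavy lifting runs later in process context where a mutex can be held for the minimum time needed.

#include <linux/workqueue.h>
#include <linux/mutex.h>

struct sketch_scheduler {
	struct work_struct work;
	struct mutex lock;
};

static void sketch_scheduler_work_handler(struct work_struct *work)
{
	struct sketch_scheduler *sched =
		container_of(work, struct sketch_scheduler, work);

	mutex_lock(&sched->lock);
	/* Process completed requests, submit queued batches, etc. */
	mutex_unlock(&sched->lock);
}

/* Called from the interrupt handler: cheap, never sleeps. */
static void sketch_scheduler_notify_irq(struct sketch_scheduler *sched)
{
	schedule_work(&sched->work);
}

static void sketch_scheduler_init(struct sketch_scheduler *sched)
{
	mutex_init(&sched->lock);
	INIT_WORK(&sched->work, sketch_scheduler_work_handler);
}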
From: John Harrison
The GPU scheduler has added an execution priority level to the context
object. There is an IOCTL interface to allow user apps/libraries to
set this priority. This patch updates the context parameter IOCTL test
to include the new interface.
For: VIZ-1587
Signed-off-by: John Har
From: Dave Gordon
Added an interface for user land applications/libraries/services to
set their GPU scheduler priority. This extends the existing context
parameter IOCTL interface to add a scheduler priority parameter. The
range is +/-1023 with +ve numbers meaning higher priority. Only
system pro
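From the userspace side, setting such a priority would look roughly like the sketch below. The parameter name I915_CONTEXT_PARAM_PRIORITY matches what eventually landed in mainline kernels; whether this particular series used the same name, and whether your headers define it, is an assumption.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int set_context_priority(int drm_fd, uint32_t ctx_id, int32_t prio)
{
	struct drm_i915_gem_context_param arg;

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx_id;
	arg.param = I915_CONTEXT_PARAM_PRIORITY;	/* range -1023..+1023 */
	arg.value = prio;

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

Negative values lower the context's priority; raising priority above the default is typically restricted to privileged processes.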
From: John Harrison
There are various parameters within the scheduler which can be tuned
to improve performance, reduce memory footprint, etc. This change adds
support for altering these via debugfs.
v2: Updated for priorities now being signed values.
v5: Squashed priority bumping entries into
From: John Harrison
Now that all the scheduler patches have been applied, it is safe to enable.
v5: Updated for new module parameter.
For: VIZ-1587
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/i915_params.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drive
From: John Harrison
If a given context submits too many hanging batch buffers then it will
be banned and no further batch buffers will be accepted for it.
However, it is possible that a large number of buffers may already
have been accepted and are sat in the scheduler waiting to be
executed. Thi
From: John Harrison
When requesting that all GPU work is completed, it is now necessary to
get the scheduler involved in order to flush out work that is queued but
not yet submitted.
v2: Updated to add support for flushing the scheduler queue by time
stamp rather than just doing a blanket flush.
v
From: John Harrison
The TDR code needs to know what the scheduler is up to in order to
work out whether a ring is really hung or not.
v4: Removed some unnecessary braces to keep the style checker happy.
v5: Removed white space and added documentation. [Joonas Lahtinen]
Also updated for new mod
From: John Harrison
It is useful to know what the scheduler is doing, both for debugging
and performance analysis purposes. This change adds a bunch of
counters and such that keep track of various scheduler operations
(batches submitted, completed, flush requests, etc.). The data can
then be read
From: John Harrison
Ring space is reserved when constructing a request to ensure that the
subsequent 'add_request()' call cannot fail due to waiting for space
on a busy or broken GPU. However, the scheduler jumps in to the middle
of the execbuffer process between request creation and request
subm
From: John Harrison
The scheduler has always tracked batch buffer dependencies based on
DRM object usage. This means that it will not submit a batch on one
ring that has outstanding dependencies still executing on other rings.
This is exactly the same synchronisation performed by
i915_gem_object_
From: John Harrison
The notify function can be called many times without the seqno
changing. A large number of duplicates are to prevent races due to the
requirement of not enabling interrupts until requested. However, when
interrupts are enabled the IRQ handler can be called multiple times
withou
From: John Harrison
The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.
v6: Updated to newer nightly and resolved conflicts.
For: VIZ-5190
Signed-off-by: John Harrison
---
dri
From: John Harrison
The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel dr
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
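The on-demand signalling model this describes can be illustrated with a small conceptual sketch using hypothetical types rather than the kernel's real fence API: enable_signaling() is called once when somebody starts waiting, arming the completion path (e.g. a user interrupt), so that ordinary status queries are just a flag read rather than a hardware poll.

#include <stdbool.h>

struct sketch_fence;

struct sketch_fence_ops {
	/* Arm whatever mechanism will mark the fence complete later;
	 * return false if the work has already completed. */
	bool (*enable_signaling)(struct sketch_fence *fence);
};

struct sketch_fence {
	const struct sketch_fence_ops *ops;
	bool signaled;
	bool signaling_enabled;
};

/* Cheap status query: just reads the flag, no polling callback. */
static bool sketch_fence_is_signaled(const struct sketch_fence *fence)
{
	return fence->signaled;
}

/* First waiter arms signalling; completion later sets fence->signaled. */
static bool sketch_fence_wait_prepare(struct sketch_fence *fence)
{
	if (fence->signaled)
		return true;
	if (!fence->signaling_enabled) {
		fence->signaling_enabled = true;
		if (!fence->ops->enable_signaling(fence))
			fence->signaled = true;
	}
	return fence->signaled;
}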
From: John Harrison
Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.
v3: Added the current ring seqno to the notify trace point.
v5: Line wrapping to keep the
From: John Harrison
Initial creation of scheduler source files. Note that this patch
implements most of the scheduler functionality but does not hook it in
to the driver yet. It also leaves the scheduler code in 'pass through'
mode so that even when it is hooked in, it will not actually do very
m
From: John Harrison
There is an EGL extension to set execution priority per context. This
can be implemented via the i915 per context priority parameter. This
patch adds a wrapper to connect the two together in a way that can be
updated as necessary without breaking one side or the other.
Signed
From: John Harrison
Now that all the scheduler patches have been applied, it is safe to enable.
v5: Updated for new module parameter.
For: VIZ-1587
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/i915_params.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drive
From: Dave Gordon
Added an interface for user land applications/libraries/services to
set their GPU scheduler priority. This extends the existing context
parameter IOCTL interface to add a scheduler priority parameter. The
range is +/-1023 with +ve numbers meaning higher priority. Only
system pro
From: John Harrison
If a given context submits too many hanging batch buffers then it will
be banned and no further batch buffers will be accepted for it.
However, it is possible that a large number of buffers may already
have been accepted and are sat in the scheduler waiting to be
executed. Thi
From: John Harrison
MMIO flips are the preferred mechanism now but more importantly, pipe
based flips cause issues for the scheduler. Specifically, submitting
work to the rings around the side of the scheduler could cause that
work to be lost if the scheduler generates a pre-emption event on that
From: John Harrison
The scheduler decouples the submission of batch buffers to the driver
from their subsequent submission to the hardware. This means that an
application which is continuously submitting buffers as fast as it can
could potentially flood the driver. To prevent this, the driver now
From: John Harrison
It is useful to know what the scheduler is doing, both for debugging
and performance analysis purposes. This change adds a bunch of
counters and such that keep track of various scheduler operations
(batches submitted, completed, flush requests, etc.). The data can
then be read
From: John Harrison
The notify function can be called many times without the seqno
changing. Some are to prevent races due to the requirement of not
enabling interrupts until requested. However, when interrupts are
enabled the IRQ handler can be called multiple times without the
ring's seqno valu
From: John Harrison
The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.
v6: Updated to newer nightly and resolved conflicts.
v7: Updated to newer nightly (lots of ring -> engine
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.
v3: Added the current ring seqno to the notify trace point.
v5: Line wrapping to keep the
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel dr
From: John Harrison
The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.
v3: Added the current ring seqno to the notify trace point.
v5: Line wrapping to keep the
From: John Harrison
The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
From: John Harrison
The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.
v6: Updated to newer nightly and resolved conflicts.
v7: Updated to newer nightly (lots of ring -> engine
From: John Harrison
The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while
From: John Harrison
The notify function can be called many times without the seqno
changing. A large number of duplicates are to prevent races due to the
requirement of not enabling interrupts until requested. However, when
interrupts are enabled the IRQ handler can be called multiple times
withou
From: John Harrison
The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel dr
From: John Harrison
If an execbuff IOCTL call fails for some reason, it would leave the
request in the client list. The request clean up code would remove
this but only later on and only after the reference count has dropped
to zero. The entire sequence is contained within the driver mutex
lock.
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
Split the execbuffer() function in half. The first half collects and
validates all the information required to process the batch buffer. It
also does all the object pinning, relocations, active list management,
etc - basically anything that must be done upfront before the IOCT
From: John Harrison
Ring space is reserved when constructing a request to ensure that the
subsequent 'add_request()' call cannot fail due to waiting for space
on a busy or broken GPU. However, the scheduler jumps in to the middle
of the execbuffer process between request creation and request
subm
From: John Harrison
The scheduler needs to track interdependencies between batch buffers.
These are calculated by analysing the object lists of the buffers and
looking for commonality. The scheduler also needs to keep those
buffers locked long after the initial IOCTL call has returned to user
lan
From: John Harrison
Updated the execbuffer() code to pass the packaged up batch buffer
information to the scheduler rather than calling execbuffer_final()
directly. The scheduler queue() code is currently a stub which simply
chains on to _final() immediately.
v6: Updated to newer nightly (lots o
From: John Harrison
It can be useful to be able to disable the GPU scheduler via a module
parameter for debugging purposes.
v5: Converted from a multi-feature 'overrides' mask to a single
'enable' boolean. Further features (e.g. pre-emption) will now be
separate 'enable' booleans added later. [C
From: John Harrison
Added trace points to the scheduler to track all the various events,
node state transitions and other interesting things that occur.
v2: Updated for new request completion tracking implementation.
v3: Updated for changes to node kill code.
v4: Wrapped some long lines to kee
From: John Harrison
GPU page faults can now require scheduler operation in order to
complete. For example, in order to free up sufficient memory to handle
the fault the handler must wait for a batch buffer to complete that
has not even been sent to the hardware yet. Thus EAGAIN no longer
means a
From: John Harrison
The scheduler decouples the submission of batch buffers to the driver
from their subsequent submission to the hardware. This means that an
application which is continuously submitting buffers as fast as it can
could potentially flood the driver. To prevent this, the driver now
From: John Harrison
The scheduler can cause batch buffers, and hence requests, to be
submitted to the ring out of order and asynchronously to their
submission to the driver. Thus at the point of waiting for the
completion of a given request, it is not even guaranteed that the
request has actually
From: John Harrison
When the watchdog resets the GPU, all interrupts get disabled despite
the reference count remaining. As the scheduler probably had
interrupts enabled during the reset (it would have been waiting for
the bad batch to complete), it must be poked to tell it that the
interrupt has
From: John Harrison
When there are lots and lots and even more lots of contexts (e.g. when
running with execlists) it is useful to be able to immediately see
what the total context count is.
v4: Re-typed a variable (review feedback from Joonas)
For: VIZ-1587
Signed-off-by: John Harrison
Review
From: John Harrison
The seqno value is now only used for the final test for completion of
a request. It is no longer used to track the request through the
software stack. Thus it is no longer necessary to allocate the seqno
immediately with the request. Instead, it can be done lazily and left
unt
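A minimal sketch of the lazy seqno assignment this describes, with hypothetical names: the request is created without a seqno and only receives one at the point it is handed to the hardware, so scheduler reordering cannot leave stale or out-of-order values attached to queued requests.

#include <stdint.h>

struct sketch_request {
	uint32_t seqno;		/* 0 means "not yet assigned" */
};

struct sketch_engine {
	uint32_t next_seqno;
};

static void sketch_request_init(struct sketch_request *req)
{
	req->seqno = 0;		/* no seqno until hardware submission */
}

/* Called only when the scheduler finally submits the batch to the ring. */
static uint32_t sketch_request_assign_seqno(struct sketch_engine *engine,
					    struct sketch_request *req)
{
	if (!req->seqno)
		req->seqno = ++engine->next_seqno;
	return req->seqno;
}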
From: Dave Gordon
Keep a local copy of the request pointer in the _final() functions
rather than dereferencing the params block repeatedly.
v3: New patch in series.
v6: Updated to newer nightly (lots of ring -> engine renaming).
For: VIZ-1587
Signed-off-by: Dave Gordon
Signed-off-by: John Har
From: John Harrison
When requesting that all GPU work is completed, it is now necessary to
get the scheduler involved in order to flush out work that is queued but
not yet submitted.
v2: Updated to add support for flushing the scheduler queue by time
stamp rather than just doing a blanket flush.
v
From: John Harrison
The scheduler decouples the submission of batch buffers to the driver
from their submission to the hardware. Thus it is possible
for an application to close its DRM file handle while there is still
work outstanding. That means the scheduler needs to know about file
From: John Harrison
One of the major purposes of the GPU scheduler is to avoid stalling
the CPU when the GPU is busy and unable to accept more work. This
change adds support to the ring submission code to allow a ring space
check to be performed before attempting to submit a batch buffer to
the h
From: John Harrison
There are various parameters within the scheduler which can be tuned
to improve performance, reduce memory footprint, etc. This change adds
support for altering these via debugfs.
v2: Updated for priorities now being signed values.
v5: Squashed priority bumping entries into
From: John Harrison
The scheduler keeps its own lock on various DRM objects in order to
guarantee safe access long after the original execbuff IOCTL has
completed. This is especially important when pre-emption is enabled as
the batch buffer might need to be submitted to the hardware multiple
time
From: John Harrison
The scheduler needs to do interrupt triggered work that is too complex
to do in the interrupt handler. Thus it requires a deferred work
handler to process such tasks asynchronously.
v2: Updated to reduce mutex lock usage. The lock is now only held for
the minimum time within
From: John Harrison
The TDR code needs to know what the scheduler is up to in order to
work out whether a ring is really hung or not.
v4: Removed some unnecessary braces to keep the style checker happy.
v5: Removed white space and added documentation. [Joonas Lahtinen]
Also updated for new mod
From: John Harrison
When the seqno wraps around zero, the entire GPU is forced to be idle
for some reason (possibly only to work around issues with hardware
semaphores but no-one seems too sure!). This causes a problem if the
force idle occurs at an inopportune moment such as in the middle of
sub
From: John Harrison
Hardware semaphores require seqno values to be continuously
incrementing. However, the scheduler's reordering of batch buffers
means that the seqno values going through the hardware could be out of
order. Thus semaphores cannot be used.
On the other hand, the scheduler super
From: John Harrison
It is useful to know what the scheduler is doing, both for debugging
and performance analysis purposes. This change adds a bunch of
counters and such that keep track of various scheduler operations
(batches submitted, completed, flush requests, etc.). The data can
then be read
From: John Harrison
The seqno value cannot always be used when debugging issues via trace
points. This is because it can be reset back to start, especially
during TDR type tests. Also, when the scheduler arrives the seqno is
only valid while a given request is executing on the hardware. While
the
From: John Harrison
Implemented a batch buffer submission scheduler for the i915 DRM driver.
The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code assigns a unique seqno
value and then packages up all the information required to execute the
From: John Harrison
Initial creation of scheduler source files. Note that this patch
implements most of the scheduler functionality but does not hook it in
to the driver yet. It also leaves the scheduler code in 'pass through'
mode so that even when it is hooked in, it will not actually do very
m
From: John Harrison
The scheduler decouples the submission of batch buffers to the driver
from their submission to the hardware. This basically means splitting
the execbuffer() function in half. This change rearranges some code
ready for the split to occur.
v5: Dropped runtime PM calls as they c
From: John Harrison
The scheduler needs to know when requests have completed so that it
can keep its own internal state up to date and can submit new requests
to the hardware from its queue.
v2: Updated due to changes in request handling. The operation is now
reversed from before. Rather than th
From: John Harrison
The scheduler has always tracked batch buffer dependencies based on
DRM object usage. This means that it will not submit a batch on one
ring that has outstanding dependencies still executing on other rings.
This is exactly the same synchronisation performed by
i915_gem_object_
From: John Harrison
Added a facility for triggering the scheduler state dump via a debugfs
entry.
v2: New patch in series.
v6: Updated to newer nightly (lots of ring -> engine renaming).
Updated to use 'to_i915()' instead of dev_private. Converted all enum
labels to uppercase. [review feedback
From: John Harrison
The GPU scheduler has added an execution priority level to the context
object. There is an IOCTL interface to allow user apps/libraries to
set this priority. This patch updates the context parameter IOCTL test
to include the new interface.
For: VIZ-1587
Signed-off-by: John Har
From: Derek Morton
---
lib/Makefile.sources | 2 +
lib/igt.h | 1 +
lib/igt_bb_factory.c | 391 +
lib/igt_bb_factory.h | 47 ++
tests/Makefile.sources | 1 +
tests/gem_scheduler.c | 421 +++
From: John Harrison
When debugging batch buffer submission issues, it is useful to be able
to see what the current state of the scheduler is. This change adds
functions for decoding the internal scheduler state and reporting it.
v3: Updated a debug message with the new state_str() function.
v4:
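The decode helper mentioned here is essentially an enum-to-string lookup; the sketch below shows the idea with illustrative state names (the state_str() name is from the description, but the states and strings are assumptions).

enum sketch_node_state {
	SKETCH_NODE_QUEUED,
	SKETCH_NODE_SUBMITTED,
	SKETCH_NODE_COMPLETE,
};

static const char *sketch_state_str(enum sketch_node_state state)
{
	switch (state) {
	case SKETCH_NODE_QUEUED:    return "queued";
	case SKETCH_NODE_SUBMITTED: return "submitted";
	case SKETCH_NODE_COMPLETE:  return "complete";
	default:                    return "unknown";
	}
}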
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
From: John Harrison
The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel dr
From: John Harrison
Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.
v3: Added the current ring seqno to the notify trace point.
v5: Line wrapping to keep the
From: John Harrison
There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more
From: John Harrison
The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.
For: VIZ-5190
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu
From: John Harrison
The notify function can be called many times without the seqno
changing. A large number of duplicates are to prevent races due to the
requirement of not enabling interrupts until requested. However, when
interrupts are enabled the IRQ handler can be called multiple times
withou
From: John Harrison
The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while
From: John Harrison
Split the execbuffer() function in half. The first half collects and
validates all the information required to process the batch buffer. It
also does all the object pinning, relocations, active list management,
etc - basically anything that must be done upfront before the IOCT
From: John Harrison
GPU page faults can now require scheduler operation in order to
complete. For example, in order to free up sufficient memory to handle
the fault the handler must wait for a batch buffer to complete that
has not even been sent to the hardware yet. Thus EAGAIN no longer
means a
From: John Harrison
When there are lots and lots and even more lots of contexts (e.g. when
running with execlists) it is useful to be able to immediately see
what the total context count is.
v4: Re-typed a variable (review feedback from Joonas)
For: VIZ-1587
Signed-off-by: John Harrison
Review
From: John Harrison
Implemented a batch buffer submission scheduler for the i915 DRM driver.
The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code assigns a unique seqno
value and then packages up all the information required to execute the
From: John Harrison
The scheduler needs to track interdependencies between batch buffers.
These are calculated by analysing the object lists of the buffers and
looking for commonality. The scheduler also needs to keep those
buffers locked long after the initial IOCTL call has returned to user
lan
From: John Harrison
The scheduler can cause batch buffers, and hence requests, to be
submitted to the ring out of order and asynchronously to their
submission to the driver. Thus at the point of waiting for the
completion of a given request, it is not even guaranteed that the
request has actually