On Wed, 2025-06-18 at 11:47 -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and
> can also re-arm the timeout timer in some scenarios. Instead of
> manipulating scheduler's internals, inform the scheduler that the job
> did not actually time out an
On Thu, 2025-06-05 at 15:41 +0200, Philipp Stanner wrote:
> Since the drm_mock_scheduler does not have real users in userspace,
> nor does it have real hardware or firmware rings, it's not necessary
> to signal timedout fences nor free jobs - from a functional
> standpoint.
On Mon, 2025-06-16 at 09:49 -0300, Maíra Canal wrote:
> Hi Danilo,
>
> On 16/06/25 08:14, Danilo Krummrich wrote:
> > On Mon, Jun 16, 2025 at 11:57:47AM +0100, Tvrtko Ursulin wrote:
> > > Code looks fine, but currently nothing is broken and I disagree
> > > with the goal that the _mock_^1 co
On Mon, 2025-06-16 at 10:27 +0100, Tvrtko Ursulin wrote:
>
> On 12/06/2025 15:20, Philipp Stanner wrote:
> > On Thu, 2025-06-12 at 15:17 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 03/06/2025 10:31, Philipp Stanner wrote:
> > > > Since its inception
On Fri, 2025-06-13 at 10:23 +0200, Christian König wrote:
> On 6/13/25 01:48, Danilo Krummrich wrote:
> > On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
> > > On 6/11/25 17:11, Danilo Krummrich wrote:
> > > > > > > Mhm, reiterating our internal discussion on the mailing
> > > > >
about pitfalls.
Co-authored-by: Danilo Krummrich
Signed-off-by: Philipp Stanner
---
Changes in v2:
- Add new docu section for concurrency in the scheduler. (Sima)
- Document what an ordered workqueue passed to the scheduler can be
useful for. (Christian, Sima)
- Warn in more detail about pote
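The v2 note above about ordered workqueues is worth a concrete
illustration. Below is a minimal, hypothetical sketch (not part of the
patch) of a driver handing the scheduler an ordered workqueue via
struct drm_sched_init_args, so that run_job() and free_job() work items
can never execute concurrently; everything named "my_*" and the credit
and timeout values are assumptions.

/*
 * Assumed sketch: an ordered workqueue passed as submit_wq serializes
 * all scheduler work items of this scheduler instance.
 */
#include <drm/gpu_scheduler.h>
#include <linux/workqueue.h>

static int my_sched_init(struct drm_gpu_scheduler *sched,
                         const struct drm_sched_backend_ops *ops,
                         struct device *dev)
{
        struct drm_sched_init_args args = { };
        struct workqueue_struct *wq;

        /* Ordered: at most one work item of this wq runs at any time. */
        wq = alloc_ordered_workqueue("my-sched-wq", 0);
        if (!wq)
                return -ENOMEM;

        args.ops = ops;
        args.submit_wq = wq;
        args.num_rqs = DRM_SCHED_PRIORITY_COUNT;
        args.credit_limit = 32;
        args.timeout = HZ;
        args.name = "my-sched";
        args.dev = dev;

        return drm_sched_init(sched, &args);
}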
On Thu, 2025-06-12 at 15:17 +0100, Tvrtko Ursulin wrote:
>
> On 03/06/2025 10:31, Philipp Stanner wrote:
> > Since its inception, the GPU scheduler can leak memory if the
> > driver calls drm_sched_fini() while there are still jobs in flight.
> >
> >
r new scheduler users. Therefore, they should approximate the
canonical usage as much as possible.
Make sure timed out hardware fences get signaled with the appropriate
error code.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 26 ++-
1
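For context, signaling a timed-out hardware fence with an error
generally looks like the sketch below. This is an assumed illustration
of the principle, not the mock-scheduler diff (which is truncated
above); struct my_job, its fields, and the helper names are invented,
and the status enum names follow the series discussed elsewhere in this
thread.

#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

struct my_job {
        struct drm_sched_job base;
        struct dma_fence *hw_fence;     /* assumed driver-owned fence */
};

static enum drm_gpu_sched_stat
my_timedout_job(struct drm_sched_job *sched_job)
{
        struct my_job *job = container_of(sched_job, struct my_job, base);

        /* Set the error before signaling; waiters then see -ETIMEDOUT. */
        dma_fence_set_error(job->hw_fence, -ETIMEDOUT);
        dma_fence_signal(job->hw_fence);

        return DRM_GPU_SCHED_STAT_RESET;
}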
On Wed, 2025-06-04 at 17:07 +0200, Simona Vetter wrote:
> On Wed, Jun 04, 2025 at 11:41:25AM +0200, Christian König wrote:
> > On 6/4/25 10:16, Philipp Stanner wrote:
> > > struct drm_sched_init_args provides the possibility of letting
> > > the sche
n the documentation.
Suggested-by: Danilo Krummrich
Signed-off-by: Philipp Stanner
---
include/drm/gpu_scheduler.h | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 81dcbfc8c223..11740d745223 100644
--- a/includ
On Tue, 2025-06-03 at 13:27 +0100, Tvrtko Ursulin wrote:
>
> On 03/06/2025 10:31, Philipp Stanner wrote:
> > An alternative version to [1], based on Tvrtko's suggestion from
> > [2].
> >
> > I tested this for Nouveau. Works.
> >
> > I'm having
nouveau_sched_fence_context_kill() the waitque is not necessary anymore.
Remove the waitque.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++-
drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++--
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8
3 files
the hardware fence associated with the
job. Afterwards, the scheduler can safely use the established free_job()
callback for freeing the job.
callback for freeing the job.
Implement the new backend_ops callback cancel_job().
Suggested-by: Tvrtko Ursulin
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler
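The shape of such a cancel_job() implementation, as a hedged sketch
rather than the actual mock-scheduler code: struct my_job and
my_hw_stop() are assumptions standing in for driver specifics.

/*
 * Sketch of the new backend_ops callback: stop the job on the
 * hardware, then signal its fence with -ECANCELED so drm_sched_fini()
 * can free the job through the regular free_job() path without leaking.
 */
#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

struct my_job {
        struct drm_sched_job base;
        struct dma_fence *hw_fence;     /* assumed driver-owned fence */
};

static void my_cancel_job(struct drm_sched_job *sched_job)
{
        struct my_job *job = container_of(sched_job, struct my_job, base);

        my_hw_stop(job);                /* assumed driver helper */
        dma_fence_set_error(job->hw_fence, -ECANCELED);
        dma_fence_signal(job->hw_fence);
}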
drm_sched_fini() can leak jobs under certain circumstances.
Warn if that happens.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
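Conceptually, the added warning boils down to a check like the
following sketch; the exact message and log level are assumptions,
since the diff above is truncated.

#include <linux/device.h>
#include <linux/list.h>
#include <drm/gpu_scheduler.h>

/* Jobs still on the pending_list at fini time can never be freed
 * anymore: nobody will call free_job() for them after teardown. */
static void warn_if_jobs_pending(struct drm_gpu_scheduler *sched)
{
        if (!list_empty(&sched->pending_list))
                dev_warn(sched->dev,
                         "Tearing down scheduler while jobs are pending!\n");
}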
hardware fence.
That should be repaired and cleaned up, but it's probably better to do
that in a separate series.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 71 +++
drivers/gpu/drm/scheduler/tests/sched_tests.h | 4 +-
2 files change
There is a new callback for always tearing the scheduler down in a
leak-free, deadlock-free manner.
Port Nouveau as its first user by providing the scheduler with a
callback that ensures the fence context gets killed in drm_sched_fini().
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm
: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++-
drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index
ps://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursu...@igalia.com/
Philipp Stanner (6):
drm/sched: Avoid memory leaks with cancel_job() callback
drm/sched/tests: Implement cancel_job()
drm/sched: Warn if pending list is not empty
drm/nouveau: Make fence container helper usable driver-wide
es: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is
> still active")
Could also contain a "Closes: " with the link to the appropriate
message from thread [1] below.
You might also include "Reported-by: Philipp" since I technically first
describ
On Mon, 2025-06-02 at 08:36 -0300, Maíra Canal wrote:
> Hi Philipp,
>
> On 02/06/25 04:28, Philipp Stanner wrote:
> > On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
>
> [...]
>
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > >
On Tue, 2025-05-27 at 12:10 +0200, Philipp Stanner wrote:
> There is no need for separate locks for single jobs and the entire
> scheduler. The dma_fence context can be protected by the scheduler
> lock, allowing for removing the jobs' locks. This simplifies things
> and re
I'd call that patch sth like "Make timeout unit tests faster". That
makes it more obvious what it's about.
P.
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> As more KUnit tests are introduced to evaluate the basic capabilities
> of the `timedout_job()` hook, the test suite will continue to inc
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and
> can also re-arm the timeout timer in some scenarios. Instead of using
> the scheduler internals to add the job to the pending list, use the
> DRM_GPU_SCHED_STAT_NO_HANG
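The DRM_GPU_SCHED_STAT_NO_HANG flow, as a hedged sketch: the progress
check and both helpers are invented placeholders, and Xe's real TDR
logic differs. The point is only who re-arms the timer.

/*
 * If the timeout turns out to be spurious, returning
 * DRM_GPU_SCHED_STAT_NO_HANG tells the scheduler to put the job back
 * on the pending list and re-arm the timeout timer itself, instead of
 * the driver poking at scheduler internals.
 */
#include <drm/gpu_scheduler.h>

static enum drm_gpu_sched_stat
skip_reset_timedout_job(struct drm_sched_job *sched_job)
{
        if (my_gpu_made_progress(sched_job))    /* assumed helper */
                return DRM_GPU_SCHED_STAT_NO_HANG;

        my_gpu_reset(sched_job);                /* assumed helper */
        return DRM_GPU_SCHED_STAT_RESET;
}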
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> Etnaviv can skip a hardware reset in two situations:
>
> 1. TDR has fired before the free-job worker and the timeout is
>    spurious.
> 2. The GPU is still making progress on the front-end and we can give
>    the job a chance to comple
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> When a CL/CSD job times out, we check if the GPU has made any
> progress since the last timeout. If so, instead of resetting the
> hardware, we skip the reset and allow the timer to be rearmed. This
> gives long-running jobs a chance to
Hi,
Thx for the update. Seems to be developing nicely. Some comments below.
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't
> hung; instead, a job may still be running, and there may be no valid
> reason to reset the h
On Mon, 2025-05-26 at 14:54 +0200, Pierre-Eric Pelloux-Prayer wrote:
> Hi,
>
> The initial goal of this series was to improve the drm and amdgpu
> trace events to be able to expose more of the inner workings of
> the scheduler and drivers to developers via tools.
>
> Then, the series evolved to b
scheduler lock.
Signed-off-by: Philipp Stanner
---
Changes in v2:
- Make commit message more neutral by stating it's about simplifying
the code. (Tvrtko)
---
drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++---
drivers/gpu/drm/scheduler/tests/sched_tests.h | 1 -
2 files change
On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> On 5/26/25 11:34, Philipp Stanner wrote:
> > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > On Fri, May 23, 2025 at 04:11:39PM +0200,
On Fri, 2025-05-23 at 14:56 +0200, Christian König wrote:
> It turned out that we can actually massively optimize here.
>
> The previous code was horribly inefficient since it constantly
> released and re-acquired the lock of the xarray and started each
> iteration from the base of the array t
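For reference, the difference is roughly the one sketched below:
walking the xarray once under its lock with an XA_STATE cursor instead
of re-starting the lookup from index 0 on every step. A generic sketch,
not Christian's actual patch.

#include <linux/xarray.h>

static void process_entries_once(struct xarray *xa)
{
        XA_STATE(xas, xa, 0);
        void *entry;

        xas_lock(&xas);
        xas_for_each(&xas, entry, ULONG_MAX) {
                /* Handle entry; the walk keeps its position and the
                 * lock, instead of relocking and rescanning from the
                 * base of the array each iteration. */
        }
        xas_unlock(&xas);
}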
On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> On 5/23/25 16:16, Danilo Krummrich wrote:
> > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > > > It turned out that we can actually massively opt
+Cc Matthew, again :)
On Thu, 2025-05-22 at 18:19 +0200, Christian König wrote:
> On 5/22/25 16:27, Tvrtko Ursulin wrote:
> >
> > On 22/05/2025 14:41, Christian König wrote:
> > > Since we already iterated over the xarray we know at which index
> > > the new entry should be stored. So inste
On Thu, 2025-05-22 at 14:37 +0100, Tvrtko Ursulin wrote:
>
> On 22/05/2025 09:27, Philipp Stanner wrote:
> > From: Philipp Stanner
> >
> > The GPU scheduler currently does not ensure that its pending_list
> > is empty before performing various other
On Thu, 2025-05-22 at 15:06 +0100, Tvrtko Ursulin wrote:
>
> On 22/05/2025 09:27, Philipp Stanner wrote:
> > The drm_gpu_scheduler now supports a callback to help
> > drm_sched_fini() avoid memory leaks. This callback instructs the
> > driver to signal a
On Wed, 2025-05-21 at 11:24 +0100, Tvrtko Ursulin wrote:
>
> On 21/05/2025 11:04, Philipp Stanner wrote:
> > When the unit tests were implemented, each scheduler job got its
> > own, distinct lock. This is not how dma_fence context locking rules
> > are t
On Thu, 2025-05-22 at 15:24 +0200, Christian König wrote:
> On 5/22/25 15:16, Philipp Stanner wrote:
> > On Thu, 2025-05-22 at 15:09 +0200, Christian König wrote:
> > > On 5/22/25 14:59, Danilo Krummrich wrote:
> > > > On Thu, May 22, 2025 at 02:34:33PM +0200,
On Thu, 2025-05-22 at 15:09 +0200, Christian König wrote:
> On 5/22/25 14:59, Danilo Krummrich wrote:
> > On Thu, May 22, 2025 at 02:34:33PM +0200, Christian König wrote:
> > > See all the functions inside include/linux/dma-fence.h can be
> > > used by everybody. It's basically the public interface
On Thu, 2025-05-22 at 14:34 +0200, Christian König wrote:
> On 5/22/25 14:20, Philipp Stanner wrote:
> > On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
> > > On 5/22/25 13:25, Philipp Stanner wrote:
> > > > dma_fence_is_signa
On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
> On 5/22/25 13:25, Philipp Stanner wrote:
> > dma_fence_is_signaled_locked(), which is used in
> > nouveau_fence_context_kill(), can signal fences below the surface
> > through a callback.
> >
> > The
ed. Use it internally.
Suggested-by: Tvrtko Ursulin
Signed-off-by: Philipp Stanner
---
include/linux/dma-fence.h | 24 ++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 48b5202c531d..ac951a54a007 10
which only checks, never signals.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index d5654e26d5bc..993b3dcb5db0
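The distinction the patch relies on, sketched below:
dma_fence_is_signaled_locked() can signal the fence as a side effect
(it polls the ->signaled() op), whereas a raw flag test only observes
state. The wrapper name is illustrative, not nouveau's.

#include <linux/dma-fence.h>

/* Pure observation: never invokes ->signaled(), never signals. */
static bool fence_done_check_only(struct dma_fence *fence)
{
        return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
}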
ovide users with a more
reliable, clean scheduler API.
Philipp
Philipp Stanner (5):
drm/sched: Fix teardown leaks with waitqueue
drm/sched/tests: Port tests to new cleanup method
drm/sched: Warn if pending list is not empty
drm/nouveau: Add new callback for scheduler teardown
drm/nouveau: Remove
a new error
field for the fence error.
Keep the job status as DRM_MOCK_SCHED_JOB_DONE for now, since there is
currently no code that would benefit from checking for a CANCELED
status.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 67
From: Philipp Stanner
The GPU scheduler currently does not ensure that its pending_list is
empty before performing various other teardown tasks in
drm_sched_fini().
If there are still jobs in the pending_list, this is problematic because
after scheduler teardown, no one will call
dma_fence rules, e.g., ensuring that only one fence gets
signaled at a time.
Use the fence context (scheduler) lock for the jobs.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++---
drivers/gpu/drm/scheduler/tests/sched_tests.h | 1 -
2 files changed
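What the shared-lock rule looks like in code, as an assumed sketch
rather than the mock scheduler's actual fields: every fence of one
dma_fence context is initialized with the same spinlock, here the
scheduler's.

#include <linux/dma-fence.h>
#include <linux/spinlock.h>

struct my_mock_sched {
        spinlock_t lock;        /* one lock for the whole fence context */
        u64 context;
        atomic64_t seqno;
};

static void my_mock_fence_init(struct my_mock_sched *s,
                               struct dma_fence *f,
                               const struct dma_fence_ops *ops)
{
        /* Sharing the scheduler lock serializes signaling across the
         * context, matching the dma_fence locking rules cited above. */
        dma_fence_init(f, ops, &s->lock, s->context,
                       atomic64_inc_return(&s->seqno));
}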
On Tue, 2025-05-20 at 17:15 +0100, Tvrtko Ursulin wrote:
>
> On 19/05/2025 10:04, Philipp Stanner wrote:
> > On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/05/2025 18:16, Philipp Stanner wrote:
> > > > On Fri, 2025-
On Mon, 2025-05-19 at 13:02 +0200, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 15/05/2025 à 08:53, Pierre-Eric Pelloux-Prayer a écrit :
> > Hi,
> >
> > Le 14/05/2025 à 14:44, Philipp Stanner a écrit :
> > > On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pell
On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote:
>
> On 16/05/2025 18:16, Philipp Stanner wrote:
> > On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/05/2025 14:38, Philipp Stanner wrote:
> > > > On Fri, 2025-
On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote:
>
> On 16/05/2025 14:38, Philipp Stanner wrote:
> > On Fri, 2025-05-16 at 13:10 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/05/2025 12:53, Tvrtko Ursulin wrote:
> > > >
> > > > On
On Fri, 2025-05-16 at 13:10 +0100, Tvrtko Ursulin wrote:
>
> On 16/05/2025 12:53, Tvrtko Ursulin wrote:
> >
> > On 16/05/2025 08:28, Philipp Stanner wrote:
> > > On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:
> > > >
> > &
On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote:
>
> On 24/04/2025 10:55, Philipp Stanner wrote:
> > The waitqueue that ensures that drm_sched_fini() blocks until the
> > pending_list has become empty could theoretically cause that
> > function to bl
that will never be resolved. Fix this issue by ensuring
> that scheduled fences are properly signaled when an entity is
> killed, allowing dependent applications to continue execution.
That sounds perfect, yes, Thx.
Reviewed-by: Philipp Stanner
P.
>
> Thanks,
>
On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:
>
> On 15/05/2025 16:00, Christian König wrote:
> > Sometimes drivers need to be able to submit multiple jobs which
> > depend on each other to different schedulers at the same time, but
> > using drm_sched_job_add_dependency() can't fai
Hello,
On Wed, 2025-05-14 at 09:59 -0700, Rob Clark wrote:
> From: Rob Clark
>
> Similar to the existing credit limit mechanism, but applying to jobs
> enqueued to the scheduler but not yet run.
>
> The use case is to put an upper bound on preallocated, and
> potentially unneeded, pgtable pag
ssue simply that the
fence might be dropped unsignaled, being a bug by definition? Needs to
be written down.
Grammar is also a bit too broken.
And running the unit tests before pushing is probably also a good idea.
> >
> > Signed-off-by: Lin.Cao
Acked-by: Philipp Stanner
>
> Revie
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> This commit adds a document section in drm-uapi.rst about
> tracepoints, and marks the events in gpu_scheduler_trace.h as stable
> uAPI.
>
> The goal is to explicitly state that tools can rely on the fields,
> formats and semantics
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> Its only purpose was for trace events, but jobs can already be
> uniquely identified using their fence.
>
> The downside of using the fence is that it's only available
> after 'drm_sched_job_arm' was called which is true for al
nit: title: s/gpu/GPU
We also mostly start with an upper case letter after the :, but JFYI,
it's not a big deal.
P.
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> We can't trace dependencies from drm_sched_job_add_dependency
> because when it's called the job's fence is
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> This will be used in a later commit to trace the drm client_id in
> some of the gpu_scheduler trace events.
>
> This requires changing all the users of drm_sched_job_init to
> add an extra parameter.
>
> The newly added drm_cl
On Wed, 2025-05-14 at 09:30 +0100, Tvrtko Ursulin wrote:
>
> On 12/05/2025 09:00, Philipp Stanner wrote:
> > On Thu, 2025-05-08 at 13:51 +0100, Tvrtko Ursulin wrote:
> > >
> > > Hi Philipp,
> > >
> > > On 08/05/2025 12:03, Philipp Stanner
-managed pcim_request_all_regions().
Signed-off-by: Philipp Stanner
Reviewed-by: Zack Rusin
---
Changes in v3:
- Use the correct driver name in the commit message. (Zack)
Changes in v2:
- Fix unused variable error.
---
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 14 +++---
1 file changed, 3
On Sat, 2025-05-03 at 17:59 -0300, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't
> hung; instead, a job may still be running, and there may be no valid
> reason to reset the hardware. This can occur in two situations:
>
> 1. The GPU exposes some mech
On Mon, 2025-05-12 at 16:09 +0200, Philipp Stanner wrote:
> On Mon, 2025-05-12 at 11:04 -0300, Maíra Canal wrote:
> > Hi Philipp,
> >
> > On 12/05/25 08:13, Philipp Stanner wrote:
> > > On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> > > >
On Mon, 2025-05-12 at 11:04 -0300, Maíra Canal wrote:
> Hi Philipp,
>
> On 12/05/25 08:13, Philipp Stanner wrote:
> > On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> > > On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > > > On S
o Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +++-
>
every
> popped job.
That there is no need to do so doesn't imply that you can't keep them
around. The commit message doesn't make the motivation clear.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> C
> completed jobs as soon as possible so the metric is most up to date
> when viewed from the submission side of things.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
he function.
Same here, that's a good candidate for a separate patch / series.
P.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 37 +++
heduling policy, not other general improvements.
P.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 39 +++-
> --
> 1
On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
> > > When the DRM scheduler times out, it's possible that the GPU
> > > isn't hung; instead, a job may sti
On Sat, 2025-05-03 at 17:59 -0300, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't
> hung; instead, a job may still be running, and there may be no valid
> reason to reset the hardware. This can occur in two situations:
>
> 1. The GPU exposes some mech
On Wed, 2025-05-07 at 13:50 +0100, Tvrtko Ursulin wrote:
>
> On 07/05/2025 13:33, Maíra Canal wrote:
> > Hi Tvrtko,
> >
> > Thanks for the review!
> >
> > On 06/05/25 08:32, Tvrtko Ursulin wrote:
> > >
> > > On 03/05/2025 21:59, Maíra Canal wrote:
> > > > When the DRM scheduler times out, it's
On Thu, 2025-05-08 at 13:51 +0100, Tvrtko Ursulin wrote:
>
> Hi Philipp,
>
> On 08/05/2025 12:03, Philipp Stanner wrote:
> > On Thu, 2025-04-24 at 11:55 +0200, Philipp Stanner wrote:
> > > The unit tests so far took care manually of avoiding memory leaks
> > >
Hi,
On Fri, 2025-05-09 at 14:29 -0700, Rob Clark wrote:
> From: Rob Clark
>
> The fence can outlive the sched, so it is not safe to dereference the
> sched in drm_sched_fence_get_timeline_name()
Thx for the fix. Looks correct to me. Some nits
>
> Signed-off-by: Rob Clark
This is clearly a b
On Thu, 2025-05-08 at 11:39 -0400, Zack Rusin wrote:
> On Thu, May 8, 2025 at 6:40 AM Philipp Stanner
> wrote:
> >
> > On Wed, 2025-04-23 at 14:06 +0200, Philipp Stanner wrote:
> > > vmgfx enables its PCI device with pcim_enable_device(). This,
> >
On Thu, 2025-05-08 at 12:44 +0200, Javier Martinez Canillas wrote:
> Philipp Stanner writes:
>
> Hello Philipp,
>
> > On Tue, 2025-04-22 at 23:51 +0200, Javier Martinez Canillas wrote:
> > > Philipp Stanner writes:
> > >
> > > Hello Philipp,
>
On Thu, 2025-04-24 at 11:55 +0200, Philipp Stanner wrote:
> The unit tests so far manually took care of avoiding memory leaks
> that might have occurred when calling drm_sched_fini().
>
> The scheduler now takes care by itself of avoiding memory leaks if
> the driver
On Wed, 2025-04-23 at 14:06 +0200, Philipp Stanner wrote:
> vmgfx enables its PCI device with pcim_enable_device(). This,
> implicitly, switches the function pci_request_regions() into managed
> mode, where it becomes a devres function.
>
> The PCI subsystem wants to remove thi
On Tue, 2025-04-22 at 23:51 +0200, Javier Martinez Canillas wrote:
> Philipp Stanner writes:
>
> Hello Philipp,
>
> > cirrus enables its PCI device with pcim_enable_device(). This,
> > implicitly, switches the function pci_request_regions() into
> > managed
>
On Mon, 2025-04-28 at 16:45 +0200, Christian König wrote:
> On 4/24/25 15:02, Philipp Stanner wrote:
> > In nouveau_fence_done(), a fence is checked for being signaled by
> > manually evaluating the base fence's bits. This can be done in a
> > canonical manner thr
>
> -Original Message-
> From: Koenig, Christian
> Sent: Tuesday, April 29, 2025 12:49 PM
> To: Khatri, Sunil ;
> dri-devel@lists.freedesktop.org; Danilo Krummrich ;
> Philipp Stanner
> Cc: Deucher, Alexander ; Tvrtko Ursulin
> ; Pelloux-Prayer, Pierre-Eric
>
nouveau_fence_signal() returns a de-facto boolean to indicate when
nvif_event_block() shall be called.
The code can be made more compact and readable by calling
nvif_event_block() in nouveau_fence_update() directly.
Make those calls in nouveau_fence.c more canonical.
Signed-off-by: Philipp
nouveau_fence.c iterates over lists in a non-canonical way. Since the
operations done are just basic for-each-loops and list-empty checks,
they should be written in the standard form.
Use standard list operations.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 21
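The "standard form" referred to is the kernel's
list_for_each_entry(_safe) family. A generic sketch follows; the fence
type and list-head names are assumptions, not nouveau's actual fields.

#include <linux/list.h>

struct my_fence {
        struct list_head head;
};

static void drain_fence_list(struct list_head *pending)
{
        struct my_fence *fence, *tmp;

        /* The _safe variant allows removing entries while iterating. */
        list_for_each_entry_safe(fence, tmp, pending, head) {
                list_del_init(&fence->head);
                /* ... signal and drop the fence ... */
        }
}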
On Thu, 2025-04-24 at 15:24 +0200, Danilo Krummrich wrote:
> On 4/24/25 3:02 PM, Philipp Stanner wrote:
> > In nouveau_fence_done(), a fence is checked for being signaled by
> > manually evaluating the base fence's bits. This can be done in a
> > canonical manner thr
nouveau_fence_done() contains an if branch that checks whether a
nouveau_fence has either of the two existing nouveau_fence backend ops,
which will always evaluate to true.
Remove the surplus check.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 24
In nouveau_fence_done(), a fence is checked for being signaled by
manually evaluating the base fence's bits. This can be done in a
canonical manner through dma_fence_is_signaled().
Replace the bit-check with dma_fence_is_signaled().
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/no
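The canonical check, for contrast with the manual bit evaluation the
patch removes (my_fence_done() is an illustrative wrapper):
dma_fence_is_signaled() additionally polls the fence's ->signaled() op
and signals the fence if it reports completion, so it is the right
helper wherever signaling as a side effect is acceptable.

#include <linux/dma-fence.h>

static bool my_fence_done(struct dma_fence *fence)
{
        /* Canonical: checks the flag *and* asks the backend via
         * ->signaled(), signaling the fence if it completed. */
        return dma_fence_is_signaled(fence);
}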
/
Philipp Stanner (4):
drm/nouveau: nouveau_fence: Standardize list iterations
drm/nouveau: Simplify calls to nvif_event_block()
drm/nouveau: Simplify nouveau_fence_done()
drm/nouveau: Check dma_fence in canonical way
drivers/gpu/drm/nouveau/nouveau_fence.c | 72 +++--
1 file
the unit tests. Remove the manual cleanup
code.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 34 ---
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
b/drivers/gpu/drm/scheduler
callback is not implemented.
Suggested-by: Danilo Krummrich
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler/sched_main.c | 47 +-
include/drm/gpu_scheduler.h| 11 ++
2 files changed, 42 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu
ks fine and
solves the problem (though we did discover an unrelated problem inside
Nouveau in the process).
It also works with the unit tests.
I'm looking forward to your input and feedback. I really hope we can
work this RFC into something that can provide users with a more
reliable, clean
-managed pcim_request_all_regions().
Signed-off-by: Philipp Stanner
---
Changes in v2:
- Fix unused variable error.
---
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 14 +++---
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
b/drivers/gpu/drm
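The conversion pattern described in these vmwgfx/cirrus patches, as a
condensed sketch under assumed names (my_probe, "my-driver"): enable
the device managed, then claim all BARs with the always-managed helper
instead of relying on pcim_enable_device() silently switching
pci_request_regions() into managed mode.

#include <linux/pci.h>

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int ret;

        ret = pcim_enable_device(pdev);         /* managed enable */
        if (ret)
                return ret;

        /* Always managed, no hidden devres mode switch. */
        return pcim_request_all_regions(pdev, "my-driver");
}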
On Tue, 2025-04-22 at 16:08 +0200, Danilo Krummrich wrote:
> On Tue, Apr 22, 2025 at 02:39:21PM +0100, Tvrtko Ursulin wrote:
> >
> > On 22/04/2025 13:32, Danilo Krummrich wrote:
> > > On Tue, Apr 22, 2025 at 01:07:47PM +0100, Tvrtko Ursulin wrote:
> > > >
> > > > On 22/04/2025 12:13, Danilo Krumm