On Thu, Mar 05, 2026 at 10:47:32AM +0100, Philipp Stanner wrote:

Off the list... I don't think airing our personal attacks publicly is a
good look. I'm going to be blunt here in an effort to help you.
> On Thu, 2026-03-05 at 01:10 -0800, Matthew Brost wrote:
> > On Thu, Mar 05, 2026 at 09:38:16AM +0100, Philipp Stanner wrote:
> > > On Thu, 2026-03-05 at 09:27 +0100, Boris Brezillon wrote:
> > > >
> > > > [...]
> > > >
> > > > Honestly, I'm not thrilled by this fast-path/call-run_job-directly
> > > > idea you're describing. There's just so many things we can forget
> > > > that would lead to races/ordering issues that will end up being
> > > > hard to trigger and debug.
> > >
> > > +1
> > >
> > > I'm not thrilled either. More like the opposite of thrilled,
> > > actually.
> > >
> > > Even if we could get that to work, this is more of a
> > > maintainability issue.
> > >
> > > The scheduler is full of insane performance hacks for this or that
> > > driver. Lockless accesses, a special lockless queue only used by
> > > that one party in the kernel (a lockless queue which is nowadays,
> > > after N reworks, being used with a lock. Ah well).
> >
> > This is not relevant to this discussion - see below. In general, I
> > agree that the lockless tricks in the scheduler are not great, nor is
> > the fact that the scheduler became a dumping ground for
> > driver-specific features. But again, that is not what we're talking
> > about here - see below.
> >
> > > In past discussions, Danilo and I made it clear that more major
> > > features in _new_ patch series aimed at getting merged into
> > > drm/sched must be preceded by cleanup work to address some of the
> > > scheduler's major problems.
> >
> > Ah, we've moved to dictatorship quickly. Noted.
>
> I prefer the term "benevolent presidency" /s
>
> Or even better: s/dictatorship/accountability enforcement.
It's very hard to take this seriously when I reply to threads saying
something breaks dma-fence rules and the response is, "what are
dma-fence rules?" Or I read through the jobqueue thread and see you
asking why a dma-fence would come from anywhere other than your own
driver - that's the entire point of dma-fence; it's a cross-driver
contract. I could go on, but I'd encourage you to take a hard look at
your understanding of DRM, and whether your responses - to me and to
others - are backed by the necessary technical knowledge.

Better still - what first annoyed me was your XDC presentation. You
gave an example of my driver modifying the pending list without a lock
while scheduling was stopped, and claimed you fixed a bug. That was not
a bug - Xe would explode if it were, as we test our code. The pending
list can be modified without a lock if scheduling is stopped. I almost
grabbed the mic to correct you. Yes, it's a layering violation, but
presenting it as a bug shows a clear lack of understanding.

> How does it come that everyone is here and ready so quickly when it
> comes to new use cases and features, yet I never saw anyone except for
> Tvrtko and Maíra investing even 15 minutes to write a simple patch to
> address some of the *various* significant issues in that code base?

I've suggested ideas to fix DRM sched (refcounting, clear teardown
flows), but they were immediately met with resistance - typically from
Christian, with you agreeing. My willingness to fight with Christian is
low; I really don't need another person to argue with.

> You were on CC on all discussions we've had here for the last years
> afair, but I rarely saw you participate. And you know what it's like:

I'll admit I'm busy with many other things, so my bandwidth is limited.
But again, if I chime in and explain how I solved something in Xe
(e.g., refcounting) and it's met with resistance, I'll likely move on -
I've already solved it, and I'll just let you fail (see cancel_job).
> who doesn't speak up silently agrees in open source.
>
> But tell me one thing, if you can be so kind:

I'm glad you asked this - it inspired me to fix it. More below [1].

> What is your theory why drm/sched came to be in such horrible shape?

drm/sched was ported from AMDGPU into common code. It carried many
AMDGPU-specific hacks, had no object-lifetime model thought out for a
common component, and included teardown nightmares that "worked" but
that other drivers immediately had to work around. With Christian
involved - who is notoriously hostile - everyone did their best to
paper over issues driver-side rather than get into fights and fix
things properly. Asahi Linux publicly aired grievances about this
situation years ago.

> What circumstances, what human behavioral patterns have caused this?

See above.

> The DRM subsystem has a bad reputation regarding stability among Linux
> users, as far as I have sensed. How can we do better?

Write sane code and test it. FWIW, Google shared a doc with me
indicating that Xe has unprecedented stability, and to be honest, when
I first wrote Xe I barely knew what I was doing - but I did know how to
test. I've since cleaned up most of my mistakes, though.

So how can we do better? We can [1]. I started on [1] after you asked
what the problems in DRM sched are, which got me thinking about what it
would look like if we took the good parts (stop/start control plane,
dependency tracking, ordering, finished fences, etc.), dropped the bad
parts (no object-lifetime model, no refcounting, overly complex queue
teardown, messy fence manipulation, hardware-scheduling baggage, lack
of annotations, etc.), and wrote something that addresses all of these
problems from the start, specifically for firmware-scheduling models.
It turned out pretty well. The main patch is [2]. Xe is fully
converted, tested, and working. AMDXDNA and Panthor are compiling.
Nouveau and PVR seem like good candidates to convert as well.
Rust bindings are also possible given the clear object model with
refcounting and well-defined object lifetimes. Thinking further, it
should be possible to implement hardware schedulers on top of this by
embedding the objects in [2] and layering a backend/API on top.

Let me know if you have any feedback (off-list) before I share this
publicly. So far, Dave, Sima, Danilo, and the other Xe maintainers have
been looped in.

Matt

[1] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/tree/local_dev/new_scheduler.post?ref_type=heads
[2] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/0538a3bc2a3b562dc0427a5922958189e0be8271

> > I can't say I agree with either of you here.
> >
> > In about an hour, I seemingly have a bypass path working in DRM
> > sched + Xe, and my diff is:
> >
> > 108 insertions(+), 31 deletions(-)
>
> LOC is a bad metric for complexity.
>
> > About 40 lines of the insertions are kernel-doc, so I'm not buying
> > that this is a maintenance issue or a major feature - it is
> > literally a single new function.
> >
> > I understand a bypass path can create issues - for example, on
> > certain queues in Xe I definitely can't use the bypass path, so Xe
> > simply wouldn't use it in those cases. This is the driver's choice
> > to use or not. If a driver doesn't know how to use the scheduler,
> > well, that's on the driver. Providing a simple, documented function
> > as a fast path really isn't some crazy idea.
>
> We're effectively talking about a deviation from the default
> submission mechanism, and all of that seems to be desired for a
> luxury feature.
>
> Then you end up with two submission mechanisms, whose correctness in
> the future relies on someone remembering what the background was, why
> it was added, and what the rules are.
> The current scheduler rules are / were often not even documented, and
> sometimes even Christian took a few weeks to remember again why
> something had been added - and whether it can now be removed again or
> not.
>
> > The alternative - asking for RT workqueues or changing the design
> > to use kthread_worker - actually is.
> >
> > > That's especially true if it's features aimed at performance
> > > buffs.
> >
> > With the above mindset, I'm actually very confused why this series
> > [1] would even be considered as this order of magnitude greater in
> > complexity than my suggestion here.
> >
> > Matt
> >
> > [1] https://patchwork.freedesktop.org/series/159025/
>
> The discussions about Tvrtko's CFS series were precisely the point
> where Danilo brought up that after this can be merged, future rework
> of the scheduler must focus on addressing some of the pending
> fundamental issues.
>
> The background is that Tvrtko has worked on that series for well over
> a year already, it actually simplifies some things in the sense of
> removing unused code (obviously it's a complex series, no argument
> about that), and we agreed at XDC that this can be merged. So this is
> a question of fairness to the contributor.
>
> But at one point you have to finally draw a line. No one will ever
> address major scheduler issues unless we demand it. Even very
> experienced devs usually prefer to hack around the central design
> issues in their drivers instead of fixing the shared infrastructure.
>
> P.
