Am 02.04.25 um 15:22 schrieb Michel Dänzer:
> On 2025-04-02 14:00, Philipp Stanner wrote:
>> On Wed, 2025-04-02 at 12:58 +0200, Michel Dänzer wrote:
>>> On 2025-04-02 12:46, Philipp Stanner wrote:
>>>> On Mon, 2025-03-31 at 21:16 +0100, Tvrtko Ursulin wrote:
>>>>> Round-robin being the non-default policy and unclear how much it
>>>>> is
>>>>> used,
>>>>> we can notice that it can be implemented using the FIFO data
>>>>> structures if
>>>>> we only invent a fake submit timestamp which is monotonically
>>>>> increasing
>>>>> inside drm_sched_rq instances.
>>>>>
>>>>> So instead of remembering which was the last entity the scheduler
>>>>> worker
>>>>> picked, we can bump the picked one to the bottom of the tree,
>>>>> achieving
>>>>> the same round-robin behaviour.
>>>>>
>>>>> Advantage is that we can consolidate to a single code path and
>>>>> remove
>>>>> a
>>>>> bunch of code. Downside is round-robin mode now needs to lock on
>>>>> the
>>>>> job
>>>>> pop path but that should not be visible.
>>>> Why did you decide to do it that way and then later remove RR &
>>>> FIFO
>>>> alltogether in patch 10, basically?
>>>>
>>>> I think the far cleaner way for our development-process would be a
>>>> separate patch(-series) that *removes* RR completely. Advantages
>>>> are:
>>>>
>>>>    1. It should be relatively easy to do
>>>>    2. It would simplify the existing code base independently of
>>>> what
>>>>       happens with your RFC series here
>>>>    3. Before changing everyone's scheduling policy to a completely
>>>> new,
>>>>       deadline-based one, we could first be sure for a few release
>>>>       cycles that everyone is now on FIFO, establishing common
>>>> ground.
>>>>    4. We could CC every- and anyone who might use RR or might know
>>>>       someone who does
>>>>    5. If it turns out we screwed up and someone really relies on
>>>> RR, it
>>>>       would be easy to revert.
>>>>
>>>> I am not aware of any RR users and have, in past discussions, never
>>>> heard of any. So removing it is more tempting for the above
>>>> reasons.
>>> https://gitlab.freedesktop.org/drm/amd/-/issues/2516 has a bunch of
>>> RR users...
>> Right, there's a number of people complaining about the regression. But
>> what I'm interested in is: how did it evolve since then. Are there
>> distributions who set the module parameter? Does Steam do it? Or is it
>> individual users who work around the problem that way?
> I know only of the latter.
>> https://gitlab.freedesktop.org/drm/amd/-/issues/2516#note_2679509

Tvrtko, you should probably prepare a branch on fdo and let the people on that 
bug report test it.

>>
>> ^ this comment for example seems to indicate that on newer Wayland
>> versions part of the problem has vanished?
> That's about using the Wine wayland driver (which uses the Wayland protocol 
> directly) instead of the x11 driver (which uses the X11 protocol via 
> Xwayland). Xwayland not being involved can avoid at least some of the issues 
> (in particular, the scenario I described in 
> https://gitlab.freedesktop.org/drm/amd/-/issues/2516#note_2119750 can't 
> happen then). That doesn't solve the issues when Xwayland is involved though, 
> just avoids them.
>

>From what I remember the bug condition is that the display server submits 
>works after the client has started rendering the next frame. With RR that 
>resulted in the display servers work to be picked first while with FIFO the 
>clients works would be picked first.

Arguable that RR picked the display server was just coincident while with 
Tvrtko's work the fair queuing should make that perfectly intentional.

So this patch set here should (in theory) help a lot with that bug report and 
finally make RR superfluous.

Christian.

Reply via email to