Re: [RFC 0/4] dma-fence: Deadline awareness

Michel Dänzer Thu, 29 Jul 2021 01:17:50 -0700

On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
> On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
>> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>> On 28.07.21 at 15:08, Michel Dänzer wrote:
>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>> On 27.07.21 at 17:37, Rob Clark wrote:
>>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>>>>>>> From: Rob Clark <robdcl...@chromium.org>
>>>>>>>>>>>
>>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>>>>>>>> which the waiter would like to see the fence signalled.
>>>>>>>>>>>
>>>>>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>>>>>>> finish the drm/msm part this week.
>>>>>>>>>>>
>>>>>>>>>>> Original description:
>>>>>>>>>>>
>>>>>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>>>>>>> trick the GPU into running at a lower frequency, when really we
>>>>>>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>>>>>>> in the first place.
>>>>>>>>>>>
>>>>>>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>>>>>>> via dma-fence for a couple of reasons:
>>>>>>>>>>>
>>>>>>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>>>>>>
>>>>>>>>>>> [1] https://patchwork.freedesktop.org/series/90331/
>>>>>>>>>> Unfortunately, none of these approaches will have the full intended 
>>>>>>>>>> effect once Wayland compositors start waiting for client buffers to 
>>>>>>>>>> become idle before using them for an output frame (to prevent output 
>>>>>>>>>> frames from getting delayed by client work). See 
>>>>>>>>>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880
>>>>>>>>>> (shameless plug :) for a proof of concept of this for mutter. The 
>>>>>>>>>> boost will only affect the compositor's own GPU work, not the client 
>>>>>>>>>> work (which means no effect at all for fullscreen apps where the 
>>>>>>>>>> compositor can scan out the client buffers directly).
>>>>>>>>>>
>>>>>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>>>>>>> I meant what I wrote: The compositor will wait for the next buffer to 
>>>>>>>> become idle, so there's no boost from this mechanism for the client 
>>>>>>>> drawing to that buffer. And since the compositor does no drawing of 
>>>>>>>> its own in this case, there's no boost from that either.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>>>>>>>> single layer is changing, not try to be clever and just push the
>>>>>>>>> update down to the kernel.
>>>>>>>> Even just for the fullscreen direct scanout case, that would require 
>>>>>>>> some kind of atomic KMS API extension to allow queuing multiple page 
>>>>>>>> flips for the same CRTC.
>>>>>>>>
>>>>>>>> For other cases, this would also require a mechanism to cancel a 
>>>>>>>> pending atomic commit, for when another surface update comes in before 
>>>>>>>> the compositor's deadline, which affects the previously single 
>>>>>>>> updating surface as well.
>>>>>>>>
>>>>>>> Well, in the end, there is more than one compositor out there.. and if
>>>>>>> some wayland compositors are going this route, they can also implement
>>>>>>> the same mechanism in userspace using the sysfs that devfreq exports.
>>>>>>>
>>>>>>> But it sounds simpler to me for the compositor to have a sort of "game
>>>>>>> mode" for fullscreen games.. I'm less worried about UI interactive
>>>>>>> workloads, boosting the GPU freq upon sudden activity after a period
>>>>>>> of inactivity seems to work reasonably well there.
>>>>>> At least AMD hardware is already capable of flipping frames on GPU 
>>>>>> events like finishing rendering (or uploading etc).
>>>>>>
>>>>>> By waiting in userspace on the CPU before sending the frame to the 
>>>>>> hardware you are completely killing off such features.
>>>>>>
>>>>>> For composing use cases that makes sense, but certainly not for full 
>>>>>> screen applications as far as I can see.
>>>>> Even for fullscreen, the current KMS API only allows queuing a single 
>>>>> page flip per CRTC, with no way to cancel or otherwise modify it. 
>>>>> Therefore, a Wayland compositor has to set a deadline for the next 
>>>>> refresh cycle, and when the deadline passes, it has to select the best 
>>>>> buffer available for the fullscreen surface. To make sure the flip will 
>>>>> not miss the next refresh cycle, the compositor has to pick an idle 
>>>>> buffer. If it picks a non-idle buffer, and the pending rendering does not 
>>>>> finish in time for vertical blank, the flip will be delayed by at least 
>>>>> one refresh cycle, which results in visible stuttering.
>>>>>
>>>>> (Until the deadline passes, the Wayland compositor can't even know if a 
>>>>> previously fullscreen surface will still be fullscreen for the next 
>>>>> refresh cycle)
>>>>
>>>> Well then let's extend the KMS API instead of hacking together workarounds 
>>>> in userspace.
>>>
>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>
>>> Not for the general compositing case though, since a compositor does not 
>>> want to composite multiple output frames per display refresh cycle, so it 
>>> has to make sure the one frame hits the target.
>>
>> I think solving the fullscreen game case is sufficient enough forward
>> progress to be useful.  And the results I'm seeing[1] are sufficiently
>> positive to convince me that dma-fence deadline support is the right
>> thing to do.


I'm not questioning that this approach helps when there's a direct chain of 
fences from the client to the page flip. I'm pointing out there will not always 
be such a chain.
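
To illustrate why the chain can be missing, here is a rough sketch of the
selection policy described in the quoted text above: when its deadline passes,
the compositor picks the newest client buffer that is already idle. This is
only an illustration, not mutter's actual code; ClientBuffer, its fields and
pick_buffer_at_deadline are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ClientBuffer:
    seq: int    # commit order, higher is newer (hypothetical field)
    idle: bool  # True once the client's GPU work on this buffer has finished


def pick_buffer_at_deadline(buffers: Sequence[ClientBuffer]) -> Optional[ClientBuffer]:
    """Return the newest buffer that is already idle, or None if there is none."""
    idle = [b for b in buffers if b.idle]
    return max(idle, key=lambda b: b.seq) if idle else None
```

The buffer handed to the page flip is idle by construction, so its fences have
already signalled. A deadline set on the flip's fences therefore never reaches
the client work still rendering into a newer buffer.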


>> But maybe the solution to make this also useful for mutter

It's not just mutter BTW. I understand gamescope has been doing this for some 
time already. And there seems to be consensus among developers of Wayland 
compositors that this is needed, so I expect at least all the major compositors 
to do this longer term.


>> is to, once we have deadline support, extend it with an ioctl to the
>> dma-fence fd so userspace can be the one setting the deadline.

I was thinking in a similar direction.

> atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the
> option to bail out with an old frame if it's too late?

This is a bit cryptic though, can you elaborate?


> Also mutter would need to supply the deadline, because we need to fit the
> rendering in still before the actual flip. So gets a bit quirky maybe ...

That should be fine. mutter is already keeping track of how long its rendering 
takes.
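
As a rough sketch of what supplying the deadline could look like: the
compositor would subtract its own estimated rendering time from the next
vblank time and ask for client fences to signal by then. The function names,
the safety margin and the "slowest recent frame" estimate below are
assumptions made for illustration, not an existing mutter or kernel interface.

```python
def estimate_render_ns(recent_render_ns):
    """Conservative estimate (an assumption): the slowest recent compositor frame."""
    return max(recent_render_ns)


def client_fence_deadline_ns(next_vblank_ns, render_estimate_ns, margin_ns=0):
    """Latest time client work may finish and still leave room for compositing."""
    return next_vblank_ns - render_estimate_ns - margin_ns


# Example: 60 Hz refresh, three recent compositor frame times, 0.5 ms margin.
recent = [1_500_000, 2_000_000, 1_800_000]
deadline = client_fence_deadline_ns(16_666_667, estimate_render_ns(recent),
                                    margin_ns=500_000)
```

All values are in nanoseconds relative to the start of the refresh cycle; a
real implementation would use absolute timestamps from a monotonic clock.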


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
