On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
> On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
>> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>> On 28.07.21 at 15:08, Michel Dänzer wrote:
>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>> On 27.07.21 at 17:37, Rob Clark wrote:
>>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>>>>>>> From: Rob Clark <robdcl...@chromium.org>
>>>>>>>>>>>
>>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost"
>>>>>>>>>>> mechanism when, for example, vblank deadlines are missed. Instead
>>>>>>>>>>> of a boost callback, this approach adds a way to set a deadline on
>>>>>>>>>>> the fence, by which the waiter would like to see the fence
>>>>>>>>>>> signalled.
>>>>>>>>>>>
>>>>>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>>>>>>> finish the drm/msm part this week.
>>>>>>>>>>>
>>>>>>>>>>> Original description:
>>>>>>>>>>>
>>>>>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>>>>>>> trick the GPU into running at a lower frequency, when really we
>>>>>>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>>>>>>> in the first place.
>>>>>>>>>>>
>>>>>>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>>>>>>> via dma-fence for a couple of reasons:
>>>>>>>>>>>
>>>>>>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>>>>>>
>>>>>>>>>>> [1] https://patchwork.freedesktop.org/series/90331/
>>>>>>>>>> Unfortunately, none of these approaches will have the full intended
>>>>>>>>>> effect once Wayland compositors start waiting for client buffers to
>>>>>>>>>> become idle before using them for an output frame (to prevent output
>>>>>>>>>> frames from getting delayed by client work). See
>>>>>>>>>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880
>>>>>>>>>> (shameless plug :) for a proof of concept of this for mutter. The
>>>>>>>>>> boost will only affect the compositor's own GPU work, not the client
>>>>>>>>>> work (which means no effect at all for fullscreen apps where the
>>>>>>>>>> compositor can scan out the client buffers directly).
>>>>>>>>>>
>>>>>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>>>>>>> I meant what I wrote: The compositor will wait for the next buffer to
>>>>>>>> become idle, so there's no boost from this mechanism for the client
>>>>>>>> drawing to that buffer. And since the compositor does no drawing of
>>>>>>>> its own in this case, there's no boost from that either.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I'd perhaps recommend that Wayland compositors, in cases where only a
>>>>>>>>> single layer is changing, not try to be clever and just push the
>>>>>>>>> update down to the kernel.
>>>>>>>> Even just for the fullscreen direct scanout case, that would require
>>>>>>>> some kind of atomic KMS API extension to allow queuing multiple page
>>>>>>>> flips for the same CRTC.
>>>>>>>>
>>>>>>>> For other cases, this would also require a mechanism to cancel a
>>>>>>>> pending atomic commit, for when another surface update comes in before
>>>>>>>> the compositor's deadline, which affects the previously single
>>>>>>>> updating surface as well.
>>>>>>>>
>>>>>>> Well, in the end, there is more than one compositor out there.. and if
>>>>>>> some Wayland compositors are going this route, they can also implement
>>>>>>> the same mechanism in userspace using the sysfs that devfreq exports.
>>>>>>>
>>>>>>> But it sounds simpler to me for the compositor to have a sort of "game
>>>>>>> mode" for fullscreen games.. I'm less worried about UI interactive
>>>>>>> workloads, boosting the GPU freq upon sudden activity after a period
>>>>>>> of inactivity seems to work reasonably well there.
>>>>>> At least AMD hardware is already capable of flipping frames on GPU
>>>>>> events like finishing rendering (or uploading etc).
>>>>>>
>>>>>> By waiting in userspace on the CPU before sending the frame to the
>>>>>> hardware you are completely killing off such features.
>>>>>>
>>>>>> For composing use cases that makes sense, but certainly not for full
>>>>>> screen applications as far as I can see.
>>>>> Even for fullscreen, the current KMS API only allows queuing a single
>>>>> page flip per CRTC, with no way to cancel or otherwise modify it.
>>>>> Therefore, a Wayland compositor has to set a deadline for the next
>>>>> refresh cycle, and when the deadline passes, it has to select the best
>>>>> buffer available for the fullscreen surface. To make sure the flip will
>>>>> not miss the next refresh cycle, the compositor has to pick an idle
>>>>> buffer. If it picks a non-idle buffer, and the pending rendering does not
>>>>> finish in time for vertical blank, the flip will be delayed by at least
>>>>> one refresh cycle, which results in visible stuttering.
>>>>>
>>>>> (Until the deadline passes, the Wayland compositor can't even know if a
>>>>> previously fullscreen surface will still be fullscreen for the next
>>>>> refresh cycle)
>>>>
>>>> Well then let's extend the KMS API instead of hacking together workarounds
>>>> in userspace.
>>>
>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>
>>> Not for the general compositing case though, since a compositor does not
>>> want to composite multiple output frames per display refresh cycle, so it
>>> has to make sure the one frame hits the target.
>>
>> I think solving the fullscreen game case is sufficient enough forward
>> progress to be useful. And the results I'm seeing[1] are sufficiently
>> positive to convince me that dma-fence deadline support is the right
>> thing to do.

I'm not questioning that this approach helps when there's a direct chain of
fences from the client to the page flip. I'm pointing out there will not
always be such a chain.


>> But maybe the solution to make this also useful for mutter

It's not just mutter BTW. I understand gamescope has been doing this for some
time already. And there seems to be consensus among developers of Wayland
compositors that this is needed, so I expect at least all the major
compositors to do this longer term.


>> is to, once we have deadline support, extend it with an ioctl to the
>> dma-fence fd so userspace can be the one setting the deadline.

I was thinking in a similar direction.

> atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the
> option to bail out with an old frame if it's too late?

This is a bit cryptic though, can you elaborate?


> Also mutter would need to supply the deadline, because we need to fit the
> rendering in still before the actual flip. So gets a bit quirky maybe ...

That should be fine. mutter is already keeping track of how long its
rendering takes.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
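
To make the deadline arithmetic in this thread concrete, here is a rough Python sketch. It only illustrates the idea that a userspace-supplied fence deadline would have to sit ahead of the flip by the compositor's own rendering time, and that at its deadline the compositor takes the newest client buffer that is already idle. All names (ClientBuffer, fence_deadline_ns, pick_buffer) are hypothetical; this is not mutter's code and does not correspond to any existing kernel or KMS interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ClientBuffer:
    serial: int  # hypothetical: higher means committed more recently
    idle: bool   # True once the client's GPU work on the buffer has finished


def fence_deadline_ns(next_vblank_ns: int, composite_estimate_ns: int) -> int:
    """Latest time client rendering may finish so that the compositor's
    own rendering (estimated from past frames) still fits before the flip."""
    return next_vblank_ns - composite_estimate_ns


def pick_buffer(buffers: Sequence[ClientBuffer]) -> Optional[ClientBuffer]:
    """At the compositor's deadline, take the newest buffer that is idle.
    Returns None if no buffer is idle (the previous frame stays on screen)."""
    idle = [b for b in buffers if b.idle]
    return max(idle, key=lambda b: b.serial, default=None)
```

With a vblank at 16,666,667 ns and an estimated 2 ms of compositor rendering, the deadline passed along with the fence would be 14,666,667 ns. A buffer that is still busy at the compositor's deadline is skipped in favour of an older idle one, which is exactly the case where a fence chain from the client to the page flip does not exist.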