Re: [RFC 0/4] dma-fence: Deadline awareness

Christian König Wed, 28 Jul 2021 06:13:46 -0700

On 28.07.21 at 15:08, Michel Dänzer wrote:

On 2021-07-28 1:36 p.m., Christian König wrote:

On 27.07.21 at 17:37, Rob Clark wrote:

On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <mic...@daenzer.net> wrote:

On 2021-07-27 5:12 p.m., Rob Clark wrote:

On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <mic...@daenzer.net> wrote:

On 2021-07-27 1:38 a.m., Rob Clark wrote:

From: Rob Clark <robdcl...@chromium.org>


Based on discussion from a previous series[1] to add a "boost" mechanism
when, for example, vblank deadlines are missed.  Instead of a boost
callback, this approach adds a way to set a deadline on the fence, by
which the waiter would like to see the fence signalled.

I've not yet had a chance to re-work the drm/msm part of this, but
wanted to send this out as an RFC in case I don't have a chance to
finish the drm/msm part this week.

Original description:

In some cases, like double-buffered rendering, missing vblanks can
trick the GPU into running at a lower frequency, when really we
want to be running at a higher frequency to not miss the vblanks
in the first place.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

[1] https://patchwork.freedesktop.org/series/90331/

Unfortunately, none of these approaches will have the full intended effect once Wayland 
compositors start waiting for client buffers to become idle before using them for an output 
frame (to prevent output frames from getting delayed by client work). See 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof 
of concept of this for mutter. The boost will only affect the compositor's own GPU work, 
not the client work (which means no effect at all for fullscreen apps where the compositor 
can scan out the client buffers directly).

I guess you mean "no effect at all *except* for fullscreen..."?

I meant what I wrote: The compositor will wait for the next buffer to become 
idle, so there's no boost from this mechanism for the client drawing to that 
buffer. And since the compositor does no drawing of its own in this case, 
there's no boost from that either.

I'd perhaps recommend that wayland compositors, in cases where only a
single layer is changing, not try to be clever and just push the
update down to the kernel.

Even just for the fullscreen direct scanout case, that would require some kind 
of atomic KMS API extension to allow queuing multiple page flips for the same 
CRTC.

For other cases, this would also require a mechanism to cancel a pending atomic 
commit, for when another surface update comes in before the compositor's 
deadline, which affects the previously single updating surface as well.

Well, in the end, there is more than one compositor out there.. and if
some wayland compositors are going this route, they can also implement
the same mechanism in userspace using the sysfs that devfreq exports.
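
A minimal sketch of what such a userspace boost could look like, assuming the GPU's devfreq directory exposes a writable min_freq attribute (as the devfreq sysfs ABI describes) and that the value is given in Hz. The function name is hypothetical, the directory is passed in by the caller because the device name under /sys/class/devfreq/ is driver-specific, and a real compositor would also need the privileges to write the file and a way to drop the boost again:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: raise the devfreq frequency floor by writing
 * <devfreq_dir>/min_freq. devfreq_dir is assumed to be something like
 * /sys/class/devfreq/<device>; hz is assumed to be in Hz.
 * Returns 0 on success, -1 on any failure.
 */
int devfreq_set_min_freq(const char *devfreq_dir, unsigned long hz)
{
    char path[512];
    int n;
    FILE *f;

    assert(devfreq_dir != NULL);

    /* Build the attribute path and reject anything that was truncated. */
    n = snprintf(path, sizeof(path), "%s/min_freq", devfreq_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    if (fprintf(f, "%lu\n", hz) < 0) {
        fclose(f);
        return -1;
    }

    /* fclose() flushes the write; an error here means the store failed. */
    return fclose(f) == 0 ? 0 : -1;
}
```

The compositor would call this when it detects a missed deadline and write the previous floor back once things settle, which is roughly the policy a kernel-side boost would otherwise apply.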

But it sounds simpler to me for the compositor to have a sort of "game
mode" for fullscreen games.. I'm less worried about UI interactive
workloads, boosting the GPU freq upon sudden activity after a period
of inactivity seems to work reasonably well there.

At least AMD hardware is already capable of flipping frames on GPU events like 
finishing rendering (or uploading etc).

By waiting in userspace on the CPU before sending the frame to the hardware you 
are completely killing off such features.

For composing use cases that makes sense, but certainly not for full screen 
applications as far as I can see.

Even for fullscreen, the current KMS API only allows queuing a single page flip 
per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland 
compositor has to set a deadline for the next refresh cycle, and when the 
deadline passes, it has to select the best buffer available for the fullscreen 
surface. To make sure the flip will not miss the next refresh cycle, the 
compositor has to pick an idle buffer. If it picks a non-idle buffer, and the 
pending rendering does not finish in time for vertical blank, the flip will be 
delayed by at least one refresh cycle, which results in visible stuttering.
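
The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, and the idle flag stands in for whatever check the compositor really uses (for example, polling the buffer's dma-buf file descriptor without blocking):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer a client has committed to a surface. */
struct client_buffer {
    unsigned int seq; /* commit order; higher means newer */
    bool idle;        /* true once all GPU work on the buffer has finished */
};

/*
 * Called when the compositor's deadline for the next refresh cycle passes.
 * Returns the index of the newest idle buffer, or -1 if none is idle,
 * in which case the compositor keeps showing the current frame.
 */
int pick_buffer_at_deadline(const struct client_buffer *bufs, size_t n)
{
    int best = -1;

    assert(bufs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* A non-idle buffer could make the flip miss the vertical blank. */
        if (!bufs[i].idle)
            continue;
        if (best < 0 || bufs[i].seq > bufs[best].seq)
            best = (int)i;
    }
    return best;
}
```

Note that a newer buffer which is still being rendered is skipped on purpose: showing a slightly older frame on time is preferred over delaying the flip by a refresh cycle.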

(Until the deadline passes, the Wayland compositor can't even know if a 
previously fullscreen surface will still be fullscreen for the next refresh 
cycle)

Well then let's extend the KMS API instead of hacking together workarounds in userspace.

Making such decisions is the responsibility of the kernel and not userspace in my opinion.

For example, we might also need to reshuffle BOs so that a BO is even scanout-capable. Userspace can't know about such things beforehand because the memory usage can change at any time.


Regards,
Christian.
