RE: [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability

Kasireddy, Vivek Wed, 11 Aug 2021 00:25:30 -0700

Hi Michel,
 
> On 2021-08-10 10:30 a.m., Daniel Vetter wrote:
> > On Tue, Aug 10, 2021 at 08:21:09AM +0000, Kasireddy, Vivek wrote:
> >>> On Fri, Aug 06, 2021 at 07:27:13AM +0000, Kasireddy, Vivek wrote:
> >>>>>>>
> >>>>>>> Hence my gut feeling reaction that first we need to get these two
> >>>>>>> compositors aligned in their timings, which probably needs
> >>>>>>> consistent vblank periods/timestamps across them (plus/minus
> >>>>>>> guest/host clocksource fun ofc). Without this any of the next steps
> >>>>>>> will simply not work because there's too much jitter by the time the
> >>>>>>> guest compositor gets the flip completion events.
> >>>>>> [Kasireddy, Vivek] Timings are not a problem and do not significantly
> >>>>>> affect the repaint cycles from what I have seen so far.
> >>>>>>
> >>>>>>>
> >>>>>>> Once we have solid events I think we should look into statically
> >>>>>>> tuning guest/host compositor deadlines (like you've suggested in a
> >>>>>>> bunch of places) to consistently make that deadline and hit 60 fps.
> >>>>>>> With that we can then look into tuning this automatically and what to
> >>>>>>> do when e.g. switching between copying and zero-copy on the host side
> >>>>>>> (which might be needed in some cases) and how to handle all that.
> >>>>>> [Kasireddy, Vivek] As I confirm here:
> >>>>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984065
> >>>>>> tweaking the deadlines works (i.e., we get 60 FPS) as we expect. However,
> >>>>>> I feel that this zero-copy solution I am trying to create should be
> >>>>>> independent of compositors' deadlines, delays or other scheduling
> >>>>>> parameters.
> >>>>>
> >>>>> That's not how compositors work nowadays. Your problem is that you don't
> >>>>> have the guest/host compositor in sync. zero-copy only changes the
> >>>>> timing, so it changes things from "rendering way too many frames" to
> >>>>> "rendering way too few frames".
> >>>>>
> >>>>> We need to fix the timing/sync issue here first, not paper over it with
> >>>>> hacks.
> >>>> [Kasireddy, Vivek] What I really meant is that the zero-copy solution
> >>>> should be independent of the scheduling policies to ensure that it works
> >>>> with all compositors.
> >>>> IIUC, Weston for example uses the vblank/pageflip completion timestamp,
> >>>> the configurable repaint-window value, refresh-rate, etc to determine when
> >>>> to start its next repaint -- if there is any damage:
> >>>> timespec_add_nsec(&output->next_repaint, stamp, refresh_nsec);
> >>>> timespec_add_msec(&output->next_repaint, &output->next_repaint,
> >>>>                   -compositor->repaint_msec);
> >>>>
> >>>> And, in the case of VKMS, since there is no real hardware, the timestamp
> >>>> is always:
> >>>> now = ktime_get();
> >>>> send_vblank_event(dev, e, seq, now);
> >>>
> >>> vkms has been fixed since a while to fake high-precision timestamps like
> >>> from a real display.
> >> [Kasireddy, Vivek] IIUC, that might be one of the reasons why the Guest
> >> does not need to have the same timestamp as that of the Host -- to work
> >> as expected.
> >>
> >>>
> >>>> When you say that the Guest/Host compositor need to stay in sync, are you
> >>>> suggesting that we need to ensure that the vblank timestamp on the Host
> >>>> needs to be shared and be the same on the Guest and a vblank/pageflip
> >>>> completion for the Guest needs to be sent at exactly the same time it is
> >>>> sent on the Host? If yes, I'd say that we do send the pageflip completion
> >>>> to Guest around the same time a vblank is generated on the Host but it does
> >>>> not help because the Guest compositor would only have 9 ms to submit a new
> >>>> frame and if the Host is running Mutter, the Guest would only have 2 ms.
> >>>> (https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984341)
> >>>
> >>> Not at the same time, but the same timestamp. And yes there is some fun
> >>> there, which is I think the fundamental issue. Or at least some of the
> >>> compositor experts seem to think so, and it makes sense to me.
> >> [Kasireddy, Vivek] It is definitely possible that if the timestamp is
> >> messed up, then the Guest repaint cycle would be affected. However, I do
> >> not believe that is the case here given the debug and instrumentation data
> >> we collected and scrutinized. Hopefully, compositor experts could chime in
> >> to shed some light on this matter.
> >>
> >>>
> >>>>>
> >>>>> Only, and I really mean only, when that shows that it's simply
> >>>>> impossible to hit 60fps with zero-copy and the guest/host fully aligned
> >>>>> should we look into making the overall pipeline deeper.
> >>>> [Kasireddy, Vivek] From all the experiments conducted so far and given
> >>>> the discussion associated with
> >>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514
> >>>> I think we have already established that in order for a zero-copy
> >>>> solution to work reliably, the Guest compositor needs to start its repaint
> >>>> cycle when the Host compositor sends a frame callback event to its clients.
> >>>>
> >>>>>
> >>>>>>> Only when that all shows that we just can't hit 60fps consistently and
> >>>>>>> really need 3 buffers in flight should we look at deeper kms queues.
> >>>>>>> And then we really need to implement them properly and not with a
> >>>>>>> mismatch between drm_event and out-fence signalling. These quick hacks
> >>>>>>> are good for experiments, but there's a pile of other things we need
> >>>>>>> to do first. At least that's how I understand the problem here right
> >>>>>>> now.
> >>>>>> [Kasireddy, Vivek] Experiments done so far indicate that we can hit 59 FPS
> >>>>>> consistently -- in a zero-copy way independent of compositors'
> >>>>>> delays/deadlines -- with this patch series + the Weston MR I linked in the
> >>>>>> cover letter. The main reason why this works is because we relax the
> >>>>>> assumption that when the Guest compositor gets a pageflip completion event
> >>>>>> it could reuse the old FB it submitted in the previous atomic flip and
> >>>>>> instead force it to use a new one. And, we send the pageflip completion
> >>>>>> event to the Guest when the Host compositor sends a frame callback event.
> >>>>>> Lastly, we use the (deferred) out_fence as just a mechanism to tell the
> >>>>>> Guest compositor when it can release references on old FBs so that they
> >>>>>> can be reused again.
> >>>>>>
> >>>>>> With that being said, the only question is how can we accomplish the above
> >>>>>> in an upstream acceptable way without regressing anything particularly on
> >>>>>> bare-metal. It's not clear if just increasing the queue depth would work
> >>>>>> or not but I think the Guest compositor has to be told when it can start
> >>>>>> its repaint cycle and when it can assume the old FB is no longer in use.
> >>>>>> On bare-metal -- and also with VKMS as of today -- a pageflip completion
> >>>>>> indicates both. In other words, Vblank event is the same as Flip done,
> >>>>>> which makes sense on bare-metal. But if we were to have two events at
> >>>>>> least for VKMS: vblank to indicate to Guest to start repaint and flip_done
> >>>>>> to indicate to drop references on old FBs, I think this problem can be
> >>>>>> solved even without increasing the queue depth. Can this be acceptable?
> >>>>>
> >>>>> That's just another flavour of your "increase queue depth without
> >>>>> increasing the atomic queue depth" approach. I still think the
> >>>>> underlying fundamental issue is a timing confusion, and the fact that
> >>>>> adjusting the timings fixes things too kinda proves that. So we need to
> >>>>> fix that in a clean way, not by shuffling things around semi-randomly
> >>>>> until the specific config we test works.
> >>>> [Kasireddy, Vivek] This issue is not due to a timing or timestamp
> >>>> mismatch. We have carefully instrumented both the Host and Guest
> >>>> compositors and measured the latencies at each step. The relevant debug
> >>>> data only points to the scheduling policy -- of both Host and Guest
> >>>> compositors -- playing a role in Guest rendering at 30 FPS.
> >>>
> >>> Hm but that essentially means that the events you're passing around have an
> >>> even more ad-hoc implementation specific meaning: Essentially it's the
> >>> kick-off for the guest's repaint loop? That sounds even worse for a kms
> >>> uapi extension.
> >> [Kasireddy, Vivek] The pageflip completion event/vblank event indeed
> >> serves as the kick-off for a compositor's (both Guest and Host) repaint
> >> loop. AFAICT, Weston works that way and even if we increase the queue depth
> >> to solve this problem, I don't think it'll help because the arrival of this
> >> event always indicates to a compositor to start its repaint cycle again and
> >> assume that the previous buffers are all free.
> >
> > I thought this is how simple compositors work, and weston has since a
> > while its own timer, which is based on the timestamp it gets (at least on
> > drivers with vblank support), so that it starts the repaint loop a few ms
> > before the next vblank. And not immediately when it receives the old page
> > flip completion event.
> 
> As long as it's a fixed timer, there's still a risk that the guest compositor
> repaint cycle runs too late for the host one (unless the guest cycle happens
> to be scheduled significantly earlier than the host one).
> 
> Note that current mutter Git main (to become the 41 release this autumn) uses
> dynamic scheduling of its repaint cycle based on how long the last 16 frames
> took to draw and present. In theory, this could automatically schedule the
> guest cycle early enough for the host one.
[Kasireddy, Vivek] I'd like to try it out soon; it'd be very interesting to see
how Mutter works in both Guest and Host with this new scheduling policy. Having
said that, I think there is still a need to come up with a comprehensive
solution that is independent of compositors' scheduling policies. To that end,
I am thinking of splitting the pageflip completion event into two events: a
vblank event (to tell the compositor to start its repaint) and a flip_done
event (to tell it to release references on old FBs). Or, introduce two new
signals/fences along similar lines. Thoughts?
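
To make the split concrete, here is a minimal sketch of how a compositor might
consume the two events. Everything in it is an assumption for illustration: the
handler names (on_vblank, on_flip_done), the guest_output struct and the
three-FB pool are hypothetical and are not existing DRM uAPI or Weston code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_FBS 3 /* hypothetical pool size, chosen only for illustration */

/* Hypothetical per-output state of a guest compositor. */
struct guest_output {
	bool fb_busy[NUM_FBS]; /* true while the host may still read this FB */
	int repaints;          /* number of repaint cycles started so far */
};

/* Return the index of a free FB and mark it busy, or -1 if none is free. */
static int acquire_free_fb(struct guest_output *out)
{
	for (int i = 0; i < NUM_FBS; i++) {
		if (!out->fb_busy[i]) {
			out->fb_busy[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * "vblank" event: start a repaint, but only into an FB the host does not
 * hold. The old FB is NOT assumed to be free at this point.
 * Returns the FB submitted, or -1 if the repaint had to be skipped.
 */
static int on_vblank(struct guest_output *out)
{
	int fb = acquire_free_fb(out);

	if (fb >= 0)
		out->repaints++;
	return fb;
}

/* "flip_done" event: the host is finished with fb, so it can be reused. */
static void on_flip_done(struct guest_output *out, int fb)
{
	assert(fb >= 0 && fb < NUM_FBS);
	assert(out->fb_busy[fb]);
	out->fb_busy[fb] = false;
}
```

The point of the sketch is only that repaint kick-off and buffer release are
driven by separate signals, so a repaint never has to wait for, or wrongly
assume, the release of the previous FB.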


Thanks,
Vivek

> 
> 
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer
