Hi Michel,

> On 2021-08-10 10:30 a.m., Daniel Vetter wrote:
> > On Tue, Aug 10, 2021 at 08:21:09AM +0000, Kasireddy, Vivek wrote:
> >>> On Fri, Aug 06, 2021 at 07:27:13AM +0000, Kasireddy, Vivek wrote:
> >>>>>>>
> >>>>>>> Hence my gut feeling reaction that first we need to get these two
> >>>>>>> compositors aligned in their timings, which probably needs
> >>>>>>> consistent vblank periods/timestamps across them (plus/minus
> >>>>>>> guest/host clocksource fun ofc). Without this any of the next steps
> >>>>>>> will simply not work because there's too much jitter by the time the
> >>>>>>> guest compositor gets the flip completion events.
> >>>>>> [Kasireddy, Vivek] Timings are not a problem and do not significantly
> >>>>>> affect the repaint cycles from what I have seen so far.
> >>>>>>
> >>>>>>>
> >>>>>>> Once we have solid events I think we should look into statically
> >>>>>>> tuning guest/host compositor deadlines (like you've suggested in a
> >>>>>>> bunch of places) to consistently make that deadline and hit 60 fps.
> >>>>>>> With that we can then look into tuning this automatically and what to
> >>>>>>> do when e.g. switching between copying and zero-copy on the host side
> >>>>>>> (which might be needed in some cases) and how to handle all that.
> >>>>>> [Kasireddy, Vivek] As I confirm here:
> >>>>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984065
> >>>>>> tweaking the deadlines works (i.e., we get 60 FPS) as we expect.
> >>>>>> However, I feel that this zero-copy solution I am trying to create
> >>>>>> should be independent of compositors' deadlines, delays or other
> >>>>>> scheduling parameters.
> >>>>>
> >>>>> That's not how compositors work nowadays. Your problem is that you don't
> >>>>> have the guest/host compositor in sync. zero-copy only changes the
> >>>>> timing, so it changes things from "rendering way too many frames" to
> >>>>> "rendering way too few frames".
> >>>>>
> >>>>> We need to fix the timing/sync issue here first, not paper over it with
> >>>>> hacks.
> >>>> [Kasireddy, Vivek] What I really meant is that the zero-copy solution
> >>>> should be independent of the scheduling policies to ensure that it works
> >>>> with all compositors.
> >>>> IIUC, Weston for example uses the vblank/pageflip completion timestamp,
> >>>> the configurable repaint-window value, refresh-rate, etc to determine
> >>>> when to start its next repaint -- if there is any damage:
> >>>> timespec_add_nsec(&output->next_repaint, stamp, refresh_nsec);
> >>>> timespec_add_msec(&output->next_repaint, &output->next_repaint,
> >>>>                   -compositor->repaint_msec);
> >>>>
> >>>> And, in the case of VKMS, since there is no real hardware, the timestamp
> >>>> is always:
> >>>> now = ktime_get();
> >>>> send_vblank_event(dev, e, seq, now);
> >>>
> >>> vkms has been fixed since a while to fake high-precision timestamps like
> >>> from a real display.
> >> [Kasireddy, Vivek] IIUC, that might be one of the reasons why the Guest
> >> does not need to have the same timestamp as that of the Host -- to work
> >> as expected.
> >>
> >>>
> >>>> When you say that the Guest/Host compositors need to stay in sync, are
> >>>> you suggesting that we need to ensure that the vblank timestamp on the
> >>>> Host needs to be shared and be the same on the Guest and a
> >>>> vblank/pageflip completion for the Guest needs to be sent at exactly the
> >>>> same time it is sent on the Host? If yes, I'd say that we do send the
> >>>> pageflip completion to Guest around the same time a vblank is generated
> >>>> on the Host but it does not help because the Guest compositor would only
> >>>> have 9 ms to submit a new frame and if the Host is running Mutter, the
> >>>> Guest would only have 2 ms.
> >>>> (https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984341)
> >>>
> >>> Not at the same time, but the same timestamp.
> >>> And yes there is some fun there, which is I think the fundamental
> >>> issue. Or at least some of the compositor experts seem to think so,
> >>> and it makes sense to me.
> >> [Kasireddy, Vivek] It is definitely possible that if the timestamp is
> >> messed up, then the Guest repaint cycle would be affected. However, I do
> >> not believe that is the case here given the debug and instrumentation
> >> data we collected and scrutinized. Hopefully, compositor experts could
> >> chime in to shed some light on this matter.
> >>
> >>>
> >>>>>
> >>>>> Only, and I really mean only, when that shows that it's simply
> >>>>> impossible to hit 60fps with zero-copy and the guest/host fully aligned
> >>>>> should we look into making the overall pipeline deeper.
> >>>> [Kasireddy, Vivek] From all the experiments conducted so far and given
> >>>> the discussion associated with
> >>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514
> >>>> I think we have already established that in order for a zero-copy
> >>>> solution to work reliably, the Guest compositor needs to start its
> >>>> repaint cycle when the Host compositor sends a frame callback event to
> >>>> its clients.
> >>>>
> >>>>>
> >>>>>>> Only when that all shows that we just can't hit 60fps consistently and
> >>>>>>> really need 3 buffers in flight should we look at deeper kms queues.
> >>>>>>> And then we really need to implement them properly and not with a
> >>>>>>> mismatch between drm_event and out-fence signalling. These quick hacks
> >>>>>>> are good for experiments, but there's a pile of other things we need
> >>>>>>> to do first. At least that's how I understand the problem here right
> >>>>>>> now.
> >>>>>> [Kasireddy, Vivek] Experiments done so far indicate that we can hit 59
> >>>>>> FPS consistently -- in a zero-copy way independent of compositors'
> >>>>>> delays/deadlines -- with this patch series + the Weston MR I linked in
> >>>>>> the cover letter. The main reason why this works is because we relax
> >>>>>> the assumption that when the Guest compositor gets a pageflip
> >>>>>> completion event that it could reuse the old FB it submitted in the
> >>>>>> previous atomic flip and instead force it to use a new one. And, we
> >>>>>> send the pageflip completion event to the Guest when the Host
> >>>>>> compositor sends a frame callback event. Lastly, we use the (deferred)
> >>>>>> out_fence as just a mechanism to tell the Guest compositor when it can
> >>>>>> release references on old FBs so that they can be reused again.
> >>>>>>
> >>>>>> With that being said, the only question is how can we accomplish the
> >>>>>> above in an upstream acceptable way without regressing anything
> >>>>>> particularly on bare-metal. It's not clear if just increasing the queue
> >>>>>> depth would work or not but I think the Guest compositor has to be told
> >>>>>> when it can start its repaint cycle and when it can assume the old FB
> >>>>>> is no longer in use.
> >>>>>> On bare-metal -- and also with VKMS as of today -- a pageflip
> >>>>>> completion indicates both. In other words, Vblank event is the same as
> >>>>>> Flip done, which makes sense on bare-metal. But if we were to have two
> >>>>>> events at least for VKMS: vblank to indicate to Guest to start repaint
> >>>>>> and flip_done to indicate to drop references on old FBs, I think this
> >>>>>> problem can be solved even without increasing the queue depth. Can this
> >>>>>> be acceptable?
> >>>>>
> >>>>> That's just another flavour of your "increase queue depth without
> >>>>> increasing the atomic queue depth" approach. I still think the
> >>>>> underlying fundamental issue is a timing confusion, and the fact that
> >>>>> adjusting the timings fixes things too kinda proves that. So we need to
> >>>>> fix that in a clean way, not by shuffling things around semi-randomly
> >>>>> until the specific config we test works.
> >>>> [Kasireddy, Vivek] This issue is not due to a timing or timestamp
> >>>> mismatch. We have carefully instrumented both the Host and Guest
> >>>> compositors and measured the latencies at each step. The relevant debug
> >>>> data only points to the scheduling policy -- of both Host and Guest
> >>>> compositors -- playing a role in Guest rendering at 30 FPS.
> >>>
> >>> Hm but that essentially means that the events you're passing around have
> >>> an even more ad-hoc implementation specific meaning: Essentially it's the
> >>> kick-off for the guest's repaint loop? That sounds even worse for a kms
> >>> uapi extension.
> >> [Kasireddy, Vivek] The pageflip completion event/vblank event indeed
> >> serves as the kick-off for a compositor's (both Guest and Host) repaint
> >> loop. AFAICT, Weston works that way and even if we increase the queue
> >> depth to solve this problem, I don't think it'll help because the arrival
> >> of this event always indicates to a compositor to start its repaint cycle
> >> again and assume that the previous buffers are all free.
> >
> > I thought this is how simple compositors work, and weston has since a
> > while its own timer, which is based on the timestamp it gets (at least on
> > drivers with vblank support), so that it starts the repaint loop a few ms
> > before the next vblank. And not immediately when it receives the old page
> > flip completion event.
>
> As long as it's a fixed timer, there's still a risk that the guest
> compositor repaint cycle runs too late for the host one (unless the guest
> cycle happens to be scheduled significantly earlier than the host one).
>
> Note that current mutter Git main (to become the 41 release this autumn)
> uses dynamic scheduling of its repaint cycle based on how long the last 16
> frames took to draw and present. In theory, this could automatically
> schedule the guest cycle early enough for the host one.

[Kasireddy, Vivek] I'd like to try it out soon; it'd be very interesting to
see how Mutter works in both Guest and Host with this new scheduling policy.

Having said that, I think there is still a need to come up with a
comprehensive solution that is independent of compositors' scheduling
policies. To that end, I am thinking of splitting the pageflip completion
event into two events: a vblank event (to indicate to the compositor that it
should start its repaint) and a flip_done event (to indicate that it can
release references on old FBs). Or, introduce two new signals/fences along
similar lines. Thoughts?

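To make the split a bit more concrete, here is a rough userspace sketch of
the bookkeeping a Guest compositor might do if it received the two events
separately. Everything in it is hypothetical and only for illustration:
struct guest_output, on_vblank(), on_flip_done() and the pool size of 3 are
made-up names and values, not existing KMS uAPI or Weston code.

```c
#include <assert.h>

/* Hypothetical pool size, chosen only for this sketch. */
#define NUM_FBS 3

enum fb_state { FB_FREE, FB_BUSY };

/* Hypothetical per-output state kept by the Guest compositor. */
struct guest_output {
	enum fb_state fb[NUM_FBS];
	int frames_submitted;
};

void guest_output_init(struct guest_output *out)
{
	for (int i = 0; i < NUM_FBS; i++)
		out->fb[i] = FB_FREE;
	out->frames_submitted = 0;
}

/*
 * Hypothetical "vblank" event: only a kick-off for the repaint loop.
 * It says nothing about old FBs, so the compositor must pick an FB that
 * is known to be free. Returns the FB index used, or -1 if every FB is
 * still held by the Host and the frame has to be skipped.
 */
int on_vblank(struct guest_output *out)
{
	for (int i = 0; i < NUM_FBS; i++) {
		if (out->fb[i] == FB_FREE) {
			out->fb[i] = FB_BUSY;
			out->frames_submitted++;
			return i;
		}
	}
	return -1;
}

/*
 * Hypothetical "flip_done" event: the Host no longer uses fb_id, so the
 * Guest can drop its reference and reuse the FB in a later repaint.
 */
void on_flip_done(struct guest_output *out, int fb_id)
{
	assert(fb_id >= 0 && fb_id < NUM_FBS);
	out->fb[fb_id] = FB_FREE;
}
```

With this model, two back-to-back vblank events make the Guest render into
two different FBs even though no flip_done has arrived yet, and an FB only
becomes reusable once its flip_done is seen. On bare-metal both events would
simply be delivered together, which would keep today's behaviour.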

Thanks,
Vivek

>
>
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer