On Thu, May 20, 2021 at 9:04 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> On Thu, May 20, 2021 at 12:23 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >
> > On Thu, May 20, 2021 at 5:50 AM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:
> > >
> > > Am 20.05.21 um 09:55 schrieb Daniel Vetter:
> > > > On Wed, May 19, 2021 at 5:48 PM Michel Dänzer <mic...@daenzer.net> wrote:
> > > >> On 2021-05-19 5:21 p.m., Jason Ekstrand wrote:
> > > >>> On Wed, May 19, 2021 at 5:52 AM Michel Dänzer <mic...@daenzer.net> wrote:
> > > >>>> On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
> > > >>>>> On Tue, May 18, 2021 at 4:17 PM Daniel Vetter <dan...@ffwll.ch> wrote:
> > > >>>>>> On Tue, May 18, 2021 at 7:40 PM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:
> > > >>>>>>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
> > > >>>>>>>> On Tue, May 18, 2021 at 2:49 PM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> And as long as we are all inside amdgpu we also don't have any oversync; the issue only happens when we share dma-bufs with i915 (radeon, and AFAIK nouveau as well, do the right thing).
> > > >>>>>>>>
> > > >>>>>>>> Yeah, because then you can't use the amdgpu dma_resv model anymore and have to use the one the atomic helpers use. Which is also the one that e.g. Jason is threatening to bake in as uapi with his dma_buf ioctl, so as soon as that lands and someone starts using it, something has to adapt _anytime_ you have a dma-buf hanging around. Not just when it's shared with another device.
> > > >>>>>>>
> > > >>>>>>> Yeah, and that is exactly the reason why I will NAK this uAPI change.
>
> I just re-sent my dma-buf sync_file import/export series. Assuming we can sort out what implicit sync looks like on the inside of dma-buf, would that alleviate some of your uAPI fears? The idea would be that radeonsi and RADV would use amdgpu explicit sync primitives for everything and then, at the very end, fetch a sync_file and stuff it in the dma-buf's implicit sync container. No nasty new uAPI for you. We still get implicit sync. Everyone wins?
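For concreteness, here's a minimal sketch of what that flow could look like from the userspace side, assuming the uAPI from Jason's series as posted (struct layout and ioctl number are copied from the RFC and may well change before anything lands, hence defined locally here):

#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Hypothetical: these definitions mirror the proposed RFC uAPI and
 * are not in any released kernel header.
 */
struct dma_buf_import_sync_file {
	uint32_t flags;	/* which implicit sync slot to fill */
	int32_t fd;	/* sync_file fd produced via explicit sync */
};

#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE \
	_IOW('b', 3, struct dma_buf_import_sync_file)

/*
 * After rendering with purely explicit sync, stuff the final fence
 * into the dma-buf's implicit sync container so that implicit-sync
 * consumers (e.g. a compositor) wait for the rendering to finish.
 */
static int attach_write_fence(int dmabuf_fd, int sync_file_fd)
{
	struct dma_buf_import_sync_file args = {
		.flags = DMA_BUF_SYNC_WRITE,
		.fd = sync_file_fd,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}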
You still need the implicit fencing opt-out, which currently amdgpu lacks completely. But I also thought through the security implications of the patch set (including the exclusive fence injection in patch 4), and I think even with current amdgpu that's perfectly fine. It's not very useful, since the fences you get out don't reflect status accurately, but that's not a correctness/security issue. You'll simply hit stalls when you don't expect them, because the kernel is allowed to throw random other exclusive fences in whenever it feels like it.

> Of course, that still leaves the question of what read and write fences are, what they mean, and where they go in the dma_resv. But I'm trying to separate problems here.

Yeah, I'll dump my patch set clarifying the status quo tomorrow for that.
-Daniel

> > --Jason
> >
> > > >>>>>>> This doesn't work for amdgpu at all, for the reasons outlined above.
> > > >>>>>>
> > > >>>>>> Uh, that's really not how uapi works. "My driver is right, everyone else is wrong" is not how cross-driver contracts are defined. If that means a perf impact until you've fixed your rules, that's on you.
> > > >>>>>>
> > > >>>>>> Also, you're a few years too late with nacking this; it's already uapi in the form of the dma-buf poll() support.
> > > >>>>>
> > > >>>>> ^^ My fancy new ioctl doesn't expose anything that isn't already there. It just lets you take a snapshot of a wait instead of doing an active wait, which might end up with more fences added depending on interrupts and retries. The dma-buf poll() waits on all fences for POLLOUT and only on the exclusive fence for POLLIN. It's already uAPI.
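As a reference for the semantics being discussed, a minimal userspace sketch of the existing dma-buf poll() uAPI (the function name and timeout handling are just for illustration):

#include <poll.h>

/*
 * Per the existing dma-buf poll() uAPI: POLLIN becomes ready once the
 * exclusive (write) fence has signaled, i.e. the buffer is safe to
 * read; POLLOUT waits for *all* fences, shared read fences included,
 * i.e. the buffer is safe to write. Unlike a snapshot ioctl, this is
 * an active wait, so fences added while we sleep may still be picked
 * up.
 */
static int wait_readable(int dmabuf_fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = dmabuf_fd,
		.events = POLLIN,	/* POLLOUT would wait for all access */
	};

	return poll(&pfd, 1, timeout_ms);
}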
> > > >>>> Note that the dma-buf poll support could be useful to Wayland compositors for the same purpose as Jason's new ioctl (only using client buffers which have finished drawing for an output frame, to avoid missing a refresh cycle due to client drawing), *if* it didn't work differently with amdgpu.
> > > >>>>
> > > >>>> Am I understanding correctly that Jason's new ioctl would also work differently with amdgpu as things stand currently? If so, that would be a real bummer and might hinder adoption of the ioctl by Wayland compositors.
> > > >>>
> > > >>> My new ioctl has identical semantics to poll(). It just lets you take a snapshot in time to wait on later, instead of waiting on whatever happens to be set right now. IMO, having identical semantics to poll() isn't something we want to change.
> > > >>
> > > >> Agreed.
> > > >>
> > > >> I'd argue then that making amdgpu poll semantics match those of other drivers is a prerequisite for the new ioctl; otherwise it seems unlikely that the ioctl will be widely adopted.
> > > >
> > > > This seems backwards, because it means useful improvements in all other drivers are stalled until amdgpu is fixed.
> > >
> > > Well, there is nothing to fix in amdgpu; what we need to do is come up with a DMA-buf implicit syncing model which works for everyone.
> > >
> > > I pointed this problem out at FOSDEM roughly 6 years ago, before DMA-buf was even merged upstream and way before amdgpu even existed. And the response was: yeah, maybe we need to look at this as well.
> > >
> > > Over the years I've mentioned at least 5 times that this isn't going to work in some situations, and I came up with different approaches for how to fix it.
> > >
> > > And you still have the nerve to tell me that this isn't a problem and we should fix amdgpu instead? Sorry, but I'm really running out of ideas for how to explain why this isn't working for everybody.
> >
> > I'm trying really hard not to fuel a flame war here, but I tend to lean in Daniel's direction on this. Stepping back from the individual needs of amdgpu and looking at things from the PoV of Linux as a whole, AMD being a special snowflake here is bad. I think we have two problems: amdgpu doesn't play by the established rules, and the rules don't work well for amdgpu. We need to solve BOTH problems. Does that mean we need to smash something into amdgpu to force it into the dma-buf model today? Maybe not; stuff's working well enough, I guess. But we can't just rewrite all the rules and break everyone else either.
> >
> > > That amdgpu wants to be special is true, but it is a fundamental problem that we designed the implicit sync in DMA-buf only around the needs of the DRM drivers at that time, instead of taking a step back and asking: hey, what would be an approach which works for everyone?
> >
> > How else was it supposed to be designed? Based on the needs of non-existent future drivers? That's just not fair. We (Intel) are being burned by various aspects of dma-buf these days too. It does no good to blame past developers or our past selves for not knowing the future. It sucks, but it's what we have. And, to move forward, we need to fix it. Let's do that.
> >
> > My concern with the flags approach, as I'm beginning to digest it, is that it's a bit too much of an attempt to rewrite history for my liking. What do I mean by that? I mean that any solution we come up with needs to ensure that legacy drivers and modern drivers can play nicely together. Either that, or we need to modernize all the users of dma-buf implicit sync. I really don't like the "as long as AMD+Intel works, we're good" approach.
> >
> > > You just need to apply my example from FOSDEM, with ring buffers in a single BO, to the DMA-buf implicit sync model and immediately see how it falls apart.
> > >
> > > > I think we need agreement on what the rules are and a reasonable plan to get there, and then that should be enough to unblock work in the wider community. Holding the community at large hostage because one driver is different is really not great.
> > >
> > > Well, forcing drivers into a synchronization model that isn't ideal for their hardware isn't great either.
> >
> > As I said above, we're feeling the pain too.
> >
> > --Jason

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch