On Tue, Apr 20, 2021 at 9:17 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> On Tue, Apr 20, 2021 at 1:54 PM Daniel Vetter <dan...@ffwll.ch> wrote:
> >
> > On Tue, Apr 20, 2021 at 7:45 PM Daniel Stone <dan...@fooishbar.org> wrote:
> > >
> > > And something more concrete:
> > >
> > > dma_fence.
> > >
> > > This already has all of the properties described above. Kernel-wise, it
> > > already devolves to CPU-side signaling when it crosses device boundaries.
> > > We need to support it roughly forever since it's been plumbed so far and
> > > so wide. Any primitive which is acceptable for winsys-like usage which
> > > crosses so many device/subsystem/process/security boundaries has to meet
> > > the same requirements. So why reinvent something which looks so similar,
> > > and has the same requirements of the kernel babysitting completion,
> > > providing little to no benefit for that difference?
> >
> > So I can mostly get behind this, except it's _not_ going to be
> > dma_fence. That thing has horrendous internal ordering constraints
> > within the kernel, and the one thing that doesn't allow you is to make
> > a dma_fence depend upon a userspace fence.
>
> Let me elaborate on this a bit. One of the problems I mentioned
> earlier is the conflation of fence types inside the kernel. dma_fence
> is used for solving two different semi-related but different problems:
> client command synchronization and memory residency synchronization.
> In the old implicit GL world, we conflated these two and thought we
> were providing ourselves a service. Not so much....
>
> It's all well and good to say that we should turn the memory fence
> into a dma_fence and throw a timeout on it. However, these
> window-system sync primitives, as you said, have to be able to be
> shared across everything. In particular, we have to be able to share
> them with drivers that don't make a good separation between command
> and memory synchronization.
>
> Let's say we're rendering on ANV with memory fences and presenting on
> some USB display adapter whose kernel driver is a bit old-school.
> When we pass that fence to the other driver via a sync_file or
> similar, that driver may shove that dma_fence into the dma_resv on
> some buffer somewhere. Then our client, completely unaware of
> internal kernel dependencies, binds that buffer into its address space
> and kicks off another command buffer. So i915 throws in a dependency
> on that dma_resv which contains the previously created dma_fence and
> refuses to execute any more command buffers until it signals.
> Unfortunately, unbeknownst to i915, that command buffer which the
> client kicked off after doing that bind was required for signaling the
> memory fence on which our first dma_fence depends. Deadlock.
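Just to restate the chain you're describing in code before I answer it.
This is only a sketch with a made-up driver, but the dma_resv plumbing is
roughly what a legacy implicit-sync driver does with an imported fence:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <linux/sync_file.h>
#include <drm/drm_gem.h>

/*
 * Made-up "old-school" display driver importing a winsys fence. It has no
 * idea the fence might be backed by a userspace memory fence; it just
 * stuffs it into the buffer's dma_resv like any other dma_fence, which is
 * what makes it visible to implicit sync in every other driver.
 */
static int oldschool_import_fence(struct drm_gem_object *obj, int sync_fd)
{
        struct dma_fence *fence = sync_file_get_fence(sync_fd);

        if (!fence)
                return -EINVAL;

        dma_resv_lock(obj->resv, NULL);
        dma_resv_add_excl_fence(obj->resv, fence);
        dma_resv_unlock(obj->resv);

        dma_fence_put(fence);
        return 0;
}

/*
 * From here on any driver that honours implicit sync (i915 included) will
 * wait on that fence before touching the buffer again - which is where the
 * cycle back to the not-yet-submitted command buffer would close.
 */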
Nope. Because the waiting for this future fence will only happen in two
places:

- driver submit thread, which is just userspace without holding
  anything. From the kernel pov this can be preempted, memory
  temporarily taken away, all these things. Until that's done you will
  _not_ get a real dma_fence, but just another future fence (see the
  sketch at the end of this mail).

- but what about the USB display, you're asking? Well, for that we'll
  need a new atomic extension which takes a timeline syncobj and gives
  you back a timeline syncobj. And the rules are that if one of them is
  a future fence/userspace fence, so will the other be (even if it's
  created by the kernel).

Either way you get a timeline syncobj back which anv can then again
handle properly with its submit thread. Not a dma_fence with a funny
timeout, because there are deadlock issues with those.

So no, you won't be able to get a dma_fence out of your sleight of hand
here.

> Sure, we put a timeout on the dma_fence and it will eventually fire
> and unblock everything. However, there's one very important point
> that's easy to miss here: Neither i915 nor the client did anything
> wrong in the above scenario. The Vulkan footgun approach works
> because there are a set of rules and, if you follow those rules,
> you're guaranteed everything works. In the above scenario, however,
> the client followed all of the rules and got a deadlock anyway. We
> can't have that.
>
> > But what we can do is use the same currently existing container
> > objects like drm_syncobj or sync_file (timeline syncobj would fit best
> > tbh), and stuff a userspace fence behind it. The only trouble is that
> > currently timeline syncobj implement vulkan's spec, which means if you
> > build a wait-before-signal deadlock, you'll wait forever. Well until
> > the user ragequits and kills your process.
>
> Yeah, it may be that this approach can be made to work. Instead of
> reusing dma_fence, maybe we can reuse syncobj and have another form of
> syncobj which is a memory fence, a value to wait on, and a timeout.

It's going to be the same container, but very much not a dma_fence.

Note the other approach is to split the kernel's notion of what a
dma_fence is into two parts: memory fence and synchronization
primitive. The trouble is that there's tons of hw for which these are
by necessity the same thing (because they can't preempt or don't have a
scheduler), so the value of this for the overall ecosystem is slim. And
the work to make it happen (plumb future fences through the
drm/scheduler and everything) is gigantic. drm/i915-gem tried, the
result is not pretty and we're now backing it largely all out, not
least because it's not where hw/vulkan/compute are actually going, I
think.

So that's an approach which I think does exist in theory, but really
not something I think we should attempt.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
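PS: Rough sketch of the submit thread ordering from the first bullet
above. This is not how anv actually structures things - struct
queued_submit and do_execbuf() are made up for illustration - the only
point is that every wait on a userspace/memory fence happens in plain
userspace, before the execbuf that creates the first real dma_fence:

#include <stdint.h>
#include <xf86drm.h>

struct queued_submit {
        /* userspace memory fences this submit still has to wait for */
        volatile uint64_t **wait_addrs;
        uint64_t *wait_values;
        unsigned num_waits;

        /* timeline syncobj point that was handed to the winsys as the
         * "future fence" for this submit */
        uint32_t out_syncobj;
        uint64_t out_point;
};

void do_execbuf(int drm_fd, struct queued_submit *s); /* driver-specific */

static void submit_thread_run_one(int drm_fd, struct queued_submit *s)
{
        /* 1. Wait for the memory fences. Nothing in the kernel is blocked
         *    on us while we sit here: we're just a userspace thread that
         *    can be preempted, have its memory migrated away, and so on.
         */
        for (unsigned i = 0; i < s->num_waits; i++)
                while (*s->wait_addrs[i] < s->wait_values[i])
                        ; /* real code would sleep/poll, not busy-spin */

        /* 2. Only now submit the actual command buffer. This is the first
         *    point at which a real dma_fence for this work exists.
         */
        do_execbuf(drm_fd, s);

        /* 3. Materialize the future fence the winsys was handed, by
         *    signaling the timeline syncobj point. (In practice the
         *    execbuf ioctl can install its fence at that point directly.)
         */
        drmSyncobjTimelineSignal(drm_fd, &s->out_syncobj, &s->out_point, 1);
}

A future fence handed back to us as a timeline syncobj point (e.g. from
the atomic extension above) would presumably slot into step 1 the same
way, rather than ever becoming a kernel-side dependency.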