On Tue, Apr 20, 2021 at 9:17 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> On Tue, Apr 20, 2021 at 1:54 PM Daniel Vetter <dan...@ffwll.ch> wrote:
> >
> > On Tue, Apr 20, 2021 at 7:45 PM Daniel Stone <dan...@fooishbar.org> wrote:
> > >
> > > And something more concrete:
> > >
> > > dma_fence.
> > >
> > > This already has all of the properties described above. Kernel-wise, it
> > > already devolves to CPU-side signaling when it crosses device boundaries.
> > > We need to support it roughly forever since it's been plumbed so far and
> > > so wide. Any primitive which is acceptable for winsys-like usage which
> > > crosses so many device/subsystem/process/security boundaries has to meet
> > > the same requirements. So why reinvent something which looks so similar,
> > > and has the same requirements of the kernel babysitting completion,
> > > providing little to no benefit for that difference?
> >
> > So I can mostly get behind this, except it's _not_ going to be
> > dma_fence. That thing has horrendous internal ordering constraints
> > within the kernel, and the one thing that doesn't allow you is to make
> > a dma_fence depend upon a userspace fence.
>
> Let me elaborate on this a bit. One of the problems I mentioned
> earlier is the conflation of fence types inside the kernel. dma_fence
> is used for solving two different semi-related but different problems:
> client command synchronization and memory residency synchronization.
> In the old implicit GL world, we conflated these two and thought we
> were providing ourselves a service. Not so much....
>
> It's all well and good to say that we should turn the memory fence
> into a dma_fence and throw a timeout on it. However, these
> window-system sync primitives, as you said, have to be able to be
> shared across everything. In particular, we have to be able to share
> them with drivers that don't make a good separation between command
> and memory synchronization.
>
> Let's say we're rendering on ANV with memory fences and presenting on
> some USB display adapter whose kernel driver is a bit old-school.
> When we pass that fence to the other driver via a sync_file or
> similar, that driver may shove that dma_fence into the dma_resv on
> some buffer somewhere. Then our client, completely unaware of
> internal kernel dependencies, binds that buffer into its address space
> and kicks off another command buffer. So i915 throws in a dependency
> on that dma_resv which contains the previously created dma_fence and
> refuses to execute any more command buffers until it signals.
> Unfortunately, unbeknownst to i915, that command buffer which the
> client kicked off after doing that bind was required for signaling the
> memory fence on which our first dma_fence depends. Deadlock.
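Just to restate the chain you're describing in code before I answer it.
This is only a sketch with a made-up driver, but the dma_resv plumbing is
roughly what a legacy implicit-sync driver does with an imported fence:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <linux/sync_file.h>
#include <drm/drm_gem.h>

/*
 * Made-up "old-school" display driver importing a winsys fence. It has no
 * idea the fence might be backed by a userspace memory fence; it just
 * stuffs it into the buffer's dma_resv like any other dma_fence, which is
 * what makes it visible to implicit sync in every other driver.
 */
static int oldschool_import_fence(struct drm_gem_object *obj, int sync_fd)
{
        struct dma_fence *fence = sync_file_get_fence(sync_fd);

        if (!fence)
                return -EINVAL;

        dma_resv_lock(obj->resv, NULL);
        dma_resv_add_excl_fence(obj->resv, fence);
        dma_resv_unlock(obj->resv);

        dma_fence_put(fence);
        return 0;
}

/*
 * From here on any driver that honours implicit sync (i915 included) will
 * wait on that fence before touching the buffer again - which is where the
 * cycle back to the not-yet-submitted command buffer would close.
 */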
Nope. Because the waiting for this future fence will only happen in two
places:

- driver submit thread, which is just userspace without holding
  anything. From the kernel pov this can be preempted, memory
  temporarily taken away, all these things. Until that's done you will
  _not_ get a real dma_fence, but just another future fence (see the
  sketch at the end of this mail).

- but what about the USB display, you're asking? Well, for that we'll
  need a new atomic extension which takes a timeline syncobj and gives
  you back a timeline syncobj. And the rules are that if one of them is
  a future fence/userspace fence, so will the other be (even if it's
  created by the kernel).

Either way you get a timeline syncobj back which anv can then again
handle properly with its submit thread. Not a dma_fence with a funny
timeout, because there are deadlock issues with those.

So no, you won't be able to get a dma_fence out of your sleight of hand
here.

> Sure, we put a timeout on the dma_fence and it will eventually fire
> and unblock everything. However, there's one very important point
> that's easy to miss here: Neither i915 nor the client did anything
> wrong in the above scenario. The Vulkan footgun approach works
> because there are a set of rules and, if you follow those rules,
> you're guaranteed everything works. In the above scenario, however,
> the client followed all of the rules and got a deadlock anyway. We
> can't have that.
>
> > But what we can do is use the same currently existing container
> > objects like drm_syncobj or sync_file (timeline syncobj would fit best
> > tbh), and stuff a userspace fence behind it. The only trouble is that
> > currently timeline syncobj implement vulkan's spec, which means if you
> > build a wait-before-signal deadlock, you'll wait forever. Well until
> > the user ragequits and kills your process.
>
> Yeah, it may be that this approach can be made to work. Instead of
> reusing dma_fence, maybe we can reuse syncobj and have another form of
> syncobj which is a memory fence, a value to wait on, and a timeout.

It's going to be the same container, but very much not a dma_fence.

Note the other approach is to split the kernel's notion of what a
dma_fence is into two parts: memory fence and synchronization
primitive. The trouble is that there's tons of hw for which these are
by necessity the same thing (because they can't preempt or don't have a
scheduler), so the value of this for the overall ecosystem is slim. And
the work to make it happen (plumb future fences through the
drm/scheduler and everything) is gigantic. drm/i915-gem tried, the
result is not pretty and we're now backing it largely all out, not
least because it's not where hw/vulkan/compute are actually going, I
think.

So that's an approach which I think does exist in theory, but really
not something I think we should attempt.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
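PS: Rough sketch of the submit thread ordering from the first bullet
above. This is not how anv actually structures things - struct
queued_submit and do_execbuf() are made up for illustration - the only
point is that every wait on a userspace/memory fence happens in plain
userspace, before the execbuf that creates the first real dma_fence:

#include <stdint.h>
#include <xf86drm.h>

struct queued_submit {
        /* userspace memory fences this submit still has to wait for */
        volatile uint64_t **wait_addrs;
        uint64_t *wait_values;
        unsigned num_waits;

        /* timeline syncobj point that was handed to the winsys as the
         * "future fence" for this submit */
        uint32_t out_syncobj;
        uint64_t out_point;
};

void do_execbuf(int drm_fd, struct queued_submit *s); /* driver-specific */

static void submit_thread_run_one(int drm_fd, struct queued_submit *s)
{
        /* 1. Wait for the memory fences. Nothing in the kernel is blocked
         *    on us while we sit here: we're just a userspace thread that
         *    can be preempted, have its memory migrated away, and so on.
         */
        for (unsigned i = 0; i < s->num_waits; i++)
                while (*s->wait_addrs[i] < s->wait_values[i])
                        ; /* real code would sleep/poll, not busy-spin */

        /* 2. Only now submit the actual command buffer. This is the first
         *    point at which a real dma_fence for this work exists.
         */
        do_execbuf(drm_fd, s);

        /* 3. Materialize the future fence the winsys was handed, by
         *    signaling the timeline syncobj point. (In practice the
         *    execbuf ioctl can install its fence at that point directly.)
         */
        drmSyncobjTimelineSignal(drm_fd, &s->out_syncobj, &s->out_point, 1);
}

A future fence handed back to us as a timeline syncobj point (e.g. from
the atomic extension above) would presumably slot into step 1 the same
way, rather than ever becoming a kernel-side dependency.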