I'll defer to Christian and Alex to decide whether dropping sync with non-AMD devices (GPUs, cameras, etc.) is acceptable. Rewriting those drivers to this new sync model could be done on a case-by-case basis.

For now, would we only lose the "amd -> external" dependency? Or the "external -> amd" dependency too?
Marek
On Tue., Apr. 27, 2021, 08:15 Daniel Vetter, <dan...@ffwll.ch> wrote:
On Tue, Apr 27, 2021 at 2:11 PM Marek Olšák <mar...@gmail.com> wrote:
> Ok. I'll interpret this as "yes, it will work, let's do it".
It works if all you care about is drm/amdgpu. I'm not sure that's a reasonable approach for upstream, but it definitely is an approach :-)

We've already gone somewhat through the pain of drm/amdgpu redefining how implicit sync works without sufficiently talking with other people, maybe we should avoid a repeat of this ...
-Daniel
>
> Marek
>
> On Tue., Apr. 27, 2021, 08:06 Christian König, <ckoenig.leichtzumer...@gmail.com> wrote:
>>
>> Correct, we wouldn't have synchronization between devices with and without user queues any more.
>>
>> That could only be a problem for A+I Laptops.
>>
>> Memory management will just work with preemption fences which pause the user queues of a process before evicting something. That will be a dma_fence, but also a well-known approach.
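>>
>> To make that concrete, here is a minimal sketch of such a preemption fence (illustrative only, with made-up names, not the actual amdgpu/amdkfd code): an ordinary dma_fence whose enable_signaling kicks off suspension of the process's user queues, and which is only signalled once those queues are off the hardware.
>>
>> #include <linux/dma-fence.h>
>> #include <linux/spinlock.h>
>> #include <linux/workqueue.h>
>>
>> /* Illustrative sketch only -- not real driver code. */
>> struct preempt_fence {
>>         struct dma_fence base;
>>         spinlock_t lock;
>>         struct work_struct suspend_work; /* pauses this process's user queues */
>> };
>>
>> static const char *pf_driver_name(struct dma_fence *f)   { return "example"; }
>> static const char *pf_timeline_name(struct dma_fence *f) { return "preempt"; }
>>
>> /* Worker: driver-specific queue suspension, then signal the fence. */
>> static void pf_suspend_work(struct work_struct *w)
>> {
>>         struct preempt_fence *pf = container_of(w, struct preempt_fence, suspend_work);
>>
>>         /* ...unmap doorbells/ring buffers, wait for the HW to drain or preempt... */
>>         dma_fence_signal(&pf->base);
>> }
>>
>> /* Called the first time someone actually waits on the fence; only then
>>  * do we pay the cost of pausing the user queues. */
>> static bool pf_enable_signaling(struct dma_fence *f)
>> {
>>         struct preempt_fence *pf = container_of(f, struct preempt_fence, base);
>>
>>         schedule_work(&pf->suspend_work);
>>         return true;
>> }
>>
>> static const struct dma_fence_ops preempt_fence_ops = {
>>         .get_driver_name   = pf_driver_name,
>>         .get_timeline_name = pf_timeline_name,
>>         .enable_signaling  = pf_enable_signaling,
>> };
>>
>> /* At creation time the driver would do INIT_WORK(&pf->suspend_work, pf_suspend_work)
>>  * and dma_fence_init(&pf->base, &preempt_fence_ops, &pf->lock, context, seqno). */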
>>
>> Christian.
>>
>> On 27.04.21 at 13:49, Marek Olšák wrote:
>>
>> If we don't use future fences for DMA fences at all, e.g. we don't use them for memory management, it can work, right? Memory management can suspend user queues anytime. It doesn't need to use DMA fences. There might be something that I'm missing here.
>>
>> What would we lose without DMA fences? Just inter-device synchronization? I think that might be acceptable.
>>
>> The only case when the kernel will wait on a future fence is before a page flip. Everything today already depends on userspace not hanging the GPU, which makes everything a future fence.
>>
>> Marek
>>
>> On Tue., Apr. 27, 2021, 04:02 Daniel Vetter, <dan...@ffwll.ch> wrote:
>>>
>>> On Mon, Apr 26, 2021 at 04:59:28PM -0400, Marek Olšák wrote:
>>> > Thanks everybody. The initial proposal is dead. Here are some thoughts on how to do it differently.
>>> >
>>> > I think we can have direct command submission from userspace via memory-mapped queues ("user queues") without changing window systems.
>>> >
>>> > The memory management doesn't have to use GPU page faults like HMM. Instead, it can wait for the user queues of a specific process to go idle and then unmap the queues, so that userspace can't submit anything. Buffer evictions, pinning, etc. can be executed when all queues are unmapped (suspended). Thus, neither BO fences nor page faults are needed.
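>>> >
>>> > (Purely as an illustration of the ordering this implies, with made-up helper names rather than any real API: suspend every queue of the process, do the buffer work while nothing is in flight, then remap and resume.)
>>> >
>>> > struct process_ctx;   /* opaque per-process driver state */
>>> >
>>> > /* Stubs standing in for driver internals in this sketch. */
>>> > static int suspend_user_queues(struct process_ctx *ctx)             { (void)ctx; return 0; }
>>> > static int move_buffers_update_page_tables(struct process_ctx *ctx) { (void)ctx; return 0; }
>>> > static void resume_user_queues(struct process_ctx *ctx)             { (void)ctx; }
>>> >
>>> > int evict_process_buffers(struct process_ctx *ctx)
>>> > {
>>> >         /* Unmap doorbells/ring buffers so userspace can't submit,
>>> >          * and wait for already-submitted work to drain or be preempted. */
>>> >         int r = suspend_user_queues(ctx);
>>> >         if (r)
>>> >                 return r;
>>> >
>>> >         /* No user queue can touch the GPU now, so buffers can be
>>> >          * evicted/pinned without BO fences or page faults. */
>>> >         r = move_buffers_update_page_tables(ctx);
>>> >
>>> >         /* Remap the queues; userspace continues where it left off. */
>>> >         resume_user_queues(ctx);
>>> >         return r;
>>> > }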
>>> >
>>> > Inter-process synchronization can use timeline semaphores. Userspace will query the wait and signal value for a shared buffer from the kernel. The kernel will keep a history of those queries to know which process is responsible for signalling which buffer. There is only the wait-timeout issue and how to identify the culprit. One of the solutions is to have the GPU send all GPU signal commands and all timed-out wait commands via an interrupt to the kernel driver to monitor and validate userspace behavior. With that, it can be identified whether the culprit is the waiting process or the signalling process, and exactly which process it is. Invalid signal/wait parameters can also be detected. The kernel can force-signal only the semaphores that time out, and punish the processes which caused the timeout or used invalid signal/wait parameters.
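>>> >
>>> > (A rough sketch of the semaphore semantics being described, not any real API: the semaphore is just a monotonically increasing 64-bit value in shared memory, the signal side publishes a value, and a wait is satisfied once the published value is high enough. The kernel-side bookkeeping and force-signalling on timeout sit on top of this.)
>>> >
>>> > #include <stdatomic.h>
>>> > #include <stdbool.h>
>>> > #include <stdint.h>
>>> >
>>> > /* Shared between processes and visible to the GPU. */
>>> > struct timeline_sem {
>>> >         _Atomic uint64_t value;
>>> > };
>>> >
>>> > /* Signal side: publish a larger value. On real HW this would be a GPU
>>> >  * write packet, and the kernel is told (e.g. via interrupt) which
>>> >  * process promised to write which value, so it can blame the right one. */
>>> > static void timeline_signal(struct timeline_sem *sem, uint64_t signal_value)
>>> > {
>>> >         atomic_store_explicit(&sem->value, signal_value, memory_order_release);
>>> > }
>>> >
>>> > /* Wait side: satisfied once the value reaches wait_value. If the
>>> >  * deadline passes, the kernel can force-signal the semaphore and
>>> >  * punish whichever process caused the timeout. */
>>> > static bool timeline_wait_satisfied(struct timeline_sem *sem, uint64_t wait_value)
>>> > {
>>> >         return atomic_load_explicit(&sem->value, memory_order_acquire) >= wait_value;
>>> > }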
>>> >
>>> > The question is whether this synchronization solution is robust enough for dma_fence and whatever the kernel and window systems need.
>>>
>>> The proper model here is the preempt-ctx dma_fence that amdkfd uses (without page faults). That means dma_fence for synchronization is DOA, at least as-is, and we're back to figuring out the winsys problem.
>>>
>>> "We'll solve it with timeouts" is very tempting, but
doesn't work. It's
>>> akin to saying that we're solving deadlock issues in a
locking design by
>>> doing a global s/mutex_lock/mutex_lock_timeout/ in the
kernel. Sure it
>>> avoids having to reach the reset button, but that's about it.
>>>
>>> And the fundamental problem is that once you throw in userspace command submission (and syncing, at least within the userspace driver, otherwise there's kinda no point if you still need the kernel for cross-engine sync), you get deadlocks if you still use dma_fence for sync under perfectly legit use-cases. We've discussed that one ad nauseam last summer:
>>>
>>> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences
>>>
>>> See silly diagram at the bottom.
>>>
>>> Now I think all isn't lost, because imo the first step to getting to this brave new world is rebuilding the driver on top of userspace fences, and with the adjusted cmd submit model. You probably don't want to use amdkfd, but port that as a context flag or similar to render nodes for gl/vk. Of course that means you can only use this mode in headless, without glx/wayland winsys support, but it's a start.
>>> -Daniel
>>>
>>> >
>>> > Marek
>>> >
>>> > On Tue, Apr 20, 2021 at 4:34 PM Daniel Stone <dan...@fooishbar.org> wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > On Tue, 20 Apr 2021 at 20:30, Daniel Vetter <dan...@ffwll.ch> wrote:
>>> > >
>>> > >> The thing is, you can't do this in drm/scheduler. At least not without splitting up the dma_fence in the kernel into separate memory fences and sync fences
>>> > >
>>> > >
>>> > > I'm starting to think this thread needs its own glossary ...
>>> > >
>>> > > I propose we use 'residency fence' for execution fences which enact memory-residency operations, e.g. faulting in a page ultimately depending on GPU work retiring.
>>> > >
>>> > > And 'value fence' for the pure-userspace model suggested by timeline semaphores, i.e. fences being (*addr == val) rather than being able to look at ctx seqno.
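>>> > >
>>> > > (Purely as illustration and not an API: a value fence is nothing more than a predicate on memory.)
>>> > >
>>> > > #include <stdbool.h>
>>> > > #include <stdint.h>
>>> > >
>>> > > /* Hypothetical helper: "has this location reached this value yet?" */
>>> > > static inline bool value_fence_signaled(const volatile uint64_t *addr, uint64_t val)
>>> > > {
>>> > >         return *addr == val;   /* or >= for timeline-style semantics */
>>> > > }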
>>> > >
>>> > > Cheers,
>>> > > Daniel
>>> > >
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>>
>>
>>
>>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch