Yes, that's exactly my thinking, and it's also the reason why I'm pondering so hard on the requirement that the memory for shared user fences must not be directly modifiable by userspace.

Christian.

On 29.05.21 at 05:33, Marek Olšák wrote:
My first email can be ignored except for the sync files. Oh well.

I think I see what you mean, Christian. If we assume that an imported fence is always read only (the buffer with the sequence number is read only), only the process that created and exported the fence can signal it. If the fence is not signaled, the exporting process is guilty. The only thing the importing process must do when it's about to use the fence as a dependency is to notify the kernel about it. Thus, the kernel will always know the dependency graph. Then if the importing process times out, the kernel will blame any of the processes that passed it a fence that is still unsignaled. The kernel will blame the process that timed out only if all imported fences have been signaled. It seems pretty robust.
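
To make the blame rule above concrete, here is a minimal sketch in Python. It is not kernel code: the `Fence` and `Job` types and the `blame_on_timeout` helper are hypothetical names invented for illustration, and it assumes the kernel has already been told about every imported fence the job depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    exporter: str           # process that created and exported the fence
    signaled: bool = False  # only the exporter can flip this (read-only for importers)


@dataclass
class Job:
    owner: str                                    # process that submitted the job
    imported: list = field(default_factory=list)  # fences registered as dependencies


def blame_on_timeout(job):
    """Return the set of processes to blame when `job` times out.

    Hypothetical policy: blame every exporter whose fence is still
    unsignaled; blame the job owner only if all imported fences signaled.
    """
    guilty = {f.exporter for f in job.imported if not f.signaled}
    return guilty if guilty else {job.owner}
```

With one signaled fence from process A and one unsignaled fence from process B, a timed-out job owned by C blames only B; once B's fence has signaled too, the blame falls on C itself.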

It's the same with implicit sync except that the buffer with the sequence number is writable. Any process that has an implicitly-sync'd buffer can set the sequence number to 0 or UINT64_MAX. 0 will cause a timeout for the next job, while UINT64_MAX might cause a timeout a little later. The timeout can be mitigated by the kernel because the kernel knows the greatest number that should be there, but it's not possible to know which process is guilty (all processes holding the buffer handle would be suspects).
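
A rough sketch of that difference, again in Python and under loud assumptions: `wait_satisfied` and `recover_from_timeout` are hypothetical helpers, and rewriting the memory with the greatest allocated number is only one possible reading of how the kernel could mitigate the timeout.

```python
UINT64_MAX = 2**64 - 1  # largest value a 64-bit sequence number can hold


def wait_satisfied(mem_value, wait_for):
    """A wait on an implicit-sync sequence number passes once memory reaches it."""
    return mem_value >= wait_for


def recover_from_timeout(greatest_allocated, holders):
    """Hypothetical recovery: the kernel knows the greatest number that
    should be in memory, so it can restore it. Because the memory is
    writable by everyone holding the buffer, all holders remain suspects.
    """
    repaired_value = greatest_allocated
    suspects = set(holders)
    return repaired_value, suspects
```

If some process overwrites the number with 0, a job waiting for 4 never proceeds; after the assumed recovery writes back 5, the wait passes, but the suspect set is still every process that holds the buffer.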

Marek

On Fri, May 28, 2021 at 6:25 PM Marek Olšák <mar...@gmail.com> wrote:

    If both implicit and explicit synchronization are handled the
    same, then the kernel won't be able to identify the process that
    caused an implicit sync deadlock. The process that is stuck
    waiting for a fence can be innocent, and the kernel can't punish
    it. Likewise, the GPU reset query that reports which process is
    guilty and innocent will only be able to report unknown. Is that OK?

    Marek

    On Fri, May 28, 2021 at 10:41 AM Christian König
    <ckoenig.leichtzumer...@gmail.com> wrote:

        Hi Marek,

        well, I don't think that implicit and explicit synchronization
        need to be mutually exclusive.

        What we should do is to have the ability to transport a
        synchronization object with each BO.

        Implicit and explicit synchronization then basically become
        the same, they just transport the synchronization object
        differently.

        The biggest problem is the sync_files for Android, since they
        are really not easy to support at all. If Android wants to
        support user queues, we would probably have to make some
        changes there.

        Regards,
        Christian.

        On 27.05.21 at 23:51, Marek Olšák wrote:
        Hi,

        Since Christian believes that we can't deadlock the kernel
        with some changes there, we just need to make everything nice
        for userspace too. Instead of explaining how it will work, I
        will explain the cases where future hardware (and its kernel
        driver) will break existing userspace in order to protect
        everybody from deadlocks. Anything that uses implicit sync
        will be spared, so X and Wayland will be fine, assuming they
        don't import/export fences. Those use cases that do
        import/export fences might or might not work, depending on
        how the fences are used.

        One of the necessities is that all fences will become future
        fences. The semantics of imported/exported fences will change
        completely and will have new restrictions on the usage. The
        restrictions are:


        1) Android sync files will be impossible to support, so they
        won't be supported (they don't allow future fences).


        2) Implicit sync and explicit sync will be mutually exclusive
        between processes. A process can use either one or the other,
        but not both. This is meant to prevent a deadlock condition
        with future fences where any process can malevolently
        deadlock execution of any other process, even execution of a
        higher-privileged process. The kernel will impose the
        following restrictions to protect against the deadlock:

        a) a process with an implicitly-sync'd imported/exported
        buffer can't import/export a fence from/to another process
        b) a process with an imported/exported fence can't
        import/export an implicitly-sync'd buffer from/to another process

        Alternative: A higher-privileged process could enforce both
        restrictions instead of the kernel to protect itself from the
        deadlock, but this would be a can of worms for existing
        userspace. It would be better if the kernel just broke unsafe
        userspace on future hw, just like sync files.

        If both implicit and explicit sync are allowed to occur
        simultaneously, sending a future fence that will never signal
        to any process will deadlock that process after it acquires
        the implicit sync lock. That lock is a sequence number that
        the process is required to write to memory, followed by an
        interrupt from the GPU, within a finite time. This is how the
        deadlock can happen:

        * The process gets sequence number N from the kernel for an
        implicitly-sync'd buffer.
        * The process inserts (into the GPU user-mapped queue) a wait
        for sequence number N-1.
        * The process inserts a wait for an imported fence, but it
        doesn't know that the fence will never signal ==> deadlock.
        ...
        * The process inserts a command to write sequence number N to
        a predetermined memory location. (which will make the buffer
        idle and send an interrupt to the kernel)
        ...
        * The kernel will terminate the process because it has never
        received the interrupt. (i.e. a less-privileged process just
        killed a more-privileged process)

        It's the interrupt for implicit sync that never arrived that
        caused the termination, and the only way another process can
        cause it is by sending a fence that will never signal. Thus,
        importing/exporting fences from/to other processes can't be
        allowed simultaneously with implicit sync.
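
The deadlock sequence in restriction 2 can be replayed with a toy model of a user-mapped queue. This is only a sketch in Python: `run_queue`, the command names, and the dictionary standing in for sequence-number memory are all hypothetical, and real hardware would stall on the wait rather than return.

```python
def run_queue(commands, seq_mem, fences):
    """Execute queue commands in order and stop at the first unsatisfied wait.

    Returns True if the queue reached the sequence-number write, which in
    this model also sends the interrupt the kernel is waiting for.
    """
    interrupt_sent = False
    for op, arg in commands:
        if op == "wait_seq":
            if seq_mem["value"] < arg:
                break  # previous implicit-sync user has not finished
        elif op == "wait_fence":
            if not fences[arg]:
                break  # imported fence never signals: the queue is stuck here
        elif op == "write_seq":
            seq_mem["value"] = arg  # makes the buffer idle
            interrupt_sent = True   # kernel receives the interrupt
    return interrupt_sent
```

With sequence number N = 8, a queue of `wait_seq 7`, `wait_fence "evil"`, `write_seq 8` never sends the interrupt while the fence named "evil" stays unsignaled, so the kernel would end up terminating the waiting process, not the one that exported the fence.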


        3) Compositors (and other privileged processes, and display
        flipping) can't trust imported/exported fences. They need a
        timeout recovery mechanism from the beginning, and the
        following are some possible solutions to timeouts:

        a) use a CPU wait with a small absolute timeout, and display
        the previous content on timeout
        b) use a GPU wait with a small absolute timeout, and
        conditional rendering will choose between the latest content
        (if signaled) and previous content (if timed out)

        The result would be that the desktop can run close to 60 fps
        even if an app runs at 1 fps.
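
Option (a) from restriction 3 might look roughly like the following Python sketch. A `threading.Event` stands in for the imported fence, and `pick_frame` and its parameters are hypothetical; a real compositor would wait on whatever fence primitive the driver exposes.

```python
import threading
import time


def pick_frame(fence, latest, previous, deadline):
    """CPU-wait on `fence` until the absolute `deadline` (time.monotonic() based).

    Returns the latest content if the fence signaled in time, otherwise
    falls back to the previous content so the compositor never blocks
    on an untrusted fence.
    """
    remaining = max(0.0, deadline - time.monotonic())
    return latest if fence.wait(remaining) else previous
```

A compositor loop would compute `deadline` once per frame, for example a few milliseconds before the next vblank, and call `pick_frame` for each client buffer, so a client that never signals only costs that small timeout.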

        *Redefining imported/exported fences and breaking some
        users/OSs is the only way to have userspace GPU command
        submission, and the deadlock example here is the
        counterexample proving that there is no other way.*

        So, what are the chances this is going to fly with the ecosystem?

        Thanks,
        Marek

