Hi Marek,
well, I don't think that implicit and explicit synchronization need to
be mutually exclusive.
What we should do is to have the ability to transport a synchronization
object with each BO.
Implicit and explicit synchronization then basically become the same,
they just transport the synchronization object differently.
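To make that concrete, here is a rough sketch in Python (purely for illustration, not kernel code). All names in it (SyncObj, BO, share_implicit, share_explicit, sync_to_wait_on) are made up for this example and do not correspond to any existing DRM interface. It only shows that the consumer ends up with the same kind of object either way:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncObj:
    """Hypothetical synchronization object; 'signaled' stands in for GPU completion."""
    name: str
    signaled: bool = False


@dataclass
class BO:
    """Hypothetical buffer object that can carry a sync object."""
    handle: int
    sync: Optional[SyncObj] = None


def share_implicit(bo: BO, sync: SyncObj) -> BO:
    # Implicit sync: the sync object travels attached to the BO itself.
    bo.sync = sync
    return bo


def share_explicit(bo: BO, sync: SyncObj) -> tuple:
    # Explicit sync: the sync object travels next to the BO as a separate item.
    return (bo, sync)


def sync_to_wait_on(received) -> Optional[SyncObj]:
    # The consumer recovers the same sync object in both cases;
    # only the transport differs.
    if isinstance(received, tuple):
        _bo, sync = received
        return sync
    return received.sync
```

The point of the sketch is only the last function: once the object is extracted, the waiting code does not need to know which path it came from.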
The biggest problem is the sync_files for Android, since they are
really not easy to support at all. If Android wants to support user
queues we would probably have to do some changes there.
Regards,
Christian.
On 27.05.21 at 23:51, Marek Olšák wrote:
Hi,
Since Christian believes that we can't deadlock the kernel with some
changes there, we just need to make everything nice for userspace too.
Instead of explaining how it will work, I will explain the cases where
future hardware (and its kernel driver) will break existing userspace
in order to protect everybody from deadlocks. Anything that uses
implicit sync will be spared, so X and Wayland will be fine, assuming
they don't import/export fences. Those use cases that do import/export
fences might or might not work, depending on how the fences are used.
One of the necessities is that all fences will become future fences.
The semantics of imported/exported fences will change completely and
will have new restrictions on the usage. The restrictions are:
1) Android sync files will be impossible to support, so won't be
supported. (they don't allow future fences)
2) Implicit sync and explicit sync will be mutually exclusive per
process. A process can use either one or the other, but not both. This
is meant to prevent a deadlock condition with future fences where any
process can malevolently deadlock execution of any other process, even
execution of a higher-privileged process. The kernel will impose the
following restrictions to protect against the deadlock:
a) a process with an implicitly-sync'd imported/exported buffer can't
import/export a fence from/to another process
b) a process with an imported/exported fence can't import/export an
implicitly-sync'd buffer from/to another process
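As a rough illustration of restrictions a) and b), the per-process bookkeeping could look something like the Python sketch below. This is only a model of the rule, not a proposed kernel interface; ProcessSyncState, SyncModeError and the method names are hypothetical.

```python
class SyncModeError(Exception):
    """Raised when a process tries to mix the two sync modes across processes."""


class ProcessSyncState:
    """Hypothetical per-process state the kernel could track."""

    def __init__(self) -> None:
        self.shared_implicit_buffer = False
        self.shared_fence = False

    def share_implicit_buffer(self) -> None:
        # Restriction b): a process that has imported/exported a fence
        # may not import/export an implicitly-sync'd buffer.
        if self.shared_fence:
            raise SyncModeError("process already imported/exported a fence")
        self.shared_implicit_buffer = True

    def share_fence(self) -> None:
        # Restriction a): a process that has imported/exported an
        # implicitly-sync'd buffer may not import/export a fence.
        if self.shared_implicit_buffer:
            raise SyncModeError("process already shares an implicitly-sync'd buffer")
        self.shared_fence = True
```

In this model the first cross-process import/export decides the mode of the process, and the other kind is rejected from then on.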
Alternative: A higher-privileged process could enforce both
restrictions instead of the kernel to protect itself from the
deadlock, but this would be a can of worms for existing userspace. It
would be better if the kernel just broke unsafe userspace on future
hw, just like sync files.
If both implicit and explicit sync are allowed to occur
simultaneously, sending a future fence that will never signal to any
process will deadlock that process after it acquires the implicit sync
lock. The lock is a sequence number that the process is required to
write to memory, followed by an interrupt from the GPU, within a finite
time. This is how the deadlock can happen:
* The process gets sequence number N from the kernel for an
implicitly-sync'd buffer.
* The process inserts (into the GPU user-mapped queue) a wait for
sequence number N-1.
* The process inserts a wait for a fence, not knowing that the fence
will never signal ==> deadlock.
...
* The process inserts a command to write sequence number N to a
predetermined memory location. (which will make the buffer idle and
send an interrupt to the kernel)
...
* The kernel will terminate the process because it has never received
the interrupt. (i.e. a less-privileged process just killed a
more-privileged process)
It's the interrupt for implicit sync that never arrived that caused
the termination, and the only way another process can cause it is by
sending a fence that will never signal. Thus, importing/exporting
fences from/to other processes can't be allowed simultaneously with
implicit sync.
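The sequence above can be modelled with a small Python toy. It assumes an in-order queue where an unsatisfied wait blocks everything behind it, and it replaces the interrupt timeout with a simple check of the sequence number in memory. The command names, run_queue, kernel_verdict and simulate are all invented for this sketch.

```python
def run_queue(commands, memory, signaled_fences):
    """Run queue commands in order; return False if a wait blocks the queue."""
    for op, arg in commands:
        if op == "wait_seq":
            if memory["seq"] < arg:
                return False  # blocked on an earlier implicit-sync user
        elif op == "wait_fence":
            if arg not in signaled_fences:
                return False  # blocked on a fence that has not signalled
        elif op == "write_seq":
            memory["seq"] = arg  # buffer becomes idle, interrupt would fire
    return True


def kernel_verdict(memory, expected_seq):
    """Stand-in for the kernel noticing that the interrupt never arrived."""
    return "ok" if memory["seq"] >= expected_seq else "terminate process"


def simulate(fence_signals: bool, n: int = 5) -> str:
    memory = {"seq": n - 1}  # the previous user already wrote N-1
    signaled = {"imported"} if fence_signals else set()
    commands = [
        ("wait_seq", n - 1),         # wait for sequence number N-1
        ("wait_fence", "imported"),  # wait for the imported fence
        ("write_seq", n),            # write sequence number N
    ]
    run_queue(commands, memory, signaled)
    return kernel_verdict(memory, n)
```

With simulate(True) the queue drains and the verdict is "ok"; with simulate(False) the write of N is never reached and the verdict is "terminate process", even though the process itself did nothing wrong.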
3) Compositors (and other privileged processes, and display flipping)
can't trust imported/exported fences. They need a timeout recovery
mechanism from the beginning, and the following are some possible
solutions to timeouts:
a) use a CPU wait with a small absolute timeout, and display the
previous content on timeout
b) use a GPU wait with a small absolute timeout, and conditional
rendering will choose between the latest content (if signalled) and
previous content (if timed out)
The result would be that the desktop can run close to 60 fps even if
an app runs at 1 fps.
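Option a) could look roughly like the following Python sketch. A threading.Event stands in for the imported fence and the frames are plain strings; pick_frame and its parameters are hypothetical names, and a real compositor would wait on an actual fence object rather than an Event.

```python
import threading
import time


def pick_frame(fence: threading.Event, deadline: float,
               latest: str, previous: str) -> str:
    """Wait for the fence until an absolute monotonic deadline, then pick a frame."""
    # Convert the absolute deadline into the remaining relative timeout.
    remaining = max(0.0, deadline - time.monotonic())
    if fence.wait(timeout=remaining):
        return latest    # the client finished rendering in time
    return previous      # timed out: keep showing the old content
```

Using an absolute deadline (for example the next vblank minus some margin) rather than a relative timeout means that waiting on several clients in a row cannot add up to more than one frame of delay.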
*Redefining imported/exported fences and breaking some users/OSs is
the only way to have userspace GPU command submission, and the
deadlock example here is the counterexample proving that there is no
other way.*
So, what are the chances this is going to fly with the ecosystem?
Thanks,
Marek