On 02/08/2019 09:10, Koenig, Christian wrote:
On 02.08.2019 07:38, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:
On 02/08/2019 08:21, Koenig, Christian wrote:
On 02.08.2019 07:17, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:
On 02/08/2019 08:08, Koenig, Christian wrote:
Hi Lionel,
Well that looks more like your test case is buggy.
According to the code, the ctx1 queue always waits for
sem1 and the ctx2 queue always waits for sem2.
That's supposed to be the same underlying syncobj, because
it's exported from one VkDevice as an opaque FD from sem1 and
imported into sem2.
Well then that's still buggy and won't synchronize at all.
When ctx1 waits for a semaphore and then signals the same
semaphore, there is no guarantee that ctx2 will run in between
the jobs.
It's perfectly valid in this case to first run all jobs from
ctx1 and then all jobs from ctx2.
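A minimal sketch of that submission pattern (sem, stage and the two
queues are assumed to have been created/imported elsewhere):

    /* Each iteration, on both queues: wait on the one shared binary
     * semaphore and signal it again. */
    VkSubmitInfo submit = {
        .sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount   = 1,
        .pWaitSemaphores      = &sem,   /* same underlying syncobj */
        .pWaitDstStageMask    = &stage,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores    = &sem,
    };
    vkQueueSubmit(ctx1_queue, 1, &submit, VK_NULL_HANDLE);
    vkQueueSubmit(ctx2_queue, 1, &submit, VK_NULL_HANDLE);

    /* Nothing ties ctx2's wait to ctx1's signal in particular:
     * executing every ctx1 submission first and every ctx2
     * submission afterwards still satisfies each wait/signal pair. */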
That's not really how I see the semaphores working.
The spec describes VkSemaphore as an interface to an internal
payload that is opaque to the application.
When ctx1 waits on the semaphore, it waits on the payload put
there by the previous iteration.
And who says that it's not waiting for its own previous payload?
That's what I understood from your previous comment: "there is no
guarantee that ctx2 will run in between jobs".
See, if the payload is a counter this won't work either. Keep in mind
that this has the semantics of a semaphore: whoever grabs the semaphore
first wins and can run, everybody else has to wait.
What performs the "grab" here?
I thought that would be vkQueueSubmit().
Since that occurs from a single application thread, execution should
then be ordered ctx1, ctx2, ctx1, ...
Thanks for your time on this,
-Lionel
Then it proceeds to signal it by replacing the internal payload.
That's an implementation detail of our sync objects, but I don't think
that this behavior is part of the Vulkan specification.
Regards,
Christian.
ctx2 then waits on that and replaces the payload again with the
new internal synchronization object.
The internal payload is a dma fence in our case, and signaling just
replaces one dma fence with another, or puts one where there was none
before.
So we should have created a dependency link between all the
submissions, and they should then be executed in the order of the
QueueSubmit() calls.
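A simplified sketch of what that payload replacement looks like on the
kernel side (modeled on drm_syncobj's binary path; the real code also
handles RCU and wait callbacks, so this is only an illustration):

    /* Signaling a binary syncobj == swapping which dma_fence the
     * object points at; waiters grab whatever fence is installed
     * at the moment they look. */
    void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
                                   struct dma_fence *fence)
    {
            struct dma_fence *old_fence;

            if (fence)
                    dma_fence_get(fence);
            old_fence = xchg(&syncobj->fence, fence); /* atomic swap */
            dma_fence_put(old_fence);                 /* NULL is fine */
    }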
-Lionel
It only prevents running both at the same time, and as far as I
can see that still works even with threaded submission.
You need at least two semaphores for a tandem submission.
Regards,
Christian.
This way there can't be any synchronization between the two.
Regards,
Christian.
On 02.08.2019 06:55, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:
Hey Christian,
The problem boils down to the fact that we don't
immediately create dma fences when calling
vkQueueSubmit().
That work is deferred to a thread.
From a single application thread, you can
QueueSubmit() to 2 queues from 2 different devices.
Each QueueSubmit to one queue has a dependency on the
previous QueueSubmit on the other queue through an
exported/imported semaphore.
From the API point of view, the state of the semaphore
should be changed after each QueueSubmit().
The problem is that it isn't, because of the submission
thread, and because those 2 submission threads might be
tied to different VkDevice/VkInstance or even different
applications (synchronizing themselves outside the
Vulkan API).
Hope that makes sense.
It's not really easy to explain by mail; the best
explanation is probably reading the test:
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
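A condensed view of how the ordering gets lost (a sketch; the thread
split is as described above, the names are made up):

    /* Application thread, in order:                               */
    vkQueueSubmit(queueA, ...); /* queued to deviceA submit thread */
    vkQueueSubmit(queueB, ...); /* queued to deviceB submit thread */
    vkQueueSubmit(queueA, ...); /* queued to deviceA submit thread */

    /* deviceA's submit thread can create and install the dma fences
     * for both of its submissions before deviceB's thread ever looks
     * at the shared syncobj, so queueB's wait may pick up the second
     * signal instead of the first; the API-level submission order is
     * not preserved across the two threads. */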
Like David mentioned, you're not running into that
issue right now, because you only dispatch to the
thread under specific conditions.
But I could build a case to force that and likely run
into the same issue.
-Lionel
On 02/08/2019 07:33, Koenig, Christian wrote:
Hi Lionel,
Well, could you describe once more what the problem is?
Because I don't fully understand why a rather normal
tandem submission with two semaphores should fail
in any way.
Regards,
Christian.
On 02.08.2019 06:28, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:
There aren't CTS tests covering the issue I was
mentioning.
But we could add them.
I don't have all the details regarding your
implementation, but even with the "semaphore thread"
I could see it running into the same issues.
What if a mix of binary & timeline semaphores is
handed to vkQueueSubmit()?
For example, with queueA & queueB from 2 different
VkDevice:

    vkQueueSubmit(queueA, signal semA);
    vkQueueSubmit(queueA, wait on [semA, timelineSemB]);
        with timelineSemB triggering a wait before signal.
    vkQueueSubmit(queueB, signal semA);
-Lionel
On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> Hi Lionel,
>
> The Queue thread is a heavy thread that stays resident in the
driver for the whole application run, and our guys don't like
that. So we switched to a Semaphore Thread: only when a
waitBeforeSignal of a timeline happens do we spawn a thread to
handle that wait. So we don't have this issue of yours.
> By the way, I already pass all your CTS cases now. I suggest
you switch to a Semaphore Thread instead of a Queue Thread as
well. It works very well.
>
> -David
>
> -----Original Message-----
> From: Lionel Landwerlin <lionel.g.landwer...@intel.com>
> Sent: Friday, August 2, 2019 4:52 AM
> To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig,
Christian <christian.koe...@amd.com>; Zhou,
David(ChunMing) <david1.z...@amd.com>; Jason Ekstrand
<ja...@jlekstrand.net>
> Subject: Threaded submission & semaphore sharing
>
> Hi Christian, David,
>
> Sorry to report this so late in the process, but I think we
found an issue not directly related to syncobj timelines
themselves, but to a side effect of the threaded submissions.
>
> Essentially we're failing a test in crucible:
> func.sync.semaphore-fd.opaque-fd
> This test creates a single binary semaphore and shares it
between 2 VkDevice/VkQueue.
> Then, in a loop, it proceeds to submit workloads alternating
between the 2 VkQueue, with one submit depending on the other.
> It does so by waiting on the VkSemaphore signaled in the
previous iteration and re-signaling it.
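A sketch of the loop being described (not the actual crucible code;
submit() stands in for a vkQueueSubmit() with the given wait/signal):

    /* One binary semaphore imported into both devices (sem1 on
     * device1, sem2 on device2) from the same opaque FD. */
    for (unsigned i = 0; i < LOOP_COUNT; i++) {
        submit(queue1, /* wait */ sem1, /* signal */ sem1);
        submit(queue2, /* wait */ sem2, /* signal */ sem2);
    }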
>
> The problem for us is that once things are dispatched to the
submission thread, the ordering of the submissions is lost.
> Because we have 2 devices, they both have their own
submission thread.
>
> Jason suggested that we reestablish the ordering by having
semaphores/syncobjs carry an additional uint64_t payload.
> This 64-bit integer would be an identifier that submission
threads will WAIT_FOR_AVAILABLE on.
>
> The scenario would look like this:
>     - vkQueueSubmit(queueA, signal on semA);
>         - in the caller thread, this would increment the
syncobj's additional u64 payload and return it to userspace.
>         - at some point, the submission thread of queueA
submits the workload and signals the syncobj of semA with the
value returned in the caller thread of vkQueueSubmit().
>     - vkQueueSubmit(queueB, wait on semA);
>         - in the caller thread, this would read the syncobj's
additional u64 payload.
>         - at some point, the submission thread of queueB will
try to submit the work, but first it will WAIT_FOR_AVAILABLE
the u64 value returned in the step above.
>
> Because we want the binary semaphores to be shared across
processes, and would like this to remain a single FD, the
simplest location to store this additional u64 payload would
be the DRM syncobj.
> It would need an additional ioctl to read & increment the value.
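A sketch of what such a uapi addition might look like (entirely
hypothetical: neither the struct nor DRM_IOCTL_SYNCOBJ_FETCH_INC
exists today, and the ioctl number is a placeholder):

    /* Atomically read & increment the extra u64 payload of a
     * syncobj. */
    struct drm_syncobj_u64_payload {
            __u32 handle;  /* syncobj handle */
            __u32 pad;
            __u64 value;   /* out: payload value before increment */
    };

    #define DRM_IOCTL_SYNCOBJ_FETCH_INC \
            DRM_IOWR(0x00 /* placeholder */, struct drm_syncobj_u64_payload)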
>
> What do you think?
>
> -Lionel