On 02/08/2019 08:21, Koenig, Christian wrote:


On 02.08.2019 07:17, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:

    On 02/08/2019 08:08, Koenig, Christian wrote:

        Hi Lionel,

        Well that looks more like your test case is buggy.

        According to the code, the ctx1 queue always waits for sem1 and
        the ctx2 queue always waits for sem2.


    That's supposed to be the same underlying syncobj, because it's
    exported from one VkDevice as an opaque FD from sem1 and imported
    into sem2.


Well then that's still buggy and won't synchronize at all.

When ctx1 waits for a semaphore and then signals the same semaphore, there is no guarantee that ctx2 will run in between the jobs.

It's perfectly valid in this case to first run all jobs from ctx1 and then all jobs from ctx2.


That's not really how I see the semaphores working.

The spec describes VkSemaphore as an interface to an internal payload that is opaque to the application.


When ctx1 waits on the semaphore, it waits on the payload put there by the previous iteration.

Then it proceeds to signal it by replacing the internal payload.


ctx2 then waits on that and replaces the payload again with the new internal synchronization object.


The internal payload is a dma fence in our case, and signaling just replaces one dma fence with another, or puts one where there was none before.

So we should have created a dependency link between all the submissions, and they should then be executed in the order of the vkQueueSubmit() calls.
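
In kernel terms, a rough sketch of that model (submit_add_dependency() and submit_job_fence() are made-up helpers; the real logic lives in drivers/gpu/drm/drm_syncobj.c and each driver's submission path):

    struct dma_fence *payload;

    /* Wait: the job takes the fence currently installed in the
     * syncobj as a dependency. */
    drm_syncobj_find_fence(file_priv, sem_handle, 0, 0, &payload);
    submit_add_dependency(job, payload);                /* hypothetical */

    /* Signal: once the job is queued, its own fence becomes the new
     * payload, replacing the previous one (or NULL if there was none). */
    drm_syncobj_replace_fence(syncobj, submit_job_fence(job)); /* hypothetical */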


-Lionel



It only prevents running both at the same time, and as far as I can see that still works even with threaded submission.

You need at least two semaphores for a tandem submission.
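
Something like this (a minimal Vulkan sketch; queue and semaphore setup omitted, N is arbitrary):

    const VkPipelineStageFlags stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;

    /* Kick-start: signal semA once so the first wait can complete. */
    const VkSubmitInfo kick = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &semA,
    };
    vkQueueSubmit(queue1, 1, &kick, VK_NULL_HANDLE);

    /* Ping-pong: each queue waits on the semaphore signaled by the
     * other one, so every submission has a distinct dependency. */
    const VkSubmitInfo on_queue2 = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &semA,
        .pWaitDstStageMask = &stage,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &semB,
    };
    const VkSubmitInfo on_queue1 = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &semB,
        .pWaitDstStageMask = &stage,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &semA,
    };

    for (unsigned i = 0; i < N; i++) {
        vkQueueSubmit(queue2, 1, &on_queue2, VK_NULL_HANDLE);
        vkQueueSubmit(queue1, 1, &on_queue1, VK_NULL_HANDLE);
    }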

Regards,
Christian.



        This way there can't be any synchronization between the two.

        Regards,
        Christian.

        On 02.08.2019 06:55, Lionel Landwerlin
        <lionel.g.landwer...@intel.com> wrote:
        Hey Christian,

        The problem boils down to the fact that we don't immediately
        create dma fences when calling vkQueueSubmit().
        That is deferred to a submission thread.

        From a single application thread, you can vkQueueSubmit() to 2
        queues from 2 different devices.
        Each submit to one queue depends on the previous submit to the
        other queue through an exported/imported semaphore.

        From the API point of view, the state of the semaphore should
        be changed after each vkQueueSubmit().
        The problem is that it is not, because of the thread, and
        because those 2 submission threads might be tied to different
        VkDevice/VkInstance or even different applications
        (synchronizing themselves outside the Vulkan API).

        Hope that makes sense.
        It's not really easy to explain by mail; the best explanation
        is probably reading the test:
        
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
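
        The core of it boils down to something like this (a condensed
        sketch, not the actual crucible code; the wait_and_signal_*
        submit infos each wait on and re-signal the shared semaphore):

            /* Share sem1 (device1) with sem2 (device2) as an opaque FD. */
            int fd;
            const VkSemaphoreGetFdInfoKHR get_info = {
                .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
                .semaphore = sem1,
                .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
            };
            vkGetSemaphoreFdKHR(device1, &get_info, &fd);

            const VkImportSemaphoreFdInfoKHR import_info = {
                .sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
                .semaphore = sem2,
                .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
                .fd = fd,
            };
            vkImportSemaphoreFdKHR(device2, &import_info);

            /* Alternate between the queues; every submit waits on the
             * payload left by the previous one and re-signals the same
             * underlying syncobj. */
            for (unsigned i = 0; i < N; i++) {
                vkQueueSubmit(queue1, 1, &wait_and_signal_sem1, VK_NULL_HANDLE);
                vkQueueSubmit(queue2, 1, &wait_and_signal_sem2, VK_NULL_HANDLE);
            }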

        Like David mentioned, you're not running into that issue right
        now, because you only dispatch to the thread under specific
        conditions.
        But I could build a case to force that and likely run into the
        same issue.

        -Lionel

        On 02/08/2019 07:33, Koenig, Christian wrote:

            Hi Lionel,

            Well, could you describe once more what the problem is?

            Because I don't fully understand why a rather normal tandem
            submission with two semaphores should fail in any way.

            Regards,
            Christian.

            On 02.08.2019 06:28, Lionel Landwerlin
            <lionel.g.landwer...@intel.com> wrote:
            There aren't any CTS tests covering the issue I was
            mentioning, but we could add some.

            I don't have all the details regarding your implementation,
            but even with the "semaphore thread" I could see it running
            into the same issues.
            What if a mix of binary & timeline semaphores is handed to
            vkQueueSubmit()?

            For example, with queueA & queueB from 2 different VkDevices:
                 vkQueueSubmit(queueA, signal semA);
                 vkQueueSubmit(queueA, wait on [semA, timelineSemB]);
                     (with timelineSemB triggering a wait-before-signal)
                 vkQueueSubmit(queueB, signal semA);
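
            With the timeline extension, that second submit on queueA
            would be expressed roughly like this (a sketch; pointB is
            the not-yet-signaled timeline point, and for the binary
            semA the wait value is ignored):

                const VkSemaphore waits[] = { semA, timelineSemB };
                const uint64_t wait_values[] = { 0 /* ignored */, pointB };
                const VkPipelineStageFlags stages[] = {
                    VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                    VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                };
                const VkTimelineSemaphoreSubmitInfoKHR timeline_info = {
                    .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR,
                    .waitSemaphoreValueCount = 2,
                    .pWaitSemaphoreValues = wait_values,
                };
                const VkSubmitInfo submit = {
                    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
                    .pNext = &timeline_info,
                    .waitSemaphoreCount = 2,
                    .pWaitSemaphores = waits,
                    .pWaitDstStageMask = stages,
                };
                vkQueueSubmit(queueA, 1, &submit, VK_NULL_HANDLE);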


            -Lionel

            On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
            > Hi Lionel,
            >
            > The queue thread is a heavy thread, always resident in
            the driver while the application is running, and our guys
            don't like that. So we switched to a semaphore thread: only
            when a wait-before-signal happens on a timeline do we spawn
            a thread to handle that wait. So we don't run into this
            issue of yours.
            > By the way, I already pass all your CTS cases now. I
            suggest you switch to a semaphore thread instead of a queue
            thread as well. It works very well.
            >
            > -David
            >
            > -----Original Message-----
            > From: Lionel Landwerlin <lionel.g.landwer...@intel.com>
            > Sent: Friday, August 2, 2019 4:52 AM
            > To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig,
            Christian <christian.koe...@amd.com>; Zhou, David(ChunMing)
            <david1.z...@amd.com>; Jason Ekstrand <ja...@jlekstrand.net>
            > Subject: Threaded submission & semaphore sharing
            >
            > Hi Christian, David,
            >
            > Sorry to report this so late in the process, but I think
            we found an issue not directly related to the syncobj
            timelines themselves, but to a side effect of the threaded
            submissions.
            >
            > Essentially we're failing a test in crucible:
            > func.sync.semaphore-fd.opaque-fd
            > This test creates a single binary semaphore and shares it
            between 2 VkDevice/VkQueue.
            > Then, in a loop, it submits workloads alternating between
            the 2 VkQueues, with one submit depending on the other.
            > It does so by waiting on the VkSemaphore signaled in the
            previous iteration and re-signaling it.
            >
            > The problem for us is that once things are dispatched to
            the submission thread, the ordering of the submissions is
            lost, because we have 2 devices and each has its own
            submission thread.
            >
            > Jason suggested that we reestablish the ordering by
            having semaphores/syncobjs carry an additional uint64_t
            payload.
            > This 64bit integer would be an identifier that the
            submission threads will WAIT_FOR_AVAILABLE on.
            >
            > The scenario would look like this:
            >       - vkQueueSubmit(queueA, signal on semA);
            >           - in the caller thread, this would increment the
            syncobj's additional u64 payload and return it to userspace.
            >           - at some point the submission thread of queueA
            submits the workload and signals the syncobj of semA with
            the value returned in the caller thread of vkQueueSubmit().
            >       - vkQueueSubmit(queueB, wait on semA);
            >           - in the caller thread, this would read the
            syncobj's additional u64 payload.
            >           - at some point the submission thread of queueB
            will try to submit the work, but first it will
            WAIT_FOR_AVAILABLE the u64 value returned in the step above.
            >
            > Because we want the binary semaphores to be shared
            across processes and would like this to remain a single
            FD, the simplest location to store this additional u64
            payload would be the DRM syncobj.
            > It would need an additional ioctl to read & increment
            the value.
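            >
            > On the userspace side, it could look roughly like this
            (a sketch; drmSyncobjReadIncrement() is the hypothetical
            wrapper for the new ioctl, while the wait itself uses the
            existing DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE flag):
            >
            >     /* vkQueueSubmit(queueA, signal on semA), caller thread: */
            >     uint64_t point;
            >     drmSyncobjReadIncrement(drm_fd, semA_handle, &point); /* hypothetical */
            >
            >     /* vkQueueSubmit(queueB, wait on semA), queueB's submission
            >      * thread, before doing the actual submission: */
            >     drmSyncobjTimelineWait(drm_fd, &semA_handle, &point, 1,
            >                            INT64_MAX, /* no timeout */
            >                            DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE,
            >                            NULL);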
            >
            > What do you think?
            >
            > -Lionel





