Went through the spec VUs again and there is a bit that says that binary semaphores must have all of their dependencies submitted for execution. So essentially it means that in our implementation of QueueSubmit() we should be able to do a short wait_for_available on all of those, which ensures everything is properly ordered.
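
For illustration, a minimal sketch of what that short wait could look
like, assuming the timeline syncobj UAPI with its WAIT_AVAILABLE flag
and libdrm's drmSyncobjTimelineWait(); wait_deps_available() is a
made-up helper name:

    /* Before performing the semaphore waits of a vkQueueSubmit(),
     * block until every dependency has a fence attached (i.e. has
     * been submitted for execution), without waiting for the fences
     * to actually signal. */
    #include <stdint.h>
    #include <xf86drm.h>

    static int
    wait_deps_available(int drm_fd, uint32_t *handles, uint64_t *points,
                        uint32_t count, int64_t abs_timeout_ns)
    {
       uint32_t first_signaled;

       return drmSyncobjTimelineWait(drm_fd, handles, points, count,
                                     abs_timeout_ns,
                                     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
                                     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE,
                                     &first_signaled);
    }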

So I think you're right on this, David: waitBeforeSignal on binary semaphores is out of scope.

Again sorry for the noise.

-Lionel

On 02/08/2019 13:01, zhoucm1 wrote:



On 2019-08-02 17:41, Lionel Landwerlin wrote:
Hey David,

On 02/08/2019 12:11, zhoucm1 wrote:

Hi Lionel,

For binary semaphores, I guess everyone will assume the application guarantees that the wait comes after the signal, whether the semaphore is shared across processes or used within a single process.

I think below two options can fix your problem:

a. Can we extend vkWaitForFences so that it is able to wait on fence availability? If the fence is available, then it's safe to do the semaphore wait in vkQueueSubmit.


I'm sorry, but I don't understand what vkWaitForFences() has to do with this problem.

The test case we're struggling with doesn't use that API.


Can you maybe explain a bit more how it relates?

vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()

Sorry, but that would make the application code very ugly.


b. Make waitBeforeSignal valid for binary semaphores as well; that way, it is reasonable to add wait/signal counting for binary syncobjs.


Yeah essentially the change we're proposing internally makes binary semaphores use syncobj timelines.

Will you raise an MR to add the language for waitBeforeSignal support of binary semaphores to the Vulkan spec?

-David

There is just another u64 associated with them.
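
Roughly something like this (an illustrative sketch; struct and field
names are made up):

    #include <stdint.h>

    /* A binary VkSemaphore backed by a timeline syncobj.  The extra
     * u64 is a submission point, bumped in the vkQueueSubmit() caller
     * thread, that the submission threads use to reestablish the
     * original submission order. */
    struct semaphore {
       uint32_t syncobj;    /* DRM syncobj handle, shareable as opaque FD */
       uint64_t next_point; /* last point reserved by vkQueueSubmit() */
    };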


-Lionel



-David


On 2019-08-02 14:27, Lionel Landwerlin wrote:
On 02/08/2019 09:10, Koenig, Christian wrote:


On 02.08.2019 07:38, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:

    On 02/08/2019 08:21, Koenig, Christian wrote:



        On 02.08.2019 07:17, Lionel Landwerlin
        <lionel.g.landwer...@intel.com> wrote:

            On 02/08/2019 08:08, Koenig, Christian wrote:

                Hi Lionel,

                Well that looks more like your test case is buggy.

                According to the code the ctx1 queue always waits
                for sem1 and ctx2 queue always waits for sem2.


            That's supposed to be the same underlying syncobj
            because it's exported from one VkDevice as an opaque
            FD from sem1 and imported into sem2.


        Well then that's still buggy and won't synchronize at all.

        When ctx1 waits for a semaphore and then signals the same
        semaphore there is no guarantee that ctx2 will run in
        between jobs.

        It's perfectly valid in this case to first run all jobs
        from ctx1 and then all jobs from ctx2.


    That's not really how I see the semaphores working.

    The spec describes VkSemaphore as an interface to an internal
    payload that is opaque to the application.


    When ctx1 waits on the semaphore, it waits on the payload put
    there by the previous iteration.


And who says that it's not waiting for its own previous payload?


That's what I understood from your previous comment: "there is no guarantee that ctx2 will run in between jobs"



See, if the payload is a counter, this won't work either. Keep in mind that this has the semantics of a semaphore: whoever grabs the semaphore first wins and can run; everybody else has to wait.


What performs the "grab" here?

I thought that would be vkQueueSubmit().

Since that occurs from a single application thread, execution should then be ordered ctx1, ctx2, ctx1, ...


Thanks for your time on this,


-Lionel



    Then it proceeds to signal it by replacing the internal payload.


That's an implementation detail of our sync objects, but I don't think that this behavior is part of the Vulkan specification.

Regards,
Christian.


    ctx2 then waits on that and replaces the payload again with
    the new internal synchronization object.


    The internal payload is a dma fence in our case and signaling
    just replaces a dma fence by another or puts one where there
    was none before.

    So we should have created a dependency link between all the
    submissions, and they should then be executed in the order of
    the QueueSubmit() calls.


    -Lionel



        It only prevents running both at the same time and as far
        as I can see that still works even with threaded submission.

        You need at least two semaphores for a tandem submission.

        Regards,
        Christian.



                This way there can't be any synchronisation
                between the two.

                Regards,
                Christian.

                On 02.08.2019 06:55, Lionel Landwerlin
                <lionel.g.landwer...@intel.com> wrote:
                Hey Christian,

                The problem boils down to the fact that we don't
                immediately create dma fences when calling
                vkQueueSubmit().
                This is delayed to a thread.

                From a single application thread, you can
                QueueSubmit() to 2 queues from 2 different devices.
                Each QueueSubmit to one queue has a dependency on
                the previous QueueSubmit on the other queue
                through an exported/imported semaphore.

                From the API point of view, the state of the
                semaphore should be changed after each QueueSubmit().
                The problem is that it's not, because of the thread,
                and because you might have those 2 submission
                threads tied to different VkDevice/VkInstance or
                even different applications (synchronizing
                themselves outside the Vulkan API).

                Hope that makes sense.
                It's not really easy to explain by mail; the best
                explanation is probably reading the test:
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
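
                In essence the loop looks something like this (a
                paraphrased sketch, not the actual crucible code;
                the real test starts with a signal-only submit so
                the first wait doesn't deadlock):

                    #include <vulkan/vulkan.h>

                    /* sem1 and sem2 are the same underlying syncobj:
                     * exported from device1 as an opaque FD and
                     * imported into device2. */
                    void ping_pong(VkQueue queue1, VkSemaphore sem1,
                                   VkQueue queue2, VkSemaphore sem2,
                                   unsigned iterations)
                    {
                       const VkPipelineStageFlags stage =
                          VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
                       VkSubmitInfo submit = {
                          .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
                          .waitSemaphoreCount = 1,
                          .pWaitDstStageMask = &stage,
                          .signalSemaphoreCount = 1,
                       };

                       for (unsigned i = 0; i < iterations; i++) {
                          /* Wait on the signal from the previous
                           * iteration, re-signal for the next one. */
                          submit.pWaitSemaphores = &sem1;
                          submit.pSignalSemaphores = &sem1;
                          vkQueueSubmit(queue1, 1, &submit,
                                        VK_NULL_HANDLE);

                          submit.pWaitSemaphores = &sem2;
                          submit.pSignalSemaphores = &sem2;
                          vkQueueSubmit(queue2, 1, &submit,
                                        VK_NULL_HANDLE);
                       }
                    }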

                Like David mentioned you're not running into that
                issue right now, because you only dispatch to the
                thread under specific conditions.
                But I could build a case to force that and likely
                run into the same issue.

                -Lionel

                On 02/08/2019 07:33, Koenig, Christian wrote:

                    Hi Lionel,

                    Well could you describe once more what the
                    problem is?

                    Cause I don't fully understand why a rather
                    normal tandem submission with two semaphores
                    should fail in any way.

                    Regards,
                    Christian.

                    On 02.08.2019 06:28, Lionel Landwerlin
                    <lionel.g.landwer...@intel.com> wrote:
                    There aren't CTS tests covering the issue I
                    was mentioning.
                    But we could add them.

                    I don't have all the details regarding your
                    implementation but even with
                    the "semaphore thread", I could see it running
                    into the same issues.
                    What if a mix of binary & timeline semaphores
                    are handed to vkQueueSubmit()?

                    For example with queueA & queueB from 2 different
                    VkDevices:
                    vkQueueSubmit(queueA, signal semA);
                    vkQueueSubmit(queueA, wait on [semA, timelineSemB]);
                    with timelineSemB triggering a wait before signal.
                    vkQueueSubmit(queueB, signal semA);


                    -Lionel

                    On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
                    > Hi Lionel,
                    >
                    > Because the queue thread is a heavy thread, always
                    > resident in the driver while the application is
                    > running, our guys don't like it. So we switched to
                    > a semaphore thread: only when a waitBeforeSignal of
                    > a timeline happens do we spawn a thread to handle
                    > that wait. So we don't have this issue of yours.
                    > By the way, I already pass all your CTS cases now.
                    > I suggest you switch to a semaphore thread instead
                    > of a queue thread as well. It works very well.
                    >
                    > -David
                    >
                    > -----Original Message-----
                    > From: Lionel Landwerlin
                    > <lionel.g.landwer...@intel.com>
                    > Sent: Friday, August 2, 2019 4:52 AM
                    > To: dri-devel <dri-devel@lists.freedesktop.org>;
                    > Koenig, Christian <christian.koe...@amd.com>;
                    > Zhou, David(ChunMing) <david1.z...@amd.com>;
                    > Jason Ekstrand <ja...@jlekstrand.net>
                    > Subject: Threaded submission & semaphore sharing
                    >
                    > Hi Christian, David,
                    >
                    > Sorry to report this so late in the process, but
                    > I think we found an issue not directly related to
                    > syncobj timelines themselves but with a side
                    > effect of the threaded submissions.
                    >
                    > Essentially we're failing a test in crucible:
                    > func.sync.semaphore-fd.opaque-fd
                    > This test creates a single binary semaphore and
                    > shares it between 2 VkDevice/VkQueue.
                    > Then in a loop it submits workloads alternating
                    > between the 2 VkQueues, with each submit depending
                    > on the previous one.
                    > It does so by waiting on the VkSemaphore signaled
                    > in the previous iteration and re-signaling it.
                    >
                    > The problem for us is that once things are
                    > dispatched to the submission threads, the ordering
                    > of the submissions is lost, because we have 2
                    > devices and they both have their own submission
                    > thread.
                    >
                    > Jason suggested that we reestablish the ordering
                    > by having semaphores/syncobjs carry an additional
                    > uint64_t payload.
                    > This 64-bit integer would be an identifier that
                    > submission threads will WAIT_FOR_AVAILABLE on.
                    >
                    > The scenario would look like this:
                    >   - vkQueueSubmit(queueA, signal on semA);
                    >       - in the caller thread, this would increment
                    >         the syncobj's additional u64 payload and
                    >         return it to userspace.
                    >       - at some point the submission thread of
                    >         queueA submits the workload and signals
                    >         the syncobj of semA with the value
                    >         returned in the caller thread of
                    >         vkQueueSubmit().
                    >   - vkQueueSubmit(queueB, wait on semA);
                    >       - in the caller thread, this would read the
                    >         syncobj's additional u64 payload.
                    >       - at some point the submission thread of
                    >         queueB will try to submit the work, but
                    >         first it will WAIT_FOR_AVAILABLE the u64
                    >         value read in the step above.
                    >
                    > Because we want the binary semaphores to be shared
                    > across processes and would like this to remain a
                    > single FD, the simplest location to store this
                    > additional u64 payload would be the DRM syncobj.
                    > It would need an additional ioctl to read &
                    > increment the value.
                    >
                    > What do you think?
                    >
                    > -Lionel
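
To make the proposed flow concrete, here is a sketch (all names are
hypothetical; the payload helpers stand for the new read & increment
ioctl proposed above, which does not exist in the current uapi):

    #include <stdint.h>

    /* Proposed ioctl wrappers: atomically increment the syncobj's
     * extra u64 payload and return the new value, or read the
     * current value. */
    uint64_t syncobj_payload_inc(int drm_fd, uint32_t handle);
    uint64_t syncobj_payload_read(int drm_fd, uint32_t handle);

    /* vkQueueSubmit(queueA, signal on semA), caller thread: reserve
     * the point that queueA's submission thread will signal once it
     * actually submits the workload. */
    uint64_t reserve_signal_point(int drm_fd, uint32_t semA)
    {
       return syncobj_payload_inc(drm_fd, semA);
    }

    /* vkQueueSubmit(queueB, wait on semA), caller thread: snapshot
     * the point to wait on.  queueB's submission thread then does a
     * WAIT_FOR_AVAILABLE on that point before submitting, blocking
     * until queueA's submission thread has attached its fence. */
    uint64_t snapshot_wait_point(int drm_fd, uint32_t semA)
    {
       return syncobj_payload_read(drm_fd, semA);
    }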

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
