On 2018/12/12 20:24, Daniel Vetter wrote:
> On Wed, Dec 12, 2018 at 12:40 PM Zhou, David(ChunMing)
> <david1.z...@amd.com> wrote:
>> + Daniel Rakos and Jason Ekstrand.
>>
>> Below is the background, from Daniel R, who should be able to explain
>> the rationale:
>> "
>> ISVs, especially those coming from D3D12, are unsatisfied with the
>> behavior of Vulkan semaphores: because of their binary nature, a
>> separate semaphore is needed for every single dependency.
>> Compared to that, a synchronization primitive like the D3D12 monitored
>> fence enables a single object to track a sequence of operations, simply
>> by associating timeline values with the completion of individual
>> operations. This allows ISVs to track the lifetime and usage of
>> resources and the ordered completion of sequences.
>> Besides that, they also want a single synchronization primitive that
>> can handle both GPU-to-GPU and GPU-to-CPU dependencies, instead of
>> using semaphores for the former and fences for the latter.
>> In addition, compared to legacy semaphores, timeline semaphores are
>> proposed to support wait-before-signal, i.e. to allow enqueueing a
>> semaphore wait operation with a wait value that is larger than any of
>> the already enqueued signal values. This seems to be a hard requirement
>> for ISVs, one that cannot be met without UMD-side queue batching; and
>> even UMD-side queue batching doesn't help when such a semaphore is
>> externally shared with another API. Thus, in order to properly support
>> wait-before-signal, the KMD implementation also has to be able to
>> support such dependencies.
>> "
> I was tangentially involved in that wg too, and I understand the
> overall use-case of vk timelines. I don't understand the exact corner
> case here, because I wasn't deeply involved in the details.
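To make the use-case concrete, this is roughly what it looks like from
the application side. This is only a sketch: it follows the shape of the
timeline semaphore extension as drafted in the working group (published
later as VK_KHR_timeline_semaphore), so the exact structure and function
names should be treated as assumptions rather than final API:

#include <vulkan/vulkan.h>

void timeline_example(VkDevice dev, VkQueue queue)
{
	VkSemaphore sem;
	uint64_t wait_value = 10;

	/* One timeline semaphore tracks a whole sequence of operations,
	 * instead of one binary semaphore per dependency. */
	VkSemaphoreTypeCreateInfo type_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO,
		.semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE,
		.initialValue = 0,
	};
	VkSemaphoreCreateInfo info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
		.pNext = &type_info,
	};
	vkCreateSemaphore(dev, &info, NULL, &sem);

	/* GPU-to-GPU, wait-before-signal: enqueue a wait for point 10
	 * before any signal operation for point 10 has been enqueued.
	 * This is the case the KMD has to support. */
	VkTimelineSemaphoreSubmitInfo timeline_info = {
		.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
		.waitSemaphoreValueCount = 1,
		.pWaitSemaphoreValues = &wait_value,
	};
	VkPipelineStageFlags stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
	VkSubmitInfo submit = {
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.pNext = &timeline_info,
		.waitSemaphoreCount = 1,
		.pWaitSemaphores = &sem,
		.pWaitDstStageMask = &stage,
	};
	vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

	/* GPU-to-CPU: the same object also replaces VkFence on the host
	 * side. Unblocks once some queue signals a value >= 10. */
	VkSemaphoreWaitInfo wait_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
		.semaphoreCount = 1,
		.pSemaphores = &sem,
		.pValues = &wait_value,
	};
	vkWaitSemaphores(dev, &wait_info, UINT64_MAX);
}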
All details are here:
https://gitlab.khronos.org/vulkan/vulkan/merge_requests/2696

-David

> -Daniel
>
>> Btw, we already added test cases to igt, and this has been tested by
>> many existing tests, like the libdrm unit tests, the related igt
>> tests, the Vulkan CTS, and Steam games.
>>
>> -David
>>> -----Original Message-----
>>> From: Daniel Vetter <dan...@ffwll.ch>
>>> Sent: Wednesday, December 12, 2018 7:15 PM
>>> To: Koenig, Christian <christian.koe...@amd.com>
>>> Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; dri-devel
>>> <dri-de...@lists.freedesktop.org>; amd-gfx list
>>> <amd-...@lists.freedesktop.org>; intel-gfx
>>> <intel-gfx@lists.freedesktop.org>; Christian König
>>> <ckoenig.leichtzumer...@gmail.com>
>>> Subject: Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new drm_syncobj_add_point interface v2
>>>
>>> On Wed, Dec 12, 2018 at 12:08 PM Koenig, Christian
>>> <christian.koe...@amd.com> wrote:
>>>> On 12.12.18 at 11:49, Daniel Vetter wrote:
>>>>> On Fri, Dec 07, 2018 at 11:54:15PM +0800, Chunming Zhou wrote:
>>>>>> From: Christian König <ckoenig.leichtzumer...@gmail.com>
>>>>>>
>>>>>> Use the dma_fence_chain object to create a timeline of fence
>>>>>> objects instead of just replacing the existing fence.
>>>>>>
>>>>>> v2: rebase and cleanup
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koe...@amd.com>
>>>>> Somewhat jumping back into this. Not sure whether we discussed this
>>>>> already or not. I'm a bit unclear on why we have to chain the
>>>>> fences in the timeline:
>>>>>
>>>>> - The timeline stuff is modelled after the WDDM2 monitored fences,
>>>>>   which really are just u64 counters in memory somewhere (I think
>>>>>   it could be system ram or vram). Because WDDM2 has the memory
>>>>>   management entirely separated from rendering synchronization, it
>>>>>   totally allows userspace to create loops and deadlocks and
>>>>>   everything else nasty using this - the memory manager won't
>>>>>   deadlock, because these monitored fences never leak into the
>>>>>   buffer manager. And if command submissions deadlock, gpu reset
>>>>>   takes care of the mess.
>>>>>
>>>>> - This has a few consequences, as in they seem to indeed work like
>>>>>   a memory location: userspace incrementing out-of-order (because
>>>>>   it runs batches updating the same fence on different engines) is
>>>>>   totally fine, as is doing anything else "stupid".
>>>>>
>>>>> - Now on Linux we can't allow just anything, because we need to
>>>>>   make sure that deadlocks don't leak into the memory manager. But
>>>>>   as long as we block until the underlying dma_fence has
>>>>>   materialized, nothing userspace can do will lead to such a
>>>>>   deadlock - even if userspace ends up submitting jobs without
>>>>>   enough built-in synchronization, leading to out-of-order
>>>>>   signalling of fences on that "timeline". And I don't think that
>>>>>   would pose a problem for us.
>>>>>
>>>>> Essentially I think we can look at a timeline syncobj as a
>>>>> dma_fence container indexed through an integer, and there's no need
>>>>> to enforce that the timeline works like a real dma_fence timeline,
>>>>> with all its guarantees. It's just a pile of (possibly, if
>>>>> userspace is stupid) unrelated dma_fences. You could implement the
>>>>> entire thing in userspace after all, except for the "we want to
>>>>> share these timeline objects between processes" problem.
>>>>>
>>>>> tldr; I think we can drop the dma_fence_chain complexity
>>>>> completely. Or at least I'm not really understanding why it's
>>>>> needed.
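To illustrate, here is a minimal sketch (kernel-style C, hypothetical
names, not part of this patch) of that "pile of dma_fences indexed by a
u64" model: every point is an independent entry, nothing is chained,
and a waiter simply blocks until the fence for its exact point has
materialized:

#include <linux/dma-fence.h>
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct pile_point {
	struct rb_node node;
	u64 value;			/* point on the "timeline" */
	struct dma_fence *fence;
};

struct fence_pile {
	spinlock_t lock;
	struct rb_root points;		/* pile_point, sorted by value */
	wait_queue_head_t materialized;
};

static struct pile_point *pile_find_locked(struct fence_pile *pile,
					   u64 value)
{
	struct rb_node *n = pile->points.rb_node;

	while (n) {
		struct pile_point *p = rb_entry(n, struct pile_point, node);

		if (value < p->value)
			n = n->rb_left;
		else if (value > p->value)
			n = n->rb_right;
		else
			return p;
	}
	return NULL;
}

/* Attach the fence for @value and wake anyone blocked on that point. */
static void pile_add(struct fence_pile *pile, struct pile_point *p,
		     struct dma_fence *fence, u64 value)
{
	struct rb_node **new = &pile->points.rb_node, *parent = NULL;

	p->value = value;
	p->fence = dma_fence_get(fence);

	spin_lock(&pile->lock);
	while (*new) {
		parent = *new;
		if (value < rb_entry(parent, struct pile_point, node)->value)
			new = &parent->rb_left;
		else
			new = &parent->rb_right;
	}
	rb_link_node(&p->node, parent, new);
	rb_insert_color(&p->node, &pile->points);
	spin_unlock(&pile->lock);

	wake_up_all(&pile->materialized);
}

/* Return a reference to the fence for @value, or NULL if not there yet. */
static struct dma_fence *pile_peek(struct fence_pile *pile, u64 value)
{
	struct dma_fence *fence = NULL;
	struct pile_point *p;

	spin_lock(&pile->lock);
	p = pile_find_locked(pile, value);
	if (p)
		fence = dma_fence_get(p->fence);
	spin_unlock(&pile->lock);
	return fence;
}

/* Block until the fence for exactly @value has materialized. */
static struct dma_fence *pile_get(struct fence_pile *pile, u64 value)
{
	struct dma_fence *fence;

	wait_event(pile->materialized, (fence = pile_peek(pile, value)));
	return fence;
}

Note what this does not give you: any ">= point" semantics. A waiter on
a point that userspace never fills in blocks forever, which is exactly
the corner case discussed below.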
>>>>>
>>>>> Of course that means drivers cannot treat a drm_syncobj timeline as
>>>>> a dma_fence timeline. But given the future fences stuff and all
>>>>> that, that's already out of the window anyway.
>>>>>
>>>>> What am I missing?
>>>> Good question, since that was exactly my initial idea as well.
>>>>
>>>> Key point is that our Vulkan guys came back and said that this
>>>> wouldn't be sufficient, but I honestly don't fully understand why.
>>> Hm, sounds like we really need those test cases (vk cts on top of
>>> mesa, igt) so we can talk about the exact corner cases we care about
>>> and why.
>>>
>>> I guess one thing that might happen is that userspace leaves out a
>>> number and never sets that fence, relying on the >= semantics of the
>>> monitored fence to unblock that thread, e.g. when skipping a frame in
>>> one of the auxiliary workloads. For that case we'd need to make sure
>>> we don't just wait for the given fence to materialize, but also for
>>> any fence later in the timeline.
>>>
>>> But we can't decide that without understanding the actual use-case
>>> that needs to be supported at the other end of the stack, and how all
>>> the bits in between should look.
>>>
>>> I guess we're back to "uapi design without userspace doesn't make
>>> sense" ...
>>>
>>>> Anyway that's why David came up with using the fence array to wait
>>>> for all previously added fences, which I then later on extended into
>>>> this chain container.
>>>>
>>>> I have to admit that it is way more defensively implemented this
>>>> way, e.g. there are far fewer things userspace can do wrong.
>>>>
>>>> The principal idea is that when they mess things up they are always
>>>> going to wait more than necessary, but never less.
>>> That seems against the spirit of vulkan, which is very much about
>>> "you get all the pieces". It also might dig us a hole in the future,
>>> if we ever get around to moving towards a WDDM2-style memory
>>> management model. For future proofing I think it would make sense if
>>> we implement the minimal uapi we need for vk timelines, not the
>>> strictest guarantees we can get away with (without performance
>>> impact) with current drivers.
>>> -Daniel
>>>
>>>
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_syncobj.c | 37 +++++++++++++++++++++++++++++++++++
>>>>>>   include/drm/drm_syncobj.h     |  5 +++++
>>>>>>   2 files changed, 42 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>>>>> index e19525af0cce..51f798e2194f 100644
>>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>>> @@ -122,6 +122,43 @@ static void drm_syncobj_remove_wait(struct drm_syncobj *syncobj,
>>>>>>  	spin_unlock(&syncobj->lock);
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * drm_syncobj_add_point - add new timeline point to the syncobj
>>>>>> + * @syncobj: sync object to add timeline point to
>>>>>> + * @chain: chain node to use to add the point
>>>>>> + * @fence: fence to encapsulate in the chain node
>>>>>> + * @point: sequence number to use for the point
>>>>>> + *
>>>>>> + * Add the chain node as new timeline point to the syncobj.
>>>>>> + */
>>>>>> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
>>>>>> +			   struct dma_fence_chain *chain,
>>>>>> +			   struct dma_fence *fence,
>>>>>> +			   uint64_t point)
>>>>>> +{
>>>>>> +	struct syncobj_wait_entry *cur, *tmp;
>>>>>> +	struct dma_fence *prev;
>>>>>> +
>>>>>> +	dma_fence_get(fence);
>>>>>> +
>>>>>> +	spin_lock(&syncobj->lock);
>>>>>> +
>>>>>> +	prev = rcu_dereference_protected(syncobj->fence,
>>>>>> +					 lockdep_is_held(&syncobj->lock));
>>>>>> +	dma_fence_chain_init(chain, prev, fence, point);
>>>>>> +	rcu_assign_pointer(syncobj->fence, &chain->base);
>>>>>> +
>>>>>> +	list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
>>>>>> +		list_del_init(&cur->node);
>>>>>> +		syncobj_wait_syncobj_func(syncobj, cur);
>>>>>> +	}
>>>>>> +	spin_unlock(&syncobj->lock);
>>>>>> +
>>>>>> +	/* Walk the chain once to trigger garbage collection */
>>>>>> +	dma_fence_chain_for_each(prev, fence);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_syncobj_add_point);
>>>>>> +
>>>>>>  /**
>>>>>>   * drm_syncobj_replace_fence - replace fence in a sync object.
>>>>>>   * @syncobj: Sync object to replace fence in
>>>>>>
>>>>>> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
>>>>>> index 7c6ed845c70d..8acb4ae4f311 100644
>>>>>> --- a/include/drm/drm_syncobj.h
>>>>>> +++ b/include/drm/drm_syncobj.h
>>>>>> @@ -27,6 +27,7 @@
>>>>>>  #define __DRM_SYNCOBJ_H__
>>>>>>
>>>>>>  #include "linux/dma-fence.h"
>>>>>> +#include "linux/dma-fence-chain.h"
>>>>>>
>>>>>>  /**
>>>>>>   * struct drm_syncobj - sync object.
>>>>>> @@ -110,6 +111,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
>>>>>>
>>>>>>  struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
>>>>>>  				     u32 handle);
>>>>>> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
>>>>>> +			   struct dma_fence_chain *chain,
>>>>>> +			   struct dma_fence *fence,
>>>>>> +			   uint64_t point);
>>>>>>  void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>>>>>>  			       struct dma_fence *fence);
>>>>>>  int drm_syncobj_find_fence(struct drm_file *file_private,
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> Intel-gfx mailing list
>>>>>> Intel-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
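For completeness, a hypothetical sketch of how a driver-side caller
would use the new interface when signalling a timeline point for a just
submitted job (the names here are made up; the actual driver wiring is
expected in the other patches of this series):

#include <drm/drm_syncobj.h>
#include <linux/slab.h>

static int signal_timeline_point(struct drm_syncobj *syncobj,
				 struct dma_fence *job_fence, u64 point)
{
	struct dma_fence_chain *chain;

	/* The chain node backing the new point; drm_syncobj_add_point()
	 * takes over the node and grabs its own fence reference. */
	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	drm_syncobj_add_point(syncobj, chain, job_fence, point);
	return 0;
}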