On 2018/12/12 20:24, Daniel Vetter wrote:
> On Wed, Dec 12, 2018 at 12:40 PM Zhou, David(ChunMing)
> <david1.z...@amd.com> wrote:
>> + Daniel Rakos and Jason Ekstrand.
>>
>> Below is the background, from Daniel R, who should be able to explain
>> the rationale:
>> "
>> ISVs, especially those coming from D3D12, are unsatisfied with the
>> behavior of Vulkan semaphores: because of their binary nature, a
>> separate semaphore is needed for every single dependency.
>> Compared to that, a synchronization primitive like the D3D12 monitored
>> fence enables a single object to track a sequence of operations, simply
>> by associating timeline values with the completion of individual
>> operations. This allows ISVs to track the lifetime and usage of
>> resources and the ordered completion of sequences.
>> Besides that, they also want a single synchronization primitive that
>> can handle both GPU-to-GPU and GPU-to-CPU dependencies, instead of
>> using semaphores for the former and fences for the latter.
>> In addition, compared to legacy semaphores, timeline semaphores are
>> proposed to support wait-before-signal, i.e. to allow enqueueing a
>> semaphore wait operation with a wait value that is larger than any of
>> the already enqueued signal values. This seems to be a hard requirement
>> for ISVs, one that cannot be met without UMD-side queue batching; and
>> even UMD-side queue batching doesn't help when such a semaphore is
>> externally shared with another API. Thus, in order to properly support
>> wait-before-signal, the KMD implementation also has to be able to
>> support such dependencies.
>> "
> I was tangentially involved in that wg too, and I understand the
> overall use-case of vk timelines. I don't understand the exact corner
> case here, because I wasn't deeply involved in the details.
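To make the use-case concrete, this is roughly what it looks like from
the application side. This is only a sketch: it follows the shape of the
timeline semaphore extension as drafted in the working group (published
later as VK_KHR_timeline_semaphore), so the exact structure and function
names should be treated as assumptions rather than final API:

#include <vulkan/vulkan.h>

void timeline_example(VkDevice dev, VkQueue queue)
{
	VkSemaphore sem;
	uint64_t wait_value = 10;

	/* One timeline semaphore tracks a whole sequence of operations,
	 * instead of one binary semaphore per dependency. */
	VkSemaphoreTypeCreateInfo type_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO,
		.semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE,
		.initialValue = 0,
	};
	VkSemaphoreCreateInfo info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
		.pNext = &type_info,
	};
	vkCreateSemaphore(dev, &info, NULL, &sem);

	/* GPU-to-GPU, wait-before-signal: enqueue a wait for point 10
	 * before any signal operation for point 10 has been enqueued.
	 * This is the case the KMD has to support. */
	VkTimelineSemaphoreSubmitInfo timeline_info = {
		.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
		.waitSemaphoreValueCount = 1,
		.pWaitSemaphoreValues = &wait_value,
	};
	VkPipelineStageFlags stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
	VkSubmitInfo submit = {
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.pNext = &timeline_info,
		.waitSemaphoreCount = 1,
		.pWaitSemaphores = &sem,
		.pWaitDstStageMask = &stage,
	};
	vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

	/* GPU-to-CPU: the same object also replaces VkFence on the host
	 * side. Unblocks once some queue signals a value >= 10. */
	VkSemaphoreWaitInfo wait_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
		.semaphoreCount = 1,
		.pSemaphores = &sem,
		.pValues = &wait_value,
	};
	vkWaitSemaphores(dev, &wait_info, UINT64_MAX);
}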
All details are here:
https://gitlab.khronos.org/vulkan/vulkan/merge_requests/2696

-David

> -Daniel
>
>> Btw, we already added test cases to igt, and this has been tested by
>> many existing tests, like the libdrm unit tests, the related igt
>> tests, the Vulkan CTS, and Steam games.
>>
>> -David
>>> -----Original Message-----
>>> From: Daniel Vetter <dan...@ffwll.ch>
>>> Sent: Wednesday, December 12, 2018 7:15 PM
>>> To: Koenig, Christian <christian.koe...@amd.com>
>>> Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; dri-devel
>>> <dri-de...@lists.freedesktop.org>; amd-gfx list
>>> <amd-...@lists.freedesktop.org>; intel-gfx
>>> <intel-gfx@lists.freedesktop.org>; Christian König
>>> <ckoenig.leichtzumer...@gmail.com>
>>> Subject: Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new drm_syncobj_add_point interface v2
>>>
>>> On Wed, Dec 12, 2018 at 12:08 PM Koenig, Christian
>>> <christian.koe...@amd.com> wrote:
>>>> On 12.12.18 at 11:49, Daniel Vetter wrote:
>>>>> On Fri, Dec 07, 2018 at 11:54:15PM +0800, Chunming Zhou wrote:
>>>>>> From: Christian König <ckoenig.leichtzumer...@gmail.com>
>>>>>>
>>>>>> Use the dma_fence_chain object to create a timeline of fence
>>>>>> objects instead of just replacing the existing fence.
>>>>>>
>>>>>> v2: rebase and cleanup
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koe...@amd.com>
>>>>> Somewhat jumping back into this. Not sure whether we discussed this
>>>>> already or not. I'm a bit unclear on why we have to chain the
>>>>> fences in the timeline:
>>>>>
>>>>> - The timeline stuff is modelled after the WDDM2 monitored fences,
>>>>>   which really are just u64 counters in memory somewhere (I think
>>>>>   it could be system ram or vram). Because WDDM2 has the memory
>>>>>   management entirely separated from rendering synchronization, it
>>>>>   totally allows userspace to create loops and deadlocks and
>>>>>   everything else nasty using this - the memory manager won't
>>>>>   deadlock, because these monitored fences never leak into the
>>>>>   buffer manager. And if command submissions deadlock, gpu reset
>>>>>   takes care of the mess.
>>>>>
>>>>> - This has a few consequences, as in they seem to indeed work like
>>>>>   a memory location: userspace incrementing out-of-order (because
>>>>>   it runs batches updating the same fence on different engines) is
>>>>>   totally fine, as is doing anything else "stupid".
>>>>>
>>>>> - Now on Linux we can't allow just anything, because we need to
>>>>>   make sure that deadlocks don't leak into the memory manager. But
>>>>>   as long as we block until the underlying dma_fence has
>>>>>   materialized, nothing userspace can do will lead to such a
>>>>>   deadlock - even if userspace ends up submitting jobs without
>>>>>   enough built-in synchronization, leading to out-of-order
>>>>>   signalling of fences on that "timeline". And I don't think that
>>>>>   would pose a problem for us.
>>>>>
>>>>> Essentially I think we can look at a timeline syncobj as a
>>>>> dma_fence container indexed through an integer, and there's no need
>>>>> to enforce that the timeline works like a real dma_fence timeline,
>>>>> with all its guarantees. It's just a pile of (possibly, if
>>>>> userspace is stupid) unrelated dma_fences. You could implement the
>>>>> entire thing in userspace after all, except for the "we want to
>>>>> share these timeline objects between processes" problem.
>>>>>
>>>>> tldr; I think we can drop the dma_fence_chain complexity
>>>>> completely. Or at least I'm not really understanding why it's
>>>>> needed.
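To illustrate, here is a minimal sketch (kernel-style C, hypothetical
names, not part of this patch) of that "pile of dma_fences indexed by a
u64" model: every point is an independent entry, nothing is chained,
and a waiter simply blocks until the fence for its exact point has
materialized:

#include <linux/dma-fence.h>
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct pile_point {
	struct rb_node node;
	u64 value;			/* point on the "timeline" */
	struct dma_fence *fence;
};

struct fence_pile {
	spinlock_t lock;
	struct rb_root points;		/* pile_point, sorted by value */
	wait_queue_head_t materialized;
};

static struct pile_point *pile_find_locked(struct fence_pile *pile,
					   u64 value)
{
	struct rb_node *n = pile->points.rb_node;

	while (n) {
		struct pile_point *p = rb_entry(n, struct pile_point, node);

		if (value < p->value)
			n = n->rb_left;
		else if (value > p->value)
			n = n->rb_right;
		else
			return p;
	}
	return NULL;
}

/* Attach the fence for @value and wake anyone blocked on that point. */
static void pile_add(struct fence_pile *pile, struct pile_point *p,
		     struct dma_fence *fence, u64 value)
{
	struct rb_node **new = &pile->points.rb_node, *parent = NULL;

	p->value = value;
	p->fence = dma_fence_get(fence);

	spin_lock(&pile->lock);
	while (*new) {
		parent = *new;
		if (value < rb_entry(parent, struct pile_point, node)->value)
			new = &parent->rb_left;
		else
			new = &parent->rb_right;
	}
	rb_link_node(&p->node, parent, new);
	rb_insert_color(&p->node, &pile->points);
	spin_unlock(&pile->lock);

	wake_up_all(&pile->materialized);
}

/* Return a reference to the fence for @value, or NULL if not there yet. */
static struct dma_fence *pile_peek(struct fence_pile *pile, u64 value)
{
	struct dma_fence *fence = NULL;
	struct pile_point *p;

	spin_lock(&pile->lock);
	p = pile_find_locked(pile, value);
	if (p)
		fence = dma_fence_get(p->fence);
	spin_unlock(&pile->lock);
	return fence;
}

/* Block until the fence for exactly @value has materialized. */
static struct dma_fence *pile_get(struct fence_pile *pile, u64 value)
{
	struct dma_fence *fence;

	wait_event(pile->materialized, (fence = pile_peek(pile, value)));
	return fence;
}

Note what this does not give you: any ">= point" semantics. A waiter on
a point that userspace never fills in blocks forever, which is exactly
the corner case discussed below.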
>>>>>
>>>>> Of course that means drivers cannot treat a drm_syncobj timeline as
>>>>> a dma_fence timeline. But given the future fences stuff and all
>>>>> that, that's already out of the window anyway.
>>>>>
>>>>> What am I missing?
>>>> Good question, since that was exactly my initial idea as well.
>>>>
>>>> Key point is that our Vulkan guys came back and said that this
>>>> wouldn't be sufficient, but I honestly don't fully understand why.
>>> Hm, sounds like we really need those test cases (vk cts on top of
>>> mesa, igt) so we can talk about the exact corner cases we care about
>>> and why.
>>>
>>> I guess one thing that might happen is that userspace leaves out a
>>> number and never sets that fence, relying on the >= semantics of the
>>> monitored fence to unblock that thread, e.g. when skipping a frame in
>>> one of the auxiliary workloads. For that case we'd need to make sure
>>> we don't just wait for the given fence to materialize, but also for
>>> any fence later in the timeline.
>>>
>>> But we can't decide that without understanding the actual use-case
>>> that needs to be supported at the other end of the stack, and how all
>>> the bits in between should look.
>>>
>>> I guess we're back to "uapi design without userspace doesn't make
>>> sense" ...
>>>
>>>> Anyway that's why David came up with using the fence array to wait
>>>> for all previously added fences, which I then later on extended into
>>>> this chain container.
>>>>
>>>> I have to admit that it is way more defensively implemented this
>>>> way, e.g. there are far fewer things userspace can do wrong.
>>>>
>>>> The principal idea is that when they mess things up they are always
>>>> going to wait more than necessary, but never less.
>>> That seems against the spirit of vulkan, which is very much about
>>> "you get all the pieces". It also might dig us a hole in the future,
>>> if we ever get around to moving towards a WDDM2-style memory
>>> management model. For future proofing I think it would make sense if
>>> we implement the minimal uapi we need for vk timelines, not the
>>> strictest guarantees we can get away with (without performance
>>> impact) with current drivers.
>>> -Daniel
>>>
>>>
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_syncobj.c | 37 +++++++++++++++++++++++++++++++++++
>>>>>>   include/drm/drm_syncobj.h     |  5 +++++
>>>>>>   2 files changed, 42 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>>>>> index e19525af0cce..51f798e2194f 100644
>>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>>> @@ -122,6 +122,43 @@ static void drm_syncobj_remove_wait(struct drm_syncobj *syncobj,
>>>>>>  	spin_unlock(&syncobj->lock);
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * drm_syncobj_add_point - add new timeline point to the syncobj
>>>>>> + * @syncobj: sync object to add timeline point to
>>>>>> + * @chain: chain node to use to add the point
>>>>>> + * @fence: fence to encapsulate in the chain node
>>>>>> + * @point: sequence number to use for the point
>>>>>> + *
>>>>>> + * Add the chain node as new timeline point to the syncobj.
>>>>>> + */
>>>>>> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
>>>>>> +			   struct dma_fence_chain *chain,
>>>>>> +			   struct dma_fence *fence,
>>>>>> +			   uint64_t point)
>>>>>> +{
>>>>>> +	struct syncobj_wait_entry *cur, *tmp;
>>>>>> +	struct dma_fence *prev;
>>>>>> +
>>>>>> +	dma_fence_get(fence);
>>>>>> +
>>>>>> +	spin_lock(&syncobj->lock);
>>>>>> +
>>>>>> +	prev = rcu_dereference_protected(syncobj->fence,
>>>>>> +					 lockdep_is_held(&syncobj->lock));
>>>>>> +	dma_fence_chain_init(chain, prev, fence, point);
>>>>>> +	rcu_assign_pointer(syncobj->fence, &chain->base);
>>>>>> +
>>>>>> +	list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
>>>>>> +		list_del_init(&cur->node);
>>>>>> +		syncobj_wait_syncobj_func(syncobj, cur);
>>>>>> +	}
>>>>>> +	spin_unlock(&syncobj->lock);
>>>>>> +
>>>>>> +	/* Walk the chain once to trigger garbage collection */
>>>>>> +	dma_fence_chain_for_each(prev, fence);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_syncobj_add_point);
>>>>>> +
>>>>>>  /**
>>>>>>   * drm_syncobj_replace_fence - replace fence in a sync object.
>>>>>>   * @syncobj: Sync object to replace fence in
>>>>>>
>>>>>> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
>>>>>> index 7c6ed845c70d..8acb4ae4f311 100644
>>>>>> --- a/include/drm/drm_syncobj.h
>>>>>> +++ b/include/drm/drm_syncobj.h
>>>>>> @@ -27,6 +27,7 @@
>>>>>>  #define __DRM_SYNCOBJ_H__
>>>>>>
>>>>>>  #include "linux/dma-fence.h"
>>>>>> +#include "linux/dma-fence-chain.h"
>>>>>>
>>>>>>  /**
>>>>>>   * struct drm_syncobj - sync object.
>>>>>> @@ -110,6 +111,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
>>>>>>
>>>>>>  struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
>>>>>>  				     u32 handle);
>>>>>> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
>>>>>> +			   struct dma_fence_chain *chain,
>>>>>> +			   struct dma_fence *fence,
>>>>>> +			   uint64_t point);
>>>>>>  void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>>>>>>  			       struct dma_fence *fence);
>>>>>>  int drm_syncobj_find_fence(struct drm_file *file_private,
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> Intel-gfx mailing list
>>>>>> Intel-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
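For completeness, a hypothetical sketch of how a driver-side caller
would use the new interface when signalling a timeline point for a just
submitted job (the names here are made up; the actual driver wiring is
expected in the other patches of this series):

#include <drm/drm_syncobj.h>
#include <linux/slab.h>

static int signal_timeline_point(struct drm_syncobj *syncobj,
				 struct dma_fence *job_fence, u64 point)
{
	struct dma_fence_chain *chain;

	/* The chain node backing the new point; drm_syncobj_add_point()
	 * takes over the node and grabs its own fence reference. */
	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	drm_syncobj_add_point(syncobj, chain, job_fence, point);
	return 0;
}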