FLUSH_VERTICES should be called by mesa/main if it's inside a GL call. st/mesa should only flush bitmaps.
Marek On Fri, Nov 3, 2017 at 6:36 PM, Andres Rodriguez <andre...@gmail.com> wrote: > > > On 2017-11-03 05:17 AM, Nicolai Hähnle wrote: >> >> On 02.11.2017 04:57, Andres Rodriguez wrote: >>> >>> Bits to implement ServerWaitSemaphoreObject/ServerSignalSemaphoreObject >>> >>> Signed-off-by: Andres Rodriguez <andre...@gmail.com> >>> --- >>> src/mesa/state_tracker/st_cb_semaphoreobjects.c | 28 >>> +++++++++++++++++++++++++ >>> 1 file changed, 28 insertions(+) >>> >>> diff --git a/src/mesa/state_tracker/st_cb_semaphoreobjects.c >>> b/src/mesa/state_tracker/st_cb_semaphoreobjects.c >>> index 47ece47..4cff3fd 100644 >>> --- a/src/mesa/state_tracker/st_cb_semaphoreobjects.c >>> +++ b/src/mesa/state_tracker/st_cb_semaphoreobjects.c >>> @@ -1,5 +1,7 @@ >>> + >>> #include "main/imports.h" >>> #include "main/mtypes.h" >>> +#include "main/context.h" >>> #include "main/externalobjects.h" >>> @@ -47,10 +49,36 @@ st_import_semaphoreobj_fd(struct gl_context *ctx, >>> #endif >>> } >>> +static void >>> +st_server_wait_semaphore(struct gl_context *ctx, >>> + struct gl_semaphore_object *semObj) >>> +{ >>> + struct st_semaphore_object *st_obj = st_semaphore_object(semObj); >>> + struct st_context *st = st_context(ctx); >>> + struct pipe_context *pipe = st->pipe; >>> + >>> + _mesa_flush(ctx); >> >> >> What's the justification of this flush? > > > Thanks for all the review feedback :) > > Technically according to the spec we shouldn't need to flush here. But the > flush is required due to how amdgpu handles syncobjs. > > For amdgpu, the wait and signal operations happens on a submission boundary. > I.e, we can say "wait for this syncobj before executing this submission", or > "signal this syncobj after this submission completes". But we can't really > introduce a wait/signal in the middle of a commandstream as the spec > requires us to do. > > This flush (and the one in st_server_signal_semaphore) exist to adjust the > submission boundaries. If we align the submission boundary with the > semaphore operation boundary we can accurately emulate the timing of the > semaphore operations. > > In this case, when st_server_wait_semaphore() is called, the user expects > that any work queued up until this point will not wait for the semaphore. > Any work submitted after the wait call returns should wait for the semaphore > to signal. Hence, we call flush so that the amdgpu winsys will submit all > the current work. Then we queue up the syncobj for future submissions to > wait on. > > In the signal case, we flush so that the syncobj will signal when the > current work finishes (instead of at "current work + N"). That way we don't > have any unnecessary latency. > > However, I'm not 100% certain my approach here is the best solution. My main > concern is that an amdgpu specific detail is surfaced in what should be a > pipe agnostic layer. If you have any recommendations on how to move this > code into the pipe layer, or achieve the same results with a different > strategy I'd love to take the advice. > > From the performance point of view, I'm not as concerned. Wait/signal tend > to be aligned at frame boundaries so flushing there shouldn't be a big deal. > But I do think that improving the code structure from my current iteration > would have value. > > >> >> >>> + pipe->semobj_wait(pipe, st_obj->semaphore); >>> +} >>> + >>> +static void >>> +st_server_signal_semaphore(struct gl_context *ctx, >>> + struct gl_semaphore_object *semObj) >>> +{ >>> + struct st_semaphore_object *st_obj = st_semaphore_object(semObj); >>> + struct st_context *st = st_context(ctx); >>> + struct pipe_context *pipe = st->pipe; >>> + >> >> >> You have to flush state-tracker internal state here. This means >> FLUSH_VERTICES, st_flush_bitmap, possibly others. I don't think we have this >> factored out well yet, but see below. > > > Thanks for pointing this out, this actually gives me an idea for getting rid > of the full flushes, see below. > >> >> >>> + pipe->semobj_signal(pipe, st_obj->semaphore); >>> + _mesa_flush(ctx); >> >> >> What's the justification of this flush? Does EXT_external_objects contain >> any language to guarantee that SignalSemaphoreEXT will "finish in finite >> time"? If no, the flush should probably not be there. If yes, please add a >> spec quote. > > > See above for the flush reasoning. > >> >> Furthermore, this whole code section is a strong hint that merging fences >> and semaphores at the Gallium level is indeed a good idea. Basically, the >> idea is that you'd end up calling >> >> pipe->flush(pipe, &st_obj->semaphore, flags); >> >> where flags is PIPE_FLUSH_DEFERRED | PIPE_FLUSH_REUSE_FENCE or >> PIPE_FLUSH_ASYNC | PIPE_FLUSH_REUSE_FENCE, depending on how the previous >> paragraph is resolved. In one of my recent series that includes deferred >> fences for threaded contexts, I'm already doing the equivalent of adding a >> "PIPE_FLUSH_REUSE_FENCE". We could formalize this as a more broadly >> applicable feature in the Gallium interface. >> > > Thanks for pointing out PIPE_FLUSH_REUSE_FENCE, I wasn't aware of this flag. > > My reasoning for keeping semaphore operations separate is that a different > pipe implementation might be able to emit a wait/signal packet into the > middle of the commandstream, with no flushing required. > > However, that argument has a giant hole in it. I have a _mesa_flush() which > completely negates any benefits of a "pipe agnostic" implementation. So in > this case, as you suggested, I think merging semaphores with re-usable > fences would be a good idea. > > I do have one alternative proposal try to keep the option of > mid-commandstream semaphores open. How about something like: > > flush_state_tracker_internal_state() { > flush_vertices(); > flush_bitmaps(); > } > > st_server_wait_semaphore() { > flush_state_tracker_internal_state(); > pipe->semobj_wait(pipe, st_obj->semaphore); > > // The amdgpu semobj_wait() implementation would > // then be responsible for adjusting the submission > // windows internally, i.e. calling flush() > } > > > st_server_signal_semaphore() { > flush_state_tracker_internal_state(); > pipe->semobj_signal(pipe, st_obj->semaphore); > > // The amdgpu semobj_signal() implementation would > // then be responsible for adjusting the submission > // windows internally, i.e. calling flush() > } > > Let me know what you think of this slightly more agnostic implementation. > > Sorry for the long reply! > > Regards, > Andres > > >> Cheers, >> Nicolai >> >> >>> +} >>> + >>> void >>> st_init_semaphoreobject_functions(struct dd_function_table *functions) >>> { >>> functions->NewSemaphoreObject = st_semaphoreobj_alloc; >>> functions->DeleteSemaphoreObject = st_semaphoreobj_free; >>> functions->ImportSemaphoreFd = st_import_semaphoreobj_fd; >>> + functions->ServerWaitSemaphoreObject = st_server_wait_semaphore; >>> + functions->ServerSignalSemaphoreObject = st_server_signal_semaphore; >>> } >>> >> >> > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev