On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> Quoting Kenneth Graunke (2017-06-14 21:41:56)
> > On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > > From: Topi Pohjolainen <topi.pohjolai...@intel.com>
> > >
> > > v2 (Jason Ekstrand):
> > >  - Take a flags parameter to control the flushes
> > >  - Refactoring
> > >
> > > Signed-off-by: Topi Pohjolainen <topi.pohjolai...@intel.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
> > >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++++++++++++++++++++++++++-
> > >  2 files changed, 96 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > > index 7b9be8a..b137409 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
> > >  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
> > >                                   struct brw_bo *bo, uint32_t offset,
> > >                                   uint64_t imm);
> > > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
> > >  void brw_emit_mi_flush(struct brw_context *brw);
> > >  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> > >  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > index 39bb9c7..338e4fc 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> > >                                 brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > -
> > >  /**
> > >   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> > >   * implementing two workarounds on gen6. From section 1.4.7.1
> > > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
> > >                                 brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > +/*
> > > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> > > + *
> > > + * Write synchronization is a special case of end-of-pipe
> > > + * synchronization that requires that the render cache and/or depth
> > > + * related caches are flushed to memory, where the data will become
> > > + * globally visible. This type of synchronization is required prior to
> > > + * SW (CPU) actually reading the result data from memory, or initiating
> > > + * an operation that will use as a read surface (such as a texture
> > > + * surface) a previous render target and/or depth/stencil buffer
> > > + *
> > > + *
> > > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > > + *
> > > + * Exercising the write cache flush bits (Render Target Cache Flush
> > > + * Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > > + * ensures the write caches are flushed and doesn't guarantee the data
> > > + * is globally visible.
> > > + *
> > > + * SW can track the completion of the end-of-pipe-synchronization by
> > > + * using "Notify Enable" and "PostSync Operation - Write Immediate
> > > + * Data" in the PIPE_CONTROL command.
> > > + */
> > > +void
> > > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > > +{
> > > +   if (brw->gen >= 6) {
> > > +      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> > > +       *
> > > +       *    "The most common action to perform upon reaching a synchronization
> > > +       *    point is to write a value out to memory. An immediate value
> > > +       *    (included with the synchronization command) may be written."
> > > +       *
> > > +       *
> > > +       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > > +       *
> > > +       *    "In case the data flushed out by the render engine is to be read
> > > +       *    back in to the render engine in coherent manner, then the render
> > > +       *    engine has to wait for the fence completion before accessing the
> > > +       *    flushed data. This can be achieved by following means on various
> > > +       *    products: PIPE_CONTROL command with CS Stall and the required
> > > +       *    write caches flushed with Post-Sync-Operation as Write Immediate
> > > +       *    Data.
> > > +       *
> > > +       *    Example:
> > > +       *       - Workload-1 (3D/GPGPU/MEDIA)
> > > +       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> > > +       *         Data, Required Write Cache Flush bits set)
> > > +       *       - Workload-2 (Can use the data produce or output by Workload-1)
> > > +       */
> > > +      brw_emit_pipe_control_write(brw,
> > > +                                  flags | PIPE_CONTROL_CS_STALL |
> > > +                                  PIPE_CONTROL_WRITE_IMMEDIATE,
> > > +                                  brw->workaround_bo, 0, 0);
> > > +
> > > +      if (brw->is_haswell) {
> > > +         /* Haswell needs addition work-arounds:
> > > +          *
> > > +          * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > > +          *
> > > +          *    Option 1:
> > > +          *    PIPE_CONTROL command with the CS Stall and the required write
> > > +          *    caches flushed with Post-SyncOperation as Write Immediate Data
> > > +          *    followed by eight dummy MI_STORE_DATA_IMM (write to scratch
> > > +          *    spce) commands.
> > > +          *
> > > +          *    Example:
> > > +          *       - Workload-1
> > > +          *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
> > > +          *         Immediate Data, Required Write Cache Flush bits set)
> > > +          *       - MI_STORE_DATA_IMM (8 times) (Dummy data, Scratch Address)
> > > +          *       - Workload-2 (Can use the data produce or output by
> > > +          *         Workload-1)
> > > +          *
> > > +          * Unfortunately, both the PRMs and the internal docs are a bit
> > > +          * out-of-date in this regard. What the windows driver does (and
> > > +          * this appears to actually work) is to emit a register read from the
> > > +          * memory address written by the pipe control above.
> > > +          *
> > > +          * What register we load into doesn't matter. We choose an indirect
> > > +          * rendering register because we know it always exists and doesn't
> > > +          * require command parser support. This is perfectly safe to do
> > > +          * since we always re-load all of the indirect draw registers right
> > > +          * before 3DPRIMITIVE when needed anyway.
> > > +          */
> > > +         brw_load_register_mem(brw, GEN7_3DPRIM_START_INSTANCE,
> > > +                               brw->workaround_bo,
> > > +                               I915_GEM_DOMAIN_INSTRUCTION, 0, 0);
> >
> > This does require command parser support. Your MI_LOAD_REGISTER_MEM will be
> > converted to MI_NOOP if the command parser is not enabled. That means that
> > the workaround effectively won't happen for users with kernels older than 4.2
> > (specifically kernel sha 245054a1fe33c06ad233e0d58a27ec7b64db9284).
> >
> > ChromeOS in particular was running a 3.8 based kernel, with no command
> > parser, last I checked, which means they won't get this bug fix. Applying
> > a kernel patch may actually be a faster route to getting this bug fixed,
> > for them, at least. For normal Linux distro users, this Mesa patch is
> > probably the fastest approach. ChromeOS should update their kernel to 4.4+
> > anyway.
>
> The kernel does have a LRI after a flush before signaling the batch is
> complete. I don't see a need to add another...
>
> The question is whether this posting is required for GPU visibility of
> results or just CPU? I suspect this is just for CPU in which case it
> doesn't belong here at all, but before flagging rendering as ready for
> async (i.e. not involving the kernel) inspection.
>

The docs, if you choose to believe them, seem to indicate that this is needed for GPU visibility as well as CPU.
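For readers following the thread, the command ordering the quoted patch emits can be modeled in a small standalone C sketch. This is not Mesa code: the struct batch type, the model_* helpers, and the enum of command kinds are hypothetical, and the PIPE_CONTROL bit values are illustrative assumptions. It shows the PIPE_CONTROL carrying the caller's flushes plus CS stall and a write-immediate post-sync op to offset 0 of the scratch buffer, the Haswell-only register load from that same address, and, following the review comment above, how that load is turned into a no-op when the kernel command parser is not enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative PIPE_CONTROL flag bits (assumed values, for this sketch only). */
#define PIPE_CONTROL_DEPTH_CACHE_FLUSH   (1u << 0)
#define PIPE_CONTROL_RENDER_TARGET_FLUSH (1u << 12)
#define PIPE_CONTROL_WRITE_IMMEDIATE     (1u << 14)
#define PIPE_CONTROL_CS_STALL            (1u << 20)

/* Hypothetical model of the commands that end up in a batch. */
enum cmd_kind { CMD_PIPE_CONTROL, CMD_LOAD_REGISTER_MEM, CMD_NOOP };

struct cmd {
   enum cmd_kind kind;
   uint32_t flags;   /* PIPE_CONTROL flags; 0 for other commands */
   uint32_t offset;  /* offset into the scratch (workaround) buffer */
};

struct batch {
   int gen;          /* hardware generation, e.g. 7 */
   bool is_haswell;
   struct cmd cmds[8];
   size_t count;
};

static void emit(struct batch *b, enum cmd_kind kind, uint32_t flags,
                 uint32_t offset)
{
   assert(b->count < sizeof(b->cmds) / sizeof(b->cmds[0]));
   b->cmds[b->count++] = (struct cmd){ kind, flags, offset };
}

/* Mirrors only the gen >= 6 branch of the quoted brw_emit_end_of_pipe_sync(). */
static void model_end_of_pipe_sync(struct batch *b, uint32_t flags)
{
   if (b->gen < 6)
      return; /* the pre-gen6 path is not shown in the quoted hunk; not modeled */

   /* Caller's cache flushes + CS stall + write of an immediate to offset 0. */
   emit(b, CMD_PIPE_CONTROL,
        flags | PIPE_CONTROL_CS_STALL | PIPE_CONTROL_WRITE_IMMEDIATE, 0);

   /* Haswell only: read back the address the PIPE_CONTROL just wrote. */
   if (b->is_haswell)
      emit(b, CMD_LOAD_REGISTER_MEM, 0, 0);
}

/* Hypothetical stand-in for the kernel: per the review comment, without the
 * command parser an MI_LOAD_REGISTER_MEM is turned into MI_NOOP. */
static void model_kernel_submit(struct batch *b, bool cmd_parser_enabled)
{
   if (cmd_parser_enabled)
      return;
   for (size_t i = 0; i < b->count; i++) {
      if (b->cmds[i].kind == CMD_LOAD_REGISTER_MEM)
         b->cmds[i] = (struct cmd){ CMD_NOOP, 0, 0 };
   }
}
```

For example, a Haswell batch built with model_end_of_pipe_sync() and passed through model_kernel_submit(&b, false) ends with CMD_NOOP instead of the register load, which is the case where the workaround silently disappears on older kernels. The driver-side logic is kept identical for every kernel on purpose: the batch Mesa emits is the same, and only the kernel's treatment of it differs.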
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev