Alejandro Piñeiro <apinhe...@igalia.com> writes: > Fixes: > GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass > > On Haswell, Broadwell and Skylake (note that in order to execute > that test, it is needed to override GL and GLSL versions). > > I was not able to find a documentation reference that justifies it.
> --- > > Having said, I didn't find a documentation reference explicitly > mention that this is needed. > > Initially I thought that a flag was missing when calling > emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case > as far as I saw. Then I noted that there is a gen6 workaround on that > code: > > if (brw->gen == 6) { > /* Hardware workaround: SNB B-Spec says: > * > * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache > * Flush Enable =1, a PIPE_CONTROL with any non-zero > * post-sync-op is required. > */ > brw_emit_post_sync_nonzero_flush(brw); > } > > I tested calling that method for any gen, guessing if the workaround > was needed also for other gens, and the test got fixed. But looking at > the documentation of other gens, I didn't find the need for this > workaround. For that reason I moved to use gen7_emit_cs_stall, that is > less agressive and get the test fixed too. It seems that in order to > get a complete flush you need a cs stall flush with a > pipe_control_write. But again, I didn't find any reference at the PRMs > confirming it. > > Intuitively, this would be needed on brw_emit_mi_flush or even at > brw_emit_pipe_control_flush (this one already include some > gen-specific workarounds), but I prefered to keep it on the only place > that seems to need it for now. > > In addition to solve that CTS test, it also gets it passing for the > test I recently sent to the piglit list, and not included on master > yet (acked for now): > https://lists.freedesktop.org/archives/piglit/2016-June/020055.html > > That piglit patch adds 48 parameter combination for the basic > test. Without this mesa patch 5-6 subtests fails. With this patch all > of them passes. Tested on Haswell, Broadwell and Skylake too. > I believe this test is hitting the same hardware race condition that most callers of brw_emit_mi_flush() suffer from: The problem of brw_emit_mi_flush() is that, even though it is supposed to both invalidate R/O caches (e.g. the sampler caches) and flush R/W caches (e.g. the render cache), the former happens at the top of the pipeline (i.e. as soon as the CS processor parses the PIPE_CONTROL command, irrespective of whether a concurrent rendering workload could pollute a R/O cache again in parallel), while the latter happens at the bottom of the pipeline (i.e. after any concurrent rendering completes). The gen7_emit_cs_stall_flush() call you have introduced seems to fix the issue because it forces additional serialization with respect to previous rendering commands before the R/O caches are invalidated, which is a clear indicative that you're hitting the same bug. The right way to fix it would be to remove the brw_emit_mi_flush() call for Gen6+ at least (brw_emit_mi_flush() is BTW a pretty big hammer and causes a bunch of other caches to be flushed which aren't necessarily relevant to texture barrier), and instead call brw_emit_pipe_control_flush() twice: The first PIPE_CONTROL command should have at least RENDER_TARGET_FLUSH and CS_STALL set to initiate a render cache flush after any concurrent rendering completes and cause the CS to stop parsing commands until the render cache becomes coherent with memory (the DEPTH_CACHE_FLUSH bit may also be necessary for some workloads using depth texturing). The second PIPE_CONTROL should have TEXTURE_CACHE_INVALIDATE set (and no CS stall) to clean up any stale data from the sampler caches before rendering continues. See 0aa4f99f562a05880a779707cbcd46be459863bf for how I addressed the same problem in the L3 cache partitioning code (where I noticed the problem originally), or 72473658c51d5e074ce219c1e6385a4cce29f467 for how Ken fixed the same issue in the draw-time surface validation path. Incidentally I had written some code just a couple of days ago to address the same issue in the implementation of glMemoryBarrier (I'll send it for review soon-ish). There are likely many more instances of this race condition in the driver, most callers of brw_emit_mi_flush are suspect... > src/mesa/drivers/dri/i965/intel_tex.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_tex.c > b/src/mesa/drivers/dri/i965/intel_tex.c > index cac33ac..e7459cd 100644 > --- a/src/mesa/drivers/dri/i965/intel_tex.c > +++ b/src/mesa/drivers/dri/i965/intel_tex.c > @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx) > { > struct brw_context *brw = brw_context(ctx); > > + gen7_emit_cs_stall_flush(brw); > brw_emit_mi_flush(brw); > } > > -- > 2.7.4 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev