Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier

Francisco Jerez Tue, 28 Jun 2016 13:48:04 -0700

Alejandro Piñeiro <apinhe...@igalia.com> writes:

> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that in order to execute
> that test, it is needed to override GL and GLSL versions).
>
> I was not able to find a documentation reference that justifies it.


> ---
>
> Having said, I didn't find a documentation reference explicitly
> mention that this is needed.
>
> Initially I thought that a flag was missing when calling
> emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
> as far as I saw.  Then I noted that there is a gen6 workaround on that
> code:
>
>          if (brw->gen == 6) {
>             /* Hardware workaround: SNB B-Spec says:
>              *
>              * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>              * Flush Enable =1, a PIPE_CONTROL with any non-zero
>              * post-sync-op is required.
>              */
>             brw_emit_post_sync_nonzero_flush(brw);
>          }
>
> I tested calling that method for any gen, guessing if the workaround
> was needed also for other gens, and the test got fixed. But looking at
> the documentation of other gens, I didn't find the need for this
> workaround. For that reason I moved to use gen7_emit_cs_stall, that is
> less agressive and get the test fixed too. It seems that in order to
> get a complete flush you need a cs stall flush with a
> pipe_control_write. But again, I didn't find any reference at the PRMs
> confirming it.
>
> Intuitively, this would be needed on brw_emit_mi_flush or even at
> brw_emit_pipe_control_flush (this one already include some
> gen-specific workarounds), but I prefered to keep it on the only place
> that seems to need it for now.
>
> In addition to solve that CTS test, it also gets it passing for the
> test I recently sent to the piglit list, and not included on master
> yet (acked for now):
> https://lists.freedesktop.org/archives/piglit/2016-June/020055.html
>
> That piglit patch adds 48 parameter combination for the basic
> test. Without this mesa patch 5-6 subtests fails. With this patch all
> of them passes. Tested on Haswell, Broadwell and Skylake too.
>
I believe this test is hitting the same hardware race condition that
most callers of brw_emit_mi_flush() suffer from: The problem of
brw_emit_mi_flush() is that, even though it is supposed to both
invalidate R/O caches (e.g. the sampler caches) and flush R/W caches
(e.g. the render cache), the former happens at the top of the pipeline
(i.e. as soon as the CS processor parses the PIPE_CONTROL command,
irrespective of whether a concurrent rendering workload could pollute a
R/O cache again in parallel), while the latter happens at the bottom of
the pipeline (i.e. after any concurrent rendering completes).

The gen7_emit_cs_stall_flush() call you have introduced seems to fix the
issue because it forces additional serialization with respect to
previous rendering commands before the R/O caches are invalidated, which
is a clear indicative that you're hitting the same bug.  The right way
to fix it would be to remove the brw_emit_mi_flush() call for Gen6+ at
least (brw_emit_mi_flush() is BTW a pretty big hammer and causes a bunch
of other caches to be flushed which aren't necessarily relevant to
texture barrier), and instead call brw_emit_pipe_control_flush() twice:
The first PIPE_CONTROL command should have at least RENDER_TARGET_FLUSH
and CS_STALL set to initiate a render cache flush after any concurrent
rendering completes and cause the CS to stop parsing commands until the
render cache becomes coherent with memory (the DEPTH_CACHE_FLUSH bit may
also be necessary for some workloads using depth texturing).  The second
PIPE_CONTROL should have TEXTURE_CACHE_INVALIDATE set (and no CS stall)
to clean up any stale data from the sampler caches before rendering
continues.

See 0aa4f99f562a05880a779707cbcd46be459863bf for how I addressed the
same problem in the L3 cache partitioning code (where I noticed the
problem originally), or 72473658c51d5e074ce219c1e6385a4cce29f467 for how
Ken fixed the same issue in the draw-time surface validation path.
Incidentally I had written some code just a couple of days ago to
address the same issue in the implementation of glMemoryBarrier (I'll
send it for review soon-ish).  There are likely many more instances of
this race condition in the driver, most callers of brw_emit_mi_flush are
suspect...

>  src/mesa/drivers/dri/i965/intel_tex.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c 
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..e7459cd 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
>     struct brw_context *brw = brw_context(ctx);
>  
> +   gen7_emit_cs_stall_flush(brw);
>     brw_emit_mi_flush(brw);
>  }
>  
> -- 
> 2.7.4
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier

Reply via email to