On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang. -v2: enabling of ForceThreadDispatchEnable is only for gen8, for gen9 and higher reverted enabling of PixelShaderHasUAV.
CC: Jason Ekstrand <jason.ekstr...@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941 Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV) Signed-off-by: Sergii Romantsov <sergii.romant...@globallogic.com> --- src/intel/vulkan/genX_pipeline.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c index 9595a71..b469270 100644 --- a/src/intel/vulkan/genX_pipeline.c +++ b/src/intel/vulkan/genX_pipeline.c @@ -1445,7 +1445,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass, wm.EarlyDepthStencilControl = EDSC_NORMAL; } -#if GEN_GEN >= 8 +#if GEN_GEN == 8 /* Gen8 hardware tries to compute ThreadDispatchEnable for us but * doesn't take into account KillPixels when no depth or stencil * writes are enabled. In order for occlusion queries to work @@ -1663,6 +1663,37 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline, wm_prog_data->uses_kill; #if GEN_GEN >= 9 + /* The stricter cross-primitive coherency guarantees that the hardware + * gives us with the "Accesses UAV" bit set for at least one shader stage + * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are + * redundant within the current image, atomic counter and SSBO GL APIs, + * which all have very loose ordering and coherency requirements and + * generally rely on the application to insert explicit barriers when a + * shader invocation is expected to see the memory writes performed by the + * invocations of some previous primitive. Regardless of the value of + * "UAV coherency required", the "Accesses UAV" bits will implicitly cause + * an in most cases useless DC flush when the lowermost stage with the bit + * set finishes execution. + * + * It would be nice to disable it, but in some cases we can't because on + * Gen8+ it also has an influence on rasterization via the PS UAV-only + * signal (which could be set independently from the coherency mechanism + * in the 3DSTATE_WM command on Gen7), and because in some cases it will + * determine whether the hardware skips execution of the fragment shader + * or not via the ThreadDispatchEnable signal. However if we know that + * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and + * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make any + * difference so we may just disable it here. + * + * Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't + * take into account KillPixels when no depth or stencil writes are + * enabled. In order for occlusion queries to work correctly with no + * attachments, we need to force-enable here. + */ + if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) && + !has_color_buffer_write_enabled(pipeline, blend)) + ps.PixelShaderHasUAV = true; + ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil; ps.PixelShaderPullsBary = wm_prog_data->pulls_bary; -- 2.7.4 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev