On Tue, 2018-10-16 at 15:12 -0500, Jason Ekstrand wrote: > This reverts commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5. The real > issue appears to have been that HiZ ops don't like having WM thread > dispatch force-enabled. The previous commit fixes that problem so we > can go back to using the ForceThreadDispatchEnable bit even on SKL+. > --- > src/intel/vulkan/genX_pipeline.c | 42 ++++++-------------------------- > 1 file changed, 7 insertions(+), 35 deletions(-) >
While this patch has landed in master with CC to @stable, I'm adding it to the rejected list for 18.2, as the commit it reverts, 0fa9e6d7b30 ("anv/skylake: disable ForceThreadDispatchEnable"), never landed in 18.2 branch. The reason why 0fa9e6d7b30 didn't arrive to 18.2 was due an error in the fixes tag. This commit fixes abd629eb3d4 "(anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV"), but instead we set in the Fixes: tag 79270d2140ec "(anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV"), which is actually a cherry-pick of the former. Hence that is why our scripts didn't catch it. In any case as 0fa9e6d7b30 ("anv/skylake: disable ForceThreadDispatchEnable") is not in 18.2, no need to revert it, so hence adding this patch to cherry-ignore. J.A. > diff --git a/src/intel/vulkan/genX_pipeline.c > b/src/intel/vulkan/genX_pipeline.c > index 33f1f7832ac..9595a7133ae 100644 > --- a/src/intel/vulkan/genX_pipeline.c > +++ b/src/intel/vulkan/genX_pipeline.c > @@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct > anv_subpass *subpass, > wm.EarlyDepthStencilControl = EDSC_NORMAL; > } > > -#if GEN_GEN == 8 > - /* Gen8 and later hardware tries to compute ThreadDispatchEnable for > - * us but doesn't take into account KillPixels when no depth or > - * stencil writes are enabled. In order for occlusion queries to > - * work correctly with no attachments, we need to force-enable PS > - * thread dispatch. > +#if GEN_GEN >= 8 > + /* Gen8 hardware tries to compute ThreadDispatchEnable for us but > + * doesn't take into account KillPixels when no depth or stencil > + * writes are enabled. In order for occlusion queries to work > + * correctly with no attachments, we need to force-enable PS thread > + * dispatch. > * > * The BDW docs are pretty clear that that this bit isn't validated > * and probably shouldn't be used in production: > @@ -1460,9 +1460,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct > anv_subpass *subpass, > * > * Unfortunately, however, the other mechanism we have for doing > this > * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW. > - * Given two bad options, we choose the one which works. On Skylake > - * and later, setting ForceThreadDispatchEnable causes GPU hangs so > - * we use the PixelShaderHasUAV mechanism there. > + * Given two bad options, we choose the one which works. > */ > if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) && > !has_color_buffer_write_enabled(pipeline, blend)) > @@ -1665,32 +1663,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline, > wm_prog_data->uses_kill; > > #if GEN_GEN >= 9 > - /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us > - * but doesn't take into account KillPixels when no depth or stencil > - * writes are enabled. In order for occlusion queries to work > correctly > - * with no attachments, we need to force-enable PS thread dispatch. > - * > - * The stricter cross-primitive coherency guarantees that the hardware > - * gives us with the "Accesses UAV" bit set for at least one shader > stage > - * and the "UAV coherency required" bit set on the 3DPRIMITIVE command > are > - * redundant within the current image, atomic counter and SSBO GL and > - * Vulkan APIs, which all have very loose ordering and coherency > - * requirements and generally rely on the application to insert > explicit > - * barriers when a shader invocation is expected to see the memory > - * writes performed by the invocations of some previous primitive. > - * Regardless of the value of "UAV coherency required", the "Accesses > - * UAV" bits will implicitly cause an in most cases useless DC flush > - * when the lowermost stage with the bit set finishes execution. > - * > - * Unfortunately, however, the other mechanism we have for doing this > is > - * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on > - * Skylake and later hardware. On Broadwell, however, setting this bit > - * causes GPU hangs so we use ForceThreadDispatchEnable there. > - */ > - if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) && > - !has_color_buffer_write_enabled(pipeline, blend)) > - ps.PixelShaderHasUAV = true; > - > ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil; > ps.PixelShaderPullsBary = wm_prog_data->pulls_bary; > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev