On Thu, Oct 06, 2016 at 03:21:54PM -0700, Nanley Chery wrote:
> Provides an FPS increase of ~30% on the Sascha triangle and multisampling
> demos.

After attempting to enable fast depth clears in more areas, I noticed
something possibly worth sharing. Most of the performance gain from this
patch isn't due to the difference between a fast depth clear and a regular
depth clear with HiZ enabled. Most of the gain comes from avoiding meta's
suboptimal interaction with the HiZ buffer, which, if I understand
correctly, was as follows:

1. HiZ Resolve
2. Depth clear with HiZ enabled
3. Depth Resolve

A similar increase in performance arose for slow depth clears once the
meta path was replaced with blorp at commit
d823f92970447859c4891728da4e48f0c9bc0044. It seems like the actual fast vs
slow depth clear delta is only roughly 1%. I obtained that figure by simply
looking at the FPS counter across several runs.

- Nanley

>
> Signed-off-by: Nanley Chery <nanley.g.ch...@intel.com>
> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> (v2)
>
> ---
> v3. Emit required clear_params packet (Chad)
>     Share clear_params code path IVB+ (Jason)
>
>  src/intel/vulkan/anv_pass.c        | 13 +++++++++++++
>  src/intel/vulkan/genX_cmd_buffer.c | 24 ++++++++++++++++++++++--
>  2 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
> index 69c3c7e..595c2ea 100644
> --- a/src/intel/vulkan/anv_pass.c
> +++ b/src/intel/vulkan/anv_pass.c
> @@ -155,5 +155,18 @@ void anv_GetRenderAreaGranularity(
>      VkRenderPass renderPass,
>      VkExtent2D* pGranularity)
>  {
> +   ANV_FROM_HANDLE(anv_render_pass, pass, renderPass);
> +
> +   /* This granularity satisfies HiZ fast clear alignment requirements
> +    * for all sample counts.
> +    */
> +   for (unsigned i = 0; i < pass->subpass_count; ++i) {
> +      if (pass->subpasses[i].depth_stencil_attachment !=
> +          VK_ATTACHMENT_UNUSED) {
> +         *pGranularity = (VkExtent2D) { .width = 8, .height = 4 };
> +         return;
> +      }
> +   }
> +
>     *pGranularity = (VkExtent2D) { 1, 1 };
>  }
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index ed6a109..4089fc7 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -1318,8 +1318,27 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
>        anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_STENCIL_BUFFER), sb);
>     }
>
> -   /* Clear the clear params. */
> -   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp);
> +   /* From the IVB PRM Vol2P1, 11.5.5.4 3DSTATE_CLEAR_PARAMS:
> +    *
> +    *    3DSTATE_CLEAR_PARAMS must always be programmed in the along with
> +    *    the other Depth/Stencil state commands(i.e. 3DSTATE_DEPTH_BUFFER,
> +    *    3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)
> +    *
> +    * Testing also shows that some variant of this restriction may exist HSW+.
> +    * On BDW+, it is not possible to emit 2 of these packets consecutively when
> +    * both have DepthClearValueValid set. An analysis of such state programming
> +    * on SKL showed that the GPU doesn't register the latter packet's clear
> +    * value.
> +    */
> +   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp) {
> +      if (has_hiz) {
> +         cp.DepthClearValueValid = true;
> +         const uint32_t ds =
> +            cmd_buffer->state.subpass->depth_stencil_attachment;
> +         cp.DepthClearValue =
> +            cmd_buffer->state.attachments[ds].clear_value.depthStencil.depth;
> +      }
> +   }
>  }
>
>  static void
> @@ -1332,6 +1351,7 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer *cmd_buffer,
>
>     cmd_buffer_emit_depth_stencil(cmd_buffer);
>     genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
> +   genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_DEPTH_CLEAR);
>
>     anv_cmd_buffer_clear_subpass(cmd_buffer);
>  }
> --
> 2.10.0
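
For anyone wondering what the 8x4 granularity returned by
anv_GetRenderAreaGranularity means for an application, here is a minimal
sketch of the render-area rule as I read it from the Vulkan spec: the offset
must be a multiple of the granularity, and the extent must either be a
multiple of it or reach the framebuffer edge. The struct and function names
below are hypothetical stand-ins, not anv or Vulkan API, and whether the
driver performs a check of this shape before choosing a fast clear is an
assumption that the quoted patch does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VkOffset2D/VkExtent2D/VkRect2D so that the
 * sketch compiles without the Vulkan headers. */
struct offset2d { int32_t x, y; };
struct extent2d { uint32_t width, height; };
struct rect2d { struct offset2d offset; struct extent2d extent; };

/* One axis of the rule: the offset is a multiple of the granularity, and
 * the extent is either a multiple of it or ends at the framebuffer edge. */
static bool
axis_is_optimal(int32_t offset, uint32_t extent, uint32_t gran,
                uint32_t fb_size)
{
   assert(gran > 0);
   if (offset < 0 || (uint32_t)offset % gran != 0)
      return false;
   return extent % gran == 0 || (uint32_t)offset + extent == fb_size;
}

/* Apply the per-axis rule to both dimensions of the render area. */
static bool
render_area_is_optimal(struct rect2d area, struct extent2d gran,
                       struct extent2d fb)
{
   return axis_is_optimal(area.offset.x, area.extent.width,
                          gran.width, fb.width) &&
          axis_is_optimal(area.offset.y, area.extent.height,
                          gran.height, fb.height);
}
```

With a granularity of 8x4, a full-framebuffer render area always passes even
when the framebuffer size is not a multiple of 8x4, because the extent then
ends exactly at the framebuffer edge.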