On Fri, Jan 19, 2018 at 03:47:34PM -0800, Jason Ekstrand wrote: > Even though the blorp pass looks a bit on the sketchy side, the end > result in the Vulkan driver is very nice. Instead of having this weird > case where you do a fast clear and then maybe have to resolve, we just > do the ambiguate and are done with it. The ambiguate does exactly what > we want of setting all the CCS values to 0 which puts it inot the ^ in > pass-through state. > > This should also improve performance a bit in certain cases. For > instance, if we did a transition from UNDEFINED to GENERAL for a surface > that doesn't have CCS enabled all the time, we would end up doing a > fast-clear and then a full resolve which ends up touching every byte in > the main surface as well as the CCS. With the ambiguate pass, that > transition only touches the CCS. > --- > src/intel/vulkan/anv_blorp.c | 5 ++++ > src/intel/vulkan/genX_cmd_buffer.c | 54 > +++++++++----------------------------- > 2 files changed, 17 insertions(+), 42 deletions(-) > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c > index 05efc6d..3698543 100644 > --- a/src/intel/vulkan/anv_blorp.c > +++ b/src/intel/vulkan/anv_blorp.c > @@ -1792,6 +1792,11 @@ anv_image_ccs_op(struct anv_cmd_buffer *cmd_buffer, > surf.surf->format, > isl_to_blorp_fast_clear_op(ccs_op)); > break; > case ISL_AUX_OP_AMBIGUATE: > + for (uint32_t a = 0; a < layer_count; a++) { > + const uint32_t layer = base_layer + a; > + blorp_ccs_ambiguate(&batch, &surf, level, layer); > + } > + break; > default: > unreachable("Unsupported CCS operation"); > } > diff --git a/src/intel/vulkan/genX_cmd_buffer.c > b/src/intel/vulkan/genX_cmd_buffer.c > index 77fdadf..9e2eba3 100644 > --- a/src/intel/vulkan/genX_cmd_buffer.c > +++ b/src/intel/vulkan/genX_cmd_buffer.c > @@ -486,15 +486,6 @@ init_fast_clear_state_entry(struct anv_cmd_buffer > *cmd_buffer, > uint32_t plane = anv_image_aspect_to_plane(image->aspects, aspect); > enum isl_aux_usage aux_usage = image->planes[plane].aux_usage; > > - /* The resolve flag should updated to signify that fast-clear/compression > - * data needs to be removed when leaving the undefined layout. Such data > - * may need to be removed if it would cause accesses to the color buffer > - * to return incorrect data. The fast clear data in CCS_D buffers should > - * be removed because CCS_D isn't enabled all the time. > - */ > - genX(set_image_needs_resolve)(cmd_buffer, image, aspect, level, > - aux_usage == ISL_AUX_USAGE_NONE); > - > /* The fast clear value dword(s) will be copied into a surface state > object. > * Ensure that the restrictions of the fields in the dword(s) are > followed. > * > @@ -677,10 +668,9 @@ transition_color_buffer(struct anv_cmd_buffer > *cmd_buffer, > for (unsigned level = base_level; level < last_level_num; level++) > init_fast_clear_state_entry(cmd_buffer, image, aspect, level); > > - /* Initialize the aux buffers to enable correct rendering. This > operation > - * requires up to two steps: one to rid the aux buffer of data that may > - * cause GPU hangs, and another to ensure that writes done without aux > - * will be visible to reads done with aux. > + /* Initialize the aux buffers to enable correct rendering. In order to > + * ensure that things such as storage images work correctly, aux > buffers > + * are initialized to the pass-through state.
Only CCS is initialized to the pass-through state while MCS is fast-cleared. We may also want to update the comment below since we're no longer fast-clearing CCS. > * > * Having an aux buffer with invalid data is possible for CCS buffers > * SKL+ and for MCS buffers with certain sample counts (2x and 8x). One > @@ -692,16 +682,18 @@ transition_color_buffer(struct anv_cmd_buffer > *cmd_buffer, > * We don't have any data to show that this is a problem, but we want > to > * avoid causing difficult-to-debug problems. > */ > - if (GEN_GEN >= 9 && image->samples == 1) { > + if (image->samples == 1) { > for (uint32_t l = 0; l < level_count; l++) { > const uint32_t level = base_level + l; > const uint32_t level_layer_count = > MIN2(layer_count, anv_image_aux_layers(image, aspect, level)); > anv_image_ccs_op(cmd_buffer, image, aspect, level, > base_layer, level_layer_count, > - ISL_AUX_OP_FAST_CLEAR, false); > + ISL_AUX_OP_AMBIGUATE, false); > + genX(set_image_needs_resolve)(cmd_buffer, image, > + aspect, level, false); > } > - } else if (image->samples > 1) { > + } else { > if (image->samples == 4 || image->samples == 16) { > anv_perf_warn(cmd_buffer->device->instance, image, > "Doing a potentially unnecessary fast-clear to " > @@ -712,32 +704,10 @@ transition_color_buffer(struct anv_cmd_buffer > *cmd_buffer, > anv_image_mcs_op(cmd_buffer, image, aspect, > base_layer, layer_count, > ISL_AUX_OP_FAST_CLEAR, false); > - } > - /* At this point, some elements of the CCS buffer may have the > fast-clear > - * bit-arrangement. As the user writes to a subresource, we need to > have > - * the associated CCS elements enter the ambiguated state. This enables > - * reads (implicit or explicit) to reflect the user-written data > instead > - * of the clear color. The only time such elements will not change > their > - * state as described above, is in a final layout that doesn't have CCS > - * enabled. In this case, we must force the associated CCS buffers of > the > - * specified range to enter the ambiguated state in advance. > - */ > - if (image->samples == 1 && > - image->planes[plane].aux_usage != ISL_AUX_USAGE_CCS_E && > - final_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) { > - /* The CCS_D buffer may not be enabled in the final layout. Call > this > - * function again with a initial layout of COLOR_ATTACHMENT_OPTIMAL > - * to perform a resolve. > - */ > - anv_perf_warn(cmd_buffer->device->instance, image, > - "Performing an additional resolve for CCS_D layout " > - "transition. Consider always leaving it on or " > - "performing an ambiguation pass."); > - transition_color_buffer(cmd_buffer, image, aspect, > - base_level, level_count, > - base_layer, layer_count, > - VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, > - final_layout); Nice to be rid of this. > + for (unsigned level = base_level; level < last_level_num; level++) { > + genX(set_image_needs_resolve)(cmd_buffer, image, > + aspect, level, true); > + } Why set needs resolve true here? -Nanley > } > return; > } > -- > 2.5.0.400.gff86faf > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev