On Mon, May 29, 2017 at 12:09:01PM -0700, Jason Ekstrand wrote: > Sandy Bridge does not technically support mipmapped depth/stencil. In > order to work around this, we allocate what are effectively completely > separate images for each miplevel, ensure that they are page-aligned, > and manually offset to them. Prior to layered rendering, this was a > simple matter of setting a large enough halign/valign. > > With the advent of layered rendering, however, things got more > complicated. Now, things weren't as simple as just handing a surface > off to the hardware. Any miplevel of a normally mipmapped surface can > be considered as just an array surface given the right qpitch. However, > the hardware gives us no capability to specify qpitch so this won't > work. Instead, the chosen solution was to use a new "all slices at each > LOD" layout which laid things out as a mipmap of arrays rather than an > array of mipmaps. This way you can easily offset to any of the > miplevels and each is a valid array. > > Unfortunately, the "all slices at each lod" concept missed one > fundamental thing about SNB HiZ and stencil hardware: It doesn't just > always act as if you're always working with a non-mipmapped surface, it > acts as if you're always working on a non-mipmapped surface of the same > size as LOD0. In other words, even though it may only write the > upper-left corner of each array slice, the qpitch for the array is for a > surface the size of LOD0 of the depth surface. This mistake causes us > to under-allocate HiZ and stencil in some cases and also to accidentally > allow different miplevels to overlap. Sadly, piglit test coverage > didn't quite catch this until I started making changes to the resolve > code that caused additional HiZ resolves in certain tests. > > This commit switches Sandy Bridge HiZ and stencil over to a new scheme > that lays out the non-zero miplevels horizontally below LOD0. This way > they can all have the same qpitch without interfering with each other. > Technically, the miplevels still overlap, but things are spaced out > enough that each page is only in the "written area" of one LOD. Hopefully, > this will get rid of at least some of the random SNB hangs. > > Cc: "17.0 17.1" <mesa-sta...@lists.freedesktop.org> > Cc: Topi Pohjolainen <topi.pohjolai...@intel.com> > Cc: Nanley Chery <nanley.g.ch...@intel.com> > Cc: Jordan Justen <jordan.l.jus...@intel.com> > Cc: Kenneth Graunke <kenn...@whitecape.org>
There is a few nits, but looks good. Thanks a lot for figuring out how it really works: Reviewed-by: Topi Pohjolainen <topi.pohjolai...@intel.com> > > --- > The series I sent out on Friday suffered from a GPU hang or two on Sandy > Bridge. It turns out that those hangs were caused by the hardware HiZ > resolving part of my batch buffer due to this under-allocation. > > Topi, I'm sorry but this will likely make hash of your earlier patch > series. Sadly, I don't think there's really anything else we can do. :-( > Also, given how tricky this is to get right, I concede that we may want to > add an ISL_DIM_LAYOUT_GEN6_HIZ_STENCIL layout to ISL. We could still do it > with the "array of surfaces" approach but I think these sorts of > calculations are best done in with the other surface calculation code. I > can help draft it if you'd like. > > src/mesa/drivers/dri/i965/brw_blorp.c | 11 ++- > src/mesa/drivers/dri/i965/brw_tex_layout.c | 100 > ++++++++++++++++++++++---- > src/mesa/drivers/dri/i965/gen6_depth_state.c | 4 +- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 11 +-- > src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 37 +++++++++- > 5 files changed, 134 insertions(+), 29 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c > b/src/mesa/drivers/dri/i965/brw_blorp.c > index 6e860f0..cb9933b 100644 > --- a/src/mesa/drivers/dri/i965/brw_blorp.c > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c > @@ -123,7 +123,7 @@ apply_gen6_stencil_hiz_offset(struct isl_surf *surf, > uint32_t lod, > uint32_t *offset) > { > - assert(mt->array_layout == ALL_SLICES_AT_EACH_LOD); > + assert(mt->array_layout == GEN6_HIZ_STENCIL); > > if (mt->format == MESA_FORMAT_S_UINT8) { > /* Note: we can't compute the stencil offset using > @@ -183,12 +183,12 @@ blorp_surf_for_miptree(struct brw_context *brw, > }; > > if (brw->gen == 6 && mt->format == MESA_FORMAT_S_UINT8 && > - mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > - /* Sandy bridge stencil and HiZ use this ALL_SLICES_AT_EACH_LOD hack in > + mt->array_layout == GEN6_HIZ_STENCIL) { > + /* Sandy bridge stencil and HiZ use this GEN6_HIZ_STENCIL hack in > * order to allow for layered rendering. The hack makes each LOD of > the > * stencil or HiZ buffer a single tightly packed array surface at some > * offset into the surface. Since ISL doesn't know how to deal with > the > - * crazy ALL_SLICES_AT_EACH_LOD layout and since we have to do a manual > + * crazy GEN6_HIZ_STENCIL layout and since we have to do a manual > * offset of it anyway, we might as well do the offset here and keep > the > * hacks inside the i965 driver. > * > @@ -241,8 +241,7 @@ blorp_surf_for_miptree(struct brw_context *brw, > > struct intel_mipmap_tree *hiz_mt = mt->hiz_buf->mt; > if (hiz_mt) { > - assert(brw->gen == 6 && > - hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD); > + assert(brw->gen == 6 && hiz_mt->array_layout == > GEN6_HIZ_STENCIL); > > /* gen6 requires the HiZ buffer to be manually offset to the > * right location. We could fixup the surf but it doesn't > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c > b/src/mesa/drivers/dri/i965/brw_tex_layout.c > index bfa8afa..30e6233 100644 > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c > @@ -216,6 +216,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) > mt->total_height = MAX2(mt->total_height, y + img_height); > > /* Layout_below: step right after second mipmap. > + * > + * For Sandy Bridge HiZ and stencil, we always step down. > */ > if (level == mt->first_level + 1) { > x += ALIGN_NPOT(width, mt->halign) / bw; > @@ -231,6 +233,67 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) > } > } > > +static void > +brw_miptree_layout_gen6_hiz_stencil(struct intel_mipmap_tree *mt) > +{ > + unsigned x = 0; > + unsigned y = 0; > + unsigned width = mt->physical_width0; > + unsigned height = mt->physical_height0; > + /* Number of layers of array texture. */ > + unsigned depth = mt->physical_depth0; We could mark height and depth const, only width is variant. I was also thinking if we want to comment anything about needing to use also level zero depth for all subsequent levels in case of 3D textures (instead of minifying depth). > + unsigned tile_width, tile_height, bw, bh; > + > + if (mt->format == MESA_FORMAT_S_UINT8) { > + bw = bh = 1; > + /* W-tiled */ > + tile_width = 64; > + tile_height = 64; > + } else { > + assert(_mesa_get_format_base_format(mt->format) == GL_DEPTH_COMPONENT > || > + _mesa_get_format_base_format(mt->format) == GL_DEPTH_STENCIL); > + /* Each 128-bit HiZ block corresponds to a region of of 8x4 depth > + * samples. Each cache line in the Y-Tiled HiZ image contains 2x2 HiZ > + * blocks. Therefore, each Y-tiled cache line corresponds to an 16x8 > + * region in the depth surface. Since we're representing it as > + * RGBA_FLOAT32, the miptree calculations will think that each cache > + * line is 1x4 pixels. Therefore, we need a scale-down factor of 16x2 > + * and a vertical alignment of 2. > + */ > + mt->cpp = 16; > + bw = 16; > + bh = 2; > + /* Y-tiled */ > + tile_width = 128 / mt->cpp; > + tile_height = 32; > + } > + > + mt->total_width = 0; > + mt->total_height = 0; We could pull height here as it is invariant in the loop: const unsigned img_height = ALIGN(DIV_ROUND_UP(height, bh), mt->valign) * depth; > + > + for (unsigned level = mt->first_level; level <= mt->last_level; level++) { > + intel_miptree_set_level_info(mt, level, x, y, depth); > + > + const unsigned img_width = ALIGN(DIV_ROUND_UP(width, bw), mt->halign); > + > + mt->total_width = MAX2(mt->total_width, x + img_width); > + mt->total_height = MAX2(mt->total_height, y + img_height); > + > + if (level == mt->first_level) { > + y += ALIGN(img_height, tile_height); > + } else { > + x += ALIGN(img_width, tile_width); > + } > + > + /* We only minify the width. We want qpitch to match for all miplevels > + * because the hardware doesn't know we aren't on LOD0. > + */ > + width = minify(width, 1); > + } > +} > + > unsigned > brw_miptree_get_horizontal_slice_pitch(const struct brw_context *brw, > const struct intel_mipmap_tree *mt, > @@ -249,6 +312,8 @@ brw_miptree_get_vertical_slice_pitch(const struct > brw_context *brw, > const struct intel_mipmap_tree *mt, > unsigned level) > { > + assert(mt->array_layout != GEN6_HIZ_STENCIL || brw->gen == 6); > + > if (brw->gen >= 9) { > /* ALL_SLICES_AT_EACH_LOD isn't supported on Gen8+ but this code will > * effectively end up with a packed qpitch anyway whenever > @@ -281,6 +346,15 @@ brw_miptree_get_vertical_slice_pitch(const struct > brw_context *brw, > mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > return ALIGN_NPOT(minify(mt->physical_height0, level), mt->valign); > > + } else if (mt->array_layout == GEN6_HIZ_STENCIL) { > + /* For HiZ and stencil on Sandy Bridge, we don't minify the height. */ > + if (mt->format == MESA_FORMAT_S_UINT8) { > + return ALIGN(mt->physical_height0, mt->valign); > + } else { > + /* HiZ has a vertical scale factor of 2. */ > + return ALIGN(DIV_ROUND_UP(mt->physical_height0, 2), mt->halign); > + } > + > } else { > const unsigned h0 = ALIGN_NPOT(mt->physical_height0, mt->valign); > const unsigned h1 = ALIGN_NPOT(minify(mt->physical_height0, 1), > mt->valign); > @@ -333,6 +407,8 @@ brw_miptree_layout_texture_array(struct brw_context *brw, > > if (layout_1d) > gen9_miptree_layout_1d(mt); > + else if (mt->array_layout == GEN6_HIZ_STENCIL) > + brw_miptree_layout_gen6_hiz_stencil(mt); > else > brw_miptree_layout_2d(mt); > > @@ -556,6 +632,8 @@ intel_miptree_set_total_width_height(struct brw_context > *brw, > case INTEL_MSAA_LAYOUT_IMS: > if (gen9_use_linear_1d_layout(brw, mt)) > gen9_miptree_layout_1d(mt); > + else if (mt->array_layout == GEN6_HIZ_STENCIL) > + brw_miptree_layout_gen6_hiz_stencil(mt); > else > brw_miptree_layout_2d(mt); > break; > @@ -579,15 +657,9 @@ intel_miptree_set_alignment(struct brw_context *brw, > * - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section 7.18.3.4 > * - BSpec (for Ivybridge and slight variations in separate stencil) > */ > - bool gen6_hiz_or_stencil = false; > > - if (brw->gen == 6 && mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > - const GLenum base_format = _mesa_get_format_base_format(mt->format); > - gen6_hiz_or_stencil = _mesa_is_depth_or_stencil_format(base_format); > - } > - > - if (gen6_hiz_or_stencil) { > - /* On gen6, we use ALL_SLICES_AT_EACH_LOD for stencil/hiz because the > + if (mt->array_layout == GEN6_HIZ_STENCIL) { > + /* On gen6, we use GEN6_HIZ_STENCIL for stencil/hiz because the > * hardware doesn't support multiple mip levels on stencil/hiz. > * > * PRM Vol 2, Part 1, 7.5.3 Hierarchical Depth Buffer: > @@ -600,15 +672,13 @@ intel_miptree_set_alignment(struct brw_context *brw, > /* Stencil uses W tiling, so we force W tiling alignment for the > * ALL_SLICES_AT_EACH_LOD miptree layout. > */ > - mt->halign = 64; > - mt->valign = 64; > + mt->halign = 4; > + mt->valign = 2; > assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0); > } else { > - /* Depth uses Y tiling, so we force need Y tiling alignment for the > - * ALL_SLICES_AT_EACH_LOD miptree layout. > - */ > - mt->halign = 128 / mt->cpp; > - mt->valign = 32; > + /* See intel_hiz_miptree_buf_create() */ There isnt't really anything about halign/valign in intel_hiz_miptree_buf_create()? Should we refer to brw_miptree_layout_gen6_hiz_stencil() instead? > + mt->halign = 1; > + mt->valign = 2; > } > } else if (mt->compressed) { > /* The hardware alignment requirements for compressed textures > diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c > b/src/mesa/drivers/dri/i965/gen6_depth_state.c > index ae4f681..20992d5 100644 > --- a/src/mesa/drivers/dri/i965/gen6_depth_state.c > +++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c > @@ -164,7 +164,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw, > struct intel_mipmap_tree *hiz_mt = depth_mt->hiz_buf->mt; > uint32_t offset = 0; > > - if (hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > + if (hiz_mt->array_layout == GEN6_HIZ_STENCIL) { > offset = intel_miptree_get_aligned_offset( > hiz_mt, > hiz_mt->level[lod].level_x, > @@ -190,7 +190,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw, > if (separate_stencil) { > uint32_t offset = 0; > > - if (stencil_mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > + if (stencil_mt->array_layout == GEN6_HIZ_STENCIL) { > assert(stencil_mt->format == MESA_FORMAT_S_UINT8); > > /* Note: we can't compute the stencil offset using > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index e334951..9f9e68a 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -452,7 +452,7 @@ intel_miptree_create_layout(struct brw_context *brw, > intel_miptree_wants_hiz_buffer(brw, mt)))) { > uint32_t stencil_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD; > if (brw->gen == 6) { > - stencil_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD | > + stencil_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL | > MIPTREE_LAYOUT_TILING_ANY; > } > > @@ -485,8 +485,8 @@ intel_miptree_create_layout(struct brw_context *brw, > } > } > > - if (layout_flags & MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD) > - mt->array_layout = ALL_SLICES_AT_EACH_LOD; > + if (layout_flags & MIPTREE_LAYOUT_GEN6_HIZ_STENCIL) > + mt->array_layout = GEN6_HIZ_STENCIL; > > /* > * Obey HALIGN_16 constraints for Gen8 and Gen9 buffers which are > @@ -1830,7 +1830,7 @@ intel_hiz_miptree_buf_create(struct brw_context *brw, > uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD; > > if (brw->gen == 6) > - layout_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD; > + layout_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL; > > if (!buf) > return NULL; > @@ -2770,7 +2770,7 @@ intel_update_r8stencil(struct brw_context *brw, > const uint32_t r8stencil_flags = > MIPTREE_LAYOUT_ACCELERATED_UPLOAD | MIPTREE_LAYOUT_TILING_Y | > MIPTREE_LAYOUT_DISABLE_AUX; > - assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD > */ > + assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_GEN6_HIZ_STENCIL */ > mt->r8stencil_mt = intel_miptree_create(brw, > src->target, > MESA_FORMAT_R_UINT8, > @@ -3670,6 +3670,7 @@ intel_miptree_get_isl_surf(struct brw_context *brw, > surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_FULL; > break; > case ALL_SLICES_AT_EACH_LOD: > + case GEN6_HIZ_STENCIL: > surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_COMPACT; > break; > default: > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > index aa52f48..e2ec26f 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > @@ -250,6 +250,41 @@ enum miptree_array_layout { > * +---+ > */ > ALL_SLICES_AT_EACH_LOD, > + > + /* On Sandy Bridge, HiZ and stencil buffers work the same as on Ivy Bridge > + * except that they don't technically support mipmapping. That does not, > + * however, stop us from doing it. As far as Sandy Bridge hardware is > + * concerned, HiZ and stencil always operates on a single miplevel 2D > + * (possibly array) image. The dimensions of that image are NOT minified. > + * > + * In order to implement HiZ and stencil on Sandy Bridge, we create one > + * full-sized 2D (possibly array) image for every LOD with every image > + * aligned to a page boundary. In order to save memory, we pretend that > + * the width of each miplevel is minified and we place LOD1 and above > below > + * LOD0 but horizontally adjacent to each other. When considered as > + * full-sized images, LOD1 and above technically overlap. However, since > + * we only write to part of that image, the hardware will never notice the > + * overlap. > + * > + * This layout looks something like this: > + * > + * +---------+ > + * | | > + * | | > + * +---------+ > + * | | > + * | | > + * +---------+ > + * > + * +----+ +-+ . > + * | | +-+ > + * +----+ > + * > + * +----+ +-+ . > + * | | +-+ > + * +----+ If we want to reflect all levels having the same heigth, maybe: +---------+ | | | | +---------+ | | | | +---------+ +----+ +-+ . | | | | | | | | +----+ +-+ +----+ +-+ . | | | | | | | | +----+ +-+ > + */ > + GEN6_HIZ_STENCIL, > }; > > enum intel_aux_disable { > @@ -637,7 +672,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw, > > enum { > MIPTREE_LAYOUT_ACCELERATED_UPLOAD = 1 << 0, > - MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD = 1 << 1, > + MIPTREE_LAYOUT_GEN6_HIZ_STENCIL = 1 << 1, > MIPTREE_LAYOUT_FOR_BO = 1 << 2, > MIPTREE_LAYOUT_DISABLE_AUX = 1 << 3, > MIPTREE_LAYOUT_FORCE_HALIGN16 = 1 << 4, > -- > 2.5.0.400.gff86faf > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev