On Mon, May 29, 2017 at 11:51 PM, Pohjolainen, Topi < topi.pohjolai...@gmail.com> wrote:
> On Mon, May 29, 2017 at 12:09:01PM -0700, Jason Ekstrand wrote: > > Sandy Bridge does not technically support mipmapped depth/stencil. In > > order to work around this, we allocate what are effectively completely > > separate images for each miplevel, ensure that they are page-aligned, > > and manually offset to them. Prior to layered rendering, this was a > > simple matter of setting a large enough halign/valign. > > > > With the advent of layered rendering, however, things got more > > complicated. Now, things weren't as simple as just handing a surface > > off to the hardware. Any miplevel of a normally mipmapped surface can > > be considered as just an array surface given the right qpitch. However, > > the hardware gives us no capability to specify qpitch so this won't > > work. Instead, the chosen solution was to use a new "all slices at each > > LOD" layout which laid things out as a mipmap of arrays rather than an > > array of mipmaps. This way you can easily offset to any of the > > miplevels and each is a valid array. > > > > Unfortunately, the "all slices at each lod" concept missed one > > fundamental thing about SNB HiZ and stencil hardware: It doesn't just > > always act as if you're always working with a non-mipmapped surface, it > > acts as if you're always working on a non-mipmapped surface of the same > > size as LOD0. In other words, even though it may only write the > > upper-left corner of each array slice, the qpitch for the array is for a > > surface the size of LOD0 of the depth surface. This mistake causes us > > to under-allocate HiZ and stencil in some cases and also to accidentally > > allow different miplevels to overlap. Sadly, piglit test coverage > > didn't quite catch this until I started making changes to the resolve > > code that caused additional HiZ resolves in certain tests. > > > > This commit switches Sandy Bridge HiZ and stencil over to a new scheme > > that lays out the non-zero miplevels horizontally below LOD0. This way > > they can all have the same qpitch without interfering with each other. > > Technically, the miplevels still overlap, but things are spaced out > > enough that each page is only in the "written area" of one LOD. > Hopefully, > > this will get rid of at least some of the random SNB hangs. > > > > Cc: "17.0 17.1" <mesa-sta...@lists.freedesktop.org> > > Cc: Topi Pohjolainen <topi.pohjolai...@intel.com> > > Cc: Nanley Chery <nanley.g.ch...@intel.com> > > Cc: Jordan Justen <jordan.l.jus...@intel.com> > > Cc: Kenneth Graunke <kenn...@whitecape.org> > > There is a few nits, but looks good. Thanks a lot for figuring out how it > really works: > > Reviewed-by: Topi Pohjolainen <topi.pohjolai...@intel.com> > > > > > --- > > The series I sent out on Friday suffered from a GPU hang or two on Sandy > > Bridge. It turns out that those hangs were caused by the hardware HiZ > > resolving part of my batch buffer due to this under-allocation. > > > > Topi, I'm sorry but this will likely make hash of your earlier patch > > series. Sadly, I don't think there's really anything else we can do. :-( > > Also, given how tricky this is to get right, I concede that we may want > to > > add an ISL_DIM_LAYOUT_GEN6_HIZ_STENCIL layout to ISL. We could still > do it > > with the "array of surfaces" approach but I think these sorts of > > calculations are best done in with the other surface calculation code. I > > can help draft it if you'd like. > > > > src/mesa/drivers/dri/i965/brw_blorp.c | 11 ++- > > src/mesa/drivers/dri/i965/brw_tex_layout.c | 100 > ++++++++++++++++++++++---- > > src/mesa/drivers/dri/i965/gen6_depth_state.c | 4 +- > > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 11 +-- > > src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 37 +++++++++- > > 5 files changed, 134 insertions(+), 29 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c > b/src/mesa/drivers/dri/i965/brw_blorp.c > > index 6e860f0..cb9933b 100644 > > --- a/src/mesa/drivers/dri/i965/brw_blorp.c > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c > > @@ -123,7 +123,7 @@ apply_gen6_stencil_hiz_offset(struct isl_surf *surf, > > uint32_t lod, > > uint32_t *offset) > > { > > - assert(mt->array_layout == ALL_SLICES_AT_EACH_LOD); > > + assert(mt->array_layout == GEN6_HIZ_STENCIL); > > > > if (mt->format == MESA_FORMAT_S_UINT8) { > > /* Note: we can't compute the stencil offset using > > @@ -183,12 +183,12 @@ blorp_surf_for_miptree(struct brw_context *brw, > > }; > > > > if (brw->gen == 6 && mt->format == MESA_FORMAT_S_UINT8 && > > - mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > > - /* Sandy bridge stencil and HiZ use this ALL_SLICES_AT_EACH_LOD > hack in > > + mt->array_layout == GEN6_HIZ_STENCIL) { > > + /* Sandy bridge stencil and HiZ use this GEN6_HIZ_STENCIL hack in > > * order to allow for layered rendering. The hack makes each LOD > of the > > * stencil or HiZ buffer a single tightly packed array surface at > some > > * offset into the surface. Since ISL doesn't know how to deal > with the > > - * crazy ALL_SLICES_AT_EACH_LOD layout and since we have to do a > manual > > + * crazy GEN6_HIZ_STENCIL layout and since we have to do a manual > > * offset of it anyway, we might as well do the offset here and > keep the > > * hacks inside the i965 driver. > > * > > @@ -241,8 +241,7 @@ blorp_surf_for_miptree(struct brw_context *brw, > > > > struct intel_mipmap_tree *hiz_mt = mt->hiz_buf->mt; > > if (hiz_mt) { > > - assert(brw->gen == 6 && > > - hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD); > > + assert(brw->gen == 6 && hiz_mt->array_layout == > GEN6_HIZ_STENCIL); > > > > /* gen6 requires the HiZ buffer to be manually offset to the > > * right location. We could fixup the surf but it doesn't > > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c > b/src/mesa/drivers/dri/i965/brw_tex_layout.c > > index bfa8afa..30e6233 100644 > > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c > > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c > > @@ -216,6 +216,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) > > mt->total_height = MAX2(mt->total_height, y + img_height); > > > > /* Layout_below: step right after second mipmap. > > + * > > + * For Sandy Bridge HiZ and stencil, we always step down. > > */ > > if (level == mt->first_level + 1) { > > x += ALIGN_NPOT(width, mt->halign) / bw; > > @@ -231,6 +233,67 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) > > } > > } > > > > +static void > > +brw_miptree_layout_gen6_hiz_stencil(struct intel_mipmap_tree *mt) > > +{ > > + unsigned x = 0; > > + unsigned y = 0; > > + unsigned width = mt->physical_width0; > > + unsigned height = mt->physical_height0; > > + /* Number of layers of array texture. */ > > + unsigned depth = mt->physical_depth0; > > We could mark height and depth const, only width is variant. I was also > thinking if we want to comment anything about needing to use also level > zero > depth for all subsequent levels in case of 3D textures (instead of > minifying > depth). > > > + unsigned tile_width, tile_height, bw, bh; > > + > > + if (mt->format == MESA_FORMAT_S_UINT8) { > > + bw = bh = 1; > > + /* W-tiled */ > > + tile_width = 64; > > + tile_height = 64; > > + } else { > > + assert(_mesa_get_format_base_format(mt->format) == > GL_DEPTH_COMPONENT || > > + _mesa_get_format_base_format(mt->format) == > GL_DEPTH_STENCIL); > > + /* Each 128-bit HiZ block corresponds to a region of of 8x4 depth > > + * samples. Each cache line in the Y-Tiled HiZ image contains > 2x2 HiZ > > + * blocks. Therefore, each Y-tiled cache line corresponds to an > 16x8 > > + * region in the depth surface. Since we're representing it as > > + * RGBA_FLOAT32, the miptree calculations will think that each > cache > > + * line is 1x4 pixels. Therefore, we need a scale-down factor of > 16x2 > > + * and a vertical alignment of 2. > > + */ > > + mt->cpp = 16; > > + bw = 16; > > + bh = 2; > > + /* Y-tiled */ > > + tile_width = 128 / mt->cpp; > > + tile_height = 32; > > + } > > + > > + mt->total_width = 0; > > + mt->total_height = 0; > > We could pull height here as it is invariant in the loop: > > const unsigned img_height = > ALIGN(DIV_ROUND_UP(height, bh), mt->valign) * depth; > Sure. I considered doing that. In the end I thought it was nice to keep it next to img_width. The compiler will pull it out of the loop for us anyway. If you want it moved, I don't mind. > > + > > + for (unsigned level = mt->first_level; level <= mt->last_level; > level++) { > > + intel_miptree_set_level_info(mt, level, x, y, depth); > > + > > + const unsigned img_width = ALIGN(DIV_ROUND_UP(width, bw), > mt->halign); > > + > > + mt->total_width = MAX2(mt->total_width, x + img_width); > > + mt->total_height = MAX2(mt->total_height, y + img_height); > > + > > + if (level == mt->first_level) { > > + y += ALIGN(img_height, tile_height); > > + } else { > > + x += ALIGN(img_width, tile_width); > > + } > > + > > + /* We only minify the width. We want qpitch to match for all > miplevels > > + * because the hardware doesn't know we aren't on LOD0. > > + */ > > + width = minify(width, 1); > > + } > > +} > > + > > unsigned > > brw_miptree_get_horizontal_slice_pitch(const struct brw_context *brw, > > const struct intel_mipmap_tree > *mt, > > @@ -249,6 +312,8 @@ brw_miptree_get_vertical_slice_pitch(const struct > brw_context *brw, > > const struct intel_mipmap_tree *mt, > > unsigned level) > > { > > + assert(mt->array_layout != GEN6_HIZ_STENCIL || brw->gen == 6); > > + > > if (brw->gen >= 9) { > > /* ALL_SLICES_AT_EACH_LOD isn't supported on Gen8+ but this code > will > > * effectively end up with a packed qpitch anyway whenever > > @@ -281,6 +346,15 @@ brw_miptree_get_vertical_slice_pitch(const struct > brw_context *brw, > > mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > > return ALIGN_NPOT(minify(mt->physical_height0, level), > mt->valign); > > > > + } else if (mt->array_layout == GEN6_HIZ_STENCIL) { > > + /* For HiZ and stencil on Sandy Bridge, we don't minify the > height. */ > > + if (mt->format == MESA_FORMAT_S_UINT8) { > > + return ALIGN(mt->physical_height0, mt->valign); > > + } else { > > + /* HiZ has a vertical scale factor of 2. */ > > + return ALIGN(DIV_ROUND_UP(mt->physical_height0, 2), > mt->halign); > > + } > > + > > } else { > > const unsigned h0 = ALIGN_NPOT(mt->physical_height0, mt->valign); > > const unsigned h1 = ALIGN_NPOT(minify(mt->physical_height0, 1), > mt->valign); > > @@ -333,6 +407,8 @@ brw_miptree_layout_texture_array(struct brw_context > *brw, > > > > if (layout_1d) > > gen9_miptree_layout_1d(mt); > > + else if (mt->array_layout == GEN6_HIZ_STENCIL) > > + brw_miptree_layout_gen6_hiz_stencil(mt); > > else > > brw_miptree_layout_2d(mt); > > > > @@ -556,6 +632,8 @@ intel_miptree_set_total_width_height(struct > brw_context *brw, > > case INTEL_MSAA_LAYOUT_IMS: > > if (gen9_use_linear_1d_layout(brw, mt)) > > gen9_miptree_layout_1d(mt); > > + else if (mt->array_layout == GEN6_HIZ_STENCIL) > > + brw_miptree_layout_gen6_hiz_stencil(mt); > > else > > brw_miptree_layout_2d(mt); > > break; > > @@ -579,15 +657,9 @@ intel_miptree_set_alignment(struct brw_context > *brw, > > * - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section > 7.18.3.4 > > * - BSpec (for Ivybridge and slight variations in separate stencil) > > */ > > - bool gen6_hiz_or_stencil = false; > > > > - if (brw->gen == 6 && mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > > - const GLenum base_format = _mesa_get_format_base_format( > mt->format); > > - gen6_hiz_or_stencil = _mesa_is_depth_or_stencil_ > format(base_format); > > - } > > - > > - if (gen6_hiz_or_stencil) { > > - /* On gen6, we use ALL_SLICES_AT_EACH_LOD for stencil/hiz because > the > > + if (mt->array_layout == GEN6_HIZ_STENCIL) { > > + /* On gen6, we use GEN6_HIZ_STENCIL for stencil/hiz because the > > * hardware doesn't support multiple mip levels on stencil/hiz. > > * > > * PRM Vol 2, Part 1, 7.5.3 Hierarchical Depth Buffer: > > @@ -600,15 +672,13 @@ intel_miptree_set_alignment(struct brw_context > *brw, > > /* Stencil uses W tiling, so we force W tiling alignment for > the > > * ALL_SLICES_AT_EACH_LOD miptree layout. > > */ > > - mt->halign = 64; > > - mt->valign = 64; > > + mt->halign = 4; > > + mt->valign = 2; > > assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0); > > } else { > > - /* Depth uses Y tiling, so we force need Y tiling alignment > for the > > - * ALL_SLICES_AT_EACH_LOD miptree layout. > > - */ > > - mt->halign = 128 / mt->cpp; > > - mt->valign = 32; > > + /* See intel_hiz_miptree_buf_create() */ > > There isnt't really anything about halign/valign in > intel_hiz_miptree_buf_create()? Should we refer to > brw_miptree_layout_gen6_hiz_stencil() instead? > Yes, that is a stale comment. I'll update it. > > + mt->halign = 1; > > + mt->valign = 2; > > } > > } else if (mt->compressed) { > > /* The hardware alignment requirements for compressed textures > > diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c > b/src/mesa/drivers/dri/i965/gen6_depth_state.c > > index ae4f681..20992d5 100644 > > --- a/src/mesa/drivers/dri/i965/gen6_depth_state.c > > +++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c > > @@ -164,7 +164,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw, > > struct intel_mipmap_tree *hiz_mt = depth_mt->hiz_buf->mt; > > uint32_t offset = 0; > > > > - if (hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > > + if (hiz_mt->array_layout == GEN6_HIZ_STENCIL) { > > offset = intel_miptree_get_aligned_offset( > > hiz_mt, > > hiz_mt->level[lod].level_x, > > @@ -190,7 +190,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw, > > if (separate_stencil) { > > uint32_t offset = 0; > > > > - if (stencil_mt->array_layout == ALL_SLICES_AT_EACH_LOD) { > > + if (stencil_mt->array_layout == GEN6_HIZ_STENCIL) { > > assert(stencil_mt->format == MESA_FORMAT_S_UINT8); > > > > /* Note: we can't compute the stencil offset using > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > index e334951..9f9e68a 100644 > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > @@ -452,7 +452,7 @@ intel_miptree_create_layout(struct brw_context *brw, > > intel_miptree_wants_hiz_buffer(brw, mt)))) { > > uint32_t stencil_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD; > > if (brw->gen == 6) { > > - stencil_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD | > > + stencil_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL | > > MIPTREE_LAYOUT_TILING_ANY; > > } > > > > @@ -485,8 +485,8 @@ intel_miptree_create_layout(struct brw_context *brw, > > } > > } > > > > - if (layout_flags & MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD) > > - mt->array_layout = ALL_SLICES_AT_EACH_LOD; > > + if (layout_flags & MIPTREE_LAYOUT_GEN6_HIZ_STENCIL) > > + mt->array_layout = GEN6_HIZ_STENCIL; > > > > /* > > * Obey HALIGN_16 constraints for Gen8 and Gen9 buffers which are > > @@ -1830,7 +1830,7 @@ intel_hiz_miptree_buf_create(struct brw_context > *brw, > > uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD; > > > > if (brw->gen == 6) > > - layout_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD; > > + layout_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL; > > > > if (!buf) > > return NULL; > > @@ -2770,7 +2770,7 @@ intel_update_r8stencil(struct brw_context *brw, > > const uint32_t r8stencil_flags = > > MIPTREE_LAYOUT_ACCELERATED_UPLOAD | MIPTREE_LAYOUT_TILING_Y | > > MIPTREE_LAYOUT_DISABLE_AUX; > > - assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD > */ > > + assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_GEN6_HIZ_STENCIL > */ > > mt->r8stencil_mt = intel_miptree_create(brw, > > src->target, > > MESA_FORMAT_R_UINT8, > > @@ -3670,6 +3670,7 @@ intel_miptree_get_isl_surf(struct brw_context > *brw, > > surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_FULL; > > break; > > case ALL_SLICES_AT_EACH_LOD: > > + case GEN6_HIZ_STENCIL: > > surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_COMPACT; > > break; > > default: > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > index aa52f48..e2ec26f 100644 > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h > > @@ -250,6 +250,41 @@ enum miptree_array_layout { > > * +---+ > > */ > > ALL_SLICES_AT_EACH_LOD, > > + > > + /* On Sandy Bridge, HiZ and stencil buffers work the same as on Ivy > Bridge > > + * except that they don't technically support mipmapping. That does > not, > > + * however, stop us from doing it. As far as Sandy Bridge hardware > is > > + * concerned, HiZ and stencil always operates on a single miplevel 2D > > + * (possibly array) image. The dimensions of that image are NOT > minified. > > + * > > + * In order to implement HiZ and stencil on Sandy Bridge, we create > one > > + * full-sized 2D (possibly array) image for every LOD with every > image > > + * aligned to a page boundary. In order to save memory, we pretend > that > > + * the width of each miplevel is minified and we place LOD1 and > above below > > + * LOD0 but horizontally adjacent to each other. When considered as > > + * full-sized images, LOD1 and above technically overlap. However, > since > > + * we only write to part of that image, the hardware will never > notice the > > + * overlap. > > + * > > + * This layout looks something like this: > > + * > > + * +---------+ > > + * | | > > + * | | > > + * +---------+ > > + * | | > > + * | | > > + * +---------+ > > + * > > + * +----+ +-+ . > > + * | | +-+ > > + * +----+ > > + * > > + * +----+ +-+ . > > + * | | +-+ > > + * +----+ > > If we want to reflect all levels having the same heigth, maybe: > What matters isn't that they all have the same height but that they all have the same qpitch. I don't know if there was a better way to denote that. > +---------+ > | | > | | > +---------+ > | | > | | > +---------+ > > +----+ +-+ . > | | | | > | | | | > +----+ +-+ > > +----+ +-+ . > | | | | > | | | | > +----+ +-+ > > > + */ > > + GEN6_HIZ_STENCIL, > > }; > > > > enum intel_aux_disable { > > @@ -637,7 +672,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context > *brw, > > > > enum { > > MIPTREE_LAYOUT_ACCELERATED_UPLOAD = 1 << 0, > > - MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD = 1 << 1, > > + MIPTREE_LAYOUT_GEN6_HIZ_STENCIL = 1 << 1, > > MIPTREE_LAYOUT_FOR_BO = 1 << 2, > > MIPTREE_LAYOUT_DISABLE_AUX = 1 << 3, > > MIPTREE_LAYOUT_FORCE_HALIGN16 = 1 << 4, > > -- > > 2.5.0.400.gff86faf > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev