On 10/08/2013 04:27 PM, Ben Widawsky wrote:
After the last patch, we can replace the region allocated in the miptree
creation with a more straightforward (and hopefully smaller resulting)
buffer based on the bspec's allocation formula.
Since I am relatively new to this part of the bspec, I would very much
appreciate scrutiny during review of this. There were some ambiguities
to me which are likely obvious to others.
To prove the reduced [GPU] memory usage I created a simple script which
polls the memory usage of the process through debugfs ever .1 seconds.
The following results show the memory usage difference over 5 runs of
xonotic-glx with ultra settings.
The data suggests a 10MB savings on average. I've not measured the
savings on the CPU side, but I imagine some amount of savings would be
present there as well.
x head/xonotic
+ mine/xonotic
N Min Max Median Avg Stddev
x 5 25.683336 25.898164 25.872499 25.842426 0.089829019
+ 5 25.841368 25.934931 25.869051 25.877494 0.039885576
x head/memusage
+ mine/memusage
N Min Max Median Avg Stddev
x 18036 89432064 8.6380954e+08 7.9515648e+08 7.930405e+08 42774265
+ 18030 86548480 8.6262989e+08 7.8178714e+08 7.7978462e+08 42099587
NOTE: There were a couple of places in the arithmetic where I could have
taken some shortcuts. In order to make the code match with the spec as
much as possible, I've decided not to do this. One shortcut I did make
was the tiling type. Digging through the code it looks like you always
want Y-tiled, except when it won't fit, in which case you want X-tiled.
I wasn't a fan of the existing helper function that's there since it has
a few irrelevant parameters for this operation. I suspect people
reviewing this might ask me to change this, which is fine; I just wanted
to explain the motivation.
v2: copy-paste fix where I used I915_TILING_Y where I meant _X. (Topi)
v3:
Updated to directly use the bo/stride instead of intel_region. (Ken, Chad)
Fix the reference count leak on the hiz buffer (Chad)
Don't allow fallback to old mt allocation. It should never happen. (Ben)
Break out hz_depth/width calculation to separete functions. (Ben)
Use cpp = 1, since the calculation takes cpp into account (Ben)
v4: Don't make the physical size calculation. It is unnecessary and just
confusion on my part (Chad)
BO size before: 10485760
BO size after: 1310720
This savings of 8.75MB provides an 8x savings over the original size.
v5:
Commit message cleanups (Chad)
Align Z_Height to 8. (Chad)
Formatting cleanups (Ian, Chad)
Change comments to quote PRM more explicitly (Chad)
Use mt->logical_depth0 instead of mt->level[i].depth (Chad)
Kill TODO about compressed depth textures (Chad)
Use target_to_target (Chad)
Add missing GL_TEXTURE_1D, and GL_TEXTURE_CUBE_MAP cases
Remove X tiled fallback (Chad)
CC: Chad Versace <chad.vers...@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67564
Signed-off-by: Ben Widawsky <b...@bwidawsk.net>
---
src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 147 ++++++++++++++++++++++----
src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 2 +-
2 files changed, 129 insertions(+), 20 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e1da9de..3c94d89 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -793,8 +793,12 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
intel_region_release(&((*mt)->region));
intel_miptree_release(&(*mt)->stencil_mt);
- intel_miptree_release(&(*mt)->hiz_buffer.mt);
- (*mt)->hiz_buffer.bo = NULL;
+ if (&(*mt)->hiz_buffer.mt)
+ intel_miptree_release(&(*mt)->hiz_buffer.mt);
+ else {
+ drm_intel_bo_unreference((*mt)->hiz_buffer.bo);
+ (*mt)->hiz_buffer.bo = NULL;
+ }
intel_miptree_release(&(*mt)->mcs_mt);
intel_miptree_release(&(*mt)->singlesample_mt);
intel_resolve_map_clear(&(*mt)->hiz_map);
@@ -1271,30 +1275,135 @@ intel_miptree_slice_enable_hiz(struct brw_context *brw,
return true;
}
+static unsigned int
+calculate_z_height(const struct intel_mipmap_tree *mt, const int level)
+{
+ unsigned int height = ALIGN(minify(mt->logical_height0, level), 8);
+
+ /* From the Ivybridge PRM, Section 11.5.3: "Hierarchical Depth Buffer":
+ * The Surface Type, Height, Width, Depth, Minimum Array Element, Render
+ * Target View Extent, and Depth Coordinate Offset X/Y of the hierarchical
+ * depth buffer are inherited from the depth buffer The height and width
of
+ * the hierarchical depth buffer that must be allocated are computed by
the
+ * following formulas, where HZ is the hierarchical depth buffer and Z is
+ * the depth buffer The Z_Height, Z_Width, and Z_Depth values given in
these
+ * formulas are those present in 3DSTATE_DEPTH_BUFFER incremented by one.
:
+ * The value of Z_Height and Z_Width must each be multiplied by 2 before
+ * being applied to the table below if Number of Multisamples is set to
+ * NUMSAMPLES_4. The value of Z_Height must be multiplied by 2 and
Z_Width
+ * must be multiplied by 4 before being applied to the table below if
+ * Number of Multisamples is set to NUMSAMPLES_8..
+ */
+ if (mt->num_samples == 4 || mt->num_samples == 8)
+ height *= 2;
+
+ return height;
+}
+
+static unsigned int
+calculate_z_width(const struct intel_mipmap_tree *mt, const int level)
+{
+ unsigned int width = minify(mt->logical_width0, level);
+
+ /* See PRM notice in calculate_z_height() */
+ if (mt->num_samples == 4)
+ width *= 2;
+ else if (mt->num_samples == 8)
+ width *= 4;
+
+ return width;
+}
+
+static void
+gen7_create_hiz_buffer(struct brw_context *brw,
+ struct intel_mipmap_tree *mt)
+{
+ uint32_t q_pitch, w0, h0, h1, h_level, z_depth; /* Inputs to formula */
+ size_t hz_width; /* Number of bytes */
+ unsigned int hz_height; /* Number of rows */
+ unsigned int tiling = I915_TILING_Y;
+
+ z_depth = mt->logical_depth0;
+ w0 = calculate_z_width(mt, 0);
+ h0 = calculate_z_height(mt, 0);
+ h1 = calculate_z_height(mt, 1);
Not quite. According to the docs,
h_level = ALIGN(minify(adjust_for_msaa(Z_Height), level), j=8)
but this patch instead calculates
h_level = adjust_for_msaa(ALIGN(minify(Z_Height, level), j=8))
+
+ hz_width = ALIGN(w0, 16);
+
+ /* ... Where, Qpitch is computed using vertical alignment j=8. Please refer
+ * to the GPU overview volume for Qpitch definition. NB: The docs have
+ * multiple formulas for q_pitch on IVB, but the HSW docs only have the
+ * below definition.
+ */
+ q_pitch = h0 + h1 + 11 * 8;
+
+ /* The following is directly derived from the "Hierarchical Depth Buffer"
+ * section of the bspec.
+ */
+ switch (target_to_target(mt->target)) {
+ case GL_TEXTURE_1D:
+ case GL_TEXTURE_1D_ARRAY:
+ case GL_TEXTURE_2D_ARRAY:
+ case GL_TEXTURE_2D:
+ hz_height = ALIGN((q_pitch * z_depth / 2), 8);
+ break;
+ case GL_TEXTURE_CUBE_MAP:
+ case GL_TEXTURE_CUBE_MAP_ARRAY:
+ hz_height = ALIGN((q_pitch * z_depth * 6 / 2), 8);
+ break;
+ case GL_TEXTURE_3D:
+ hz_height = 0;
+ for (int i = 0; i < mt->last_level; i++) {
+ int tmp;
+ h_level = calculate_z_height(mt, i);
+ tmp = floorf(z_depth / pow(2, i));
+ if (!tmp)
+ tmp++;
+ hz_height += h_level * tmp;
The way you do max() is very clever, and hard to understand. Let's just use
MAX2().
And, there's no need for circuitous floating point calculations here, because
all inputs and outputs are ints.
This accomplishes the same thing, is easier to read, avoids floats, and looks
more similar to the bspec:
hz_height += h_level * MAX2(1, z_depth >> i)
+ }
+ hz_height /= 2;
+ break;
+ default:
+ perf_debug("Unknown depthbuffer texture type (%d).", mt->target);
I thought I commented on the perf_debug() before, but maybe I forgot to send
that draft.
perf_debug() should be used only for known issues that affect performance. The
issue
here is much more dire: if we hit the default case, then we have hit a real
bug, because
we forgot to write code to handle all the targets.
Instead, use fprintf(stderr, ...). Also, always print GLenum values as "0x%x".
+ return;
+ }
+
+ mt->hiz_buffer.bo = drm_intel_bo_alloc_tiled(brw->intelScreen->bufmgr,
+ "hiz_buffer",
+ hz_width, hz_height, 1,
+ &tiling,
&mt->hiz_buffer.stride,
+ BO_ALLOC_FOR_RENDER);
+}
bool
intel_miptree_alloc_hiz(struct brw_context *brw,
struct intel_mipmap_tree *mt)
{
- assert(mt->hiz_buffer.mt == NULL);
- mt->hiz_buffer.mt = intel_miptree_create(brw,
- mt->target,
- mt->format,
- mt->first_level,
- mt->last_level,
- mt->logical_width0,
- mt->logical_height0,
- mt->logical_depth0,
- true,
- mt->num_samples,
- INTEL_MIPTREE_TILING_ANY);
-
- if (!mt->hiz_buffer.mt)
- return false;
+ if (brw->gen >= 7) {
+ assert(mt->hiz_buffer.bo == NULL);
+ gen7_create_hiz_buffer(brw, mt);
+ } else {
+ assert(mt->hiz_buffer.mt == NULL);
+ mt->hiz_buffer.mt = intel_miptree_create(brw,
+ mt->target,
+ mt->format,
+ mt->first_level,
+ mt->last_level,
+ mt->logical_width0,
+ mt->logical_height0,
+ mt->logical_depth0,
+ true,
+ mt->num_samples,
+ INTEL_MIPTREE_TILING_ANY);
+ if (mt->hiz_buffer.mt) {
+ mt->hiz_buffer.bo = mt->hiz_buffer.mt->region->bo;
There's tab here.
+ mt->hiz_buffer.stride = mt->hiz_buffer.mt->region->pitch;
+ }
+ }
- mt->hiz_buffer.bo = mt->hiz_buffer.mt->region->bo;
- mt->hiz_buffer.stride = mt->hiz_buffer.mt->region->pitch;
+ if (!mt->hiz_buffer.bo)
+ return false;
Other than that, the patch looks good to me.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev