Much better.

Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com>

On Fri, Jul 24, 2015 at 9:33 AM, Francisco Jerez <curroje...@riseup.net> wrote:
> Define a function to calculate the memory address of the image
> location given by a vector of coordinates. This is required in cases
> where we need to fall back to untyped surface accesses, which take a raw
> memory offset and know nothing about surface coordinates, type
> conversion or memory tiling and swizzling. They are still useful
> because typed surface reads don't support any 64 or 128-bit formats on
> IVB, and they don't support any 128-bit formats on HSW and BDW.
>
> The tiling algorithm is implemented based on a number of parameters
> which are passed in as uniforms and determine whether the surface
> layout is X-tiled, Y-tiled or untiled. This allows binding surfaces
> of different tiling layouts to the pipeline without recompiling the
> program.
>
> v2: Drop VEC4 support.
> v3: Rebase.
> v4: Add plenty of comments (Jason).
> ---
> Here's an updated version of the same patch with the comments you
> suggested and some more of my own. Let me know if I missed anything.
>
>  .../drivers/dri/i965/brw_fs_surface_builder.cpp | 169 +++++++++++++++++++++
>  1 file changed, 169 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
> index 5ee04de..a0bedf2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
> @@ -215,4 +215,173 @@ namespace {
>           return BRW_PREDICATE_NORMAL;
>        }
>     }
> +
> +   namespace image_coordinates {
> +      /**
> +       * Calculate the offset in memory of the texel given by \p coord.
> +       *
> +       * This is meant to be used with untyped surface messages to access a
> +       * tiled surface, which involves taking into account the tiling and
> +       * swizzling modes of the surface manually so it will hopefully not
> +       * happen very often.
> +       *
> +       * The tiling algorithm implemented here matches either the X or Y
> +       * tiling layouts supported by the hardware depending on the tiling
> +       * coefficients passed to the program as uniforms. See Volume 1 Part 2
> +       * Section 4.5 "Address Tiling Function" of the IVB PRM for an in-depth
> +       * explanation of the hardware tiling format.
> +       */
> +      fs_reg
> +      emit_address_calculation(const fs_builder &bld, const fs_reg &image,
> +                               const fs_reg &coord, unsigned dims)
> +      {
> +         const brw_device_info *devinfo = bld.shader->devinfo;
> +         const fs_reg off = offset(image, bld, BRW_IMAGE_PARAM_OFFSET_OFFSET);
> +         const fs_reg stride = offset(image, bld, BRW_IMAGE_PARAM_STRIDE_OFFSET);
> +         const fs_reg tile = offset(image, bld, BRW_IMAGE_PARAM_TILING_OFFSET);
> +         const fs_reg swz = offset(image, bld, BRW_IMAGE_PARAM_SWIZZLING_OFFSET);
> +         const fs_reg addr = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg minor = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg major = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +         /* Shift the coordinates by the fixed surface offset. It may be
> +          * non-zero if the image is a single slice of a higher-dimensional
> +          * surface, or if a non-zero mipmap level of the surface is bound to
> +          * the pipeline. The offset needs to be applied here rather than at
> +          * surface state set-up time because the desired slice-level may
> +          * start mid-tile, so simply shifting the surface base address
> +          * wouldn't give a well-formed tiled surface in the general case.
> +          */
> +         for (unsigned c = 0; c < 2; ++c)
> +            bld.ADD(offset(addr, bld, c), offset(off, bld, c),
> +                    (c < dims ?
> +                     offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, c) :
> +                     fs_reg(0)));
> +
> +         /* The layout of 3-D textures in memory is sort-of like a tiling
> +          * format. At each miplevel, the slices are arranged in rows of
> +          * 2^level slices per row. The slice row is stored in tmp.y and
> +          * the slice within the row is stored in tmp.x.
> +          *
> +          * The layout of 2-D array textures and cubemaps is much simpler:
> +          * Depending on whether the ARYSPC_LOD0 layout is in use it will be
> +          * stored in memory as an array of slices, each one being a 2-D
> +          * arrangement of miplevels, or as a 2D arrangement of miplevels,
> +          * each one being an array of slices. In either case the separation
> +          * between slices of the same LOD is equal to the qpitch value
> +          * provided as stride.w.
> +          *
> +          * This code can be made to handle either 2D arrays or 3D textures
> +          * by passing in the miplevel as tile.z for 3-D textures and 0 in
> +          * tile.z for 2-D array textures.
> +          *
> +          * See Volume 1 Part 1 of the Gen7 PRM, sections 6.18.4.7 "Surface
> +          * Arrays" and 6.18.6 "3D Surfaces" for a more extensive discussion
> +          * of the hardware 3D texture and 2D array layouts.
> +          */
> +         if (dims > 2) {
> +            /* Decompose z into a major (tmp.y) and a minor (tmp.x)
> +             * index.
> +             */
> +            bld.BFE(offset(tmp, bld, 0), offset(tile, bld, 2), fs_reg(0),
> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2));
> +            bld.SHR(offset(tmp, bld, 1),
> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2),
> +                    offset(tile, bld, 2));
> +
> +            /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
> +             * slice offset.
> +             */
> +            for (unsigned c = 0; c < 2; ++c) {
> +               bld.MUL(offset(tmp, bld, c),
> +                       offset(stride, bld, 2 + c), offset(tmp, bld, c));
> +               bld.ADD(offset(addr, bld, c),
> +                       offset(addr, bld, c), offset(tmp, bld, c));
> +            }
> +         }
> +
> +         if (dims > 1) {
> +            /* Calculate the major/minor x and y indices. In order to
> +             * accommodate both X and Y tiling, the Y-major tiling format is
> +             * treated as being a bunch of narrow X-tiles placed next to each
> +             * other. This means that the tile width for Y-tiling is actually
> +             * the width of one sub-column of the Y-major tile where each 4K
> +             * tile has 8 512B sub-columns.
> +             *
> +             * The major Y value is the row of tiles in which the pixel lives.
> +             * The major X value is the tile sub-column in which the pixel
> +             * lives; for X tiling, this is the same as the tile column, for Y
> +             * tiling, each tile has 8 sub-columns. The minor X and Y indices
> +             * are the position within the sub-column.
> +             */
> +            for (unsigned c = 0; c < 2; ++c) {
> +               /* Calculate the minor x and y indices. */
> +               bld.BFE(offset(minor, bld, c), offset(tile, bld, c),
> +                       fs_reg(0), offset(addr, bld, c));
> +
> +               /* Calculate the major x and y indices. */
> +               bld.SHR(offset(major, bld, c),
> +                       offset(addr, bld, c), offset(tile, bld, c));
> +            }
> +
> +            /* Calculate the texel index from the start of the tile row and
> +             * the vertical coordinate of the row.
> +             * Equivalent to:
> +             *   tmp.x = (major.x << tile.y << tile.x) +
> +             *           (minor.y << tile.x) + minor.x
> +             *   tmp.y = major.y << tile.y
> +             */
> +            bld.SHL(tmp, major, offset(tile, bld, 1));
> +            bld.ADD(tmp, tmp, offset(minor, bld, 1));
> +            bld.SHL(tmp, tmp, offset(tile, bld, 0));
> +            bld.ADD(tmp, tmp, minor);
> +            bld.SHL(offset(tmp, bld, 1),
> +                    offset(major, bld, 1), offset(tile, bld, 1));
> +
> +            /* Add it to the start of the tile row. */
> +            bld.MUL(offset(tmp, bld, 1),
> +                    offset(tmp, bld, 1), offset(stride, bld, 1));
> +            bld.ADD(tmp, tmp, offset(tmp, bld, 1));
> +
> +            /* Multiply by the Bpp value. */
> +            bld.MUL(dst, tmp, stride);
> +
> +            if (devinfo->gen < 8 && !devinfo->is_baytrail) {
> +               /* Take into account the two dynamically specified shifts.
> +                * Both are used to implement swizzling of X-tiled
> +                * surfaces. For Y-tiled surfaces only one bit needs to be
> +                * XOR-ed with bit 6 of the memory address, so a swz value of
> +                * 0xff (actually interpreted as 31 by the hardware) will be
> +                * provided to cause the relevant bit of tmp.y to be zero and
> +                * turn the first XOR into the identity. For linear surfaces
> +                * or platforms lacking address swizzling both shifts will be
> +                * 0xff causing the relevant bits of both tmp.x and .y to be
> +                * zero, which effectively disables swizzling.
> +                */
> +               for (unsigned c = 0; c < 2; ++c)
> +                  bld.SHR(offset(tmp, bld, c), dst, offset(swz, bld, c));
> +
> +               /* XOR tmp.x and tmp.y with bit 6 of the memory address. */
> +               bld.XOR(tmp, tmp, offset(tmp, bld, 1));
> +               bld.AND(tmp, tmp, fs_reg(1 << 6));
> +               bld.XOR(dst, dst, tmp);
> +            }
> +
> +         } else {
> +            /* Multiply by the Bpp/stride value. Note that the addr.y may be
> +             * non-zero even if the image is one-dimensional because a
> +             * vertical offset may have been applied above to select a
> +             * non-zero slice or level of a higher-dimensional texture.
> +             */
> +            bld.MUL(offset(addr, bld, 1),
> +                    offset(addr, bld, 1), offset(stride, bld, 1));
> +            bld.ADD(addr, addr, offset(addr, bld, 1));
> +            bld.MUL(dst, addr, stride);
> +         }
> +
> +         return dst;
> +      }
> +   }
>  }
> --
> 2.4.3
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
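
For anyone who wants to sanity-check the arithmetic outside the compiler, the calculation in the quoted patch can be mirrored on the CPU roughly as below. This is only a sketch and not part of the patch: the ImageParam struct, low_bits() and texel_offset() are hypothetical names, and the field meanings (stride[0] as bytes per texel, stride[1] as the row pitch in texels, tiling[] as log2 tile dimensions) are assumptions inferred from the comments above. The 0xff shift count is masked to 31 to imitate what the patch says the hardware does.

```cpp
#include <cstdint>

// Hypothetical CPU-side mirror of the uniforms the patch reads. The field
// meanings are assumptions inferred from the comments in the patch.
struct ImageParam {
   uint32_t offset[2];    // x/y offset of the bound slice or level, in texels
   uint32_t stride[4];    // [0] bytes per texel, [1] row pitch in texels,
                          // [2]/[3] horizontal/vertical slice spacing
   uint32_t tiling[3];    // log2 sub-tile width, log2 tile height, z split
   uint32_t swizzling[2]; // shift counts; 0xff disables a swizzle term
};

// Extract the low `bits` bits of `v`, like BFE with a zero offset.
// Assumes bits < 32.
static uint32_t low_bits(uint32_t v, uint32_t bits)
{
   return v & ((1u << bits) - 1u);
}

uint32_t texel_offset(const ImageParam &p, const uint32_t coord[3],
                      unsigned dims, bool swizzle)
{
   // Shift the coordinates by the fixed surface offset.
   uint32_t addr[2];
   for (unsigned c = 0; c < 2; ++c)
      addr[c] = p.offset[c] + (c < dims ? coord[c] : 0u);

   if (dims > 2) {
      // Decompose z into the slice within a row (x) and the slice row (y).
      const uint32_t slice[2] = { low_bits(coord[2], p.tiling[2]),
                                  coord[2] >> p.tiling[2] };
      for (unsigned c = 0; c < 2; ++c)
         addr[c] += p.stride[2 + c] * slice[c];
   }

   // One-dimensional case: plain row-major addressing.
   if (dims <= 1)
      return (addr[0] + addr[1] * p.stride[1]) * p.stride[0];

   // Split each coordinate into a position inside the (sub-)tile and a
   // tile index.
   uint32_t minor[2], major[2];
   for (unsigned c = 0; c < 2; ++c) {
      minor[c] = low_bits(addr[c], p.tiling[c]);
      major[c] = addr[c] >> p.tiling[c];
   }

   // Texel index within the tile row, plus the start of the tile row.
   uint32_t idx = (((major[0] << p.tiling[1]) + minor[1]) << p.tiling[0])
                  + minor[0];
   idx += (major[1] << p.tiling[1]) * p.stride[1];
   uint32_t dst = idx * p.stride[0];

   if (swizzle) {
      // XOR the two selected address bits into bit 6.
      const uint32_t a = dst >> (p.swizzling[0] & 31u);
      const uint32_t b = dst >> (p.swizzling[1] & 31u);
      dst ^= (a ^ b) & (1u << 6);
   }
   return dst;
}
```

As a worked example under those assumptions: for an X-tiled surface with 4 bytes per texel, a pitch of 256 texels, tiling = {7, 3, 0} and swizzling = {3, 4}, the texel at (130, 9) lands at byte offset 12808 before swizzling and 12872 after, since bit 9 of the address is set and bit 10 is clear.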