Jason Ekstrand <ja...@jlekstrand.net> writes: > On Jul 24, 2015 8:02 AM, "Francisco Jerez" <curroje...@riseup.net> wrote: >> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> > On Fri, Jul 24, 2015 at 7:39 AM, Francisco Jerez <curroje...@riseup.net> > wrote: >> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >> >>> On Jul 24, 2015 4:00 AM, "Francisco Jerez" <curroje...@riseup.net> > wrote: >> >>>> >> >>>> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >>>> >> >>>> > Ok, I've looked through this again and convinced myself that it's >> >>>> > *mostly* correct. I am a bit skeptical of the address swizzling > for >> >>>> > Y-major tiling. >> >>>> > >> >>>> > I've also included some comments that I'd like to see added > (assuming >> >>>> > they're correct). Sometimes it's easier to write helpful comments > if >> >>>> > the person who is confused does the writing. :-) >> >>>> > >> >>>> >> >>>> Thanks! This will save me quite some guesswork, I'll add them to the >> >>>> patch. >> >>>> >> >>>> > Also, I'd just like to say that I'm terribly impressed after > reading >> >>>> > through this. You managed to handle an amazing number of cases >> >>>> > without so much as a single if statement or predicated instruction. >> >>>> > My hat's off to you sir. >> >>>> > >> >>>> Hah, I feel flattered. :) >> >>>> >> >>>> > Modulo Y-tiling and comments, >> >>>> > >> >>>> > Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com> >> >>>> > >> >>>> > On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez < > curroje...@riseup.net> >> >>> wrote: >> >>>> >> Define a function to calculate the memory address of the image >> >>>> >> location given by a vector of coordinates. This is required in > cases >> >>>> >> where we need to fall back to untyped surface access, which take > a raw >> >>>> >> memory offset and know nothing about surface coordinates, type >> >>>> >> conversion or memory tiling and swizzling. They are still useful >> >>>> >> because typed surface reads don't support any 64 or 128-bit > formats on >> >>>> >> IVB, and they don't support any 128-bit formats on HSW and BDW. >> >>>> >> >> >>>> >> The tiling algorithm is implemented based on a number of > parameters >> >>>> >> which are passed in as uniforms and determine whether the surface >> >>>> >> layout is X-tiled, Y-tiled or untiled. This allows binding > surfaces >> >>>> >> of different tiling layouts to the pipeline without recompiling > the >> >>>> >> program. >> >>>> >> >> >>>> >> v2: Drop VEC4 suport. >> >>>> >> v3: Rebase. >> >>>> >> --- >> >>>> >> .../drivers/dri/i965/brw_fs_surface_builder.cpp | 108 >> >>> +++++++++++++++++++++ >> >>>> >> 1 file changed, 108 insertions(+) >> >>>> >> >> >>>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >> >>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >> >>>> >> index 5ee04de..0c879db 100644 >> >>>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >> >>>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >> >>>> >> @@ -215,4 +215,112 @@ namespace { >> >>>> >> return BRW_PREDICATE_NORMAL; >> >>>> >> } >> >>>> >> } >> >>>> >> + >> >>>> >> + namespace image_coordinates { >> >>>> >> + /** >> >>>> >> + * Calculate the offset in memory of the texel given by \p >> >>> coord. >> >>>> >> + * >> >>>> >> + * This is meant to be used with untyped surface messages > to >> >>> access a >> >>>> >> + * tiled surface, what involves taking into account the > tiling >> >>> and >> >>>> >> + * swizzling modes of the surface manually so it will > hopefully >> >>> not >> >>>> >> + * happen very often. >> >>>> >> + */ >> >>>> >> + fs_reg >> >>>> >> + emit_address_calculation(const fs_builder &bld, const > fs_reg >> >>> &image, >> >>>> >> + const fs_reg &coord, unsigned > dims) >> >>>> >> + { >> >>>> >> + const brw_device_info *devinfo = bld.shader->devinfo; >> >>>> >> + const fs_reg off = offset(image, bld, >> >>> BRW_IMAGE_PARAM_OFFSET_OFFSET); >> >>>> >> + const fs_reg stride = offset(image, bld, >> >>> BRW_IMAGE_PARAM_STRIDE_OFFSET); >> >>>> >> + const fs_reg tile = offset(image, bld, >> >>> BRW_IMAGE_PARAM_TILING_OFFSET); >> >>>> >> + const fs_reg swz = offset(image, bld, >> >>> BRW_IMAGE_PARAM_SWIZZLING_OFFSET); >> >>>> >> + const fs_reg addr = bld.vgrf(BRW_REGISTER_TYPE_UD, 2); >> >>>> >> + const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 2); >> >>>> >> + const fs_reg minor = bld.vgrf(BRW_REGISTER_TYPE_UD, 2); >> >>>> >> + const fs_reg major = bld.vgrf(BRW_REGISTER_TYPE_UD, 2); >> >>>> >> + const fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_UD); >> >>>> >> + >> >>>> >> + /* Shift the coordinates by the fixed surface offset. */ >> >>>> >> + for (unsigned c = 0; c < 2; ++c) >> >>>> >> + bld.ADD(offset(addr, bld, c), offset(off, bld, c), >> >>>> >> + (c < dims ? >> >>>> >> + offset(retype(coord, BRW_REGISTER_TYPE_UD), > bld, >> >>> c) : >> >>>> >> + fs_reg(0))); >> >>>> >> + >> >>>> >> + if (dims > 2) { >> >>>> >> + /* Decompose z into a major (tmp.y) and a minor > (tmp.x) >> >>>> >> + * index. >> >>>> > >> >>>> > Please add a comment: >> >>>> > /* >> >>>> > The layout of 3-D textures in memory is sort-of like a tiling > format. >> >>>> > At each miplevel, the slices are arranged in rows of 2^level slices >> >>>> > per row. The slice row is stored in tmp.y and the slice within the >> >>>> > row is stored in tmp.x. The layout of 2-D array textures is much >> >>>> > simpler. Each slice of the texture is storred as a 2-D texture > with >> >>>> > all its miplevels and then one qpitch later, the next slice is > stored >> >>>> > with its miplevels. This code handles both by passing in the > miplevel >> >>>> > as tile[2] for 3-D textures and 0 in tile[2] for 2-D array > textures. >> >>>> > */ >> >>>> >> >>>> Note that depending on the ARYSPC_LOD0 surface state flag array > textures >> >>>> may also be interleaved in the opposite way -- All slices for LOD0, > then >> >>>> all slices for LOD1, and so on. We probably don't hit this case >> >>>> currently because MS images are not exposed but it's handled in the > same >> >>>> way by setting stride[3] to the spacing between slices. I'll add > that >> >>>> to your comment. >> >>>> >> >>>> >> + */ >> >>>> >> + bld.BFE(offset(tmp, bld, 0), offset(tile, bld, 2), >> >>> fs_reg(0), >> >>>> >> + offset(retype(coord, BRW_REGISTER_TYPE_UD), > bld, >> >>> 2)); >> >>>> >> + bld.SHR(offset(tmp, bld, 1), >> >>>> >> + offset(retype(coord, BRW_REGISTER_TYPE_UD), > bld, >> >>> 2), >> >>>> >> + offset(tile, bld, 2)); >> >>>> >> + >> >>>> >> + /* Take into account the horizontal (tmp.x) and > vertical >> >>> (tmp.y) >> >>>> >> + * slice offset. >> >>>> >> + */ >> >>>> >> + for (unsigned c = 0; c < 2; ++c) { >> >>>> >> + bld.MUL(offset(tmp, bld, c), >> >>>> >> + offset(stride, bld, 2 + c), offset(tmp, > bld, >> >>> c)); >> >>>> >> + bld.ADD(offset(addr, bld, c), >> >>>> >> + offset(addr, bld, c), offset(tmp, bld, > c)); >> >>>> >> + } >> >>>> >> + } >> >>>> >> + >> >>>> >> + if (dims > 1) { >> >>>> > >> >>>> > Please add as a comment: >> >>>> > /* >> >>>> > Calculate the major/minor x and y indices. In order to accommodate >> >>>> > both X and Y tiling, the Y-major tiling format is treated as being > a >> >>>> > bunch of narrow X-tiles placed next to each other. This means > that >> >>>> > the tile width for Y-tiling is actually the width of one > sub-column of >> >>>> > the Y-major tile where each 4K tile has 8 512B sub-columns. >> >>>> > >> >>>> > The major Y value is the row of tiles in which the pixel lives. > The >> >>>> > major X value is the tile sub-column in which the pixel lives; for > X >> >>>> > tiling, this is the same as the tile column, for Y tiling, each > tile >> >>>> > has 8 sub-columns. The minor X and Y indices are the position > within >> >>>> > the sub-column. >> >>>> > */ >> >>>> >> + for (unsigned c = 0; c < 2; ++c) { >> >>>> >> + /* Calculate the minor x and y indices. */ >> >>>> >> + bld.BFE(offset(minor, bld, c), offset(tile, bld, > c), >> >>>> >> + fs_reg(0), offset(addr, bld, c)); >> >>>> >> + >> >>>> >> + /* Calculate the major x and y indices. */ >> >>>> >> + bld.SHR(offset(major, bld, c), >> >>>> >> + offset(addr, bld, c), offset(tile, bld, > c)); >> >>>> >> + } >> >>>> >> + >> >>>> >> + /* Calculate the texel index from the start of the > tile >> >>> row and >> >>>> >> + * the vertical coordinate of the row. >> >>>> >> + * Equivalent to: >> >>>> >> + * tmp.x = (major.x << tile.y << tile.x) + >> >>>> >> + * (minor.y << tile.x) + minor.x >> >>>> >> + * tmp.y = major.y << tile.y >> >>>> >> + */ >> >>>> >> + bld.SHL(tmp, major, offset(tile, bld, 1)); >> >>>> >> + bld.ADD(tmp, tmp, offset(minor, bld, 1)); >> >>>> >> + bld.SHL(tmp, tmp, offset(tile, bld, 0)); >> >>>> >> + bld.ADD(tmp, tmp, minor); >> >>>> >> + bld.SHL(offset(tmp, bld, 1), >> >>>> >> + offset(major, bld, 1), offset(tile, bld, 1)); >> >>>> >> + >> >>>> >> + /* Add it to the start of the tile row. */ >> >>>> >> + bld.MUL(offset(tmp, bld, 1), >> >>>> >> + offset(tmp, bld, 1), offset(stride, bld, 1)); >> >>>> >> + bld.ADD(tmp, tmp, offset(tmp, bld, 1)); >> >>>> >> + >> >>>> >> + /* Multiply by the Bpp value. */ >> >>>> >> + bld.MUL(dst, tmp, stride); >> >>>> >> + >> >>>> >> + if (devinfo->gen < 8 && !devinfo->is_baytrail) { >> >>>> > >> >>>> > Talking to Ken a bit about this. It seems as if this is actually > the >> >>>> > check we want. It's going to be a nasty bug if someone has a > shader >> >>>> > cache on a HSW machine, runs some compute shaders, pulls out a > stick >> >>>> > of ram (going from dual to single-channel), and starts it up again. >> >>>> > Probably best to compile the shader the same regardless of how much >> >>>> > ram you have. >> >>>> > >> >>>> Hmm, I guess we could also invalidate the cache if the flag changes? >> >>>> (or in case anything else changes from the brw_device_info structure) >> >>>> >> >>>> >> + /* Take into account the two dynamically specified >> >>> shifts. */ >> >>>> >> + for (unsigned c = 0; c < 2; ++c) >> >>>> >> + bld.SHR(offset(tmp, bld, c), dst, offset(swz, > bld, >> >>> c)); >> >>>> >> + >> >>>> >> + /* XOR tmp.x and tmp.y with bit 6 of the memory >> >>> address. */ >> >>>> >> + bld.XOR(tmp, tmp, offset(tmp, bld, 1)); >> >>>> >> + bld.AND(tmp, tmp, fs_reg(1 << 6)); >> >>>> >> + bld.XOR(dst, dst, tmp); >> >>>> > >> >>>> > I don't think this is correct for Y-tiled surfaces. In Y-tiling, > bit >> >>>> > 6 of the address is given by bit6 XOR bit9. You have one extra XOR >> >>>> > which means you just get just bit9. >> >>>> > >> >>>> The extra XOR is made a no-op for the Y-tiled case in the same way > that >> >>>> they are both made a no-op in the no-swizzling case. I'll add > another >> >>>> comment clarifying that. >> >>> >> >>> I don't think that actually works. In the zero-shift case, the first > XOR >> >>> produces an all-zero value, you mask to bit 6 (still all-zero) and > XOR with >> >>> the original value and get the original back. In the Y-tiled case, > you >> >>> compute a ^ (a >> 3) and mask off bit 6. This gives you bit6 ^ bit9 > in the >> >>> sixth bit and zero everywhere else. When you then XOR that with the >> >>> original, the two copies of bit6 cancel out and you are left with > bit9 only. >> >>> >> >> Not exactly, in either case the result of the swizzling calculation >> >> is: >> >> addr' = addr ^ ((1 << 6) & ((addr >> swz.x) ^ (addr >> swz.y))) >> >> >> >> If the surface is Y-tiled, swz.x = 3 and swz.y = 31, which gives: >> >> addr' = addr ^ ((1 << 6) & ((addr[9] ^ addr[37]) << 6)) = >> >> addr ^ ((addr[9] ^ 0) << 6) = addr ^ (addr[9] << 6) >> >> >> >> I guess I should add another comment here about how the XOR > instructions >> >> are NOP-ed out when they're not needed. :) >> > >> > Right. I'd missed the fact that the default swizzle was not zero. >> > Also, it's 0xff, so it's 255, not 31. I'm not sure what addr >> 255 >> > does. Maybe it only takes 5 bits and it's actually 31, but we should >> > probably use 31 rather than 255. >> >> Yeah, I wrote 31 because the actual value used by the hardware is the >> LSBs of the shift value depending on the execution type -- I.e. 5 bits >> for UD. We could definitely initialize the swz parameters to 0x1f as >> well but I used 0xff instead because it would also work for larger >> offset types, e.g. 64bit in which case the hardware shift instruction >> would take the lowest 6 bits -- I'm not planning to extend the code >> above to handle 64-bit offsets anytime soon though. > > I don't care too much which we use. However, there should be comments both > here and where we set the defaults saying why the defeault is 31/0xff. > Sure, I'll fix that.
>> > --Jason >> > >> >>>> >> + } >> >>>> >> + >> >>>> >> + } else { >> >>>> >> + /* Multiply by the Bpp/stride value. */ >> >>>> > >> >>>> > I'm still confused as to how addr can have 2 things while dims == > 1. >> >>>> > Does dims specify the per-slice dimension or total dimension? That >> >>>> > should probably be documented at the top of the function. >> >>>> > >> >>>> dims specifies the dimension of the coord argument. addr may still > have >> >>>> a non-zero y coordinate because it's the sum of coord and the >> >>>> two-dimensional offset value specified by surface set-up. 1D texture >> >>>> slices and levels are laid out in a 2D layout just like the rest, the >> >>>> offset value specifies where in the 2D miptree the requested 1D image >> >>>> starts. You may wonder why we pass a 2D offset to the shader > instead of >> >>>> just fixing up the surface base memory address to point at the right >> >>>> slice at offset (0, 0), the answer is that in the general case (i.e. > not >> >>>> necessarily 1D) it wouldn't work, because a level may start mid-tile > and >> >>>> the result of shifting a tiled surface by less than the tile size is > in >> >>>> general not a well-formed tiled surface -- In the 1D case it could >> >>>> probably be done though because I deliberately disabled tiling for > those >> >>>> (36a17f0f9913), but it would require special-casing them in the > surface >> >>>> state set-up code and another manual calculation of the base memory >> >>>> address of the right slice-level of the miptree. >> >>>> >> >>>> >> + bld.MUL(offset(addr, bld, 1), >> >>>> >> + offset(addr, bld, 1), offset(stride, bld, > 1)); >> >>>> >> + bld.ADD(addr, addr, offset(addr, bld, 1)); >> >>>> >> + bld.MUL(dst, addr, stride); >> >>>> >> + } >> >>>> >> + >> >>>> >> + return dst; >> >>>> >> + } >> >>>> >> + } >> >>>> >> } >> >>>> >> -- >> >>>> >> 2.4.3 >> >>>> >> >> >>>> >> _______________________________________________ >> >>>> >> mesa-dev mailing list >> >>>> >> mesa-dev@lists.freedesktop.org >> >>>> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev