On 2 December 2013 11:39, Francisco Jerez <curroje...@riseup.net> wrote:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_surface_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_surface_visitor.cpp
> new file mode 100644
> index 0000000..3528bbe
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_surface_visitor.cpp
> @@ -0,0 +1,846 @@
> +/*
> + * Copyright 2013 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Francisco Jerez <curroje...@riseup.net>
> + */
> +
> +#include "brw_vec4_surface_visitor.h"
> +
> +using namespace brw;
> +
> +namespace {
> +   vec4_instruction &
> +   exec_all(vec4_instruction &inst)
> +   {
> +      inst.force_writemask_all = true;
> +      return inst;
> +   }
> +
> +   vec4_instruction &
> +   exec_predicated(backend_reg flag, vec4_instruction &inst)
> +   {
> +      if (flag.file != BAD_FILE)
> +         inst.predicate = BRW_PREDICATE_ALIGN16_ALL4H;

There should be a comment here explaining why you chose this predicate
(or at the very least referring the reader to emit_coordinate_check()
for context).

> +
> +      return inst;
> +   }
> +}
> +
> +
> +/**
> + * Copy a SIMD4x2 vector to its transpose SIMD8x4 vector.

Usually SIMDNxM means "M separate execution flows are simultaneously
processed, with each execution flow processing a packed vector of N
elements at a time".  (Note that SIMD8 is considered an abbreviation
for SIMD1x8 in this interpretation).  So SIMD8x4 doesn't make any
sense.  How about instead saying "Copy a SIMD4x2 vector to execution
channels 0 and 4 of a SIMD8 vector"?

Also, there should be a comment clarifying why it's ok to leave
channels 1-3 and 5-7 uninitialized.

> + */
> +void
> +brw_vec4_surface_visitor::emit_assign_to_transpose(
> +   dst_reg dst, src_reg src, unsigned size) const
> +{
> +   for (unsigned i = 0; i < size; ++i) {
> +      emit(BRW_OPCODE_MOV,
> +           writemask(offset(dst, i), WRITEMASK_X),
> +           swizzle(src, BRW_SWIZZLE4(i, i, i, i)));
> +   }
> +}
> +
> +/**
> + * Copy a SIMD4x2 vector from its transpose SIMD8x4 vector.

Similar clarification is needed here.

> + */
> +void
> +brw_vec4_surface_visitor::emit_assign_from_transpose(
> +   dst_reg dst, src_reg src, unsigned size) const
> +{
> +   for (unsigned i = 0; i < size; ++i) {
> +      emit(BRW_OPCODE_MOV,
> +           writemask(dst, 1 << i),
> +           swizzle(offset(src, i), BRW_SWIZZLE_XXXX));
> +   }
> +}
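For what it's worth, here is the scalar model I ended up with while
reading these two helpers.  This is just my reading of the Align16
semantics (the dst[i].chan[n] notation is made up), so double-check it,
but something along these lines in the comments would have saved me
some head-scratching:

   /* emit_assign_to_transpose(): component i of each logical vec4 of
    * the SIMD4x2 source ends up in channels 0 and 4 of destination
    * register i:
    *
    *    for (i = 0; i < size; ++i) {
    *       dst[i].chan[0] = src.chan[i];      // first execution flow
    *       dst[i].chan[4] = src.chan[4 + i];  // second execution flow
    *    }
    *
    * emit_assign_from_transpose(): the inverse mapping, channels 0 and
    * 4 of source register i are read back into component i of each
    * logical vec4:
    *
    *    for (i = 0; i < size; ++i) {
    *       dst.chan[i]     = src[i].chan[0];
    *       dst.chan[4 + i] = src[i].chan[4];
    *    }
    */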
> +
> +/**
> + * Initialize the header present in some surface access messages.
> + */
> +void
> +brw_vec4_surface_visitor::emit_surface_header(struct dst_reg dst) const
> +{
> +   assert(dst.file == MRF);
> +   exec_all(emit(BRW_OPCODE_MOV, dst, 0));
> +
> +   if (!v->brw->is_haswell) {
> +      /* The sample mask is used on IVB for the SIMD8 messages that
> +       * have no SIMD4x2 counterpart.  We only use the two X channels
> +       * in that case, mask everything else out.
> +       */
> +      exec_all(emit(BRW_OPCODE_MOV,
> +                    brw_writemask(brw_uvec_mrf(4, dst.reg, 4), WRITEMASK_W),
> +                    0x11));
> +   }
> +}

As with the corresponding fs function, it would be helpful for the
comment to say which messages this function is expected to be used for,
or to include a bspec reference that shows the header format for the
messages you care about.

> +backend_reg
> +brw_vec4_surface_visitor::emit_coordinate_address_calculation(
> +   backend_reg image, backend_reg addr, unsigned dims) const

Most of my comments on
brw_fs_surface_visitor::emit_coordinate_address_calculation() apply
here as well.

> +{
> +   const unsigned mask = (1 << dims) - 1;
> +   src_reg off = offset(image, BRW_IMAGE_PARAM_OFFSET_OFFSET / 4);
> +   src_reg stride = offset(image, BRW_IMAGE_PARAM_STRIDE_OFFSET / 4);
> +   src_reg tile = offset(image, BRW_IMAGE_PARAM_TILING_OFFSET / 4);
> +   src_reg swz = offset(image, BRW_IMAGE_PARAM_SWIZZLING_OFFSET / 4);
> +   src_reg dst = make_grf(BRW_REGISTER_TYPE_UD, 1);
> +   src_reg tmp = make_grf(BRW_REGISTER_TYPE_UD, 4);
> +
> +   /* Shift the coordinates by the fixed surface offset. */
> +   emit(BRW_OPCODE_ADD, writemask(addr, WRITEMASK_XY & mask),
> +        addr, off);
> +
> +   if (dims > 2) {
> +      /* Decompose z into a major (tmp.w) and a minor (tmp.z)
> +       * index.
> +       */
> +      emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_Z),
> +           addr, negate(tile));
> +
> +      emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_Z),
> +           tmp, negate(tile));

The use of negate() with a shift is sufficiently clever that it could
use a comment explaining what's going on.  In particular, that the SHL
and SHR are effectively preserving the lowest-order tile.z bits of
addr.z and discarding the rest, except in the case where tile.z == 0,
in which case they are keeping all the bits.
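Concretely, maybe something like this (assuming I've understood the
trick correctly -- the shifter only looks at the low 5 bits of its
shift count, so negate(tile) behaves as (32 - tile) mod 32):

   /* Extract the minor index: shifting left and then back right by
    * (32 - tile.z) mod 32 keeps only the tile.z low-order bits of
    * addr.z:
    *
    *    tmp.z = addr.z & ((1 << tile.z) - 1)   if tile.z != 0
    *    tmp.z = addr.z                         if tile.z == 0
    *
    * The tile.z == 0 case falls out naturally because both shift
    * counts are then 0 mod 32.
    */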
> +
> +      emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_W),
> +           swizzle(addr, BRW_SWIZZLE_ZZZZ),
> +           swizzle(tile, BRW_SWIZZLE_ZZZZ));
> +
> +      /* Calculate the horizontal (tmp.z) and vertical (tmp.w) slice
> +       * offset.
> +       */
> +      emit(BRW_OPCODE_MUL, writemask(tmp, WRITEMASK_ZW),
> +           stride, tmp);
> +      emit(BRW_OPCODE_ADD, writemask(addr, WRITEMASK_XY),
> +           addr, swizzle(tmp, BRW_SWIZZLE_ZWZW));

There should be a comment clarifying that in the case where
tile.z == 0, since tmp.z == addr.z, we are relying on the fact that
stride.z is 0 in order to avoid corrupting the contents of addr.x.

Additionally, in patch 04/25, a comment should be added to
brw_miptree_get_horizontal_slice_pitch() to clarify that in the
non-GL_TEXTURE_3D case,
brw_vec4_surface_visitor::emit_coordinate_address_calculation() relies
on the return value being 0.

> +   }
> +
> +   if (dims > 1) {
> +      /* Calculate the minor x (tmp.x) and y (tmp.y) indices. */
> +      emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_XY),
> +           addr, negate(tile));
> +
> +      emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_XY),
> +           tmp, negate(tile));
> +
> +      /* Calculate the major x (tmp.z) and y (tmp.w) indices.
> +       */
> +      emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_ZW),
> +           swizzle(addr, BRW_SWIZZLE_XYXY),
> +           swizzle(tile, BRW_SWIZZLE_XYXY));
> +
> +      /* Multiply the minor indices and the major x index (tmp.x,
> +       * tmp.y and tmp.w) by the Bpp, and the major y index (tmp.w) by
> +       * the vertical stride.
> +       */
> +      emit(BRW_OPCODE_MUL, writemask(tmp, WRITEMASK_XYZW),
> +           swizzle(stride, BRW_SWIZZLE_XXXY), tmp);
> +
> +      /* Multiply by the tile dimensions using two shift instructions.
> +       * Equivalent to:
> +       *    minor.y = minor.y << tile.x
> +       *    major.x = major.x << tile.x << tile.y
> +       *    major.y = major.y << tile.y
> +       */
> +      emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_ZW),
> +           swizzle(tmp, BRW_SWIZZLE_ZWZW),
> +           swizzle(tile, BRW_SWIZZLE_YYYY));
> +
> +      emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_YZ),
> +           swizzle(tmp, BRW_SWIZZLE_YYZZ),
> +           swizzle(tile, BRW_SWIZZLE_XXXX));
> +
> +      /* Add everything up. */
> +      emit(BRW_OPCODE_ADD, writemask(tmp, WRITEMASK_XY),
> +           swizzle(tmp, BRW_SWIZZLE_XYXY),
> +           swizzle(tmp, BRW_SWIZZLE_ZWZW));
> +
> +      emit(BRW_OPCODE_ADD, writemask(dst, WRITEMASK_X),
> +           swizzle(tmp, BRW_SWIZZLE_XXXX),
> +           swizzle(tmp, BRW_SWIZZLE_YYYY));

I believe this code calculates the wrong value when
tile.x == tile.y == 0, because all the shifts shift by zero, so just
before the line "/* Add everything up. */", tmp has the values:

   tmp.x = addr.x * stride.x
   tmp.y = addr.y * stride.x
   tmp.z = addr.x * stride.x
   tmp.w = addr.y * stride.y

And then when you add it all up you get

   dst.x = addr.x * (2 * stride.x) + addr.y * (stride.x + stride.y)

Which isn't correct.  What you want is:

   dst.x = addr.x * stride.x + addr.y * stride.y
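To spell the failure out as a scalar model (assuming
tile.x == tile.y == 0 throughout, so minor == major == addr and every
shift is a no-op):

   uint32_t tmp_x = addr_x * stride_x;   /* minor.x * Bpp            */
   uint32_t tmp_y = addr_y * stride_x;   /* minor.y * Bpp            */
   uint32_t tmp_z = addr_x * stride_x;   /* major.x * Bpp            */
   uint32_t tmp_w = addr_y * stride_y;   /* major.y * vertical pitch */
   uint32_t dst_x = (tmp_x + tmp_z) + (tmp_y + tmp_w);
   /* == addr_x * 2 * stride_x + addr_y * (stride_x + stride_y) */

So the untiled case needs to be special-cased somehow, or the algebra
rearranged so that it degenerates correctly when all the shifts are
zero.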
> +
> +      if (v->brw->has_swizzling) {
> +         /* Take into account the two dynamically specified shifts. */
> +         emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_XY),
> +              swizzle(dst, BRW_SWIZZLE_XXXX), swz);
> +
> +         /* XOR tmp.x and tmp.y with bit 6 of the memory address. */
> +         emit(BRW_OPCODE_XOR, writemask(tmp, WRITEMASK_X),
> +              swizzle(tmp, BRW_SWIZZLE_XXXX),
> +              swizzle(tmp, BRW_SWIZZLE_YYYY));
> +
> +         emit(BRW_OPCODE_AND, writemask(tmp, WRITEMASK_X),
> +              tmp, 1 << 6);
> +
> +         emit(BRW_OPCODE_XOR, writemask(dst, WRITEMASK_X),
> +              dst, tmp);
> +      }
> +
> +   } else {
> +      /* Multiply by the Bpp value. */
> +      emit(BRW_OPCODE_MUL, writemask(dst, WRITEMASK_X),
> +           addr, stride);
> +   }
> +
> +   return dst;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_untyped_read(
> +   backend_reg flag, backend_reg surface, backend_reg addr,
> +   unsigned dims, unsigned size) const

It would be nice to have a comment in this function saying that for
both IVB and HSW, we use SIMD4x2 messages for untyped surface read.

> +
> +void
> +brw_vec4_surface_visitor::emit_untyped_write(
> +   backend_reg flag, backend_reg surface, backend_reg addr,
> +   backend_reg src, unsigned dims, unsigned size) const
> +{
> +   const unsigned mask = (v->brw->is_haswell ? (1 << size) - 1 : 1);
> +   unsigned mlen = 0;

It would be nice to move the comment about IVB using a SIMD8 message up
here, since without that context, the code to emit the surface write
address and the source value don't make sense.

A similar comment applies to emit_untyped_atomic(), emit_typed_read(),
emit_typed_write(), and emit_typed_atomic().

> +
> +   /* Set the surface write address. */
> +   if (v->brw->is_haswell) {
> +      emit_assign_with_pad(make_mrf(mlen), addr, dims);
> +      mlen++;
> +   } else {
> +      emit_assign_to_transpose(make_mrf(mlen), addr, dims);
> +      mlen += dims;
> +   }
> +
> +   /* Set the source value.
> +    */
> +   if (v->brw->is_haswell) {
> +      emit_assign_with_pad(make_mrf(mlen), src, size);
> +      mlen++;
> +   } else {
> +      emit_assign_to_transpose(make_mrf(mlen), src, size);
> +      mlen += size;
> +   }
> +
> +   /* Emit the instruction.  Note that this is translated into the
> +    * SIMD8 untyped surface write message on IVB because the
> +    * hardware lacks a SIMD4x2 counterpart.
> +    */
> +   vec4_instruction &inst = exec_predicated(
> +      flag, emit(SHADER_OPCODE_UNTYPED_SURFACE_WRITE,
> +                 brw_writemask(brw_null_reg(), mask),
> +                 surface, size));
> +   inst.base_mrf = 0;
> +   inst.mlen = mlen;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_untyped_atomic(
> +   backend_reg flag, backend_reg surface, backend_reg addr,
> +   backend_reg src0, backend_reg src1,
> +   unsigned dims, unsigned op) const
> +{
> +   src_reg dst = make_grf(BRW_REGISTER_TYPE_UD, 1);
> +   unsigned mlen = 0;
> +
> +   /* Set the atomic operation address. */
> +   if (v->brw->is_haswell) {
> +      emit_assign_with_pad(make_mrf(mlen), addr, dims);
> +      mlen++;
> +   } else {
> +      emit_assign_to_transpose(make_mrf(mlen), addr, dims);
> +      mlen += dims;
> +   }
> +
> +   /* Set the source arguments. */
> +   if (v->brw->is_haswell) {
> +      if (src0.file != BAD_FILE)
> +         emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
> +              src0);
> +
> +      if (src1.file != BAD_FILE)
> +         emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_Y),
> +              swizzle(src1, BRW_SWIZZLE_XXXX));
> +
> +      mlen++;

According to Graphics BSpec: 3D-Media-GPGPU Engine > Shared Functions >
Shared Functions – Data Port [Pre-SKL] > Messages > TypedUntyped
Surface ReadWrite and TypedUntyped Atomic Operation [IVB+] > Message
Payload > SIMD4x2 Source Payload (Atomic Operation message only), "The
following atomic operations require no sources, thus the source payload
is not delivered: AOP_INC, AOP_DEC, AOP_PREDEC".  I think this means
that if both src0.file and src1.file are BAD_FILE, we shouldn't
increment mlen.

A similar comment applies to emit_typed_atomic().
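I.e. something like this in the HSW path (untested sketch, just to
illustrate what I mean):

   if (src0.file != BAD_FILE || src1.file != BAD_FILE)
      mlen++;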
> +
> +   } else {
> +      if (src0.file != BAD_FILE) {
> +         emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
> +              src0);
> +         mlen++;
> +      }
> +
> +      if (src1.file != BAD_FILE) {
> +         emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
> +              src1);
> +         mlen++;
> +      }
> +   }
> +
> +   /* Emit the instruction.  Note that this is translated into the
> +    * SIMD8 untyped atomic message on IVB because the hardware lacks a
> +    * SIMD4x2 counterpart.
> +    */
> +   vec4_instruction &inst = exec_predicated(
> +      flag, emit(SHADER_OPCODE_UNTYPED_ATOMIC,
> +                 writemask(dst, WRITEMASK_X),
> +                 surface, op));
> +   inst.base_mrf = 0;
> +   inst.mlen = mlen;
> +
> +   return dst;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_typed_read(
> +   backend_reg flag, backend_reg surface, backend_reg addr,
> +   unsigned dims, unsigned size) const
> +{
> +   const unsigned rlen = size * (v->brw->is_haswell ? 1 : 8);

I think this is 2x larger than necessary for Ivy Bridge.
brw_vec4_surface_visitor::make_grf() will divide rlen by 4 to determine
how many registers to allocate.  So, for example, if size == 4, it will
allocate 8 consecutive registers to hold the result.  But the reply
from the untyped read message will only fill up 4 of them, and the call
to emit_assign_from_transpose() will only cause 4 of them to be read.
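In other words, if my arithmetic is right, this could simply be:

   const unsigned rlen = size * (v->brw->is_haswell ? 1 : 4);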
> +backend_reg
> +brw_vec4_surface_visitor::emit_pack_generic(
> +   backend_reg src,
> +   unsigned shift_r, unsigned width_r,
> +   unsigned shift_g, unsigned width_g,
> +   unsigned shift_b, unsigned width_b,
> +   unsigned shift_a, unsigned width_a) const
> +{
> +   const unsigned mask = (!!width_r << 0 | !!width_g << 1 |
> +                          !!width_b << 2 | !!width_a << 3);
> +   const bool homogeneous = ((!width_g || width_r == width_g) &&
> +                             (!width_b || width_g == width_b) &&
> +                             (!width_a || width_b == width_a));
> +   const unsigned bits = width_r + width_g + width_b + width_a;
> +   src_reg shift = make_grf(BRW_REGISTER_TYPE_UD, 4);
> +
> +   /* Shift left to discard the most significant bits. */
> +   emit(BRW_OPCODE_MOV, writemask(shift, mask),
> +        (homogeneous ? brw_imm_ud(32 - width_r) :
> +         brw_imm_vf4(32 - width_r, 32 - width_g,
> +                     32 - width_b, 32 - width_a)));

It seems a little silly to go to extra effort to switch between
brw_imm_ud() and brw_imm_vf4(), given that SHL and SHR only pay
attention to the bottom 5 bits of their shift argument, and
brw_imm_vf4 is capable of representing values up to 31.  How about
instead:

   emit(BRW_OPCODE_MOV, writemask(shift, mask),
        brw_imm_vf4((32 - width_r) % 32, (32 - width_g) % 32,
                    (32 - width_b) % 32, (32 - width_a) % 32));

A similar comment applies to emit_unpack_generic().

> +
> +   emit(BRW_OPCODE_SHL, writemask(src, mask), src, shift);

As in my comments on brw_fs_surface_visitor.cpp, I don't like writing
to src.  It seems like there's too big a risk that a user of this
function will assume that src isn't written to (and can thus be used
again).  I'd prefer to allocate a new temporary and then let register
coalescing take care of getting rid of it.

> +
> +   /* Shift right to the final bit field positions. */
> +   emit(BRW_OPCODE_MOV, writemask(shift, mask),
> +        brw_imm_vf4(32 - shift_r % 32 - width_r,
> +                    32 - shift_g % 32 - width_g,
> +                    32 - shift_b % 32 - width_b,
> +                    32 - shift_a % 32 - width_a));

Won't this have trouble with unused channels?  For example, if
shift_a == width_a == 0 (i.e. because channel a doesn't exist in the
format we're packing), then we'll pass 32 to brw_imm_vf4(), which isn't
allowed.  I think we need to do:

   emit(BRW_OPCODE_MOV, writemask(shift, mask),
        brw_imm_vf4((32 - shift_r % 32 - width_r) % 32,
                    (32 - shift_g % 32 - width_g) % 32,
                    (32 - shift_b % 32 - width_b) % 32,
                    (32 - shift_a % 32 - width_a) % 32));

A similar comment applies to emit_unpack_generic().

> +
> +   emit(BRW_OPCODE_SHR, writemask(src, mask), src, shift);
> +
> +   /* Add everything up. */

Since we're not adding, let's change the comment to "OR everything
together."

> +   if (mask >> 2)
> +      emit(BRW_OPCODE_OR,
> +           writemask(src, WRITEMASK_XY),
> +           swizzle(src, BRW_SWIZZLE_XZXZ),
> +           swizzle(src, (mask >> 3 ? BRW_SWIZZLE_YWYW :
> +                         BRW_SWIZZLE_YZYZ)));
> +
> +   if (mask >> 1 && bits <= 32)
> +      emit(BRW_OPCODE_OR,
> +           writemask(src, WRITEMASK_X),
> +           swizzle(src, BRW_SWIZZLE_XXXX),
> +           swizzle(src, BRW_SWIZZLE_YYYY));

This code only works if a number of assumptions about the shift and
width values are correct.  Can we document (and check) those
assumptions with assertions?  E.g. something like:

   assert(shift_r + width_r <= 32);
   if (bits <= 32) {
      // Pack all channels into result.x
      assert(!width_g || shift_g + width_g <= 32);
      assert(!width_b || shift_b + width_b <= 32);
      assert(!width_a || shift_a + width_a <= 32);
   } else if (mask >> 2) {
      // Pack rg into result.x and ba into result.y
      assert(shift_g + width_g <= 32);
      assert(width_b >= 32 && shift_b + width_b <= 64);
      assert(!width_a || (width_a >= 32 && shift_a + width_a <= 64));
   } else {
      // Pack r into result.x and g into result.y
      assert(width_g >= 32 && shift_g + width_g <= 64);
   }

>
> +
> +   return src;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_unpack_generic(
> +   backend_reg src,
> +   unsigned shift_r, unsigned width_r,
> +   unsigned shift_g, unsigned width_g,
> +   unsigned shift_b, unsigned width_b,
> +   unsigned shift_a, unsigned width_a) const
> +{
> +   const unsigned mask = (!!width_r << 0 | !!width_g << 1 |
> +                          !!width_b << 2 | !!width_a << 3);
> +   const bool homogeneous = ((!width_g || width_r == width_g) &&
> +                             (!width_b || width_g == width_b) &&
> +                             (!width_a || width_b == width_a));
> +   src_reg shift = make_grf(BRW_REGISTER_TYPE_UD, 4);
> +   src_reg dst = make_grf(src.type, 4);
> +
> +   /* Shift left to discard the most significant bits. */
> +   emit(BRW_OPCODE_MOV, writemask(shift, mask),
> +        brw_imm_vf4(32 - shift_r % 32 - width_r,
> +                    32 - shift_g % 32 - width_g,
> +                    32 - shift_b % 32 - width_b,
> +                    32 - shift_a % 32 - width_a));
> +
> +   emit(BRW_OPCODE_SHL, writemask(dst, mask),
> +        swizzle(src, BRW_SWIZZLE4(shift_r / 32, shift_g / 32,
> +                                  shift_b / 32, shift_a / 32)),
> +        shift);
> +
> +   /* Shift back to the least significant bits using an arithmetic
> +    * shift to get sign extension on signed types.
> +    */
> +   emit(BRW_OPCODE_MOV, writemask(shift, mask),
> +        (homogeneous ? brw_imm_ud(32 - width_r) :
> +         brw_imm_vf4(32 - width_r, 32 - width_g,
> +                     32 - width_b, 32 - width_a)));
> +
> +   emit(BRW_OPCODE_ASR, writemask(dst, mask), dst, shift);
> +
> +   return dst;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_pack_homogeneous(
> +   backend_reg src,
> +   unsigned shift_r, unsigned width_r,
> +   unsigned shift_g, unsigned width_g,
> +   unsigned shift_b, unsigned width_b,
> +   unsigned shift_a, unsigned width_a) const
> +{
> +   /* We could do the same with less instructions if we had some way
> +    * to use Align1 addressing in the VEC4 visitor.  Just use the
> +    * general path for now...
> +    */
> +   return emit_pack_generic(src, shift_r, width_r, shift_g, width_g,
> +                            shift_b, width_b, shift_a, width_a);
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_unpack_homogeneous(
> +   backend_reg src,
> +   unsigned shift_r, unsigned width_r,
> +   unsigned shift_g, unsigned width_g,
> +   unsigned shift_b, unsigned width_b,
> +   unsigned shift_a, unsigned width_a) const
> +{
> +   /* We could do the same with less instructions if we had some way
> +    * to use Align1 addressing in the VEC4 visitor.  Just use the
> +    * general path for now...
> +    */
> +   return emit_unpack_generic(src, shift_r, width_r, shift_g, width_g,
> +                              shift_b, width_b, shift_a, width_a);
> +}

It seems a little odd that for fs, the generic and homogeneous packing
functions are completely separate; but for vec4, they wind up calling
the same function, which then dynamically determines whether or not
we're doing homogeneous packing/unpacking.  How about if instead we
simplify the interface so that there's just an emit_pack() function and
an emit_unpack() function.
In the vec4 implementation, we can just drop the
emit_{pack,unpack}_homogeneous() functions and rename
emit_{pack,unpack}_generic() to emit_{pack,unpack}.  In the fs
implementation, we can do:

   brw_fs_surface_visitor::emit_{pack,unpack}(...)
   {
      const bool homogeneous = ((!width_g || width_r == width_g) &&
                                (!width_b || width_g == width_b) &&
                                (!width_a || width_b == width_a));
      if (homogeneous) {
         ...homogeneous logic...
      } else {
         ...generic logic...
      }
   }

> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_convert_to_float(
> +   backend_reg src,
> +   unsigned mask0, unsigned width0,
> +   unsigned mask1, unsigned width1) const

My comments about the fs implementation of emit_convert_to_float()
apply here as well.

Final thoughts about this patch, now that I've gotten through it:

1. The patch is very large, and that made reviewing it difficult.
   Please break it out into separate patches.  One possible way of
   breaking it up would be:

   a. Add the brw_surface_visitor base class.
   b. Add the brw_fs_surface_visitor class.
   c. Add the brw_vec4_surface_visitor class.

   That would have the added advantage that it would lead people to
   review the base class before reviewing the derived classes, which I
   think would have been an easier order to review the patch in.

2. The object splicing concerns I brought up in "[PATCH 06/23] i965:
   Define common register base class shared between both back-ends."
   apply to this patch as well.  I will try to talk to some folks at
   Intel next week about my ideas to mitigate this.