On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <curroje...@riseup.net> wrote: > The purpose of this change is threefold: First, it improves the > modularity of the compiler back-end by separating the functionality > required to construct an i965 IR program from the rest of the visitor > god-object, what in turn will reduce the coupling between other > components and the visitor allowing a more modular design. This patch > doesn't yet remove the equivalent functionality from the visitor > classes, that task will be undertaken by a separate series, as it > involves major back-end surgery. > > Second, it improves consistency between the scalar and vector > back-ends. The FS and VEC4 builders can both be used to generate > scalar code with a compatible interface or they can be used to > generate natural vector width code -- 1 or 4 components respectively. > > Third, the approach to IR construction is somewhat different to what > the visitor classes currently do. All parameters affecting code > generation (execution size, half control, point in the program where > new instructions are inserted, etc.) are encapsulated in a stand-alone > object rather than being quasi-global state (yes, anything defined in > one of the visitor classes is effectively global due to the tight > coupling with virtually everything else in the compiler back-end). > This object is lightweight and can be copied, mutated and passed > around, making helper IR-building functions more flexible because they > can now simply take a builder object as argument and will inherit its > IR generation properties in exactly the same way that a discrete > instruction would from the same builder object. > > The emit_typed_write() function from my image-load-store branch is an > example that illustrates the usefulness of the latter point: Due to > hardware limitations the function may have to split the untyped > surface message in 8-wide chunks. That means that the several > functions called to help with the construction of the message payload > are themselves required to set the execution width and half control > correctly on the instructions they emit, and to allocate all registers > with half the default width. With the previous approach this would > require the used helper functions to be aware of the parameters that > might differ from the default state and explicitly set the instruction > bits accordingly. With the new approach they would get a modified > builder object as argument that would influence all instructions > emitted by the helper function as if it were the default state. > > Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() > method. It doesn't actually emit any instructions, they are simply > created and inserted into an exec_list which is returned for the > caller to emit at some location of the program. This sort of two-step > emission becomes unnecessary with the builder interface because the > insertion point is one more of the code generation parameters which > are part of the builder object. The caller can simply pass > VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the > location of the program where the effect of the constant load is > desired. This two-step emission (which pervades the compiler back-end > and is in most cases redundant) goes away: E.g. ADD() now actually > adds two registers rather than just creating an ADD instruction in > memory, emit(ADD()) is no longer necessary. > > v2: Drop scalarizing VEC4 builder. > v3: Take a backend_shader as constructor argument. Improve handling > of debug annotations and execution control flags. > --- > src/mesa/drivers/dri/i965/Makefile.sources | 1 + > src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 > +++++++++++++++++++++++++++++ > 2 files changed, 660 insertions(+) > create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h > > diff --git a/src/mesa/drivers/dri/i965/Makefile.sources > b/src/mesa/drivers/dri/i965/Makefile.sources > index 3f852cd..93f336e 100644 > --- a/src/mesa/drivers/dri/i965/Makefile.sources > +++ b/src/mesa/drivers/dri/i965/Makefile.sources > @@ -42,6 +42,7 @@ i965_FILES = \ > brw_ff_gs.c \ > brw_ff_gs_emit.c \ > brw_ff_gs.h \ > + brw_fs_builder.h \ > brw_fs_channel_expressions.cpp \ > brw_fs_cmod_propagation.cpp \ > brw_fs_combine_constants.cpp \ > diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h > b/src/mesa/drivers/dri/i965/brw_fs_builder.h > new file mode 100644 > index 0000000..e528180 > --- /dev/null > +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h > @@ -0,0 +1,659 @@ > +/* -*- c++ -*- */ > +/* > + * Copyright © 2010-2015 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#ifndef BRW_FS_BUILDER_H > +#define BRW_FS_BUILDER_H > + > +#include "brw_ir_fs.h" > +#include "brw_shader.h" > +#include "brw_context.h" > + > +namespace brw { > + /** > + * Toolbox to assemble an FS IR program out of individual instructions. > + * > + * This object is meant to have an interface consistent with > + * brw::vec4_builder. They cannot be fully interchangeable because > + * brw::fs_builder generates scalar code while brw::vec4_builder generates > + * vector code. > + */ > + class fs_builder { > + public: > + /** Type used in this IR to represent a source of an instruction. */ > + typedef fs_reg src_reg; > + > + /** Type used in this IR to represent the destination of an > instruction. */ > + typedef fs_reg dst_reg; > + > + /** Type used in this IR to represent an instruction. */ > + typedef fs_inst instruction; > + > + /** > + * Construct an fs_builder appending instructions at the end of > + * the instruction list of \p shader. \p dispatch_width gives > + * the native execution width of the program. > + */ > + fs_builder(backend_shader *shader, > + unsigned dispatch_width) : > + shader(shader), block(NULL), > + cursor((exec_node *)&shader->instructions.tail), > + _dispatch_width(dispatch_width), > + _group(0), > + force_writemask_all(false), > + annotation() > + { > + } > + > + /** > + * Construct an fs_builder that inserts instructions before \p cursor > in > + * basic block \p block, inheriting other code generation parameters > + * from this. > + */ > + fs_builder > + at(bblock_t *block, exec_node *cursor) const > + { > + fs_builder bld = *this; > + bld.block = block; > + bld.cursor = cursor; > + return bld; > + } > + > + /** > + * Construct a builder specifying the default SIMD width and group of > + * channel enable signals, inheriting other code generation parameters > + * from this. > + * > + * \p n gives the default SIMD width, \p i gives the slot group used > for > + * predication and control flow masking in multiples of \p n channels. > + */ > + fs_builder > + group(unsigned n, unsigned i) const > + { > + assert(n <= dispatch_width() && > + i < dispatch_width() / n); > + fs_builder bld = *this; > + bld._dispatch_width = n; > + bld._group += i * n; > + return bld; > + } > + > + /** > + * Alias for group() with width equal to eight. > + */ > + fs_builder > + half(unsigned i) const > + { > + return group(8, i); > + } > + > + /** > + * Construct a builder with per-channel control flow execution masking > + * disabled if \p b is true. If control flow execution masking is > + * already disabled this has no effect. > + */ > + fs_builder > + exec_all(bool b = true) const > + { > + fs_builder bld = *this; > + if (b) > + bld.force_writemask_all = true; > + return bld; > + } > + > + /** > + * Construct a builder with the given debug annotation info. > + */ > + fs_builder > + annotate(const char *str, const void *ir = NULL) const > + { > + fs_builder bld = *this; > + bld.annotation.str = str; > + bld.annotation.ir = ir; > + return bld; > + } > + > + /** > + * Get the SIMD width in use. > + */ > + unsigned > + dispatch_width() const > + { > + return _dispatch_width; > + } > + > + /** > + * Allocate a virtual register of natural vector size (one for this IR) > + * and SIMD width. \p n gives the amount of space to allocate in > + * dispatch_width units (which is just enough space for one logical > + * component in this IR). > + */ > + dst_reg > + vgrf(enum brw_reg_type type, unsigned n = 1) const > + { > + return dst_reg(GRF, shader->alloc.allocate( > + DIV_ROUND_UP(n * type_sz(type) * dispatch_width(), > + REG_SIZE)), > + type, dispatch_width()); > + } > + > + /** > + * Create a null register of floating type. > + */ > + dst_reg > + null_reg_f() const > + { > + return dst_reg(retype(brw_null_vec(dispatch_width()), > + BRW_REGISTER_TYPE_F)); > + } > + > + /** > + * Create a null register of signed integer type. > + */ > + dst_reg > + null_reg_d() const > + { > + return dst_reg(retype(brw_null_vec(dispatch_width()), > + BRW_REGISTER_TYPE_D)); > + } > + > + /** > + * Create a null register of unsigned integer type. > + */ > + dst_reg > + null_reg_ud() const > + { > + return dst_reg(retype(brw_null_vec(dispatch_width()), > + BRW_REGISTER_TYPE_UD)); > + } > + > + /** > + * Get the mask of SIMD channels enabled by dispatch and not yet > + * disabled by discard. > + */ > + src_reg > + sample_mask_reg() const > + { > + const bool uses_kill = > + (shader->stage == MESA_SHADER_FRAGMENT && > + ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill); > + return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) : > + uses_kill ? brw_flag_reg(0, 1) : > + retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD)); > + } > + > + /** > + * Insert an instruction into the program. > + */ > + instruction * > + emit(const instruction &inst) const > + { > + return emit(new(shader->mem_ctx) instruction(inst)); > + } > + > + /** > + * Create and insert a nullary control instruction into the program. > + */ > + instruction * > + emit(enum opcode opcode) const > + { > + return emit(instruction(opcode, dispatch_width())); > + } > + > + /** > + * Create and insert a nullary instruction into the program. > + */ > + instruction * > + emit(enum opcode opcode, const dst_reg &dst) const > + { > + return emit(instruction(opcode, dst)); > + } > + > + /** > + * Create and insert a unary instruction into the program. > + */ > + instruction * > + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) const > + { > + switch (opcode) { > + case SHADER_OPCODE_RCP: > + case SHADER_OPCODE_RSQ: > + case SHADER_OPCODE_SQRT: > + case SHADER_OPCODE_EXP2: > + case SHADER_OPCODE_LOG2: > + case SHADER_OPCODE_SIN: > + case SHADER_OPCODE_COS: > + return fix_math_instruction( > + emit(instruction(opcode, dst.width, dst, > + fix_math_operand(src0)))); > + > + default: > + return emit(instruction(opcode, dst.width, dst, src0)); > + } > + } > + > + /** > + * Create and insert a binary instruction into the program. > + */ > + instruction * > + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, > + const src_reg &src1) const > + { > + switch (opcode) { > + case SHADER_OPCODE_POW: > + case SHADER_OPCODE_INT_QUOTIENT: > + case SHADER_OPCODE_INT_REMAINDER: > + return fix_math_instruction( > + emit(instruction(opcode, dst.width, dst, > + fix_math_operand(src0), > + fix_math_operand(src1)))); > + > + default: > + return emit(instruction(opcode, dst.width, dst, src0, src1)); > + > + } > + } > + > + /** > + * Create and insert a ternary instruction into the program. > + */ > + instruction * > + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, > + const src_reg &src1, const src_reg &src2) const > + { > + switch (opcode) { > + case BRW_OPCODE_BFE: > + case BRW_OPCODE_BFI2: > + case BRW_OPCODE_MAD: > + case BRW_OPCODE_LRP: > + return emit(instruction(opcode, dst.width, dst, > + fix_3src_operand(src0), > + fix_3src_operand(src1), > + fix_3src_operand(src2))); > + > + default: > + return emit(instruction(opcode, dst.width, dst, src0, src1, > src2)); > + } > + } > + > + /** > + * Insert a preallocated instruction into the program. > + */ > + instruction * > + emit(instruction *inst) const > + { > + assert(inst->exec_size == dispatch_width() || > + force_writemask_all); > + assert(_group == 0 || _group == 8); > + > + inst->force_sechalf = (_group == 8); > + inst->force_writemask_all = force_writemask_all; > + inst->annotation = annotation.str; > + inst->ir = annotation.ir; > + > + if (block) > + static_cast<instruction *>(cursor)->insert_before(block, inst); > + else > + cursor->insert_before(inst); > + > + return inst; > + } > + > + /** > + * Select \p src0 if the comparison of both sources with the given > + * conditional mod evaluates to true, otherwise select \p src1. > + * > + * Generally useful to get the minimum or maximum of two values. > + */ > + void > + emit_minmax(const dst_reg &dst, const src_reg &src0, > + const src_reg &src1, brw_conditional_mod mod) const
This is unused, after Jason's commit: commit 78644ffc4d341deb431145108f0b2d377e59b61e Author: Jason Ekstrand <jason.ekstr...@intel.com> Date: Wed May 20 10:35:34 2015 -0700 i965/fs: Remove the ir_visitor code Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <matts...@gmail.com> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> > + { > + if (shader->devinfo->gen >= 6) { > + set_condmod(mod, SEL(dst, fix_unsigned_negate(src0), > + fix_unsigned_negate(src1))); > + } else { > + CMP(null_reg_d(), src0, src1, mod); > + set_predicate(BRW_PREDICATE_NORMAL, > + SEL(dst, src0, src1)); > + } > + } > + > + /** > + * Copy any live channel from \p src to the first channel of \p dst. > + */ > + void > + emit_uniformize(const dst_reg &dst, const src_reg &src) const > + { > + const fs_builder ubld = exec_all(); > + const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD); > + > + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, > 0)); > + ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0), > + src, component(chan_index, 0)); > + } > + > + /** > + * Assorted arithmetic ops. > + * @{ > + */ > +#define ALU1(op) \ > + instruction * \ > + op(const dst_reg &dst, const src_reg &src0) const \ > + { \ > + return emit(BRW_OPCODE_##op, dst, src0); \ > + } > + > +#define ALU2(op) \ > + instruction * \ > + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const > \ > + { \ > + return emit(BRW_OPCODE_##op, dst, src0, src1); \ > + } > + > +#define ALU2_ACC(op) \ > + instruction * \ > + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const > \ > + { \ > + instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \ > + inst->writes_accumulator = true; \ > + return inst; \ > + } > + > +#define ALU3(op) \ > + instruction * \ > + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1, \ > + const src_reg &src2) const \ > + { \ > + return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \ > + } > + > + ALU2(ADD) > + ALU2_ACC(ADDC) > + ALU2(AND) > + ALU2(ASR) > + ALU2(AVG) > + ALU3(BFE) > + ALU2(BFI1) > + ALU3(BFI2) > + ALU1(BFREV) > + ALU1(CBIT) > + ALU2(CMPN) > + ALU3(CSEL) > + ALU2(DP2) > + ALU2(DP3) > + ALU2(DP4) > + ALU2(DPH) > + ALU1(F16TO32) > + ALU1(F32TO16) > + ALU1(FBH) > + ALU1(FBL) > + ALU1(FRC) > + ALU2(LINE) > + ALU1(LZD) > + ALU2(MAC) > + ALU2_ACC(MACH) > + ALU3(MAD) > + ALU1(MOV) > + ALU2(MUL) > + ALU1(NOT) > + ALU2(OR) > + ALU2(PLN) > + ALU1(RNDD) > + ALU1(RNDE) > + ALU1(RNDU) > + ALU1(RNDZ) > + ALU2(SAD2) > + ALU2_ACC(SADA2) > + ALU2(SEL) > + ALU2(SHL) > + ALU2(SHR) > + ALU2_ACC(SUBB) > + ALU2(XOR) > + > +#undef ALU3 > +#undef ALU2_ACC > +#undef ALU2 > +#undef ALU1 > + /** @} */ > + > + /** > + * CMP: Sets the low bit of the destination channels with the result > + * of the comparison, while the upper bits are undefined, and updates > + * the flag register with the packed 16 bits of the result. > + */ > + instruction * > + CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1, > + brw_conditional_mod condition) const > + { > + /* Take the instruction: > + * > + * CMP null<d> src0<f> src1<f> > + * > + * Original gen4 does type conversion to the destination type > + * before comparison, producing garbage results for floating > + * point comparisons. > + * > + * The destination type doesn't matter on newer generations, > + * so we set the type to match src0 so we can compact the > + * instruction. > + */ > + return set_condmod(condition, > + emit(BRW_OPCODE_CMP, retype(dst, src0.type), > + fix_unsigned_negate(src0), > + fix_unsigned_negate(src1))); > + } > + > + /** > + * Gen4 predicated IF. > + */ > + instruction * > + IF(brw_predicate predicate) const > + { > + instruction *inst = emit(BRW_OPCODE_IF); > + return set_predicate(predicate, inst); > + } > + > + /** > + * Gen6 IF with embedded comparison. > + */ > + instruction * > + IF(const src_reg &src0, const src_reg &src1, > + brw_conditional_mod condition) const > + { > + assert(shader->devinfo->gen == 6); > + return set_condmod(condition, > + emit(BRW_OPCODE_IF, > + null_reg_d(), > + fix_unsigned_negate(src0), > + fix_unsigned_negate(src1))); I don't see that we were calling resolve_ud_negate() on fs_visitor::emit_if_gen6. In fact, we've removed the function entirely (a known regression in the switch to NIR). I'd drop this for now. It's unused as well. > + } > + > + /** > + * Emit a linear interpolation instruction. > + */ > + instruction * > + LRP(const dst_reg &dst, const src_reg &x, const src_reg &y, > + const src_reg &a) const > + { > + if (shader->devinfo->gen >= 6) { > + /* The LRP instruction actually does op1 * op0 + op2 * (1 - > op0), so > + * we need to reorder the operands. > + */ > + return emit(BRW_OPCODE_LRP, dst, a, y, x); > + > + } else { > + /* We can't use the LRP instruction. Emit x*(1-a) + y*a. */ > + const dst_reg y_times_a = vgrf(dst.type); > + const dst_reg one_minus_a = vgrf(dst.type); > + const dst_reg x_times_one_minus_a = vgrf(dst.type); > + > + MUL(y_times_a, y, a); > + ADD(one_minus_a, negate(a), src_reg(1.0f)); > + MUL(x_times_one_minus_a, x, src_reg(one_minus_a)); > + return ADD(dst, src_reg(x_times_one_minus_a), > src_reg(y_times_a)); > + } > + } > + > + /** > + * Collect a number of registers in a contiguous range of registers. > + */ > + instruction * > + LOAD_PAYLOAD(const dst_reg &dst, const src_reg *src, > + unsigned sources, unsigned header_size) const > + { > + assert(dst.width % 8 == 0); > + instruction *inst = emit(instruction(SHADER_OPCODE_LOAD_PAYLOAD, > + dst.width, dst, src, sources)); > + inst->header_size = header_size; > + > + for (unsigned i = 0; i < header_size; i++) > + assert(src[i].file != GRF || > + src[i].width * type_sz(src[i].type) == 32); > + inst->regs_written = header_size; > + > + for (unsigned i = header_size; i < sources; ++i) > + assert(src[i].file != GRF || > + src[i].width == dst.width); > + inst->regs_written += (sources - header_size) * (dst.width / 8); > + > + return inst; > + } > + > + backend_shader *shader; > + > + private: > + /** > + * Workaround for negation of UD registers. See comment in > + * fs_generator::generate_code() for more details. > + */ > + src_reg > + fix_unsigned_negate(const src_reg &src) const > + { > + if (src.type == BRW_REGISTER_TYPE_UD && > + src.negate) { > + dst_reg temp = vgrf(BRW_REGISTER_TYPE_UD); > + MOV(temp, src); > + return src_reg(temp); > + } else { > + return src; > + } > + } > + > + /** > + * Workaround for source register modes not supported by the ternary > + * instruction encoding. > + */ > + src_reg > + fix_3src_operand(const src_reg &src) const > + { > + if (src.file == GRF || src.file == UNIFORM || src.stride > 1) { > + return src; > + } else { > + dst_reg expanded = vgrf(src.type); > + MOV(expanded, src); > + return expanded; > + } > + } > + > + /** > + * Workaround for source register modes not supported by the math > + * instruction. > + */ > + src_reg > + fix_math_operand(const src_reg &src) const > + { > + /* Can't do hstride == 0 args on gen6 math, so expand it out. We > + * might be able to do better by doing execsize = 1 math and then > + * expanding that result out, but we would need to be careful with > + * masking. > + * > + * Gen6 hardware ignores source modifiers (negate and abs) on math > + * instructions, so we also move to a temp to set those up. > + * > + * Gen7 relaxes most of the above restrictions, but still can't use > IMM > + * operands to math > + */ > + if ((shader->devinfo->gen == 6 && > + (src.file == IMM || src.file == UNIFORM || > + src.abs || src.negate)) || > + (shader->devinfo->gen == 7 && src.file == IMM)) { > + const dst_reg tmp = vgrf(src.type); > + MOV(tmp, src); > + return tmp; > + } else { > + return src; > + } > + } > + > + /** > + * Workaround other weirdness of the math instruction. > + */ > + instruction * > + fix_math_instruction(instruction *instr) const Is there a reason this is 'instr' instead of 'inst'? If not, let's rename it 'inst'. > + { > + if (shader->devinfo->gen < 6) { > + instr->base_mrf = 2; > + instr->mlen = instr->sources * dispatch_width() / 8; > + > + if (instr->sources > 1) { > + /* From the Ironlake PRM, Volume 4, Part 1, Section 6.1.13 > + * "Message Payload": > + * > + * "Operand0[7]. For the INT DIV functions, this operand is > the > + * denominator." > + * ... > + * "Operand1[7]. For the INT DIV functions, this operand is > the > + * numerator." > + */ > + const bool is_int_div = instr->opcode != SHADER_OPCODE_POW; > + const fs_reg src0 = is_int_div ? instr->src[1] : > instr->src[0]; > + const fs_reg src1 = is_int_div ? instr->src[0] : > instr->src[1]; > + > + instr->resize_sources(1); > + instr->src[0] = src0; > + > + at(block, instr).MOV(fs_reg(MRF, instr->base_mrf + 1, > src1.type, > + dispatch_width()), src1); > + } > + } > + > + return instr; > + } _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev