On Fri, Jun 5, 2015 at 1:42 PM, Francisco Jerez <curroje...@riseup.net> wrote: > Matt Turner <matts...@gmail.com> writes: > >> On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> The purpose of this change is threefold: First, it improves the >>> modularity of the compiler back-end by separating the functionality >>> required to construct an i965 IR program from the rest of the visitor >>> god-object, what in turn will reduce the coupling between other >>> components and the visitor allowing a more modular design. This patch >>> doesn't yet remove the equivalent functionality from the visitor >>> classes, that task will be undertaken by a separate series, as it >>> involves major back-end surgery. >>> >>> Second, it improves consistency between the scalar and vector >>> back-ends. The FS and VEC4 builders can both be used to generate >>> scalar code with a compatible interface or they can be used to >>> generate natural vector width code -- 1 or 4 components respectively. >>> >>> Third, the approach to IR construction is somewhat different to what >>> the visitor classes currently do. All parameters affecting code >>> generation (execution size, half control, point in the program where >>> new instructions are inserted, etc.) are encapsulated in a stand-alone >>> object rather than being quasi-global state (yes, anything defined in >>> one of the visitor classes is effectively global due to the tight >>> coupling with virtually everything else in the compiler back-end). >>> This object is lightweight and can be copied, mutated and passed >>> around, making helper IR-building functions more flexible because they >>> can now simply take a builder object as argument and will inherit its >>> IR generation properties in exactly the same way that a discrete >>> instruction would from the same builder object. >>> >>> The emit_typed_write() function from my image-load-store branch is an >>> example that illustrates the usefulness of the latter point: Due to >>> hardware limitations the function may have to split the untyped >>> surface message in 8-wide chunks. That means that the several >>> functions called to help with the construction of the message payload >>> are themselves required to set the execution width and half control >>> correctly on the instructions they emit, and to allocate all registers >>> with half the default width. With the previous approach this would >>> require the used helper functions to be aware of the parameters that >>> might differ from the default state and explicitly set the instruction >>> bits accordingly. With the new approach they would get a modified >>> builder object as argument that would influence all instructions >>> emitted by the helper function as if it were the default state. >>> >>> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() >>> method. It doesn't actually emit any instructions, they are simply >>> created and inserted into an exec_list which is returned for the >>> caller to emit at some location of the program. This sort of two-step >>> emission becomes unnecessary with the builder interface because the >>> insertion point is one more of the code generation parameters which >>> are part of the builder object. The caller can simply pass >>> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the >>> location of the program where the effect of the constant load is >>> desired. This two-step emission (which pervades the compiler back-end >>> and is in most cases redundant) goes away: E.g. ADD() now actually >>> adds two registers rather than just creating an ADD instruction in >>> memory, emit(ADD()) is no longer necessary. >>> >>> v2: Drop scalarizing VEC4 builder. >>> v3: Take a backend_shader as constructor argument. Improve handling >>> of debug annotations and execution control flags. >>> --- >>> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 >>> +++++++++++++++++++++++++++++ >>> 2 files changed, 660 insertions(+) >>> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h >>> >>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources >>> b/src/mesa/drivers/dri/i965/Makefile.sources >>> index 3f852cd..93f336e 100644 >>> --- a/src/mesa/drivers/dri/i965/Makefile.sources >>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources >>> @@ -42,6 +42,7 @@ i965_FILES = \ >>> brw_ff_gs.c \ >>> brw_ff_gs_emit.c \ >>> brw_ff_gs.h \ >>> + brw_fs_builder.h \ >>> brw_fs_channel_expressions.cpp \ >>> brw_fs_cmod_propagation.cpp \ >>> brw_fs_combine_constants.cpp \ >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> new file mode 100644 >>> index 0000000..e528180 >>> --- /dev/null >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> @@ -0,0 +1,659 @@ >>> +/* -*- c++ -*- */ >>> +/* >>> + * Copyright © 2010-2015 Intel Corporation >>> + * >>> + * Permission is hereby granted, free of charge, to any person obtaining a >>> + * copy of this software and associated documentation files (the >>> "Software"), >>> + * to deal in the Software without restriction, including without >>> limitation >>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense, >>> + * and/or sell copies of the Software, and to permit persons to whom the >>> + * Software is furnished to do so, subject to the following conditions: >>> + * >>> + * The above copyright notice and this permission notice (including the >>> next >>> + * paragraph) shall be included in all copies or substantial portions of >>> the >>> + * Software. >>> + * >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS >>> OR >>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, >>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL >>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>> OTHER >>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING >>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>> DEALINGS >>> + * IN THE SOFTWARE. >>> + */ >>> + >>> +#ifndef BRW_FS_BUILDER_H >>> +#define BRW_FS_BUILDER_H >>> + >>> +#include "brw_ir_fs.h" >>> +#include "brw_shader.h" >>> +#include "brw_context.h" >>> + >>> +namespace brw { >>> + /** >>> + * Toolbox to assemble an FS IR program out of individual instructions. >>> + * >>> + * This object is meant to have an interface consistent with >>> + * brw::vec4_builder. They cannot be fully interchangeable because >>> + * brw::fs_builder generates scalar code while brw::vec4_builder >>> generates >>> + * vector code. >>> + */ >>> + class fs_builder { >>> + public: >>> + /** Type used in this IR to represent a source of an instruction. */ >>> + typedef fs_reg src_reg; >>> + >>> + /** Type used in this IR to represent the destination of an >>> instruction. */ >>> + typedef fs_reg dst_reg; >>> + >>> + /** Type used in this IR to represent an instruction. */ >>> + typedef fs_inst instruction; >>> + >>> + /** >>> + * Construct an fs_builder appending instructions at the end of >>> + * the instruction list of \p shader. \p dispatch_width gives >>> + * the native execution width of the program. >>> + */ >>> + fs_builder(backend_shader *shader, >>> + unsigned dispatch_width) : >>> + shader(shader), block(NULL), >>> + cursor((exec_node *)&shader->instructions.tail), >>> + _dispatch_width(dispatch_width), >>> + _group(0), >>> + force_writemask_all(false), >>> + annotation() >>> + { >>> + } >>> + >>> + /** >>> + * Construct an fs_builder that inserts instructions before \p >>> cursor in >>> + * basic block \p block, inheriting other code generation parameters >>> + * from this. >>> + */ >>> + fs_builder >>> + at(bblock_t *block, exec_node *cursor) const >>> + { >>> + fs_builder bld = *this; >>> + bld.block = block; >>> + bld.cursor = cursor; >>> + return bld; >>> + } >>> + >>> + /** >>> + * Construct a builder specifying the default SIMD width and group of >>> + * channel enable signals, inheriting other code generation >>> parameters >>> + * from this. >>> + * >>> + * \p n gives the default SIMD width, \p i gives the slot group used >>> for >>> + * predication and control flow masking in multiples of \p n >>> channels. >>> + */ >>> + fs_builder >>> + group(unsigned n, unsigned i) const >>> + { >>> + assert(n <= dispatch_width() && >>> + i < dispatch_width() / n); >>> + fs_builder bld = *this; >>> + bld._dispatch_width = n; >>> + bld._group += i * n; >>> + return bld; >>> + } >>> + >>> + /** >>> + * Alias for group() with width equal to eight. >>> + */ >>> + fs_builder >>> + half(unsigned i) const >>> + { >>> + return group(8, i); >>> + } >>> + >>> + /** >>> + * Construct a builder with per-channel control flow execution >>> masking >>> + * disabled if \p b is true. If control flow execution masking is >>> + * already disabled this has no effect. >>> + */ >>> + fs_builder >>> + exec_all(bool b = true) const >>> + { >>> + fs_builder bld = *this; >>> + if (b) >>> + bld.force_writemask_all = true; >>> + return bld; >>> + } >>> + >>> + /** >>> + * Construct a builder with the given debug annotation info. >>> + */ >>> + fs_builder >>> + annotate(const char *str, const void *ir = NULL) const >>> + { >>> + fs_builder bld = *this; >>> + bld.annotation.str = str; >>> + bld.annotation.ir = ir; >>> + return bld; >>> + } >>> + >>> + /** >>> + * Get the SIMD width in use. >>> + */ >>> + unsigned >>> + dispatch_width() const >>> + { >>> + return _dispatch_width; >>> + } >>> + >>> + /** >>> + * Allocate a virtual register of natural vector size (one for this >>> IR) >>> + * and SIMD width. \p n gives the amount of space to allocate in >>> + * dispatch_width units (which is just enough space for one logical >>> + * component in this IR). >>> + */ >>> + dst_reg >>> + vgrf(enum brw_reg_type type, unsigned n = 1) const >>> + { >>> + return dst_reg(GRF, shader->alloc.allocate( >>> + DIV_ROUND_UP(n * type_sz(type) * >>> dispatch_width(), >>> + REG_SIZE)), >>> + type, dispatch_width()); >>> + } >>> + >>> + /** >>> + * Create a null register of floating type. >>> + */ >>> + dst_reg >>> + null_reg_f() const >>> + { >>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>> + BRW_REGISTER_TYPE_F)); >>> + } >>> + >>> + /** >>> + * Create a null register of signed integer type. >>> + */ >>> + dst_reg >>> + null_reg_d() const >>> + { >>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>> + BRW_REGISTER_TYPE_D)); >>> + } >>> + >>> + /** >>> + * Create a null register of unsigned integer type. >>> + */ >>> + dst_reg >>> + null_reg_ud() const >>> + { >>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>> + BRW_REGISTER_TYPE_UD)); >>> + } >>> + >>> + /** >>> + * Get the mask of SIMD channels enabled by dispatch and not yet >>> + * disabled by discard. >>> + */ >>> + src_reg >>> + sample_mask_reg() const >>> + { >>> + const bool uses_kill = >>> + (shader->stage == MESA_SHADER_FRAGMENT && >>> + ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill); >>> + return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) : >>> + uses_kill ? brw_flag_reg(0, 1) : >>> + retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD)); >>> + } >>> + >>> + /** >>> + * Insert an instruction into the program. >>> + */ >>> + instruction * >>> + emit(const instruction &inst) const >>> + { >>> + return emit(new(shader->mem_ctx) instruction(inst)); >>> + } >>> + >>> + /** >>> + * Create and insert a nullary control instruction into the program. >>> + */ >>> + instruction * >>> + emit(enum opcode opcode) const >>> + { >>> + return emit(instruction(opcode, dispatch_width())); >>> + } >>> + >>> + /** >>> + * Create and insert a nullary instruction into the program. >>> + */ >>> + instruction * >>> + emit(enum opcode opcode, const dst_reg &dst) const >>> + { >>> + return emit(instruction(opcode, dst)); >>> + } >>> + >>> + /** >>> + * Create and insert a unary instruction into the program. >>> + */ >>> + instruction * >>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) >>> const >>> + { >>> + switch (opcode) { >>> + case SHADER_OPCODE_RCP: >>> + case SHADER_OPCODE_RSQ: >>> + case SHADER_OPCODE_SQRT: >>> + case SHADER_OPCODE_EXP2: >>> + case SHADER_OPCODE_LOG2: >>> + case SHADER_OPCODE_SIN: >>> + case SHADER_OPCODE_COS: >>> + return fix_math_instruction( >>> + emit(instruction(opcode, dst.width, dst, >>> + fix_math_operand(src0)))); >>> + >>> + default: >>> + return emit(instruction(opcode, dst.width, dst, src0)); >>> + } >>> + } >>> + >>> + /** >>> + * Create and insert a binary instruction into the program. >>> + */ >>> + instruction * >>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, >>> + const src_reg &src1) const >>> + { >>> + switch (opcode) { >>> + case SHADER_OPCODE_POW: >>> + case SHADER_OPCODE_INT_QUOTIENT: >>> + case SHADER_OPCODE_INT_REMAINDER: >>> + return fix_math_instruction( >>> + emit(instruction(opcode, dst.width, dst, >>> + fix_math_operand(src0), >>> + fix_math_operand(src1)))); >>> + >>> + default: >>> + return emit(instruction(opcode, dst.width, dst, src0, src1)); >>> + >>> + } >>> + } >>> + >>> + /** >>> + * Create and insert a ternary instruction into the program. >>> + */ >>> + instruction * >>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, >>> + const src_reg &src1, const src_reg &src2) const >>> + { >>> + switch (opcode) { >>> + case BRW_OPCODE_BFE: >>> + case BRW_OPCODE_BFI2: >>> + case BRW_OPCODE_MAD: >>> + case BRW_OPCODE_LRP: >>> + return emit(instruction(opcode, dst.width, dst, >>> + fix_3src_operand(src0), >>> + fix_3src_operand(src1), >>> + fix_3src_operand(src2))); >>> + >>> + default: >>> + return emit(instruction(opcode, dst.width, dst, src0, src1, >>> src2)); >>> + } >>> + } >>> + >>> + /** >>> + * Insert a preallocated instruction into the program. >>> + */ >>> + instruction * >>> + emit(instruction *inst) const >>> + { >>> + assert(inst->exec_size == dispatch_width() || >>> + force_writemask_all); >>> + assert(_group == 0 || _group == 8); >>> + >>> + inst->force_sechalf = (_group == 8); >>> + inst->force_writemask_all = force_writemask_all; >>> + inst->annotation = annotation.str; >>> + inst->ir = annotation.ir; >>> + >>> + if (block) >>> + static_cast<instruction *>(cursor)->insert_before(block, inst); >>> + else >>> + cursor->insert_before(inst); >>> + >>> + return inst; >>> + } >>> + >>> + /** >>> + * Select \p src0 if the comparison of both sources with the given >>> + * conditional mod evaluates to true, otherwise select \p src1. >>> + * >>> + * Generally useful to get the minimum or maximum of two values. >>> + */ >>> + void >>> + emit_minmax(const dst_reg &dst, const src_reg &src0, >>> + const src_reg &src1, brw_conditional_mod mod) const >> >> This is unused, after Jason's commit: >> >> commit 78644ffc4d341deb431145108f0b2d377e59b61e >> Author: Jason Ekstrand <jason.ekstr...@intel.com> >> Date: Wed May 20 10:35:34 2015 -0700 >> >> i965/fs: Remove the ir_visitor code >> >> Now that everything is running through NIR, this is all dead. >> >> Reviewed-by: Matt Turner <matts...@gmail.com> >> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> >> > > I was planning to use it in my ARB_shader_image_load_store series. It > may also make sense to fix brw_fs_nir.cpp to use it, as it would > probably simplify the code slightly. I can make the changes as part of > PATCH 32 if you like.
Oh, okay. I didn't realize that brw_fs_nir.cpp had it open-coded. Yeah, it's fine to leave it in here. > >>> + { >>> + if (shader->devinfo->gen >= 6) { >>> + set_condmod(mod, SEL(dst, fix_unsigned_negate(src0), >>> + fix_unsigned_negate(src1))); >>> + } else { >>> + CMP(null_reg_d(), src0, src1, mod); >>> + set_predicate(BRW_PREDICATE_NORMAL, >>> + SEL(dst, src0, src1)); >>> + } >>> + } >>> + >>> + /** >>> + * Copy any live channel from \p src to the first channel of \p dst. >>> + */ >>> + void >>> + emit_uniformize(const dst_reg &dst, const src_reg &src) const >>> + { >>> + const fs_builder ubld = exec_all(); >>> + const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD); >>> + >>> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, >>> 0)); >>> + ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0), >>> + src, component(chan_index, 0)); >>> + } >>> + >>> + /** >>> + * Assorted arithmetic ops. >>> + * @{ >>> + */ >>> +#define ALU1(op) \ >>> + instruction * \ >>> + op(const dst_reg &dst, const src_reg &src0) const \ >>> + { \ >>> + return emit(BRW_OPCODE_##op, dst, src0); \ >>> + } >>> + >>> +#define ALU2(op) \ >>> + instruction * \ >>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) >>> const \ >>> + { \ >>> + return emit(BRW_OPCODE_##op, dst, src0, src1); \ >>> + } >>> + >>> +#define ALU2_ACC(op) \ >>> + instruction * \ >>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) >>> const \ >>> + { \ >>> + instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \ >>> + inst->writes_accumulator = true; \ >>> + return inst; \ >>> + } >>> + >>> +#define ALU3(op) \ >>> + instruction * \ >>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1, \ >>> + const src_reg &src2) const \ >>> + { \ >>> + return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \ >>> + } >>> + >>> + ALU2(ADD) >>> + ALU2_ACC(ADDC) >>> + ALU2(AND) >>> + ALU2(ASR) >>> + ALU2(AVG) >>> + ALU3(BFE) >>> + ALU2(BFI1) >>> + ALU3(BFI2) >>> + ALU1(BFREV) >>> + ALU1(CBIT) >>> + ALU2(CMPN) >>> + ALU3(CSEL) >>> + ALU2(DP2) >>> + ALU2(DP3) >>> + ALU2(DP4) >>> + ALU2(DPH) >>> + ALU1(F16TO32) >>> + ALU1(F32TO16) >>> + ALU1(FBH) >>> + ALU1(FBL) >>> + ALU1(FRC) >>> + ALU2(LINE) >>> + ALU1(LZD) >>> + ALU2(MAC) >>> + ALU2_ACC(MACH) >>> + ALU3(MAD) >>> + ALU1(MOV) >>> + ALU2(MUL) >>> + ALU1(NOT) >>> + ALU2(OR) >>> + ALU2(PLN) >>> + ALU1(RNDD) >>> + ALU1(RNDE) >>> + ALU1(RNDU) >>> + ALU1(RNDZ) >>> + ALU2(SAD2) >>> + ALU2_ACC(SADA2) >>> + ALU2(SEL) >>> + ALU2(SHL) >>> + ALU2(SHR) >>> + ALU2_ACC(SUBB) >>> + ALU2(XOR) >>> + >>> +#undef ALU3 >>> +#undef ALU2_ACC >>> +#undef ALU2 >>> +#undef ALU1 >>> + /** @} */ >>> + >>> + /** >>> + * CMP: Sets the low bit of the destination channels with the result >>> + * of the comparison, while the upper bits are undefined, and updates >>> + * the flag register with the packed 16 bits of the result. >>> + */ >>> + instruction * >>> + CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1, >>> + brw_conditional_mod condition) const >>> + { >>> + /* Take the instruction: >>> + * >>> + * CMP null<d> src0<f> src1<f> >>> + * >>> + * Original gen4 does type conversion to the destination type >>> + * before comparison, producing garbage results for floating >>> + * point comparisons. >>> + * >>> + * The destination type doesn't matter on newer generations, >>> + * so we set the type to match src0 so we can compact the >>> + * instruction. >>> + */ >>> + return set_condmod(condition, >>> + emit(BRW_OPCODE_CMP, retype(dst, src0.type), >>> + fix_unsigned_negate(src0), >>> + fix_unsigned_negate(src1))); >>> + } >>> + >>> + /** >>> + * Gen4 predicated IF. >>> + */ >>> + instruction * >>> + IF(brw_predicate predicate) const >>> + { >>> + instruction *inst = emit(BRW_OPCODE_IF); >>> + return set_predicate(predicate, inst); >>> + } >>> + >>> + /** >>> + * Gen6 IF with embedded comparison. >>> + */ >>> + instruction * >>> + IF(const src_reg &src0, const src_reg &src1, >>> + brw_conditional_mod condition) const >>> + { >>> + assert(shader->devinfo->gen == 6); >>> + return set_condmod(condition, >>> + emit(BRW_OPCODE_IF, >>> + null_reg_d(), >>> + fix_unsigned_negate(src0), >>> + fix_unsigned_negate(src1))); >> >> I don't see that we were calling resolve_ud_negate() on >> fs_visitor::emit_if_gen6. >> > Right, but shouldn't any instruction comparing unsigned values which are > potentially negated call resolve_ud_negate()? From looking at the Sandybridge PRM, the IF with embedded conditional cannot do source modifiers, so I think the answer is no. >> In fact, we've removed the function entirely (a known regression in >> the switch to NIR). I'd drop this for now. It's unused as well. >> > If there are still plans to restore this functionality, I would rather > leave it alone rather than remove it now for somebody else to have to > rewrite it later on, because it's less work and doesn't seem to hurt. But moreover, again I don't like hidden changes like this in what is sold as a refactor. Even if it had been a bug in the original code, this patch would *not* have been the appropriate place to fix it. Lets remove the function and restore it when its actually used. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev