On Fri, Jun 5, 2015 at 4:13 PM, Francisco Jerez <curroje...@riseup.net> wrote: > Matt Turner <matts...@gmail.com> writes: > >> On Fri, Jun 5, 2015 at 1:42 PM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> Matt Turner <matts...@gmail.com> writes: >>> >>>> On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <curroje...@riseup.net> >>>> wrote: >>>>> The purpose of this change is threefold: First, it improves the >>>>> modularity of the compiler back-end by separating the functionality >>>>> required to construct an i965 IR program from the rest of the visitor >>>>> god-object, what in turn will reduce the coupling between other >>>>> components and the visitor allowing a more modular design. This patch >>>>> doesn't yet remove the equivalent functionality from the visitor >>>>> classes, that task will be undertaken by a separate series, as it >>>>> involves major back-end surgery. >>>>> >>>>> Second, it improves consistency between the scalar and vector >>>>> back-ends. The FS and VEC4 builders can both be used to generate >>>>> scalar code with a compatible interface or they can be used to >>>>> generate natural vector width code -- 1 or 4 components respectively. >>>>> >>>>> Third, the approach to IR construction is somewhat different to what >>>>> the visitor classes currently do. All parameters affecting code >>>>> generation (execution size, half control, point in the program where >>>>> new instructions are inserted, etc.) are encapsulated in a stand-alone >>>>> object rather than being quasi-global state (yes, anything defined in >>>>> one of the visitor classes is effectively global due to the tight >>>>> coupling with virtually everything else in the compiler back-end). >>>>> This object is lightweight and can be copied, mutated and passed >>>>> around, making helper IR-building functions more flexible because they >>>>> can now simply take a builder object as argument and will inherit its >>>>> IR generation properties in exactly the same way that a discrete >>>>> instruction would from the same builder object. >>>>> >>>>> The emit_typed_write() function from my image-load-store branch is an >>>>> example that illustrates the usefulness of the latter point: Due to >>>>> hardware limitations the function may have to split the untyped >>>>> surface message in 8-wide chunks. That means that the several >>>>> functions called to help with the construction of the message payload >>>>> are themselves required to set the execution width and half control >>>>> correctly on the instructions they emit, and to allocate all registers >>>>> with half the default width. With the previous approach this would >>>>> require the used helper functions to be aware of the parameters that >>>>> might differ from the default state and explicitly set the instruction >>>>> bits accordingly. With the new approach they would get a modified >>>>> builder object as argument that would influence all instructions >>>>> emitted by the helper function as if it were the default state. >>>>> >>>>> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() >>>>> method. It doesn't actually emit any instructions, they are simply >>>>> created and inserted into an exec_list which is returned for the >>>>> caller to emit at some location of the program. This sort of two-step >>>>> emission becomes unnecessary with the builder interface because the >>>>> insertion point is one more of the code generation parameters which >>>>> are part of the builder object. The caller can simply pass >>>>> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the >>>>> location of the program where the effect of the constant load is >>>>> desired. This two-step emission (which pervades the compiler back-end >>>>> and is in most cases redundant) goes away: E.g. ADD() now actually >>>>> adds two registers rather than just creating an ADD instruction in >>>>> memory, emit(ADD()) is no longer necessary. >>>>> >>>>> v2: Drop scalarizing VEC4 builder. >>>>> v3: Take a backend_shader as constructor argument. Improve handling >>>>> of debug annotations and execution control flags. >>>>> --- >>>>> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >>>>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 >>>>> +++++++++++++++++++++++++++++ >>>>> 2 files changed, 660 insertions(+) >>>>> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> >>>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources >>>>> b/src/mesa/drivers/dri/i965/Makefile.sources >>>>> index 3f852cd..93f336e 100644 >>>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources >>>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources >>>>> @@ -42,6 +42,7 @@ i965_FILES = \ >>>>> brw_ff_gs.c \ >>>>> brw_ff_gs_emit.c \ >>>>> brw_ff_gs.h \ >>>>> + brw_fs_builder.h \ >>>>> brw_fs_channel_expressions.cpp \ >>>>> brw_fs_cmod_propagation.cpp \ >>>>> brw_fs_combine_constants.cpp \ >>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> new file mode 100644 >>>>> index 0000000..e528180 >>>>> --- /dev/null >>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>> @@ -0,0 +1,659 @@ >>>>> +/* -*- c++ -*- */ >>>>> +/* >>>>> + * Copyright © 2010-2015 Intel Corporation >>>>> + * >>>>> + * Permission is hereby granted, free of charge, to any person obtaining >>>>> a >>>>> + * copy of this software and associated documentation files (the >>>>> "Software"), >>>>> + * to deal in the Software without restriction, including without >>>>> limitation >>>>> + * the rights to use, copy, modify, merge, publish, distribute, >>>>> sublicense, >>>>> + * and/or sell copies of the Software, and to permit persons to whom the >>>>> + * Software is furnished to do so, subject to the following conditions: >>>>> + * >>>>> + * The above copyright notice and this permission notice (including the >>>>> next >>>>> + * paragraph) shall be included in all copies or substantial portions of >>>>> the >>>>> + * Software. >>>>> + * >>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>>>> EXPRESS OR >>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>>>> MERCHANTABILITY, >>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >>>>> SHALL >>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>>>> OTHER >>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >>>>> ARISING >>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>>>> DEALINGS >>>>> + * IN THE SOFTWARE. >>>>> + */ >>>>> + >>>>> +#ifndef BRW_FS_BUILDER_H >>>>> +#define BRW_FS_BUILDER_H >>>>> + >>>>> +#include "brw_ir_fs.h" >>>>> +#include "brw_shader.h" >>>>> +#include "brw_context.h" >>>>> + >>>>> +namespace brw { >>>>> + /** >>>>> + * Toolbox to assemble an FS IR program out of individual >>>>> instructions. >>>>> + * >>>>> + * This object is meant to have an interface consistent with >>>>> + * brw::vec4_builder. They cannot be fully interchangeable because >>>>> + * brw::fs_builder generates scalar code while brw::vec4_builder >>>>> generates >>>>> + * vector code. >>>>> + */ >>>>> + class fs_builder { >>>>> + public: >>>>> + /** Type used in this IR to represent a source of an instruction. >>>>> */ >>>>> + typedef fs_reg src_reg; >>>>> + >>>>> + /** Type used in this IR to represent the destination of an >>>>> instruction. */ >>>>> + typedef fs_reg dst_reg; >>>>> + >>>>> + /** Type used in this IR to represent an instruction. */ >>>>> + typedef fs_inst instruction; >>>>> + >>>>> + /** >>>>> + * Construct an fs_builder appending instructions at the end of >>>>> + * the instruction list of \p shader. \p dispatch_width gives >>>>> + * the native execution width of the program. >>>>> + */ >>>>> + fs_builder(backend_shader *shader, >>>>> + unsigned dispatch_width) : >>>>> + shader(shader), block(NULL), >>>>> + cursor((exec_node *)&shader->instructions.tail), >>>>> + _dispatch_width(dispatch_width), >>>>> + _group(0), >>>>> + force_writemask_all(false), >>>>> + annotation() >>>>> + { >>>>> + } >>>>> + >>>>> + /** >>>>> + * Construct an fs_builder that inserts instructions before \p >>>>> cursor in >>>>> + * basic block \p block, inheriting other code generation >>>>> parameters >>>>> + * from this. >>>>> + */ >>>>> + fs_builder >>>>> + at(bblock_t *block, exec_node *cursor) const >>>>> + { >>>>> + fs_builder bld = *this; >>>>> + bld.block = block; >>>>> + bld.cursor = cursor; >>>>> + return bld; >>>>> + } >>>>> + >>>>> + /** >>>>> + * Construct a builder specifying the default SIMD width and group >>>>> of >>>>> + * channel enable signals, inheriting other code generation >>>>> parameters >>>>> + * from this. >>>>> + * >>>>> + * \p n gives the default SIMD width, \p i gives the slot group >>>>> used for >>>>> + * predication and control flow masking in multiples of \p n >>>>> channels. >>>>> + */ >>>>> + fs_builder >>>>> + group(unsigned n, unsigned i) const >>>>> + { >>>>> + assert(n <= dispatch_width() && >>>>> + i < dispatch_width() / n); >>>>> + fs_builder bld = *this; >>>>> + bld._dispatch_width = n; >>>>> + bld._group += i * n; >>>>> + return bld; >>>>> + } >>>>> + >>>>> + /** >>>>> + * Alias for group() with width equal to eight. >>>>> + */ >>>>> + fs_builder >>>>> + half(unsigned i) const >>>>> + { >>>>> + return group(8, i); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Construct a builder with per-channel control flow execution >>>>> masking >>>>> + * disabled if \p b is true. If control flow execution masking is >>>>> + * already disabled this has no effect. >>>>> + */ >>>>> + fs_builder >>>>> + exec_all(bool b = true) const >>>>> + { >>>>> + fs_builder bld = *this; >>>>> + if (b) >>>>> + bld.force_writemask_all = true; >>>>> + return bld; >>>>> + } >>>>> + >>>>> + /** >>>>> + * Construct a builder with the given debug annotation info. >>>>> + */ >>>>> + fs_builder >>>>> + annotate(const char *str, const void *ir = NULL) const >>>>> + { >>>>> + fs_builder bld = *this; >>>>> + bld.annotation.str = str; >>>>> + bld.annotation.ir = ir; >>>>> + return bld; >>>>> + } >>>>> + >>>>> + /** >>>>> + * Get the SIMD width in use. >>>>> + */ >>>>> + unsigned >>>>> + dispatch_width() const >>>>> + { >>>>> + return _dispatch_width; >>>>> + } >>>>> + >>>>> + /** >>>>> + * Allocate a virtual register of natural vector size (one for >>>>> this IR) >>>>> + * and SIMD width. \p n gives the amount of space to allocate in >>>>> + * dispatch_width units (which is just enough space for one logical >>>>> + * component in this IR). >>>>> + */ >>>>> + dst_reg >>>>> + vgrf(enum brw_reg_type type, unsigned n = 1) const >>>>> + { >>>>> + return dst_reg(GRF, shader->alloc.allocate( >>>>> + DIV_ROUND_UP(n * type_sz(type) * >>>>> dispatch_width(), >>>>> + REG_SIZE)), >>>>> + type, dispatch_width()); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create a null register of floating type. >>>>> + */ >>>>> + dst_reg >>>>> + null_reg_f() const >>>>> + { >>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>>>> + BRW_REGISTER_TYPE_F)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create a null register of signed integer type. >>>>> + */ >>>>> + dst_reg >>>>> + null_reg_d() const >>>>> + { >>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>>>> + BRW_REGISTER_TYPE_D)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create a null register of unsigned integer type. >>>>> + */ >>>>> + dst_reg >>>>> + null_reg_ud() const >>>>> + { >>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>>>> + BRW_REGISTER_TYPE_UD)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Get the mask of SIMD channels enabled by dispatch and not yet >>>>> + * disabled by discard. >>>>> + */ >>>>> + src_reg >>>>> + sample_mask_reg() const >>>>> + { >>>>> + const bool uses_kill = >>>>> + (shader->stage == MESA_SHADER_FRAGMENT && >>>>> + ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill); >>>>> + return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) >>>>> : >>>>> + uses_kill ? brw_flag_reg(0, 1) : >>>>> + retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Insert an instruction into the program. >>>>> + */ >>>>> + instruction * >>>>> + emit(const instruction &inst) const >>>>> + { >>>>> + return emit(new(shader->mem_ctx) instruction(inst)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create and insert a nullary control instruction into the >>>>> program. >>>>> + */ >>>>> + instruction * >>>>> + emit(enum opcode opcode) const >>>>> + { >>>>> + return emit(instruction(opcode, dispatch_width())); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create and insert a nullary instruction into the program. >>>>> + */ >>>>> + instruction * >>>>> + emit(enum opcode opcode, const dst_reg &dst) const >>>>> + { >>>>> + return emit(instruction(opcode, dst)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create and insert a unary instruction into the program. >>>>> + */ >>>>> + instruction * >>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) >>>>> const >>>>> + { >>>>> + switch (opcode) { >>>>> + case SHADER_OPCODE_RCP: >>>>> + case SHADER_OPCODE_RSQ: >>>>> + case SHADER_OPCODE_SQRT: >>>>> + case SHADER_OPCODE_EXP2: >>>>> + case SHADER_OPCODE_LOG2: >>>>> + case SHADER_OPCODE_SIN: >>>>> + case SHADER_OPCODE_COS: >>>>> + return fix_math_instruction( >>>>> + emit(instruction(opcode, dst.width, dst, >>>>> + fix_math_operand(src0)))); >>>>> + >>>>> + default: >>>>> + return emit(instruction(opcode, dst.width, dst, src0)); >>>>> + } >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create and insert a binary instruction into the program. >>>>> + */ >>>>> + instruction * >>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, >>>>> + const src_reg &src1) const >>>>> + { >>>>> + switch (opcode) { >>>>> + case SHADER_OPCODE_POW: >>>>> + case SHADER_OPCODE_INT_QUOTIENT: >>>>> + case SHADER_OPCODE_INT_REMAINDER: >>>>> + return fix_math_instruction( >>>>> + emit(instruction(opcode, dst.width, dst, >>>>> + fix_math_operand(src0), >>>>> + fix_math_operand(src1)))); >>>>> + >>>>> + default: >>>>> + return emit(instruction(opcode, dst.width, dst, src0, src1)); >>>>> + >>>>> + } >>>>> + } >>>>> + >>>>> + /** >>>>> + * Create and insert a ternary instruction into the program. >>>>> + */ >>>>> + instruction * >>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, >>>>> + const src_reg &src1, const src_reg &src2) const >>>>> + { >>>>> + switch (opcode) { >>>>> + case BRW_OPCODE_BFE: >>>>> + case BRW_OPCODE_BFI2: >>>>> + case BRW_OPCODE_MAD: >>>>> + case BRW_OPCODE_LRP: >>>>> + return emit(instruction(opcode, dst.width, dst, >>>>> + fix_3src_operand(src0), >>>>> + fix_3src_operand(src1), >>>>> + fix_3src_operand(src2))); >>>>> + >>>>> + default: >>>>> + return emit(instruction(opcode, dst.width, dst, src0, src1, >>>>> src2)); >>>>> + } >>>>> + } >>>>> + >>>>> + /** >>>>> + * Insert a preallocated instruction into the program. >>>>> + */ >>>>> + instruction * >>>>> + emit(instruction *inst) const >>>>> + { >>>>> + assert(inst->exec_size == dispatch_width() || >>>>> + force_writemask_all); >>>>> + assert(_group == 0 || _group == 8); >>>>> + >>>>> + inst->force_sechalf = (_group == 8); >>>>> + inst->force_writemask_all = force_writemask_all; >>>>> + inst->annotation = annotation.str; >>>>> + inst->ir = annotation.ir; >>>>> + >>>>> + if (block) >>>>> + static_cast<instruction *>(cursor)->insert_before(block, >>>>> inst); >>>>> + else >>>>> + cursor->insert_before(inst); >>>>> + >>>>> + return inst; >>>>> + } >>>>> + >>>>> + /** >>>>> + * Select \p src0 if the comparison of both sources with the given >>>>> + * conditional mod evaluates to true, otherwise select \p src1. >>>>> + * >>>>> + * Generally useful to get the minimum or maximum of two values. >>>>> + */ >>>>> + void >>>>> + emit_minmax(const dst_reg &dst, const src_reg &src0, >>>>> + const src_reg &src1, brw_conditional_mod mod) const >>>> >>>> This is unused, after Jason's commit: >>>> >>>> commit 78644ffc4d341deb431145108f0b2d377e59b61e >>>> Author: Jason Ekstrand <jason.ekstr...@intel.com> >>>> Date: Wed May 20 10:35:34 2015 -0700 >>>> >>>> i965/fs: Remove the ir_visitor code >>>> >>>> Now that everything is running through NIR, this is all dead. >>>> >>>> Reviewed-by: Matt Turner <matts...@gmail.com> >>>> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> >>>> >>> >>> I was planning to use it in my ARB_shader_image_load_store series. It >>> may also make sense to fix brw_fs_nir.cpp to use it, as it would >>> probably simplify the code slightly. I can make the changes as part of >>> PATCH 32 if you like. >> >> Oh, okay. I didn't realize that brw_fs_nir.cpp had it open-coded. >> Yeah, it's fine to leave it in here. >> >>> >>>>> + { >>>>> + if (shader->devinfo->gen >= 6) { >>>>> + set_condmod(mod, SEL(dst, fix_unsigned_negate(src0), >>>>> + fix_unsigned_negate(src1))); >>>>> + } else { >>>>> + CMP(null_reg_d(), src0, src1, mod); >>>>> + set_predicate(BRW_PREDICATE_NORMAL, >>>>> + SEL(dst, src0, src1)); >>>>> + } >>>>> + } >>>>> + >>>>> + /** >>>>> + * Copy any live channel from \p src to the first channel of \p >>>>> dst. >>>>> + */ >>>>> + void >>>>> + emit_uniformize(const dst_reg &dst, const src_reg &src) const >>>>> + { >>>>> + const fs_builder ubld = exec_all(); >>>>> + const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD); >>>>> + >>>>> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, >>>>> component(chan_index, 0)); >>>>> + ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0), >>>>> + src, component(chan_index, 0)); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Assorted arithmetic ops. >>>>> + * @{ >>>>> + */ >>>>> +#define ALU1(op) \ >>>>> + instruction * \ >>>>> + op(const dst_reg &dst, const src_reg &src0) const \ >>>>> + { \ >>>>> + return emit(BRW_OPCODE_##op, dst, src0); \ >>>>> + } >>>>> + >>>>> +#define ALU2(op) \ >>>>> + instruction * \ >>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) >>>>> const \ >>>>> + { \ >>>>> + return emit(BRW_OPCODE_##op, dst, src0, src1); \ >>>>> + } >>>>> + >>>>> +#define ALU2_ACC(op) \ >>>>> + instruction * \ >>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) >>>>> const \ >>>>> + { \ >>>>> + instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \ >>>>> + inst->writes_accumulator = true; \ >>>>> + return inst; \ >>>>> + } >>>>> + >>>>> +#define ALU3(op) \ >>>>> + instruction * \ >>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1, \ >>>>> + const src_reg &src2) const \ >>>>> + { \ >>>>> + return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \ >>>>> + } >>>>> + >>>>> + ALU2(ADD) >>>>> + ALU2_ACC(ADDC) >>>>> + ALU2(AND) >>>>> + ALU2(ASR) >>>>> + ALU2(AVG) >>>>> + ALU3(BFE) >>>>> + ALU2(BFI1) >>>>> + ALU3(BFI2) >>>>> + ALU1(BFREV) >>>>> + ALU1(CBIT) >>>>> + ALU2(CMPN) >>>>> + ALU3(CSEL) >>>>> + ALU2(DP2) >>>>> + ALU2(DP3) >>>>> + ALU2(DP4) >>>>> + ALU2(DPH) >>>>> + ALU1(F16TO32) >>>>> + ALU1(F32TO16) >>>>> + ALU1(FBH) >>>>> + ALU1(FBL) >>>>> + ALU1(FRC) >>>>> + ALU2(LINE) >>>>> + ALU1(LZD) >>>>> + ALU2(MAC) >>>>> + ALU2_ACC(MACH) >>>>> + ALU3(MAD) >>>>> + ALU1(MOV) >>>>> + ALU2(MUL) >>>>> + ALU1(NOT) >>>>> + ALU2(OR) >>>>> + ALU2(PLN) >>>>> + ALU1(RNDD) >>>>> + ALU1(RNDE) >>>>> + ALU1(RNDU) >>>>> + ALU1(RNDZ) >>>>> + ALU2(SAD2) >>>>> + ALU2_ACC(SADA2) >>>>> + ALU2(SEL) >>>>> + ALU2(SHL) >>>>> + ALU2(SHR) >>>>> + ALU2_ACC(SUBB) >>>>> + ALU2(XOR) >>>>> + >>>>> +#undef ALU3 >>>>> +#undef ALU2_ACC >>>>> +#undef ALU2 >>>>> +#undef ALU1 >>>>> + /** @} */ >>>>> + >>>>> + /** >>>>> + * CMP: Sets the low bit of the destination channels with the >>>>> result >>>>> + * of the comparison, while the upper bits are undefined, and >>>>> updates >>>>> + * the flag register with the packed 16 bits of the result. >>>>> + */ >>>>> + instruction * >>>>> + CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1, >>>>> + brw_conditional_mod condition) const >>>>> + { >>>>> + /* Take the instruction: >>>>> + * >>>>> + * CMP null<d> src0<f> src1<f> >>>>> + * >>>>> + * Original gen4 does type conversion to the destination type >>>>> + * before comparison, producing garbage results for floating >>>>> + * point comparisons. >>>>> + * >>>>> + * The destination type doesn't matter on newer generations, >>>>> + * so we set the type to match src0 so we can compact the >>>>> + * instruction. >>>>> + */ >>>>> + return set_condmod(condition, >>>>> + emit(BRW_OPCODE_CMP, retype(dst, src0.type), >>>>> + fix_unsigned_negate(src0), >>>>> + fix_unsigned_negate(src1))); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Gen4 predicated IF. >>>>> + */ >>>>> + instruction * >>>>> + IF(brw_predicate predicate) const >>>>> + { >>>>> + instruction *inst = emit(BRW_OPCODE_IF); >>>>> + return set_predicate(predicate, inst); >>>>> + } >>>>> + >>>>> + /** >>>>> + * Gen6 IF with embedded comparison. >>>>> + */ >>>>> + instruction * >>>>> + IF(const src_reg &src0, const src_reg &src1, >>>>> + brw_conditional_mod condition) const >>>>> + { >>>>> + assert(shader->devinfo->gen == 6); >>>>> + return set_condmod(condition, >>>>> + emit(BRW_OPCODE_IF, >>>>> + null_reg_d(), >>>>> + fix_unsigned_negate(src0), >>>>> + fix_unsigned_negate(src1))); >>>> >>>> I don't see that we were calling resolve_ud_negate() on >>>> fs_visitor::emit_if_gen6. >>>> >>> Right, but shouldn't any instruction comparing unsigned values which are >>> potentially negated call resolve_ud_negate()? >> >> From looking at the Sandybridge PRM, the IF with embedded conditional >> cannot do source modifiers, so I think the answer is no. >> > I suspect that the PRM is wrong, IIRC I checked this on real hardware > and the simulator, and the embedded conditional on SNB behaves largely > like the normal CMP instruction, source modifiers work as usual (though > we didn't actually make use of them AFAIK).
All the more reason that this should be done as a separate patch with an explanation. >>>> In fact, we've removed the function entirely (a known regression in >>>> the switch to NIR). I'd drop this for now. It's unused as well. >>>> >>> If there are still plans to restore this functionality, I would rather >>> leave it alone rather than remove it now for somebody else to have to >>> rewrite it later on, because it's less work and doesn't seem to hurt. >> >> But moreover, again I don't like hidden changes like this in what is >> sold as a refactor. >> > It was never my intention to sell this as a refactor. Are you just arguing semantics? Call it what you will -- the series is refactoring a bunch of code. >> Even if it had been a bug in the original code, this patch would *not* >> have been the appropriate place to fix it. >> > Heh, OK, next time I'm rewriting code I'll try to pay more attention to > preserve the bugs I notice in the original code. Drop the snark. I shouldn't have to explain to you that huge copy-and-paste blocks shouldn't have hidden functional changes. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev