Matt Turner <matts...@gmail.com> writes: > On Fri, Jun 5, 2015 at 4:13 PM, Francisco Jerez <curroje...@riseup.net> wrote: >> Matt Turner <matts...@gmail.com> writes: >> >>> On Fri, Jun 5, 2015 at 1:42 PM, Francisco Jerez <curroje...@riseup.net> >>> wrote: >>>> Matt Turner <matts...@gmail.com> writes: >>>> >>>>> On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <curroje...@riseup.net> >>>>> wrote: >>>>>> The purpose of this change is threefold: First, it improves the >>>>>> modularity of the compiler back-end by separating the functionality >>>>>> required to construct an i965 IR program from the rest of the visitor >>>>>> god-object, what in turn will reduce the coupling between other >>>>>> components and the visitor allowing a more modular design. This patch >>>>>> doesn't yet remove the equivalent functionality from the visitor >>>>>> classes, that task will be undertaken by a separate series, as it >>>>>> involves major back-end surgery. >>>>>> >>>>>> Second, it improves consistency between the scalar and vector >>>>>> back-ends. The FS and VEC4 builders can both be used to generate >>>>>> scalar code with a compatible interface or they can be used to >>>>>> generate natural vector width code -- 1 or 4 components respectively. >>>>>> >>>>>> Third, the approach to IR construction is somewhat different to what >>>>>> the visitor classes currently do. All parameters affecting code >>>>>> generation (execution size, half control, point in the program where >>>>>> new instructions are inserted, etc.) are encapsulated in a stand-alone >>>>>> object rather than being quasi-global state (yes, anything defined in >>>>>> one of the visitor classes is effectively global due to the tight >>>>>> coupling with virtually everything else in the compiler back-end). >>>>>> This object is lightweight and can be copied, mutated and passed >>>>>> around, making helper IR-building functions more flexible because they >>>>>> can now simply take a builder object as argument and will inherit its >>>>>> IR generation properties in exactly the same way that a discrete >>>>>> instruction would from the same builder object. >>>>>> >>>>>> The emit_typed_write() function from my image-load-store branch is an >>>>>> example that illustrates the usefulness of the latter point: Due to >>>>>> hardware limitations the function may have to split the untyped >>>>>> surface message in 8-wide chunks. That means that the several >>>>>> functions called to help with the construction of the message payload >>>>>> are themselves required to set the execution width and half control >>>>>> correctly on the instructions they emit, and to allocate all registers >>>>>> with half the default width. With the previous approach this would >>>>>> require the used helper functions to be aware of the parameters that >>>>>> might differ from the default state and explicitly set the instruction >>>>>> bits accordingly. With the new approach they would get a modified >>>>>> builder object as argument that would influence all instructions >>>>>> emitted by the helper function as if it were the default state. >>>>>> >>>>>> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() >>>>>> method. It doesn't actually emit any instructions, they are simply >>>>>> created and inserted into an exec_list which is returned for the >>>>>> caller to emit at some location of the program. This sort of two-step >>>>>> emission becomes unnecessary with the builder interface because the >>>>>> insertion point is one more of the code generation parameters which >>>>>> are part of the builder object. The caller can simply pass >>>>>> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the >>>>>> location of the program where the effect of the constant load is >>>>>> desired. This two-step emission (which pervades the compiler back-end >>>>>> and is in most cases redundant) goes away: E.g. ADD() now actually >>>>>> adds two registers rather than just creating an ADD instruction in >>>>>> memory, emit(ADD()) is no longer necessary. >>>>>> >>>>>> v2: Drop scalarizing VEC4 builder. >>>>>> v3: Take a backend_shader as constructor argument. Improve handling >>>>>> of debug annotations and execution control flags. >>>>>> --- >>>>>> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >>>>>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 >>>>>> +++++++++++++++++++++++++++++ >>>>>> 2 files changed, 660 insertions(+) >>>>>> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>> >>>>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources >>>>>> b/src/mesa/drivers/dri/i965/Makefile.sources >>>>>> index 3f852cd..93f336e 100644 >>>>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources >>>>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources >>>>>> @@ -42,6 +42,7 @@ i965_FILES = \ >>>>>> brw_ff_gs.c \ >>>>>> brw_ff_gs_emit.c \ >>>>>> brw_ff_gs.h \ >>>>>> + brw_fs_builder.h \ >>>>>> brw_fs_channel_expressions.cpp \ >>>>>> brw_fs_cmod_propagation.cpp \ >>>>>> brw_fs_combine_constants.cpp \ >>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>> new file mode 100644 >>>>>> index 0000000..e528180 >>>>>> --- /dev/null >>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>>>>> @@ -0,0 +1,659 @@ >>>>>> +/* -*- c++ -*- */ >>>>>> +/* >>>>>> + * Copyright © 2010-2015 Intel Corporation >>>>>> + * >>>>>> + * Permission is hereby granted, free of charge, to any person >>>>>> obtaining a >>>>>> + * copy of this software and associated documentation files (the >>>>>> "Software"), >>>>>> + * to deal in the Software without restriction, including without >>>>>> limitation >>>>>> + * the rights to use, copy, modify, merge, publish, distribute, >>>>>> sublicense, >>>>>> + * and/or sell copies of the Software, and to permit persons to whom the >>>>>> + * Software is furnished to do so, subject to the following conditions: >>>>>> + * >>>>>> + * The above copyright notice and this permission notice (including the >>>>>> next >>>>>> + * paragraph) shall be included in all copies or substantial portions >>>>>> of the >>>>>> + * Software. >>>>>> + * >>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>>>>> EXPRESS OR >>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>>>>> MERCHANTABILITY, >>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >>>>>> SHALL >>>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>>>>> OTHER >>>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >>>>>> ARISING >>>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>>>>> DEALINGS >>>>>> + * IN THE SOFTWARE. >>>>>> + */ >>>>>> + >>>>>> +#ifndef BRW_FS_BUILDER_H >>>>>> +#define BRW_FS_BUILDER_H >>>>>> + >>>>>> +#include "brw_ir_fs.h" >>>>>> +#include "brw_shader.h" >>>>>> +#include "brw_context.h" >>>>>> + >>>>>> +namespace brw { >>>>>> + /** >>>>>> + * Toolbox to assemble an FS IR program out of individual >>>>>> instructions. >>>>>> + * >>>>>> + * This object is meant to have an interface consistent with >>>>>> + * brw::vec4_builder. They cannot be fully interchangeable because >>>>>> + * brw::fs_builder generates scalar code while brw::vec4_builder >>>>>> generates >>>>>> + * vector code. >>>>>> + */ >>>>>> + class fs_builder { >>>>>> + public: >>>>>> + /** Type used in this IR to represent a source of an instruction. >>>>>> */ >>>>>> + typedef fs_reg src_reg; >>>>>> + >>>>>> + /** Type used in this IR to represent the destination of an >>>>>> instruction. */ >>>>>> + typedef fs_reg dst_reg; >>>>>> + >>>>>> + /** Type used in this IR to represent an instruction. */ >>>>>> + typedef fs_inst instruction; >>>>>> + >>>>>> + /** >>>>>> + * Construct an fs_builder appending instructions at the end of >>>>>> + * the instruction list of \p shader. \p dispatch_width gives >>>>>> + * the native execution width of the program. >>>>>> + */ >>>>>> + fs_builder(backend_shader *shader, >>>>>> + unsigned dispatch_width) : >>>>>> + shader(shader), block(NULL), >>>>>> + cursor((exec_node *)&shader->instructions.tail), >>>>>> + _dispatch_width(dispatch_width), >>>>>> + _group(0), >>>>>> + force_writemask_all(false), >>>>>> + annotation() >>>>>> + { >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Construct an fs_builder that inserts instructions before \p >>>>>> cursor in >>>>>> + * basic block \p block, inheriting other code generation >>>>>> parameters >>>>>> + * from this. >>>>>> + */ >>>>>> + fs_builder >>>>>> + at(bblock_t *block, exec_node *cursor) const >>>>>> + { >>>>>> + fs_builder bld = *this; >>>>>> + bld.block = block; >>>>>> + bld.cursor = cursor; >>>>>> + return bld; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Construct a builder specifying the default SIMD width and >>>>>> group of >>>>>> + * channel enable signals, inheriting other code generation >>>>>> parameters >>>>>> + * from this. >>>>>> + * >>>>>> + * \p n gives the default SIMD width, \p i gives the slot group >>>>>> used for >>>>>> + * predication and control flow masking in multiples of \p n >>>>>> channels. >>>>>> + */ >>>>>> + fs_builder >>>>>> + group(unsigned n, unsigned i) const >>>>>> + { >>>>>> + assert(n <= dispatch_width() && >>>>>> + i < dispatch_width() / n); >>>>>> + fs_builder bld = *this; >>>>>> + bld._dispatch_width = n; >>>>>> + bld._group += i * n; >>>>>> + return bld; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Alias for group() with width equal to eight. >>>>>> + */ >>>>>> + fs_builder >>>>>> + half(unsigned i) const >>>>>> + { >>>>>> + return group(8, i); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Construct a builder with per-channel control flow execution >>>>>> masking >>>>>> + * disabled if \p b is true. If control flow execution masking is >>>>>> + * already disabled this has no effect. >>>>>> + */ >>>>>> + fs_builder >>>>>> + exec_all(bool b = true) const >>>>>> + { >>>>>> + fs_builder bld = *this; >>>>>> + if (b) >>>>>> + bld.force_writemask_all = true; >>>>>> + return bld; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Construct a builder with the given debug annotation info. >>>>>> + */ >>>>>> + fs_builder >>>>>> + annotate(const char *str, const void *ir = NULL) const >>>>>> + { >>>>>> + fs_builder bld = *this; >>>>>> + bld.annotation.str = str; >>>>>> + bld.annotation.ir = ir; >>>>>> + return bld; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Get the SIMD width in use. >>>>>> + */ >>>>>> + unsigned >>>>>> + dispatch_width() const >>>>>> + { >>>>>> + return _dispatch_width; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Allocate a virtual register of natural vector size (one for >>>>>> this IR) >>>>>> + * and SIMD width. \p n gives the amount of space to allocate in >>>>>> + * dispatch_width units (which is just enough space for one >>>>>> logical >>>>>> + * component in this IR). >>>>>> + */ >>>>>> + dst_reg >>>>>> + vgrf(enum brw_reg_type type, unsigned n = 1) const >>>>>> + { >>>>>> + return dst_reg(GRF, shader->alloc.allocate( >>>>>> + DIV_ROUND_UP(n * type_sz(type) * >>>>>> dispatch_width(), >>>>>> + REG_SIZE)), >>>>>> + type, dispatch_width()); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create a null register of floating type. >>>>>> + */ >>>>>> + dst_reg >>>>>> + null_reg_f() const >>>>>> + { >>>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>>>>> + BRW_REGISTER_TYPE_F)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create a null register of signed integer type. >>>>>> + */ >>>>>> + dst_reg >>>>>> + null_reg_d() const >>>>>> + { >>>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>>>>> + BRW_REGISTER_TYPE_D)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create a null register of unsigned integer type. >>>>>> + */ >>>>>> + dst_reg >>>>>> + null_reg_ud() const >>>>>> + { >>>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()), >>>>>> + BRW_REGISTER_TYPE_UD)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Get the mask of SIMD channels enabled by dispatch and not yet >>>>>> + * disabled by discard. >>>>>> + */ >>>>>> + src_reg >>>>>> + sample_mask_reg() const >>>>>> + { >>>>>> + const bool uses_kill = >>>>>> + (shader->stage == MESA_SHADER_FRAGMENT && >>>>>> + ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill); >>>>>> + return (shader->stage != MESA_SHADER_FRAGMENT ? >>>>>> src_reg(0xffff) : >>>>>> + uses_kill ? brw_flag_reg(0, 1) : >>>>>> + retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Insert an instruction into the program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(const instruction &inst) const >>>>>> + { >>>>>> + return emit(new(shader->mem_ctx) instruction(inst)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create and insert a nullary control instruction into the >>>>>> program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(enum opcode opcode) const >>>>>> + { >>>>>> + return emit(instruction(opcode, dispatch_width())); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create and insert a nullary instruction into the program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(enum opcode opcode, const dst_reg &dst) const >>>>>> + { >>>>>> + return emit(instruction(opcode, dst)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create and insert a unary instruction into the program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) >>>>>> const >>>>>> + { >>>>>> + switch (opcode) { >>>>>> + case SHADER_OPCODE_RCP: >>>>>> + case SHADER_OPCODE_RSQ: >>>>>> + case SHADER_OPCODE_SQRT: >>>>>> + case SHADER_OPCODE_EXP2: >>>>>> + case SHADER_OPCODE_LOG2: >>>>>> + case SHADER_OPCODE_SIN: >>>>>> + case SHADER_OPCODE_COS: >>>>>> + return fix_math_instruction( >>>>>> + emit(instruction(opcode, dst.width, dst, >>>>>> + fix_math_operand(src0)))); >>>>>> + >>>>>> + default: >>>>>> + return emit(instruction(opcode, dst.width, dst, src0)); >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create and insert a binary instruction into the program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, >>>>>> + const src_reg &src1) const >>>>>> + { >>>>>> + switch (opcode) { >>>>>> + case SHADER_OPCODE_POW: >>>>>> + case SHADER_OPCODE_INT_QUOTIENT: >>>>>> + case SHADER_OPCODE_INT_REMAINDER: >>>>>> + return fix_math_instruction( >>>>>> + emit(instruction(opcode, dst.width, dst, >>>>>> + fix_math_operand(src0), >>>>>> + fix_math_operand(src1)))); >>>>>> + >>>>>> + default: >>>>>> + return emit(instruction(opcode, dst.width, dst, src0, >>>>>> src1)); >>>>>> + >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Create and insert a ternary instruction into the program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0, >>>>>> + const src_reg &src1, const src_reg &src2) const >>>>>> + { >>>>>> + switch (opcode) { >>>>>> + case BRW_OPCODE_BFE: >>>>>> + case BRW_OPCODE_BFI2: >>>>>> + case BRW_OPCODE_MAD: >>>>>> + case BRW_OPCODE_LRP: >>>>>> + return emit(instruction(opcode, dst.width, dst, >>>>>> + fix_3src_operand(src0), >>>>>> + fix_3src_operand(src1), >>>>>> + fix_3src_operand(src2))); >>>>>> + >>>>>> + default: >>>>>> + return emit(instruction(opcode, dst.width, dst, src0, src1, >>>>>> src2)); >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Insert a preallocated instruction into the program. >>>>>> + */ >>>>>> + instruction * >>>>>> + emit(instruction *inst) const >>>>>> + { >>>>>> + assert(inst->exec_size == dispatch_width() || >>>>>> + force_writemask_all); >>>>>> + assert(_group == 0 || _group == 8); >>>>>> + >>>>>> + inst->force_sechalf = (_group == 8); >>>>>> + inst->force_writemask_all = force_writemask_all; >>>>>> + inst->annotation = annotation.str; >>>>>> + inst->ir = annotation.ir; >>>>>> + >>>>>> + if (block) >>>>>> + static_cast<instruction *>(cursor)->insert_before(block, >>>>>> inst); >>>>>> + else >>>>>> + cursor->insert_before(inst); >>>>>> + >>>>>> + return inst; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Select \p src0 if the comparison of both sources with the given >>>>>> + * conditional mod evaluates to true, otherwise select \p src1. >>>>>> + * >>>>>> + * Generally useful to get the minimum or maximum of two values. >>>>>> + */ >>>>>> + void >>>>>> + emit_minmax(const dst_reg &dst, const src_reg &src0, >>>>>> + const src_reg &src1, brw_conditional_mod mod) const >>>>> >>>>> This is unused, after Jason's commit: >>>>> >>>>> commit 78644ffc4d341deb431145108f0b2d377e59b61e >>>>> Author: Jason Ekstrand <jason.ekstr...@intel.com> >>>>> Date: Wed May 20 10:35:34 2015 -0700 >>>>> >>>>> i965/fs: Remove the ir_visitor code >>>>> >>>>> Now that everything is running through NIR, this is all dead. >>>>> >>>>> Reviewed-by: Matt Turner <matts...@gmail.com> >>>>> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> >>>>> >>>> >>>> I was planning to use it in my ARB_shader_image_load_store series. It >>>> may also make sense to fix brw_fs_nir.cpp to use it, as it would >>>> probably simplify the code slightly. I can make the changes as part of >>>> PATCH 32 if you like. >>> >>> Oh, okay. I didn't realize that brw_fs_nir.cpp had it open-coded. >>> Yeah, it's fine to leave it in here. >>> >>>> >>>>>> + { >>>>>> + if (shader->devinfo->gen >= 6) { >>>>>> + set_condmod(mod, SEL(dst, fix_unsigned_negate(src0), >>>>>> + fix_unsigned_negate(src1))); >>>>>> + } else { >>>>>> + CMP(null_reg_d(), src0, src1, mod); >>>>>> + set_predicate(BRW_PREDICATE_NORMAL, >>>>>> + SEL(dst, src0, src1)); >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Copy any live channel from \p src to the first channel of \p >>>>>> dst. >>>>>> + */ >>>>>> + void >>>>>> + emit_uniformize(const dst_reg &dst, const src_reg &src) const >>>>>> + { >>>>>> + const fs_builder ubld = exec_all(); >>>>>> + const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD); >>>>>> + >>>>>> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, >>>>>> component(chan_index, 0)); >>>>>> + ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0), >>>>>> + src, component(chan_index, 0)); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Assorted arithmetic ops. >>>>>> + * @{ >>>>>> + */ >>>>>> +#define ALU1(op) \ >>>>>> + instruction * \ >>>>>> + op(const dst_reg &dst, const src_reg &src0) const \ >>>>>> + { \ >>>>>> + return emit(BRW_OPCODE_##op, dst, src0); \ >>>>>> + } >>>>>> + >>>>>> +#define ALU2(op) >>>>>> \ >>>>>> + instruction * >>>>>> \ >>>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) >>>>>> const \ >>>>>> + { >>>>>> \ >>>>>> + return emit(BRW_OPCODE_##op, dst, src0, src1); >>>>>> \ >>>>>> + } >>>>>> + >>>>>> +#define ALU2_ACC(op) >>>>>> \ >>>>>> + instruction * >>>>>> \ >>>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) >>>>>> const \ >>>>>> + { >>>>>> \ >>>>>> + instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1); >>>>>> \ >>>>>> + inst->writes_accumulator = true; >>>>>> \ >>>>>> + return inst; >>>>>> \ >>>>>> + } >>>>>> + >>>>>> +#define ALU3(op) >>>>>> \ >>>>>> + instruction * >>>>>> \ >>>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1, >>>>>> \ >>>>>> + const src_reg &src2) const >>>>>> \ >>>>>> + { >>>>>> \ >>>>>> + return emit(BRW_OPCODE_##op, dst, src0, src1, src2); >>>>>> \ >>>>>> + } >>>>>> + >>>>>> + ALU2(ADD) >>>>>> + ALU2_ACC(ADDC) >>>>>> + ALU2(AND) >>>>>> + ALU2(ASR) >>>>>> + ALU2(AVG) >>>>>> + ALU3(BFE) >>>>>> + ALU2(BFI1) >>>>>> + ALU3(BFI2) >>>>>> + ALU1(BFREV) >>>>>> + ALU1(CBIT) >>>>>> + ALU2(CMPN) >>>>>> + ALU3(CSEL) >>>>>> + ALU2(DP2) >>>>>> + ALU2(DP3) >>>>>> + ALU2(DP4) >>>>>> + ALU2(DPH) >>>>>> + ALU1(F16TO32) >>>>>> + ALU1(F32TO16) >>>>>> + ALU1(FBH) >>>>>> + ALU1(FBL) >>>>>> + ALU1(FRC) >>>>>> + ALU2(LINE) >>>>>> + ALU1(LZD) >>>>>> + ALU2(MAC) >>>>>> + ALU2_ACC(MACH) >>>>>> + ALU3(MAD) >>>>>> + ALU1(MOV) >>>>>> + ALU2(MUL) >>>>>> + ALU1(NOT) >>>>>> + ALU2(OR) >>>>>> + ALU2(PLN) >>>>>> + ALU1(RNDD) >>>>>> + ALU1(RNDE) >>>>>> + ALU1(RNDU) >>>>>> + ALU1(RNDZ) >>>>>> + ALU2(SAD2) >>>>>> + ALU2_ACC(SADA2) >>>>>> + ALU2(SEL) >>>>>> + ALU2(SHL) >>>>>> + ALU2(SHR) >>>>>> + ALU2_ACC(SUBB) >>>>>> + ALU2(XOR) >>>>>> + >>>>>> +#undef ALU3 >>>>>> +#undef ALU2_ACC >>>>>> +#undef ALU2 >>>>>> +#undef ALU1 >>>>>> + /** @} */ >>>>>> + >>>>>> + /** >>>>>> + * CMP: Sets the low bit of the destination channels with the >>>>>> result >>>>>> + * of the comparison, while the upper bits are undefined, and >>>>>> updates >>>>>> + * the flag register with the packed 16 bits of the result. >>>>>> + */ >>>>>> + instruction * >>>>>> + CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1, >>>>>> + brw_conditional_mod condition) const >>>>>> + { >>>>>> + /* Take the instruction: >>>>>> + * >>>>>> + * CMP null<d> src0<f> src1<f> >>>>>> + * >>>>>> + * Original gen4 does type conversion to the destination type >>>>>> + * before comparison, producing garbage results for floating >>>>>> + * point comparisons. >>>>>> + * >>>>>> + * The destination type doesn't matter on newer generations, >>>>>> + * so we set the type to match src0 so we can compact the >>>>>> + * instruction. >>>>>> + */ >>>>>> + return set_condmod(condition, >>>>>> + emit(BRW_OPCODE_CMP, retype(dst, src0.type), >>>>>> + fix_unsigned_negate(src0), >>>>>> + fix_unsigned_negate(src1))); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Gen4 predicated IF. >>>>>> + */ >>>>>> + instruction * >>>>>> + IF(brw_predicate predicate) const >>>>>> + { >>>>>> + instruction *inst = emit(BRW_OPCODE_IF); >>>>>> + return set_predicate(predicate, inst); >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Gen6 IF with embedded comparison. >>>>>> + */ >>>>>> + instruction * >>>>>> + IF(const src_reg &src0, const src_reg &src1, >>>>>> + brw_conditional_mod condition) const >>>>>> + { >>>>>> + assert(shader->devinfo->gen == 6); >>>>>> + return set_condmod(condition, >>>>>> + emit(BRW_OPCODE_IF, >>>>>> + null_reg_d(), >>>>>> + fix_unsigned_negate(src0), >>>>>> + fix_unsigned_negate(src1))); >>>>> >>>>> I don't see that we were calling resolve_ud_negate() on >>>>> fs_visitor::emit_if_gen6. >>>>> >>>> Right, but shouldn't any instruction comparing unsigned values which are >>>> potentially negated call resolve_ud_negate()? >>> >>> From looking at the Sandybridge PRM, the IF with embedded conditional >>> cannot do source modifiers, so I think the answer is no. >>> >> I suspect that the PRM is wrong, IIRC I checked this on real hardware >> and the simulator, and the embedded conditional on SNB behaves largely >> like the normal CMP instruction, source modifiers work as usual (though >> we didn't actually make use of them AFAIK). > > All the more reason that this should be done as a separate patch with > an explanation. > >>>>> In fact, we've removed the function entirely (a known regression in >>>>> the switch to NIR). I'd drop this for now. It's unused as well. >>>>> >>>> If there are still plans to restore this functionality, I would rather >>>> leave it alone rather than remove it now for somebody else to have to >>>> rewrite it later on, because it's less work and doesn't seem to hurt. >>> >>> But moreover, again I don't like hidden changes like this in what is >>> sold as a refactor. >>> >> It was never my intention to sell this as a refactor. > > Are you just arguing semantics? > > Call it what you will -- the series is refactoring a bunch of code. > >>> Even if it had been a bug in the original code, this patch would *not* >>> have been the appropriate place to fix it. >>> >> Heh, OK, next time I'm rewriting code I'll try to pay more attention to >> preserve the bugs I notice in the original code. > > Drop the snark. I shouldn't have to explain to you that huge > copy-and-paste blocks shouldn't have hidden functional changes.
Sorry for the snark, I couldn't help to answer humorously to your apparent anger at my rewrite being more careful than the original. No, it wasn't just a copy-and-paste block, the one function we've been arguing about had little resemblance (except maybe in the Doxygen comment above) to the original, you could have guessed I had rewritten it fully. And no, it wasn't a functional change, it matched the behavior of the original function exactly, for the whole set of inputs for which the original wouldn't generate broken code. If we didn't have a bug that is. Anyway, I've dropped the definition of this unnecessary function altogether, as you requested.
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev