It is now mostly working, and I've incorporated a bunch of the feedback I got on the last posting. I promised to have it out on the list by american office hours today, so here it goes.
There is a massive issue with the compare_entries function. I'm hacking on that atm, trying to come up with a solution I'm comfortable with, that does not cause bugs. There is also an issue with the way I insert new consts. There's probably something there that I have not understood. Below follows a list of changes from last version: Add pessimation of branching on undef (based on llvm's aproach) Incorporate some of Connors feedback Incorporate Matt's feedback Rework set_float_range API Reworked to not do a separate initialization of everything Some cleanups of unneeded variables Handle loops differently Add a visit-counter to prevent endless looping (This has proven to not do anything, could be dropped) Squash a bug where we where adding ourself as user instead of the user (Same accidental mistake I had with uses in the LCSSA pass) Calculate a lot of the things we can trivially compute. Special case calculation of abs. We are now detecting bcsel(x, 0, 0) (four cases) abs(0) (two cases) Remove support for phi's: It was buggy and hard to get right, as type-information was kinda hard to get hold of. A quick statistics gathering showed phi-nodes to amount to only 0.24% of the ssa-defs, and so one can assume that getting them right would not give us any substantial benefit. Phi-nodes might have some saying in if we are able to remove complete nodes/blocks? If so, then we should probably revisit this decision, but for now it seems the return-on-investment and risk-reward ratio does not justify dealing with them. Signed-off-by: Thomas Helland <thomashellan...@gmail.com> --- src/glsl/Makefile.sources | 1 + src/glsl/nir/nir.h | 2 + src/glsl/nir/nir_opt_value_range.c | 1570 ++++++++++++++++++++++++++++++++++++ 3 files changed, 1573 insertions(+) create mode 100644 src/glsl/nir/nir_opt_value_range.c diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index 4f79fee..0b971c9 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -56,6 +56,7 @@ NIR_FILES = \ nir/nir_opt_peephole_ffma.c \ nir/nir_opt_peephole_select.c \ nir/nir_opt_remove_phis.c \ + nir/nir_opt_value_range.c \ nir/nir_print.c \ nir/nir_remove_dead_variables.c \ nir/nir_search.c \ diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index aba653e..19202a7 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1707,6 +1707,8 @@ bool nir_opt_peephole_ffma(nir_shader *shader); bool nir_opt_remove_phis(nir_shader *shader); +bool nir_opt_value_range(nir_shader *shader); + void nir_sweep(nir_shader *shader); #ifdef __cplusplus diff --git a/src/glsl/nir/nir_opt_value_range.c b/src/glsl/nir/nir_opt_value_range.c new file mode 100644 index 0000000..616ec1c --- /dev/null +++ b/src/glsl/nir/nir_opt_value_range.c @@ -0,0 +1,1570 @@ +/* + * Copyright © 2015 Thomas Helland + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "nir.h" +#include "nir_ssa_def_worklist.h" +#include "nir_block_worklist.h" +#include "nir_constant_expressions.h" + +/* This pass implements an extension of + * "Constant Propagation with Conditional Branches" by Wegman and Zadeck + * that also handles value ranges. This is useful as a lot of shaders have + * min/max expressions that can be eliminated, or conditionals that we can + * prove to be false or true due to previously applied restrictions on range. + * Value range propagation is a superset of tracking constant values, so this + * pass eliminates the need for a separate constant propagation pass. The + * pass is optimistic, meaning we assume all variables are constant (or have + * restricted range) and disprove it. A pessimistic algorithm would assume + * all values where undeterminable, and then propagate expressions we know + * to be constant through the program. An optimistic algorithm gets better + * results than a pessimistic, with the downside that it can not be aborted + * halfway through as the information gathered may be wrong. + * + * The lattice types are: + * undefined: Variable may be constant or range-restricted + * (We have not yet gotten to them in the pass) + * constant: Value is determined to be constant + * range: Value is determined to be range-restricted + * overdefined: We cannot guarantee the value is constant or range-restricted + * + * We extend the lattice from the SCCP pass so that constant entries are + * changed to inclusive ranges for each vector component. The join rules are: + * + * undefined join undefined = undefined + * undefined join overdefined = overdefined + * overdefined join overdefined = overdefined + * [low, high] join overdefined = overdefined + * [low, high] join undefined = [low, high] + * [low1, high1] join [low2, high2] = [min(low1, low2), max(high1, high2)] + * + * These rules are general pessimistic rules. There may be situations where + * we can still determine parts of the range of the variable, even though it + * has an overdefined input (i.e. max, min, sat, abs). This is also true for + * boolean operations like AND and OR. These can be determined even if we + * know only one of the operators. + * + * We don't preserve the range perfectly for a variable, as we combine two + * ranges for a variable into a range of: + * [min(low1, low2), max(high1, high2)] + * Preserving the non-continuous range information would greatly complicate + * the pass, and is therefore not implemented. + * + * There is one interesting situation that is hard to deal with: + * When we find that something is dead code, but it does not become a + * constant value. Examples are things like max(sin(x), y) where y > 1. + * We know sin(x) is dead code, but the result is not a constant but instead + * an ssa-def with a range. We mark this in the range-state so that we can + * eliminate it after the pass is done. This means that the pass should be + * rerun if we resolve one of these, as we will then have simplified the + * program, and new ranges may be resolved. + * + * When the pass is done all "undefined" values should be determined as + * either const, range, or overdefined. (Except for in blocks that are + * marked as unreachable, as they may have not been processed) + */ + +/* XXX: Review feedback from Matt + * Food for thought: a range isn't really what you want for bools. Maybe we + * can think about extending this analysis to know about disjount ranges, + * or maybe even just a flag to say "it's either high or low". + */ + +/* XXX: + * We need a mechanism for overflow protection for integer and + * unsigned integer calculations as we may end up adding two values, + * overflowing the datatype, and wreck havoc as low may now be high, + * and vice versa. + */ + +#define IS_FLOAT_CONSTANT(const_value, operator, operand, num_components) \ + ((num_components >= 1) ? \ + const_value.f[0] operator operand && \ + ((num_components >= 2) ? \ + const_value.f[1] operator operand && \ + ((num_components >= 3) ? \ + const_value.f[2] operator operand && \ + ((num_components == 4) ? \ + const_value.f[3] operator operand : \ + true) : \ + true) : \ + true) : \ + false) + +#define IS_INT_CONSTANT(const_value, operator, operand, num_components) \ + ((num_components >= 1) ? \ + const_value.i[0] operator operand && \ + ((num_components >= 2) ? \ + const_value.i[1] operator operand && \ + ((num_components >= 3) ? \ + const_value.i[2] operator operand && \ + ((num_components == 4) ? \ + const_value.i[3] operator operand : \ + true) : \ + true) : \ + true) : \ + false) + +#define IS_UNSIGNED_CONSTANT(const_value, operator, operand, num_components) \ + ((num_components >= 1) ? \ + const_value.u[0] operator operand && \ + ((num_components >= 2) ? \ + const_value.u[1] operator operand && \ + ((num_components >= 3) ? \ + const_value.u[2] operator operand && \ + ((num_components == 4) ? \ + const_value.u[3] operator operand : \ + true) : \ + true) : \ + true) : \ + false) + +typedef enum { + undefined, + range, + constant, + overdefined +} lattice_type; + +typedef enum { + low, + high, + both +} range_to_set; + +typedef struct { + /* Is this entry float, unsigned or something else? */ + nir_alu_type range_type; + + nir_const_value lo; + nir_const_value hi; + + /* What type of lattice is this */ + lattice_type type; + + nir_ssa_def *ssa_def; + bool in_loop; + + /* Visit-counter to prevent infinite looping */ + unsigned visits; +} lattice_entry; + +typedef struct { + /* Corresponds to SSA Work in the original paper */ + nir_ssa_def_worklist ssa_worklist; + + /* Work list of blocks, corresponding to the papers Flow work list */ + nir_block_worklist block_worklist; + + /* Keep track of which blocks are reachable */ + BITSET_WORD *reachable_blocks; + + /* Keep track of which blocks are in loops */ + BITSET_WORD *blocks_in_loop; + + /* An array of lattice_antries for all the ssa_defs */ + lattice_entry *entries; +} value_range_state; + +static lattice_entry * +get_lattice_entry(nir_ssa_def *def, value_range_state *state) +{ + lattice_entry *entry = &state->entries[def->index]; + + /* On the first run of this function the def may not have been added to + * the lattice_entry, so do that here. + */ + if (!entry->ssa_def) + entry->ssa_def = def; + + return entry; +} + +static bool +is_instr_supported(nir_instr_type type) +{ + switch (type) { + case nir_instr_type_alu: + case nir_instr_type_phi: + case nir_instr_type_load_const: + return true; + default: + return false; + } +} + +/* Returns true if this is a change in status of the entry. This simplifies + * checking if users of this entry should be added to the worklist. + */ +static bool +set_as_overdefined(lattice_entry *entry, nir_alu_type type) +{ + if (entry->type == overdefined && entry->range_type == type) + return false; + + entry->type = overdefined; + entry->range_type = type; + + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { + switch (type) { + case nir_type_float: + entry->hi.f[i] = INFINITY; + entry->lo.f[i] = -INFINITY; + break; + + case nir_type_int: + entry->hi.i[i] = INT32_MAX; + entry->lo.i[i] = INT32_MIN; + break; + + case nir_type_bool: + case nir_type_unsigned: + entry->hi.u[i] = UINT32_MAX; + entry->lo.u[i] = 0; + break; + case nir_type_invalid: + break; + } + } + + return true; +} + +static inline void +set_range_float(lattice_entry *entry, range_to_set to_set, float_t value) +{ + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { + + if (to_set == low || to_set == both) + entry->lo.f[i] = value; + + if (to_set == high || to_set == both) + entry->hi.f[i] = value; + + if (to_set == both) { + entry->type = constant; + } else { + entry->type = range; + } + } +} + +static inline void +set_range_int(lattice_entry *entry, range_to_set to_set, int32_t value) +{ + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { + + if (to_set == low || to_set == both) + entry->lo.i[i] = value; + + if (to_set == high || to_set == both) + entry->hi.i[i] = value; + + if (to_set == both) { + entry->type = constant; + } else { + entry->type = range; + } + } +} + +static inline void +set_range_uint(lattice_entry *entry, range_to_set to_set, uint32_t value) +{ + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { + + if (to_set == low || to_set == both) + entry->lo.u[i] = value; + + if (to_set == high || to_set == both) + entry->hi.u[i] = value; + + if (to_set == both) { + entry->type = constant; + } else { + entry->type = range; + } + } +} + +static bool +is_type_max(nir_const_value *value, nir_alu_type type, + unsigned num_components) +{ + for (unsigned i = 0; i < num_components; i++) { + switch (type) { + case nir_type_float: + if (value->f[i] != INFINITY) + return false; + break; + + case nir_type_int: + if (value->i[i] != INT32_MAX) + return false; + break; + + case nir_type_bool: + case nir_type_unsigned: + if (value->u[i] != UINT32_MAX) + return false; + break; + + case nir_type_invalid: + unreachable("not reached"); + } + } + + return true; +} + +static bool +is_type_min(nir_const_value *value, nir_alu_type type, + unsigned num_components) +{ + for (unsigned i = 0; i < num_components; i++) { + switch (type) { + case nir_type_float: + if (value->f[i] != -INFINITY) + return false; + break; + + case nir_type_int: + if (value->i[i] != INT32_MIN) + return false; + break; + + case nir_type_bool: + case nir_type_unsigned: + if (value->u[i] != 0) + return false; + break; + + case nir_type_invalid: + unreachable("not reached"); + } + } + + return true; +} + +static nir_const_value +per_component_max(nir_const_value src0, nir_const_value src1, + nir_alu_type type, unsigned num_components) +{ + nir_const_value value; + for (unsigned i = 0; i < num_components; i++) { + switch (type) { + case nir_type_float: + value.f[i] = MAX2(src0.f[i], src1.f[i]); + break; + case nir_type_int: + value.i[i] = MAX2(src0.i[i], src1.i[i]); + break; + case nir_type_bool: + case nir_type_unsigned: + value.u[i] = MAX2(src0.u[i], src1.u[i]); + break; + case nir_type_invalid: + range_print("Nir type invalid in max \n"); + unreachable("not reached"); + } + } + + return value; +} + +static nir_const_value +per_component_min(nir_const_value src0, nir_const_value src1, + nir_alu_type type, unsigned num_components) +{ + nir_const_value value; + for (unsigned i = 0; i < num_components; i++) { + switch (type) { + case nir_type_float: + value.f[i] = MIN2(src0.f[i], src1.f[i]); + break; + case nir_type_int: + value.i[i] = MIN2(src0.i[i], src1.i[i]); + break; + case nir_type_bool: + case nir_type_unsigned: + value.u[i] = MIN2(src0.u[i], src1.u[i]); + break; + case nir_type_invalid: + unreachable("not reached"); + } + } + + return value; +} + +typedef enum { + LESS, + LESS_EQUAL, + EQUAL, + GREATER_EQUAL, + GREATER, + MIXED, + COMPARE_FAIL +} compare_range; + +static compare_range +compare_entries(nir_alu_src *a, nir_alu_src *b, nir_alu_type type, + unsigned num_components, value_range_state *state) +{ + lattice_entry *entry_a = get_lattice_entry(a->src.ssa, state); + lattice_entry *entry_b = get_lattice_entry(b->src.ssa, state); + + bool foundless = false; + bool found_low_equal = false; + bool foundequal = false; + bool found_high_equal = false; + bool foundgreater = false; + + for (unsigned i = 0; i < num_components; i++) { + switch (type) { + case nir_type_float: + if (entry_a->hi.f[a->swizzle[i]] < entry_b->lo.f[b->swizzle[i]]) + foundless = true; + + if (entry_a->hi.f[a->swizzle[i]] == entry_b->lo.f[b->swizzle[i]]) + found_low_equal = true; + + if (entry_a->lo.f[a->swizzle[i]] > entry_b->hi.f[b->swizzle[i]]) + foundgreater = true; + + if (entry_a->lo.f[a->swizzle[i]] == entry_b->hi.f[b->swizzle[i]]) + found_high_equal = true; + + if (entry_a->hi.f[a->swizzle[i]] == entry_b->lo.f[b->swizzle[i]] && + entry_a->lo.f[a->swizzle[i]] == entry_b->hi.f[b->swizzle[i]]) + foundequal = true; + break; + case nir_type_int: + if (entry_a->hi.i[a->swizzle[i]] < entry_b->lo.i[b->swizzle[i]]) + foundless = true; + if (entry_a->lo.i[a->swizzle[i]] > entry_b->hi.i[b->swizzle[i]]) + foundgreater = true; + if (entry_a->lo.i[a->swizzle[i]] == entry_b->hi.i[b->swizzle[i]] && + entry_a->hi.i[a->swizzle[i]] == entry_b->lo.i[b->swizzle[i]]) + foundequal = true; + break; + case nir_type_unsigned: + case nir_type_bool: + if (entry_a->hi.u[a->swizzle[i]] < entry_b->lo.u[b->swizzle[i]]) + foundless = true; + if (entry_a->lo.u[a->swizzle[i]] > entry_b->hi.u[b->swizzle[i]]) + foundgreater = true; + if (entry_a->lo.u[a->swizzle[i]] == entry_b->hi.u[b->swizzle[i]] && + entry_a->hi.u[a->swizzle[i]] == entry_b->lo.u[b->swizzle[i]]) + foundequal = true; + break; + } + } + + if (foundequal) { + if (foundless) + return LESS_EQUAL; + if (foundgreater) + return GREATER_EQUAL; + return EQUAL; + } + + if (foundless) + return LESS; + if (foundgreater) + return GREATER; + + return COMPARE_FAIL; +} + +static bool +is_entry_overdefined(lattice_entry *entry) +{ + if (entry->type == overdefined) + return true; + + if (entry->type == constant) + return false; + + if (entry->type == undefined) + return false; + + /* This checks high and low to find out if the range is indeeed maximum + * and mininum of the type, and therefore in fact is overdefined. This + * can happen in a very trivial case like phi(a, b) where a = abs(x) and + * b = neg(abs(y)) and we don't know the range of either x or y. However, + * if we don't have a valid range type then we bail early. + */ + if (entry->range_type == nir_type_invalid) + return true; + + if (is_type_max(&entry->hi, entry->range_type, + entry->ssa_def->num_components) && + is_type_min(&entry->lo, entry->range_type, + entry->ssa_def->num_components)) + return true; + + return false; +} + +static bool +is_component_false(lattice_entry *entry, unsigned component) +{ + /* This can be simplified to check for 0x00000000 */ + return entry->lo.i[component] == 0x00000000 && + entry->hi.i[component] == 0x00000000; //XXX: This is not correct. Floats also have negative zero. Is this also false? +/* + switch (entry->range_type) { + case nir_type_int: + return entry->lo.i[component] == 0 && + entry->hi.i[component] == 0; + case nir_type_unsigned: + case nir_type_bool: + return entry->lo.u[component] == 0 && + entry->hi.u[component] == 0; + case nir_type_float: + return entry->lo.f[component] == 0.0f && + entry->hi.f[component] == 0.0f; + case nir_type_invalid: + unreachable("not reached"); + } +*/ + return false; +} + +static bool +is_entry_false(lattice_entry *entry) +{ + bool is_false = true; + + if (is_entry_overdefined(entry)) + return false; + + for (uint8_t i = 0; i < entry->ssa_def->num_components; i++) + is_false = is_false && is_component_false(entry, i); + + return is_false; +} + +static bool +is_component_true(lattice_entry *entry, unsigned component) +{ + + switch (entry->range_type) { + case nir_type_int: + return entry->lo.i[component] > 0 || + entry->hi.i[component] < 0; + case nir_type_unsigned: + case nir_type_bool: + return entry->lo.u[component] > 0; + case nir_type_float: + return entry->lo.f[component] > 0.0f || + entry->hi.f[component] < 0.0f; + case nir_type_invalid: + return false; // XXX: This is an issue. Should probably check the user for it's type, and use that here. + } + + return false; +} + +static bool +is_entry_true(lattice_entry *entry) +{ + bool is_true = true; + + if (is_entry_overdefined(entry)) + return false; + + for (uint8_t i = 0; i < entry->ssa_def->num_components; i++) + is_true = is_true && is_component_true(entry, i); + + return is_true; +} + +static bool +is_entry_constant(lattice_entry *entry, unsigned num_components) +{ + if (entry->type == constant) + return true; + + if (entry->type == overdefined) + return false; + + for (uint8_t i = 0; i < num_components; i++) { + if (entry->lo.u[i] != entry->hi.u[i]) + return false; + } + + entry->type = constant; + return true; +} + +static void +mark_block_reachable(nir_block *block, value_range_state *state) +{ + BITSET_SET(state->reachable_blocks, block->index); +} + +static bool +is_block_reachable(nir_block *block, value_range_state *state) +{ + return BITSET_TEST(state->reachable_blocks, block->index); +} + +static bool +special_case_alu(nir_ssa_def *def, value_range_state *state) +{ + lattice_entry *entry = get_lattice_entry(def, state); + + switch(nir_instr_as_alu(def->parent_instr)->op) { + case nir_op_fsat: + set_range_float(entry, low, 0.0f); + set_range_float(entry, high, 1.0f); + return true; + case nir_op_fsin: + case nir_op_fcos: + case nir_op_fsign: + set_range_float(entry, low, -1.0f); + set_range_float(entry, high, 1.0f); + return true; + case nir_op_fabs: + case nir_op_fexp2: + case nir_op_fpow: + case nir_op_frsq: + case nir_op_fsqrt: + set_range_float(entry, low, 0.0f); + set_range_float(entry, high, INFINITY); + return true; + case nir_op_iabs: + set_range_int(entry, low, 0); + set_range_int(entry, high, INT32_MAX); + return true; + case nir_op_isign: + set_range_int(entry, low, -1); + set_range_int(entry, high, 1); + return true; + + /* We have limited type-information for these, and they "falsely" + * advertise that they are integer oerations (?) causing us to set + * wrong values for the higher and lower bound. Set them as invalid + * so that we can infer the type-information from their uses later + * on if necessary. + */ + case nir_op_vec2: + case nir_op_vec3: + case nir_op_vec4: + set_as_overdefined(entry, nir_type_invalid); + return true; + default: + return false; + } +} + +static bool +is_trivially_computable(nir_op op) +{ + switch (op) { + + /* These just extract values, so are safe to compute */ + case nir_op_fmax: case nir_op_imax: case nir_op_umax: + case nir_op_fmin: case nir_op_imin: case nir_op_umin: + + /* We can have the following situation: + * + * vec4 ssa_59 = txl ssa_58 (coord), ssa_0 (lod), sampler0 (sampler) + * vec1 ssa_60 = imov ssa_59 + * vec1 ssa_61 = imov ssa_59.y + * vec1 ssa_62 = imov ssa_59.z + * vec1 ssa_63 = fmul ssa_59.y, ssa_7 + * + * We don't know the type of the txl operation, but we know that imow + * is an integer operation, and so we set ssa_59 and ssa_60 to + * [i_min, i_max]. That is not correct however, as we find out in ssa_63. + * The values are instead floats. So we convert the range of ssa_62 to + * float-min and max, as we know it is overdefined, and so it works out. + * This is a bit fragile, but should work. + */ + case nir_op_fmov: case nir_op_imov: + + /* Conditional selects also just extract values */ + case nir_op_bcsel: case nir_op_fcsel: + + /* These also just extract values, should be safe. However, we have + * troubles with type information, and so we compute wrong values */ +// case nir_op_vec2: case nir_op_vec3: case nir_op_vec4: + + /* These just limit the range, or flip it, and so are safe to compute */ + case nir_op_fsat: + case nir_op_fsign: case nir_op_isign: + case nir_op_fneg: case nir_op_ineg: + + /* These are not safe, as you can have a case where the input is + * [-inf, inf] which becomes [inf, inf] which is clearly wrong. This + * is dealth with in a special-casing in the compute-range section. + */ + case nir_op_fabs: case nir_op_iabs: + + /* These are also not safe. If we have X = [-inf, inf] and Y = [0, 0] + * it will never match, and so an is-equal comparison will always return + * false, although that is obviously wrong since it might very well match. + * This should also be handled separately. This can be checked by utilizing + * the compare_entries function. XXX: Fix this! + */ +// case nir_op_feq: case nir_op_ieq: case nir_op_seq: +// case nir_op_fge: case nir_op_ige: case nir_op_sge: case nir_op_uge: +// case nir_op_flt: case nir_op_ilt: case nir_op_slt: case nir_op_ult: +// case nir_op_fne: case nir_op_ine: case nir_op_sne: + + /* Logical not is safe. Most likely we can add "and" and "or" too */ + case nir_op_fnot: case nir_op_inot: + + return true; + + default: + return false; + + } +} + +static void +evaluate_alu_instr(nir_alu_instr *alu, value_range_state *state) +{ + lattice_entry *entry = get_lattice_entry(&alu->dest.dest.ssa, state); + lattice_entry *src[4]; + bool all_constant = true; + + /* XXX: How many times should we maximally visit an alu? This seems to never occur */ + if (entry->visits > 3) { + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); + return; + } + + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { + + src[i] = get_lattice_entry(alu->src[i].src.ssa, state); + + /* Check if any of the sources are overdefined. If one of them are then + * the result of this entry will also be overdefined. We may however + * have cases where we only select one of the operands(max,csel), or we + * are limiting the range of the operand (abs, sat). These can still be + * safely computed, as we will not get undefined results. + */ + if (is_entry_overdefined(src[i]) && !is_trivially_computable(alu->op)) { + + if (!special_case_alu(entry->ssa_def, state)) + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); + + entry->visits++; + return; + } + + if (src[i]->type == undefined) + return; + + unsigned num_components = nir_op_infos[alu->op].input_sizes[i]; + + if (num_components == 0) + num_components = alu->dest.dest.ssa.num_components; + + all_constant = all_constant && is_entry_constant(src[i], num_components); + } + + /** + * XXX: Snippet from the comment in nir.h + * + * For each input component, says which component of the register it is + * chosen from. Note that which elements of the swizzle are used and which + * are ignored are based on the write mask for most opcodes - for example, + * a statement like "foo.xzw = bar.zyx" would have a writemask of 1101b and + * a swizzle of {2, x, 1, 0} where x means "don't care." + * + * XXX: We handle swizzles at least partly correct now. However, we do not + * deal with writemasks yet, and so we are probably doing wrong things here. + */ + + if (all_constant) { + nir_const_value const_src[4]; + + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { + for (unsigned j = 0; j < entry->ssa_def->num_components; j++) { + const_src[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]]; + } + } + + nir_const_value dest = + nir_eval_const_opcode(alu->op, + alu->dest.dest.ssa.num_components, + const_src); + + entry->visits++; + entry->type = constant; + entry->lo = dest; + entry->hi = dest; + entry->range_type = nir_op_infos[alu->op].output_type; + return; + } + + /* Check if the ssa-def is in a loop */ + if (BITSET_TEST(state->blocks_in_loop, + entry->ssa_def->parent_instr->block->index)) { + + /* We have for the first time detected that the ssa-def is in a loop. + * Initialize to the best of our knowledge and mark it as in loop + */ + if (!special_case_alu(entry->ssa_def, state)) + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); + + /* Mark it so that we don't visit it anymore */ + entry->in_loop = true; + return; + } + + /* If all inputs are of range or constant type we can calculate the range + * of the operation instead of hand-rolling this ourselves. We can only + * do this for a select amount of operations. (Basically those that only + * select/filter the operands, or that restricts the range). In cases + * like mul, add, etc we can not compute the range trivially, as we can + * get situations where we multiply -inf and inf and produce nan, or we + * overflow integer types. + * + * This will allow things like sat to possibly give us constants. It will + * calculate the constant corresponding to the upper and lower value. If + * these are found to be the same in the "is-def-constant" function then + * it will get marked as constant, so there is no need to "special-case" + * things like sat(a) where a < 0.0 + */ + if (is_trivially_computable(alu->op)) { + nir_const_value lo_consts[4]; + nir_const_value hi_consts[4]; + + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { + /* We may have set one of the operands to overdefined without knowing + * what type it was going to be. Now we know that the operand is used + * in an operation with a give alu-type, and so we can set the + * operand as overdef again, but with the right alu-type. + */ + if (is_entry_overdefined(src[i]) && + src[i]->range_type != nir_op_infos[alu->op].input_types[i]) + set_as_overdefined(src[i], nir_op_infos[alu->op].input_types[i]); + + for (unsigned j = 0; j < entry->ssa_def->num_components; j++) { + lo_consts[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]]; + hi_consts[i].f[j] = src[i]->hi.f[alu->src[i].swizzle[j]]; + } + } + + bool first = true; + nir_const_value temp_const[4]; + nir_const_value computed_low; + nir_const_value computed_high; + + /* Possible combinations we can make with high and low for num_inputs */ + unsigned combinations = 1 << nir_op_infos[alu->op].num_inputs; + + for (unsigned i = 0; i < combinations; i++) { + + /* Do a copy of low or high values according to mask */ + for (unsigned j = 0; j < nir_op_infos[alu->op].num_inputs; j++) + temp_const[j] = i & (1 << j) ? hi_consts[j] : lo_consts[j]; + + nir_const_value temp = + nir_eval_const_opcode(alu->op, + alu->dest.dest.ssa.num_components, + temp_const); + + if (first) { + /* If this is the first run then set an initial value */ + computed_low = computed_high = temp; + first = false; + } else { + /* Do a per_component min/max to get the worst case scenario */ + computed_low = per_component_min(computed_low, temp, + nir_op_infos[alu->op].output_type, + entry->ssa_def->num_components); + computed_high = per_component_max(computed_high, temp, + nir_op_infos[alu->op].output_type, + entry->ssa_def->num_components); + } + } + + /* If the operations was an abs-operation we will have wrongfully + * computed the lower value for the abs. Take a component-wise min + * of the low-ranges of both operands. If that is higher than zero + * then set that as the range, if not, set zero as lowest value. + */ + if (alu->op == nir_op_fabs) { + nir_const_value temp = + per_component_min(lo_consts[0], lo_consts[1], + nir_type_float, + entry->ssa_def->num_components); + + if (IS_FLOAT_CONSTANT(temp, >, 0.0f, + entry->ssa_def->num_components)) { + computed_low = temp; + } else { + computed_low.f[0] = 0.0f; computed_low.f[1] = 0.0f; + computed_low.f[2] = 0.0f; computed_low.f[3] = 0.0f; + } + } + + if (alu->op == nir_op_iabs) { + nir_const_value temp = + per_component_min(lo_consts[0], lo_consts[1], + nir_type_int, + entry->ssa_def->num_components); + + if (IS_INT_CONSTANT(temp, >, 0, + entry->ssa_def->num_components)) { + computed_low = temp; + } else { + computed_low.i[0] = 0; computed_low.i[1] = 0; + computed_low.i[2] = 0; computed_low.i[3] = 0; + } + } + + entry->visits++; + entry->type = range; + entry->lo = computed_low; + entry->hi = computed_high; + entry->range_type = nir_op_infos[alu->op].output_type; + return; + } + + if (special_case_alu(&alu->dest.dest.ssa, state)) + return; + + entry->visits++; + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); + return; +} + +static bool +evaluate_entry(lattice_entry *entry, value_range_state *state) +{ + lattice_type old_type = entry->type; + nir_const_value old_hi; + nir_const_value old_lo; + + /* If it is already overdefined then that can not change */ + if (entry->type == overdefined) + return false; + + /* We have already marked load_consts as constant, no use evaluating + * + * XXX: This is most likely not correct according to the papers. We can + * still go from constant and down to overdefined in some cases, so we + * should allow "analyzing" of constants + */ +// if (is_entry_constant(entry)) +// return false; + + for (unsigned i = 0; i < 4; i++) { + old_hi.f[i] = entry->hi.f[i]; + old_lo.f[i] = entry->lo.f[i]; + } + + switch (entry->ssa_def->parent_instr->type) { + case nir_instr_type_load_const: { + /* First time we encounter a load_const, mark as constant */ + nir_load_const_instr *instr = + nir_instr_as_load_const(entry->ssa_def->parent_instr); + entry->type = constant; + entry->lo = instr->value; + entry->hi = instr->value; + entry->range_type = nir_type_invalid; + break; + } + + case nir_instr_type_alu: { + nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr); + evaluate_alu_instr(alu, state); + break; + } + + default: + return set_as_overdefined(entry, nir_type_invalid); + } + + /* If we find that the entry has become overdefined without marking it as + * such that means we have calculated the range to +/- inf. That means we + * have changed the status of the entry, + */ + if (is_entry_overdefined(entry)) { + set_as_overdefined(entry, entry->range_type); + return true; + } + + /* Now we check if the information for the instruction has changed. + * If it has then we return true, so that we can evaluate the users. + */ + if (entry->type != old_type) + return true; + + for (unsigned i = 0; i < 4; i++) { + if (old_hi.f[i] != entry->hi.f[i] || + old_lo.f[i] != entry->lo.f[i]) + return true; + } + + return false; +} + +/* Coordinates finding the uses of the ssa_def corresponding to the entry + * and sticking them in the ssa_worklist. + * Should be run on every lattice_entry that we change the information of. + */ +static void +coordinate_uses(lattice_entry *entry, value_range_state *state) +{ + nir_foreach_use_safe(entry->ssa_def, src) { + nir_instr *user = src->parent_instr; + + /* No point in checking the use if it is not yet found reachable */ + if (!is_block_reachable(user->block, state)) + continue; + + /* We don't want to deal with unsupported instructions. This should + * already be marked overdefined on the first visit of the block it + * is in, so no point in doing anything special + */ + if (!is_instr_supported(user->type)) + continue; + + nir_ssa_def *def = nir_instr_get_dest_ssa_def(user); + + /* If it is overdefined we want to push it to head of the list. + * That way we will propagate those faster, avoiding visiting + * ssa-defs with overdefined sources multiple times. */ + if (is_entry_overdefined(entry)) { + nir_ssa_def_worklist_push_head(&state->ssa_worklist, def); + } else { + nir_ssa_def_worklist_push_tail(&state->ssa_worklist, def); + } + } + + nir_foreach_if_use_safe(entry->ssa_def, src) { + /* If the condition was used for an if then we should do something + * about the if to "push our range" into the then and else branch by + * inserting a copy in each of the blocks where we "apply" the range + * implied by the if-statement. + */ + nir_if *if_statement = src->parent_if; + + nir_cf_node *then_node = nir_if_first_then_node(if_statement); + nir_cf_node *else_node = nir_if_first_else_node(if_statement); + + nir_block *then_block = nir_cf_node_as_block(then_node); + nir_block *else_block = nir_cf_node_as_block(else_node); + + if (is_entry_true(entry)) { + nir_block_worklist_push_tail(&state->block_worklist, then_block); + continue; + } + + if (is_entry_false(entry)) { + nir_block_worklist_push_tail(&state->block_worklist, else_block); + continue; + } + + if (entry->type == undefined) + continue; + + nir_block_worklist_push_tail(&state->block_worklist, then_block); + nir_block_worklist_push_tail(&state->block_worklist, else_block); + } +} + +static bool +handle_unsupported_def(nir_ssa_def *def, void *void_state) +{ + value_range_state *state = void_state; + lattice_entry *entry = get_lattice_entry(def, state); + set_as_overdefined(entry, nir_type_invalid); + coordinate_uses(entry, state); + return true; +} + +/* On the first run of a block we want to always check all the uses + * of the instructions that we process. + */ +static void +evaluate_block(nir_block *block, value_range_state *state) +{ + nir_foreach_instr_safe(block, instr) { + + /* Skip doing anything to unsupported instructions */ + if (!is_instr_supported(instr->type)) { + nir_foreach_ssa_def(instr, handle_unsupported_def, state); + continue; + } + + nir_ssa_def *def = nir_instr_get_dest_ssa_def(instr); + lattice_entry *entry = get_lattice_entry(def, state); + + evaluate_entry(entry, state); + coordinate_uses(entry, state); + } +} + +static void +mark_cf_node_loop(nir_cf_node *node, value_range_state *state, bool in_loop) +{ + switch (node->type) { + case nir_cf_node_block: + + if (in_loop) + BITSET_SET(state->blocks_in_loop, nir_cf_node_as_block(node)->index); + + return; + case nir_cf_node_if: { + nir_if *if_stmt = nir_cf_node_as_if(node); + foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->then_list) + mark_cf_node_loop(nested_node, state, in_loop); + foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->else_list) + mark_cf_node_loop(nested_node, state, in_loop); + return; + } + case nir_cf_node_loop: { + nir_loop *loop = nir_cf_node_as_loop(node); + foreach_list_typed(nir_cf_node, nested_node, node, &loop->body) + mark_cf_node_loop(nested_node, state, true); + break; + } + default: + unreachable("unknown cf node type"); + } +} + +static bool +nir_opt_value_range_impl(nir_function_impl *impl) +{ + /* We might want to run a pass to insert "pi-nodes" into the + * ssa-tree before running the pass. This is essentially just + * a mov x2 = x1 that we use to have something to "store" the + * range implied by things like if's. + * + * This will also lead to a need of inserting more phi-nodes, + * as one gets variables that diverge and then converge. + * + * x1 = ....; [-unlimited, unlimited] + * if (x1 < 7) + * x2 = x1; [-unlimited, 7] + * | + * | + * else + * x3 = x1; [7, unlimited] + * | + * | + * x4 = phi(x2, x3); + */ + + /* We want to recalculate the ssa-indexes first, to reduce our memory + * consumption and amount of "empty" value-range-variables + */ + nir_index_ssa_defs(impl); + nir_index_blocks(impl); + + /* No use in running the pass if there are no ssa-defs */ + if (impl->ssa_alloc == 0) + return false; + + bool progress = false; + + void *mem_ctx = ralloc_context(NULL); + value_range_state state; + + state.entries = rzalloc_array(mem_ctx, lattice_entry, + impl->ssa_alloc); + + nir_block_worklist_init(&state.block_worklist, impl->num_blocks, + mem_ctx); + + nir_ssa_def_worklist_init(&state.ssa_worklist, impl->ssa_alloc, + mem_ctx); + + state.reachable_blocks = rzalloc_array(mem_ctx, BITSET_WORD, + BITSET_WORDS(impl->num_blocks)); + + state.blocks_in_loop = rzalloc_array(mem_ctx, BITSET_WORD, + BITSET_WORDS(impl->num_blocks)); + + /* Mark the blocks that are inside a loop so we can avoid them */ + foreach_list_typed(nir_cf_node, node, node, &impl->body) { + if (node->type == nir_cf_node_loop) + mark_cf_node_loop(node, &state, true); + else + mark_cf_node_loop(node, &state, false); + } + + nir_block_worklist_push_head(&state.block_worklist, impl->start_block); + + /* Process the work lists until they are empty */ + while (state.block_worklist.count > 0 || + state.ssa_worklist.count > 0) { + + /* Process the basic block work list */ + while (state.block_worklist.count > 0) { + nir_block *block = nir_block_worklist_pop_head(&state.block_worklist); + + /* Since we have our "coordinate_uses" function that also handles + * phi nodes we can skip the block if it is already set reachable, + * as the phi's will get automagically added to the ssa-worklist + * as they are users of the defs. + */ + if (is_block_reachable(block, &state)) + continue; + + /* Block has been determined to be reachable, mark it */ + mark_block_reachable(block, &state); + + /* Evaluate all phi's and expressions of the block. */ + evaluate_block(block, &state); + + /* If the block has only one successor then add it to the worklist */ + if ((block->successors[0] != NULL) && + (block->successors[1] == NULL)) { + nir_block_worklist_push_tail(&state.block_worklist, + block->successors[0]); + continue; + } + } + + /* Process the instruction work list + * This doesn't do things exactly like in the paper. + * Instead of storing the instructions that have changed and processing + * each user we are instead adding to the list all the users of + * instructions that have changed. In practice there is no difference, + * apart from dealing with uses is moved out to a separate function. + */ + while (state.ssa_worklist.count > 0) { + /* All instructions in the list are here because + * we got new information about the range of an operand. + */ + nir_ssa_def *def = nir_ssa_def_worklist_pop_head(&state.ssa_worklist); + lattice_entry *entry = get_lattice_entry(def, &state); + + /* If the def is in a loop we don't want to do anything. + * (The instruction is initialized as best we can when we mark it.) + * When the block it's in is added to the worklist the entry will + * get processed, and so we will evaluate its users. + */ + if (entry->in_loop) + continue; + + /* Evaluate the ssa_def. If it has changed then add users to list */ + if (evaluate_entry(entry, &state)) + coordinate_uses(entry, &state); + } + + /* If both lists are empty that means we are done. However, we assume + * branches on undef values can not reach any of their successors. + * This is not a safe assumption, and can lead to eliminating things + * we shouldn't. Do a trip over all the ssa-defs to find unresolved + * branches. If one is found then add the "then"-branch to the block + * worklist. That way we will find the rest of the CFG, and only + * slightly pessimizes the analysis results. + * + * XXX: It might be of interest to also check the what operations + * use undefined values as sources, and resolve those. Some operations + * should return undef when there's a undef operand, while others + * should return i.e. 0 to be conservative. + * + * This part is implemented in a "resolvedUndefsIn" function in the + * LLVM SCCP pass, and should give a good idea of what should be done. + */ + if (state.block_worklist.count == 0 && + state.ssa_worklist.count == 0) { + + for (unsigned i = 0; i < impl->ssa_alloc; i++) { + lattice_entry *entry = &state.entries[i]; + + if (entry->type != undefined) + continue; + + if (!entry->ssa_def) + continue; + + if (!list_empty(&entry->ssa_def->if_uses)) { + nir_foreach_if_use_safe(entry->ssa_def, src) { + nir_cf_node *node = nir_if_first_then_node(src->parent_if); + /* First node in the branch should be a block */ + assert(node->type == nir_cf_node_block); + nir_block *block = nir_cf_node_as_block(node); + nir_block_worklist_push_tail(&state.block_worklist, block); + } + } + } + } + } + + /* We can now traverse over blocks and delete those that + * are still marked as unreachable. If we delete a basic block + * we need to first rewrite the phi's that use the results from + * the BB. + * + * For now we want to leave the blocks in place, as when the + * conditional for the block that is unreachable is set as a constant + * Connor's pass for removing dead control flow will come along + * and clean up the blocks that can not be reached. + */ + + /* Check for all values that are proved to be constant, + * and replace them with their constant instruction counterpart. */ + for (unsigned i = 0; i < (impl->ssa_alloc - 1); i++) { + lattice_entry *entry = &(state.entries[i]); + + /* If it's a constant thats not a load_const then make a load_const + * instruction and replace the uses of the old instr with that. + */ + if (is_entry_constant(entry, entry->ssa_def->num_components) && + (entry->ssa_def->parent_instr->type != nir_instr_type_load_const)) { + + /* XXX: Doing the below replacement causes crashes because there is + * something I have not yet understood about how to rewrite things. + */ +/* + nir_load_const_instr *new_instr = + nir_load_const_instr_create(impl->overload->function->shader, + entry->ssa_def->num_components); + + new_instr->value = entry->hi; + + nir_ssa_def_init(&new_instr->instr, &new_instr->def, + entry->ssa_def->num_components, + "Constant propagated const"); + + nir_instr_insert_before(entry->ssa_def->parent_instr, &new_instr->instr); + + nir_ssa_def_rewrite_uses(entry->ssa_def, nir_src_for_ssa(&new_instr->def), + impl); + + nir_instr_remove(entry->ssa_def->parent_instr); + ralloc_free(entry->ssa_def->parent_instr); +*/ + + continue; + } + + if (entry->ssa_def->parent_instr->type != nir_instr_type_alu) + continue; + + nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr); + + if (nir_op_infos[alu->op].num_inputs == 2 && + !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, &state)) || + is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, &state)))) { + + compare_range result = MIXED; + + switch (alu->op) { + case nir_op_fmax: case nir_op_fmin: + case nir_op_imax: case nir_op_imin: + case nir_op_umax: case nir_op_umin: + + result = compare_entries(&alu->src[0], &alu->src[1], + nir_op_infos[alu->op].output_type, + entry->ssa_def->num_components, &state); + break; + default: + break; + } + + if (result == LESS || result == LESS_EQUAL) { + if (alu->op == nir_op_fmax || + alu->op == nir_op_imax || + alu->op == nir_op_umax) + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, mem_ctx); + else + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, mem_ctx); + + progress = true; + continue; + } + + if (result == GREATER || result == GREATER_EQUAL) { + if (alu->op == nir_op_fmax || + alu->op == nir_op_imax || + alu->op == nir_op_umax) + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, mem_ctx); + else + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, mem_ctx); + + progress = true; + continue; + } + } + + /* XXX: Here we also want to add checks for undefined behaviour + * + * We also want to detect lo = 0, hi = 1 and replace with sat. + * We also want to use the compare_entries function to check these + */ + + + /* Check if the result of a comparison can be determined */ + if (nir_op_infos[alu->op].num_inputs == 2 && + !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, &state)) || + is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, &state)))) { + + compare_range result = MIXED; + + switch (alu->op) { + case nir_op_feq: case nir_op_ieq: case nir_op_seq: + case nir_op_fge: case nir_op_ige: case nir_op_sge: case nir_op_uge: + case nir_op_flt: case nir_op_ilt: case nir_op_slt: case nir_op_ult: + case nir_op_fne: case nir_op_ine: case nir_op_sne: + + result = compare_entries(&alu->src[0], &alu->src[1], + nir_op_infos[alu->op].input_types[0], + entry->ssa_def->num_components, &state); + break; + default: + break; + } + + if (alu->op == nir_op_feq || + alu->op == nir_op_ieq || + alu->op == nir_op_seq) { + if (result == EQUAL) { + /* We should insert a true */ + } + + if (result == LESS || result == GREATER) { + /* We should insert a false */ + } + } + + if (alu->op == nir_op_fne || + alu->op == nir_op_ine || + alu->op == nir_op_sne) { + if (result == EQUAL) { + /* We should insert a false */ + } + + if (result == LESS || result == GREATER) { + /* We should insert a true */ + } + } + + if (alu->op == nir_op_flt || alu->op == nir_op_ilt || + alu->op == nir_op_slt || alu->op == nir_op_ult) { + if (result == LESS) { + /* We should insert a true */ + } + + if (result == GREATER) { + /* We should insert a false */ + } + } + + if (alu->op == nir_op_fge || alu->op == nir_op_ige || + alu->op == nir_op_sge || alu->op == nir_op_uge) { + if (result == GREATER) { + /* We should insert a true */ + } + + if (result == LESS) { + /* We should insert a false */ + } + } +// progress = true; +// continue; + } + + + bool set_false = false; + bool set_true = false; + /* Detect that "overdefined && false" == false, as we would have bailed + * on this earlier in the pass since we have an overdefined source + */ + if (alu->op == nir_op_fand || alu->op == nir_op_iand) { + for (int i = 0; i < 2; i++) { + lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, &state); + if (is_entry_false(e)) { + set_false = true; + } + } + } + + /* Detect that "overdefined || true" == true, as we would have bailed + * on this earlier in the pass since we have an overdefined source + */ + if (alu->op == nir_op_for || alu->op == nir_op_ior) { + for (int i = 0; i < 2; i++) { + lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, &state); + if (is_entry_true(e)) { + set_true = true; + } + } + } + + if (set_true || set_false) { + nir_const_value dest; + + if (set_false) { + /* We can replace entry with "false". 0.0f == 0x00000000 which + * should be tha same as signed integer zero, so this is safe + */ + dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 0.0f; + } + + if (set_true) { + /* We can replace entry with 1.0f which is also "true" for int */ + dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 1.0f; + } +/* + nir_load_const_instr *new_instr = + nir_load_const_instr_create(impl->overload->function->shader, + alu->dest.dest.ssa.num_components); + + new_instr->value = dest; + + nir_instr_insert_before(&alu->instr, &new_instr->instr); + + nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, nir_src_for_ssa(&new_instr->def), + impl); + + nir_instr_remove(&alu->instr); + ralloc_free(alu); +*/ + } + + } + + ralloc_free(mem_ctx); + + return progress; +} + +bool +nir_opt_value_range(nir_shader *shader) +{ + bool progress = false; + + nir_foreach_overload(shader, overload) { + if (overload->impl && nir_opt_value_range_impl(overload->impl)) + progress = true; + } + + return progress; +} -- 2.4.4 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev