2015-08-06 19:22 GMT+02:00 Thomas Helland <thomashellan...@gmail.com>:
> It is now mostly working, and I've incorporated a bunch of the feedback
> I got on the last posting. I promised to have it out on the list by
> American office hours today, so here it goes.
>
> There is a massive issue with the compare_entries function. I'm hacking
> on that at the moment, trying to come up with a solution I'm comfortable
> with that does not cause bugs. There is also an issue with the way I
> insert new consts. There's probably something there that I have not
> understood.
>
> Below follows a list of changes from the last version:
> Add pessimization of branching on undef (based on LLVM's approach)
> Incorporate some of Connor's feedback
> Incorporate Matt's feedback
> Rework set_float_range API
> Reworked to not do a separate initialization of everything
> Some cleanups of unneeded variables
> Handle loops differently
> Add a visit-counter to prevent endless looping
> (This has proven to not do anything, could be dropped)
> Squash a bug where we were adding ourselves as user instead of the user
> (Same accidental mistake I had with uses in the LCSSA pass)
> Calculate a lot of the things we can trivially compute.
> Special case calculation of abs.
>
> We are now detecting bcsel(x, 0, 0) (four cases)
>                      abs(0)         (two cases)
>
> Remove support for phi's:
>
> It was buggy and hard to get right, as type-information was kinda hard
> to get hold of. A quick statistics gathering showed that phi-nodes
> amount to only 0.24% of the ssa-defs, so one can assume that getting
> them right would not give us any substantial benefit.
>
> Phi-nodes might have some say in whether we are able to remove complete
> nodes/blocks? If so, then we should probably revisit this decision, but
> for now it seems the return-on-investment and risk-reward ratio does
> not justify dealing with them.
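
For anyone who wants to take it for a spin: the only entry point is
nir_opt_value_range(shader), so wiring it into a driver's optimization loop
should look roughly like the sketch below. This is just an illustration, not
part of the patch; which other passes you run around it and the loop
structure itself are whatever the driver already does.

   /* Illustrative only: run to a fixed point together with the usual
    * cleanup passes, since folding a range down to a constant tends to
    * expose more work for constant folding and DCE, and vice versa.
    */
   bool progress;
   do {
      progress = false;
      progress |= nir_opt_value_range(shader);
      progress |= nir_opt_constant_folding(shader);
      progress |= nir_opt_dce(shader);
   } while (progress);
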
> > Signed-off-by: Thomas Helland <thomashellan...@gmail.com> > --- > src/glsl/Makefile.sources | 1 + > src/glsl/nir/nir.h | 2 + > src/glsl/nir/nir_opt_value_range.c | 1570 > ++++++++++++++++++++++++++++++++++++ > 3 files changed, 1573 insertions(+) > create mode 100644 src/glsl/nir/nir_opt_value_range.c > > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources > index 4f79fee..0b971c9 100644 > --- a/src/glsl/Makefile.sources > +++ b/src/glsl/Makefile.sources > @@ -56,6 +56,7 @@ NIR_FILES = \ > nir/nir_opt_peephole_ffma.c \ > nir/nir_opt_peephole_select.c \ > nir/nir_opt_remove_phis.c \ > + nir/nir_opt_value_range.c \ > nir/nir_print.c \ > nir/nir_remove_dead_variables.c \ > nir/nir_search.c \ > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h > index aba653e..19202a7 100644 > --- a/src/glsl/nir/nir.h > +++ b/src/glsl/nir/nir.h > @@ -1707,6 +1707,8 @@ bool nir_opt_peephole_ffma(nir_shader *shader); > > bool nir_opt_remove_phis(nir_shader *shader); > > +bool nir_opt_value_range(nir_shader *shader); > + > void nir_sweep(nir_shader *shader); > > #ifdef __cplusplus > diff --git a/src/glsl/nir/nir_opt_value_range.c > b/src/glsl/nir/nir_opt_value_range.c > new file mode 100644 > index 0000000..616ec1c > --- /dev/null > +++ b/src/glsl/nir/nir_opt_value_range.c > @@ -0,0 +1,1570 @@ > +/* > + * Copyright © 2015 Thomas Helland > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#include "nir.h" > +#include "nir_ssa_def_worklist.h" > +#include "nir_block_worklist.h" > +#include "nir_constant_expressions.h" > + > +/* This pass implements an extension of > + * "Constant Propagation with Conditional Branches" by Wegman and Zadeck > + * that also handles value ranges. This is useful as a lot of shaders have > + * min/max expressions that can be eliminated, or conditionals that we can > + * prove to be false or true due to previously applied restrictions on range. > + * Value range propagation is a superset of tracking constant values, so this > + * pass eliminates the need for a separate constant propagation pass. The > + * pass is optimistic, meaning we assume all variables are constant (or have > + * restricted range) and disprove it. A pessimistic algorithm would assume > + * all values where undeterminable, and then propagate expressions we know > + * to be constant through the program. 
An optimistic algorithm gets better > + * results than a pessimistic, with the downside that it can not be aborted > + * halfway through as the information gathered may be wrong. > + * > + * The lattice types are: > + * undefined: Variable may be constant or range-restricted > + * (We have not yet gotten to them in the pass) > + * constant: Value is determined to be constant > + * range: Value is determined to be range-restricted > + * overdefined: We cannot guarantee the value is constant or range-restricted > + * > + * We extend the lattice from the SCCP pass so that constant entries are > + * changed to inclusive ranges for each vector component. The join rules are: > + * > + * undefined join undefined = undefined > + * undefined join overdefined = overdefined > + * overdefined join overdefined = overdefined > + * [low, high] join overdefined = overdefined > + * [low, high] join undefined = [low, high] > + * [low1, high1] join [low2, high2] = [min(low1, low2), max(high1, high2)] > + * > + * These rules are general pessimistic rules. There may be situations where > + * we can still determine parts of the range of the variable, even though it > + * has an overdefined input (i.e. max, min, sat, abs). This is also true for > + * boolean operations like AND and OR. These can be determined even if we > + * know only one of the operators. > + * > + * We don't preserve the range perfectly for a variable, as we combine two > + * ranges for a variable into a range of: > + * [min(low1, low2), max(high1, high2)] > + * Preserving the non-continuous range information would greatly complicate > + * the pass, and is therefore not implemented. > + * > + * There is one interesting situation that is hard to deal with: > + * When we find that something is dead code, but it does not become a > + * constant value. Examples are things like max(sin(x), y) where y > 1. > + * We know sin(x) is dead code, but the result is not a constant but instead > + * an ssa-def with a range. We mark this in the range-state so that we can > + * eliminate it after the pass is done. This means that the pass should be > + * rerun if we resolve one of these, as we will then have simplified the > + * program, and new ranges may be resolved. > + * > + * When the pass is done all "undefined" values should be determined as > + * either const, range, or overdefined. (Except for in blocks that are > + * marked as unreachable, as they may have not been processed) > + */ > + > +/* XXX: Review feedback from Matt > + * Food for thought: a range isn't really what you want for bools. Maybe we > + * can think about extending this analysis to know about disjount ranges, > + * or maybe even just a flag to say "it's either high or low". > + */ > + > +/* XXX: > + * We need a mechanism for overflow protection for integer and > + * unsigned integer calculations as we may end up adding two values, > + * overflowing the datatype, and wreck havoc as low may now be high, > + * and vice versa. > + */ > + > +#define IS_FLOAT_CONSTANT(const_value, operator, operand, num_components) > \ > + ((num_components >= 1) ? > \ > + const_value.f[0] operator operand && > \ > + ((num_components >= 2) ? > \ > + const_value.f[1] operator operand && > \ > + ((num_components >= 3) ? > \ > + const_value.f[2] operator operand && > \ > + ((num_components == 4) ? 
> \ > + const_value.f[3] operator operand : > \ > + true) : > \ > + true) : > \ > + true) : > \ > + false) > + > +#define IS_INT_CONSTANT(const_value, operator, operand, num_components) > \ > + ((num_components >= 1) ? > \ > + const_value.i[0] operator operand && > \ > + ((num_components >= 2) ? > \ > + const_value.i[1] operator operand && > \ > + ((num_components >= 3) ? > \ > + const_value.i[2] operator operand && > \ > + ((num_components == 4) ? > \ > + const_value.i[3] operator operand : > \ > + true) : > \ > + true) : > \ > + true) : > \ > + false) > + > +#define IS_UNSIGNED_CONSTANT(const_value, operator, operand, num_components) > \ > + ((num_components >= 1) ? > \ > + const_value.u[0] operator operand && > \ > + ((num_components >= 2) ? > \ > + const_value.u[1] operator operand && > \ > + ((num_components >= 3) ? > \ > + const_value.u[2] operator operand && > \ > + ((num_components == 4) ? > \ > + const_value.u[3] operator operand : > \ > + true) : > \ > + true) : > \ > + true) : > \ > + false) > + > +typedef enum { > + undefined, > + range, > + constant, > + overdefined > +} lattice_type; > + > +typedef enum { > + low, > + high, > + both > +} range_to_set; > + > +typedef struct { > + /* Is this entry float, unsigned or something else? */ > + nir_alu_type range_type; > + > + nir_const_value lo; > + nir_const_value hi; > + > + /* What type of lattice is this */ > + lattice_type type; > + > + nir_ssa_def *ssa_def; > + bool in_loop; > + > + /* Visit-counter to prevent infinite looping */ > + unsigned visits; > +} lattice_entry; > + > +typedef struct { > + /* Corresponds to SSA Work in the original paper */ > + nir_ssa_def_worklist ssa_worklist; > + > + /* Work list of blocks, corresponding to the papers Flow work list */ > + nir_block_worklist block_worklist; > + > + /* Keep track of which blocks are reachable */ > + BITSET_WORD *reachable_blocks; > + > + /* Keep track of which blocks are in loops */ > + BITSET_WORD *blocks_in_loop; > + > + /* An array of lattice_antries for all the ssa_defs */ > + lattice_entry *entries; > +} value_range_state; > + > +static lattice_entry * > +get_lattice_entry(nir_ssa_def *def, value_range_state *state) > +{ > + lattice_entry *entry = &state->entries[def->index]; > + > + /* On the first run of this function the def may not have been added to > + * the lattice_entry, so do that here. > + */ > + if (!entry->ssa_def) > + entry->ssa_def = def; > + > + return entry; > +} > + > +static bool > +is_instr_supported(nir_instr_type type) > +{ > + switch (type) { > + case nir_instr_type_alu: > + case nir_instr_type_phi: > + case nir_instr_type_load_const: > + return true; > + default: > + return false; > + } > +} > + > +/* Returns true if this is a change in status of the entry. This simplifies > + * checking if users of this entry should be added to the worklist. 
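
A small aside on the IS_*_CONSTANT macros further up, since the nested
ternaries are a bit hard to read: for num_components == 2,
IS_FLOAT_CONSTANT(v, >, 0.0f, 2) boils down to
v.f[0] > 0.0f && v.f[1] > 0.0f, and it evaluates to false for zero
components. A hypothetical loop-based equivalent (one per type, and without
the operator parameter) would look like the sketch below; whether that is
worth the extra functions is debatable, so this is only for illustration.

   /* Hypothetical helper, equivalent to IS_FLOAT_CONSTANT(v, >, bound, n)
    * but with the comparison fixed to ">".  The macro exists because the
    * operator itself is a parameter; a function can't do that.
    */
   static inline bool
   all_float_components_above(nir_const_value v, float bound,
                              unsigned num_components)
   {
      if (num_components == 0)
         return false;

      for (unsigned i = 0; i < num_components; i++) {
         if (!(v.f[i] > bound))
            return false;
      }

      return true;
   }
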
> + */ > +static bool > +set_as_overdefined(lattice_entry *entry, nir_alu_type type) > +{ > + if (entry->type == overdefined && entry->range_type == type) > + return false; > + > + entry->type = overdefined; > + entry->range_type = type; > + > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + switch (type) { > + case nir_type_float: > + entry->hi.f[i] = INFINITY; > + entry->lo.f[i] = -INFINITY; > + break; > + > + case nir_type_int: > + entry->hi.i[i] = INT32_MAX; > + entry->lo.i[i] = INT32_MIN; > + break; > + > + case nir_type_bool: > + case nir_type_unsigned: > + entry->hi.u[i] = UINT32_MAX; > + entry->lo.u[i] = 0; > + break; > + case nir_type_invalid: > + break; > + } > + } > + > + return true; > +} > + > +static inline void > +set_range_float(lattice_entry *entry, range_to_set to_set, float_t value) > +{ > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + > + if (to_set == low || to_set == both) > + entry->lo.f[i] = value; > + > + if (to_set == high || to_set == both) > + entry->hi.f[i] = value; > + > + if (to_set == both) { > + entry->type = constant; > + } else { > + entry->type = range; > + } > + } > +} > + > +static inline void > +set_range_int(lattice_entry *entry, range_to_set to_set, int32_t value) > +{ > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + > + if (to_set == low || to_set == both) > + entry->lo.i[i] = value; > + > + if (to_set == high || to_set == both) > + entry->hi.i[i] = value; > + > + if (to_set == both) { > + entry->type = constant; > + } else { > + entry->type = range; > + } > + } > +} > + > +static inline void > +set_range_uint(lattice_entry *entry, range_to_set to_set, uint32_t value) > +{ > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + > + if (to_set == low || to_set == both) > + entry->lo.u[i] = value; > + > + if (to_set == high || to_set == both) > + entry->hi.u[i] = value; > + > + if (to_set == both) { > + entry->type = constant; > + } else { > + entry->type = range; > + } > + } > +} > + > +static bool > +is_type_max(nir_const_value *value, nir_alu_type type, > + unsigned num_components) > +{ > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + if (value->f[i] != INFINITY) > + return false; > + break; > + > + case nir_type_int: > + if (value->i[i] != INT32_MAX) > + return false; > + break; > + > + case nir_type_bool: > + case nir_type_unsigned: > + if (value->u[i] != UINT32_MAX) > + return false; > + break; > + > + case nir_type_invalid: > + unreachable("not reached"); > + } > + } > + > + return true; > +} > + > +static bool > +is_type_min(nir_const_value *value, nir_alu_type type, > + unsigned num_components) > +{ > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + if (value->f[i] != -INFINITY) > + return false; > + break; > + > + case nir_type_int: > + if (value->i[i] != INT32_MIN) > + return false; > + break; > + > + case nir_type_bool: > + case nir_type_unsigned: > + if (value->u[i] != 0) > + return false; > + break; > + > + case nir_type_invalid: > + unreachable("not reached"); > + } > + } > + > + return true; > +} > + > +static nir_const_value > +per_component_max(nir_const_value src0, nir_const_value src1, > + nir_alu_type type, unsigned num_components) > +{ > + nir_const_value value; > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + value.f[i] = MAX2(src0.f[i], src1.f[i]); > + break; > + case nir_type_int: > + 
value.i[i] = MAX2(src0.i[i], src1.i[i]); > + break; > + case nir_type_bool: > + case nir_type_unsigned: > + value.u[i] = MAX2(src0.u[i], src1.u[i]); > + break; > + case nir_type_invalid: > + range_print("Nir type invalid in max \n"); > + unreachable("not reached"); > + } > + } > + > + return value; > +} > + > +static nir_const_value > +per_component_min(nir_const_value src0, nir_const_value src1, > + nir_alu_type type, unsigned num_components) > +{ > + nir_const_value value; > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + value.f[i] = MIN2(src0.f[i], src1.f[i]); > + break; > + case nir_type_int: > + value.i[i] = MIN2(src0.i[i], src1.i[i]); > + break; > + case nir_type_bool: > + case nir_type_unsigned: > + value.u[i] = MIN2(src0.u[i], src1.u[i]); > + break; > + case nir_type_invalid: > + unreachable("not reached"); > + } > + } > + > + return value; > +} > + > +typedef enum { > + LESS, > + LESS_EQUAL, > + EQUAL, > + GREATER_EQUAL, > + GREATER, > + MIXED, > + COMPARE_FAIL > +} compare_range; > +
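
To make the intended use of the compare_range values concrete before the
function itself (updated version just below): this is roughly how the result
is meant to be consumed by the min/max folding in the rewrite loop near the
end of the pass. Simplified sketch, assuming alu is an fmax whose two
sources both have known (non-overdefined) ranges, and with state and mem_ctx
taken from the surrounding pass code:

   /* LESS/LESS_EQUAL means every component of src0 is provably <= src1,
    * so fmax(src0, src1) always selects src1; GREATER/GREATER_EQUAL is
    * the mirror case.  MIXED/COMPARE_FAIL means we can't say anything.
    */
   compare_range result = compare_entries(&alu->src[0], &alu->src[1],
                                          nir_type_float,
                                          alu->dest.dest.ssa.num_components,
                                          state);

   if (result == LESS || result == LESS_EQUAL)
      nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, alu->src[1].src, mem_ctx);
   else if (result == GREATER || result == GREATER_EQUAL)
      nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, alu->src[0].src, mem_ctx);
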
I decided upon a design for the compare_entries function that gives no
issues or artifacts, so I'll stick it here as an update:

static compare_range
compare_entries(nir_alu_src *a, nir_alu_src *b, nir_alu_type type,
                unsigned num_components, value_range_state *state)
{
   lattice_entry *entry_a = get_lattice_entry(a->src.ssa, state);
   lattice_entry *entry_b = get_lattice_entry(b->src.ssa, state);

   bool found_less = true;
   bool found_low_equal = true;
   bool found_high_equal = true;
   bool found_greater = true;

   for (unsigned i = 0; i < num_components; i++) {
      switch (type) {
      case nir_type_float:
         if (entry_a->hi.f[a->swizzle[i]] >= entry_b->lo.f[b->swizzle[i]])
            found_less = false;
         if (entry_a->hi.f[a->swizzle[i]] > entry_b->lo.f[b->swizzle[i]])
            found_low_equal = false;
         if (entry_a->lo.f[a->swizzle[i]] <= entry_b->hi.f[b->swizzle[i]])
            found_greater = false;
         if (entry_a->lo.f[a->swizzle[i]] < entry_b->hi.f[b->swizzle[i]])
            found_high_equal = false;
         break;

      case nir_type_int:
         if (entry_a->hi.i[a->swizzle[i]] >= entry_b->lo.i[b->swizzle[i]])
            found_less = false;
         if (entry_a->hi.i[a->swizzle[i]] > entry_b->lo.i[b->swizzle[i]])
            found_low_equal = false;
         if (entry_a->lo.i[a->swizzle[i]] <= entry_b->hi.i[b->swizzle[i]])
            found_greater = false;
         if (entry_a->lo.i[a->swizzle[i]] < entry_b->hi.i[b->swizzle[i]])
            found_high_equal = false;
         break;

      case nir_type_unsigned:
      case nir_type_bool:
         if (entry_a->hi.u[a->swizzle[i]] >= entry_b->lo.u[b->swizzle[i]])
            found_less = false;
         if (entry_a->hi.u[a->swizzle[i]] > entry_b->lo.u[b->swizzle[i]])
            found_low_equal = false;
         if (entry_a->lo.u[a->swizzle[i]] <= entry_b->hi.u[b->swizzle[i]])
            found_greater = false;
         if (entry_a->lo.u[a->swizzle[i]] < entry_b->hi.u[b->swizzle[i]])
            found_high_equal = false;
         break;

      default:
         return COMPARE_FAIL;
      }
   }

   if (found_less)
      return LESS;

   if (found_high_equal && found_low_equal)
      return EQUAL;

   if (found_low_equal)
      return LESS_EQUAL;

   if (found_high_equal)
      return GREATER_EQUAL;

   if (found_greater)
      return GREATER;

   return COMPARE_FAIL;
}

> +
> +static bool
> +is_entry_overdefined(lattice_entry *entry)
> +{
> +   if (entry->type == overdefined)
> +      return true;
> +
> +   if (entry->type == constant)
> +      return false;
> +
> +   if (entry->type == undefined)
> +      return false;
> +
> +   /* This checks high and low to find out if the range is indeed the
> +    * maximum and minimum of the type, and therefore in fact is overdefined.
> +    * This can happen in a very trivial case like phi(a, b) where a = abs(x)
> +    * and b = neg(abs(y)) and we don't know the range of either x or y.
> +    * However, if we don't have a valid range type then we bail early.
> +    */
> +   if (entry->range_type == nir_type_invalid)
> +      return true;
> +
> +   if (is_type_max(&entry->hi, entry->range_type,
> +                   entry->ssa_def->num_components) &&
> +       is_type_min(&entry->lo, entry->range_type,
> +                   entry->ssa_def->num_components))
> +      return true;
> +
> +   return false;
> +}
> +
> +static bool
> +is_component_false(lattice_entry *entry, unsigned component)
> +{
> +   /* This can be simplified to check for 0x00000000 */
> +   return entry->lo.i[component] == 0x00000000 &&
> +          entry->hi.i[component] == 0x00000000;
> +   /* XXX: This is not correct. Floats also have negative zero.
> +    * Is this also false?
> +    */
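
On the XXX just above: whether this is actually wrong depends on what we want
a [-0.0, -0.0] range to mean. -0.0f compares equal to 0.0f as a float, but it
is not the all-zeroes bit pattern, so the integer check above will not treat
it as false. A standalone illustration (plain C, not part of the patch):

   #include <assert.h>
   #include <stdint.h>

   int main(void)
   {
      union { float f; uint32_t u; } pz = { .f = 0.0f }, nz = { .f = -0.0f };

      assert(pz.f == nz.f);          /* -0.0f == 0.0f as a float comparison */
      assert(pz.u == 0x00000000u);   /* +0.0f is the all-zeroes bit pattern */
      assert(nz.u == 0x80000000u);   /* -0.0f is not, so the integer check  */
                                     /* above would not consider it "false" */
      return 0;
   }
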
> +/* > + switch (entry->range_type) { > + case nir_type_int: > + return entry->lo.i[component] == 0 && > + entry->hi.i[component] == 0; > + case nir_type_unsigned: > + case nir_type_bool: > + return entry->lo.u[component] == 0 && > + entry->hi.u[component] == 0; > + case nir_type_float: > + return entry->lo.f[component] == 0.0f && > + entry->hi.f[component] == 0.0f; > + case nir_type_invalid: > + unreachable("not reached"); > + } > +*/ > + return false; > +} > + > +static bool > +is_entry_false(lattice_entry *entry) > +{ > + bool is_false = true; > + > + if (is_entry_overdefined(entry)) > + return false; > + > + for (uint8_t i = 0; i < entry->ssa_def->num_components; i++) > + is_false = is_false && is_component_false(entry, i); > + > + return is_false; > +} > + > +static bool > +is_component_true(lattice_entry *entry, unsigned component) > +{ > + > + switch (entry->range_type) { > + case nir_type_int: > + return entry->lo.i[component] > 0 || > + entry->hi.i[component] < 0; > + case nir_type_unsigned: > + case nir_type_bool: > + return entry->lo.u[component] > 0; > + case nir_type_float: > + return entry->lo.f[component] > 0.0f || > + entry->hi.f[component] < 0.0f; > + case nir_type_invalid: > + return false; // XXX: This is an issue. Should probably check the user > for it's type, and use that here. > + } > + > + return false; > +} > + > +static bool > +is_entry_true(lattice_entry *entry) > +{ > + bool is_true = true; > + > + if (is_entry_overdefined(entry)) > + return false; > + > + for (uint8_t i = 0; i < entry->ssa_def->num_components; i++) > + is_true = is_true && is_component_true(entry, i); > + > + return is_true; > +} > + > +static bool > +is_entry_constant(lattice_entry *entry, unsigned num_components) > +{ > + if (entry->type == constant) > + return true; > + > + if (entry->type == overdefined) > + return false; > + > + for (uint8_t i = 0; i < num_components; i++) { > + if (entry->lo.u[i] != entry->hi.u[i]) > + return false; > + } > + > + entry->type = constant; > + return true; > +} > + > +static void > +mark_block_reachable(nir_block *block, value_range_state *state) > +{ > + BITSET_SET(state->reachable_blocks, block->index); > +} > + > +static bool > +is_block_reachable(nir_block *block, value_range_state *state) > +{ > + return BITSET_TEST(state->reachable_blocks, block->index); > +} > + > +static bool > +special_case_alu(nir_ssa_def *def, value_range_state *state) > +{ > + lattice_entry *entry = get_lattice_entry(def, state); > + > + switch(nir_instr_as_alu(def->parent_instr)->op) { > + case nir_op_fsat: > + set_range_float(entry, low, 0.0f); > + set_range_float(entry, high, 1.0f); > + return true; > + case nir_op_fsin: > + case nir_op_fcos: > + case nir_op_fsign: > + set_range_float(entry, low, -1.0f); > + set_range_float(entry, high, 1.0f); > + return true; > + case nir_op_fabs: > + case nir_op_fexp2: > + case nir_op_fpow: > + case nir_op_frsq: > + case nir_op_fsqrt: > + set_range_float(entry, low, 0.0f); > + set_range_float(entry, high, INFINITY); > + return true; > + case nir_op_iabs: > + set_range_int(entry, low, 0); > + set_range_int(entry, high, INT32_MAX); > + return true; > + case nir_op_isign: > + set_range_int(entry, low, -1); > + set_range_int(entry, high, 1); > + return true; > + > + /* We have limited type-information for these, and they "falsely" > + * advertise that they are integer oerations (?) causing us to set > + * wrong values for the higher and lower bound. 
Set them as invalid > + * so that we can infer the type-information from their uses later > + * on if necessary. > + */ > + case nir_op_vec2: > + case nir_op_vec3: > + case nir_op_vec4: > + set_as_overdefined(entry, nir_type_invalid); > + return true; > + default: > + return false; > + } > +} > + > +static bool > +is_trivially_computable(nir_op op) > +{ > + switch (op) { > + > + /* These just extract values, so are safe to compute */ > + case nir_op_fmax: case nir_op_imax: case nir_op_umax: > + case nir_op_fmin: case nir_op_imin: case nir_op_umin: > + > + /* We can have the following situation: > + * > + * vec4 ssa_59 = txl ssa_58 (coord), ssa_0 (lod), sampler0 (sampler) > + * vec1 ssa_60 = imov ssa_59 > + * vec1 ssa_61 = imov ssa_59.y > + * vec1 ssa_62 = imov ssa_59.z > + * vec1 ssa_63 = fmul ssa_59.y, ssa_7 > + * > + * We don't know the type of the txl operation, but we know that imow > + * is an integer operation, and so we set ssa_59 and ssa_60 to > + * [i_min, i_max]. That is not correct however, as we find out in ssa_63. > + * The values are instead floats. So we convert the range of ssa_62 to > + * float-min and max, as we know it is overdefined, and so it works out. > + * This is a bit fragile, but should work. > + */ > + case nir_op_fmov: case nir_op_imov: > + > + /* Conditional selects also just extract values */ > + case nir_op_bcsel: case nir_op_fcsel: > + > + /* These also just extract values, should be safe. However, we have > + * troubles with type information, and so we compute wrong values */ > +// case nir_op_vec2: case nir_op_vec3: case nir_op_vec4: > + > + /* These just limit the range, or flip it, and so are safe to compute */ > + case nir_op_fsat: > + case nir_op_fsign: case nir_op_isign: > + case nir_op_fneg: case nir_op_ineg: > + > + /* These are not safe, as you can have a case where the input is > + * [-inf, inf] which becomes [inf, inf] which is clearly wrong. This > + * is dealth with in a special-casing in the compute-range section. > + */ > + case nir_op_fabs: case nir_op_iabs: > + > + /* These are also not safe. If we have X = [-inf, inf] and Y = [0, 0] > + * it will never match, and so an is-equal comparison will always return > + * false, although that is obviously wrong since it might very well match. > + * This should also be handled separately. This can be checked by > utilizing > + * the compare_entries function. XXX: Fix this! > + */ > +// case nir_op_feq: case nir_op_ieq: case nir_op_seq: > +// case nir_op_fge: case nir_op_ige: case nir_op_sge: case > nir_op_uge: > +// case nir_op_flt: case nir_op_ilt: case nir_op_slt: case > nir_op_ult: > +// case nir_op_fne: case nir_op_ine: case nir_op_sne: > + > + /* Logical not is safe. Most likely we can add "and" and "or" too */ > + case nir_op_fnot: case nir_op_inot: > + > + return true; > + > + default: > + return false; > + > + } > +} > + > +static void > +evaluate_alu_instr(nir_alu_instr *alu, value_range_state *state) > +{ > + lattice_entry *entry = get_lattice_entry(&alu->dest.dest.ssa, state); > + lattice_entry *src[4]; > + bool all_constant = true; > + > + /* XXX: How many times should we maximally visit an alu? This seems to > never occur */ > + if (entry->visits > 3) { > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + return; > + } > + > + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { > + > + src[i] = get_lattice_entry(alu->src[i].src.ssa, state); > + > + /* Check if any of the sources are overdefined. 
If one of them are then > + * the result of this entry will also be overdefined. We may however > + * have cases where we only select one of the operands(max,csel), or we > + * are limiting the range of the operand (abs, sat). These can still be > + * safely computed, as we will not get undefined results. > + */ > + if (is_entry_overdefined(src[i]) && !is_trivially_computable(alu->op)) > { > + > + if (!special_case_alu(entry->ssa_def, state)) > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + > + entry->visits++; > + return; > + } > + > + if (src[i]->type == undefined) > + return; > + > + unsigned num_components = nir_op_infos[alu->op].input_sizes[i]; > + > + if (num_components == 0) > + num_components = alu->dest.dest.ssa.num_components; > + > + all_constant = all_constant && is_entry_constant(src[i], > num_components); > + } > + > + /** > + * XXX: Snippet from the comment in nir.h > + * > + * For each input component, says which component of the register it is > + * chosen from. Note that which elements of the swizzle are used and which > + * are ignored are based on the write mask for most opcodes - for example, > + * a statement like "foo.xzw = bar.zyx" would have a writemask of 1101b > and > + * a swizzle of {2, x, 1, 0} where x means "don't care." > + * > + * XXX: We handle swizzles at least partly correct now. However, we do not > + * deal with writemasks yet, and so we are probably doing wrong things > here. > + */ > + > + if (all_constant) { > + nir_const_value const_src[4]; > + > + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { > + for (unsigned j = 0; j < entry->ssa_def->num_components; j++) { > + const_src[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]]; > + } > + } > + > + nir_const_value dest = > + nir_eval_const_opcode(alu->op, > + alu->dest.dest.ssa.num_components, > + const_src); > + > + entry->visits++; > + entry->type = constant; > + entry->lo = dest; > + entry->hi = dest; > + entry->range_type = nir_op_infos[alu->op].output_type; > + return; > + } > + > + /* Check if the ssa-def is in a loop */ > + if (BITSET_TEST(state->blocks_in_loop, > + entry->ssa_def->parent_instr->block->index)) { > + > + /* We have for the first time detected that the ssa-def is in a loop. > + * Initialize to the best of our knowledge and mark it as in loop > + */ > + if (!special_case_alu(entry->ssa_def, state)) > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + > + /* Mark it so that we don't visit it anymore */ > + entry->in_loop = true; > + return; > + } > + > + /* If all inputs are of range or constant type we can calculate the range > + * of the operation instead of hand-rolling this ourselves. We can only > + * do this for a select amount of operations. (Basically those that only > + * select/filter the operands, or that restricts the range). In cases > + * like mul, add, etc we can not compute the range trivially, as we can > + * get situations where we multiply -inf and inf and produce nan, or we > + * overflow integer types. > + * > + * This will allow things like sat to possibly give us constants. It will > + * calculate the constant corresponding to the upper and lower value. 
If > + * these are found to be the same in the "is-def-constant" function then > + * it will get marked as constant, so there is no need to "special-case" > + * things like sat(a) where a < 0.0 > + */ > + if (is_trivially_computable(alu->op)) { > + nir_const_value lo_consts[4]; > + nir_const_value hi_consts[4]; > + > + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { > + /* We may have set one of the operands to overdefined without > knowing > + * what type it was going to be. Now we know that the operand is > used > + * in an operation with a give alu-type, and so we can set the > + * operand as overdef again, but with the right alu-type. > + */ > + if (is_entry_overdefined(src[i]) && > + src[i]->range_type != nir_op_infos[alu->op].input_types[i]) > + set_as_overdefined(src[i], nir_op_infos[alu->op].input_types[i]); > + > + for (unsigned j = 0; j < entry->ssa_def->num_components; j++) { > + lo_consts[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]]; > + hi_consts[i].f[j] = src[i]->hi.f[alu->src[i].swizzle[j]]; > + } > + } > + > + bool first = true; > + nir_const_value temp_const[4]; > + nir_const_value computed_low; > + nir_const_value computed_high; > + > + /* Possible combinations we can make with high and low for num_inputs > */ > + unsigned combinations = 1 << nir_op_infos[alu->op].num_inputs; > + > + for (unsigned i = 0; i < combinations; i++) { > + > + /* Do a copy of low or high values according to mask */ > + for (unsigned j = 0; j < nir_op_infos[alu->op].num_inputs; j++) > + temp_const[j] = i & (1 << j) ? hi_consts[j] : lo_consts[j]; > + > + nir_const_value temp = > + nir_eval_const_opcode(alu->op, > + alu->dest.dest.ssa.num_components, > + temp_const); > + > + if (first) { > + /* If this is the first run then set an initial value */ > + computed_low = computed_high = temp; > + first = false; > + } else { > + /* Do a per_component min/max to get the worst case scenario */ > + computed_low = per_component_min(computed_low, temp, > + > nir_op_infos[alu->op].output_type, > + entry->ssa_def->num_components); > + computed_high = per_component_max(computed_high, temp, > + > nir_op_infos[alu->op].output_type, > + > entry->ssa_def->num_components); > + } > + } > + > + /* If the operations was an abs-operation we will have wrongfully > + * computed the lower value for the abs. Take a component-wise min > + * of the low-ranges of both operands. If that is higher than zero > + * then set that as the range, if not, set zero as lowest value. 
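
To make the combination loop above concrete: for an n-input op it evaluates
the opcode at all 2^n lo/hi corners of the source ranges and keeps the
per-component min and max of the results as the new range. Below is a
standalone example of the same idea (plain C, not NIR code) for a two-input
fmin with src0 = [0.0, 2.0] and src1 = [1.0, 3.0], which works out to a
result range of [0.0, 2.0]. Note that the corner enumeration is only used
for ops on the is_trivially_computable list; fadd/fmul and friends stay off
that list because of the NaN and overflow issues mentioned above, not
because the loop itself couldn't iterate them.

   #include <stdio.h>

   /* Standalone illustration of the lo/hi combination loop: evaluate the
    * opcode at every lo/hi corner of the input ranges and keep the smallest
    * and largest result as the new range.
    */
   static float eval_fmin(const float *src)
   {
      return src[0] < src[1] ? src[0] : src[1];
   }

   int main(void)
   {
      const unsigned num_inputs = 2;
      float lo[2] = { 0.0f, 1.0f };   /* src0 = [0.0, 2.0], src1 = [1.0, 3.0] */
      float hi[2] = { 2.0f, 3.0f };
      float out_lo = 0.0f, out_hi = 0.0f;

      for (unsigned i = 0; i < (1u << num_inputs); i++) {
         float args[2];
         for (unsigned j = 0; j < num_inputs; j++)
            args[j] = (i & (1u << j)) ? hi[j] : lo[j];

         float v = eval_fmin(args);
         if (i == 0) {
            out_lo = out_hi = v;
         } else {
            out_lo = v < out_lo ? v : out_lo;
            out_hi = v > out_hi ? v : out_hi;
         }
      }

      /* Prints: fmin range = [0.000000, 2.000000] */
      printf("fmin range = [%f, %f]\n", out_lo, out_hi);
      return 0;
   }
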
> + */ > + if (alu->op == nir_op_fabs) { > + nir_const_value temp = > + per_component_min(lo_consts[0], lo_consts[1], > + nir_type_float, > + entry->ssa_def->num_components); > + > + if (IS_FLOAT_CONSTANT(temp, >, 0.0f, > + entry->ssa_def->num_components)) { > + computed_low = temp; > + } else { > + computed_low.f[0] = 0.0f; computed_low.f[1] = 0.0f; > + computed_low.f[2] = 0.0f; computed_low.f[3] = 0.0f; > + } > + } > + > + if (alu->op == nir_op_iabs) { > + nir_const_value temp = > + per_component_min(lo_consts[0], lo_consts[1], > + nir_type_int, > + entry->ssa_def->num_components); > + > + if (IS_INT_CONSTANT(temp, >, 0, > + entry->ssa_def->num_components)) { > + computed_low = temp; > + } else { > + computed_low.i[0] = 0; computed_low.i[1] = 0; > + computed_low.i[2] = 0; computed_low.i[3] = 0; > + } > + } > + > + entry->visits++; > + entry->type = range; > + entry->lo = computed_low; > + entry->hi = computed_high; > + entry->range_type = nir_op_infos[alu->op].output_type; > + return; > + } > + > + if (special_case_alu(&alu->dest.dest.ssa, state)) > + return; > + > + entry->visits++; > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + return; > +} > + > +static bool > +evaluate_entry(lattice_entry *entry, value_range_state *state) > +{ > + lattice_type old_type = entry->type; > + nir_const_value old_hi; > + nir_const_value old_lo; > + > + /* If it is already overdefined then that can not change */ > + if (entry->type == overdefined) > + return false; > + > + /* We have already marked load_consts as constant, no use evaluating > + * > + * XXX: This is most likely not correct according to the papers. We can > + * still go from constant and down to overdefined in some cases, so we > + * should allow "analyzing" of constants > + */ > +// if (is_entry_constant(entry)) > +// return false; > + > + for (unsigned i = 0; i < 4; i++) { > + old_hi.f[i] = entry->hi.f[i]; > + old_lo.f[i] = entry->lo.f[i]; > + } > + > + switch (entry->ssa_def->parent_instr->type) { > + case nir_instr_type_load_const: { > + /* First time we encounter a load_const, mark as constant */ > + nir_load_const_instr *instr = > + nir_instr_as_load_const(entry->ssa_def->parent_instr); > + entry->type = constant; > + entry->lo = instr->value; > + entry->hi = instr->value; > + entry->range_type = nir_type_invalid; > + break; > + } > + > + case nir_instr_type_alu: { > + nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr); > + evaluate_alu_instr(alu, state); > + break; > + } > + > + default: > + return set_as_overdefined(entry, nir_type_invalid); > + } > + > + /* If we find that the entry has become overdefined without marking it as > + * such that means we have calculated the range to +/- inf. That means we > + * have changed the status of the entry, > + */ > + if (is_entry_overdefined(entry)) { > + set_as_overdefined(entry, entry->range_type); > + return true; > + } > + > + /* Now we check if the information for the instruction has changed. > + * If it has then we return true, so that we can evaluate the users. > + */ > + if (entry->type != old_type) > + return true; > + > + for (unsigned i = 0; i < 4; i++) { > + if (old_hi.f[i] != entry->hi.f[i] || > + old_lo.f[i] != entry->lo.f[i]) > + return true; > + } > + > + return false; > +} > + > +/* Coordinates finding the uses of the ssa_def corresponding to the entry > + * and sticking them in the ssa_worklist. > + * Should be run on every lattice_entry that we change the information of. 
> + */ > +static void > +coordinate_uses(lattice_entry *entry, value_range_state *state) > +{ > + nir_foreach_use_safe(entry->ssa_def, src) { > + nir_instr *user = src->parent_instr; > + > + /* No point in checking the use if it is not yet found reachable */ > + if (!is_block_reachable(user->block, state)) > + continue; > + > + /* We don't want to deal with unsupported instructions. This should > + * already be marked overdefined on the first visit of the block it > + * is in, so no point in doing anything special > + */ > + if (!is_instr_supported(user->type)) > + continue; > + > + nir_ssa_def *def = nir_instr_get_dest_ssa_def(user); > + > + /* If it is overdefined we want to push it to head of the list. > + * That way we will propagate those faster, avoiding visiting > + * ssa-defs with overdefined sources multiple times. */ > + if (is_entry_overdefined(entry)) { > + nir_ssa_def_worklist_push_head(&state->ssa_worklist, def); > + } else { > + nir_ssa_def_worklist_push_tail(&state->ssa_worklist, def); > + } > + } > + > + nir_foreach_if_use_safe(entry->ssa_def, src) { > + /* If the condition was used for an if then we should do something > + * about the if to "push our range" into the then and else branch by > + * inserting a copy in each of the blocks where we "apply" the range > + * implied by the if-statement. > + */ > + nir_if *if_statement = src->parent_if; > + > + nir_cf_node *then_node = nir_if_first_then_node(if_statement); > + nir_cf_node *else_node = nir_if_first_else_node(if_statement); > + > + nir_block *then_block = nir_cf_node_as_block(then_node); > + nir_block *else_block = nir_cf_node_as_block(else_node); > + > + if (is_entry_true(entry)) { > + nir_block_worklist_push_tail(&state->block_worklist, then_block); > + continue; > + } > + > + if (is_entry_false(entry)) { > + nir_block_worklist_push_tail(&state->block_worklist, else_block); > + continue; > + } > + > + if (entry->type == undefined) > + continue; > + > + nir_block_worklist_push_tail(&state->block_worklist, then_block); > + nir_block_worklist_push_tail(&state->block_worklist, else_block); > + } > +} > + > +static bool > +handle_unsupported_def(nir_ssa_def *def, void *void_state) > +{ > + value_range_state *state = void_state; > + lattice_entry *entry = get_lattice_entry(def, state); > + set_as_overdefined(entry, nir_type_invalid); > + coordinate_uses(entry, state); > + return true; > +} > + > +/* On the first run of a block we want to always check all the uses > + * of the instructions that we process. 
> + */ > +static void > +evaluate_block(nir_block *block, value_range_state *state) > +{ > + nir_foreach_instr_safe(block, instr) { > + > + /* Skip doing anything to unsupported instructions */ > + if (!is_instr_supported(instr->type)) { > + nir_foreach_ssa_def(instr, handle_unsupported_def, state); > + continue; > + } > + > + nir_ssa_def *def = nir_instr_get_dest_ssa_def(instr); > + lattice_entry *entry = get_lattice_entry(def, state); > + > + evaluate_entry(entry, state); > + coordinate_uses(entry, state); > + } > +} > + > +static void > +mark_cf_node_loop(nir_cf_node *node, value_range_state *state, bool in_loop) > +{ > + switch (node->type) { > + case nir_cf_node_block: > + > + if (in_loop) > + BITSET_SET(state->blocks_in_loop, > nir_cf_node_as_block(node)->index); > + > + return; > + case nir_cf_node_if: { > + nir_if *if_stmt = nir_cf_node_as_if(node); > + foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->then_list) > + mark_cf_node_loop(nested_node, state, in_loop); > + foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->else_list) > + mark_cf_node_loop(nested_node, state, in_loop); > + return; > + } > + case nir_cf_node_loop: { > + nir_loop *loop = nir_cf_node_as_loop(node); > + foreach_list_typed(nir_cf_node, nested_node, node, &loop->body) > + mark_cf_node_loop(nested_node, state, true); > + break; > + } > + default: > + unreachable("unknown cf node type"); > + } > +} > + > +static bool > +nir_opt_value_range_impl(nir_function_impl *impl) > +{ > + /* We might want to run a pass to insert "pi-nodes" into the > + * ssa-tree before running the pass. This is essentially just > + * a mov x2 = x1 that we use to have something to "store" the > + * range implied by things like if's. > + * > + * This will also lead to a need of inserting more phi-nodes, > + * as one gets variables that diverge and then converge. 
> + * > + * x1 = ....; [-unlimited, unlimited] > + * if (x1 < 7) > + * x2 = x1; [-unlimited, 7] > + * | > + * | > + * else > + * x3 = x1; [7, unlimited] > + * | > + * | > + * x4 = phi(x2, x3); > + */ > + > + /* We want to recalculate the ssa-indexes first, to reduce our memory > + * consumption and amount of "empty" value-range-variables > + */ > + nir_index_ssa_defs(impl); > + nir_index_blocks(impl); > + > + /* No use in running the pass if there are no ssa-defs */ > + if (impl->ssa_alloc == 0) > + return false; > + > + bool progress = false; > + > + void *mem_ctx = ralloc_context(NULL); > + value_range_state state; > + > + state.entries = rzalloc_array(mem_ctx, lattice_entry, > + impl->ssa_alloc); > + > + nir_block_worklist_init(&state.block_worklist, impl->num_blocks, > + mem_ctx); > + > + nir_ssa_def_worklist_init(&state.ssa_worklist, impl->ssa_alloc, > + mem_ctx); > + > + state.reachable_blocks = rzalloc_array(mem_ctx, BITSET_WORD, > + BITSET_WORDS(impl->num_blocks)); > + > + state.blocks_in_loop = rzalloc_array(mem_ctx, BITSET_WORD, > + BITSET_WORDS(impl->num_blocks)); > + > + /* Mark the blocks that are inside a loop so we can avoid them */ > + foreach_list_typed(nir_cf_node, node, node, &impl->body) { > + if (node->type == nir_cf_node_loop) > + mark_cf_node_loop(node, &state, true); > + else > + mark_cf_node_loop(node, &state, false); > + } > + > + nir_block_worklist_push_head(&state.block_worklist, impl->start_block); > + > + /* Process the work lists until they are empty */ > + while (state.block_worklist.count > 0 || > + state.ssa_worklist.count > 0) { > + > + /* Process the basic block work list */ > + while (state.block_worklist.count > 0) { > + nir_block *block = > nir_block_worklist_pop_head(&state.block_worklist); > + > + /* Since we have our "coordinate_uses" function that also handles > + * phi nodes we can skip the block if it is already set reachable, > + * as the phi's will get automagically added to the ssa-worklist > + * as they are users of the defs. > + */ > + if (is_block_reachable(block, &state)) > + continue; > + > + /* Block has been determined to be reachable, mark it */ > + mark_block_reachable(block, &state); > + > + /* Evaluate all phi's and expressions of the block. */ > + evaluate_block(block, &state); > + > + /* If the block has only one successor then add it to the worklist > */ > + if ((block->successors[0] != NULL) && > + (block->successors[1] == NULL)) { > + nir_block_worklist_push_tail(&state.block_worklist, > + block->successors[0]); > + continue; > + } > + } > + > + /* Process the instruction work list > + * This doesn't do things exactly like in the paper. > + * Instead of storing the instructions that have changed and processing > + * each user we are instead adding to the list all the users of > + * instructions that have changed. In practice there is no difference, > + * apart from dealing with uses is moved out to a separate function. > + */ > + while (state.ssa_worklist.count > 0) { > + /* All instructions in the list are here because > + * we got new information about the range of an operand. > + */ > + nir_ssa_def *def = > nir_ssa_def_worklist_pop_head(&state.ssa_worklist); > + lattice_entry *entry = get_lattice_entry(def, &state); > + > + /* If the def is in a loop we don't want to do anything. > + * (The instruction is initialized as best we can when we mark it.) > + * When the block it's in is added to the worklist the entry will > + * get processed, and so we will evaluate its users. 
> + */ > + if (entry->in_loop) > + continue; > + > + /* Evaluate the ssa_def. If it has changed then add users to list */ > + if (evaluate_entry(entry, &state)) > + coordinate_uses(entry, &state); > + } > + > + /* If both lists are empty that means we are done. However, we assume > + * branches on undef values can not reach any of their successors. > + * This is not a safe assumption, and can lead to eliminating things > + * we shouldn't. Do a trip over all the ssa-defs to find unresolved > + * branches. If one is found then add the "then"-branch to the block > + * worklist. That way we will find the rest of the CFG, and only > + * slightly pessimizes the analysis results. > + * > + * XXX: It might be of interest to also check the what operations > + * use undefined values as sources, and resolve those. Some operations > + * should return undef when there's a undef operand, while others > + * should return i.e. 0 to be conservative. > + * > + * This part is implemented in a "resolvedUndefsIn" function in the > + * LLVM SCCP pass, and should give a good idea of what should be done. > + */ > + if (state.block_worklist.count == 0 && > + state.ssa_worklist.count == 0) { > + > + for (unsigned i = 0; i < impl->ssa_alloc; i++) { > + lattice_entry *entry = &state.entries[i]; > + > + if (entry->type != undefined) > + continue; > + > + if (!entry->ssa_def) > + continue; > + > + if (!list_empty(&entry->ssa_def->if_uses)) { > + nir_foreach_if_use_safe(entry->ssa_def, src) { > + nir_cf_node *node = nir_if_first_then_node(src->parent_if); > + /* First node in the branch should be a block */ > + assert(node->type == nir_cf_node_block); > + nir_block *block = nir_cf_node_as_block(node); > + nir_block_worklist_push_tail(&state.block_worklist, block); > + } > + } > + } > + } > + } > + > + /* We can now traverse over blocks and delete those that > + * are still marked as unreachable. If we delete a basic block > + * we need to first rewrite the phi's that use the results from > + * the BB. > + * > + * For now we want to leave the blocks in place, as when the > + * conditional for the block that is unreachable is set as a constant > + * Connor's pass for removing dead control flow will come along > + * and clean up the blocks that can not be reached. > + */ > + > + /* Check for all values that are proved to be constant, > + * and replace them with their constant instruction counterpart. */ > + for (unsigned i = 0; i < (impl->ssa_alloc - 1); i++) { > + lattice_entry *entry = &(state.entries[i]); > + > + /* If it's a constant thats not a load_const then make a load_const > + * instruction and replace the uses of the old instr with that. > + */ > + if (is_entry_constant(entry, entry->ssa_def->num_components) && > + (entry->ssa_def->parent_instr->type != nir_instr_type_load_const)) > { > + > + /* XXX: Doing the below replacement causes crashes because there is > + * something I have not yet understood about how to rewrite things. 
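
Regarding the crash: I have not tracked it down, but the commented-out block
below differs from the equivalent code in nir_opt_constant_folding (as far
as I can tell) in two ways that stand out: it calls nir_ssa_def_init() on a
def that nir_load_const_instr_create() has already initialized, if I read
nir.c right, and it passes impl rather than mem_ctx to
nir_ssa_def_rewrite_uses(). An untested sketch that follows the
constant-folding ordering instead, reusing the names from the loop above:

   /* Untested sketch: create the load_const, insert it before the old
    * instruction, rewrite the uses, then remove the old instruction.
    * No second nir_ssa_def_init() -- the create function already set up
    * new_instr->def.
    */
   nir_load_const_instr *new_instr =
      nir_load_const_instr_create(impl->overload->function->shader,
                                  entry->ssa_def->num_components);

   new_instr->value = entry->hi;   /* hi == lo for a constant entry */

   nir_instr_insert_before(entry->ssa_def->parent_instr, &new_instr->instr);

   nir_ssa_def_rewrite_uses(entry->ssa_def,
                            nir_src_for_ssa(&new_instr->def),
                            mem_ctx);

   nir_instr_remove(entry->ssa_def->parent_instr);
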
> + */ > +/* > + nir_load_const_instr *new_instr = > + nir_load_const_instr_create(impl->overload->function->shader, > + entry->ssa_def->num_components); > + > + new_instr->value = entry->hi; > + > + nir_ssa_def_init(&new_instr->instr, &new_instr->def, > + entry->ssa_def->num_components, > + "Constant propagated const"); > + > + nir_instr_insert_before(entry->ssa_def->parent_instr, > &new_instr->instr); > + > + nir_ssa_def_rewrite_uses(entry->ssa_def, > nir_src_for_ssa(&new_instr->def), > + impl); > + > + nir_instr_remove(entry->ssa_def->parent_instr); > + ralloc_free(entry->ssa_def->parent_instr); > +*/ > + > + continue; > + } > + > + if (entry->ssa_def->parent_instr->type != nir_instr_type_alu) > + continue; > + > + nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr); > + > + if (nir_op_infos[alu->op].num_inputs == 2 && > + !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, > &state)) || > + is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, > &state)))) { > + > + compare_range result = MIXED; > + > + switch (alu->op) { > + case nir_op_fmax: case nir_op_fmin: > + case nir_op_imax: case nir_op_imin: > + case nir_op_umax: case nir_op_umin: > + > + result = compare_entries(&alu->src[0], &alu->src[1], > + nir_op_infos[alu->op].output_type, > + entry->ssa_def->num_components, &state); > + break; > + default: > + break; > + } > + > + if (result == LESS || result == LESS_EQUAL) { > + if (alu->op == nir_op_fmax || > + alu->op == nir_op_imax || > + alu->op == nir_op_umax) > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, > mem_ctx); > + else > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, > mem_ctx); > + > + progress = true; > + continue; > + } > + > + if (result == GREATER || result == GREATER_EQUAL) { > + if (alu->op == nir_op_fmax || > + alu->op == nir_op_imax || > + alu->op == nir_op_umax) > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, > mem_ctx); > + else > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, > mem_ctx); > + > + progress = true; > + continue; > + } > + } > + > + /* XXX: Here we also want to add checks for undefined behaviour > + * > + * We also want to detect lo = 0, hi = 1 and replace with sat. 
> + * We also want to use the compare_entries function to check these > + */ > + > + > + /* Check if the result of a comparison can be determined */ > + if (nir_op_infos[alu->op].num_inputs == 2 && > + !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, > &state)) || > + is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, > &state)))) { > + > + compare_range result = MIXED; > + > + switch (alu->op) { > + case nir_op_feq: case nir_op_ieq: case nir_op_seq: > + case nir_op_fge: case nir_op_ige: case nir_op_sge: case > nir_op_uge: > + case nir_op_flt: case nir_op_ilt: case nir_op_slt: case > nir_op_ult: > + case nir_op_fne: case nir_op_ine: case nir_op_sne: > + > + result = compare_entries(&alu->src[0], &alu->src[1], > + nir_op_infos[alu->op].input_types[0], > + entry->ssa_def->num_components, &state); > + break; > + default: > + break; > + } > + > + if (alu->op == nir_op_feq || > + alu->op == nir_op_ieq || > + alu->op == nir_op_seq) { > + if (result == EQUAL) { > + /* We should insert a true */ > + } > + > + if (result == LESS || result == GREATER) { > + /* We should insert a false */ > + } > + } > + > + if (alu->op == nir_op_fne || > + alu->op == nir_op_ine || > + alu->op == nir_op_sne) { > + if (result == EQUAL) { > + /* We should insert a false */ > + } > + > + if (result == LESS || result == GREATER) { > + /* We should insert a true */ > + } > + } > + > + if (alu->op == nir_op_flt || alu->op == nir_op_ilt || > + alu->op == nir_op_slt || alu->op == nir_op_ult) { > + if (result == LESS) { > + /* We should insert a true */ > + } > + > + if (result == GREATER) { > + /* We should insert a false */ > + } > + } > + > + if (alu->op == nir_op_fge || alu->op == nir_op_ige || > + alu->op == nir_op_sge || alu->op == nir_op_uge) { > + if (result == GREATER) { > + /* We should insert a true */ > + } > + > + if (result == LESS) { > + /* We should insert a false */ > + } > + } > +// progress = true; > +// continue; > + } > + > + > + bool set_false = false; > + bool set_true = false; > + /* Detect that "overdefined && false" == false, as we would have bailed > + * on this earlier in the pass since we have an overdefined source > + */ > + if (alu->op == nir_op_fand || alu->op == nir_op_iand) { > + for (int i = 0; i < 2; i++) { > + lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, > &state); > + if (is_entry_false(e)) { > + set_false = true; > + } > + } > + } > + > + /* Detect that "overdefined || true" == true, as we would have bailed > + * on this earlier in the pass since we have an overdefined source > + */ > + if (alu->op == nir_op_for || alu->op == nir_op_ior) { > + for (int i = 0; i < 2; i++) { > + lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, > &state); > + if (is_entry_true(e)) { > + set_true = true; > + } > + } > + } > + > + if (set_true || set_false) { > + nir_const_value dest; > + > + if (set_false) { > + /* We can replace entry with "false". 
0.0f == 0x00000000 which > + * should be tha same as signed integer zero, so this is safe > + */ > + dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 0.0f; > + } > + > + if (set_true) { > + /* We can replace entry with 1.0f which is also "true" for int */ > + dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 1.0f; > + } > +/* > + nir_load_const_instr *new_instr = > + nir_load_const_instr_create(impl->overload->function->shader, > + alu->dest.dest.ssa.num_components); > + > + new_instr->value = dest; > + > + nir_instr_insert_before(&alu->instr, &new_instr->instr); > + > + nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, > nir_src_for_ssa(&new_instr->def), > + impl); > + > + nir_instr_remove(&alu->instr); > + ralloc_free(alu); > +*/ > + } > + > + } > + > + ralloc_free(mem_ctx); > + > + return progress; > +} > + > +bool > +nir_opt_value_range(nir_shader *shader) > +{ > + bool progress = false; > + > + nir_foreach_overload(shader, overload) { > + if (overload->impl && nir_opt_value_range_impl(overload->impl)) > + progress = true; > + } > + > + return progress; > +} > -- > 2.4.4 > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev