2015-08-06 19:22 GMT+02:00 Thomas Helland <thomashellan...@gmail.com>:
> It is now mostly working, and I've incorporated a bunch of the feedback
> I got on the last posting. I promised to have it out on the list by
> American office hours today, so here it goes.
>
> There is a massive issue with the compare_entries function. I'm hacking
> on that at the moment, trying to come up with a solution I'm comfortable
> with that does not cause bugs. There is also an issue with the way I
> insert new consts. There's probably something there that I have not
> understood.
>
> Below follows a list of changes from the last version:
> Add pessimization of branching on undef (based on LLVM's approach)
> Incorporate some of Connor's feedback
> Incorporate Matt's feedback
> Rework set_float_range API
> Reworked to not do a separate initialization of everything
> Some cleanups of unneeded variables
> Handle loops differently
> Add a visit-counter to prevent endless looping
> (This has proven to not do anything, could be dropped)
> Squash a bug where we were adding ourselves as user instead of the user
> (Same accidental mistake I had with uses in the LCSSA pass)
> Calculate a lot of the things we can trivially compute.
> Special case calculation of abs.
>
> We are now detecting bcsel(x, 0, 0) (four cases)
>                      abs(0)         (two cases)
>
> Remove support for phi's:
>
> It was buggy and hard to get right, as type-information was kinda hard
> to get hold of. A quick statistics gathering showed that phi-nodes
> amount to only 0.24% of the ssa-defs, so one can assume that getting
> them right would not give us any substantial benefit.
>
> Phi-nodes might have some say in whether we are able to remove complete
> nodes/blocks? If so, then we should probably revisit this decision, but
> for now it seems the return-on-investment and risk-reward ratio does
> not justify dealing with them.
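
For anyone who wants to take it for a spin: the only entry point is
nir_opt_value_range(shader), so wiring it into a driver's optimization loop
should look roughly like the sketch below. This is just an illustration, not
part of the patch; which other passes you run around it and the loop
structure itself are whatever the driver already does.

   /* Illustrative only: run to a fixed point together with the usual
    * cleanup passes, since folding a range down to a constant tends to
    * expose more work for constant folding and DCE, and vice versa.
    */
   bool progress;
   do {
      progress = false;
      progress |= nir_opt_value_range(shader);
      progress |= nir_opt_constant_folding(shader);
      progress |= nir_opt_dce(shader);
   } while (progress);
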
> > Signed-off-by: Thomas Helland <thomashellan...@gmail.com> > --- > src/glsl/Makefile.sources | 1 + > src/glsl/nir/nir.h | 2 + > src/glsl/nir/nir_opt_value_range.c | 1570 > ++++++++++++++++++++++++++++++++++++ > 3 files changed, 1573 insertions(+) > create mode 100644 src/glsl/nir/nir_opt_value_range.c > > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources > index 4f79fee..0b971c9 100644 > --- a/src/glsl/Makefile.sources > +++ b/src/glsl/Makefile.sources > @@ -56,6 +56,7 @@ NIR_FILES = \ > nir/nir_opt_peephole_ffma.c \ > nir/nir_opt_peephole_select.c \ > nir/nir_opt_remove_phis.c \ > + nir/nir_opt_value_range.c \ > nir/nir_print.c \ > nir/nir_remove_dead_variables.c \ > nir/nir_search.c \ > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h > index aba653e..19202a7 100644 > --- a/src/glsl/nir/nir.h > +++ b/src/glsl/nir/nir.h > @@ -1707,6 +1707,8 @@ bool nir_opt_peephole_ffma(nir_shader *shader); > > bool nir_opt_remove_phis(nir_shader *shader); > > +bool nir_opt_value_range(nir_shader *shader); > + > void nir_sweep(nir_shader *shader); > > #ifdef __cplusplus > diff --git a/src/glsl/nir/nir_opt_value_range.c > b/src/glsl/nir/nir_opt_value_range.c > new file mode 100644 > index 0000000..616ec1c > --- /dev/null > +++ b/src/glsl/nir/nir_opt_value_range.c > @@ -0,0 +1,1570 @@ > +/* > + * Copyright © 2015 Thomas Helland > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#include "nir.h" > +#include "nir_ssa_def_worklist.h" > +#include "nir_block_worklist.h" > +#include "nir_constant_expressions.h" > + > +/* This pass implements an extension of > + * "Constant Propagation with Conditional Branches" by Wegman and Zadeck > + * that also handles value ranges. This is useful as a lot of shaders have > + * min/max expressions that can be eliminated, or conditionals that we can > + * prove to be false or true due to previously applied restrictions on range. > + * Value range propagation is a superset of tracking constant values, so this > + * pass eliminates the need for a separate constant propagation pass. The > + * pass is optimistic, meaning we assume all variables are constant (or have > + * restricted range) and disprove it. A pessimistic algorithm would assume > + * all values where undeterminable, and then propagate expressions we know > + * to be constant through the program. 
An optimistic algorithm gets better > + * results than a pessimistic, with the downside that it can not be aborted > + * halfway through as the information gathered may be wrong. > + * > + * The lattice types are: > + * undefined: Variable may be constant or range-restricted > + * (We have not yet gotten to them in the pass) > + * constant: Value is determined to be constant > + * range: Value is determined to be range-restricted > + * overdefined: We cannot guarantee the value is constant or range-restricted > + * > + * We extend the lattice from the SCCP pass so that constant entries are > + * changed to inclusive ranges for each vector component. The join rules are: > + * > + * undefined join undefined = undefined > + * undefined join overdefined = overdefined > + * overdefined join overdefined = overdefined > + * [low, high] join overdefined = overdefined > + * [low, high] join undefined = [low, high] > + * [low1, high1] join [low2, high2] = [min(low1, low2), max(high1, high2)] > + * > + * These rules are general pessimistic rules. There may be situations where > + * we can still determine parts of the range of the variable, even though it > + * has an overdefined input (i.e. max, min, sat, abs). This is also true for > + * boolean operations like AND and OR. These can be determined even if we > + * know only one of the operators. > + * > + * We don't preserve the range perfectly for a variable, as we combine two > + * ranges for a variable into a range of: > + * [min(low1, low2), max(high1, high2)] > + * Preserving the non-continuous range information would greatly complicate > + * the pass, and is therefore not implemented. > + * > + * There is one interesting situation that is hard to deal with: > + * When we find that something is dead code, but it does not become a > + * constant value. Examples are things like max(sin(x), y) where y > 1. > + * We know sin(x) is dead code, but the result is not a constant but instead > + * an ssa-def with a range. We mark this in the range-state so that we can > + * eliminate it after the pass is done. This means that the pass should be > + * rerun if we resolve one of these, as we will then have simplified the > + * program, and new ranges may be resolved. > + * > + * When the pass is done all "undefined" values should be determined as > + * either const, range, or overdefined. (Except for in blocks that are > + * marked as unreachable, as they may have not been processed) > + */ > + > +/* XXX: Review feedback from Matt > + * Food for thought: a range isn't really what you want for bools. Maybe we > + * can think about extending this analysis to know about disjount ranges, > + * or maybe even just a flag to say "it's either high or low". > + */ > + > +/* XXX: > + * We need a mechanism for overflow protection for integer and > + * unsigned integer calculations as we may end up adding two values, > + * overflowing the datatype, and wreck havoc as low may now be high, > + * and vice versa. > + */ > + > +#define IS_FLOAT_CONSTANT(const_value, operator, operand, num_components) > \ > + ((num_components >= 1) ? > \ > + const_value.f[0] operator operand && > \ > + ((num_components >= 2) ? > \ > + const_value.f[1] operator operand && > \ > + ((num_components >= 3) ? > \ > + const_value.f[2] operator operand && > \ > + ((num_components == 4) ? 
> \ > + const_value.f[3] operator operand : > \ > + true) : > \ > + true) : > \ > + true) : > \ > + false) > + > +#define IS_INT_CONSTANT(const_value, operator, operand, num_components) > \ > + ((num_components >= 1) ? > \ > + const_value.i[0] operator operand && > \ > + ((num_components >= 2) ? > \ > + const_value.i[1] operator operand && > \ > + ((num_components >= 3) ? > \ > + const_value.i[2] operator operand && > \ > + ((num_components == 4) ? > \ > + const_value.i[3] operator operand : > \ > + true) : > \ > + true) : > \ > + true) : > \ > + false) > + > +#define IS_UNSIGNED_CONSTANT(const_value, operator, operand, num_components) > \ > + ((num_components >= 1) ? > \ > + const_value.u[0] operator operand && > \ > + ((num_components >= 2) ? > \ > + const_value.u[1] operator operand && > \ > + ((num_components >= 3) ? > \ > + const_value.u[2] operator operand && > \ > + ((num_components == 4) ? > \ > + const_value.u[3] operator operand : > \ > + true) : > \ > + true) : > \ > + true) : > \ > + false) > + > +typedef enum { > + undefined, > + range, > + constant, > + overdefined > +} lattice_type; > + > +typedef enum { > + low, > + high, > + both > +} range_to_set; > + > +typedef struct { > + /* Is this entry float, unsigned or something else? */ > + nir_alu_type range_type; > + > + nir_const_value lo; > + nir_const_value hi; > + > + /* What type of lattice is this */ > + lattice_type type; > + > + nir_ssa_def *ssa_def; > + bool in_loop; > + > + /* Visit-counter to prevent infinite looping */ > + unsigned visits; > +} lattice_entry; > + > +typedef struct { > + /* Corresponds to SSA Work in the original paper */ > + nir_ssa_def_worklist ssa_worklist; > + > + /* Work list of blocks, corresponding to the papers Flow work list */ > + nir_block_worklist block_worklist; > + > + /* Keep track of which blocks are reachable */ > + BITSET_WORD *reachable_blocks; > + > + /* Keep track of which blocks are in loops */ > + BITSET_WORD *blocks_in_loop; > + > + /* An array of lattice_antries for all the ssa_defs */ > + lattice_entry *entries; > +} value_range_state; > + > +static lattice_entry * > +get_lattice_entry(nir_ssa_def *def, value_range_state *state) > +{ > + lattice_entry *entry = &state->entries[def->index]; > + > + /* On the first run of this function the def may not have been added to > + * the lattice_entry, so do that here. > + */ > + if (!entry->ssa_def) > + entry->ssa_def = def; > + > + return entry; > +} > + > +static bool > +is_instr_supported(nir_instr_type type) > +{ > + switch (type) { > + case nir_instr_type_alu: > + case nir_instr_type_phi: > + case nir_instr_type_load_const: > + return true; > + default: > + return false; > + } > +} > + > +/* Returns true if this is a change in status of the entry. This simplifies > + * checking if users of this entry should be added to the worklist. 
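
A small aside on the IS_*_CONSTANT macros further up, since the nested
ternaries are a bit hard to read: for num_components == 2,
IS_FLOAT_CONSTANT(v, >, 0.0f, 2) boils down to
v.f[0] > 0.0f && v.f[1] > 0.0f, and it evaluates to false for zero
components. A hypothetical loop-based equivalent (one per type, and without
the operator parameter) would look like the sketch below; whether that is
worth the extra functions is debatable, so this is only for illustration.

   /* Hypothetical helper, equivalent to IS_FLOAT_CONSTANT(v, >, bound, n)
    * but with the comparison fixed to ">".  The macro exists because the
    * operator itself is a parameter; a function can't do that.
    */
   static inline bool
   all_float_components_above(nir_const_value v, float bound,
                              unsigned num_components)
   {
      if (num_components == 0)
         return false;

      for (unsigned i = 0; i < num_components; i++) {
         if (!(v.f[i] > bound))
            return false;
      }

      return true;
   }
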
> + */ > +static bool > +set_as_overdefined(lattice_entry *entry, nir_alu_type type) > +{ > + if (entry->type == overdefined && entry->range_type == type) > + return false; > + > + entry->type = overdefined; > + entry->range_type = type; > + > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + switch (type) { > + case nir_type_float: > + entry->hi.f[i] = INFINITY; > + entry->lo.f[i] = -INFINITY; > + break; > + > + case nir_type_int: > + entry->hi.i[i] = INT32_MAX; > + entry->lo.i[i] = INT32_MIN; > + break; > + > + case nir_type_bool: > + case nir_type_unsigned: > + entry->hi.u[i] = UINT32_MAX; > + entry->lo.u[i] = 0; > + break; > + case nir_type_invalid: > + break; > + } > + } > + > + return true; > +} > + > +static inline void > +set_range_float(lattice_entry *entry, range_to_set to_set, float_t value) > +{ > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + > + if (to_set == low || to_set == both) > + entry->lo.f[i] = value; > + > + if (to_set == high || to_set == both) > + entry->hi.f[i] = value; > + > + if (to_set == both) { > + entry->type = constant; > + } else { > + entry->type = range; > + } > + } > +} > + > +static inline void > +set_range_int(lattice_entry *entry, range_to_set to_set, int32_t value) > +{ > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + > + if (to_set == low || to_set == both) > + entry->lo.i[i] = value; > + > + if (to_set == high || to_set == both) > + entry->hi.i[i] = value; > + > + if (to_set == both) { > + entry->type = constant; > + } else { > + entry->type = range; > + } > + } > +} > + > +static inline void > +set_range_uint(lattice_entry *entry, range_to_set to_set, uint32_t value) > +{ > + for (unsigned i = 0; i < entry->ssa_def->num_components; i++) { > + > + if (to_set == low || to_set == both) > + entry->lo.u[i] = value; > + > + if (to_set == high || to_set == both) > + entry->hi.u[i] = value; > + > + if (to_set == both) { > + entry->type = constant; > + } else { > + entry->type = range; > + } > + } > +} > + > +static bool > +is_type_max(nir_const_value *value, nir_alu_type type, > + unsigned num_components) > +{ > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + if (value->f[i] != INFINITY) > + return false; > + break; > + > + case nir_type_int: > + if (value->i[i] != INT32_MAX) > + return false; > + break; > + > + case nir_type_bool: > + case nir_type_unsigned: > + if (value->u[i] != UINT32_MAX) > + return false; > + break; > + > + case nir_type_invalid: > + unreachable("not reached"); > + } > + } > + > + return true; > +} > + > +static bool > +is_type_min(nir_const_value *value, nir_alu_type type, > + unsigned num_components) > +{ > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + if (value->f[i] != -INFINITY) > + return false; > + break; > + > + case nir_type_int: > + if (value->i[i] != INT32_MIN) > + return false; > + break; > + > + case nir_type_bool: > + case nir_type_unsigned: > + if (value->u[i] != 0) > + return false; > + break; > + > + case nir_type_invalid: > + unreachable("not reached"); > + } > + } > + > + return true; > +} > + > +static nir_const_value > +per_component_max(nir_const_value src0, nir_const_value src1, > + nir_alu_type type, unsigned num_components) > +{ > + nir_const_value value; > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + value.f[i] = MAX2(src0.f[i], src1.f[i]); > + break; > + case nir_type_int: > + 
value.i[i] = MAX2(src0.i[i], src1.i[i]); > + break; > + case nir_type_bool: > + case nir_type_unsigned: > + value.u[i] = MAX2(src0.u[i], src1.u[i]); > + break; > + case nir_type_invalid: > + range_print("Nir type invalid in max \n"); > + unreachable("not reached"); > + } > + } > + > + return value; > +} > + > +static nir_const_value > +per_component_min(nir_const_value src0, nir_const_value src1, > + nir_alu_type type, unsigned num_components) > +{ > + nir_const_value value; > + for (unsigned i = 0; i < num_components; i++) { > + switch (type) { > + case nir_type_float: > + value.f[i] = MIN2(src0.f[i], src1.f[i]); > + break; > + case nir_type_int: > + value.i[i] = MIN2(src0.i[i], src1.i[i]); > + break; > + case nir_type_bool: > + case nir_type_unsigned: > + value.u[i] = MIN2(src0.u[i], src1.u[i]); > + break; > + case nir_type_invalid: > + unreachable("not reached"); > + } > + } > + > + return value; > +} > + > +typedef enum { > + LESS, > + LESS_EQUAL, > + EQUAL, > + GREATER_EQUAL, > + GREATER, > + MIXED, > + COMPARE_FAIL > +} compare_range; > +
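
To make the intended use of the compare_range values concrete before the
function itself (updated version just below): this is roughly how the result
is meant to be consumed by the min/max folding in the rewrite loop near the
end of the pass. Simplified sketch, assuming alu is an fmax whose two
sources both have known (non-overdefined) ranges, and with state and mem_ctx
taken from the surrounding pass code:

   /* LESS/LESS_EQUAL means every component of src0 is provably <= src1,
    * so fmax(src0, src1) always selects src1; GREATER/GREATER_EQUAL is
    * the mirror case.  MIXED/COMPARE_FAIL means we can't say anything.
    */
   compare_range result = compare_entries(&alu->src[0], &alu->src[1],
                                          nir_type_float,
                                          alu->dest.dest.ssa.num_components,
                                          state);

   if (result == LESS || result == LESS_EQUAL)
      nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, alu->src[1].src, mem_ctx);
   else if (result == GREATER || result == GREATER_EQUAL)
      nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, alu->src[0].src, mem_ctx);
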
I decided upon a design for the compare_entries function that gives no
issues or artifacts, so I'll stick it here as an update:

static compare_range
compare_entries(nir_alu_src *a, nir_alu_src *b, nir_alu_type type,
                unsigned num_components, value_range_state *state)
{
   lattice_entry *entry_a = get_lattice_entry(a->src.ssa, state);
   lattice_entry *entry_b = get_lattice_entry(b->src.ssa, state);

   bool found_less = true;
   bool found_low_equal = true;
   bool found_high_equal = true;
   bool found_greater = true;

   for (unsigned i = 0; i < num_components; i++) {
      switch (type) {
      case nir_type_float:
         if (entry_a->hi.f[a->swizzle[i]] >= entry_b->lo.f[b->swizzle[i]])
            found_less = false;
         if (entry_a->hi.f[a->swizzle[i]] > entry_b->lo.f[b->swizzle[i]])
            found_low_equal = false;
         if (entry_a->lo.f[a->swizzle[i]] <= entry_b->hi.f[b->swizzle[i]])
            found_greater = false;
         if (entry_a->lo.f[a->swizzle[i]] < entry_b->hi.f[b->swizzle[i]])
            found_high_equal = false;
         break;

      case nir_type_int:
         if (entry_a->hi.i[a->swizzle[i]] >= entry_b->lo.i[b->swizzle[i]])
            found_less = false;
         if (entry_a->hi.i[a->swizzle[i]] > entry_b->lo.i[b->swizzle[i]])
            found_low_equal = false;
         if (entry_a->lo.i[a->swizzle[i]] <= entry_b->hi.i[b->swizzle[i]])
            found_greater = false;
         if (entry_a->lo.i[a->swizzle[i]] < entry_b->hi.i[b->swizzle[i]])
            found_high_equal = false;
         break;

      case nir_type_unsigned:
      case nir_type_bool:
         if (entry_a->hi.u[a->swizzle[i]] >= entry_b->lo.u[b->swizzle[i]])
            found_less = false;
         if (entry_a->hi.u[a->swizzle[i]] > entry_b->lo.u[b->swizzle[i]])
            found_low_equal = false;
         if (entry_a->lo.u[a->swizzle[i]] <= entry_b->hi.u[b->swizzle[i]])
            found_greater = false;
         if (entry_a->lo.u[a->swizzle[i]] < entry_b->hi.u[b->swizzle[i]])
            found_high_equal = false;
         break;

      default:
         return COMPARE_FAIL;
      }
   }

   if (found_less)
      return LESS;

   if (found_high_equal && found_low_equal)
      return EQUAL;

   if (found_low_equal)
      return LESS_EQUAL;

   if (found_high_equal)
      return GREATER_EQUAL;

   if (found_greater)
      return GREATER;

   return COMPARE_FAIL;
}

> +
> +static bool
> +is_entry_overdefined(lattice_entry *entry)
> +{
> +   if (entry->type == overdefined)
> +      return true;
> +
> +   if (entry->type == constant)
> +      return false;
> +
> +   if (entry->type == undefined)
> +      return false;
> +
> +   /* This checks high and low to find out if the range is indeed the
> +    * maximum and minimum of the type, and therefore in fact is overdefined.
> +    * This can happen in a very trivial case like phi(a, b) where a = abs(x)
> +    * and b = neg(abs(y)) and we don't know the range of either x or y.
> +    * However, if we don't have a valid range type then we bail early.
> +    */
> +   if (entry->range_type == nir_type_invalid)
> +      return true;
> +
> +   if (is_type_max(&entry->hi, entry->range_type,
> +                   entry->ssa_def->num_components) &&
> +       is_type_min(&entry->lo, entry->range_type,
> +                   entry->ssa_def->num_components))
> +      return true;
> +
> +   return false;
> +}
> +
> +static bool
> +is_component_false(lattice_entry *entry, unsigned component)
> +{
> +   /* This can be simplified to check for 0x00000000 */
> +   return entry->lo.i[component] == 0x00000000 &&
> +          entry->hi.i[component] == 0x00000000;
> +   /* XXX: This is not correct. Floats also have negative zero.
> +    * Is this also false?
> +    */
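
On the XXX just above: whether this is actually wrong depends on what we want
a [-0.0, -0.0] range to mean. -0.0f compares equal to 0.0f as a float, but it
is not the all-zeroes bit pattern, so the integer check above will not treat
it as false. A standalone illustration (plain C, not part of the patch):

   #include <assert.h>
   #include <stdint.h>

   int main(void)
   {
      union { float f; uint32_t u; } pz = { .f = 0.0f }, nz = { .f = -0.0f };

      assert(pz.f == nz.f);          /* -0.0f == 0.0f as a float comparison */
      assert(pz.u == 0x00000000u);   /* +0.0f is the all-zeroes bit pattern */
      assert(nz.u == 0x80000000u);   /* -0.0f is not, so the integer check  */
                                     /* above would not consider it "false" */
      return 0;
   }
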
> +/* > + switch (entry->range_type) { > + case nir_type_int: > + return entry->lo.i[component] == 0 && > + entry->hi.i[component] == 0; > + case nir_type_unsigned: > + case nir_type_bool: > + return entry->lo.u[component] == 0 && > + entry->hi.u[component] == 0; > + case nir_type_float: > + return entry->lo.f[component] == 0.0f && > + entry->hi.f[component] == 0.0f; > + case nir_type_invalid: > + unreachable("not reached"); > + } > +*/ > + return false; > +} > + > +static bool > +is_entry_false(lattice_entry *entry) > +{ > + bool is_false = true; > + > + if (is_entry_overdefined(entry)) > + return false; > + > + for (uint8_t i = 0; i < entry->ssa_def->num_components; i++) > + is_false = is_false && is_component_false(entry, i); > + > + return is_false; > +} > + > +static bool > +is_component_true(lattice_entry *entry, unsigned component) > +{ > + > + switch (entry->range_type) { > + case nir_type_int: > + return entry->lo.i[component] > 0 || > + entry->hi.i[component] < 0; > + case nir_type_unsigned: > + case nir_type_bool: > + return entry->lo.u[component] > 0; > + case nir_type_float: > + return entry->lo.f[component] > 0.0f || > + entry->hi.f[component] < 0.0f; > + case nir_type_invalid: > + return false; // XXX: This is an issue. Should probably check the user > for it's type, and use that here. > + } > + > + return false; > +} > + > +static bool > +is_entry_true(lattice_entry *entry) > +{ > + bool is_true = true; > + > + if (is_entry_overdefined(entry)) > + return false; > + > + for (uint8_t i = 0; i < entry->ssa_def->num_components; i++) > + is_true = is_true && is_component_true(entry, i); > + > + return is_true; > +} > + > +static bool > +is_entry_constant(lattice_entry *entry, unsigned num_components) > +{ > + if (entry->type == constant) > + return true; > + > + if (entry->type == overdefined) > + return false; > + > + for (uint8_t i = 0; i < num_components; i++) { > + if (entry->lo.u[i] != entry->hi.u[i]) > + return false; > + } > + > + entry->type = constant; > + return true; > +} > + > +static void > +mark_block_reachable(nir_block *block, value_range_state *state) > +{ > + BITSET_SET(state->reachable_blocks, block->index); > +} > + > +static bool > +is_block_reachable(nir_block *block, value_range_state *state) > +{ > + return BITSET_TEST(state->reachable_blocks, block->index); > +} > + > +static bool > +special_case_alu(nir_ssa_def *def, value_range_state *state) > +{ > + lattice_entry *entry = get_lattice_entry(def, state); > + > + switch(nir_instr_as_alu(def->parent_instr)->op) { > + case nir_op_fsat: > + set_range_float(entry, low, 0.0f); > + set_range_float(entry, high, 1.0f); > + return true; > + case nir_op_fsin: > + case nir_op_fcos: > + case nir_op_fsign: > + set_range_float(entry, low, -1.0f); > + set_range_float(entry, high, 1.0f); > + return true; > + case nir_op_fabs: > + case nir_op_fexp2: > + case nir_op_fpow: > + case nir_op_frsq: > + case nir_op_fsqrt: > + set_range_float(entry, low, 0.0f); > + set_range_float(entry, high, INFINITY); > + return true; > + case nir_op_iabs: > + set_range_int(entry, low, 0); > + set_range_int(entry, high, INT32_MAX); > + return true; > + case nir_op_isign: > + set_range_int(entry, low, -1); > + set_range_int(entry, high, 1); > + return true; > + > + /* We have limited type-information for these, and they "falsely" > + * advertise that they are integer oerations (?) causing us to set > + * wrong values for the higher and lower bound. 
Set them as invalid > + * so that we can infer the type-information from their uses later > + * on if necessary. > + */ > + case nir_op_vec2: > + case nir_op_vec3: > + case nir_op_vec4: > + set_as_overdefined(entry, nir_type_invalid); > + return true; > + default: > + return false; > + } > +} > + > +static bool > +is_trivially_computable(nir_op op) > +{ > + switch (op) { > + > + /* These just extract values, so are safe to compute */ > + case nir_op_fmax: case nir_op_imax: case nir_op_umax: > + case nir_op_fmin: case nir_op_imin: case nir_op_umin: > + > + /* We can have the following situation: > + * > + * vec4 ssa_59 = txl ssa_58 (coord), ssa_0 (lod), sampler0 (sampler) > + * vec1 ssa_60 = imov ssa_59 > + * vec1 ssa_61 = imov ssa_59.y > + * vec1 ssa_62 = imov ssa_59.z > + * vec1 ssa_63 = fmul ssa_59.y, ssa_7 > + * > + * We don't know the type of the txl operation, but we know that imow > + * is an integer operation, and so we set ssa_59 and ssa_60 to > + * [i_min, i_max]. That is not correct however, as we find out in ssa_63. > + * The values are instead floats. So we convert the range of ssa_62 to > + * float-min and max, as we know it is overdefined, and so it works out. > + * This is a bit fragile, but should work. > + */ > + case nir_op_fmov: case nir_op_imov: > + > + /* Conditional selects also just extract values */ > + case nir_op_bcsel: case nir_op_fcsel: > + > + /* These also just extract values, should be safe. However, we have > + * troubles with type information, and so we compute wrong values */ > +// case nir_op_vec2: case nir_op_vec3: case nir_op_vec4: > + > + /* These just limit the range, or flip it, and so are safe to compute */ > + case nir_op_fsat: > + case nir_op_fsign: case nir_op_isign: > + case nir_op_fneg: case nir_op_ineg: > + > + /* These are not safe, as you can have a case where the input is > + * [-inf, inf] which becomes [inf, inf] which is clearly wrong. This > + * is dealth with in a special-casing in the compute-range section. > + */ > + case nir_op_fabs: case nir_op_iabs: > + > + /* These are also not safe. If we have X = [-inf, inf] and Y = [0, 0] > + * it will never match, and so an is-equal comparison will always return > + * false, although that is obviously wrong since it might very well match. > + * This should also be handled separately. This can be checked by > utilizing > + * the compare_entries function. XXX: Fix this! > + */ > +// case nir_op_feq: case nir_op_ieq: case nir_op_seq: > +// case nir_op_fge: case nir_op_ige: case nir_op_sge: case > nir_op_uge: > +// case nir_op_flt: case nir_op_ilt: case nir_op_slt: case > nir_op_ult: > +// case nir_op_fne: case nir_op_ine: case nir_op_sne: > + > + /* Logical not is safe. Most likely we can add "and" and "or" too */ > + case nir_op_fnot: case nir_op_inot: > + > + return true; > + > + default: > + return false; > + > + } > +} > + > +static void > +evaluate_alu_instr(nir_alu_instr *alu, value_range_state *state) > +{ > + lattice_entry *entry = get_lattice_entry(&alu->dest.dest.ssa, state); > + lattice_entry *src[4]; > + bool all_constant = true; > + > + /* XXX: How many times should we maximally visit an alu? This seems to > never occur */ > + if (entry->visits > 3) { > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + return; > + } > + > + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { > + > + src[i] = get_lattice_entry(alu->src[i].src.ssa, state); > + > + /* Check if any of the sources are overdefined. 
If one of them are then > + * the result of this entry will also be overdefined. We may however > + * have cases where we only select one of the operands(max,csel), or we > + * are limiting the range of the operand (abs, sat). These can still be > + * safely computed, as we will not get undefined results. > + */ > + if (is_entry_overdefined(src[i]) && !is_trivially_computable(alu->op)) > { > + > + if (!special_case_alu(entry->ssa_def, state)) > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + > + entry->visits++; > + return; > + } > + > + if (src[i]->type == undefined) > + return; > + > + unsigned num_components = nir_op_infos[alu->op].input_sizes[i]; > + > + if (num_components == 0) > + num_components = alu->dest.dest.ssa.num_components; > + > + all_constant = all_constant && is_entry_constant(src[i], > num_components); > + } > + > + /** > + * XXX: Snippet from the comment in nir.h > + * > + * For each input component, says which component of the register it is > + * chosen from. Note that which elements of the swizzle are used and which > + * are ignored are based on the write mask for most opcodes - for example, > + * a statement like "foo.xzw = bar.zyx" would have a writemask of 1101b > and > + * a swizzle of {2, x, 1, 0} where x means "don't care." > + * > + * XXX: We handle swizzles at least partly correct now. However, we do not > + * deal with writemasks yet, and so we are probably doing wrong things > here. > + */ > + > + if (all_constant) { > + nir_const_value const_src[4]; > + > + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { > + for (unsigned j = 0; j < entry->ssa_def->num_components; j++) { > + const_src[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]]; > + } > + } > + > + nir_const_value dest = > + nir_eval_const_opcode(alu->op, > + alu->dest.dest.ssa.num_components, > + const_src); > + > + entry->visits++; > + entry->type = constant; > + entry->lo = dest; > + entry->hi = dest; > + entry->range_type = nir_op_infos[alu->op].output_type; > + return; > + } > + > + /* Check if the ssa-def is in a loop */ > + if (BITSET_TEST(state->blocks_in_loop, > + entry->ssa_def->parent_instr->block->index)) { > + > + /* We have for the first time detected that the ssa-def is in a loop. > + * Initialize to the best of our knowledge and mark it as in loop > + */ > + if (!special_case_alu(entry->ssa_def, state)) > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + > + /* Mark it so that we don't visit it anymore */ > + entry->in_loop = true; > + return; > + } > + > + /* If all inputs are of range or constant type we can calculate the range > + * of the operation instead of hand-rolling this ourselves. We can only > + * do this for a select amount of operations. (Basically those that only > + * select/filter the operands, or that restricts the range). In cases > + * like mul, add, etc we can not compute the range trivially, as we can > + * get situations where we multiply -inf and inf and produce nan, or we > + * overflow integer types. > + * > + * This will allow things like sat to possibly give us constants. It will > + * calculate the constant corresponding to the upper and lower value. 
If > + * these are found to be the same in the "is-def-constant" function then > + * it will get marked as constant, so there is no need to "special-case" > + * things like sat(a) where a < 0.0 > + */ > + if (is_trivially_computable(alu->op)) { > + nir_const_value lo_consts[4]; > + nir_const_value hi_consts[4]; > + > + for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) { > + /* We may have set one of the operands to overdefined without > knowing > + * what type it was going to be. Now we know that the operand is > used > + * in an operation with a give alu-type, and so we can set the > + * operand as overdef again, but with the right alu-type. > + */ > + if (is_entry_overdefined(src[i]) && > + src[i]->range_type != nir_op_infos[alu->op].input_types[i]) > + set_as_overdefined(src[i], nir_op_infos[alu->op].input_types[i]); > + > + for (unsigned j = 0; j < entry->ssa_def->num_components; j++) { > + lo_consts[i].f[j] = src[i]->lo.f[alu->src[i].swizzle[j]]; > + hi_consts[i].f[j] = src[i]->hi.f[alu->src[i].swizzle[j]]; > + } > + } > + > + bool first = true; > + nir_const_value temp_const[4]; > + nir_const_value computed_low; > + nir_const_value computed_high; > + > + /* Possible combinations we can make with high and low for num_inputs > */ > + unsigned combinations = 1 << nir_op_infos[alu->op].num_inputs; > + > + for (unsigned i = 0; i < combinations; i++) { > + > + /* Do a copy of low or high values according to mask */ > + for (unsigned j = 0; j < nir_op_infos[alu->op].num_inputs; j++) > + temp_const[j] = i & (1 << j) ? hi_consts[j] : lo_consts[j]; > + > + nir_const_value temp = > + nir_eval_const_opcode(alu->op, > + alu->dest.dest.ssa.num_components, > + temp_const); > + > + if (first) { > + /* If this is the first run then set an initial value */ > + computed_low = computed_high = temp; > + first = false; > + } else { > + /* Do a per_component min/max to get the worst case scenario */ > + computed_low = per_component_min(computed_low, temp, > + > nir_op_infos[alu->op].output_type, > + entry->ssa_def->num_components); > + computed_high = per_component_max(computed_high, temp, > + > nir_op_infos[alu->op].output_type, > + > entry->ssa_def->num_components); > + } > + } > + > + /* If the operations was an abs-operation we will have wrongfully > + * computed the lower value for the abs. Take a component-wise min > + * of the low-ranges of both operands. If that is higher than zero > + * then set that as the range, if not, set zero as lowest value. 
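
To make the combination loop above concrete: for an n-input op it evaluates
the opcode at all 2^n lo/hi corners of the source ranges and keeps the
per-component min and max of the results as the new range. Below is a
standalone example of the same idea (plain C, not NIR code) for a two-input
fmin with src0 = [0.0, 2.0] and src1 = [1.0, 3.0], which works out to a
result range of [0.0, 2.0]. Note that the corner enumeration is only used
for ops on the is_trivially_computable list; fadd/fmul and friends stay off
that list because of the NaN and overflow issues mentioned above, not
because the loop itself couldn't iterate them.

   #include <stdio.h>

   /* Standalone illustration of the lo/hi combination loop: evaluate the
    * opcode at every lo/hi corner of the input ranges and keep the smallest
    * and largest result as the new range.
    */
   static float eval_fmin(const float *src)
   {
      return src[0] < src[1] ? src[0] : src[1];
   }

   int main(void)
   {
      const unsigned num_inputs = 2;
      float lo[2] = { 0.0f, 1.0f };   /* src0 = [0.0, 2.0], src1 = [1.0, 3.0] */
      float hi[2] = { 2.0f, 3.0f };
      float out_lo = 0.0f, out_hi = 0.0f;

      for (unsigned i = 0; i < (1u << num_inputs); i++) {
         float args[2];
         for (unsigned j = 0; j < num_inputs; j++)
            args[j] = (i & (1u << j)) ? hi[j] : lo[j];

         float v = eval_fmin(args);
         if (i == 0) {
            out_lo = out_hi = v;
         } else {
            out_lo = v < out_lo ? v : out_lo;
            out_hi = v > out_hi ? v : out_hi;
         }
      }

      /* Prints: fmin range = [0.000000, 2.000000] */
      printf("fmin range = [%f, %f]\n", out_lo, out_hi);
      return 0;
   }
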
> + */ > + if (alu->op == nir_op_fabs) { > + nir_const_value temp = > + per_component_min(lo_consts[0], lo_consts[1], > + nir_type_float, > + entry->ssa_def->num_components); > + > + if (IS_FLOAT_CONSTANT(temp, >, 0.0f, > + entry->ssa_def->num_components)) { > + computed_low = temp; > + } else { > + computed_low.f[0] = 0.0f; computed_low.f[1] = 0.0f; > + computed_low.f[2] = 0.0f; computed_low.f[3] = 0.0f; > + } > + } > + > + if (alu->op == nir_op_iabs) { > + nir_const_value temp = > + per_component_min(lo_consts[0], lo_consts[1], > + nir_type_int, > + entry->ssa_def->num_components); > + > + if (IS_INT_CONSTANT(temp, >, 0, > + entry->ssa_def->num_components)) { > + computed_low = temp; > + } else { > + computed_low.i[0] = 0; computed_low.i[1] = 0; > + computed_low.i[2] = 0; computed_low.i[3] = 0; > + } > + } > + > + entry->visits++; > + entry->type = range; > + entry->lo = computed_low; > + entry->hi = computed_high; > + entry->range_type = nir_op_infos[alu->op].output_type; > + return; > + } > + > + if (special_case_alu(&alu->dest.dest.ssa, state)) > + return; > + > + entry->visits++; > + set_as_overdefined(entry, nir_op_infos[alu->op].output_type); > + return; > +} > + > +static bool > +evaluate_entry(lattice_entry *entry, value_range_state *state) > +{ > + lattice_type old_type = entry->type; > + nir_const_value old_hi; > + nir_const_value old_lo; > + > + /* If it is already overdefined then that can not change */ > + if (entry->type == overdefined) > + return false; > + > + /* We have already marked load_consts as constant, no use evaluating > + * > + * XXX: This is most likely not correct according to the papers. We can > + * still go from constant and down to overdefined in some cases, so we > + * should allow "analyzing" of constants > + */ > +// if (is_entry_constant(entry)) > +// return false; > + > + for (unsigned i = 0; i < 4; i++) { > + old_hi.f[i] = entry->hi.f[i]; > + old_lo.f[i] = entry->lo.f[i]; > + } > + > + switch (entry->ssa_def->parent_instr->type) { > + case nir_instr_type_load_const: { > + /* First time we encounter a load_const, mark as constant */ > + nir_load_const_instr *instr = > + nir_instr_as_load_const(entry->ssa_def->parent_instr); > + entry->type = constant; > + entry->lo = instr->value; > + entry->hi = instr->value; > + entry->range_type = nir_type_invalid; > + break; > + } > + > + case nir_instr_type_alu: { > + nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr); > + evaluate_alu_instr(alu, state); > + break; > + } > + > + default: > + return set_as_overdefined(entry, nir_type_invalid); > + } > + > + /* If we find that the entry has become overdefined without marking it as > + * such that means we have calculated the range to +/- inf. That means we > + * have changed the status of the entry, > + */ > + if (is_entry_overdefined(entry)) { > + set_as_overdefined(entry, entry->range_type); > + return true; > + } > + > + /* Now we check if the information for the instruction has changed. > + * If it has then we return true, so that we can evaluate the users. > + */ > + if (entry->type != old_type) > + return true; > + > + for (unsigned i = 0; i < 4; i++) { > + if (old_hi.f[i] != entry->hi.f[i] || > + old_lo.f[i] != entry->lo.f[i]) > + return true; > + } > + > + return false; > +} > + > +/* Coordinates finding the uses of the ssa_def corresponding to the entry > + * and sticking them in the ssa_worklist. > + * Should be run on every lattice_entry that we change the information of. 
> + */ > +static void > +coordinate_uses(lattice_entry *entry, value_range_state *state) > +{ > + nir_foreach_use_safe(entry->ssa_def, src) { > + nir_instr *user = src->parent_instr; > + > + /* No point in checking the use if it is not yet found reachable */ > + if (!is_block_reachable(user->block, state)) > + continue; > + > + /* We don't want to deal with unsupported instructions. This should > + * already be marked overdefined on the first visit of the block it > + * is in, so no point in doing anything special > + */ > + if (!is_instr_supported(user->type)) > + continue; > + > + nir_ssa_def *def = nir_instr_get_dest_ssa_def(user); > + > + /* If it is overdefined we want to push it to head of the list. > + * That way we will propagate those faster, avoiding visiting > + * ssa-defs with overdefined sources multiple times. */ > + if (is_entry_overdefined(entry)) { > + nir_ssa_def_worklist_push_head(&state->ssa_worklist, def); > + } else { > + nir_ssa_def_worklist_push_tail(&state->ssa_worklist, def); > + } > + } > + > + nir_foreach_if_use_safe(entry->ssa_def, src) { > + /* If the condition was used for an if then we should do something > + * about the if to "push our range" into the then and else branch by > + * inserting a copy in each of the blocks where we "apply" the range > + * implied by the if-statement. > + */ > + nir_if *if_statement = src->parent_if; > + > + nir_cf_node *then_node = nir_if_first_then_node(if_statement); > + nir_cf_node *else_node = nir_if_first_else_node(if_statement); > + > + nir_block *then_block = nir_cf_node_as_block(then_node); > + nir_block *else_block = nir_cf_node_as_block(else_node); > + > + if (is_entry_true(entry)) { > + nir_block_worklist_push_tail(&state->block_worklist, then_block); > + continue; > + } > + > + if (is_entry_false(entry)) { > + nir_block_worklist_push_tail(&state->block_worklist, else_block); > + continue; > + } > + > + if (entry->type == undefined) > + continue; > + > + nir_block_worklist_push_tail(&state->block_worklist, then_block); > + nir_block_worklist_push_tail(&state->block_worklist, else_block); > + } > +} > + > +static bool > +handle_unsupported_def(nir_ssa_def *def, void *void_state) > +{ > + value_range_state *state = void_state; > + lattice_entry *entry = get_lattice_entry(def, state); > + set_as_overdefined(entry, nir_type_invalid); > + coordinate_uses(entry, state); > + return true; > +} > + > +/* On the first run of a block we want to always check all the uses > + * of the instructions that we process. 
> + */ > +static void > +evaluate_block(nir_block *block, value_range_state *state) > +{ > + nir_foreach_instr_safe(block, instr) { > + > + /* Skip doing anything to unsupported instructions */ > + if (!is_instr_supported(instr->type)) { > + nir_foreach_ssa_def(instr, handle_unsupported_def, state); > + continue; > + } > + > + nir_ssa_def *def = nir_instr_get_dest_ssa_def(instr); > + lattice_entry *entry = get_lattice_entry(def, state); > + > + evaluate_entry(entry, state); > + coordinate_uses(entry, state); > + } > +} > + > +static void > +mark_cf_node_loop(nir_cf_node *node, value_range_state *state, bool in_loop) > +{ > + switch (node->type) { > + case nir_cf_node_block: > + > + if (in_loop) > + BITSET_SET(state->blocks_in_loop, > nir_cf_node_as_block(node)->index); > + > + return; > + case nir_cf_node_if: { > + nir_if *if_stmt = nir_cf_node_as_if(node); > + foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->then_list) > + mark_cf_node_loop(nested_node, state, in_loop); > + foreach_list_typed(nir_cf_node, nested_node, node, &if_stmt->else_list) > + mark_cf_node_loop(nested_node, state, in_loop); > + return; > + } > + case nir_cf_node_loop: { > + nir_loop *loop = nir_cf_node_as_loop(node); > + foreach_list_typed(nir_cf_node, nested_node, node, &loop->body) > + mark_cf_node_loop(nested_node, state, true); > + break; > + } > + default: > + unreachable("unknown cf node type"); > + } > +} > + > +static bool > +nir_opt_value_range_impl(nir_function_impl *impl) > +{ > + /* We might want to run a pass to insert "pi-nodes" into the > + * ssa-tree before running the pass. This is essentially just > + * a mov x2 = x1 that we use to have something to "store" the > + * range implied by things like if's. > + * > + * This will also lead to a need of inserting more phi-nodes, > + * as one gets variables that diverge and then converge. 
> + * > + * x1 = ....; [-unlimited, unlimited] > + * if (x1 < 7) > + * x2 = x1; [-unlimited, 7] > + * | > + * | > + * else > + * x3 = x1; [7, unlimited] > + * | > + * | > + * x4 = phi(x2, x3); > + */ > + > + /* We want to recalculate the ssa-indexes first, to reduce our memory > + * consumption and amount of "empty" value-range-variables > + */ > + nir_index_ssa_defs(impl); > + nir_index_blocks(impl); > + > + /* No use in running the pass if there are no ssa-defs */ > + if (impl->ssa_alloc == 0) > + return false; > + > + bool progress = false; > + > + void *mem_ctx = ralloc_context(NULL); > + value_range_state state; > + > + state.entries = rzalloc_array(mem_ctx, lattice_entry, > + impl->ssa_alloc); > + > + nir_block_worklist_init(&state.block_worklist, impl->num_blocks, > + mem_ctx); > + > + nir_ssa_def_worklist_init(&state.ssa_worklist, impl->ssa_alloc, > + mem_ctx); > + > + state.reachable_blocks = rzalloc_array(mem_ctx, BITSET_WORD, > + BITSET_WORDS(impl->num_blocks)); > + > + state.blocks_in_loop = rzalloc_array(mem_ctx, BITSET_WORD, > + BITSET_WORDS(impl->num_blocks)); > + > + /* Mark the blocks that are inside a loop so we can avoid them */ > + foreach_list_typed(nir_cf_node, node, node, &impl->body) { > + if (node->type == nir_cf_node_loop) > + mark_cf_node_loop(node, &state, true); > + else > + mark_cf_node_loop(node, &state, false); > + } > + > + nir_block_worklist_push_head(&state.block_worklist, impl->start_block); > + > + /* Process the work lists until they are empty */ > + while (state.block_worklist.count > 0 || > + state.ssa_worklist.count > 0) { > + > + /* Process the basic block work list */ > + while (state.block_worklist.count > 0) { > + nir_block *block = > nir_block_worklist_pop_head(&state.block_worklist); > + > + /* Since we have our "coordinate_uses" function that also handles > + * phi nodes we can skip the block if it is already set reachable, > + * as the phi's will get automagically added to the ssa-worklist > + * as they are users of the defs. > + */ > + if (is_block_reachable(block, &state)) > + continue; > + > + /* Block has been determined to be reachable, mark it */ > + mark_block_reachable(block, &state); > + > + /* Evaluate all phi's and expressions of the block. */ > + evaluate_block(block, &state); > + > + /* If the block has only one successor then add it to the worklist > */ > + if ((block->successors[0] != NULL) && > + (block->successors[1] == NULL)) { > + nir_block_worklist_push_tail(&state.block_worklist, > + block->successors[0]); > + continue; > + } > + } > + > + /* Process the instruction work list > + * This doesn't do things exactly like in the paper. > + * Instead of storing the instructions that have changed and processing > + * each user we are instead adding to the list all the users of > + * instructions that have changed. In practice there is no difference, > + * apart from dealing with uses is moved out to a separate function. > + */ > + while (state.ssa_worklist.count > 0) { > + /* All instructions in the list are here because > + * we got new information about the range of an operand. > + */ > + nir_ssa_def *def = > nir_ssa_def_worklist_pop_head(&state.ssa_worklist); > + lattice_entry *entry = get_lattice_entry(def, &state); > + > + /* If the def is in a loop we don't want to do anything. > + * (The instruction is initialized as best we can when we mark it.) > + * When the block it's in is added to the worklist the entry will > + * get processed, and so we will evaluate its users. 
> + */ > + if (entry->in_loop) > + continue; > + > + /* Evaluate the ssa_def. If it has changed then add users to list */ > + if (evaluate_entry(entry, &state)) > + coordinate_uses(entry, &state); > + } > + > + /* If both lists are empty that means we are done. However, we assume > + * branches on undef values can not reach any of their successors. > + * This is not a safe assumption, and can lead to eliminating things > + * we shouldn't. Do a trip over all the ssa-defs to find unresolved > + * branches. If one is found then add the "then"-branch to the block > + * worklist. That way we will find the rest of the CFG, and only > + * slightly pessimizes the analysis results. > + * > + * XXX: It might be of interest to also check the what operations > + * use undefined values as sources, and resolve those. Some operations > + * should return undef when there's a undef operand, while others > + * should return i.e. 0 to be conservative. > + * > + * This part is implemented in a "resolvedUndefsIn" function in the > + * LLVM SCCP pass, and should give a good idea of what should be done. > + */ > + if (state.block_worklist.count == 0 && > + state.ssa_worklist.count == 0) { > + > + for (unsigned i = 0; i < impl->ssa_alloc; i++) { > + lattice_entry *entry = &state.entries[i]; > + > + if (entry->type != undefined) > + continue; > + > + if (!entry->ssa_def) > + continue; > + > + if (!list_empty(&entry->ssa_def->if_uses)) { > + nir_foreach_if_use_safe(entry->ssa_def, src) { > + nir_cf_node *node = nir_if_first_then_node(src->parent_if); > + /* First node in the branch should be a block */ > + assert(node->type == nir_cf_node_block); > + nir_block *block = nir_cf_node_as_block(node); > + nir_block_worklist_push_tail(&state.block_worklist, block); > + } > + } > + } > + } > + } > + > + /* We can now traverse over blocks and delete those that > + * are still marked as unreachable. If we delete a basic block > + * we need to first rewrite the phi's that use the results from > + * the BB. > + * > + * For now we want to leave the blocks in place, as when the > + * conditional for the block that is unreachable is set as a constant > + * Connor's pass for removing dead control flow will come along > + * and clean up the blocks that can not be reached. > + */ > + > + /* Check for all values that are proved to be constant, > + * and replace them with their constant instruction counterpart. */ > + for (unsigned i = 0; i < (impl->ssa_alloc - 1); i++) { > + lattice_entry *entry = &(state.entries[i]); > + > + /* If it's a constant thats not a load_const then make a load_const > + * instruction and replace the uses of the old instr with that. > + */ > + if (is_entry_constant(entry, entry->ssa_def->num_components) && > + (entry->ssa_def->parent_instr->type != nir_instr_type_load_const)) > { > + > + /* XXX: Doing the below replacement causes crashes because there is > + * something I have not yet understood about how to rewrite things. 
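
Regarding the crash: I have not tracked it down, but the commented-out block
below differs from the equivalent code in nir_opt_constant_folding (as far
as I can tell) in two ways that stand out: it calls nir_ssa_def_init() on a
def that nir_load_const_instr_create() has already initialized, if I read
nir.c right, and it passes impl rather than mem_ctx to
nir_ssa_def_rewrite_uses(). An untested sketch that follows the
constant-folding ordering instead, reusing the names from the loop above:

   /* Untested sketch: create the load_const, insert it before the old
    * instruction, rewrite the uses, then remove the old instruction.
    * No second nir_ssa_def_init() -- the create function already set up
    * new_instr->def.
    */
   nir_load_const_instr *new_instr =
      nir_load_const_instr_create(impl->overload->function->shader,
                                  entry->ssa_def->num_components);

   new_instr->value = entry->hi;   /* hi == lo for a constant entry */

   nir_instr_insert_before(entry->ssa_def->parent_instr, &new_instr->instr);

   nir_ssa_def_rewrite_uses(entry->ssa_def,
                            nir_src_for_ssa(&new_instr->def),
                            mem_ctx);

   nir_instr_remove(entry->ssa_def->parent_instr);
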
> + */ > +/* > + nir_load_const_instr *new_instr = > + nir_load_const_instr_create(impl->overload->function->shader, > + entry->ssa_def->num_components); > + > + new_instr->value = entry->hi; > + > + nir_ssa_def_init(&new_instr->instr, &new_instr->def, > + entry->ssa_def->num_components, > + "Constant propagated const"); > + > + nir_instr_insert_before(entry->ssa_def->parent_instr, > &new_instr->instr); > + > + nir_ssa_def_rewrite_uses(entry->ssa_def, > nir_src_for_ssa(&new_instr->def), > + impl); > + > + nir_instr_remove(entry->ssa_def->parent_instr); > + ralloc_free(entry->ssa_def->parent_instr); > +*/ > + > + continue; > + } > + > + if (entry->ssa_def->parent_instr->type != nir_instr_type_alu) > + continue; > + > + nir_alu_instr *alu = nir_instr_as_alu(entry->ssa_def->parent_instr); > + > + if (nir_op_infos[alu->op].num_inputs == 2 && > + !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, > &state)) || > + is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, > &state)))) { > + > + compare_range result = MIXED; > + > + switch (alu->op) { > + case nir_op_fmax: case nir_op_fmin: > + case nir_op_imax: case nir_op_imin: > + case nir_op_umax: case nir_op_umin: > + > + result = compare_entries(&alu->src[0], &alu->src[1], > + nir_op_infos[alu->op].output_type, > + entry->ssa_def->num_components, &state); > + break; > + default: > + break; > + } > + > + if (result == LESS || result == LESS_EQUAL) { > + if (alu->op == nir_op_fmax || > + alu->op == nir_op_imax || > + alu->op == nir_op_umax) > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, > mem_ctx); > + else > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, > mem_ctx); > + > + progress = true; > + continue; > + } > + > + if (result == GREATER || result == GREATER_EQUAL) { > + if (alu->op == nir_op_fmax || > + alu->op == nir_op_imax || > + alu->op == nir_op_umax) > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[0].src, > mem_ctx); > + else > + nir_ssa_def_rewrite_uses(entry->ssa_def, alu->src[1].src, > mem_ctx); > + > + progress = true; > + continue; > + } > + } > + > + /* XXX: Here we also want to add checks for undefined behaviour > + * > + * We also want to detect lo = 0, hi = 1 and replace with sat. 
> + * We also want to use the compare_entries function to check these > + */ > + > + > + /* Check if the result of a comparison can be determined */ > + if (nir_op_infos[alu->op].num_inputs == 2 && > + !(is_entry_overdefined(get_lattice_entry(alu->src[0].src.ssa, > &state)) || > + is_entry_overdefined(get_lattice_entry(alu->src[1].src.ssa, > &state)))) { > + > + compare_range result = MIXED; > + > + switch (alu->op) { > + case nir_op_feq: case nir_op_ieq: case nir_op_seq: > + case nir_op_fge: case nir_op_ige: case nir_op_sge: case > nir_op_uge: > + case nir_op_flt: case nir_op_ilt: case nir_op_slt: case > nir_op_ult: > + case nir_op_fne: case nir_op_ine: case nir_op_sne: > + > + result = compare_entries(&alu->src[0], &alu->src[1], > + nir_op_infos[alu->op].input_types[0], > + entry->ssa_def->num_components, &state); > + break; > + default: > + break; > + } > + > + if (alu->op == nir_op_feq || > + alu->op == nir_op_ieq || > + alu->op == nir_op_seq) { > + if (result == EQUAL) { > + /* We should insert a true */ > + } > + > + if (result == LESS || result == GREATER) { > + /* We should insert a false */ > + } > + } > + > + if (alu->op == nir_op_fne || > + alu->op == nir_op_ine || > + alu->op == nir_op_sne) { > + if (result == EQUAL) { > + /* We should insert a false */ > + } > + > + if (result == LESS || result == GREATER) { > + /* We should insert a true */ > + } > + } > + > + if (alu->op == nir_op_flt || alu->op == nir_op_ilt || > + alu->op == nir_op_slt || alu->op == nir_op_ult) { > + if (result == LESS) { > + /* We should insert a true */ > + } > + > + if (result == GREATER) { > + /* We should insert a false */ > + } > + } > + > + if (alu->op == nir_op_fge || alu->op == nir_op_ige || > + alu->op == nir_op_sge || alu->op == nir_op_uge) { > + if (result == GREATER) { > + /* We should insert a true */ > + } > + > + if (result == LESS) { > + /* We should insert a false */ > + } > + } > +// progress = true; > +// continue; > + } > + > + > + bool set_false = false; > + bool set_true = false; > + /* Detect that "overdefined && false" == false, as we would have bailed > + * on this earlier in the pass since we have an overdefined source > + */ > + if (alu->op == nir_op_fand || alu->op == nir_op_iand) { > + for (int i = 0; i < 2; i++) { > + lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, > &state); > + if (is_entry_false(e)) { > + set_false = true; > + } > + } > + } > + > + /* Detect that "overdefined || true" == true, as we would have bailed > + * on this earlier in the pass since we have an overdefined source > + */ > + if (alu->op == nir_op_for || alu->op == nir_op_ior) { > + for (int i = 0; i < 2; i++) { > + lattice_entry *e = get_lattice_entry(alu->src[i].src.ssa, > &state); > + if (is_entry_true(e)) { > + set_true = true; > + } > + } > + } > + > + if (set_true || set_false) { > + nir_const_value dest; > + > + if (set_false) { > + /* We can replace entry with "false". 
0.0f == 0x00000000 which > + * should be tha same as signed integer zero, so this is safe > + */ > + dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 0.0f; > + } > + > + if (set_true) { > + /* We can replace entry with 1.0f which is also "true" for int */ > + dest.f[0] = dest.f[1] = dest.f[2] = dest.f[3] = 1.0f; > + } > +/* > + nir_load_const_instr *new_instr = > + nir_load_const_instr_create(impl->overload->function->shader, > + alu->dest.dest.ssa.num_components); > + > + new_instr->value = dest; > + > + nir_instr_insert_before(&alu->instr, &new_instr->instr); > + > + nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, > nir_src_for_ssa(&new_instr->def), > + impl); > + > + nir_instr_remove(&alu->instr); > + ralloc_free(alu); > +*/ > + } > + > + } > + > + ralloc_free(mem_ctx); > + > + return progress; > +} > + > +bool > +nir_opt_value_range(nir_shader *shader) > +{ > + bool progress = false; > + > + nir_foreach_overload(shader, overload) { > + if (overload->impl && nir_opt_value_range_impl(overload->impl)) > + progress = true; > + } > + > + return progress; > +} > -- > 2.4.4 > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev