On Mon, 2015-11-16 at 17:52 -0800, Jordan Justen wrote: > On 2015-11-16 04:27:55, Iago Toral wrote: > > On Sat, 2015-11-14 at 13:43 -0800, Jordan Justen wrote: > > > This class has code that will be shared by lower_ubo_reference and > > > lower_shared_reference. (lower_shared_reference will be used to > > > support compute shader shared variables.) > > > > > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > > > Cc: Samuel Iglesias Gonsalvez <sigles...@igalia.com> > > > Cc: Iago Toral Quiroga <ito...@igalia.com> > > > --- > > > src/glsl/Makefile.sources | 1 + > > > src/glsl/lower_buffer_access.cpp | 307 > > > +++++++++++++++++++++++++++++++++++++++ > > > src/glsl/lower_buffer_access.h | 56 +++++++ > > > src/glsl/lower_ubo_reference.cpp | 180 +---------------------- > > > 4 files changed, 367 insertions(+), 177 deletions(-) > > > create mode 100644 src/glsl/lower_buffer_access.cpp > > > create mode 100644 src/glsl/lower_buffer_access.h > > > > > > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources > > > index d4b02c1..f2c95c0 100644 > > > --- a/src/glsl/Makefile.sources > > > +++ b/src/glsl/Makefile.sources > > > @@ -155,6 +155,7 @@ LIBGLSL_FILES = \ > > > loop_analysis.h \ > > > loop_controls.cpp \ > > > loop_unroll.cpp \ > > > + lower_buffer_access.cpp \ > > > lower_clip_distance.cpp \ > > > lower_const_arrays_to_uniforms.cpp \ > > > lower_discard.cpp \ > > > diff --git a/src/glsl/lower_buffer_access.cpp > > > b/src/glsl/lower_buffer_access.cpp > > > new file mode 100644 > > > index 0000000..e0b5a2f > > > --- /dev/null > > > +++ b/src/glsl/lower_buffer_access.cpp > > > @@ -0,0 +1,307 @@ > > > +/* > > > + * Copyright (c) 2015 Intel Corporation > > > + * > > > + * Permission is hereby granted, free of charge, to any person obtaining > > > a > > > + * copy of this software and associated documentation files (the > > > "Software"), > > > + * to deal in the Software without restriction, including without > > > limitation > > > + * the rights to use, copy, modify, merge, publish, distribute, > > > sublicense, > > > + * and/or sell copies of the Software, and to permit persons to whom the > > > + * Software is furnished to do so, subject to the following conditions: > > > + * > > > + * The above copyright notice and this permission notice (including the > > > next > > > + * paragraph) shall be included in all copies or substantial portions of > > > the > > > + * Software. > > > + * > > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > > EXPRESS OR > > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > > MERCHANTABILITY, > > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT > > > SHALL > > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > > > OTHER > > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > > > ARISING > > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > > > + * DEALINGS IN THE SOFTWARE. > > > + */ > > > + > > > +/** > > > + * \file lower_buffer_access.cpp > > > + * > > > + * Helper for IR lowering pass to replace dereferences of buffer object > > > based > > > + * shader variables with intrinsic function calls. > > > + * > > > + * This helper is used by lowering passes for UBOs, SSBOs and compute > > > shader > > > + * shared variables. 
> > > + */ > > > + > > > +#include "ir.h" > > > +#include "ir_builder.h" > > > +#include "ir_rvalue_visitor.h" > > > +#include "main/macros.h" > > > +#include "util/list.h" > > > +#include "glsl_parser_extras.h" > > > +#include "lower_buffer_access.h" > > > + > > > +using namespace ir_builder; > > > + > > > +namespace lower_buffer_access { > > > + > > > +static inline int > > > +writemask_for_size(unsigned n) > > > +{ > > > + return ((1 << n) - 1); > > > +} > > > + > > > +/** > > > + * Takes LHS and emits a series of assignments into its components > > > + * from the shared variable storage. > > > > I find this part of the comment a bit confusing. This function breaks a > > dereference access into one or multiple accesses to the underlying > > buffer storage. Such dereference could be in a RHS expression, and in > > fact, that will always be the case for UBO and SSBO loads. > > Hmm. I may have copied this comment from lower_ubo_reference some time > back. Anyway, I intended to use the current comment from > lower_ubo_reference: > > /** > * Takes a deref and recursively calls itself to break the deref down to the > * point that the reads or writes generated are contiguous scalars or vectors. > */
Yeah, that looks better. > > > + * Recursively calls itself to break the deref down to the point that > > > + * the intrinsic calls are generated. > > > + */ > > > +void > > > +lower_buffer_access::emit_access(bool is_write, > > > + ir_dereference *deref, > > > + ir_variable *base_offset, > > > + unsigned int deref_offset, > > > + bool row_major, > > > + int matrix_columns, > > > + unsigned int packing, > > > + unsigned int write_mask) > > > +{ > > > > Why not pass mem_ctx as parameter instead of having it be a class > > member? I find it a bit odd that this class defines mem_ctx but never > > really takes care of initializing it, expecting that subclasses do that > > for it, so in that case why not just make them actually take care of > > passing the mem_ctx to use instead? > > > > If you rather keep mem_ctx defined here I'd at least suggest to add an > > assert to the functions that use it to check that it has indeed been > > initialized by the subclass. > > I think your comment applies to the current code in > lower_ubo_reference as well. It resets mem_ctx at various points. I > will try to get rid of mem_ctx as a member variable in all the related > classes and add it as a parameter instead. Yes, the handling of the memory context in lower_ubo_reference is a bit messy, if you can fix that too it'd be great. > Thanks, > > -Jordan > > > > + if (deref->type->is_record()) { > > > + unsigned int field_offset = 0; > > > + > > > + for (unsigned i = 0; i < deref->type->length; i++) { > > > + const struct glsl_struct_field *field = > > > + &deref->type->fields.structure[i]; > > > + ir_dereference *field_deref = > > > + new(mem_ctx) ir_dereference_record(deref->clone(mem_ctx, > > > NULL), > > > + field->name); > > > + > > > + field_offset = > > > + glsl_align(field_offset, > > > + field->type->std140_base_alignment(row_major)); > > > + > > > + emit_access(is_write, field_deref, base_offset, > > > + deref_offset + field_offset, > > > + row_major, 1, packing, > > > + > > > writemask_for_size(field_deref->type->vector_elements)); > > > + > > > + field_offset += field->type->std140_size(row_major); > > > + } > > > + return; > > > + } > > > + > > > + if (deref->type->is_array()) { > > > + unsigned array_stride = packing == GLSL_INTERFACE_PACKING_STD430 ? > > > + deref->type->fields.array->std430_array_stride(row_major) : > > > + glsl_align(deref->type->fields.array->std140_size(row_major), > > > 16); > > > + > > > + for (unsigned i = 0; i < deref->type->length; i++) { > > > + ir_constant *element = new(mem_ctx) ir_constant(i); > > > + ir_dereference *element_deref = > > > + new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, > > > NULL), > > > + element); > > > + emit_access(is_write, element_deref, base_offset, > > > + deref_offset + i * array_stride, > > > + row_major, 1, packing, > > > + > > > writemask_for_size(element_deref->type->vector_elements)); > > > + } > > > + return; > > > + } > > > + > > > + if (deref->type->is_matrix()) { > > > + for (unsigned i = 0; i < deref->type->matrix_columns; i++) { > > > + ir_constant *col = new(mem_ctx) ir_constant(i); > > > + ir_dereference *col_deref = > > > + new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, > > > NULL), col); > > > + > > > + if (row_major) { > > > + /* For a row-major matrix, the next column starts at the next > > > + * element. > > > + */ > > > + int size_mul = deref->type->is_double() ? 
8 : 4; > > > + emit_access(is_write, col_deref, base_offset, > > > + deref_offset + i * size_mul, > > > + row_major, deref->type->matrix_columns, packing, > > > + > > > writemask_for_size(col_deref->type->vector_elements)); > > > + } else { > > > + int size_mul; > > > + > > > + /* std430 doesn't round up vec2 size to a vec4 size */ > > > + if (packing == GLSL_INTERFACE_PACKING_STD430 && > > > + deref->type->vector_elements == 2 && > > > + !deref->type->is_double()) { > > > + size_mul = 8; > > > + } else { > > > + /* std140 always rounds the stride of arrays (and > > > matrices) to a > > > + * vec4, so matrices are always 16 between columns/rows. > > > With > > > + * doubles, they will be 32 apart when there are more > > > than 2 rows. > > > + * > > > + * For both std140 and std430, if the member is a > > > + * three-'component vector with components consuming N > > > basic > > > + * machine units, the base alignment is 4N. For vec4, base > > > + * alignment is 4N. > > > + */ > > > + size_mul = (deref->type->is_double() && > > > + deref->type->vector_elements > 2) ? 32 : 16; > > > + } > > > + > > > + emit_access(is_write, col_deref, base_offset, > > > + deref_offset + i * size_mul, > > > + row_major, deref->type->matrix_columns, packing, > > > + > > > writemask_for_size(col_deref->type->vector_elements)); > > > + } > > > + } > > > + return; > > > + } > > > + > > > + assert(deref->type->is_scalar() || deref->type->is_vector()); > > > + > > > + if (!row_major) { > > > + ir_rvalue *offset = > > > + add(base_offset, new(mem_ctx) ir_constant(deref_offset)); > > > + unsigned mask = > > > + is_write ? write_mask : (1 << deref->type->vector_elements) - 1; > > > + insert_buffer_access(deref, deref->type, offset, mask, -1); > > > + } else { > > > + unsigned N = deref->type->is_double() ? 8 : 4; > > > + > > > + /* We're dereffing a column out of a row-major matrix, so we > > > + * gather the vector from each stored row. > > > + */ > > > + assert(deref->type->base_type == GLSL_TYPE_FLOAT || > > > + deref->type->base_type == GLSL_TYPE_DOUBLE); > > > + /* Matrices, row_major or not, are stored as if they were > > > + * arrays of vectors of the appropriate size in std140. > > > + * Arrays have their strides rounded up to a vec4, so the > > > + * matrix stride is always 16. However a double matrix may either > > > be 16 > > > + * or 32 depending on the number of columns. > > > + */ > > > + assert(matrix_columns <= 4); > > > + unsigned matrix_stride = 0; > > > + /* Matrix stride for std430 mat2xY matrices are not rounded up to > > > + * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard > > > Uniform > > > + * Block Layout": > > > + * > > > + * "2. If the member is a two- or four-component vector with > > > components > > > + * consuming N basic machine units, the base alignment is 2N or 4N, > > > + * respectively." [...] > > > + * "4. If the member is an array of scalars or vectors, the base > > > alignment > > > + * and array stride are set to match the base alignment of a > > > single array > > > + * element, according to rules (1), (2), and (3), and rounded up > > > to the > > > + * base alignment of a vec4." [...] > > > + * "7. If the member is a row-major matrix with C columns and R > > > rows, the > > > + * matrix is stored identically to an array of R row vectors with C > > > + * components each, according to rule (4)." [...] 
> > > + * "When using the std430 storage layout, shader storage blocks > > > will be > > > + * laid out in buffer storage identically to uniform and shader > > > storage > > > + * blocks using the std140 layout, except that the base alignment > > > and > > > + * stride of arrays of scalars and vectors in rule 4 and of > > > structures in > > > + * rule 9 are not rounded up a multiple of the base alignment of a > > > vec4." > > > + */ > > > + if (packing == GLSL_INTERFACE_PACKING_STD430 && matrix_columns == > > > 2) > > > + matrix_stride = 2 * N; > > > + else > > > + matrix_stride = glsl_align(matrix_columns * N, 16); > > > + > > > + const glsl_type *deref_type = deref->type->base_type == > > > GLSL_TYPE_FLOAT ? > > > + glsl_type::float_type : glsl_type::double_type; > > > + > > > + for (unsigned i = 0; i < deref->type->vector_elements; i++) { > > > + ir_rvalue *chan_offset = > > > + add(base_offset, > > > + new(mem_ctx) ir_constant(deref_offset + i * > > > matrix_stride)); > > > + if (!is_write || ((1U << i) & write_mask)) > > > + insert_buffer_access(deref, deref_type, chan_offset, (1U << > > > i), i); > > > + } > > > + } > > > +} > > > + > > > +/** > > > + * Determine if a thing being dereferenced is row-major > > > + * > > > + * There is some trickery here. > > > + * > > > + * If the thing being dereferenced is a member of uniform block \b > > > without an > > > + * instance name, then the name of the \c ir_variable is the field name > > > of an > > > + * interface type. If this field is row-major, then the thing > > > referenced is > > > + * row-major. > > > + * > > > + * If the thing being dereferenced is a member of uniform block \b with > > > an > > > + * instance name, then the last dereference in the tree will be an > > > + * \c ir_dereference_record. If that record field is row-major, then the > > > + * thing referenced is row-major. 
> > > + */ > > > +static bool > > > +is_dereferenced_thing_row_major(const ir_dereference *deref) > > > +{ > > > + bool matrix = false; > > > + const ir_rvalue *ir = deref; > > > + > > > + while (true) { > > > + matrix = matrix || ir->type->without_array()->is_matrix(); > > > + > > > + switch (ir->ir_type) { > > > + case ir_type_dereference_array: { > > > + const ir_dereference_array *const array_deref = > > > + (const ir_dereference_array *) ir; > > > + > > > + ir = array_deref->array; > > > + break; > > > + } > > > + > > > + case ir_type_dereference_record: { > > > + const ir_dereference_record *const record_deref = > > > + (const ir_dereference_record *) ir; > > > + > > > + ir = record_deref->record; > > > + > > > + const int idx = ir->type->field_index(record_deref->field); > > > + assert(idx >= 0); > > > + > > > + const enum glsl_matrix_layout matrix_layout = > > > + > > > glsl_matrix_layout(ir->type->fields.structure[idx].matrix_layout); > > > + > > > + switch (matrix_layout) { > > > + case GLSL_MATRIX_LAYOUT_INHERITED: > > > + break; > > > + case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR: > > > + return false; > > > + case GLSL_MATRIX_LAYOUT_ROW_MAJOR: > > > + return matrix || deref->type->without_array()->is_record(); > > > + } > > > + > > > + break; > > > + } > > > + > > > + case ir_type_dereference_variable: { > > > + const ir_dereference_variable *const var_deref = > > > + (const ir_dereference_variable *) ir; > > > + > > > + const enum glsl_matrix_layout matrix_layout = > > > + glsl_matrix_layout(var_deref->var->data.matrix_layout); > > > + > > > + switch (matrix_layout) { > > > + case GLSL_MATRIX_LAYOUT_INHERITED: > > > + case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR: > > > + return false; > > > + case GLSL_MATRIX_LAYOUT_ROW_MAJOR: > > > + return matrix || deref->type->without_array()->is_record(); > > > + } > > > + > > > + unreachable("invalid matrix layout"); > > > + break; > > > + } > > > + > > > + default: > > > + return false; > > > + } > > > + } > > > + > > > + /* The tree must have ended with a dereference that wasn't an > > > + * ir_dereference_variable. That is invalid, and it should be > > > impossible. > > > + */ > > > + unreachable("invalid dereference tree"); > > > + return false; > > > +} > > > + > > > +} /* namespace lower_buffer_access */ > > > diff --git a/src/glsl/lower_buffer_access.h > > > b/src/glsl/lower_buffer_access.h > > > new file mode 100644 > > > index 0000000..3138963 > > > --- /dev/null > > > +++ b/src/glsl/lower_buffer_access.h > > > @@ -0,0 +1,56 @@ > > > +/* > > > + * Copyright (c) 2015 Intel Corporation > > > + * > > > + * Permission is hereby granted, free of charge, to any person obtaining > > > a > > > + * copy of this software and associated documentation files (the > > > "Software"), > > > + * to deal in the Software without restriction, including without > > > limitation > > > + * the rights to use, copy, modify, merge, publish, distribute, > > > sublicense, > > > + * and/or sell copies of the Software, and to permit persons to whom the > > > + * Software is furnished to do so, subject to the following conditions: > > > + * > > > + * The above copyright notice and this permission notice (including the > > > next > > > + * paragraph) shall be included in all copies or substantial portions of > > > the > > > + * Software. 
> > > + * > > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > > EXPRESS OR > > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > > MERCHANTABILITY, > > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT > > > SHALL > > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > > > OTHER > > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > > > ARISING > > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > > > + * DEALINGS IN THE SOFTWARE. > > > + */ > > > + > > > +/** > > > + * \file lower_buffer_access.h > > > + * > > > + * Helper for IR lowering pass to replace dereferences of buffer object > > > based > > > + * shader variables with intrinsic function calls. > > > + * > > > + * This helper is used by lowering passes for UBOs, SSBOs and compute > > > shader > > > + * shared variables. > > > + */ > > > + > > > +#pragma once > > > +#ifndef LOWER_BUFFER_ACCESS_H > > > +#define LOWER_BUFFER_ACCESS_H > > > + > > > +namespace lower_buffer_access { > > > + > > > +class lower_buffer_access : public ir_rvalue_enter_visitor { > > > +public: > > > + virtual void > > > + insert_buffer_access(ir_dereference *deref, const glsl_type *type, > > > + ir_rvalue *offset, unsigned mask, int channel) = > > > 0; > > > + > > > + void emit_access(bool is_write, ir_dereference *deref, > > > + ir_variable *base_offset, unsigned int deref_offset, > > > + bool row_major, int matrix_columns, > > > + unsigned int packing, unsigned int write_mask); > > > + > > > + void *mem_ctx; > > > +}; > > > + > > > +} /* namespace lower_buffer_access */ > > > + > > > +#endif /* LOWER_BUFFER_ACCESS_H */ > > > diff --git a/src/glsl/lower_ubo_reference.cpp > > > b/src/glsl/lower_ubo_reference.cpp > > > index b8fcc8e..8de4f5e 100644 > > > --- a/src/glsl/lower_ubo_reference.cpp > > > +++ b/src/glsl/lower_ubo_reference.cpp > > > @@ -38,6 +38,7 @@ > > > #include "ir_rvalue_visitor.h" > > > #include "main/macros.h" > > > #include "glsl_parser_extras.h" > > > +#include "lower_buffer_access.h" > > > > > > using namespace ir_builder; > > > > > > @@ -132,7 +133,8 @@ is_dereferenced_thing_row_major(const ir_rvalue > > > *deref) > > > } > > > > > > namespace { > > > -class lower_ubo_reference_visitor : public ir_rvalue_enter_visitor { > > > +class lower_ubo_reference_visitor : > > > + public lower_buffer_access::lower_buffer_access { > > > public: > > > lower_ubo_reference_visitor(struct gl_shader *shader) > > > : shader(shader) > > > @@ -173,11 +175,6 @@ public: > > > void insert_buffer_access(ir_dereference *deref, const glsl_type > > > *type, > > > ir_rvalue *offset, unsigned mask, int > > > channel); > > > > > > - void emit_access(bool is_write, ir_dereference *deref, > > > - ir_variable *base_offset, unsigned int deref_offset, > > > - bool row_major, int matrix_columns, > > > - unsigned packing, unsigned write_mask); > > > - > > > ir_visitor_status visit_enter(class ir_expression *); > > > ir_expression *calculate_ssbo_unsized_array_length(ir_expression > > > *expr); > > > void check_ssbo_unsized_array_length_expression(class ir_expression > > > *); > > > @@ -195,7 +192,6 @@ public: > > > ir_call *check_for_ssbo_atomic_intrinsic(ir_call *ir); > > > ir_visitor_status visit_enter(ir_call *ir); > > > > > > - void *mem_ctx; > > > struct gl_shader *shader; > > > struct gl_uniform_buffer_variable *ubo_var; > > > ir_rvalue *uniform_block; > > > @@ -727,176 +723,6 @@ > > > 
lower_ubo_reference_visitor::insert_buffer_access(ir_dereference *deref, > > > } > > > } > > > > > > -static inline int > > > -writemask_for_size(unsigned n) > > > -{ > > > - return ((1 << n) - 1); > > > -} > > > - > > > -/** > > > - * Takes a deref and recursively calls itself to break the deref down to > > > the > > > - * point that the reads or writes generated are contiguous scalars or > > > vectors. > > > - */ > > > -void > > > -lower_ubo_reference_visitor::emit_access(bool is_write, > > > - ir_dereference *deref, > > > - ir_variable *base_offset, > > > - unsigned int deref_offset, > > > - bool row_major, > > > - int matrix_columns, > > > - unsigned packing, > > > - unsigned write_mask) > > > -{ > > > - if (deref->type->is_record()) { > > > - unsigned int field_offset = 0; > > > - > > > - for (unsigned i = 0; i < deref->type->length; i++) { > > > - const struct glsl_struct_field *field = > > > - &deref->type->fields.structure[i]; > > > - ir_dereference *field_deref = > > > - new(mem_ctx) ir_dereference_record(deref->clone(mem_ctx, > > > NULL), > > > - field->name); > > > - > > > - field_offset = > > > - glsl_align(field_offset, > > > - field->type->std140_base_alignment(row_major)); > > > - > > > - emit_access(is_write, field_deref, base_offset, > > > - deref_offset + field_offset, > > > - row_major, 1, packing, > > > - > > > writemask_for_size(field_deref->type->vector_elements)); > > > - > > > - field_offset += field->type->std140_size(row_major); > > > - } > > > - return; > > > - } > > > - > > > - if (deref->type->is_array()) { > > > - unsigned array_stride = packing == GLSL_INTERFACE_PACKING_STD430 ? > > > - deref->type->fields.array->std430_array_stride(row_major) : > > > - glsl_align(deref->type->fields.array->std140_size(row_major), > > > 16); > > > - > > > - for (unsigned i = 0; i < deref->type->length; i++) { > > > - ir_constant *element = new(mem_ctx) ir_constant(i); > > > - ir_dereference *element_deref = > > > - new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, > > > NULL), > > > - element); > > > - emit_access(is_write, element_deref, base_offset, > > > - deref_offset + i * array_stride, > > > - row_major, 1, packing, > > > - > > > writemask_for_size(element_deref->type->vector_elements)); > > > - } > > > - return; > > > - } > > > - > > > - if (deref->type->is_matrix()) { > > > - for (unsigned i = 0; i < deref->type->matrix_columns; i++) { > > > - ir_constant *col = new(mem_ctx) ir_constant(i); > > > - ir_dereference *col_deref = > > > - new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, > > > NULL), col); > > > - > > > - if (row_major) { > > > - /* For a row-major matrix, the next column starts at the next > > > - * element. > > > - */ > > > - int size_mul = deref->type->is_double() ? 8 : 4; > > > - emit_access(is_write, col_deref, base_offset, > > > - deref_offset + i * size_mul, > > > - row_major, deref->type->matrix_columns, packing, > > > - > > > writemask_for_size(col_deref->type->vector_elements)); > > > - } else { > > > - int size_mul; > > > - > > > - /* std430 doesn't round up vec2 size to a vec4 size */ > > > - if (packing == GLSL_INTERFACE_PACKING_STD430 && > > > - deref->type->vector_elements == 2 && > > > - !deref->type->is_double()) { > > > - size_mul = 8; > > > - } else { > > > - /* std140 always rounds the stride of arrays (and > > > matrices) to a > > > - * vec4, so matrices are always 16 between columns/rows. > > > With > > > - * doubles, they will be 32 apart when there are more > > > than 2 rows. 
> > > - * > > > - * For both std140 and std430, if the member is a > > > - * three-'component vector with components consuming N > > > basic > > > - * machine units, the base alignment is 4N. For vec4, base > > > - * alignment is 4N. > > > - */ > > > - size_mul = (deref->type->is_double() && > > > - deref->type->vector_elements > 2) ? 32 : 16; > > > - } > > > - > > > - emit_access(is_write, col_deref, base_offset, > > > - deref_offset + i * size_mul, > > > - row_major, deref->type->matrix_columns, packing, > > > - > > > writemask_for_size(col_deref->type->vector_elements)); > > > - } > > > - } > > > - return; > > > - } > > > - > > > - assert(deref->type->is_scalar() || deref->type->is_vector()); > > > - > > > - if (!row_major) { > > > - ir_rvalue *offset = > > > - add(base_offset, new(mem_ctx) ir_constant(deref_offset)); > > > - unsigned mask = > > > - is_write ? write_mask : (1 << deref->type->vector_elements) - 1; > > > - insert_buffer_access(deref, deref->type, offset, mask, -1); > > > - } else { > > > - unsigned N = deref->type->is_double() ? 8 : 4; > > > - > > > - /* We're dereffing a column out of a row-major matrix, so we > > > - * gather the vector from each stored row. > > > - */ > > > - assert(deref->type->base_type == GLSL_TYPE_FLOAT || > > > - deref->type->base_type == GLSL_TYPE_DOUBLE); > > > - /* Matrices, row_major or not, are stored as if they were > > > - * arrays of vectors of the appropriate size in std140. > > > - * Arrays have their strides rounded up to a vec4, so the > > > - * matrix stride is always 16. However a double matrix may either > > > be 16 > > > - * or 32 depending on the number of columns. > > > - */ > > > - assert(matrix_columns <= 4); > > > - unsigned matrix_stride = 0; > > > - /* Matrix stride for std430 mat2xY matrices are not rounded up to > > > - * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard > > > Uniform > > > - * Block Layout": > > > - * > > > - * "2. If the member is a two- or four-component vector with > > > components > > > - * consuming N basic machine units, the base alignment is 2N or 4N, > > > - * respectively." [...] > > > - * "4. If the member is an array of scalars or vectors, the base > > > alignment > > > - * and array stride are set to match the base alignment of a > > > single array > > > - * element, according to rules (1), (2), and (3), and rounded up > > > to the > > > - * base alignment of a vec4." [...] > > > - * "7. If the member is a row-major matrix with C columns and R > > > rows, the > > > - * matrix is stored identically to an array of R row vectors with C > > > - * components each, according to rule (4)." [...] > > > - * "When using the std430 storage layout, shader storage blocks > > > will be > > > - * laid out in buffer storage identically to uniform and shader > > > storage > > > - * blocks using the std140 layout, except that the base alignment > > > and > > > - * stride of arrays of scalars and vectors in rule 4 and of > > > structures in > > > - * rule 9 are not rounded up a multiple of the base alignment of a > > > vec4." > > > - */ > > > - if (packing == GLSL_INTERFACE_PACKING_STD430 && matrix_columns == > > > 2) > > > - matrix_stride = 2 * N; > > > - else > > > - matrix_stride = glsl_align(matrix_columns * N, 16); > > > - > > > - const glsl_type *deref_type = deref->type->base_type == > > > GLSL_TYPE_FLOAT ? 
> > > - glsl_type::float_type : glsl_type::double_type; > > > - > > > - for (unsigned i = 0; i < deref->type->vector_elements; i++) { > > > - ir_rvalue *chan_offset = > > > - add(base_offset, > > > - new(mem_ctx) ir_constant(deref_offset + i * > > > matrix_stride)); > > > - if (!is_write || ((1U << i) & write_mask)) > > > - insert_buffer_access(deref, deref_type, chan_offset, (1U << > > > i), i); > > > - } > > > - } > > > -} > > > - > > > void > > > lower_ubo_reference_visitor::write_to_memory(ir_dereference *deref, > > > ir_variable *var,
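
For illustration, a minimal sketch of what the mem_ctx-as-parameter approach discussed above could look like in lower_buffer_access.h. The signatures below are hypothetical (this is not the actual follow-up patch); only the idea of threading the ralloc context through explicitly, instead of keeping it as a member the subclasses must initialize, is taken from the thread:

#include "ir.h"
#include "ir_rvalue_visitor.h"

namespace lower_buffer_access {

class lower_buffer_access : public ir_rvalue_enter_visitor {
public:
   /* Callers pass the ralloc context they own, so there is no
    * partially-initialized mem_ctx member for a subclass to forget
    * to set up.
    */
   virtual void
   insert_buffer_access(void *mem_ctx, ir_dereference *deref,
                        const glsl_type *type, ir_rvalue *offset,
                        unsigned mask, int channel) = 0;

   void emit_access(void *mem_ctx, bool is_write, ir_dereference *deref,
                    ir_variable *base_offset, unsigned int deref_offset,
                    bool row_major, int matrix_columns,
                    unsigned int packing, unsigned int write_mask);
};

} /* namespace lower_buffer_access */

With that shape, emit_access's recursive calls and the insert_buffer_access overrides in lower_ubo_reference_visitor would simply forward the mem_ctx argument, and the assert Iago suggests for the member-variable variant becomes unnecessary.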