date:20151114

Re: [Mesa-dev] [PATCH v2] i965: Handle lum, intensity and missing components in the fast clear

2015-11-14 Thread Neil Roberts

Ben Widawsky  writes:

>> +   case GL_LUMINANCE:
>> +   case GL_LUMINANCE_ALPHA:
>> +  override_color.ui[1] = override_color.ui[0];
>> +  override_color.ui[2] = override_color.ui[0];
>> +  break;
>
> The definition for GL_LUMINANCE afaict: "Each element is a single
> luminance value. The GL converts it to floating point, then assembles
> it into an RGBA element by replicating the luminance value three times
> for red, green, and blue and attaching 1 for alpha."
>
> doesn't that mean you need
> override_color.f[3] = 1.0f;

That is handled separately by the bit at the bottom which checks for
_mesa_format_has_color_component(format, 3). It's the same bit of code
that overrides the alpha channel for GL_RGB.
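
A minimal sketch of that two-stage logic, for illustration only. The enum,
the union and has_alpha() are hypothetical stand-ins (has_alpha() plays the
role of _mesa_format_has_color_component(format, 3)); this is not the code
from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Mesa types discussed above. */
enum base_format { BASE_RGBA, BASE_RGB, BASE_LUMINANCE, BASE_LUMINANCE_ALPHA };

union clear_color {
   float f[4];
   uint32_t ui[4];
};

/* Assumed equivalent of _mesa_format_has_color_component(format, 3). */
static bool has_alpha(enum base_format fmt)
{
   return fmt == BASE_RGBA || fmt == BASE_LUMINANCE_ALPHA;
}

static union clear_color override_clear_color(enum base_format fmt,
                                              union clear_color c)
{
   /* Stage 1: luminance formats replicate the first channel into G and B. */
   if (fmt == BASE_LUMINANCE || fmt == BASE_LUMINANCE_ALPHA) {
      c.ui[1] = c.ui[0];
      c.ui[2] = c.ui[0];
   }

   /* Stage 2: any format without a stored alpha reads back alpha = 1.
    * This is the same step that covers GL_RGB. */
   if (!has_alpha(fmt))
      c.f[3] = 1.0f;

   return c;
}
```

With this split, GL_LUMINANCE gets alpha 1 from the second stage, while
GL_LUMINANCE_ALPHA keeps the alpha it was given.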

>> +   default:
>> +  for (int i = 0; i < 3; i++) {
>> + if (!_mesa_format_has_color_component(format, i))
>> +override_color.ui[i] = 0;
>> +  }
>
> Is there an easy way to verify that all formats want 0 for GB
> channels? It looks right to me, but with my knowledge of GL, that
> doesn't mean much (I am looking here:
> https://www.opengl.org/sdk/docs/man/html/glTexImage2D.xhtml)

In the GL 4.5 spec, section 15.2.1 it says “When a texture lookup is
performed in a fragment shader, the GL computes the filtered texture
value τ in the manner described in sections 8.14 and 8.15, and converts
it to a texture base color Cb as shown in table 15.1”. Table 15.1 looks
something like this:

Texture Base       Texture base color
Internal Format    Cb              Ab
RED                (Rt, 0, 0)      1
RG                 (Rt, Gt, 0)     1
RGB                (Rt, Gt, Bt)    1
RGBA               (Rt, Gt, Bt)    At

In the compatibility spec there are also the luminance, intensity and
alpha formats, and they all have 0 for the missing RGB components. I also
tried running the Piglit test I wrote on the nvidia binary blob, so if we
can assume that it tests all renderable formats then we can be
confident we at least match what nvidia do.
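
The table can be read as a small expansion function. The sketch below is an
illustration under assumptions: the enum and struct names are made up, the
luminance and intensity values are assumed to arrive in t.r, and the rows
for the compatibility-profile formats follow the compatibility spec's
version of the table.

```c
/* Base formats from table 15.1 plus the compatibility-profile ones.
 * Names are hypothetical, chosen for this sketch. */
enum base_format {
   FMT_RED, FMT_RG, FMT_RGB, FMT_RGBA,
   FMT_LUMINANCE, FMT_LUMINANCE_ALPHA, FMT_INTENSITY, FMT_ALPHA
};

struct rgba { float r, g, b, a; };

/* t is the filtered texel; returns the texture base color (Cb, Ab). */
static struct rgba texture_base_color(enum base_format fmt, struct rgba t)
{
   /* Missing colour components default to 0, missing alpha to 1. */
   struct rgba c = { 0.0f, 0.0f, 0.0f, 1.0f };

   switch (fmt) {
   case FMT_RED:             c.r = t.r; break;
   case FMT_RG:              c.r = t.r; c.g = t.g; break;
   case FMT_RGB:             c.r = t.r; c.g = t.g; c.b = t.b; break;
   case FMT_RGBA:            c = t; break;
   case FMT_LUMINANCE:       c.r = c.g = c.b = t.r; break;
   case FMT_LUMINANCE_ALPHA: c.r = c.g = c.b = t.r; c.a = t.a; break;
   case FMT_INTENSITY:       c.r = c.g = c.b = c.a = t.r; break;
   case FMT_ALPHA:           c.a = t.a; break;
   }
   return c;
}
```

A fast clear has to store a colour that produces the same result as this
expansion when the surface is later sampled.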

> I also think that component 0 must always have a color, right? (I'm
> not requesting a change as such, just making sure my understanding of
> what you're trying to do is correct).

As Ilia mentioned, GL_ALPHA doesn't have a red component.

>> +  break;
>> +   }
>> +
>> +   if (!_mesa_format_has_color_component(format, 3)) {
>> +  if (_mesa_is_format_integer_color(format))
>> + override_color.ui[3] = 1;
>
> We shouldn't ever be fast clearing integer formats. We can on GEN8+,
> but we're not doing it today. So I think it should be safe to remove
> this check.

You're right, but I thought I'd add it anyway because I didn't think
this is a particularly hot code path and it might make life easier for
whoever eventually adds support for integer fast clears. I'm happy to
take it out though if you think that'd be better.
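
To illustrate why the integer check matters at all, here is a small sketch
with a hypothetical union mirroring override_color. The integer 1 and the
float 1.0f have different bit patterns, so the wrong choice would store the
wrong clear value. The expected bit pattern assumes IEEE 754 single
precision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the override_color union used in the patch. */
union clear_color {
   float f[4];
   uint32_t ui[4];
};

/* Write "alpha = 1" in the representation that matches the format. */
static void set_opaque_alpha(union clear_color *c, bool is_integer_format)
{
   if (is_integer_format)
      c->ui[3] = 1;      /* integer one: bits 0x00000001 */
   else
      c->f[3] = 1.0f;    /* float one: bits 0x3f800000 on IEEE 754 */
}
```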

> Seems like a good patch to me. It would probably be nice to track down
> a good spec reference if you manage to find one. I know I've seen such
> reference in SKL docs (which aren't SKL specific) - but I am having
> trouble finding it in PRMs. My VPN is broken, so I can't look at SKL
> docs right now.

I haven't been able to find anything yet sadly.

> With the explanation of why the luminance alpha channel isn't 1 (I
> also claim incompetence on the GL_LUMINANCE_ALPHA format):
> Reviewed-by: Ben Widawsky 

Many thanks for the review.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V2 1/8] glsl: move ast layout qualifier handling code into its own function

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

These rules are now applied only to variables, rather than also
being applied to function params.

V2: move code for handling stream layout qualifier
---
 src/glsl/ast_to_hir.cpp | 414 +---
 1 file changed, 212 insertions(+), 202 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 9d341e8..8bd78c4 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2643,6 +2643,216 @@ validate_array_dimensions(const glsl_type *t,
 }
 
 static void
+apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
+   ir_variable *var,
+   struct _mesa_glsl_parse_state *state,
+   YYLTYPE *loc)
+{
+   if (var->name != NULL && strcmp(var->name, "gl_FragCoord") == 0) {
+
+  /* Section 4.3.8.1, page 39 of GLSL 1.50 spec says:
+   *
+   *"Within any shader, the first redeclarations of gl_FragCoord
+   * must appear before any use of gl_FragCoord."
+   *
+   * Generate a compiler error if above condition is not met by the
+   * fragment shader.
+   */
+  ir_variable *earlier = state->symbols->get_variable("gl_FragCoord");
+  if (earlier != NULL &&
+  earlier->data.used &&
+  !state->fs_redeclares_gl_fragcoord) {
+ _mesa_glsl_error(loc, state,
+  "gl_FragCoord used before its first redeclaration "
+  "in fragment shader");
+  }
+
+  /* Make sure all gl_FragCoord redeclarations specify the same layout
+   * qualifiers.
+   */
+  if (is_conflicting_fragcoord_redeclaration(state, qual)) {
+ const char *const qual_string =
+get_layout_qualifier_string(qual->flags.q.origin_upper_left,
+qual->flags.q.pixel_center_integer);
+
+ const char *const state_string =
+get_layout_qualifier_string(state->fs_origin_upper_left,
+state->fs_pixel_center_integer);
+
+ _mesa_glsl_error(loc, state,
+  "gl_FragCoord redeclared with different layout "
+  "qualifiers (%s) and (%s) ",
+  state_string,
+  qual_string);
+  }
+  state->fs_origin_upper_left = qual->flags.q.origin_upper_left;
+  state->fs_pixel_center_integer = qual->flags.q.pixel_center_integer;
+  state->fs_redeclares_gl_fragcoord_with_no_layout_qualifiers =
+ !qual->flags.q.origin_upper_left && !qual->flags.q.pixel_center_integer;
+  state->fs_redeclares_gl_fragcoord =
+ state->fs_origin_upper_left ||
+ state->fs_pixel_center_integer ||
+ state->fs_redeclares_gl_fragcoord_with_no_layout_qualifiers;
+   }
+
+   var->data.pixel_center_integer = qual->flags.q.pixel_center_integer;
+   var->data.origin_upper_left = qual->flags.q.origin_upper_left;
+   if ((qual->flags.q.origin_upper_left || qual->flags.q.pixel_center_integer)
+   && (strcmp(var->name, "gl_FragCoord") != 0)) {
+  const char *const qual_string = (qual->flags.q.origin_upper_left)
+ ? "origin_upper_left" : "pixel_center_integer";
+
+  _mesa_glsl_error(loc, state,
+  "layout qualifier `%s' can only be applied to "
+  "fragment shader input `gl_FragCoord'",
+  qual_string);
+   }
+
+   if (qual->flags.q.explicit_location) {
+  validate_explicit_location(qual, var, state, loc);
+   } else if (qual->flags.q.explicit_index) {
+  _mesa_glsl_error(loc, state, "explicit index requires explicit location");
+   }
+
+   if (qual->flags.q.explicit_binding &&
+   validate_binding_qualifier(state, loc, var->type, qual)) {
+  var->data.explicit_binding = true;
+  var->data.binding = qual->binding;
+   }
+
+   if (state->stage == MESA_SHADER_GEOMETRY &&
+   qual->flags.q.out && qual->flags.q.stream) {
+  var->data.stream = qual->stream;
+   }
+
+   if (var->type->contains_atomic()) {
+  if (var->data.mode == ir_var_uniform) {
+ if (var->data.explicit_binding) {
+unsigned *offset =
+   &state->atomic_counter_offsets[var->data.binding];
+
+if (*offset % ATOMIC_COUNTER_SIZE)
+   _mesa_glsl_error(loc, state,
+"misaligned atomic counter offset");
+
+var->data.atomic.offset = *offset;
+*offset += var->type->atomic_size();
+
+ } else {
+_mesa_glsl_error(loc, state,
+ "atomic counters require explicit binding point");
+ }
+  } else if (var->data.mode != ir_var_function_in) {
+ _mesa_glsl_error(loc, state, "atomic counters may only be declared as "
+  "function parameters or uniform-qualified "
+

[Mesa-dev] ARB_enhanced_layout compile-time-constants V3

2015-11-14 Thread Timothy Arceri

This series adds support for compile time constants and also adds
subroutine index qualifier support which was missing for 
ARB_explicit_uniform_location.

This series applies on top of a clean-up series [3].

V3: 
- Some refactoring and a bug fix based on Emil's feedback on V2.
- Series overhauled to reduce code churn while still keeping patches
as small as possible for easy review.
- Improvements to handling compute shader local_size layout qualifiers
- Rebased on top of clean-up series [3]

V2: Validation of minimum qualifier value moved to the helper functions; all
qualifiers will now always have the minimum value checked. Split the patches
that move validation out of the parser and add the compile time constant
support into smaller patches where possible.

Piglit tests have been reviewed and pushed to master; one test is still
outstanding, which covers querying of the subroutine index [1].

The extension is disabled by default until the remaining features are added.

MESA_EXTENSION_OVERRIDE=GL_ARB_enhanced_layouts can be used for testing.

You can get the series from my arb_enhanced_layouts6 branch [2]

[1] https://patchwork.freedesktop.org/patch/63795/
[2] https://github.com/tarceri/Mesa_arrays_of_arrays.git
[3] https://patchwork.freedesktop.org/series/728/


[Mesa-dev] [PATCH v3 09/14] glsl: replace binding layout min boundary check

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

Use the new helper, which a later patch will extend to allow
compile time constants.
---
 src/glsl/ast_to_hir.cpp | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 773b8ae..e7e2a85 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2550,14 +2550,15 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
   return;
}
 
-   if (qual->binding < 0) {
-  _mesa_glsl_error(loc, state, "binding values must be >= 0");
+   unsigned qual_binding;
+   if (!process_qualifier_constant(state, loc, "binding", qual->binding,
+   &qual_binding)) {
   return;
}
 
const struct gl_context *const ctx = state->ctx;
unsigned elements = type->is_array() ? type->arrays_of_arrays_size() : 1;
-   unsigned max_index = qual->binding + elements - 1;
+   unsigned max_index = qual_binding + elements - 1;
const glsl_type *base_type = type->without_array();
 
if (base_type->is_interface()) {
@@ -2573,9 +2574,9 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
*/
   if (qual->flags.q.uniform &&
  max_index >= ctx->Const.MaxUniformBufferBindings) {
- _mesa_glsl_error(loc, state, "layout(binding = %d) for %d UBOs exceeds "
+ _mesa_glsl_error(loc, state, "layout(binding = %u) for %d UBOs exceeds "
   "the maximum number of UBO binding points (%d)",
-  qual->binding, elements,
+  qual_binding, elements,
   ctx->Const.MaxUniformBufferBindings);
  return;
   }
@@ -2591,9 +2592,9 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
*/
   if (qual->flags.q.buffer &&
  max_index >= ctx->Const.MaxShaderStorageBufferBindings) {
- _mesa_glsl_error(loc, state, "layout(binding = %d) for %d SSBOs exceeds "
+ _mesa_glsl_error(loc, state, "layout(binding = %u) for %d SSBOs exceeds "
   "the maximum number of SSBO binding points (%d)",
-  qual->binding, elements,
+  qual_binding, elements,
   ctx->Const.MaxShaderStorageBufferBindings);
  return;
   }
@@ -2610,16 +2611,16 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
   if (max_index >= limit) {
  _mesa_glsl_error(loc, state, "layout(binding = %d) for %d samplers "
   "exceeds the maximum number of texture image units "
-  "(%d)", qual->binding, elements, limit);
+  "(%u)", qual_binding, elements, limit);
 
  return;
   }
} else if (base_type->contains_atomic()) {
   assert(ctx->Const.MaxAtomicBufferBindings <= MAX_COMBINED_ATOMIC_BUFFERS);
-  if (unsigned(qual->binding) >= ctx->Const.MaxAtomicBufferBindings) {
+  if (qual_binding >= ctx->Const.MaxAtomicBufferBindings) {
  _mesa_glsl_error(loc, state, "layout(binding = %d) exceeds the "
   " maximum number of atomic counter buffer bindings"
-  "(%d)", qual->binding,
+  "(%u)", qual_binding,
   ctx->Const.MaxAtomicBufferBindings);
 
  return;
@@ -2641,7 +2642,7 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
}
 
var->data.explicit_binding = true;
-   var->data.binding = qual->binding;
+   var->data.binding = qual_binding;
 
return;
 }
-- 
2.4.3
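
The range check that this patch switches to unsigned arithmetic can be
sketched on its own. The function name and parameters below are
hypothetical. The patch computes max_index = qual_binding + elements - 1 and
compares it against the limit; the variant here reorders that comparison so
that it cannot wrap around, which is a variation and not what the patch
does.

```c
#include <stdbool.h>

/* True when bindings [binding, binding + elements - 1] all lie below
 * max_bindings.  Assumes elements >= 1, as in the patch, where a
 * non-array counts as one element. */
static bool binding_range_fits(unsigned binding, unsigned elements,
                               unsigned max_bindings)
{
   if (elements > max_bindings)
      return false;

   /* Same condition as binding + elements - 1 < max_bindings, but without
    * an addition that could overflow for very large binding values. */
   return binding <= max_bindings - elements;
}
```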


[Mesa-dev] [PATCH v3 08/14] glsl: encapsulate binding validation and setting

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

This change moves the binding layout handling code into an apply
function to be consistent with other helper functions in the ast
code, and to encapsulate the code so that when we introduce
compile time constants the code will be much cleaner.

One small downside is that for unnamed interface blocks we will now
be revalidating the binding for each member it's applied to.
However this seems a small sacrifice in order to have code which
is readable.

We also remove the incorrect comment in the named interface code
about propagating bindings to members which seems to have been
copied from the unnamed interface code.
---
 src/glsl/ast_to_hir.cpp | 59 ++---
 1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index dedc39f..773b8ae 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2537,22 +2537,22 @@ validate_stream_qualifier(YYLTYPE *loc, struct _mesa_glsl_parse_state *state,
 }
 
 static void
-validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
-   YYLTYPE *loc,
-   ir_variable *var,
-   const glsl_type *type,
-   const ast_type_qualifier *qual)
+apply_explicit_binding(struct _mesa_glsl_parse_state *state,
+   YYLTYPE *loc,
+   ir_variable *var,
+   const glsl_type *type,
+   const ast_type_qualifier *qual)
 {
if (!qual->flags.q.uniform && !qual->flags.q.buffer) {
   _mesa_glsl_error(loc, state,
"the \"binding\" qualifier only applies to uniforms and "
"shader storage buffer objects");
-  return false;
+  return;
}
 
if (qual->binding < 0) {
   _mesa_glsl_error(loc, state, "binding values must be >= 0");
-  return false;
+  return;
}
 
const struct gl_context *const ctx = state->ctx;
@@ -2577,7 +2577,7 @@ validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
   "the maximum number of UBO binding points (%d)",
   qual->binding, elements,
   ctx->Const.MaxUniformBufferBindings);
- return false;
+ return;
   }
 
   /* SSBOs. From page 67 of the GLSL 4.30 specification:
@@ -2595,7 +2595,7 @@ validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
   "the maximum number of SSBO binding points (%d)",
   qual->binding, elements,
   ctx->Const.MaxShaderStorageBufferBindings);
- return false;
+ return;
   }
} else if (base_type->is_sampler()) {
   /* Samplers.  From page 63 of the GLSL 4.20 specification:
@@ -2612,7 +2612,7 @@ validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
   "exceeds the maximum number of texture image units "
   "(%d)", qual->binding, elements, limit);
 
- return false;
+ return;
   }
} else if (base_type->contains_atomic()) {
   assert(ctx->Const.MaxAtomicBufferBindings <= MAX_COMBINED_ATOMIC_BUFFERS);
@@ -2622,7 +2622,7 @@ validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
   "(%d)", qual->binding,
   ctx->Const.MaxAtomicBufferBindings);
 
- return false;
+ return;
   }
} else if (state->is_version(420, 310) && base_type->is_image()) {
   assert(ctx->Const.MaxImageUnits <= MAX_IMAGE_UNITS);
@@ -2630,17 +2630,20 @@ validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
  _mesa_glsl_error(loc, state, "Image binding %d exceeds the "
   " maximum number of image units (%d)", max_index,
   ctx->Const.MaxImageUnits);
- return false;
+ return;
   }
 
} else {
   _mesa_glsl_error(loc, state,
"the \"binding\" qualifier only applies to uniform "
"blocks, opaque variables, or arrays thereof");
-  return false;
+  return;
}
 
-   return true;
+   var->data.explicit_binding = true;
+   var->data.binding = qual->binding;
+
+   return;
 }
 
 
@@ -3042,10 +3045,8 @@ apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
   _mesa_glsl_error(loc, state, "explicit index requires explicit location");
}
 
-   if (qual->flags.q.explicit_binding &&
-   validate_binding_qualifier(state, loc, var->type, qual)) {
-  var->data.explicit_binding = true;
-  var->data.binding = qual->binding;
+   if (qual->flags.q.explicit_binding) {
+  apply_explicit_binding(state, loc, var, var->type, qual);
}
 
if (state->stage == MESA_SHADER_GEOMETRY &&
@@ -6693,8 +6694,6 @@ ast_interface_block::hir(exec_list *instruction
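
The validate-to-apply change described in the commit message can be
sketched in isolation. Everything below is hypothetical (struct var_data,
the function name, the single max_bindings limit); the point is only the
shape: the helper returns nothing and writes the variable's fields itself,
and only once every check has passed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of ir_variable::data used here. */
struct var_data {
   bool explicit_binding;
   unsigned binding;
};

/* "Apply" style: validate, then set the fields, with no return value for
 * the caller to act on.  Error reporting is left out of this sketch. */
static void apply_explicit_binding_sketch(struct var_data *var,
                                          int raw_binding,
                                          unsigned max_bindings)
{
   if (raw_binding < 0 || (unsigned) raw_binding >= max_bindings)
      return;   /* invalid: leave the variable untouched */

   var->explicit_binding = true;
   var->binding = (unsigned) raw_binding;
}
```

The caller then shrinks to a single call guarded by the explicit_binding
flag, which is what the last hunk above does.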

[Mesa-dev] [PATCH v3 03/14] glsl: move location layout qualifier validation

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

We are moving this out of the parser in preparation for compile
time constant support.
---
 src/glsl/ast_to_hir.cpp | 22 ++
 src/glsl/glsl_parser.yy |  8 +---
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 2e55a49..de13589 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2674,13 +2674,19 @@ apply_explicit_location(const struct ast_type_qualifier *qual,
 {
bool fail = false;
 
+   unsigned qual_location;
+   if (!process_qualifier_constant(state, loc, "location", qual->location,
+   &qual_location)) {
+  return;
+   }
+
/* Checks for GL_ARB_explicit_uniform_location. */
if (qual->flags.q.uniform) {
   if (!state->check_explicit_uniform_location_allowed(loc, var))
  return;
 
   const struct gl_context *const ctx = state->ctx;
-  unsigned max_loc = qual->location + var->type->uniform_locations() - 1;
+  unsigned max_loc = qual_location + var->type->uniform_locations() - 1;
 
   if (max_loc >= ctx->Const.MaxUserAssignableUniformLocations) {
  _mesa_glsl_error(loc, state, "location(s) consumed by uniform %s "
@@ -2690,7 +2696,7 @@ apply_explicit_location(const struct ast_type_qualifier *qual,
   }
 
   var->data.explicit_location = true;
-  var->data.location = qual->location;
+  var->data.location = qual_location;
   return;
}
 
@@ -2775,23 +2781,23 @@ apply_explicit_location(const struct ast_type_qualifier *qual,
   switch (state->stage) {
   case MESA_SHADER_VERTEX:
  var->data.location = (var->data.mode == ir_var_shader_in)
-? (qual->location + VERT_ATTRIB_GENERIC0)
-: (qual->location + VARYING_SLOT_VAR0);
+? (qual_location + VERT_ATTRIB_GENERIC0)
+: (qual_location + VARYING_SLOT_VAR0);
  break;
 
   case MESA_SHADER_TESS_CTRL:
   case MESA_SHADER_TESS_EVAL:
   case MESA_SHADER_GEOMETRY:
  if (var->data.patch)
-var->data.location = qual->location + VARYING_SLOT_PATCH0;
+var->data.location = qual_location + VARYING_SLOT_PATCH0;
  else
-var->data.location = qual->location + VARYING_SLOT_VAR0;
+var->data.location = qual_location + VARYING_SLOT_VAR0;
  break;
 
   case MESA_SHADER_FRAGMENT:
  var->data.location = (var->data.mode == ir_var_shader_out)
-? (qual->location + FRAG_RESULT_DATA0)
-: (qual->location + VARYING_SLOT_VAR0);
+? (qual_location + FRAG_RESULT_DATA0)
+: (qual_location + VARYING_SLOT_VAR0);
  break;
   case MESA_SHADER_COMPUTE:
  assert(!"Unexpected shader type");
diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index adf6a05..5eb54c4 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -1466,13 +1466,7 @@ layout_qualifier_id:
"GL_ARB_explicit_attrib_location layout "
"identifier `%s' used", $1);
  }
-
- if ($3 >= 0) {
-$$.location = $3;
- } else {
- _mesa_glsl_error(& @3, state, "invalid location %d specified", $3);
- YYERROR;
- }
+ $$.location = $3;
   }
 
   if (match_layout_qualifier("index", $1, state) == 0) {
-- 
2.4.3
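
For readers unfamiliar with the switch in the third hunk, the mapping from a
validated location to a slot can be sketched as below. The enums are
hypothetical and the SKETCH_* base values are placeholders, not the real
values of VERT_ATTRIB_GENERIC0, VARYING_SLOT_VAR0, VARYING_SLOT_PATCH0 or
FRAG_RESULT_DATA0. Only the non-uniform path is covered; for uniforms the
patch stores the location unchanged.

```c
#include <stdbool.h>

enum stage {
   STAGE_VERTEX, STAGE_TESS_CTRL, STAGE_TESS_EVAL,
   STAGE_GEOMETRY, STAGE_FRAGMENT
};

enum var_mode { MODE_SHADER_IN, MODE_SHADER_OUT };

/* Placeholder base slots, chosen only so the sketch is self-contained. */
enum {
   SKETCH_FRAG_RESULT_DATA0    = 2,
   SKETCH_VERT_ATTRIB_GENERIC0 = 16,
   SKETCH_VARYING_SLOT_VAR0    = 32,
   SKETCH_VARYING_SLOT_PATCH0  = 64
};

static unsigned slot_for_location(enum stage stage, enum var_mode mode,
                                  bool is_patch, unsigned qual_location)
{
   switch (stage) {
   case STAGE_VERTEX:
      /* Vertex inputs are attributes, vertex outputs are varyings. */
      return qual_location + (mode == MODE_SHADER_IN
                              ? SKETCH_VERT_ATTRIB_GENERIC0
                              : SKETCH_VARYING_SLOT_VAR0);
   case STAGE_TESS_CTRL:
   case STAGE_TESS_EVAL:
   case STAGE_GEOMETRY:
      return qual_location + (is_patch ? SKETCH_VARYING_SLOT_PATCH0
                                       : SKETCH_VARYING_SLOT_VAR0);
   case STAGE_FRAGMENT:
      /* Fragment outputs are colour results, fragment inputs are varyings. */
      return qual_location + (mode == MODE_SHADER_OUT
                              ? SKETCH_FRAG_RESULT_DATA0
                              : SKETCH_VARYING_SLOT_VAR0);
   }
   return qual_location;
}
```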


[Mesa-dev] [PATCH v3 01/14] glsl: rename location layout helper

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

Change name from validate -> apply to more accurately describe what
the function does.
---
 src/glsl/ast_to_hir.cpp | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 7a05176..06ba97c 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2650,10 +2650,10 @@ interpret_interpolation_qualifier(const struct ast_type_qualifier *qual,
 
 
 static void
-validate_explicit_location(const struct ast_type_qualifier *qual,
-   ir_variable *var,
-   struct _mesa_glsl_parse_state *state,
-   YYLTYPE *loc)
+apply_explicit_location(const struct ast_type_qualifier *qual,
+ir_variable *var,
+struct _mesa_glsl_parse_state *state,
+YYLTYPE *loc)
 {
bool fail = false;
 
@@ -2995,7 +2995,7 @@ apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
}
 
if (qual->flags.q.explicit_location) {
-  validate_explicit_location(qual, var, state, loc);
+  apply_explicit_location(qual, var, state, loc);
} else if (qual->flags.q.explicit_index) {
   _mesa_glsl_error(loc, state, "explicit index requires explicit location");
}
-- 
2.4.3


[Mesa-dev] [PATCH v3 02/14] glsl: add process_qualifier_constant() helper

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

For now this just validates that a qualifier is inside its
minimum boundary; a later patch will expand it to
evaluate compile time constants.
---
 src/glsl/ast_to_hir.cpp | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 06ba97c..2e55a49 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2505,6 +2505,23 @@ validate_matrix_layout_for_type(struct _mesa_glsl_parse_state *state,
 }
 
 static bool
+process_qualifier_constant(struct _mesa_glsl_parse_state *state,
+   YYLTYPE *loc,
+   const char *qual_indentifier,
+   int qual_value,
+   unsigned *value)
+{
+   if (qual_value < 0) {
+  _mesa_glsl_error(loc, state, "%s layout qualifier is invalid (%d < 0)",
+   qual_indentifier, qual_value);
+  return false;
+   }
+
+   *value = (unsigned) qual_value;
+   return true;
+}
+
+static bool
 validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
YYLTYPE *loc,
const glsl_type *type,
-- 
2.4.3
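
A standalone sketch of the helper's contract, for experimenting outside the
compiler. It is not the Mesa function: the name is changed, and the parse
state and _mesa_glsl_error() are replaced by a caller-supplied error buffer,
which is an assumption made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdio.h>

/* Reject negative qualifier values and hand back the value as unsigned.
 * On failure *value is left untouched and a message is written to err. */
static bool
process_qualifier_constant_sketch(const char *qual_identifier, int qual_value,
                                  unsigned *value, char *err, size_t err_len)
{
   if (qual_value < 0) {
      snprintf(err, err_len, "%s layout qualifier is invalid (%d < 0)",
               qual_identifier, qual_value);
      return false;
   }

   *value = (unsigned) qual_value;
   return true;
}
```

Because callers receive an unsigned value, later checks against a maximum
no longer need a separate comparison with zero.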


[Mesa-dev] [PATCH v3 05/14] glsl: replace index layout min boundary check

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

Use the new helper, which a later patch will extend to allow
compile time constants.
---
 src/glsl/ast_to_hir.cpp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index de13589..8705f6e 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2804,7 +2804,10 @@ apply_explicit_location(const struct ast_type_qualifier *qual,
  break;
   }
 
-  if (qual->flags.q.explicit_index) {
+  unsigned qual_index;
+  if (qual->flags.q.explicit_index &&
+  process_qualifier_constant(state, loc, "index", qual->index,
+ &qual_index)) {
  /* From the GLSL 4.30 specification, section 4.4.2 (Output
   * Layout Qualifiers):
   *
@@ -2814,12 +2817,12 @@ apply_explicit_location(const struct ast_type_qualifier *qual,
   * Older specifications don't mandate a behavior; we take
   * this as a clarification and always generate the error.
   */
- if (qual->index < 0 || qual->index > 1) {
+ if (qual_index > 1) {
 _mesa_glsl_error(loc, state,
  "explicit index may only be 0 or 1");
  } else {
 var->data.explicit_index = true;
-var->data.index = qual->index;
+var->data.index = qual_index;
  }
   }
}
-- 
2.4.3


[Mesa-dev] [PATCH v3 04/14] glsl: remove duplicate validation for index layout qualifier

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

The minimum value for index is validated in the ast code and
we want to remove validation from the parser so we can add
compile time constant support.
---
 src/glsl/glsl_parser.yy | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index 5eb54c4..b902411 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -1471,13 +1471,7 @@ layout_qualifier_id:
 
   if (match_layout_qualifier("index", $1, state) == 0) {
  $$.flags.q.explicit_index = 1;
-
- if ($3 >= 0) {
-$$.index = $3;
- } else {
-_mesa_glsl_error(& @3, state, "invalid index %d specified", $3);
-YYERROR;
- }
+ $$.index = $3;
   }
 
   if ((state->has_420pack() ||
-- 
2.4.3


[Mesa-dev] [PATCH v3 12/14] glsl: add support for compile-time constant expressions

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

This patch replaces the old integer constant qualifiers with either
the new ast_layout_expression type if the qualifier requires merging
or ast_expression if the qualifier can't have multiple declarations
or if all but the newest qualifier are simply ignored.

We also update the process_qualifier_constant() helper to be
similar to the one in the ast_layout_expression class, but in
this case it will be used to process the ast_expression qualifiers.

Global shader layout qualifier validation is moved out of the parser
in this change as we now need to evaluate any constant expression
before doing the validation.
---
 src/glsl/ast.h  |  33 +--
 src/glsl/ast_to_hir.cpp | 126 
 src/glsl/ast_type.cpp   |  69 --
 src/glsl/glsl_parser.yy |  87 +--
 src/glsl/glsl_parser_extras.cpp |  44 --
 5 files changed, 195 insertions(+), 164 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index f2b9dfa..bfeab6b 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -578,7 +578,7 @@ struct ast_type_qualifier {
unsigned precision:2;
 
/** Geometry shader invocations for GL_ARB_gpu_shader5. */
-   int invocations;
+   ast_layout_expression *invocations;
 
/**
 * Location specified via GL_ARB_explicit_attrib_location layout
@@ -586,20 +586,20 @@ struct ast_type_qualifier {
 * \note
 * This field is only valid if \c explicit_location is set.
 */
-   int location;
+   ast_expression *location;
/**
 * Index specified via GL_ARB_explicit_attrib_location layout
 *
 * \note
 * This field is only valid if \c explicit_index is set.
 */
-   int index;
+   ast_expression *index;
 
/** Maximum output vertices in GLSL 1.50 geometry shaders. */
-   int max_vertices;
+   ast_layout_expression *max_vertices;
 
/** Stream in GLSL 1.50 geometry shaders. */
-   unsigned stream;
+   ast_expression *stream;
 
/**
 * Input or output primitive type in GLSL 1.50 geometry shaders
@@ -613,7 +613,7 @@ struct ast_type_qualifier {
 * \note
 * This field is only valid if \c explicit_binding is set.
 */
-   int binding;
+   ast_expression *binding;
 
/**
 * Offset specified via GL_ARB_shader_atomic_counter's "offset"
@@ -622,14 +622,14 @@ struct ast_type_qualifier {
 * \note
 * This field is only valid if \c explicit_offset is set.
 */
-   int offset;
+   ast_expression *offset;
 
/**
 * Local size specified via GL_ARB_compute_shader's "local_size_{x,y,z}"
 * layout qualifier.  Element i of this array is only valid if
 * flags.q.local_size & (1 << i) is set.
 */
-   int local_size[3];
+   ast_layout_expression *local_size[3];
 
   /** Tessellation evaluation shader: vertex spacing (equal, fractional even/odd) */
GLenum vertex_spacing;
@@ -641,7 +641,7 @@ struct ast_type_qualifier {
bool point_mode;
 
/** Tessellation control shader: number of output vertices */
-   int vertices;
+   ast_layout_expression *vertices;
 
/**
 * Image format specified with an ARB_shader_image_load_store
@@ -1113,17 +1113,13 @@ public:
 class ast_tcs_output_layout : public ast_node
 {
 public:
-   ast_tcs_output_layout(const struct YYLTYPE &locp, int vertices)
-  : vertices(vertices)
+   ast_tcs_output_layout(const struct YYLTYPE &locp)
{
   set_location(locp);
}
 
virtual ir_rvalue *hir(exec_list *instructions,
   struct _mesa_glsl_parse_state *state);
-
-private:
-   const int vertices;
 };
 
 
@@ -1155,9 +1151,12 @@ private:
 class ast_cs_input_layout : public ast_node
 {
 public:
-   ast_cs_input_layout(const struct YYLTYPE &locp, const unsigned *local_size)
+   ast_cs_input_layout(const struct YYLTYPE &locp,
+   ast_layout_expression **local_size)
{
-  memcpy(this->local_size, local_size, sizeof(this->local_size));
+  for (int i = 0; i < 3; i++) {
+ this->local_size[i] = local_size[i];
+  }
   set_location(locp);
}
 
@@ -1165,7 +1164,7 @@ public:
   struct _mesa_glsl_parse_state *state);
 
 private:
-   unsigned local_size[3];
+   ast_layout_expression *local_size[3];
 };
 
 /*@}*/
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index e7e2a85..6a3ec44 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2508,16 +2508,40 @@ static bool
 process_qualifier_constant(struct _mesa_glsl_parse_state *state,
YYLTYPE *loc,
const char *qual_indentifier,
-   int qual_value,
+   ast_expression *const_expression,
unsigned *value)
 {
-   if (qual_value < 0) {
+   exec_list dummy_instructions;
+
+   if (const_expression == NULL) {
+  *value = 0;
+  return true;
+   }
+
+   ir_rvalue *const ir = c

[Mesa-dev] [PATCH v3 14/14] docs: mark compile-time constant expressions as done

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

---
 docs/GL3.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index b768eea..ad6b95e 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -179,7 +179,7 @@ GL 4.4, GLSL 4.40:
   GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_clear_texture DONE (i965, nv50, nvc0)
   GL_ARB_enhanced_layouts  in progress (Timothy)
-  - compile-time constant expressions  in progress
+  - compile-time constant expressions  DONE
   - explicit byte offsets for blocks   in progress
   - forced alignment within blocks in progress
   - specified vec4-slot component numbers  in progress
-- 
2.4.3


[Mesa-dev] [PATCH v3 07/14] glsl: move stream layout max validation

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

This validation is moved later so we can validate the
max value when compile time constant support is added in a
later patch.
---
 src/glsl/ast_to_hir.cpp | 22 --
 src/glsl/ast_type.cpp   | 14 --
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 53faacf..dedc39f 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2522,8 +2522,24 @@ process_qualifier_constant(struct _mesa_glsl_parse_state *state,
 }
 
 static bool
+validate_stream_qualifier(YYLTYPE *loc, struct _mesa_glsl_parse_state *state,
+  unsigned stream)
+{
+   if (stream >= state->ctx->Const.MaxVertexStreams) {
+  _mesa_glsl_error(loc, state,
+   "invalid stream specified %d is larger than "
+   "MAX_VERTEX_STREAMS - 1 (%d).",
+   stream, state->ctx->Const.MaxVertexStreams - 1);
+  return false;
+   }
+
+   return true;
+}
+
+static void
 validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
YYLTYPE *loc,
+   ir_variable *var,
const glsl_type *type,
const ast_type_qualifier *qual)
 {
@@ -3036,7 +3052,8 @@ apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
qual->flags.q.out && qual->flags.q.stream) {
   unsigned qual_stream;
   if (process_qualifier_constant(state, loc, "stream", qual->stream,
- &qual_stream)) {
+ &qual_stream) &&
+  validate_stream_qualifier(loc, state, qual_stream)) {
  var->data.stream = qual_stream;
   }
}
@@ -6517,7 +6534,8 @@ ast_interface_block::hir(exec_list *instructions,
 
unsigned qual_stream;
if (!process_qualifier_constant(state, &loc, "stream", this->layout.stream,
-   &qual_stream)) {
+   &qual_stream) ||
+   !validate_stream_qualifier(&loc, state, qual_stream)) {
   /* If the stream qualifier is invalid it doesn't make sense to continue
* on and try to compare stream layouts on member variables agaist it so
* just return early.
diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index 79134c1..b107051 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast_type.cpp
@@ -190,20 +190,6 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
 
if (state->stage == MESA_SHADER_GEOMETRY &&
state->has_explicit_attrib_stream()) {
-  if (q.flags.q.stream && q.stream >= state->ctx->Const.MaxVertexStreams) {
- _mesa_glsl_error(loc, state,
-  "`stream' value is larger than MAX_VERTEX_STREAMS - 1 "
-  "(%d > %d)",
-  q.stream, state->ctx->Const.MaxVertexStreams - 1);
-  }
-  if (this->flags.q.explicit_stream &&
-  this->stream >= state->ctx->Const.MaxVertexStreams) {
- _mesa_glsl_error(loc, state,
-  "`stream' value is larger than MAX_VERTEX_STREAMS - 1 "
-  "(%d > %d)",
-  this->stream, state->ctx->Const.MaxVertexStreams - 1);
-  }
-
   if (!this->flags.q.explicit_stream) {
  if (q.flags.q.stream) {
 this->flags.q.stream = 1;
-- 
2.4.3

[Mesa-dev] [PATCH v3 06/14] glsl: move stream layout qualifier validation

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

We are moving this out of the parser in preparation for compile
time constant support.

A validation function is used rather than an apply function (as is
done for bindings) because GLSL allows streams to be defined on members
of blocks even though they must match the stream that is associated
with the current block. This means we need access to the value after
validation to do this comparison.
---
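A minimal sketch of the comparison the commit message refers to, assuming
the stream expressions have already been evaluated to plain unsigned
values. The `member_qual` struct and `members_match_block()` are
hypothetical names for illustration only, not Mesa code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one block member's stream qualifier.
struct member_qual {
   bool explicit_stream;  // true if the member wrote layout(stream = N)
   unsigned stream;       // the already-evaluated value of N
};

// Returns true if every member with an explicit stream agrees with the
// block's stream. Otherwise stores the index of the first offending
// member in *bad_index and returns false.
bool members_match_block(unsigned block_stream,
                         const std::vector<member_qual> &members,
                         std::size_t *bad_index)
{
   assert(bad_index != nullptr);

   for (std::size_t i = 0; i < members.size(); i++) {
      if (members[i].explicit_stream && members[i].stream != block_stream) {
         *bad_index = i;
         return false;
      }
   }

   return true;
}
```

The point of the sketch is that the block's evaluated value has to be in
hand before any member can be checked, which is why the patch passes
block_stream down instead of applying the qualifier in one step.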
 src/glsl/ast_to_hir.cpp | 44 +---
 src/glsl/glsl_parser.yy | 11 ++-
 2 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 8705f6e..53faacf 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -3034,7 +3034,11 @@ apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
 
if (state->stage == MESA_SHADER_GEOMETRY &&
qual->flags.q.out && qual->flags.q.stream) {
-  var->data.stream = qual->stream;
+  unsigned qual_stream;
+  if (process_qualifier_constant(state, loc, "stream", qual->stream,
+ &qual_stream)) {
+ var->data.stream = qual_stream;
+  }
}
 
if (var->type->contains_atomic()) {
@@ -6080,7 +6084,8 @@ ast_process_struct_or_iface_block_members(exec_list *instructions,
   enum glsl_matrix_layout matrix_layout,
   bool allow_reserved_names,
   ir_variable_mode var_mode,
-  ast_type_qualifier *layout)
+  ast_type_qualifier *layout,
+  unsigned block_stream)
 {
unsigned decl_count = 0;
 
@@ -6188,11 +6193,16 @@ ast_process_struct_or_iface_block_members(exec_list *instructions,
*   the specified stream must match the stream associated with the
*   containing block."
*/
-  if (qual->flags.q.explicit_stream &&
-  qual->stream != layout->stream) {
- _mesa_glsl_error(&loc, state, "stream layout qualifier on interface "
-  "block member does not match the interface block "
-  "(%d vs %d)", qual->stream, layout->stream);
+  if (qual->flags.q.explicit_stream) {
+ unsigned qual_stream;
+ if (process_qualifier_constant(state, &loc, "stream",
+qual->stream, &qual_stream) &&
+ qual_stream != block_stream) {
+_mesa_glsl_error(&loc, state, "stream layout qualifier on "
+ "interface block member does not match "
+ "the interface block (%d vs %d)", qual->stream,
+ block_stream);
+ }
   }
 
   if (qual->flags.q.uniform && qual->has_interpolation()) {
@@ -6350,7 +6360,8 @@ ast_struct_specifier::hir(exec_list *instructions,
 GLSL_MATRIX_LAYOUT_INHERITED,
 false /* allow_reserved_names */,
 ir_var_auto,
-NULL);
+NULL,
+0 /* for interface only */);
 
validate_identifier(this->name, loc, state);
 
@@ -6504,6 +6515,16 @@ ast_interface_block::hir(exec_list *instructions,
"Interface block sets both readonly and writeonly");
}
 
+   unsigned qual_stream;
+   if (!process_qualifier_constant(state, &loc, "stream", this->layout.stream,
+   &qual_stream)) {
+  /* If the stream qualifier is invalid it doesn't make sense to continue
+   * on and try to compare stream layouts on member variables agaist it so
+   * just return early.
+   */
+  return NULL;
+   }
+
unsigned int num_variables =
   ast_process_struct_or_iface_block_members(&declared_variables,
 state,
@@ -6513,7 +6534,8 @@ ast_interface_block::hir(exec_list *instructions,
 matrix_layout,
 redeclaring_per_vertex,
 var_mode,
-&this->layout);
+&this->layout,
+qual_stream);
 
state->struct_specifier_depth--;
 
@@ -6857,7 +6879,7 @@ ast_interface_block::hir(exec_list *instructions,
  var->data.explicit_binding = this->layout.flags.q.explicit_binding;
  var->data.binding = this->layout.binding;
 
- var->data.stream = this->layout.stream;
+ var->data.stream = qual_stream;

[Mesa-dev] [PATCH v3 13/14] glsl: add subroutine index qualifier support

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

ARB_explicit_uniform_location allows the index for subroutine functions
to be explicitly set in the shader.

This patch relaxes the restriction on the index qualifier in
validate_layout_qualifiers() to allow it to be applied to subroutines
and adds the new subroutine qualifier validation to ast_function::hir().

ast_fully_specified_type::has_qualifiers() is updated to allow the
index qualifier on subroutine functions when explicit uniform locations
are available.

A new check is added to ast_type_qualifier::merge_qualifier() to stop
multiple function qualifiers from being defined. Before this patch this
would cause a segfault.

Finally a new variable is added to ir_function_signature to store the
index. This value is validated and the non-explicit values are assigned
in link_assign_subroutine_types().
---
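A rough sketch of one way explicit and non-explicit subroutine indices
could be reconciled. This is an assumption for illustration only: the
function name, the use of -1 for "no explicit index", the rejection of
duplicate explicit indices and the lowest-free-index policy are all
hypothetical and are not taken from link_assign_subroutine_types().

```cpp
#include <cassert>
#include <vector>

// Each entry is one subroutine function's index: >= 0 if the shader gave
// an explicit layout(index = N), -1 if it did not (hypothetical encoding).
// Returns false if an explicit index is out of range or duplicated, or if
// there are more functions than available indices.
bool assign_subroutine_indices(std::vector<int> &indices, int max_subroutines)
{
   assert(max_subroutines > 0);
   std::vector<bool> used(max_subroutines, false);

   // Pass 1: validate and reserve the explicit indices.
   for (int idx : indices) {
      if (idx < 0)
         continue;
      if (idx >= max_subroutines || used[idx])
         return false;
      used[idx] = true;
   }

   // Pass 2: give every remaining function the lowest free index.
   int next = 0;
   for (int &idx : indices) {
      if (idx >= 0)
         continue;
      while (next < max_subroutines && used[next])
         next++;
      if (next == max_subroutines)
         return false;
      used[next] = true;
      idx = next;
   }

   return true;
}
```

Doing the reservation in a first pass means an explicit index written
later in the shader can never collide with an automatically chosen one.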
 src/glsl/ast.h |  2 +-
 src/glsl/ast_to_hir.cpp| 34 --
 src/glsl/ast_type.cpp  | 14 +-
 src/glsl/ir.cpp|  1 +
 src/glsl/ir.h  |  2 ++
 src/glsl/ir_clone.cpp  |  1 +
 src/glsl/linker.cpp| 33 +
 src/mesa/main/mtypes.h |  1 +
 src/mesa/main/shader_query.cpp |  7 +++
 9 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index bfeab6b..1e4a998 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -772,7 +772,7 @@ public:
 class ast_fully_specified_type : public ast_node {
 public:
virtual void print(void) const;
-   bool has_qualifiers() const;
+   bool has_qualifiers(_mesa_glsl_parse_state *state) const;
 
ast_fully_specified_type() : qualifier(), specifier(NULL)
{
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 6a3ec44..6c56829 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2848,6 +2848,13 @@ apply_explicit_location(const struct ast_type_qualifier *qual,
  break;
   }
 
+  /* Check if index was set for the uniform instead of the function */
+  if (qual->flags.q.explicit_index && qual->flags.q.subroutine) {
+ _mesa_glsl_error(loc, state, "an index qualifier can only be "
+  "used with subroutine functions");
+ return;
+  }
+
   unsigned qual_index;
   if (qual->flags.q.explicit_index &&
   process_qualifier_constant(state, loc, "index", qual->index,
@@ -3067,7 +3074,9 @@ apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
if (qual->flags.q.explicit_location) {
   apply_explicit_location(qual, var, state, loc);
} else if (qual->flags.q.explicit_index) {
-  _mesa_glsl_error(loc, state, "explicit index requires explicit location");
+  if (!qual->flags.q.subroutine_def)
+ _mesa_glsl_error(loc, state,
+  "explicit index requires explicit location");
}
 
if (qual->flags.q.explicit_binding) {
@@ -5075,7 +5084,7 @@ ast_function::hir(exec_list *instructions,
/* From page 56 (page 62 of the PDF) of the GLSL 1.30 spec:
 * "No qualifier is allowed on the return type of a function."
 */
-   if (this->return_type->has_qualifiers()) {
+   if (this->return_type->has_qualifiers(state)) {
   YYLTYPE loc = this->get_location();
   _mesa_glsl_error(& loc, state,
"function `%s' return type has qualifiers", name);
@@ -5207,6 +5216,27 @@ ast_function::hir(exec_list *instructions,
if (this->return_type->qualifier.flags.q.subroutine_def) {
   int idx;
 
+  if (this->return_type->qualifier.flags.q.explicit_index) {
+ unsigned qual_index;
+ if (process_qualifier_constant(state, &loc, "index",
+this->return_type->qualifier.index,
+&qual_index)) {
+if (!state->has_explicit_uniform_location()) {
+   _mesa_glsl_error(&loc, state, "subroutine index requires "
+"GL_ARB_explicit_uniform_location or "
+"GLSL 4.30");
+} else if (qual_index >= MAX_SUBROUTINES) {
+   _mesa_glsl_error(&loc, state,
+"invalid subroutine index (%d) index must "
+"be a number between 0 and "
+"GL_MAX_SUBROUTINES - 1 (%d)", qual_index,
+MAX_SUBROUTINES - 1);
+} else {
+   f->subroutine_index = qual_index;
+}
+ }
+  }
+
   f->num_subroutine_types = 
this->return_type->qualifier.subroutine_list->declarations.length();
   f->subroutine_types = ralloc_array(state, const struct glsl_type *,
  f->num_subroutine_types);
diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index 1e89a9e..03ed4dc 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast

[Mesa-dev] [PATCH v3 10/14] glsl: call set_shader_inout_layout() earlier

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

This will allow us to add error checking to this function
in a later patch. If we don't move it, the error messages
will go missing.
---
 src/glsl/glsl_parser_extras.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 3ed1168..1678d89 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1795,6 +1795,9 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
if (shader->InfoLog)
   ralloc_free(shader->InfoLog);
 
+   if (!state->error)
+  set_shader_inout_layout(shader, state);
+
shader->symbols = new(shader->ir) glsl_symbol_table;
shader->CompileStatus = !state->error;
shader->InfoLog = state->info_log;
@@ -1802,9 +1805,6 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
shader->IsES = state->es_shader;
shader->uses_builtin_functions = state->uses_builtin_functions;
 
-   if (!state->error)
-  set_shader_inout_layout(shader, state);
-
/* Retain any live IR, but trash the rest. */
reparent_ir(shader->ir, shader->ir);
 
-- 
2.4.3

[Mesa-dev] [PATCH v3 11/14] glsl: add new type for compile time constants

2015-11-14 Thread Timothy Arceri

From: Timothy Arceri 

In this patch we introduce a new ast type for holding the new
compile-time constant expressions. The main reason for this is that
we can no longer merge layout qualifiers before they have been
converted into GLSL IR, so we need to store them to be processed later.

The new type has two helper functions:

- process_qualifier_constant()

 Used to merge and then evaluate qualifier expressions

- merge_qualifier()

 Simply appends a qualifier to a list to be merged later by
 process_qualifier_constant()

In order to avoid cascading error messages, the process_qualifier_constant()
helpers return a bool so that callers can bail out after the first failure.
---
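A minimal sketch of the merge-then-evaluate rule described above, assuming
each stored expression has already been folded to a plain int. The
function name and the std::vector input are hypothetical simplifications;
the real helper in the diff below walks an exec_list of ast nodes and
reports errors through _mesa_glsl_error().

```cpp
#include <cassert>
#include <vector>

// Checks a list of already-folded qualifier values the way the commit
// message describes: each value must be at least the minimum, and every
// value after the first must match the one seen before it.
// On success *out holds the agreed value (0 for an empty list).
bool process_qualifier_values(const std::vector<int> &values,
                              bool can_be_zero, unsigned *out)
{
   assert(out != nullptr);

   const int min_value = can_be_zero ? 0 : 1;
   bool first_pass = true;
   *out = 0;

   for (int v : values) {
      if (v < min_value)
         return false;  // below the allowed minimum

      if (!first_pass && *out != static_cast<unsigned>(v))
         return false;  // does not match the previous declaration

      first_pass = false;
      *out = static_cast<unsigned>(v);
   }

   return true;
}
```

For example, the values {4, 4} are accepted and yield 4, while {4, 8} is
rejected as a mismatch and {0} is rejected when can_be_zero is false.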
 src/glsl/ast.h| 20 +
 src/glsl/ast_type.cpp | 60 +++
 2 files changed, 80 insertions(+)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index 1b75234..f2b9dfa 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -350,6 +350,26 @@ public:
exec_list array_dimensions;
 };
 
+class ast_layout_expression : public ast_node {
+public:
+   ast_layout_expression(const struct YYLTYPE &locp, ast_expression *expr)
+   {
+  set_location(locp);
+  layout_const_expressions.push_tail(&expr->link);
+   }
+
+   bool process_qualifier_constant(struct _mesa_glsl_parse_state *state,
+   const char *qual_indentifier,
+   unsigned *value, bool can_be_zero);
+
+   void merge_qualifier(ast_layout_expression *l_expr)
+   {
+  layout_const_expressions.append_list(&l_expr->layout_const_expressions);
+   }
+
+   exec_list layout_const_expressions;
+};
+
 /**
  * C-style aggregate initialization class
  *
diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index b107051..89ab8ea 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast_type.cpp
@@ -467,3 +467,63 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc,
 
return true;
 }
+
+bool
+ast_layout_expression::process_qualifier_constant(struct _mesa_glsl_parse_state *state,
+  const char *qual_indentifier,
+  unsigned *value,
+  bool can_be_zero)
+{
+   int min_value = 0;
+   bool first_pass = true;
+   *value = 0;
+
+   if (!can_be_zero)
+  min_value = 1;
+
+   for (exec_node *node = layout_const_expressions.head;
+   !node->is_tail_sentinel(); node = node->next) {
+
+  exec_list dummy_instructions;
+  ast_node *const_expression = exec_node_data(ast_node, node, link);
+
+  ir_rvalue *const ir = const_expression->hir(&dummy_instructions, state);
+
+  ir_constant *const const_int = ir->constant_expression_value();
+  if (const_int == NULL || !const_int->type->is_integer()) {
+ YYLTYPE loc = const_expression->get_location();
+ _mesa_glsl_error(&loc, state, "%s must be an integral constant "
+  "expression", qual_indentifier);
+ return false;
+  }
+
+  if (const_int->value.i[0] < min_value) {
+ YYLTYPE loc = const_expression->get_location();
+ _mesa_glsl_error(&loc, state, "%s layout qualifier is invalid "
+  "(%d < %d)", qual_indentifier,
+  const_int->value.i[0], min_value);
+ return false;
+  }
+
+  if (!first_pass && *value != const_int->value.u[0]) {
+ YYLTYPE loc = const_expression->get_location();
+ _mesa_glsl_error(&loc, state, "%s layout qualifier does not "
+ "match previous declaration (%d vs %d)",
+  qual_indentifier, *value, const_int->value.i[0]);
+ return false;
+  } else {
+ first_pass = false;
+ *value = const_int->value.u[0];
+  }
+
+  /* If the location is const (and we've verified that
+   * it is) then no instructions should have been emitted
+   * when we converted it to HIR. If they were emitted,
+   * then either the location isn't const after all, or
+   * we are emitting unnecessary instructions.
+   */
+  assert(dummy_instructions.is_empty());
+   }
+
+   return true;
+}
-- 
2.4.3

Re: [Mesa-dev] [PATCH v3 07/14] glsl: move stream layout max validation

2015-11-14 Thread Timothy Arceri

On Sun, 2015-11-15 at 00:42 +1100, Timothy Arceri wrote:
> From: Timothy Arceri 
> 
> This validation is moved later so we can validate the
> max value when compile time constant support is added in a
> later patch.
> ---
>  src/glsl/ast_to_hir.cpp | 22 --
>  src/glsl/ast_type.cpp   | 14 --
>  2 files changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index 53faacf..dedc39f 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -2522,8 +2522,24 @@ process_qualifier_constant(struct
> _mesa_glsl_parse_state *state,
>  }
>  
>  static bool
> +validate_stream_qualifier(YYLTYPE *loc, struct _mesa_glsl_parse_state
> *state,
> +  unsigned stream)
> +{
> +   if (stream >= state->ctx->Const.MaxVertexStreams) {
> +  _mesa_glsl_error(loc, state,
> +   "invalid stream specified %d is larger than "
> +   "MAX_VERTEX_STREAMS - 1 (%d).",
> +   stream, state->ctx->Const.MaxVertexStreams - 1);
> +  return false;
> +   }
> +
> +   return true;
> +}
> +
> +static void
>  validate_binding_qualifier(struct _mesa_glsl_parse_state *state,
> YYLTYPE *loc,
> +   ir_variable *var,

This, and changing the function to not return bool, are meant to be in the next
patch. I've fixed this up locally.

Re: [Mesa-dev] [PATCH] android: fix LOCAL_C_INCLUDES to find glsl_types.h

2015-11-14 Thread Mauro Rossi

Hi Emil,

I checked with Chih-Wei, and LOCAL_EXPORT_C_INCLUDE_DIRS is the preferred
way.

I took the chance to finally understand why the single-line change you
proposed will suffice: the glsl_compiler module uses libmesa_glsl as a
static library, and the same applies to the i965_dri module.

My last problem was probably due to stale Mesa target folders in the $OUT
path. I have now checked again and the build errors are gone.

Thanks a lot Emil and Chih-Wei

Mauro

2015-11-08 14:12 GMT+01:00 Emil Velikov :

> Hi Mauro,
>
> On 8 November 2015 at 12:59, Mauro Rossi  wrote:
> > Update2: I'm getting the building error in both x86 target and x86_64
> > target.
> >
> > I'm relieved it's not arch dependend, I suspect that export will require
> a
> > dependency to be declared,
> >  because if i965_dri module is built before glsl ones we will have the
> > error.
> >
> > The LOCAL_C_INCLUDES even if not elegant, avoided the problem in the
> first
> > place,
> > but I'd like to learn the by the best practice and apply it in the
> future.
> >
> Fwiw I'm all for LOCAL_C_INCLUDES (I even mentioned a few times a way
> that we can share those and minimise these issues), but I believe
> Chih-Wei was not really a fan of them. If he's ok with it I'll push
> your original patch.
>
> Regards,
> Emil
>
> P.S. Typos - for each one we fix, another we introduce another one or
> more  :-P
>

Re: [Mesa-dev] [PATCH 00/11] i965/nir: Do texture rectangle lowering in NIR

2015-11-14 Thread Jason Ekstrand

On Thu, Nov 12, 2015 at 7:30 AM, Iago Toral  wrote:
> On Thu, 2015-11-12 at 16:23 +0100, Iago Toral wrote:
>> Patches 1-4 are,
>> Reviewed-by: Iago Toral Quiroga 
>>
>> Patch 5 seems to be missing.

If it helps to calm reviewers' minds, I ran patches 1-5 with this patch on top:

http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/nir-clone

Zero regressions in piglit, dEQP, and the CTS.

> Oh never mind, I've just seen your reply to the thread pointing to the
> repository.
>
> Iago
>
>> Iago
>>
>> On Wed, 2015-11-11 at 17:23 -0800, Jason Ekstrand wrote:
>> > On older hardware (Iron Lake and below), we can't support texture rectangle
>> > natively.  Sandy Bridge through Haswell can support it but don't support
>> > the GL_CLAMP wrap mode natively.  It isn't until Broadwell that GL_CLAMP is
>> > supported together with GL_TEXTURE_RECTANGLE in hardware.  In the cases
>> > where it isn't supported, we have to fake it by dividing by the texture
>> > size.
>> >
>> > Previously, we had a rescale_texcoord function added a uniform to hold the
>> > texture coordinate and used that to rescale/clamp the texture coordinates.
>> > For a while now, nir_lower_tex has been able to lower texture rectangle to
>> > a textureSize and a regular texture2D operation.  This series makes i965
>> > use the nir_lower_tex path instead.  Incidentally, this fixes texture
>> > rectangle support in vertex and geometry shaders on Haswell and below.
>> > (The backend lowering was only ever done in the FS backend.)
>> >
>> > Since this is the first time we're doing any sort of shader variants in
>> > NIR, the first several passes add the infastructure to do so.  Two of these
>> > patches are from Ken, two are from Rob, and one (nir_clone itself) is my
>> > rendition but heavily based on what Rob did only with less hashing.
>> >
>> > Jason Ekstrand (7):
>> >   nir: support to clone shaders
>> >   i965/nir: Split shader optimization and lowering into three satages
>> >   i965: Move postprocess_nir to codegen time
>> >   nir/lower_tex: Report progress
>> >   nir/lower_tex: Set the dest_type for txs instructions
>> >   i965/fs: Don't allow SINT32 as a return type for resinfo
>> >   i965: Use nir_lower_tex for texture coordinate lowering
>> >
>> > Kenneth Graunke (2):
>> >   i965/nir: Add OPT() and OPT_V() macros for invoking NIR passes.
>> >   i965/nir: Validate that NIR passes call nir_metadata_preserve().
>> >
>> > Rob Clark (2):
>> >   nir: remove nir_variable::max_ifc_array_access
>> >   nir: add array length field
>> >
>> >  src/glsl/Makefile.sources |   1 +
>> >  src/glsl/nir/glsl_to_nir.cpp  |  14 +-
>> >  src/glsl/nir/nir.c|   8 +
>> >  src/glsl/nir/nir.h|  27 +-
>> >  src/glsl/nir/nir_clone.c  | 671 
>> > ++
>> >  src/glsl/nir/nir_lower_tex.c  |  20 +-
>> >  src/glsl/nir/nir_metadata.c   |  36 ++
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp  |  13 +-
>> >  src/mesa/drivers/dri/i965/brw_fs.h|   3 -
>> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp|  10 +-
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp  |   4 +-
>> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp  | 125 
>> >  src/mesa/drivers/dri/i965/brw_nir.c   | 268 +
>> >  src/mesa/drivers/dri/i965/brw_nir.h   |  15 +
>> >  src/mesa/drivers/dri/i965/brw_vec4.cpp|   7 +-
>> >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |   8 +-
>> >  16 files changed, 966 insertions(+), 264 deletions(-)
>> >  create mode 100644 src/glsl/nir/nir_clone.c
>> >
>>
>
>
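
The rescaling that the quoted cover letter describes (faking texture
rectangle by dividing by the texture size) can be sketched on the CPU as
follows. Rectangle textures are addressed in unnormalized texel
coordinates, so dividing by the size gives the normalized coordinates a
regular 2D lookup expects. The `vec2` type and `rect_to_normalized()` are
hypothetical names for illustration; the real lowering emits NIR
instructions rather than running code like this.

```cpp
#include <cassert>

// Hypothetical two-component float vector.
struct vec2 {
   float x, y;
};

// Converts a texture-rectangle coordinate (in texels) to a normalized
// [0, 1] coordinate by dividing by the texture size, which is what a
// textureSize query would supply in the lowered shader.
vec2 rect_to_normalized(vec2 texel_coord, int width, int height)
{
   assert(width > 0 && height > 0);

   return vec2{ texel_coord.x / static_cast<float>(width),
                texel_coord.y / static_cast<float>(height) };
}
```

For a 64x64 texture, the texel coordinate (32, 16) maps to (0.5, 0.25).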

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Connor Abbott

I think it would be better if we kept track of the type of the
constant instead. That would also allow us to simplify the constant
construction code in, err, something else...

On Wed, Nov 11, 2015 at 8:23 PM, Jason Ekstrand  wrote:
> From: Rob Clark 
>
> This will simplify things somewhat in clone.
>
> Signed-off-by: Rob Clark 
> Reviewed-by: Jason Ekstrand 
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 5 +
>  src/glsl/nir/nir.h   | 5 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 8e53e22..13fa987 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -241,6 +241,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>
> unsigned total_elems = ir->type->components();
> unsigned i;
> +
> +   ret->num_elements = 0;
> switch (ir->type->base_type) {
> case GLSL_TYPE_UINT:
>for (i = 0; i < total_elems; i++)
> @@ -265,6 +267,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
> case GLSL_TYPE_STRUCT:
>ret->elements = ralloc_array(mem_ctx, nir_constant *,
> ir->type->length);
> +  ret->num_elements = ir->type->length;
> +
>i = 0;
>foreach_in_list(ir_constant, field, &ir->components) {
>   ret->elements[i] = constant_copy(field, mem_ctx);
> @@ -275,6 +279,7 @@ constant_copy(ir_constant *ir, void *mem_ctx)
> case GLSL_TYPE_ARRAY:
>ret->elements = ralloc_array(mem_ctx, nir_constant *,
> ir->type->length);
> +  ret->num_elements = ir->type->length;
>
>for (i = 0; i < ir->type->length; i++)
>   ret->elements[i] = constant_copy(ir->array_elements[i], mem_ctx);
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 6ffa60b..f99af4e 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -111,6 +111,11 @@ typedef struct nir_constant {
>  */
> union nir_constant_data value;
>
> +   /* we could get this from the var->type but makes clone *much* easier to
> +* not have to care about the type.
> +*/
> +   unsigned num_elements;
> +
> /* Array elements / Structure Fields */
> struct nir_constant **elements;
>  } nir_constant;
> --
> 2.5.0.400.gff86faf
>

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Jason Ekstrand

On Sat, Nov 14, 2015 at 8:19 AM, Connor Abbott  wrote:
> I think it would be better if we kept track of the type of the
> constant instead. That would also allow us to simplify the constant
> construction code in, err, something else...

We do keep track of the type.  It's in the variable.  Do you mean put
a type field in nir_constant?

> On Wed, Nov 11, 2015 at 8:23 PM, Jason Ekstrand  wrote:
>> From: Rob Clark 
>>
>> This will simplify things somewhat in clone.
>>
>> Signed-off-by: Rob Clark 
>> Reviewed-by: Jason Ekstrand 
>> ---
>>  src/glsl/nir/glsl_to_nir.cpp | 5 +
>>  src/glsl/nir/nir.h   | 5 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>> index 8e53e22..13fa987 100644
>> --- a/src/glsl/nir/glsl_to_nir.cpp
>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>> @@ -241,6 +241,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>>
>> unsigned total_elems = ir->type->components();
>> unsigned i;
>> +
>> +   ret->num_elements = 0;
>> switch (ir->type->base_type) {
>> case GLSL_TYPE_UINT:
>>for (i = 0; i < total_elems; i++)
>> @@ -265,6 +267,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>> case GLSL_TYPE_STRUCT:
>>ret->elements = ralloc_array(mem_ctx, nir_constant *,
>> ir->type->length);
>> +  ret->num_elements = ir->type->length;
>> +
>>i = 0;
>>foreach_in_list(ir_constant, field, &ir->components) {
>>   ret->elements[i] = constant_copy(field, mem_ctx);
>> @@ -275,6 +279,7 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>> case GLSL_TYPE_ARRAY:
>>ret->elements = ralloc_array(mem_ctx, nir_constant *,
>> ir->type->length);
>> +  ret->num_elements = ir->type->length;
>>
>>for (i = 0; i < ir->type->length; i++)
>>   ret->elements[i] = constant_copy(ir->array_elements[i], mem_ctx);
>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>> index 6ffa60b..f99af4e 100644
>> --- a/src/glsl/nir/nir.h
>> +++ b/src/glsl/nir/nir.h
>> @@ -111,6 +111,11 @@ typedef struct nir_constant {
>>  */
>> union nir_constant_data value;
>>
>> +   /* we could get this from the var->type but makes clone *much* easier to
>> +* not have to care about the type.
>> +*/
>> +   unsigned num_elements;
>> +
>> /* Array elements / Structure Fields */
>> struct nir_constant **elements;
>>  } nir_constant;
>> --
>> 2.5.0.400.gff86faf
>>

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Rob Clark

well, clone just needs to know the number of elements, so this is the
simplest possible solution.. not against tracking the type as well, if
that is needed elsewhere.. or if there was a helper to map type to #
of elements, I suppose, but for now this makes clone possible.

BR,
-R
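
A minimal sketch of why an element count stored in the constant makes a
recursive clone straightforward. The `constant_sketch` struct is a
hypothetical cut-down stand-in for nir_constant (a single int instead of
the value union), and new/delete stand in for ralloc; it is not the
nir_clone code.

```cpp
#include <cassert>

// Hypothetical cut-down stand-in for nir_constant.
struct constant_sketch {
   int value;                   // stands in for the value union
   unsigned num_elements;       // the field the patch adds
   constant_sketch **elements;  // array elements / structure fields
};

// Deep-copies a constant. Because the element count lives in the constant
// itself, no type information is needed to walk the children.
constant_sketch *clone_constant(const constant_sketch *c)
{
   assert(c != nullptr);

   constant_sketch *nc = new constant_sketch;
   nc->value = c->value;
   nc->num_elements = c->num_elements;
   nc->elements =
      c->num_elements ? new constant_sketch *[c->num_elements] : nullptr;

   for (unsigned i = 0; i < c->num_elements; i++)
      nc->elements[i] = clone_constant(c->elements[i]);

   return nc;
}

// Frees a constant produced by clone_constant(), children first.
void free_constant(constant_sketch *c)
{
   if (c == nullptr)
      return;

   for (unsigned i = 0; i < c->num_elements; i++)
      free_constant(c->elements[i]);

   delete[] c->elements;
   delete c;
}
```

Without num_elements the clone would have to be handed the variable's type
and derive the child count from it, which is the alternative discussed in
this thread.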

On Sat, Nov 14, 2015 at 11:19 AM, Connor Abbott  wrote:
> I think it would be better if we kept track of the type of the
> constant instead. That would also allow us to simplify the constant
> construction code in, err, something else...
>
> On Wed, Nov 11, 2015 at 8:23 PM, Jason Ekstrand  wrote:
>> From: Rob Clark 
>>
>> This will simplify things somewhat in clone.
>>
>> Signed-off-by: Rob Clark 
>> Reviewed-by: Jason Ekstrand 
>> ---
>>  src/glsl/nir/glsl_to_nir.cpp | 5 +
>>  src/glsl/nir/nir.h   | 5 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>> index 8e53e22..13fa987 100644
>> --- a/src/glsl/nir/glsl_to_nir.cpp
>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>> @@ -241,6 +241,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>>
>> unsigned total_elems = ir->type->components();
>> unsigned i;
>> +
>> +   ret->num_elements = 0;
>> switch (ir->type->base_type) {
>> case GLSL_TYPE_UINT:
>>for (i = 0; i < total_elems; i++)
>> @@ -265,6 +267,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>> case GLSL_TYPE_STRUCT:
>>ret->elements = ralloc_array(mem_ctx, nir_constant *,
>> ir->type->length);
>> +  ret->num_elements = ir->type->length;
>> +
>>i = 0;
>>foreach_in_list(ir_constant, field, &ir->components) {
>>   ret->elements[i] = constant_copy(field, mem_ctx);
>> @@ -275,6 +279,7 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>> case GLSL_TYPE_ARRAY:
>>ret->elements = ralloc_array(mem_ctx, nir_constant *,
>> ir->type->length);
>> +  ret->num_elements = ir->type->length;
>>
>>for (i = 0; i < ir->type->length; i++)
>>   ret->elements[i] = constant_copy(ir->array_elements[i], mem_ctx);
>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>> index 6ffa60b..f99af4e 100644
>> --- a/src/glsl/nir/nir.h
>> +++ b/src/glsl/nir/nir.h
>> @@ -111,6 +111,11 @@ typedef struct nir_constant {
>>  */
>> union nir_constant_data value;
>>
>> +   /* we could get this from the var->type but makes clone *much* easier to
>> +* not have to care about the type.
>> +*/
>> +   unsigned num_elements;
>> +
>> /* Array elements / Structure Fields */
>> struct nir_constant **elements;
>>  } nir_constant;
>> --
>> 2.5.0.400.gff86faf
>>

Re: [Mesa-dev] [PATCH 00/11] i965/nir: Do texture rectangle lowering in NIR

2015-11-14 Thread Rob Clark

On Sat, Nov 14, 2015 at 11:01 AM, Jason Ekstrand  wrote:
> On Thu, Nov 12, 2015 at 7:30 AM, Iago Toral  wrote:
>> On Thu, 2015-11-12 at 16:23 +0100, Iago Toral wrote:
>>> Patches 1-4 are,
>>> Reviewed-by: Iago Toral Quiroga 
>>>
>>> Patch 5 seems to be missing.
>
> If it helps to calm reviewer's minds, I ran patches 1-5 with this patch on 
> top:
>
> http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/nir-clone
>
> Zero regressions in piglit, dEQP, and the CTS.

imho, please push something like this to master, w/ perhaps an env-var
switch (ofc just for debug builds)..  this way we can work nir_clone
testing into normal CI test cycle, and protect against future
difficult-to-track-down breakage

(and you can even pre-emptively slap my r-b on that, since I'm happy
however that is accomplished..)

BR,
-R

>> Oh never mind, I've just seen your reply to the thread pointing to the
>> repository.
>>
>> Iago
>>
>>> Iago
>>>
>>> On Wed, 2015-11-11 at 17:23 -0800, Jason Ekstrand wrote:
>>> > On older hardware (Iron Lake and below), we can't support texture 
>>> > rectangle
>>> > natively.  Sandy Bridge through Haswell can support it but don't support
>>> > the GL_CLAMP wrap mode natively.  It isn't until Broadwell that GL_CLAMP 
>>> > is
>>> > supported together with GL_TEXTURE_RECTANGLE in hardware.  In the cases
>>> > where it isn't supported, we have to fake it by dividing by the texture
>>> > size.
>>> >
>>> > Previously, we had a rescale_texcoord function added a uniform to hold the
>>> > texture coordinate and used that to rescale/clamp the texture coordinates.
>>> > For a while now, nir_lower_tex has been able to lower texture rectangle to
>>> > a textureSize and a regular texture2D operation.  This series makes i965
>>> > use the nir_lower_tex path instead.  Incidentally, this fixes texture
>>> > rectangle support in vertex and geometry shaders on Haswell and below.
>>> > (The backend lowering was only ever done in the FS backend.)
>>> >
>>> > Since this is the first time we're doing any sort of shader variants in
>>> > NIR, the first several passes add the infastructure to do so.  Two of 
>>> > these
>>> > patches are from Ken, two are from Rob, and one (nir_clone itself) is my
>>> > rendition but heavily based on what Rob did only with less hashing.
>>> >
>>> > Jason Ekstrand (7):
>>> >   nir: support to clone shaders
>>> >   i965/nir: Split shader optimization and lowering into three satages
>>> >   i965: Move postprocess_nir to codegen time
>>> >   nir/lower_tex: Report progress
>>> >   nir/lower_tex: Set the dest_type for txs instructions
>>> >   i965/fs: Don't allow SINT32 as a return type for resinfo
>>> >   i965: Use nir_lower_tex for texture coordinate lowering
>>> >
>>> > Kenneth Graunke (2):
>>> >   i965/nir: Add OPT() and OPT_V() macros for invoking NIR passes.
>>> >   i965/nir: Validate that NIR passes call nir_metadata_preserve().
>>> >
>>> > Rob Clark (2):
>>> >   nir: remove nir_variable::max_ifc_array_access
>>> >   nir: add array length field
>>> >
>>> >  src/glsl/Makefile.sources |   1 +
>>> >  src/glsl/nir/glsl_to_nir.cpp  |  14 +-
>>> >  src/glsl/nir/nir.c|   8 +
>>> >  src/glsl/nir/nir.h|  27 +-
>>> >  src/glsl/nir/nir_clone.c  | 671 ++
>>> >  src/glsl/nir/nir_lower_tex.c  |  20 +-
>>> >  src/glsl/nir/nir_metadata.c   |  36 ++
>>> >  src/mesa/drivers/dri/i965/brw_fs.cpp  |  13 +-
>>> >  src/mesa/drivers/dri/i965/brw_fs.h|   3 -
>>> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp|  10 +-
>>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp  |   4 +-
>>> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp  | 125 
>>> >  src/mesa/drivers/dri/i965/brw_nir.c   | 268 +
>>> >  src/mesa/drivers/dri/i965/brw_nir.h   |  15 +
>>> >  src/mesa/drivers/dri/i965/brw_vec4.cpp|   7 +-
>>> >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |   8 +-
>>> >  16 files changed, 966 insertions(+), 264 deletions(-)
>>> >  create mode 100644 src/glsl/nir/nir_clone.c
>>> >
>>>
>>
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
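
For readers skimming the archive: the fallback the quoted cover letter describes ("fake it by dividing by the texture size") amounts to something like the sketch below. This is only an illustration under assumptions, not the i965 or nir_lower_tex code; toy_vec2, toy_clampf and toy_rescale_rect_coord are made-up names, and the clamp range of [0, 1] after normalization is an assumption about how the GL_CLAMP case would be emulated.

```c
#include <assert.h>

/* Hypothetical 2D coordinate type, for illustration only. */
struct toy_vec2 {
   float x, y;
};

static float
toy_clampf(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Convert unnormalized texture-rectangle coordinates to normalized ones by
 * dividing by the texture size.  If 'clamp' is nonzero, the normalized
 * result is additionally clamped to [0, 1] (assumed GL_CLAMP emulation).
 */
static struct toy_vec2
toy_rescale_rect_coord(struct toy_vec2 coord, float width, float height,
                       int clamp)
{
   struct toy_vec2 out = { coord.x / width, coord.y / height };

   if (clamp) {
      out.x = toy_clampf(out.x, 0.0f, 1.0f);
      out.y = toy_clampf(out.y, 0.0f, 1.0f);
   }
   return out;
}
```

In the series itself the size comes from a textureSize operation emitted by the lowering pass rather than from a caller-supplied width and height.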

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Connor Abbott

On Sat, Nov 14, 2015 at 11:55 AM, Jason Ekstrand  wrote:
> On Sat, Nov 14, 2015 at 8:19 AM, Connor Abbott  wrote:
>> I think it would be better if we kept track of the type of the
>> constant instead. That would also allow us to simplify the constant
>> construction code in, err, something else...
>
> We do keep track of the type.  It's in the variable.  Do you mean put
> a type field in nir_constant?

Yes.

>
>> On Wed, Nov 11, 2015 at 8:23 PM, Jason Ekstrand  wrote:
>>> From: Rob Clark 
>>>
>>> This will simplify things somewhat in clone.
>>>
>>> Signed-off-by: Rob Clark 
>>> Reviewed-by: Jason Ekstrand 
>>> ---
>>>  src/glsl/nir/glsl_to_nir.cpp | 5 +
>>>  src/glsl/nir/nir.h   | 5 +
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>>> index 8e53e22..13fa987 100644
>>> --- a/src/glsl/nir/glsl_to_nir.cpp
>>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>>> @@ -241,6 +241,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>>>
>>> unsigned total_elems = ir->type->components();
>>> unsigned i;
>>> +
>>> +   ret->num_elements = 0;
>>> switch (ir->type->base_type) {
>>> case GLSL_TYPE_UINT:
>>>for (i = 0; i < total_elems; i++)
>>> @@ -265,6 +267,8 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>>> case GLSL_TYPE_STRUCT:
>>>ret->elements = ralloc_array(mem_ctx, nir_constant *,
>>> ir->type->length);
>>> +  ret->num_elements = ir->type->length;
>>> +
>>>i = 0;
>>>foreach_in_list(ir_constant, field, &ir->components) {
>>>   ret->elements[i] = constant_copy(field, mem_ctx);
>>> @@ -275,6 +279,7 @@ constant_copy(ir_constant *ir, void *mem_ctx)
>>> case GLSL_TYPE_ARRAY:
>>>ret->elements = ralloc_array(mem_ctx, nir_constant *,
>>> ir->type->length);
>>> +  ret->num_elements = ir->type->length;
>>>
>>>for (i = 0; i < ir->type->length; i++)
>>>   ret->elements[i] = constant_copy(ir->array_elements[i], mem_ctx);
>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>> index 6ffa60b..f99af4e 100644
>>> --- a/src/glsl/nir/nir.h
>>> +++ b/src/glsl/nir/nir.h
>>> @@ -111,6 +111,11 @@ typedef struct nir_constant {
>>>  */
>>> union nir_constant_data value;
>>>
>>> +   /* we could get this from the var->type but makes clone *much* easier to
>>> +* not have to care about the type.
>>> +*/
>>> +   unsigned num_elements;
>>> +
>>> /* Array elements / Structure Fields */
>>> struct nir_constant **elements;
>>>  } nir_constant;
>>> --
>>> 2.5.0.400.gff86faf
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Connor Abbott

On Sat, Nov 14, 2015 at 11:55 AM, Rob Clark  wrote:
> well, clone just needs to know the number of elements, so this is the
> simplest possible solution.. not against tracking the type as well, if
> that is needed elsewhere.. or if there was a helper to map type to #
> of elements, I suppose, but for now this makes clone possible.

glsl_get_array_length() will do the right thing for everything except
matrices and vectors/scalars, so it should be possible to get the
number of elements in only a few lines.

>
> BR,
> -R
>

Re: [Mesa-dev] [PATCH 00/11] i965/nir: Do texture rectangle lowering in NIR

2015-11-14 Thread Jason Ekstrand

On Sat, Nov 14, 2015 at 8:58 AM, Rob Clark  wrote:
> On Sat, Nov 14, 2015 at 11:01 AM, Jason Ekstrand  wrote:
>> On Thu, Nov 12, 2015 at 7:30 AM, Iago Toral  wrote:
>>> On Thu, 2015-11-12 at 16:23 +0100, Iago Toral wrote:
>>>> Patches 1-4 are,
>>>> Reviewed-by: Iago Toral Quiroga 
>>>>
>>>> Patch 5 seems to be missing.
>>
>> If it helps to calm reviewer's minds, I ran patches 1-5 with this patch on 
>> top:
>>
>> http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/nir-clone
>>
>> Zero regressions in piglit, dEQP, and the CTS.
>
> imho, please push something like this to master, w/ perhaps an env-var
> switch (ofc just for debug builds)..  this way we can work nir_clone
> testing into normal CI test cycle, and protect against future
> difficult-to-track-down breakage

I thought about doing that but it didn't really work very well with
patch 6.  Also, by the time we get to patch 7, it's getting tested
pretty well. About the only thing that doesn't get tested there is
registers.  I'm not opposed to adding support for testing it in CI,
but I don't want to dirty up an API to do so if it can be avoided.
Would you be ok with cloning in a few key places?

--Jason

> (and you can even pre-emptively slap my r-b on that, since I'm happy
> however that is accomplished..)
>
> BR,
> -R
>

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Jason Ekstrand

On Sat, Nov 14, 2015 at 9:25 AM, Connor Abbott  wrote:
> On Sat, Nov 14, 2015 at 11:55 AM, Rob Clark  wrote:
>> well, clone just needs to know the number of elements, so this is the
>> simplest possible solution.. not against tracking the type as well, if
>> that is needed elsewhere.. or if there was a helper to map type to #
>> of elements, I suppose, but for now this makes clone possible.
>
> glsl_get_array_length() will do the right thing for everything except
> matrices and vectors/scalars, so it should be possible to get the
> number of elements in only a few lines.

Actually, it works for matrices too so it should "just work"
--Jason
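
For illustration, the element-count rule under discussion can be sketched roughly as below. It mirrors the cases in the quoted constant_copy() hunks: zero child elements when the data lives in the value union, and the type's length for structs and arrays. This is a toy model, not Mesa code; toy_type, toy_base_type and toy_constant_num_elements are hypothetical names and do not correspond to glsl_type or to any existing helper.

```c
#include <assert.h>

/* Hypothetical stand-in for a GLSL type; not Mesa's glsl_type. */
enum toy_base_type {
   TOY_TYPE_NUMERIC,   /* scalars, vectors and matrices */
   TOY_TYPE_STRUCT,
   TOY_TYPE_ARRAY
};

struct toy_type {
   enum toy_base_type base_type;
   unsigned length;    /* array length or number of struct fields */
};

/* Number of child constants a constant of this type carries.  Numeric
 * types keep their data inline, so they have no child elements.
 */
static unsigned
toy_constant_num_elements(const struct toy_type *type)
{
   assert(type != 0);

   switch (type->base_type) {
   case TOY_TYPE_STRUCT:
   case TOY_TYPE_ARRAY:
      return type->length;
   default:
      return 0;
   }
}
```

With a helper along these lines, clone would only need the type to decide how many entries of elements[] to copy, which is the trade-off being weighed against storing num_elements directly.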

>>
>> BR,
>> -R
>>

Re: [Mesa-dev] [PATCH 00/11] i965/nir: Do texture rectangle lowering in NIR

2015-11-14 Thread Jason Ekstrand

On Sat, Nov 14, 2015 at 9:44 AM, Rob Clark  wrote:
> On Sat, Nov 14, 2015 at 12:30 PM, Jason Ekstrand  wrote:
>> On Sat, Nov 14, 2015 at 8:58 AM, Rob Clark  wrote:
>>> On Sat, Nov 14, 2015 at 11:01 AM, Jason Ekstrand  
>>> wrote:
>>>> On Thu, Nov 12, 2015 at 7:30 AM, Iago Toral  wrote:
>>>>> On Thu, 2015-11-12 at 16:23 +0100, Iago Toral wrote:
>>>>>> Patches 1-4 are,
>>>>>> Reviewed-by: Iago Toral Quiroga 
>>>>>>
>>>>>> Patch 5 seems to be missing.
>>>>
>>>> If it helps to calm reviewer's minds, I ran patches 1-5 with this patch on top:
>>>>
>>>> http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/nir-clone
>>>>
>>>> Zero regressions in piglit, dEQP, and the CTS.
>>>
>>> imho, please push something like this to master, w/ perhaps an env-var
>>> switch (ofc just for debug builds)..  this way we can work nir_clone
>>> testing into normal CI test cycle, and protect against future
>>> difficult-to-track-down breakage
>>
>> I thought about doing that but it didn't really work very well with
>> patch 6.  Also, by the time we get to patch 7, it's getting tested
>> pretty well. About the only thing that doesn't get tested there is
>> registers.  I'm not opposed to adding support for testing it in CI,
>> but I don't want to dirty up an API to do so if it can be avoided.
>> Would you be ok with cloning in a few key places?
>
> Well, I prefer testing between each stage.. it's a little brute-force,
> but it ensures we don't miss something that only appears between
> certain stages, now or in the future.  The few-key-places approach is
> certainly better than nothing.
>
> I guess the 'dirty up an API' bit referred to returning nir_shader's?
> I don't think that is *that* horrible a price to pay..

I'm more concerned about the fact that you get out a pointer that may
or may not be the one you passed in and we may or may not have deleted
the one you passed in and, to make things better, if you accidentally
ignore the return value it will work fine unless INTEL_NIR_CLONE is
enabled.  I think we're going to find ourselves breaking the nir_clone
testing code more often than breaking nir_clone.
--Jason
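
To make the concern above concrete, here is a rough sketch of the kind of API shape being discussed. It is only a toy model under assumptions: toy_shader, toy_shader_clone and toy_run_pass are hypothetical names, not nir_shader or any real Mesa function, and the test_clone flag merely stands in for a debug switch such as the INTEL_NIR_CLONE variable mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shader object; not Mesa's nir_shader. */
struct toy_shader {
   int num_instructions;
};

/* Deep-copy the toy shader.  Returns NULL if allocation fails. */
static struct toy_shader *
toy_shader_clone(const struct toy_shader *s)
{
   struct toy_shader *copy = malloc(sizeof(*copy));
   if (copy)
      *copy = *s;
   return copy;
}

/* Run a pass and, in clone-testing mode, replace the shader with a clone
 * and free the original.  The caller must keep using the RETURNED pointer:
 * the one passed in may have been freed.
 */
static struct toy_shader *
toy_run_pass(struct toy_shader *s, bool test_clone)
{
   assert(s != NULL);
   s->num_instructions++;   /* stands in for the pass's real work */

   if (test_clone) {
      struct toy_shader *copy = toy_shader_clone(s);
      if (copy) {
         free(s);
         return copy;
      }
   }
   return s;
}
```

A caller that writes toy_run_pass(s, flag); and carries on with s behaves correctly whenever flag is false and touches freed memory when it is true, which is exactly the "works fine unless the debug switch is enabled" trap.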

> BR,
> -R
>
>> --Jason
>>
>>> (and you can even pre-emptively slap my r-b on that, since I'm happy
>>> however that is accomplished..)
>>>
>>> BR,
>>> -R
>>>

Re: [Mesa-dev] [PATCH 02/11] nir: add array length field

2015-11-14 Thread Jason Ekstrand

On Sat, Nov 14, 2015 at 9:30 AM, Jason Ekstrand  wrote:
> On Sat, Nov 14, 2015 at 9:25 AM, Connor Abbott  wrote:
>> On Sat, Nov 14, 2015 at 11:55 AM, Rob Clark  wrote:
>>> well, clone just needs to know the number of elements, so this is the
>>> simplest possible solution.. not against tracking the type as well, if
>>> that is needed elsewhere.. or if there was a helper to map type to #
>>> of elements, I suppose, but for now this makes clone possible.
>>
>> glsl_get_array_length() will do the right thing for everything except
>> matrices and vectors/scalars, so it should be possible to get the
>> number of elements in only a few lines.
>
> Actually, it works for matrices too so it should "just work"
> --Jason

Here's what it looks like if we use a type instead:

http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/i965-nir-variants&id=508f58cedb130a3726ec2c1c7ed892cce2bfd08d

I kicked it off to CI but I'm pretty sure that's sufficient.
--Jason

>>>
>>> BR,
>>> -R
>>>

Re: [Mesa-dev] [PATCH v3 2/3] i965: Introduce a MOV_INDIRECT opcode.

2015-11-14 Thread Jason Ekstrand

On Fri, Nov 13, 2015 at 6:50 PM, Kenneth Graunke  wrote:
> The geometry and tessellation control shader stages both read from
> multiple URB entries (one per vertex).  The thread payload contains
> several URB handles which reference these separate memory segments.
>
> In GLSL, these inputs are represented as per-vertex arrays; the
> outermost array index selects which vertex's inputs to read.  This
> array index does not necessarily need to be constant.
>
> To handle that, we need to use indirect addressing on GRFs to select
> which of the thread payload registers has the appropriate URB handle.
> (This is before we can even think about applying the pull model!)
>
> This patch introduces a new opcode which performs a MOV from a
> source using VxH indirect addressing (which allows each of the 8
> SIMD channels to select distinct data.)
>
> Based on a patch by Jason Ekstrand.
>
> v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
> a bit more generic.  Use regs_read() instead of hacking up the
> register allocator.  (Suggested by Jason Ekstrand.)
>
> v3: Fix regs_read() to be more accurate for small unaligned regions.
> Also rebase on Matt's work.
>
> Signed-off-by: Kenneth Graunke 
> Reviewed-by: Abdiel Janulgue  [v1]
>
> stash
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h| 10 
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 28 +
>  src/mesa/drivers/dri/i965/brw_fs.h |  5 
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 34 
> ++
>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 ++
>  6 files changed, 80 insertions(+)
>
> Here's an updated version.  I changed the description of source 2 to be
> "the length of the region that could be accessed, in bytes" rather than
> "the maximum value of the indirect offset" (aka the largest possible value
> of source 1), as that's what I actually wanted - it's much easier to use.
>
> (Notably, the largest possible value of the indirect offset means that
> it affects the maximum possible base of the read...but you still read
> type_sz() bytes beyond that...which is ugly.  Size of the region is
> much easier to use, as base + size = end...and you don't read beyond
> the end.)
>
> This is all getting into dire "but isn't it off by 1?" territory, so
> here are some examples demonstrating how one might use this:
>
> "I want to read some .xyzw component of a vec4 stored in r7.4."
>
> src[0] = r7.4F
> src[1] = 
> src[2] = 16 bytes
>
> Because subnr != 0, we calculate subnr * type_sz, and get:
>
>  subnr * sizeof(float) = 4 * 4 = 16 bytes
>
> We add this to region_length to get a total size of 32 bytes.
> DIV_ROUND_UP(32, REG_SIZE) = 32 / 32 = 1 register read.
>
> Correct, because we're only reading the last 4 channels of r7,
> so our region is contained within 1 register.
>
> "I want to read up to 6 UD channels starting at r7.7."
>
> src[0] = r7.7UD
> src[1] = 
> src[2] = 6 * sizeof(UD) = 24 bytes
>
> Because subnr != 0, we calculate subnr * type_sz, and get:
>
>  subnr * sizeof(UD) = 7 * 4 = 28 bytes.
>
> We add this to region_length to get a total of 24 + 28 = 52 bytes.
> DIV_ROUND_UP(52, REG_SIZE) = ceil(52 / 32) = 2
>
> Correct, because we read part of r7 and part of r8.
>
> "I want to get the ICP handles for vertex 2 out of 6 vertices."
>
> Let's assume the first ICP handle starts at r3.
>
> The ICP handles for vertex N take up a whole register, so we would pick
> src[0] = r(first icp handle)<8,8,1>UD
> src[1] = <7,6,5,4,3,2,1> * 4 + 2 * REG_SIZE
> src[2] = 6 vertices * REG_SIZE = 192 bytes
>
> subnr == 0, so we don't add anything; region_length stays as 192.
> DIV_ROUND_UP(192, 32) = 6.  We potentially read r3, r4, r5, r6, r7, r8,
> which contain the ICP handles for vertex 0, 1, 2, 3, 4, 5 - 6 vertices.
>
> So I think this will work.

Yeah, this all seems totally reasonable.

Reviewed-by: Jason Ekstrand 

Thanks for taking the time to get this right!
--Jason
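
For reference, the register-count arithmetic in the three quoted examples can be sketched as below. This is a minimal illustration, not the regs_read() code from the patch: mov_indirect_regs_read is a hypothetical name, REG_SIZE is taken as 32 bytes as in the examples, and subnr is counted in elements of type_sz bytes, the way the examples compute subnr * type_sz.

```c
#include <assert.h>

#define REG_SIZE 32   /* bytes per GRF, as used in the quoted examples */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Number of registers potentially read by an indirect MOV whose source
 * starts 'subnr' elements of 'type_sz' bytes into the first register and
 * whose accessible region is 'region_length' bytes long.
 */
static unsigned
mov_indirect_regs_read(unsigned subnr, unsigned type_sz,
                       unsigned region_length)
{
   unsigned start_offset = subnr * type_sz;   /* bytes into first GRF */
   return DIV_ROUND_UP(start_offset + region_length, REG_SIZE);
}
```

Plugging in the quoted cases: (4, 4, 16) covers 32 bytes and stays within one register, (7, 4, 24) covers 52 bytes and spills into a second register, and (0, 4, 192) covers six whole registers.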

> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 6484484..0b8de63 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1289,6 +1289,16 @@ enum opcode {
>  * Calculate the high 32-bits of a 32x32 multiply.
>  */
> SHADER_OPCODE_MULH,
> +
> +   /**
> +* A MOV that uses VxH indirect addressing.
> +*
> +* Source 0: A register to start from (HW_REG).
> +* Source 1: An indirect offset (in bytes, UD GRF).
> +* Source 2: The length of the region that could be accessed (in bytes,
> +*   UD immediate).
> +*/
> +   SHADER_OPCODE_MOV_INDIRECT,
>  };
>
>  enum brw_urb_write_flags {
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 80b8c8e..

[Mesa-dev] [PATCH v2 3/3] nv50: add support for performance metrics on G84+

2015-11-14 Thread Samuel Pitoiset

Currently only one metric is exposed but more will be added later.

Signed-off-by: Samuel Pitoiset 
Tested-by: Pierre Moreau 
---
 src/gallium/drivers/nouveau/Makefile.sources   |   2 +
 src/gallium/drivers/nouveau/nv50/nv50_query_hw.c   |  19 +-
 .../drivers/nouveau/nv50/nv50_query_hw_metric.c| 207 +
 .../drivers/nouveau/nv50/nv50_query_hw_metric.h|  34 
 4 files changed, 259 insertions(+), 3 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.c
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources 
b/src/gallium/drivers/nouveau/Makefile.sources
index a1aa135..12821a6 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -77,6 +77,8 @@ NV50_C_SOURCES := \
nv50/nv50_query.h \
nv50/nv50_query_hw.c \
nv50/nv50_query_hw.h \
+   nv50/nv50_query_hw_metric.c \
+   nv50/nv50_query_hw_metric.h \
nv50/nv50_query_hw_sm.c \
nv50/nv50_query_hw_sm.h \
nv50/nv50_resource.c \
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
index 23108ac..b6ebbbf 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
@@ -25,6 +25,7 @@
 
 #include "nv50/nv50_context.h"
 #include "nv50/nv50_query_hw.h"
+#include "nv50/nv50_query_hw_metric.h"
 #include "nv50/nv50_query_hw_sm.h"
 #include "nv_object.xml.h"
 
@@ -349,6 +350,12 @@ nv50_hw_create_query(struct nv50_context *nv50, unsigned 
type, unsigned index)
   return (struct nv50_query *)hq;
}
 
+   hq = nv50_hw_metric_create_query(nv50, type);
+   if (hq) {
+  hq->base.funcs = &hw_query_funcs;
+  return (struct nv50_query *)hq;
+   }
+
hq = CALLOC_STRUCT(nv50_hw_query);
if (!hq)
   return NULL;
@@ -397,14 +404,20 @@ int
 nv50_hw_get_driver_query_info(struct nv50_screen *screen, unsigned id,
   struct pipe_driver_query_info *info)
 {
-   int num_hw_sm_queries = 0;
+   int num_hw_sm_queries = 0, num_hw_metric_queries = 0;
 
num_hw_sm_queries = nv50_hw_sm_get_driver_query_info(screen, 0, NULL);
+   num_hw_metric_queries =
+  nv50_hw_metric_get_driver_query_info(screen, 0, NULL);
 
if (!info)
-  return num_hw_sm_queries;
+  return num_hw_sm_queries + num_hw_metric_queries;
+
+   if (id < num_hw_sm_queries)
+  return nv50_hw_sm_get_driver_query_info(screen, id, info);
 
-   return nv50_hw_sm_get_driver_query_info(screen, id, info);
+   return nv50_hw_metric_get_driver_query_info(screen,
+   id - num_hw_sm_queries, info);
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.c
new file mode 100644
index 000..13dad30
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.c
@@ -0,0 +1,207 @@
+/*
+ * Copyright 2015 Samuel Pitoiset
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "nv50/nv50_context.h"
+#include "nv50/nv50_query_hw_metric.h"
+#include "nv50/nv50_query_hw_sm.h"
+
+/* === PERFORMANCE MONITORING METRICS for NV84+ === */
+static const char *nv50_hw_metric_names[] =
+{
+   "metric-branch_efficiency",
+};
+
+struct nv50_hw_metric_query_cfg {
+   uint32_t queries[4];
+   uint32_t num_queries;
+};
+
+#define _SM(n) NV50_HW_SM_QUERY(NV50_HW_SM_QUERY_ ##n)
+#define _M(n, c) [NV50_HW_METRIC_QUERY_##n] = c
+
+/*  Compute capability 1.1 (G84+)  */
+static const struct nv50_hw_metric_query_cfg
+sm11_branch_efficiency =
+{
+   .queries[0]  = _SM(BRANCH),
+   .queries[1]  = _SM(DIVERGENT_BRANCH),
+   .num_queries = 2,
+};
+
+static const struct nv50_hw_metric_query_cfg *sm11_hw_metric_queries[] =
+{
+   _M(BRANCH_EFFICIENCY, &sm11_branch_ef

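The id dispatch in nv50_hw_get_driver_query_info() above can be sketched
standalone. This is only an illustration under stated assumptions: the
ex_* names, the info struct and the two name tables are hypothetical
stand-ins, not the driver's types. It shows the two conventions the hunk
relies on: a NULL info pointer means "return the count", and ids past the
first group are rebased before being handed to the second group.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pipe_driver_query_info. */
struct ex_query_info {
   const char *name;
};

/* Made-up query tables, for illustration only. */
static const char *ex_sm_names[] = { "branch", "divergent_branch" };
static const char *ex_metric_names[] = { "metric-branch_efficiency" };

#define EX_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* With info == NULL, return the number of queries in the group.
 * Otherwise fill in *info and return 1, or return 0 if id is out of range. */
static int
ex_group_get_info(const char *const *names, unsigned count,
                  unsigned id, struct ex_query_info *info)
{
   if (!info)
      return (int)count;

   if (id >= count)
      return 0;

   info->name = names[id];
   return 1;
}

/* Expose both groups behind a single flat id space. */
static int
ex_get_driver_query_info(unsigned id, struct ex_query_info *info)
{
   const int num_sm = ex_group_get_info(ex_sm_names,
                                        EX_ARRAY_SIZE(ex_sm_names), 0, NULL);
   const int num_metric = ex_group_get_info(ex_metric_names,
                                            EX_ARRAY_SIZE(ex_metric_names),
                                            0, NULL);

   if (!info)
      return num_sm + num_metric;

   if (id < (unsigned)num_sm)
      return ex_group_get_info(ex_sm_names, EX_ARRAY_SIZE(ex_sm_names),
                               id, info);

   /* Rebase the id so the second group sees ids starting at 0. */
   return ex_group_get_info(ex_metric_names, EX_ARRAY_SIZE(ex_metric_names),
                            id - (unsigned)num_sm, info);
}
```

A caller would typically ask for the count first (info == NULL) and then
iterate over ids from 0 up to that count.
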
[Mesa-dev] [PATCH v2 2/3] nv50: add compute-related MP perf counters on G84+

2015-11-14 Thread Samuel Pitoiset

These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.

As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.

Only G84+ is supported because G80 is an old and weird card.

Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.

Signed-off-by: Samuel Pitoiset 
Tested-by: Pierre Moreau 
---
 src/gallium/drivers/nouveau/Makefile.sources   |   2 +
 src/gallium/drivers/nouveau/nv50/nv50_context.h|   3 +-
 src/gallium/drivers/nouveau/nv50/nv50_query.c  |  23 ++
 src/gallium/drivers/nouveau/nv50/nv50_query_hw.c   |  34 +-
 src/gallium/drivers/nouveau/nv50/nv50_query_hw.h   |  16 +
 .../drivers/nouveau/nv50/nv50_query_hw_sm.c| 417 +
 .../drivers/nouveau/nv50/nv50_query_hw_sm.h|  45 +++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |   1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   9 +
 9 files changed, 548 insertions(+), 2 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_sm.c
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_sm.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources
index c2ff8e9..a1aa135 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -77,6 +77,8 @@ NV50_C_SOURCES := \
nv50/nv50_query.h \
nv50/nv50_query_hw.c \
nv50/nv50_query_hw.h \
+   nv50/nv50_query_hw_sm.c \
+   nv50/nv50_query_hw_sm.h \
nv50/nv50_resource.c \
nv50/nv50_resource.h \
nv50/nv50_screen.c \
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index fbafe02..2cebcd9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -67,7 +67,8 @@
 /* compute bufctx (during launch_grid) */
 #define NV50_BIND_CP_GLOBAL   0
 #define NV50_BIND_CP_SCREEN   1
-#define NV50_BIND_CP_COUNT    2
+#define NV50_BIND_CP_QUERY    2
+#define NV50_BIND_CP_COUNT    3
 
 /* bufctx for other operations */
 #define NV50_BIND_2D  0
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index dd9b85b..c31bf72 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -153,3 +153,26 @@ nv50_init_query_functions(struct nv50_context *nv50)
pipe->get_query_result = nv50_get_query_result;
pipe->render_condition = nv50_render_condition;
 }
+
+int
+nv50_screen_get_driver_query_info(struct pipe_screen *pscreen,
+  unsigned id,
+  struct pipe_driver_query_info *info)
+{
+   struct nv50_screen *screen = nv50_screen(pscreen);
+   int num_hw_queries = 0;
+
+   num_hw_queries = nv50_hw_get_driver_query_info(screen, 0, NULL);
+
+   if (!info)
+  return num_hw_queries;
+
+   /* Init default values. */
+   info->name = "this_is_not_the_query_you_are_looking_for";
+   info->query_type = 0xdeadd01d;
+   info->max_value.u64 = 0;
+   info->type = PIPE_DRIVER_QUERY_TYPE_UINT64;
+   info->group_id = -1;
+
+   return nv50_hw_get_driver_query_info(screen, id, info);
+}
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c b/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
index 945ce7ab..23108ac 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
@@ -25,6 +25,7 @@
 
 #include "nv50/nv50_context.h"
 #include "nv50/nv50_query_hw.h"
+#include "nv50/nv50_query_hw_sm.h"
 #include "nv_object.xml.h"
 
 #define NV50_HW_QUERY_STATE_READY   0
@@ -41,7 +42,7 @@
 
 #define NV50_HW_QUERY_ALLOC_SPACE 256
 
-static bool
+bool
 nv50_hw_query_allocate(struct nv50_context *nv50, struct nv50_query *q,
int size)
 {
@@ -122,6 +123,9 @@ nv50_hw_begin_query(struct nv50_context *nv50, struct nv50_query *q)
struct nouveau_pushbuf *push = nv50->base.pushbuf;
struct nv50_hw_query *hq = nv50_hw_query(q);
 
+   if (hq->funcs && hq->funcs->begin_query)
+  return hq->funcs->begin_query(nv50, hq);
+
/* For occlusion queries we have to change the storage, because a previous
 * query might set the initial render condition to false even *after* we re-
 * initialized it to true.
@@ -193,6 +197,11 @@ nv50_hw_end_query(struct nv50_context *nv50, struct nv50_query *q)
struct nouveau_pushbuf *push = nv50->base.pushbuf;
struct nv50_hw_query *hq = nv50_hw_query(q);
 
+   if (hq->funcs && hq->funcs->end_query) {
+  hq->funcs->end_query(nv50, hq);
+  return;
+   }
+
hq->state = NV50_HW_QUERY_STATE_ENDED;
 
switch (q->type) {
@@ -261,6 +270,9 @@ nv5

[Mesa-dev] [PATCH v2 0/3] nv50: compute support & some perf counters

2015-11-14 Thread Samuel Pitoiset

Hi,

Only patch 1/3 has been updated. Patches 4 and 5 of the first version
have been dropped because those groups of GPU counters are going to
be removed.

Thanks.

Samuel Pitoiset (3):
  nv50: implement a basic compute support
  nv50: add compute-related MP perf counters on G84+
  nv50: add support for performance metrics on G84+

 src/gallium/drivers/nouveau/Makefile.sources   |   5 +
 src/gallium/drivers/nouveau/nv50/nv50_compute.c| 320 +++
 .../drivers/nouveau/nv50/nv50_compute.xml.h| 444 +
 src/gallium/drivers/nouveau/nv50/nv50_context.c|  30 +-
 src/gallium/drivers/nouveau/nv50/nv50_context.h|  24 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.c|  22 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.h|   7 +
 src/gallium/drivers/nouveau/nv50/nv50_query.c  |  23 ++
 src/gallium/drivers/nouveau/nv50/nv50_query_hw.c   |  47 ++-
 src/gallium/drivers/nouveau/nv50/nv50_query_hw.h   |  16 +
 .../drivers/nouveau/nv50/nv50_query_hw_metric.c| 207 ++
 .../drivers/nouveau/nv50/nv50_query_hw_metric.h|  34 ++
 .../drivers/nouveau/nv50/nv50_query_hw_sm.c| 417 +++
 .../drivers/nouveau/nv50/nv50_query_hw_sm.h|  45 +++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  62 ++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  17 +
 src/gallium/drivers/nouveau/nv50/nv50_state.c  |  99 +
 17 files changed, 1809 insertions(+), 10 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_compute.c
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_compute.xml.h
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.c
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_metric.h
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_sm.c
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_query_hw_sm.h

-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 1/3] nv50: implement a basic compute support

2015-11-14 Thread Samuel Pitoiset

This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.

This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute

I made some improvements to the original code, such as fixing the
simultaneous use of 3D and COMPUTE, improving the binding of global
buffers, and bringing the code closer to what nvc0 already does. This
compute support has been tested by Pierre Moreau and myself with some
compute kernels. This is a step towards OpenCL.

On a related note, compute programs seem to overlap fragment programs
when both are in use. To fix this, we need to re-validate fragment
programs when binding compute programs and vice versa.

Note that textures, samplers and surfaces still need to be implemented.

Changes in v2:
 - use nv50->debug to get shader compiler reports
 - remove useless localOffset in nv50_program

Signed-off-by: Samuel Pitoiset 
Tested-by: Pierre Moreau 
---
 src/gallium/drivers/nouveau/Makefile.sources   |   1 +
 src/gallium/drivers/nouveau/nv50/nv50_compute.c| 320 +++
 .../drivers/nouveau/nv50/nv50_compute.xml.h| 444 +
 src/gallium/drivers/nouveau/nv50/nv50_context.c|  30 +-
 src/gallium/drivers/nouveau/nv50/nv50_context.h|  23 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.c|  22 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.h|   7 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  61 ++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   8 +
 src/gallium/drivers/nouveau/nv50/nv50_state.c  |  99 +
 10 files changed, 1006 insertions(+), 9 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_compute.c
 create mode 100644 src/gallium/drivers/nouveau/nv50/nv50_compute.xml.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources
index 83f8113..c2ff8e9 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -64,6 +64,7 @@ NV50_C_SOURCES := \
nv50/nv50_3ddefs.xml.h \
nv50/nv50_3d.xml.h \
nv50/nv50_blit.h \
+   nv50/nv50_compute.c \
nv50/nv50_context.c \
nv50/nv50_context.h \
nv50/nv50_defs.xml.h \
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_compute.c b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
new file mode 100644
index 000..6d23fd6
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright 2012 Francisco Jerez
+ * Copyright 2015 Samuel Pitoiset
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining
+ * a copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sublicense, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial
+ * portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "nv50/nv50_context.h"
+#include "nv50/nv50_compute.xml.h"
+
+#include "codegen/nv50_ir_driver.h"
+
+int
+nv50_screen_compute_setup(struct nv50_screen *screen,
+  struct nouveau_pushbuf *push)
+{
+   struct nouveau_device *dev = screen->base.device;
+   struct nouveau_object *chan = screen->base.channel;
+   struct nv04_fifo *fifo = (struct nv04_fifo *)chan->data;
+   unsigned obj_class;
+   int i, ret;
+
+   switch (dev->chipset & 0xf0) {
+   case 0x50:
+   case 0x80:
+   case 0x90:
+  obj_class = NV50_COMPUTE_CLASS;
+  break;
+   case 0xa0:
+  switch (dev->chipset) {
+  case 0xa3:
+  case 0xa5:
+  case 0xa8:
+ obj_class = NVA3_COMPUTE_CLASS;
+ break;
+  default:
+ obj_class = NV50_COMPUTE_CLASS;
+ break;
+  }
+  break;
+   default:
+  NOUVEAU_ERR("unsupported chipset: NV%02x\n", dev->chipset);
+  return -1;
+   }
+
+   ret = nouveau_object_new(chan, 0xbeef50c0, obj_class, NULL, 0,
+&screen->compute);

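The chipset switch in nv50_screen_compute_setup() above can be read as a
small pure function. The sketch below restates it that way so it can be
checked in isolation. The enum and function names are hypothetical, and the
enum values are placeholders rather than the real object class numbers.

```c
#include <stdint.h>

/* Placeholder identifiers; these are NOT the hardware class numbers. */
enum ex_compute_class {
   EX_COMPUTE_NONE,   /* unsupported chipset */
   EX_COMPUTE_NV50,   /* stands in for NV50_COMPUTE_CLASS */
   EX_COMPUTE_NVA3    /* stands in for NVA3_COMPUTE_CLASS */
};

/* Same decision structure as the patch: select on the chipset family
 * first, then single out the 0xa3/0xa5/0xa8 chipsets inside the 0xa0
 * family. */
static enum ex_compute_class
ex_pick_compute_class(uint32_t chipset)
{
   switch (chipset & 0xf0) {
   case 0x50:
   case 0x80:
   case 0x90:
      return EX_COMPUTE_NV50;
   case 0xa0:
      switch (chipset) {
      case 0xa3:
      case 0xa5:
      case 0xa8:
         return EX_COMPUTE_NVA3;
      default:
         return EX_COMPUTE_NV50;
      }
   default:
      return EX_COMPUTE_NONE;
   }
}
```
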
[Mesa-dev] [PATCH 03/36] i965: Define state flag to signal that the URB size has been altered.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

This will make sure that we recalculate the URB layout anytime the URB
size is modified by the L3 partitioning code.
---
 src/mesa/drivers/dri/i965/brw_context.h  | 2 ++
 src/mesa/drivers/dri/i965/brw_state_upload.c | 1 +
 src/mesa/drivers/dri/i965/gen7_urb.c | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 20d2dd0..ac05658 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -213,6 +213,7 @@ enum brw_state_id {
BRW_STATE_VS_ATTRIB_WORKAROUNDS,
BRW_STATE_COMPUTE_PROGRAM,
BRW_STATE_CS_WORK_GROUPS,
+   BRW_STATE_URB_SIZE,
BRW_NUM_STATE_BITS
 };
 
@@ -293,6 +294,7 @@ enum brw_state_id {
 #define BRW_NEW_VS_ATTRIB_WORKAROUNDS   (1ull << BRW_STATE_VS_ATTRIB_WORKAROUNDS)
 #define BRW_NEW_COMPUTE_PROGRAM (1ull << BRW_STATE_COMPUTE_PROGRAM)
 #define BRW_NEW_CS_WORK_GROUPS  (1ull << BRW_STATE_CS_WORK_GROUPS)
+#define BRW_NEW_URB_SIZE(1ull << BRW_STATE_URB_SIZE)
 
 struct brw_state_flags {
/** State update flags signalled by mesa internals */
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 6f8daf6..aab5c91 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -618,6 +618,7 @@ static struct dirty_bit_map brw_bits[] = {
DEFINE_BIT(BRW_NEW_VS_ATTRIB_WORKAROUNDS),
DEFINE_BIT(BRW_NEW_COMPUTE_PROGRAM),
DEFINE_BIT(BRW_NEW_CS_WORK_GROUPS),
+   DEFINE_BIT(BRW_NEW_URB_SIZE),
{0, 0, 0}
 };
 
diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c b/src/mesa/drivers/dri/i965/gen7_urb.c
index 6916217..11a4f03 100644
--- a/src/mesa/drivers/dri/i965/gen7_urb.c
+++ b/src/mesa/drivers/dri/i965/gen7_urb.c
@@ -153,6 +153,7 @@ gen7_upload_urb(struct brw_context *brw)
 * skip the rest of the logic.
 */
if (!(brw->ctx.NewDriverState & BRW_NEW_CONTEXT) &&
+   !(brw->ctx.NewDriverState & BRW_NEW_URB_SIZE) &&
brw->urb.vsize == vs_size &&
brw->urb.gs_present == gs_present &&
brw->urb.gsize == gs_size) {
@@ -176,6 +177,7 @@ gen7_upload_urb(struct brw_context *brw)
unsigned chunk_size_bytes = 8192;
 
/* Determine the size of the URB in chunks.
+* BRW_NEW_URB_SIZE
 */
unsigned urb_chunks = brw->urb.size * 1024 / chunk_size_bytes;
 
@@ -314,6 +316,7 @@ const struct brw_tracked_state gen7_urb = {
.dirty = {
   .mesa = 0,
   .brw = BRW_NEW_CONTEXT |
+ BRW_NEW_URB_SIZE |
  BRW_NEW_GEOMETRY_PROGRAM |
  BRW_NEW_GS_PROG_DATA |
  BRW_NEW_VS_PROG_DATA,
-- 
2.6.2

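As a rough illustration of how a dirty bit such as BRW_NEW_URB_SIZE is
meant to work, here is a minimal sketch, assuming a much reduced state
model. All ex_* names are hypothetical. The real driver tracks many more
bits, and its early-out also compares the GS state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced state id list; each id becomes one bit in a 64-bit mask. */
enum ex_state_id {
   EX_STATE_CONTEXT,
   EX_STATE_URB_SIZE,
   EX_NUM_STATE_BITS
};

#define EX_NEW_CONTEXT   (1ull << EX_STATE_CONTEXT)
#define EX_NEW_URB_SIZE  (1ull << EX_STATE_URB_SIZE)

/* A state atom lists the bits it depends on. */
struct ex_tracked_state {
   uint64_t dirty;
};

/* The atom has to be re-emitted if any bit it listens to is set. */
static bool
ex_atom_needs_emit(const struct ex_tracked_state *atom, uint64_t new_state)
{
   return (atom->dirty & new_state) != 0;
}

/* Same shape as the early-out in gen7_upload_urb(): the cached layout may
 * only be reused if neither the context nor the URB size has changed. */
static bool
ex_can_skip_urb_upload(uint64_t new_state,
                       unsigned cached_vs_size, unsigned vs_size)
{
   return !(new_state & EX_NEW_CONTEXT) &&
          !(new_state & EX_NEW_URB_SIZE) &&
          cached_vs_size == vs_size;
}
```

Without the extra bit in the early-out, a URB resize that leaves the entry
sizes untouched would be skipped and the old layout would stay programmed.
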

[Mesa-dev] [PATCH 01/36] i965: Define symbolic constants for some useful L3 cache control registers.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

---
 src/mesa/drivers/dri/i965/intel_reg.h | 53 +++
 1 file changed, 53 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
index a261c2b..0b167d5 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -192,3 +192,56 @@
 #define MI_PREDICATE_RESULT 0x2418
 #define MI_PREDICATE_RESULT_1   0x241C
 #define MI_PREDICATE_RESULT_2   0x2214
+
+/* L3 cache control registers. */
+#define GEN7_L3SQCREG1 0xb010
+/* L3SQ general and high priority credit initialization. */
+# define IVB_L3SQCREG1_SQGHPCI_DEFAULT 0x0073
+# define VLV_L3SQCREG1_SQGHPCI_DEFAULT 0x00d3
+# define HSW_L3SQCREG1_SQGHPCI_DEFAULT 0x0061
+# define GEN7_L3SQCREG1_CONV_DC_UC (1 << 24)
+# define GEN7_L3SQCREG1_CONV_IS_UC (1 << 25)
+# define GEN7_L3SQCREG1_CONV_C_UC  (1 << 26)
+# define GEN7_L3SQCREG1_CONV_T_UC  (1 << 27)
+
+#define GEN7_L3CNTLREG2                    0xb020
+# define GEN7_L3CNTLREG2_SLM_ENABLE        (1 << 0)
+# define GEN7_L3CNTLREG2_URB_ALLOC_SHIFT   1
+# define GEN7_L3CNTLREG2_URB_ALLOC_MASK    INTEL_MASK(6, 1)
+# define GEN7_L3CNTLREG2_URB_LOW_BW        (1 << 7)
+# define GEN7_L3CNTLREG2_ALL_ALLOC_SHIFT   8
+# define GEN7_L3CNTLREG2_ALL_ALLOC_MASK    INTEL_MASK(13, 8)
+# define GEN7_L3CNTLREG2_RO_ALLOC_SHIFT    14
+# define GEN7_L3CNTLREG2_RO_ALLOC_MASK     INTEL_MASK(19, 14)
+# define GEN7_L3CNTLREG2_RO_LOW_BW         (1 << 20)
+# define GEN7_L3CNTLREG2_DC_ALLOC_SHIFT    21
+# define GEN7_L3CNTLREG2_DC_ALLOC_MASK     INTEL_MASK(26, 21)
+# define GEN7_L3CNTLREG2_DC_LOW_BW         (1 << 27)
+
+#define GEN7_L3CNTLREG3                    0xb024
+# define GEN7_L3CNTLREG3_IS_ALLOC_SHIFT    1
+# define GEN7_L3CNTLREG3_IS_ALLOC_MASK     INTEL_MASK(6, 1)
+# define GEN7_L3CNTLREG3_IS_LOW_BW         (1 << 7)
+# define GEN7_L3CNTLREG3_C_ALLOC_SHIFT     8
+# define GEN7_L3CNTLREG3_C_ALLOC_MASK      INTEL_MASK(13, 8)
+# define GEN7_L3CNTLREG3_C_LOW_BW          (1 << 14)
+# define GEN7_L3CNTLREG3_T_ALLOC_SHIFT     15
+# define GEN7_L3CNTLREG3_T_ALLOC_MASK      INTEL_MASK(20, 15)
+# define GEN7_L3CNTLREG3_T_LOW_BW          (1 << 21)
+
+#define HSW_SCRATCH1   0xb038
+#define HSW_SCRATCH1_L3_ATOMIC_DISABLE (1 << 27)
+
+#define HSW_ROW_CHICKEN3   0xe49c
+#define HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE (1 << 6)
+
+#define GEN8_L3CNTLREG                     0x7034
+# define GEN8_L3CNTLREG_SLM_ENABLE         (1 << 0)
+# define GEN8_L3CNTLREG_URB_ALLOC_SHIFT    1
+# define GEN8_L3CNTLREG_URB_ALLOC_MASK     INTEL_MASK(7, 1)
+# define GEN8_L3CNTLREG_RO_ALLOC_SHIFT     11
+# define GEN8_L3CNTLREG_RO_ALLOC_MASK      INTEL_MASK(17, 11)
+# define GEN8_L3CNTLREG_DC_ALLOC_SHIFT     18
+# define GEN8_L3CNTLREG_DC_ALLOC_MASK      INTEL_MASK(24, 18)
+# define GEN8_L3CNTLREG_ALL_ALLOC_SHIFT    25
+# define GEN8_L3CNTLREG_ALL_ALLOC_MASK     INTEL_MASK(31, 25)
-- 
2.6.2

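The _SHIFT/_MASK pairs above are meant to be used together when packing a
register value. The sketch below shows the usual pattern under stated
assumptions: EX_MASK is a hypothetical helper written here to build a mask
from an inclusive bit range (the patch uses INTEL_MASK, whose definition is
not part of this diff), and the ex_* field names only mirror the URB
allocation field of GEN7_L3CNTLREG2.

```c
#include <stdint.h>

/* Hypothetical helper: bits low..high inclusive set to 1.
 * Only valid for ranges narrower than 32 bits. */
#define EX_MASK(high, low) (((1u << ((high) - (low) + 1)) - 1) << (low))

/* Mirrors of the field layout shown in the patch, for illustration. */
#define EX_SLM_ENABLE        (1u << 0)
#define EX_URB_ALLOC_SHIFT   1
#define EX_URB_ALLOC_MASK    EX_MASK(6, 1)

/* Place a value into a field; bits that do not fit are dropped. */
static uint32_t
ex_set_field(uint32_t value, unsigned shift, uint32_t mask)
{
   return (value << shift) & mask;
}

/* Extract a field from a packed register value. */
static uint32_t
ex_get_field(uint32_t reg, unsigned shift, uint32_t mask)
{
   return (reg & mask) >> shift;
}
```

For example, a register value with SLM enabled and a URB allocation of 16
would be built as EX_SLM_ENABLE | ex_set_field(16, EX_URB_ALLOC_SHIFT,
EX_URB_ALLOC_MASK).
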

[Mesa-dev] [PATCH 02/36] i965: Keep track of whether LRI is allowed in the context struct.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

This stores the result of can_do_pipelined_register_writes() in the
context struct so we can find out later whether LRI can be used to
program the L3 configuration.
---
 src/mesa/drivers/dri/i965/brw_context.h  | 5 +
 src/mesa/drivers/dri/i965/intel_extensions.c | 8 +---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 4b2db61..20d2dd0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -840,6 +840,11 @@ struct brw_context
bool use_resource_streamer;
 
/**
+* Whether LRI can be used to write register values from the batch buffer.
+*/
+   bool can_do_pipelined_register_writes;
+
+   /**
 * Some versions of Gen hardware don't do centroid interpolation correctly
 * on unlit pixels, causing incorrect values for derivatives near triangle
 * edges.  Enabling this flag causes the fragment shader to use
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 386b63c..800e97c 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -40,8 +40,8 @@
 static bool
 can_do_pipelined_register_writes(struct brw_context *brw)
 {
-   /* Supposedly, Broadwell just works. */
-   if (brw->gen >= 8)
+   /* Supposedly Broadwell just works, and earlier gens are sloppy. */
+   if (brw->gen != 7)
   return true;
 
static int result = -1;
@@ -319,6 +319,8 @@ intelInitExtensions(struct gl_context *ctx)
}
 
brw->predicate.supported = false;
+   brw->can_do_pipelined_register_writes =
+  can_do_pipelined_register_writes(brw);
 
if (brw->gen >= 7) {
   ctx->Extensions.ARB_conservative_depth = true;
@@ -334,7 +336,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_texture_view = true;
   ctx->Extensions.ARB_shader_storage_buffer_object = true;
 
-  if (can_do_pipelined_register_writes(brw)) {
+  if (brw->can_do_pipelined_register_writes) {
  ctx->Extensions.ARB_draw_indirect = true;
  ctx->Extensions.ARB_transform_feedback2 = true;
  ctx->Extensions.ARB_transform_feedback3 = true;
-- 
2.6.2


[Mesa-dev] [PATCH 08/36] i965: Implement selection of the closest L3 configuration based on a vector of weights.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

The input of the L3 set-up code is a vector giving the approximate
desired relative size of each partition.  This implements logic to
compare the input vector against the table of validated configurations
for the device and pick the closest compatible one.
---
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 95 +++
 1 file changed, 95 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index c863b7f..4d0cfcd 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -163,6 +163,101 @@ get_l3_way_size(const struct brw_device_info *devinfo)
 }
 
 /**
+ * L3 configuration represented as a vector of weights giving the desired
+ * relative size of each partition.  The scale is arbitrary, only the ratios
+ * between weights will have an influence on the selection of the closest L3
+ * configuration.
+ */
+struct brw_l3_weights {
+   float w[NUM_L3P];
+};
+
+/**
+ * L1-normalize a vector of L3 partition weights.
+ */
+static struct brw_l3_weights
+norm_l3_weights(struct brw_l3_weights w)
+{
+   float sz = 0;
+
+   for (unsigned i = 0; i < NUM_L3P; i++)
+  sz += w.w[i];
+
+   for (unsigned i = 0; i < NUM_L3P; i++)
+  w.w[i] /= sz;
+
+   return w;
+}
+
+/**
+ * Get the relative partition weights of the specified L3 configuration.
+ */
+static struct brw_l3_weights
+get_config_l3_weights(const struct brw_l3_config *cfg)
+{
+   if (cfg) {
+  struct brw_l3_weights w;
+
+  for (unsigned i = 0; i < NUM_L3P; i++)
+ w.w[i] = cfg->n[i];
+
+  return norm_l3_weights(w);
+   } else {
+  const struct brw_l3_weights w = { { 0 } };
+  return w;
+   }
+}
+
+/**
+ * Distance between two L3 configurations represented as vectors of weights.
+ * Usually just the L1 metric except when the two configurations are
+ * considered incompatible in which case the distance will be infinite.  Note
+ * that the compatibility condition is asymmetric -- They will be considered
+ * incompatible whenever the reference configuration \p w0 requires SLM, DC,
+ * or URB but \p w1 doesn't provide it.
+ */
+static float
+diff_l3_weights(struct brw_l3_weights w0, struct brw_l3_weights w1)
+{
+   if ((w0.w[L3P_SLM] && !w1.w[L3P_SLM]) ||
+   (w0.w[L3P_DC] && !w1.w[L3P_DC] && !w1.w[L3P_ALL]) ||
+   (w0.w[L3P_URB] && !w1.w[L3P_URB])) {
+  return HUGE_VALF;
+
+   } else {
+  float dw = 0;
+
+  for (unsigned i = 0; i < NUM_L3P; i++)
+ dw += fabs(w0.w[i] - w1.w[i]);
+
+  return dw;
+   }
+}
+
+/**
+ * Return the closest validated L3 configuration for the specified device and
+ * weight vector.
+ */
+static const struct brw_l3_config *
+get_l3_config(const struct brw_device_info *devinfo, struct brw_l3_weights w0)
+{
+   const struct brw_l3_config *const cfgs = get_l3_configs(devinfo);
+   const struct brw_l3_config *cfg_best = NULL;
+   float dw_best = HUGE_VALF;
+
+   for (const struct brw_l3_config *cfg = cfgs; cfg->n[L3P_URB]; cfg++) {
+  const float dw = diff_l3_weights(w0, get_config_l3_weights(cfg));
+
+  if (dw < dw_best) {
+ cfg_best = cfg;
+ dw_best = dw;
+  }
+   }
+
+   return cfg_best;
+}
+
+/**
  * Program the hardware to use the specified L3 configuration.
  */
 static void
-- 
2.6.2

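To make the selection logic easier to follow outside the driver, here is a
self-contained sketch of the same idea: normalize the requested weights,
compute an L1 distance to each table entry, treat entries that lack a
required partition as infinitely far away, and keep the nearest one. The
ex_* names, the reduced partition list and the two-entry table are
assumptions made up for this example. They are not validated hardware
configurations.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Reduced partition list, for illustration only. */
enum ex_part { EX_SLM, EX_URB, EX_ALL, EX_DC, EX_RO, EX_NUM_PART };

struct ex_weights { float w[EX_NUM_PART]; };
struct ex_config  { unsigned n[EX_NUM_PART]; };

/* Made-up table, terminated by an entry with no URB allocation. */
static const struct ex_config ex_cfgs[] = {
   {{  0, 48, 0,  0, 16 }},   /* no SLM, no DC */
   {{ 16, 16, 0, 16, 16 }},   /* SLM and DC available */
   {{  0,  0, 0,  0,  0 }}
};

/* L1-normalize: scale the weights so that they sum to 1. */
static struct ex_weights
ex_norm(struct ex_weights w)
{
   float sz = 0;

   for (unsigned i = 0; i < EX_NUM_PART; i++)
      sz += w.w[i];

   assert(sz > 0);

   for (unsigned i = 0; i < EX_NUM_PART; i++)
      w.w[i] /= sz;

   return w;
}

static struct ex_weights
ex_config_weights(const struct ex_config *cfg)
{
   struct ex_weights w;

   for (unsigned i = 0; i < EX_NUM_PART; i++)
      w.w[i] = (float)cfg->n[i];

   return ex_norm(w);
}

/* L1 distance, or infinity if w1 lacks something that w0 requires.
 * The check is asymmetric: only the needs of w0 are considered. */
static float
ex_diff(struct ex_weights w0, struct ex_weights w1)
{
   float dw = 0;

   if ((w0.w[EX_SLM] && !w1.w[EX_SLM]) ||
       (w0.w[EX_DC] && !w1.w[EX_DC] && !w1.w[EX_ALL]) ||
       (w0.w[EX_URB] && !w1.w[EX_URB]))
      return HUGE_VALF;

   for (unsigned i = 0; i < EX_NUM_PART; i++) {
      const float d = w0.w[i] - w1.w[i];
      dw += d < 0 ? -d : d;
   }

   return dw;
}

/* Return the table entry closest to w0, or NULL if none is compatible. */
static const struct ex_config *
ex_closest(const struct ex_config *cfgs, struct ex_weights w0)
{
   const struct ex_config *best = NULL;
   float dw_best = HUGE_VALF;

   for (const struct ex_config *cfg = cfgs; cfg->n[EX_URB]; cfg++) {
      const float dw = ex_diff(w0, ex_config_weights(cfg));

      if (dw < dw_best) {
         best = cfg;
         dw_best = dw;
      }
   }

   return best;
}
```

With this table, a request that needs SLM can only match the second entry,
while a request for URB and RO alone lands on the first entry, which is
nearer in the L1 metric (distance 0.5 versus 1.0).
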

[Mesa-dev] [PATCH 11/36] i965: Add debug flag to print out the new L3 state during transitions.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

---
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 17 +
 src/mesa/drivers/dri/i965/intel_debug.c   |  1 +
 src/mesa/drivers/dri/i965/intel_debug.h   |  1 +
 3 files changed, 19 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 58eb07b..45bad02 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -435,6 +435,18 @@ update_urb_size(struct brw_context *brw, const struct brw_l3_config *cfg)
}
 }
 
+/**
+ * Print out the specified L3 configuration.
+ */
+static void
+dump_l3_config(const struct brw_l3_config *cfg)
+{
+   fprintf(stderr, "SLM=%d URB=%d ALL=%d DC=%d RO=%d IS=%d C=%d T=%d\n",
+   cfg->n[L3P_SLM], cfg->n[L3P_URB], cfg->n[L3P_ALL],
+   cfg->n[L3P_DC], cfg->n[L3P_RO],
+   cfg->n[L3P_IS], cfg->n[L3P_C], cfg->n[L3P_T]);
+}
+
 static void
 emit_l3_state(struct brw_context *brw)
 {
@@ -464,6 +476,11 @@ emit_l3_state(struct brw_context *brw)
   setup_l3_config(brw, cfg);
   update_urb_size(brw, cfg);
   brw->l3.config = cfg;
+
+  if (unlikely(INTEL_DEBUG & DEBUG_L3)) {
+ fprintf(stderr, "L3 config transition (%f > %f): ", dw, dw_threshold);
+ dump_l3_config(cfg);
+  }
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_debug.c b/src/mesa/drivers/dri/i965/intel_debug.c
index c00d2e7..495540a 100644
--- a/src/mesa/drivers/dri/i965/intel_debug.c
+++ b/src/mesa/drivers/dri/i965/intel_debug.c
@@ -75,6 +75,7 @@ static const struct debug_control debug_control[] = {
{ "cs",  DEBUG_CS },
{ "hex", DEBUG_HEX },
{ "nocompact",   DEBUG_NO_COMPACTION },
+   { "l3",  DEBUG_L3 },
{ NULL,0 }
 };
 
diff --git a/src/mesa/drivers/dri/i965/intel_debug.h b/src/mesa/drivers/dri/i965/intel_debug.h
index 98bd7e9..7a8ecec 100644
--- a/src/mesa/drivers/dri/i965/intel_debug.h
+++ b/src/mesa/drivers/dri/i965/intel_debug.h
@@ -69,6 +69,7 @@ extern uint64_t INTEL_DEBUG;
 #define DEBUG_CS  (1ull << 33)
 #define DEBUG_HEX (1ull << 34)
 #define DEBUG_NO_COMPACTION   (1ull << 35)
+#define DEBUG_L3  (1ull << 36)
 
 #ifdef HAVE_ANDROID_PLATFORM
 #define LOG_TAG "INTEL-MESA"
-- 
2.6.2

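For readers unfamiliar with the name-to-bit table that the new "l3" entry
is added to, the sketch below shows one way such a table can be consumed.
It is an assumption-laden illustration: ex_parse_debug() is a hypothetical
parser written for this example and is not the driver's actual parsing
code. Only the table layout and the bit positions are taken from the diff.

```c
#include <stdint.h>
#include <string.h>

#define EX_DEBUG_CS             (1ull << 33)
#define EX_DEBUG_HEX            (1ull << 34)
#define EX_DEBUG_NO_COMPACTION  (1ull << 35)
#define EX_DEBUG_L3             (1ull << 36)

struct ex_debug_control {
   const char *name;
   uint64_t flag;
};

/* NULL-terminated table, in the same style as debug_control[]. */
static const struct ex_debug_control ex_controls[] = {
   { "cs",        EX_DEBUG_CS },
   { "hex",       EX_DEBUG_HEX },
   { "nocompact", EX_DEBUG_NO_COMPACTION },
   { "l3",        EX_DEBUG_L3 },
   { NULL,        0 }
};

/* Hypothetical parser: turn a comma-separated list of names into a flag
 * mask. Unknown names are silently ignored. */
static uint64_t
ex_parse_debug(const char *str)
{
   uint64_t flags = 0;

   while (*str) {
      const size_t len = strcspn(str, ",");

      for (const struct ex_debug_control *c = ex_controls; c->name; c++) {
         if (strlen(c->name) == len && !strncmp(str, c->name, len))
            flags |= c->flag;
      }

      str += len;
      if (*str == ',')
         str++;
   }

   return flags;
}
```

Code that prints the L3 transition would then guard the output with a test
such as (flags & EX_DEBUG_L3), as the emit_l3_state() hunk does with
INTEL_DEBUG & DEBUG_L3.
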

[Mesa-dev] [PATCH 00/36] Compute shader shared variables

2015-11-14 Thread Jordan Justen

git://people.freedesktop.org/~jljusten/mesa cs-shared-variables-v1
http://patchwork.freedesktop.org/bundle/jljusten/cs-shared-variables-v1

Patches 1 - 13:

 * Rebased curro's "i965: L3 cache partitioning." (sent Sept 6)

Patches 14 - 19:

 * Rework lower_ubo_reference to allow code sharing with
   lower_shared_reference

Patches 20 - 28:

 * Add shared variable support for i965. Add lower_shared_reference,
   which works similarly to lower_ubo_reference for SSBOs, except that
   it merges all shared variables into one shared variable region
   (rather than using separate BOs, as SSBOs allow).

Patches 29 - 36:

 * Add atomic support for shared variables on i965, implemented
   similarly to SSBOs.

On Ivy Bridge, this fixes several Piglit and OpenGL ES 3.1 CTS tests:

 * spec/arb_compute_shader/compiler/shared-atomics.comp: fail -> pass
 * spec/arb_compute_shader/execution/shared-atomic: crash -> pass
 * spec/arb_compute_shader/execution/simple-barrier: crash -> pass

 * es31-cts/compute_shader/atomic-case1: fail -> pass
 * es31-cts/compute_shader/atomic-case3: fail -> pass
 * es31-cts/compute_shader/shared-indexing: fail -> pass
 * es31-cts/compute_shader/shared-max: fail -> pass
 * es31-cts/compute_shader/shared-simple: fail -> pass
 * es31-cts/compute_shader/shared-struct: fail -> pass
 * es31-cts/compute_shader/work-group-size: fail -> pass

Francisco Jerez (13):
  i965: Define symbolic constants for some useful L3 cache control
registers.
  i965: Keep track of whether LRI is allowed in the context struct.
  i965: Define state flag to signal that the URB size has been altered.
  i965/gen8: Don't add workaround bits to PIPE_CONTROL stalls if DC
flush is set.
  i965: Import tables enumerating the set of validated L3
configurations.
  i965: Implement programming of the L3 configuration.
  i965/hsw: Enable L3 atomics.
  i965: Implement selection of the closest L3 configuration based on a
vector of weights.
  i965: Calculate appropriate L3 partition weights for the current
pipeline state.
  i965: Implement L3 state atom.
  i965: Add debug flag to print out the new L3 state during transitions.
  i965: Work around L3 state leaks during context switches.
  i965: Hook up L3 partitioning state atom.

Jordan Justen (23):
  glsl ubo/ssbo: Use enum to track current buffer access type
  glsl ubo/ssbo: Split buffer access to insert_buffer_access
  glsl ubo/ssbo: Add lower_buffer_access class
  glsl ubo/ssbo: Move is_dereferenced_thing_row_major into
lower_buffer_access
  glsl ubo/ssbo: Move common code into
lower_buffer_access::setup_buffer_access
  glsl: Add default matrix ordering in lower_buffer_access
  glsl: Don't lower_variable_index_to_cond_assign for shared variables
  glsl: Add lowering pass for shared variable references
  nir: Translate glsl shared var load intrinsic to nir intrinsic
  nir: Translate glsl shared var store intrinsic to nir intrinsic
  i965: Disable vector splitting on shared variables
  i965/fs: Handle nir shared variable load intrinsic
  i965/fs: Handle nir shared variable store intrinsic function
  i965: Enable shared local memory for CS shared variables
  i965: Lower shared variable references to intrinsic calls
  glsl: Allow atomic functions to be used with shared variables
  glsl: Replace atomic_ssbo and ssbo_atomic with atomic
  glsl: Check for SSBO variable in SSBO atomic lowering
  glsl: Translate atomic intrinsic functions on shared variables
  glsl: Buffer atomics are supported for compute shaders
  glsl: Disable several optimizations on shared variables
  nir: Add nir intrinsics for shared variable atomic operations
  i965/nir: Implement shared variable atomic operations

 src/glsl/Makefile.sources  |   2 +
 src/glsl/ast_function.cpp  |  18 +-
 src/glsl/builtin_functions.cpp | 236 -
 src/glsl/ir_optimization.h |   1 +
 src/glsl/linker.cpp|   4 +
 src/glsl/lower_buffer_access.cpp   | 565 +
 src/glsl/lower_buffer_access.h |  72 +++
 src/glsl/lower_shared_reference.cpp| 511 +++
 src/glsl/lower_ubo_reference.cpp   | 536 +++
 src/glsl/lower_variable_index_to_cond_assign.cpp   |   3 +
 src/glsl/nir/glsl_to_nir.cpp   | 131 -
 src/glsl/nir/nir_intrinsics.h  |  29 +-
 src/glsl/opt_constant_propagation.cpp  |   3 +-
 src/glsl/opt_constant_variable.cpp |   3 +-
 src/glsl/opt_copy_propagation.cpp  |   3 +-
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_compiler.h   |   1 +
 src/mesa/drivers/dri/i965/brw_context.h|  17 +-
 src/mesa/drivers/dri/i965/brw_cs.c |   2 +
 src/mesa/drivers/dri/i965/brw_defines.h|   4 +
 src/mesa/drivers/dri/i965/brw_fs.h |   2 +
 src/mesa/

[Mesa-dev] [PATCH 09/36] i965: Calculate appropriate L3 partition weights for the current pipeline state.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

This calculates a rather conservative partitioning of the L3 cache
based on the shaders currently bound to the pipeline and whether they
use SLM, atomics, images or scratch space.  The result is intended to
be fine-tuned later on based on other pipeline state.
---
 src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 53 +++
 2 files changed, 54 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/dri/i965/brw_compiler.h
index e3a26d6..6be59cd 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -300,6 +300,7 @@ struct brw_stage_prog_data {
 
unsigned curb_read_length;
unsigned total_scratch;
+   unsigned total_shared;
 
/**
 * Register where the thread expects to find input data from the URB
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 4d0cfcd..1a88261 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -258,6 +258,59 @@ get_l3_config(const struct brw_device_info *devinfo, struct brw_l3_weights w0)
 }
 
 /**
+ * Return a reasonable default L3 configuration for the specified device based
+ * on whether SLM and DC are required.  In the non-SLM non-DC case the result
+ * is intended to approximately resemble the hardware defaults.
+ */
+static struct brw_l3_weights
+get_default_l3_weights(const struct brw_device_info *devinfo,
+   bool needs_dc, bool needs_slm)
+{
+   struct brw_l3_weights w = {{ 0 }};
+
+   w.w[L3P_SLM] = needs_slm;
+   w.w[L3P_URB] = 1.0;
+
+   if (devinfo->gen >= 8) {
+  w.w[L3P_ALL] = 1.0;
+   } else {
+  w.w[L3P_DC] = needs_dc ? 0.1 : 0;
+  w.w[L3P_RO] = devinfo->is_baytrail ? 0.5 : 1.0;
+   }
+
+   return norm_l3_weights(w);
+}
+
+/**
+ * Calculate the desired L3 partitioning based on the current state of the
+ * pipeline.  For now this simply returns the conservative defaults calculated
+ * by get_default_l3_weights(), but we could probably do better by gathering
+ * more statistics from the pipeline state (e.g. guess of expected URB usage
+ * and bound surfaces), or by using feed-back from performance counters.
+ */
+static struct brw_l3_weights
+get_pipeline_state_l3_weights(const struct brw_context *brw)
+{
+   const struct brw_stage_state *stage_states[] = {
+  &brw->vs.base, &brw->gs.base, &brw->wm.base, &brw->cs.base
+   };
+   bool needs_dc = false, needs_slm = false;
+
+   for (unsigned i = 0; i < ARRAY_SIZE(stage_states); i++) {
+  const struct gl_shader_program *prog =
+ brw->ctx._Shader->CurrentProgram[stage_states[i]->stage];
+  const struct brw_stage_prog_data *prog_data = stage_states[i]->prog_data;
+
+  needs_dc |= (prog && prog->NumAtomicBuffers) ||
+ (prog_data && (prog_data->total_scratch || prog_data->nr_image_params));
+  needs_slm |= prog_data && prog_data->total_shared;
+   }
+
+   return get_default_l3_weights(brw->intelScreen->devinfo,
+ needs_dc, needs_slm);
+}
+
+/**
  * Program the hardware to use the specified L3 configuration.
  */
 static void
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
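
The weight selection in get_default_l3_weights() above can be tried outside the driver with a small standalone sketch.  The enum, struct and function names below are illustrative stand-ins, and the normalization step (scaling so the weights sum to 1) is an assumption, since norm_l3_weights() itself is not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of L3 partitions; names follow the patch, the enum
 * itself is a stand-in for the driver's definition. */
enum l3_partition { L3P_SLM, L3P_URB, L3P_ALL, L3P_DC, L3P_RO, NUM_L3P };

struct l3_weights {
   float w[NUM_L3P];
};

/* Assumed normalization: scale the weights so that they sum to 1. */
static struct l3_weights
norm_weights(struct l3_weights w)
{
   float sum = 0.0f;

   for (int i = 0; i < NUM_L3P; i++)
      sum += w.w[i];

   assert(sum > 0.0f);

   for (int i = 0; i < NUM_L3P; i++)
      w.w[i] /= sum;

   return w;
}

/* Same selection logic as the patch, with the device info passed in as
 * plain parameters instead of a brw_device_info pointer. */
static struct l3_weights
default_weights(int gen, bool is_baytrail, bool needs_dc, bool needs_slm)
{
   struct l3_weights w = {{ 0 }};

   w.w[L3P_SLM] = needs_slm ? 1.0f : 0.0f;
   w.w[L3P_URB] = 1.0f;

   if (gen >= 8) {
      w.w[L3P_ALL] = 1.0f;
   } else {
      w.w[L3P_DC] = needs_dc ? 0.1f : 0.0f;
      w.w[L3P_RO] = is_baytrail ? 0.5f : 1.0f;
   }

   return norm_weights(w);
}
```

Under that assumption, a Gen7 non-Baytrail part with neither SLM nor DC required ends up with the URB and RO partitions weighted equally and everything else at zero.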

[Mesa-dev] [PATCH 12/36] i965: Work around L3 state leaks during context switches.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

This is going to require some rather intrusive kernel changes to fix
properly.  In the meantime (and forever on at least pre-v4.1 kernels)
we'll have to restore the hardware defaults at the end of every batch
in which the L3 configuration was changed to avoid interfering with
the DDX and GL clients that use an older non-L3-aware version of Mesa.
---
 src/mesa/drivers/dri/i965/brw_state.h |  4 +++
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 48 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  7 
 src/mesa/drivers/dri/i965/intel_batchbuffer.h |  6 +++-
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index 49f301a..b7c0039 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -380,6 +380,10 @@ void gen7_update_binding_table_from_array(struct brw_context *brw,
 void gen7_disable_hw_binding_tables(struct brw_context *brw);
 void gen7_reset_hw_bt_pool_offsets(struct brw_context *brw);
 
+/* gen7_l3_state.c */
+void
+gen7_restore_default_l3_config(struct brw_context *brw);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 45bad02..84ab118 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -495,3 +495,51 @@ const struct brw_tracked_state gen7_l3_state = {
},
.emit = emit_l3_state
 };
+
+/**
+ * Hack to restore the default L3 configuration.
+ *
+ * This will be called at the end of every batch in order to reset the L3
+ * configuration to the default values for the time being until the kernel is
+ * fixed.  Until kernel commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b
+ * (included in v4.1) we would set the MI_RESTORE_INHIBIT bit when submitting
+ * batch buffers for the default context used by the DDX, which meant that any
+ * context state changed by the GL would leak into the DDX, the assumption
+ * being that the DDX would initialize any state it cares about manually.  The
+ * DDX is however not careful enough to program an L3 configuration
+ * explicitly, and it makes assumptions about it (URB size) which won't hold
+ * and cause it to misrender if we let our L3 set-up leak into the DDX.
+ *
+ * Since v4.1 of the Linux kernel the default context is saved and restored
+ * normally, so it's far less likely for our L3 programming to interfere with
+ * other contexts -- In fact restoring the default L3 configuration at the end
+ * of the batch will be redundant most of the time.  A kind of state leak is
+ * still possible though if the context making assumptions about L3 state is
+ * created immediately after our context was active (e.g. without the DDX
+ * default context being scheduled in between) because at present the DRM
+ * doesn't fully initialize the contents of newly created contexts and instead
+ * sets the MI_RESTORE_INHIBIT flag causing it to inherit the state from the
+ * last active context.
+ *
+ * It's possible to realize such a scenario if, say, an X server (or a GL
+ * application using an outdated non-L3-aware Mesa version) is started while
+ * another GL application is running and happens to have modified the L3
+ * configuration, or if no X server is running at all and a GL application
+ * using a non-L3-aware Mesa version is started after another GL application
+ * ran and modified the L3 configuration -- The latter situation can actually
+ * be reproduced easily on IVB in our CI system.
+ */
+void
+gen7_restore_default_l3_config(struct brw_context *brw)
+{
+   const struct brw_l3_weights w =
+  get_default_l3_weights(brw->intelScreen->devinfo, false, false);
+   const struct brw_l3_config *const cfg =
+  get_l3_config(brw->intelScreen->devinfo, w);
+
+   if (cfg != brw->l3.config && brw->can_do_pipelined_register_writes) {
+  setup_l3_config(brw, cfg);
+  update_urb_size(brw, cfg);
+  brw->l3.config = cfg;
+   }
+}
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 0363bd3..f778074 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -208,6 +208,13 @@ brw_finish_batch(struct brw_context *brw)
brw_emit_query_end(brw);
 
if (brw->batch.ring == RENDER_RING) {
+  /* Work around L3 state leaks into contexts created with
+   * MI_RESTORE_INHIBIT set, which assume that the L3 cache is configured
+   * according to the hardware defaults.
+   */
+  if (brw->gen >= 7)
+ gen7_restore_default_l3_config(brw);
+
   /* We may also need to snapshot and disable OA counters. */
   brw_perf_monitor_finish_batch(brw);
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index 2b177d3..f473690 100644
--- a/src/mesa/drivers/dri/i9

[Mesa-dev] [PATCH 15/36] glsl ubo/ssbo: Split buffer access to insert_buffer_access

2015-11-14 Thread Jordan Justen

This allows the code in emit_access to be generic enough to also be
used for lowering shared variables.

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/lower_ubo_reference.cpp | 78 ++--
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
index 41012db..b8fcc8e 100644
--- a/src/glsl/lower_ubo_reference.cpp
+++ b/src/glsl/lower_ubo_reference.cpp
@@ -170,6 +170,9 @@ public:
   ssbo_atomic_access,
} buffer_access_type;
 
+   void insert_buffer_access(ir_dereference *deref, const glsl_type *type,
+ ir_rvalue *offset, unsigned mask, int channel);
+
void emit_access(bool is_write, ir_dereference *deref,
 ir_variable *base_offset, unsigned int deref_offset,
 bool row_major, int matrix_columns,
@@ -689,6 +692,41 @@ lower_ubo_reference_visitor::ssbo_load(const struct glsl_type *type,
return new(mem_ctx) ir_call(sig, deref_result, &call_params);
 }
 
+void
+lower_ubo_reference_visitor::insert_buffer_access(ir_dereference *deref,
+  const glsl_type *type,
+  ir_rvalue *offset,
+  unsigned mask,
+  int channel)
+{
+   switch (buffer_access_type) {
+   case ubo_load_access:
+  base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
+ubo_load(type, offset),
+mask));
+  break;
+   case ssbo_load_access: {
+  ir_call *load_ssbo = ssbo_load(type, offset);
+  base_ir->insert_before(load_ssbo);
+  ir_rvalue *value = load_ssbo->return_deref->as_rvalue()->clone(mem_ctx, NULL);
+  ir_assignment *assignment =
+ assign(deref->clone(mem_ctx, NULL), value, mask);
+  base_ir->insert_before(assignment);
+  break;
+   }
+   case ssbo_store_access:
+  if (channel >= 0) {
+ base_ir->insert_after(ssbo_store(swizzle(deref, channel, 1),
+  offset, 1));
+  } else {
+ base_ir->insert_after(ssbo_store(deref, offset, mask));
+  }
+  break;
+   default:
+  unreachable("invalid buffer_access_type in insert_buffer_access");
+   }
+}
+
 static inline int
 writemask_for_size(unsigned n)
 {
@@ -802,19 +840,9 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
if (!row_major) {
   ir_rvalue *offset =
  add(base_offset, new(mem_ctx) ir_constant(deref_offset));
-  if (is_write)
- base_ir->insert_after(ssbo_store(deref, offset, write_mask));
-  else {
- if (buffer_access_type == ubo_load_access) {
- base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
-   ubo_load(deref->type, offset)));
- } else {
-ir_call *load_ssbo = ssbo_load(deref->type, offset);
-base_ir->insert_before(load_ssbo);
-ir_rvalue *value = load_ssbo->return_deref->as_rvalue()->clone(mem_ctx, NULL);
-base_ir->insert_before(assign(deref->clone(mem_ctx, NULL), value));
- }
-  }
+  unsigned mask =
+ is_write ? write_mask : (1 << deref->type->vector_elements) - 1;
+  insert_buffer_access(deref, deref->type, offset, mask, -1);
} else {
   unsigned N = deref->type->is_double() ? 8 : 4;
 
@@ -863,28 +891,8 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
  ir_rvalue *chan_offset =
 add(base_offset,
 new(mem_ctx) ir_constant(deref_offset + i * matrix_stride));
- if (is_write) {
-/* If the component is not in the writemask, then don't
- * store any value.
- */
-if (!((1 << i) & write_mask))
-   continue;
-
-base_ir->insert_after(ssbo_store(swizzle(deref, i, 1), chan_offset, 1));
- } else {
-if (buffer_access_type == ubo_load_access) {
-   base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
- ubo_load(deref_type, chan_offset),
- (1U << i)));
-} else {
-   ir_call *load_ssbo = ssbo_load(deref_type, chan_offset);
-   base_ir->insert_before(load_ssbo);
-   ir_rvalue *value = load_ssbo->return_deref->as_rvalue()->clone(mem_ctx, NULL);
-   base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
- value,
- (1U << i)));
-}
- }
+ if (!is_write || ((1U << i) & write_mask))
+insert_buffer_access(deref, deref_type, chan_offset, (1U << i), i);
   }
}
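
The mask handling that this refactoring centralizes can be sketched in isolation.  In the non-row-major path a load uses a mask covering every component of the vector, and in the row-major path a channel is accessed only for loads or when its bit is present in the write mask.  The helper names below are illustrative, not taken from the patch.

```c
#include <assert.h>

/* Mask with the low n bits set, one bit per vector component.  This is the
 * value used for loads in the non-row-major path. */
static unsigned
full_mask_for_components(unsigned n)
{
   assert(n >= 1 && n <= 4);
   return (1u << n) - 1;
}

/* Mirrors the condition guarding insert_buffer_access() in the row-major
 * loop: loads always touch the channel, stores only if the write mask
 * selects it. */
static int
channel_needs_access(int is_write, unsigned write_mask, unsigned channel)
{
   return !is_write || ((1u << channel) & write_mask) != 0;
}
```

For example, a store with write mask 0x5 touches channels 0 and 2 and skips channels 1 and 3, while a load touches all of them regardless of the mask.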

[Mesa-dev] [PATCH 26/36] i965/fs: Handle nir shared variable store intrinsic function

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---

Notes:
I have ported this commit to shared variable stores:

commit 0cb7d7b4b7c32246d4c4225a1d17d7ff79a7526d
Author: Kristian Høgsberg Kristensen 
Date:   Wed Oct 21 23:43:34 2015 -0700

i965/fs: Optimize ssbo stores

It is not included because it regresses
ES31-CTS.compute_shader.shared-max.

 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 54 
 1 file changed, 54 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index e9336fd..c8c6370 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -2330,6 +2330,60 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_store_shared_indirect:
+  has_indirect = true;
+  /* fallthrough */
+   case nir_intrinsic_store_shared: {
+  assert(devinfo->gen >= 7);
+
+  /* Block index */
+  fs_reg surf_index;
+  unsigned index = BRW_SLM_SURFACE_INDEX;
+  surf_index = fs_reg(index);
+
+  /* Offset */
+  fs_reg offset_reg = vgrf(glsl_type::uint_type);
+  unsigned const_offset_bytes = 0;
+  if (has_indirect) {
+ bld.MOV(offset_reg, get_nir_src(instr->src[1]));
+  } else {
+ const_offset_bytes = instr->const_index[0];
+ bld.MOV(offset_reg, fs_reg(const_offset_bytes));
+  }
+
+  /* Value */
+  fs_reg val_reg = get_nir_src(instr->src[0]);
+
+  /* Writemask */
+  unsigned writemask = instr->const_index[1];
+
+  /* Write each component present in the writemask */
+  unsigned skipped_channels = 0;
+  for (int i = 0; i < instr->num_components; i++) {
+ int component_mask = 1 << i;
+ if (writemask & component_mask) {
+if (skipped_channels) {
+   if (!has_indirect) {
+  const_offset_bytes += 4 * skipped_channels;
+  bld.MOV(offset_reg, fs_reg(const_offset_bytes));
+   } else {
+  bld.ADD(offset_reg, offset_reg,
+   brw_imm_ud(4 * skipped_channels));
+   }
+   skipped_channels = 0;
+}
+
+emit_untyped_write(bld, surf_index, offset_reg,
+   offset(val_reg, bld, i),
+   1 /* dims */, 1 /* size */,
+   BRW_PREDICATE_NONE);
+ }
+
+ skipped_channels++;
+  }
+  break;
+   }
+
case nir_intrinsic_load_input_indirect:
   has_indirect = true;
   /* fallthrough */
-- 
2.6.2
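
The offset bookkeeping in the store loop above is easy to get wrong, so here is a standalone sketch of just that part.  It records the byte offset at which each enabled component would be written, using the same skipped_channels scheme as the patch.  The function name and the output array are illustrative only; no hardware messages are emitted.

```c
#include <assert.h>

/* Fill offsets[] with the byte offset used for each component selected by
 * writemask, and return how many components were written.  Components are
 * assumed to be 4 bytes wide, as in the patch. */
static int
shared_store_offsets(unsigned base_offset, unsigned writemask,
                     int num_components, unsigned *offsets)
{
   unsigned offset = base_offset;
   unsigned skipped_channels = 0;
   int written = 0;

   assert(num_components >= 0 && num_components <= 4);

   for (int i = 0; i < num_components; i++) {
      if (writemask & (1u << i)) {
         /* Advance past every channel seen since the last write,
          * including the previously written one. */
         offset += 4 * skipped_channels;
         skipped_channels = 0;
         offsets[written++] = offset;
      }

      skipped_channels++;
   }

   return written;
}
```

With a base offset of 16 and a write mask of 0xA, components 1 and 3 land at byte offsets 20 and 28, i.e. each written component i ends up at base + 4 * i.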

[Mesa-dev] [PATCH 31/36] glsl: Check for SSBO variable in SSBO atomic lowering

2015-11-14 Thread Jordan Justen

When an atomic function is called, we need to check to see if it is
for an SSBO variable before lowering it to the SSBO specific intrinsic
function.

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/lower_ubo_reference.cpp | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
index a64e9d7..d083936 100644
--- a/src/glsl/lower_ubo_reference.cpp
+++ b/src/glsl/lower_ubo_reference.cpp
@@ -855,6 +855,20 @@ lower_ubo_reference_visitor::lower_ssbo_atomic_intrinsic(ir_call *ir)
 ir_call *
 lower_ubo_reference_visitor::check_for_ssbo_atomic_intrinsic(ir_call *ir)
 {
+   exec_list& params = ir->actual_parameters;
+
+   if (params.length() < 2)
+  return ir;
+
+   ir_rvalue *rvalue =
+  ((ir_instruction *) params.get_head())->as_rvalue();
+   if (!rvalue)
+  return ir;
+
+   ir_variable *var = rvalue->variable_referenced();
+   if (!var || !var->is_in_buffer_block())
+  return ir;
+
const char *callee = ir->callee_name();
if (!strcmp("__intrinsic_atomic_add", callee) ||
!strcmp("__intrinsic_atomic_min", callee) ||
-- 
2.6.2

[Mesa-dev] [PATCH 04/36] i965/gen8: Don't add workaround bits to PIPE_CONTROL stalls if DC flush is set.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

According to the hardware docs a DC flush is sufficient to make
CS_STALL happy, so there's no need to add STALL_AT_SCOREBOARD whenever
it's present.
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index a2aef8a..ae3d818 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -36,6 +36,7 @@
  *  - Stall at Pixel Scoreboard
  *  - Post-Sync Operation
  *  - Depth Stall
+ *  - DC Flush Enable
  *
  * I chose "Stall at Pixel Scoreboard" since we've used it effectively
  * in the past, but the choice is fairly arbitrary.
@@ -49,7 +50,8 @@ gen8_add_cs_stall_workaround_bits(uint32_t *flags)
   PIPE_CONTROL_WRITE_DEPTH_COUNT |
   PIPE_CONTROL_WRITE_TIMESTAMP |
   PIPE_CONTROL_STALL_AT_SCOREBOARD |
-  PIPE_CONTROL_DEPTH_STALL;
+  PIPE_CONTROL_DEPTH_STALL |
+  PIPE_CONTROL_DATA_CACHE_INVALIDATE;
 
/* If we're doing a CS stall, and don't already have one of the
 * workaround bits set, add "Stall at Pixel Scoreboard."
-- 
2.6.2
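
The effect of adding the DC flush bit to the workaround set can be illustrated with a reduced sketch of the check described in the comment above: a CS stall only gets "Stall at Pixel Scoreboard" added when none of the workaround bits is already present.  The flag names and bit positions below are made up for the example and are not the real PIPE_CONTROL encodings, and only three of the workaround bits are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values only; the real PIPE_CONTROL_* bits differ. */
#define FLAG_CS_STALL            (1u << 0)
#define FLAG_STALL_AT_SCOREBOARD (1u << 1)
#define FLAG_DEPTH_STALL         (1u << 2)
#define FLAG_DC_FLUSH            (1u << 3)

static void
add_cs_stall_workaround_bits(uint32_t *flags)
{
   /* Any one of these accompanying a CS stall satisfies the workaround. */
   const uint32_t wa_bits = FLAG_STALL_AT_SCOREBOARD |
                            FLAG_DEPTH_STALL |
                            FLAG_DC_FLUSH;

   assert(flags != NULL);

   /* Only add the extra stall if a CS stall is requested and no
    * workaround bit is set yet. */
   if ((*flags & FLAG_CS_STALL) != 0 && (*flags & wa_bits) == 0)
      *flags |= FLAG_STALL_AT_SCOREBOARD;
}
```

A bare CS stall therefore still picks up the scoreboard stall, while a CS stall combined with a DC flush is left untouched.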

[Mesa-dev] [PATCH 16/36] glsl ubo/ssbo: Add lower_buffer_access class

2015-11-14 Thread Jordan Justen

This class has code that will be shared by lower_ubo_reference and
lower_shared_reference. (lower_shared_reference will be used to
support compute shader shared variables.)

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/Makefile.sources|   1 +
 src/glsl/lower_buffer_access.cpp | 307 +++
 src/glsl/lower_buffer_access.h   |  56 +++
 src/glsl/lower_ubo_reference.cpp | 180 +--
 4 files changed, 367 insertions(+), 177 deletions(-)
 create mode 100644 src/glsl/lower_buffer_access.cpp
 create mode 100644 src/glsl/lower_buffer_access.h

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index d4b02c1..f2c95c0 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -155,6 +155,7 @@ LIBGLSL_FILES = \
loop_analysis.h \
loop_controls.cpp \
loop_unroll.cpp \
+   lower_buffer_access.cpp \
lower_clip_distance.cpp \
lower_const_arrays_to_uniforms.cpp \
lower_discard.cpp \
diff --git a/src/glsl/lower_buffer_access.cpp b/src/glsl/lower_buffer_access.cpp
new file mode 100644
index 000..e0b5a2f
--- /dev/null
+++ b/src/glsl/lower_buffer_access.cpp
@@ -0,0 +1,307 @@
+/*
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_buffer_access.cpp
+ *
+ * Helper for IR lowering pass to replace dereferences of buffer object based
+ * shader variables with intrinsic function calls.
+ *
+ * This helper is used by lowering passes for UBOs, SSBOs and compute shader
+ * shared variables.
+ */
+
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_rvalue_visitor.h"
+#include "main/macros.h"
+#include "util/list.h"
+#include "glsl_parser_extras.h"
+#include "lower_buffer_access.h"
+
+using namespace ir_builder;
+
+namespace lower_buffer_access {
+
+static inline int
+writemask_for_size(unsigned n)
+{
+   return ((1 << n) - 1);
+}
+
+/**
+ * Takes LHS and emits a series of assignments into its components
+ * from the shared variable storage.
+ *
+ * Recursively calls itself to break the deref down to the point that
+ * the intrinsic calls are generated.
+ */
+void
+lower_buffer_access::emit_access(bool is_write,
+ ir_dereference *deref,
+ ir_variable *base_offset,
+ unsigned int deref_offset,
+ bool row_major,
+ int matrix_columns,
+ unsigned int packing,
+ unsigned int write_mask)
+{
+   if (deref->type->is_record()) {
+  unsigned int field_offset = 0;
+
+  for (unsigned i = 0; i < deref->type->length; i++) {
+ const struct glsl_struct_field *field =
+&deref->type->fields.structure[i];
+ ir_dereference *field_deref =
+new(mem_ctx) ir_dereference_record(deref->clone(mem_ctx, NULL),
+   field->name);
+
+ field_offset =
+glsl_align(field_offset,
+   field->type->std140_base_alignment(row_major));
+
+ emit_access(is_write, field_deref, base_offset,
+ deref_offset + field_offset,
+ row_major, 1, packing,
+ writemask_for_size(field_deref->type->vector_elements));
+
+ field_offset += field->type->std140_size(row_major);
+  }
+  return;
+   }
+
+   if (deref->type->is_array()) {
+  unsigned array_stride = packing == GLSL_INTERFACE_PACKING_STD430 ?
+ deref->type->fields.array->std430_array_stride(row_major) :
+ glsl_align(deref->type->fields.array->std140_size(row_major), 16);
+
+  for (unsigned i = 0; i < deref->type->length; i++) {
+ ir_constant *element
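
The record branch of emit_access() above walks the fields of a struct, aligning the running offset to each field's std140 base alignment before adding its size.  A reduced sketch of that accumulation follows.  The field sizes and alignments are supplied by the caller here rather than derived from a glsl_type, and the helper names are illustrative.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment, in the spirit of
 * glsl_align(). */
static unsigned
align_to(unsigned value, unsigned alignment)
{
   assert(alignment > 0);
   return (value + alignment - 1) / alignment * alignment;
}

/* Size and base alignment of one field, in bytes. */
struct field_layout {
   unsigned size;
   unsigned base_alignment;
};

/* Compute the byte offset of each field relative to the start of the
 * record, using the align-then-advance scheme from emit_access(). */
static void
record_field_offsets(const struct field_layout *fields, int count,
                     unsigned *offsets)
{
   unsigned field_offset = 0;

   for (int i = 0; i < count; i++) {
      field_offset = align_to(field_offset, fields[i].base_alignment);
      offsets[i] = field_offset;
      field_offset += fields[i].size;
   }
}
```

For a record laid out as float, vec3, float under std140 (sizes 4, 12, 4 and base alignments 4, 16, 4) this gives offsets 0, 16 and 28.  The same align_to() helper covers the std140 array stride case, where the element size is rounded up to 16.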

[Mesa-dev] [PATCH 07/36] i965/hsw: Enable L3 atomics.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

Improves performance of the arb_shader_image_load_store-atomicity
piglit test by over 25x (which isn't a real benchmark, it's just heavy
on atomics -- the improvement in a microbenchmark I wrote a while ago
seemed to be even greater).  The drawback is one needs to be
extra-careful not to hang the GPU (in fact the whole system).  A DC
partition must have been allocated on L3, the "convert L3 cycle for DC
to UC" bit may not be set, the MOCS L3 cacheability bit must be set
for all surfaces accessed using DC atomics, and the SCRATCH1 and
ROW_CHICKEN3 bits must be kept in sync.

A fairly recent kernel is required for the command parser to allow
writes to these registers.
---
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 48bca29..c863b7f 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -254,5 +254,19 @@ setup_l3_config(struct brw_context *brw, const struct brw_l3_config *cfg)
 SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC));
 
   ADVANCE_BATCH();
+
+  if (brw->is_haswell && brw->intelScreen->cmd_parser_version >= 4) {
+ /* Enable L3 atomics on HSW if we have a DC partition, otherwise keep
+  * them disabled to avoid crashing the system hard.
+  */
+ BEGIN_BATCH(5);
+ OUT_BATCH(MI_LOAD_REGISTER_IMM | (5 - 2));
+ OUT_BATCH(HSW_SCRATCH1);
+ OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE);
+ OUT_BATCH(HSW_ROW_CHICKEN3);
+ OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 |
+   (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE));
+ ADVANCE_BATCH();
+  }
}
 }
-- 
2.6.2
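
The HSW_ROW_CHICKEN3 write above shifts the disable bit into the upper 16 bits and ORs in the desired value in the lower half.  That pattern suggests a masked register, where the upper half selects which of the lower 16 bits the write may change.  The sketch below models that interpretation; it is an assumption drawn from the code, not from the hardware documentation, and the bit position is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define L3_ATOMIC_DISABLE_BIT (1u << 6)

/* Build the dword for a masked write: mask in the upper half, new values
 * for the masked bits in the lower half. */
static uint32_t
masked_write_value(uint32_t mask_bits, uint32_t value_bits)
{
   assert((mask_bits & ~0xffffu) == 0);
   return (mask_bits << 16) | (value_bits & mask_bits);
}

/* Model of how such a register would update: only bits selected by the
 * mask change, all other bits keep their previous value. */
static uint32_t
apply_masked_write(uint32_t reg, uint32_t write)
{
   const uint32_t mask = write >> 16;
   return (reg & ~mask & 0xffffu) | (write & mask);
}
```

Under this model, enabling L3 atomics (has_dc true) clears only the disable bit and leaves the other bits of the register as they were, which would explain why no read-modify-write is needed in the batch.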

[Mesa-dev] [PATCH 18/36] glsl ubo/ssbo: Move common code into lower_buffer_access::setup_buffer_access

2015-11-14 Thread Jordan Justen

This code will also be usable by the pass to lower shared variables.

Note that *const_offset is adjusted by setup_buffer_access so it must
be initialized before calling setup_buffer_access.

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/lower_buffer_access.cpp | 167 +++
 src/glsl/lower_buffer_access.h   |   5 ++
 src/glsl/lower_ubo_reference.cpp | 160 +
 3 files changed, 175 insertions(+), 157 deletions(-)

diff --git a/src/glsl/lower_buffer_access.cpp b/src/glsl/lower_buffer_access.cpp
index b7fc107..87f64a9 100644
--- a/src/glsl/lower_buffer_access.cpp
+++ b/src/glsl/lower_buffer_access.cpp
@@ -394,4 +394,171 @@ lower_buffer_access::is_dereferenced_thing_row_major(const ir_rvalue *deref)
return false;
 }
 
+void
+lower_buffer_access::setup_buffer_access(ir_variable *var,
+ ir_rvalue *deref,
+ ir_rvalue **offset,
+ unsigned *const_offset,
+ bool *row_major,
+ int *matrix_columns,
+ unsigned packing)
+{
+   *offset = new(mem_ctx) ir_constant(0u);
+   *row_major = is_dereferenced_thing_row_major(deref);
+   *matrix_columns = 1;
+
+   /* Calculate the offset to the start of the region of the UBO
+* dereferenced by *rvalue.  This may be a variable offset if an
+* array dereference has a variable index.
+*/
+   while (deref) {
+  switch (deref->ir_type) {
+  case ir_type_dereference_variable: {
+ deref = NULL;
+ break;
+  }
+
+  case ir_type_dereference_array: {
+ ir_dereference_array *deref_array = (ir_dereference_array *) deref;
+ unsigned array_stride;
+ if (deref_array->array->type->is_vector()) {
+/* We get this when storing or loading a component out of a vector
+ * with a non-constant index. This happens for v[i] = f where v is
+ * a vector (or m[i][j] = f where m is a matrix). If we don't
+ * lower that here, it gets turned into v = vector_insert(v, i,
+ * f), which loads the entire vector, modifies one component and
+ * then writes the entire thing back.  That breaks if another
+ * thread or SIMD channel is modifying the same vector.
+ */
+array_stride = 4;
+if (deref_array->array->type->is_double())
+   array_stride *= 2;
+ } else if (deref_array->array->type->is_matrix() && *row_major) {
+/* When loading a vector out of a row major matrix, the
+ * step between the columns (vectors) is the size of a
+ * float, while the step between the rows (elements of a
+ * vector) is handled below in emit_ubo_loads.
+ */
+array_stride = 4;
+if (deref_array->array->type->is_double())
+   array_stride *= 2;
+*matrix_columns = deref_array->array->type->matrix_columns;
+ } else if (deref_array->type->without_array()->is_interface()) {
+/* We're processing an array dereference of an interface instance
+ * array. The thing being dereferenced *must* be a variable
+ * dereference because interfaces cannot be embedded in other
+ * types. In terms of calculating the offsets for the lowering
+ * pass, we don't care about the array index. All elements of an
+ * interface instance array will have the same offsets relative to
+ * the base of the block that backs them.
+ */
+deref = deref_array->array->as_dereference();
+break;
+ } else {
+/* Whether or not the field is row-major (because it might be a
+ * bvec2 or something) does not affect the array itself. We need
+ * to know whether an array element in its entirety is row-major.
+ */
+const bool array_row_major =
+   is_dereferenced_thing_row_major(deref_array);
+
+/* The array type will give the correct interface packing
+ * information
+ */
+if (packing == GLSL_INTERFACE_PACKING_STD430) {
+   array_stride = deref_array->type->std430_array_stride(array_row_major);
+} else {
+   array_stride = deref_array->type->std140_size(array_row_major);
+   array_stride = glsl_align(array_stride, 16);
+}
+ }
+
+ ir_rvalue *array_index = deref_array->array_index;
+ if (array_index->type->base_type == GLSL_TYPE_INT)
+array_index = i2u(array_index);
+
+ ir_constant *const_index =
+array_index->constant_expression_value(NULL);
+ if (const_ind

[Mesa-dev] [PATCH 35/36] nir: Add nir intrinsics for shared variable atomic operations

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/glsl/nir/glsl_to_nir.cpp  | 53 +++
 src/glsl/nir/nir_intrinsics.h | 25 
 2 files changed, 78 insertions(+)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index 231870d..d787304 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -736,6 +736,22 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_load_shared;
   } else if (strcmp(ir->callee_name(), "__intrinsic_store_shared") == 0) {
  op = nir_intrinsic_store_shared;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_add_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_add;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_and_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_and;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_or_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_or;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_xor_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_xor;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_min_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_min;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_max_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_max;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_exchange_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_exchange;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_comp_swap_shared") == 0) {
+ op = nir_intrinsic_shared_atomic_comp_swap;
   } else {
  unreachable("not reached");
   }
@@ -1041,6 +1057,43 @@ nir_visitor::visit(ir_call *ir)
  nir_builder_instr_insert(&b, &instr->instr);
  break;
   }
+  case nir_intrinsic_shared_atomic_add:
+  case nir_intrinsic_shared_atomic_min:
+  case nir_intrinsic_shared_atomic_max:
+  case nir_intrinsic_shared_atomic_and:
+  case nir_intrinsic_shared_atomic_or:
+  case nir_intrinsic_shared_atomic_xor:
+  case nir_intrinsic_shared_atomic_exchange:
+  case nir_intrinsic_shared_atomic_comp_swap: {
+ int param_count = ir->actual_parameters.length();
+ assert(param_count == 2 || param_count == 3);
+
+ /* Offset */
+ exec_node *param = ir->actual_parameters.get_head();
+ ir_instruction *inst = (ir_instruction *) param;
+ instr->src[0] = nir_src_for_ssa(evaluate_rvalue(inst->as_rvalue()));
+
+ /* data1 parameter (this is always present) */
+ param = param->get_next();
+ inst = (ir_instruction *) param;
+ instr->src[1] = nir_src_for_ssa(evaluate_rvalue(inst->as_rvalue()));
+
+ /* data2 parameter (only with atomic_comp_swap) */
+ if (param_count == 3) {
+assert(op == nir_intrinsic_shared_atomic_comp_swap);
+param = param->get_next();
+inst = (ir_instruction *) param;
+instr->src[2] =
+   nir_src_for_ssa(evaluate_rvalue(inst->as_rvalue()));
+ }
+
+ /* Atomic result */
+ assert(ir->return_deref);
+ nir_ssa_dest_init(&instr->instr, &instr->dest,
+   ir->return_deref->type->vector_elements, NULL);
+ nir_builder_instr_insert(&b, &instr->instr);
+ break;
+  }
   default:
  unreachable("not reached");
   }
diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index 6912258..31b83fe 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -203,6 +203,31 @@ INTRINSIC(ssbo_atomic_xor, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
 INTRINSIC(ssbo_atomic_exchange, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
 INTRINSIC(ssbo_atomic_comp_swap, 4, ARR(1, 1, 1, 1), true, 1, 0, 0, 0)
 
+/*
+ * CS shared variable atomic intrinsics
+ *
+ * All of the shared variable atomic memory operations read a value from
+ * memory, compute a new value using one of the operations below, write the
+ * new value to memory, and return the original value read.
+ *
+ * All operations take 2 sources except CompSwap that takes 3. These
+ * sources represent:
+ *
+ * 0: The offset into the shared variable storage region that the atomic
+ *operation will operate on.
+ * 1: The data parameter to the atomic function (i.e. the value to add
+ *in shared_atomic_add, etc).
+ * 2: For CompSwap only: the second data parameter.
+ */
+INTRINSIC(shared_atomic_add, 2, ARR(1, 1), true, 1, 0, 0, 0)
+INTRINSIC(shared_atomic_min, 2, ARR(1, 1), true, 1, 0, 0, 0)
+INTRINSIC(shared_atomic_max, 2, ARR(1, 1), true, 1, 0, 0, 0)
+INTRINSIC(shared_atomic_and, 2, ARR(1, 1), true, 1, 0, 0, 0)
+INTRINSIC(shared_atomic_or, 2, ARR(1, 1), true, 1, 0, 0, 0)
+INTRINSIC(shared_atomic_xor, 2, ARR(1, 1), true, 1, 0, 0, 0)
+INTRINSIC(shared_atomic_exchange, 2, ARR(1, 1), true, 1, 0, 0, 0

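The comment block added to nir_intrinsics.h above describes each shared
variable atomic as "read, compute, write back, return the original value".
As a rough illustration of those semantics only, here is a minimal sketch
using C11 atomics on an ordinary variable. The sketch_* helper names are
made up for this example; they are not part of the patch or of NIR.

```c
#include <stdatomic.h>

/* Hypothetical helper: adds `data` to *mem and returns the value that
 * was stored there before the addition. */
static unsigned
sketch_atomic_add(atomic_uint *mem, unsigned data)
{
   return atomic_fetch_add(mem, data);
}

/* Hypothetical helper: stores `data` and returns the previous value. */
static unsigned
sketch_atomic_exchange(atomic_uint *mem, unsigned data)
{
   return atomic_exchange(mem, data);
}

/* Hypothetical helper for the three-source CompSwap form: stores `data`
 * only if *mem equals `compare`, and returns the original value either
 * way.  On failure, C11 writes the current value into `expected`; on
 * success `expected` still equals the original value. */
static unsigned
sketch_atomic_comp_swap(atomic_uint *mem, unsigned compare, unsigned data)
{
   unsigned expected = compare;
   atomic_compare_exchange_strong(mem, &expected, data);
   return expected;
}
```

Note that the offset source (source 0 in the comment) has no equivalent
here; the pointer plays that role in this sketch.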
[Mesa-dev] [PATCH 25/36] i965/fs: Handle nir shared variable load intrinsic

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 33 
 2 files changed, 35 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 6484484..e178ce5 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2851,4 +2851,6 @@ enum brw_wm_barycentric_interp_mode {
 # define GPGPU_WALKER_THREAD_WIDTH_MAX_SHIFT0
 # define GPGPU_WALKER_THREAD_WIDTH_MAX_MASK INTEL_MASK(5, 0)
 
+#define BRW_SLM_SURFACE_INDEX 0xfe
+
 #endif
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index c282f83..e9336fd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -33,6 +33,9 @@
 #include "brw_fs_surface_builder.h"
 #include "brw_vec4_gs_visitor.h"
 
+#include "brw_fs_builder.h"
+#include "brw_fs_surface_builder.h"
+
 using namespace brw;
 using namespace brw::surface_access;
 
@@ -2297,6 +2300,36 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_load_shared_indirect:
+  has_indirect = true;
+  /* fallthrough */
+   case nir_intrinsic_load_shared: {
+  assert(devinfo->gen >= 7);
+
+  fs_reg surf_index;
+  unsigned index = BRW_SLM_SURFACE_INDEX;
+  surf_index = fs_reg(index);
+
+  /* Get the offset to read from */
+  fs_reg offset_reg;
+  if (has_indirect) {
+ offset_reg = get_nir_src(instr->src[0]);
+  } else {
+ offset_reg = fs_reg(instr->const_index[0]);
+  }
+
+  /* Read the vector */
+  fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
+ 1 /* dims */,
+ instr->num_components,
+ BRW_PREDICATE_NONE);
+  read_result.type = dest.type;
+  for (int i = 0; i < instr->num_components; i++)
+ bld.MOV(offset(dest, bld, i), offset(read_result, bld, i));
+
+  break;
+   }
+
case nir_intrinsic_load_input_indirect:
   has_indirect = true;
   /* fallthrough */
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 30/36] glsl: Replace atomic_ssbo and ssbo_atomic with atomic

2015-11-14 Thread Jordan Justen

The atomic functions can also be used with shared variables in compute
shaders.

When lowering the intrinsic in lower_ubo_reference, we still create an
SSBO specific intrinsic since SSBO accesses can be indirectly
addressed, whereas all compute shader shared variables live in a single
shared variable area.

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/builtin_functions.cpp   | 230 +++
 src/glsl/lower_ubo_reference.cpp |  18 +--
 src/glsl/nir/glsl_to_nir.cpp |  16 +--
 3 files changed, 132 insertions(+), 132 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 1349444..3e767e8 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -759,16 +759,16 @@ private:
ir_function_signature *_atomic_counter_op(const char *intrinsic,
  builtin_available_predicate avail);
 
-   ir_function_signature *_atomic_ssbo_intrinsic2(builtin_available_predicate avail,
-  const glsl_type *type);
-   ir_function_signature *_atomic_ssbo_op2(const char *intrinsic,
-   builtin_available_predicate avail,
-   const glsl_type *type);
-   ir_function_signature *_atomic_ssbo_intrinsic3(builtin_available_predicate avail,
-  const glsl_type *type);
-   ir_function_signature *_atomic_ssbo_op3(const char *intrinsic,
-   builtin_available_predicate avail,
-   const glsl_type *type);
+   ir_function_signature *_atomic_intrinsic2(builtin_available_predicate avail,
+ const glsl_type *type);
+   ir_function_signature *_atomic_op2(const char *intrinsic,
+  builtin_available_predicate avail,
+  const glsl_type *type);
+   ir_function_signature *_atomic_intrinsic3(builtin_available_predicate avail,
+ const glsl_type *type);
+   ir_function_signature *_atomic_op3(const char *intrinsic,
+  builtin_available_predicate avail,
+  const glsl_type *type);
 
B1(min3)
B1(max3)
@@ -915,53 +915,53 @@ builtin_builder::create_intrinsics()
 _atomic_counter_intrinsic(shader_atomic_counters),
 NULL);
 
-   add_function("__intrinsic_ssbo_atomic_add",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-NULL);
-   add_function("__intrinsic_ssbo_atomic_min",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-NULL);
-   add_function("__intrinsic_ssbo_atomic_max",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-NULL);
-   add_function("__intrinsic_ssbo_atomic_and",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-NULL);
-   add_function("__intrinsic_ssbo_atomic_or",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-NULL);
-   add_function("__intrinsic_ssbo_atomic_xor",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-NULL);
-   add_function("__intrinsic_ssbo_atomic_exchange",
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::uint_type),
-_atomic_ssbo_intrinsic2(shader_storage_buffer_object,
-glsl_type::int_type),
-

[Mesa-dev] [PATCH 27/36] i965: Enable shared local memory for CS shared variables

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_cs.c|  2 ++
 src/mesa/drivers/dri/i965/brw_defines.h   |  2 ++
 src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 
 3 files changed, 16 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
index 263d224..704b00d 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.c
+++ b/src/mesa/drivers/dri/i965/brw_cs.c
@@ -69,6 +69,8 @@ brw_codegen_cs_prog(struct brw_context *brw,
 
memset(&prog_data, 0, sizeof(prog_data));
 
+   prog_data.base.total_shared = prog->Comp.SharedSize;
+
assign_cs_binding_table_offsets(brw->intelScreen->devinfo, prog,
&cp->program.Base, &prog_data);
 
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index e178ce5..9a83a74 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2830,6 +2830,8 @@ enum brw_wm_barycentric_interp_mode {
 /* GEN7 DW5, GEN8+ DW6 */
 # define MEDIA_BARRIER_ENABLE_SHIFT 21
 # define MEDIA_BARRIER_ENABLE_MASK  INTEL_MASK(21, 21)
+# define MEDIA_SHARED_LOCAL_MEMORY_SIZE_SHIFT   16
+# define MEDIA_SHARED_LOCAL_MEMORY_SIZE_MASKINTEL_MASK(20, 16)
 # define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0
 # define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
 # define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
index 2d7c04f..344ea5a 100644
--- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
@@ -164,8 +164,20 @@ brw_upload_cs_state(struct brw_context *brw)
   SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
   SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT);
assert(threads <= brw->max_cs_threads);
+
+   assert(prog_data->total_shared <= 64 * 1024);
+   uint32_t slm_size = 0;
+   if (prog_data->total_shared > 0) {
+  /* slm_size is in 4k increments, but must be a power of 2. */
+  slm_size = 4 * 1024;
+  while (slm_size < 64 * 1024 && slm_size < prog_data->total_shared)
+ slm_size <<= 1;
+  slm_size /= 4 * 1024;
+   }
+
desc[dw++] =
   SET_FIELD(cs_prog_data->uses_barrier, MEDIA_BARRIER_ENABLE) |
+  SET_FIELD(slm_size, MEDIA_SHARED_LOCAL_MEMORY_SIZE) |
   media_threads;
 
BEGIN_BATCH(4);
-- 
2.6.2


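The slm_size computation in PATCH 27/36 above rounds the requested shared
size up to a power of two of at least 4k (capped at 64k) and then expresses
it in 4k units for the MEDIA_SHARED_LOCAL_MEMORY_SIZE field. A standalone
sketch of that arithmetic follows; the function name encode_slm_size is
made up for this example, and the loop is copied from the hunk rather than
checked against the hardware docs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the value that would go into the SLM size
 * field for a shader using `total_shared` bytes of shared variables. */
static uint32_t
encode_slm_size(uint32_t total_shared)
{
   assert(total_shared <= 64 * 1024);

   if (total_shared == 0)
      return 0;

   /* Start at 4k and double until the request fits or 64k is reached. */
   uint32_t slm_size = 4 * 1024;
   while (slm_size < 64 * 1024 && slm_size < total_shared)
      slm_size <<= 1;

   /* The field is expressed in 4k increments. */
   return slm_size / (4 * 1024);
}
```

For instance a shader that declares 9000 bytes of shared variables ends up
with a 16k allocation, i.e. a field value of 4, and the maximum value of 16
still fits in the 5-bit field defined by INTEL_MASK(20, 16).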
[Mesa-dev] [PATCH 05/36] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

It should be possible to use additional L3 configurations other than
the ones listed in the tables of validated allocations ("BSpec »
3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
» L3 Allocation and Programming"), but it seems sensible for now to
hard-code the tables in order to stick to the hardware docs.  Instead
of setting up the arbitrary L3 partitioning given as input, the
closest validated L3 configuration will be looked up in these tables
and used to program the hardware.

The included tables should work for Gen7-9.  Note that the quantities
are specified in ways rather than in KB.  This is because the L3
control registers expect the value in ways, and because by doing that
we can re-use a single table for all GT variants of the same
generation (and in the case of IVB/HSW and CHV/SKL across different
generations) which generally have different L3 way sizes but allow the
same combinations of way allocations.
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 +
 2 files changed, 164 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index 5a88d66..91901ad 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -184,6 +184,7 @@ i965_FILES = \
gen7_cs_state.c \
gen7_disable.c \
gen7_gs_state.c \
+   gen7_l3_state.c \
gen7_misc_state.c \
gen7_sf_state.c \
gen7_sol_state.c \
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
new file mode 100644
index 0000000..8f9ba5b
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "brw_state.h"
+#include "intel_batchbuffer.h"
+
+/**
+ * Chunk of L3 cache reserved for some specific purpose.
+ */
+enum brw_l3_partition {
+   /** Shared local memory. */
+   L3P_SLM = 0,
+   /** Unified return buffer. */
+   L3P_URB,
+   /** Union of DC and RO. */
+   L3P_ALL,
+   /** Data cluster RW partition. */
+   L3P_DC,
+   /** Union of IS, C and T. */
+   L3P_RO,
+   /** Instruction and state cache. */
+   L3P_IS,
+   /** Constant cache. */
+   L3P_C,
+   /** Texture cache. */
+   L3P_T,
+   /** Number of supported L3 partitions. */
+   NUM_L3P
+};
+
+/**
+ * L3 configuration represented as the number of ways allocated for each
+ * partition.  \sa get_l3_way_size().
+ */
+struct brw_l3_config {
+   unsigned n[NUM_L3P];
+};
+
+/**
+ * IVB/HSW validated L3 configurations.
+ */
+static const struct brw_l3_config ivb_l3_configs[] = {
+   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
+   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
+   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
+   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
+   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
+   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
+   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
+   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
+   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
+   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
+   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
+   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
+   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
+   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
+   {{ 0 }}
+};
+
+/**
+ * VLV validated L3 configurations.
+ */
+static const struct brw_l3_config vlv_l3_configs[] = {
+   {{  0, 80,  0,  0, 16,  0,  0,  0 }},
+   {{  0, 80,  0,  8,  8,  0,  0,  0 }},
+   {{  0, 64,  0, 16, 16,  0,  0,  0 }},
+   {{  0, 64,  0,  0, 32,  0,  0,  0 }},
+   {{  0, 60,  0,  4, 32,  0,  0,  0 }},
+   {{ 32, 32,  0, 16, 16,  0,  0,  0 }},
+   {{ 32, 40,  0,  8, 16,  0,  0,  0 }},
+   {{ 32, 40,

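The commit message of PATCH 05/36 above says that the closest validated
configuration will be looked up in these tables, and that quantities are
kept in ways rather than KB. The lookup itself is not part of this patch,
so the following is only a sketch under stated assumptions: the distance
metric (sum of absolute differences in ways), the helper names, and the
2 KB way size used in the usage note are all invented for illustration.
Only the IVB/HSW table contents are taken from the patch, with the
terminating zero entry replaced by an explicit count.

```c
#include <limits.h>
#include <stddef.h>

enum { NUM_L3P = 8 };   /* SLM, URB, ALL, DC, RO, IS, C, T */

struct l3_config {
   unsigned n[NUM_L3P];
};

/* IVB/HSW validated configurations, copied from the patch. */
static const struct l3_config ivb_configs[] = {
   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
};

#define NUM_IVB_CONFIGS (sizeof(ivb_configs) / sizeof(ivb_configs[0]))

/* Hypothetical: convert a way count to KB for a given way size. */
static unsigned
ways_to_kb(unsigned ways, unsigned way_size_kb)
{
   return ways * way_size_kb;
}

/* Assumed metric: sum of per-partition absolute differences, in ways. */
static unsigned
config_distance(const struct l3_config *a, const struct l3_config *b)
{
   unsigned d = 0;
   for (int i = 0; i < NUM_L3P; i++)
      d += a->n[i] > b->n[i] ? a->n[i] - b->n[i] : b->n[i] - a->n[i];
   return d;
}

/* Returns the index of the table entry closest to `want`; the first
 * entry wins on ties. */
static size_t
closest_config(const struct l3_config *want)
{
   size_t best = 0;
   unsigned best_d = UINT_MAX;

   for (size_t i = 0; i < NUM_IVB_CONFIGS; i++) {
      unsigned d = config_distance(want, &ivb_configs[i]);
      if (d < best_d) {
         best_d = d;
         best = i;
      }
   }
   return best;
}
```

With a made-up way size of 2 KB, the 32-way URB allocation of the first
entry would correspond to ways_to_kb(32, 2) = 64 KB. Keeping the table in
ways is what lets one table serve GT variants with different way sizes.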
[Mesa-dev] [PATCH 21/36] glsl: Add lowering pass for shared variable references

2015-11-14 Thread Jordan Justen

In this lowering pass, shared variables are decomposed into intrinsic
calls.

Signed-off-by: Jordan Justen 
---
 src/glsl/Makefile.sources   |   1 +
 src/glsl/ir_optimization.h  |   1 +
 src/glsl/linker.cpp |   4 +
 src/glsl/lower_shared_reference.cpp | 360 
 src/mesa/main/mtypes.h  |   7 +
 5 files changed, 373 insertions(+)
 create mode 100644 src/glsl/lower_shared_reference.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index f2c95c0..4f20c0f 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -180,6 +180,7 @@ LIBGLSL_FILES = \
lower_vector_insert.cpp \
lower_vertex_id.cpp \
lower_output_reads.cpp \
+   lower_shared_reference.cpp \
lower_ubo_reference.cpp \
opt_algebraic.cpp \
opt_array_splitting.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index 2fee81c..dabd80a 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -124,6 +124,7 @@ bool lower_const_arrays_to_uniforms(exec_list *instructions);
 bool lower_clip_distance(gl_shader *shader);
 void lower_output_reads(unsigned stage, exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
+void lower_shared_reference(struct gl_shader *shader, unsigned *shared_size);
 void lower_ubo_reference(struct gl_shader *shader);
 void lower_packed_varyings(void *mem_ctx,
unsigned locations_used, ir_variable_mode mode,
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index db00f8f..dedcf77 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -4452,6 +4452,10 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
   if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks)
  lower_ubo_reference(prog->_LinkedShaders[i]);
 
+  if (ctx->Const.ShaderCompilerOptions[i].LowerShaderSharedVariables)
+ lower_shared_reference(prog->_LinkedShaders[i],
+&prog->Comp.SharedSize);
+
   lower_vector_derefs(prog->_LinkedShaders[i]);
}
 
diff --git a/src/glsl/lower_shared_reference.cpp b/src/glsl/lower_shared_reference.cpp
new file mode 100644
index 0000000..810c6b6
--- /dev/null
+++ b/src/glsl/lower_shared_reference.cpp
@@ -0,0 +1,360 @@
+/*
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file lower_shared_reference.cpp
+ *
+ * IR lower pass to replace dereferences of compute shader shared variables
+ * with intrinsic function calls.
+ *
+ * This relieves drivers of the responsibility of allocating space for the
+ * shared variables in the shared memory region.
+ */
+
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_rvalue_visitor.h"
+#include "main/macros.h"
+#include "util/list.h"
+#include "glsl_parser_extras.h"
+#include "lower_buffer_access.h"
+
+using namespace ir_builder;
+
+namespace {
+
+struct var_offset {
+   struct list_head node;
+   const ir_variable *var;
+   unsigned offset;
+};
+
+class lower_shared_reference_visitor :
+  public lower_buffer_access::lower_buffer_access {
+public:
+
+   lower_shared_reference_visitor(struct gl_shader *shader)
+  : lower_buffer_access::lower_buffer_access(GLSL_MATRIX_LAYOUT_COLUMN_MAJOR),
+list_ctx(ralloc_context(NULL)), shader(shader),
+shared_size(0u)
+   {
+  list_inithead(&var_offsets);
+   }
+
+   ~lower_shared_reference_visitor()
+   {
+  ralloc_free(list_ctx);
+   }
+
+   enum {
+  shared_load_access,
+  shared_store_access,
+  shared_atomic_access,
+   } buffer_access_type;
+
+   void insert_buffer_access(ir_dereference *deref, const glsl_type *type,
+ ir_rvalue *offset, unsigned mask, int channel);
+
+   void handle_rvalue(ir_rvalue **rv

[Mesa-dev] [PATCH 19/36] glsl: Add default matrix ordering in lower_buffer_access

2015-11-14 Thread Jordan Justen

For compute shader shared variables we will set a default of column
major.

Signed-off-by: Jordan Justen 
---
 src/glsl/lower_buffer_access.cpp | 5 +++--
 src/glsl/lower_buffer_access.h   | 9 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/glsl/lower_buffer_access.cpp b/src/glsl/lower_buffer_access.cpp
index 87f64a9..8beec0d 100644
--- a/src/glsl/lower_buffer_access.cpp
+++ b/src/glsl/lower_buffer_access.cpp
@@ -370,8 +370,9 @@ lower_buffer_access::is_dereferenced_thing_row_major(const ir_rvalue *deref)
 
  switch (matrix_layout) {
  case GLSL_MATRIX_LAYOUT_INHERITED:
-assert(!matrix);
-return false;
+assert(default_matrix_layout != GLSL_MATRIX_LAYOUT_INHERITED ||
+   !matrix);
+return default_matrix_layout == GLSL_MATRIX_LAYOUT_ROW_MAJOR;
  case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR:
 return false;
  case GLSL_MATRIX_LAYOUT_ROW_MAJOR:
diff --git a/src/glsl/lower_buffer_access.h b/src/glsl/lower_buffer_access.h
index 77cf180..5562870 100644
--- a/src/glsl/lower_buffer_access.h
+++ b/src/glsl/lower_buffer_access.h
@@ -39,6 +39,14 @@ namespace lower_buffer_access {
 
 class lower_buffer_access : public ir_rvalue_enter_visitor {
 public:
+   lower_buffer_access() :
+  default_matrix_layout(GLSL_MATRIX_LAYOUT_INHERITED)
+   {}
+
+   lower_buffer_access(enum glsl_matrix_layout default_matrix_layout) :
+  default_matrix_layout(default_matrix_layout)
+   {}
+
virtual void
insert_buffer_access(ir_dereference *deref, const glsl_type *type,
 ir_rvalue *offset, unsigned mask, int channel) = 0;
@@ -56,6 +64,7 @@ public:
 unsigned packing);
 
void *mem_ctx;
+   enum glsl_matrix_layout default_matrix_layout;
 };
 
 } /* namespace lower_buffer_access */
-- 
2.6.2


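To make the effect of the default matrix layout in PATCH 19/36 above
concrete, here is a small standalone sketch of the decision made in the
GLSL_MATRIX_LAYOUT_INHERITED case. The enum and function names are
simplified stand-ins rather than the Mesa ones, and the row-major case is
reduced to "return true" because its body is not visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_layout {
   SKETCH_LAYOUT_INHERITED,
   SKETCH_LAYOUT_COLUMN_MAJOR,
   SKETCH_LAYOUT_ROW_MAJOR,
};

/* Hypothetical helper: `layout` is what the dereference chain specified,
 * `default_layout` is the value passed to the visitor's constructor. */
static bool
sketch_is_row_major(enum sketch_layout layout,
                    enum sketch_layout default_layout,
                    bool is_matrix)
{
   switch (layout) {
   case SKETCH_LAYOUT_INHERITED:
      /* Without a default, reaching this point is only valid for
       * non-matrix types (the pre-patch behaviour). */
      assert(default_layout != SKETCH_LAYOUT_INHERITED || !is_matrix);
      return default_layout == SKETCH_LAYOUT_ROW_MAJOR;
   case SKETCH_LAYOUT_COLUMN_MAJOR:
      return false;
   case SKETCH_LAYOUT_ROW_MAJOR:
      return true;   /* simplified; see the note above */
   }
   return false;
}
```

A shared variable matrix with no layout qualifier therefore resolves to
column major when the visitor is constructed with a column-major default,
which is what the shared variable lowering pass does.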
[Mesa-dev] [PATCH 24/36] i965: Disable vector splitting on shared variables

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
index cab5af3..2c7e0dc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
@@ -107,6 +107,7 @@ ir_vector_reference_visitor::get_variable_entry(ir_variable *var)
switch (var->data.mode) {
case ir_var_uniform:
case ir_var_shader_storage:
+   case ir_var_shader_shared:
case ir_var_shader_in:
case ir_var_shader_out:
case ir_var_system_value:
-- 
2.6.2


[Mesa-dev] [PATCH 23/36] nir: Translate glsl shared var store intrinsic to nir intrinsic

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/glsl/nir/glsl_to_nir.cpp  | 33 +
 src/glsl/nir/nir_intrinsics.h |  3 ++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index 67f1aed..cc1719a 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -734,6 +734,8 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_memory_barrier_shared;
   } else if (strcmp(ir->callee_name(), "__intrinsic_load_shared") == 0) {
  op = nir_intrinsic_load_shared;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_store_shared") == 0) {
+ op = nir_intrinsic_store_shared;
   } else {
  unreachable("not reached");
   }
@@ -1008,6 +1010,37 @@ nir_visitor::visit(ir_call *ir)
  nir_builder_instr_insert(&b, &instr->instr);
  break;
   }
+  case nir_intrinsic_store_shared: {
+ exec_node *param = ir->actual_parameters.get_head();
+ ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
+
+ param = param->get_next();
+ ir_rvalue *val = ((ir_instruction *)param)->as_rvalue();
+
+ param = param->get_next();
+ ir_constant *write_mask = ((ir_instruction *)param)->as_constant();
+ assert(write_mask);
+
+ /* Check if we need the indirect version */
+ ir_constant *const_offset = offset->as_constant();
+ if (!const_offset) {
+op = nir_intrinsic_store_shared_indirect;
+ralloc_free(instr);
+instr = nir_intrinsic_instr_create(shader, op);
+instr->src[1] = nir_src_for_ssa(evaluate_rvalue(offset));
+instr->const_index[0] = 0;
+ } else {
+instr->const_index[0] = const_offset->value.u[0];
+ }
+
+ instr->const_index[1] = write_mask->value.u[0];
+
+ instr->src[0] = nir_src_for_ssa(evaluate_rvalue(val));
+ instr->num_components = val->type->vector_elements;
+
+ nir_builder_instr_insert(&b, &instr->instr);
+ break;
+  }
   default:
  unreachable("not reached");
   }
diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index de15128..6912258 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -277,5 +277,6 @@ LOAD(shared, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
 STORE(output, 0, 0, 0, 0)
 STORE(per_vertex_output, 1, 1, 0, 0)
 STORE(ssbo, 1, 1, 1, 0)
+STORE(shared, 0, 0, 0, 0)
 
-LAST_INTRINSIC(store_ssbo_indirect)
+LAST_INTRINSIC(store_shared_indirect)
-- 
2.6.2


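PATCH 23/36 above passes a write mask through const_index[1] of the store
intrinsic. The sketch below shows what a per-component write mask means
for a vector store, assuming bit i of the mask enables component i (the
same convention as the (1U << i) masks used in lower_ubo_reference). The
function name and the use of a dword offset instead of the real offset
encoding are simplifications made for this example.

```c
#include <stdint.h>

/* Hypothetical helper: writes the enabled components of `val` into the
 * `shared` array starting at `offset_dwords`; disabled components leave
 * the destination untouched. */
static void
sketch_store_shared(uint32_t *shared, unsigned offset_dwords,
                    const uint32_t *val, unsigned num_components,
                    unsigned write_mask)
{
   for (unsigned i = 0; i < num_components; i++) {
      if (write_mask & (1u << i))
         shared[offset_dwords + i] = val[i];
   }
}
```

With a three-component value and a mask of 0x5, only the first and third
components are written.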
[Mesa-dev] [PATCH 13/36] i965: Hook up L3 partitioning state atom.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

---
 src/mesa/drivers/dri/i965/brw_context.h  | 4 ++--
 src/mesa/drivers/dri/i965/brw_state_upload.c | 4 
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 618d785..44d404a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1176,8 +1176,8 @@ struct brw_context
} perfmon;
 
int num_atoms[BRW_NUM_PIPELINES];
-   const struct brw_tracked_state render_atoms[60];
-   const struct brw_tracked_state compute_atoms[9];
+   const struct brw_tracked_state render_atoms[61];
+   const struct brw_tracked_state compute_atoms[10];
 
/* If (INTEL_DEBUG & DEBUG_BATCH) */
struct {
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index aab5c91..cd1fbee 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -186,6 +186,7 @@ static const struct brw_tracked_state *gen7_render_atoms[] =
&brw_cc_vp,
&gen7_sf_clip_viewport,
 
+   &gen7_l3_state,
&gen7_push_constant_space,
&gen7_urb,
&gen6_blend_state,  /* must do before cc unit */
@@ -257,6 +258,7 @@ static const struct brw_tracked_state *gen7_render_atoms[] =
 static const struct brw_tracked_state *gen7_compute_atoms[] =
 {
&brw_state_base_address,
+   &gen7_l3_state,
&brw_cs_image_surfaces,
&gen7_cs_push_constants,
&brw_cs_pull_constants,
@@ -275,6 +277,7 @@ static const struct brw_tracked_state *gen8_render_atoms[] =
&brw_cc_vp,
&gen8_sf_clip_viewport,
 
+   &gen7_l3_state,
&gen7_push_constant_space,
&gen7_urb,
&gen8_blend_state,
@@ -352,6 +355,7 @@ static const struct brw_tracked_state *gen8_render_atoms[] =
 static const struct brw_tracked_state *gen8_compute_atoms[] =
 {
&gen8_state_base_address,
+   &gen7_l3_state,
&brw_cs_image_surfaces,
&gen7_cs_push_constants,
&brw_cs_pull_constants,
-- 
2.6.2


[Mesa-dev] [PATCH 33/36] glsl: Buffer atomics are supported for compute shaders

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/glsl/builtin_functions.cpp | 70 +++---
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 3e767e8..bd4c5a3 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -465,6 +465,12 @@ compute_shader(const _mesa_glsl_parse_state *state)
 }
 
 static bool
+buffer_atomics_supported(const _mesa_glsl_parse_state *state)
+{
+   return compute_shader(state) || shader_storage_buffer_object(state);
+}
+
+static bool
 barrier_supported(const _mesa_glsl_parse_state *state)
 {
return compute_shader(state) ||
@@ -916,51 +922,51 @@ builtin_builder::create_intrinsics()
 NULL);
 
add_function("__intrinsic_atomic_add",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_min",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_max",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_and",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_or",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_xor",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_exchange",
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic2(shader_storage_buffer_object,
+_atomic_intrinsic2(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
add_function("__intrinsic_atomic_comp_swap",
-_atomic_intrinsic3(shader_storage_buffer_object,
+_atomic_intrinsic3(buffer_atomics_supported,
glsl_type::uint_type),
-_atomic_intrinsic3(shader_storage_buffer_object,
+_atomic_intrinsic3(buffer_atomics_supported,
glsl_type::int_type),
 NULL);
 
@@ -2658,66 +2664,66 @@ builtin_builder::create_builtins()
 
add_function("atomicAdd",
 _atomic_op2("__intrinsic_atomic_add",
-shader_storage_buffer_object,
+buffer_atomics_supported,
 glsl_type::uint_type),
 _atomic_op2("__intrinsic_atomic_add",
-shader_storage_buffer_object,
+buffer_atomics_supported,
 glsl_type::int_type),
 NULL);
add_function("atomicMin",
 _atomic_op2("__intrinsic_atomic_min",
-shader_storage_buffer_

[Mesa-dev] [PATCH 14/36] glsl ubo/ssbo: Use enum to track current buffer access type

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/lower_ubo_reference.cpp | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
index b74aa3d..41012db 100644
--- a/src/glsl/lower_ubo_reference.cpp
+++ b/src/glsl/lower_ubo_reference.cpp
@@ -162,6 +162,14 @@ public:
ir_call *ssbo_store(ir_rvalue *deref, ir_rvalue *offset,
unsigned write_mask);
 
+   enum {
+  ubo_load_access,
+  ssbo_load_access,
+  ssbo_store_access,
+  ssbo_get_array_length,
+  ssbo_atomic_access,
+   } buffer_access_type;
+
void emit_access(bool is_write, ir_dereference *deref,
 ir_variable *base_offset, unsigned int deref_offset,
 bool row_major, int matrix_columns,
@@ -189,7 +197,6 @@ public:
struct gl_uniform_buffer_variable *ubo_var;
ir_rvalue *uniform_block;
bool progress;
-   bool is_shader_storage;
 };
 
 /**
@@ -339,10 +346,9 @@ lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
deref, &nonconst_block_index);
 
/* Locate the block by interface name */
-   this->is_shader_storage = var->is_in_shader_storage_block();
unsigned num_blocks;
struct gl_uniform_block **blocks;
-   if (this->is_shader_storage) {
+   if (buffer_access_type != ubo_load_access) {
   num_blocks = shader->NumShaderStorageBlocks;
   blocks = shader->ShaderStorageBlocks;
} else {
@@ -552,6 +558,10 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
 
+   buffer_access_type =
+  var->is_in_shader_storage_block() ?
+  ssbo_load_access : ubo_load_access;
+
/* Compute the offset to the start if the dereference as well as other
 * information we need to configure the write
 */
@@ -795,7 +805,7 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
   if (is_write)
  base_ir->insert_after(ssbo_store(deref, offset, write_mask));
   else {
- if (!this->is_shader_storage) {
+ if (buffer_access_type == ubo_load_access) {
  base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
ubo_load(deref->type, offset)));
  } else {
@@ -862,7 +872,7 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
 
 base_ir->insert_after(ssbo_store(swizzle(deref, i, 1), chan_offset, 1));
  } else {
-if (!this->is_shader_storage) {
+if (buffer_access_type == ubo_load_access) {
base_ir->insert_before(assign(deref->clone(mem_ctx, NULL),
  ubo_load(deref_type, chan_offset),
  (1U << i)));
@@ -891,6 +901,8 @@ lower_ubo_reference_visitor::write_to_memory(ir_dereference *deref,
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
 
+   buffer_access_type = ssbo_store_access;
+
/* Compute the offset to the start if the dereference as well as other
 * information we need to configure the write
 */
@@ -1068,6 +1080,8 @@ lower_ubo_reference_visitor::process_ssbo_unsized_array_length(ir_rvalue **rvalu
unsigned packing = var->get_interface_type()->interface_packing;
int unsized_array_stride = calculate_unsized_array_stride(deref, packing);
 
+   buffer_access_type = ssbo_get_array_length;
+
/* Compute the offset to the start if the dereference as well as other
 * information we need to calculate the length.
 */
@@ -1181,6 +1195,8 @@ lower_ubo_reference_visitor::lower_ssbo_atomic_intrinsic(ir_call *ir)
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
 
+   buffer_access_type = ssbo_atomic_access;
+
setup_for_load_or_store(var, deref,
&offset, &const_offset,
&row_major, &matrix_columns,
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 32/36] glsl: Translate atomic intrinsic functions on shared variables

2015-11-14 Thread Jordan Justen

When an intrinsic atomic operation is used on a shared variable, we
translate it to a new 'shared variable' specific intrinsic function
call.

For example, a call to __intrinsic_atomic_add used on a shared
variable will be translated to a call to
__intrinsic_atomic_add_shared.
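
As a rough illustration of the renaming described above, the sketch
below appends the "_shared" suffix to an intrinsic's name. The helper
shared_intrinsic_name() is hypothetical and not part of Mesa; the real
lowering pass also builds a new function signature and rewrites the
call parameters, which is left out here.

```cpp
#include <string>

// Hypothetical helper (not a Mesa function): derive the name of the
// shared-variable variant of an atomic intrinsic by appending "_shared".
std::string shared_intrinsic_name(const std::string &callee_name)
{
   return callee_name + "_shared";
}
```

Using std::string sidesteps the fixed-size name buffer; this is a
design choice for the sketch only, not a statement about the patch.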

Signed-off-by: Jordan Justen 
---
 src/glsl/lower_shared_reference.cpp | 151 
 1 file changed, 151 insertions(+)

diff --git a/src/glsl/lower_shared_reference.cpp b/src/glsl/lower_shared_reference.cpp
index 810c6b6..7ff2c0c 100644
--- a/src/glsl/lower_shared_reference.cpp
+++ b/src/glsl/lower_shared_reference.cpp
@@ -79,6 +79,10 @@ public:
ir_visitor_status visit_enter(ir_assignment *ir);
void handle_assignment(ir_assignment *ir);
 
+   ir_call *lower_shared_atomic_intrinsic(ir_call *ir);
+   ir_call *check_for_shared_atomic_intrinsic(ir_call *ir);
+   ir_visitor_status visit_enter(ir_call *ir);
+
unsigned get_shared_offset(const ir_variable *);
 
ir_call *shared_load(const struct glsl_type *type, ir_rvalue *offset);
@@ -337,6 +341,153 @@ lower_shared_reference_visitor::shared_load(const struct glsl_type *type,
return new(mem_ctx) ir_call(sig, deref_result, &call_params);
 }
 
+/* Lowers the intrinsic call to a new internal intrinsic that swaps the
+ * access to the buffer variable in the first parameter by an offset
+ * and block index. This involves creating the new internal intrinsic
+ * (i.e. the new function signature).
+ */
+ir_call *
+lower_shared_reference_visitor::lower_shared_atomic_intrinsic(ir_call *ir)
+{
+   /* Shared atomics usually have 2 parameters, the shared variable and an
+* integer argument. The exception is CompSwap, that has an additional
+* integer parameter.
+*/
+   int param_count = ir->actual_parameters.length();
+   assert(param_count == 2 || param_count == 3);
+
+   /* First argument must be a scalar integer buffer variable */
+   exec_node *param = ir->actual_parameters.get_head();
+   ir_instruction *inst = (ir_instruction *) param;
+   assert(inst->ir_type == ir_type_dereference_variable ||
+  inst->ir_type == ir_type_dereference_array ||
+  inst->ir_type == ir_type_dereference_record ||
+  inst->ir_type == ir_type_swizzle);
+
+   ir_rvalue *deref = (ir_rvalue *) inst;
+   assert(deref->type->is_scalar() && deref->type->is_integer());
+
+   ir_variable *var = deref->variable_referenced();
+   assert(var);
+
+   /* Compute the offset to the start of the dereference and the
+* block index
+*/
+   mem_ctx = ralloc_parent(shader->ir);
+
+   ir_rvalue *offset = NULL;
+   unsigned const_offset = get_shared_offset(var);
+   bool row_major;
+   int matrix_columns;
+   const glsl_type *iface = var->get_interface_type();
+   unsigned packing =
+  iface ? iface->interface_packing : GLSL_INTERFACE_PACKING_STD430;
+   buffer_access_type = shared_atomic_access;
+
+   setup_buffer_access(var, deref,
+   &offset, &const_offset,
+   &row_major, &matrix_columns, packing);
+
+   assert(offset);
+   assert(!row_major);
+   assert(matrix_columns == 1);
+
+   ir_rvalue *deref_offset =
+  add(offset, new(mem_ctx) ir_constant(const_offset));
+
+   /* Create the new internal function signature that will take a block
+* index and offset instead of a buffer variable
+*/
+   exec_list sig_params;
+   ir_variable *sig_param = new(mem_ctx)
+  ir_variable(glsl_type::uint_type, "offset" , ir_var_function_in);
+   sig_params.push_tail(sig_param);
+
+   const glsl_type *type = deref->type->base_type == GLSL_TYPE_INT ?
+  glsl_type::int_type : glsl_type::uint_type;
+   sig_param = new(mem_ctx)
+ ir_variable(type, "data1", ir_var_function_in);
+   sig_params.push_tail(sig_param);
+
+   if (param_count == 3) {
+  sig_param = new(mem_ctx)
+ir_variable(type, "data2", ir_var_function_in);
+  sig_params.push_tail(sig_param);
+   }
+
+   ir_function_signature *sig =
+  new(mem_ctx) ir_function_signature(deref->type,
+ compute_shader_enabled);
+   assert(sig);
+   sig->replace_parameters(&sig_params);
+   sig->is_intrinsic = true;
+
+   char func_name[64];
+   sprintf(func_name, "%s_shared", ir->callee_name());
+   ir_function *f = new(mem_ctx) ir_function(func_name);
+   f->add_signature(sig);
+
+   /* Now, create the call to the internal intrinsic */
+   exec_list call_params;
+   call_params.push_tail(deref_offset);
+   param = ir->actual_parameters.get_head()->get_next();
+   ir_rvalue *param_as_rvalue = ((ir_instruction *) param)->as_rvalue();
+   call_params.push_tail(param_as_rvalue->clone(mem_ctx, NULL));
+   if (param_count == 3) {
+  param = param->get_next();
+  param_as_rvalue = ((ir_instruction *) param)->as_rvalue();
+  call_params.push_tail(param_as_rvalue->clone(mem_ctx, NULL));
+   }
+   ir_dereference_variable *return_deref =
+  ir->return_deref->clone(mem_ctx, NULL);
+

[Mesa-dev] [PATCH 36/36] i965/nir: Implement shared variable atomic operations

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_fs.h   |  2 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 60 
 2 files changed, 62 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index f40e58b..d8e0916 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -276,6 +276,8 @@ public:
nir_intrinsic_instr *instr);
void nir_emit_ssbo_atomic(const brw::fs_builder &bld,
  int op, nir_intrinsic_instr *instr);
+   void nir_emit_shared_atomic(const brw::fs_builder &bld,
+   int op, nir_intrinsic_instr *instr);
void nir_emit_texture(const brw::fs_builder &bld,
  nir_tex_instr *instr);
void nir_emit_jump(const brw::fs_builder &bld,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index c8c6370..792fda7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1967,6 +1967,37 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder &bld,
   break;
}
 
+   case nir_intrinsic_shared_atomic_add:
+  nir_emit_shared_atomic(bld, BRW_AOP_ADD, instr);
+  break;
+   case nir_intrinsic_shared_atomic_min:
+  if (dest.type == BRW_REGISTER_TYPE_D)
+ nir_emit_shared_atomic(bld, BRW_AOP_IMIN, instr);
+  else
+ nir_emit_shared_atomic(bld, BRW_AOP_UMIN, instr);
+  break;
+   case nir_intrinsic_shared_atomic_max:
+  if (dest.type == BRW_REGISTER_TYPE_D)
+ nir_emit_shared_atomic(bld, BRW_AOP_IMAX, instr);
+  else
+ nir_emit_shared_atomic(bld, BRW_AOP_UMAX, instr);
+  break;
+   case nir_intrinsic_shared_atomic_and:
+  nir_emit_shared_atomic(bld, BRW_AOP_AND, instr);
+  break;
+   case nir_intrinsic_shared_atomic_or:
+  nir_emit_shared_atomic(bld, BRW_AOP_OR, instr);
+  break;
+   case nir_intrinsic_shared_atomic_xor:
+  nir_emit_shared_atomic(bld, BRW_AOP_XOR, instr);
+  break;
+   case nir_intrinsic_shared_atomic_exchange:
+  nir_emit_shared_atomic(bld, BRW_AOP_MOV, instr);
+  break;
+   case nir_intrinsic_shared_atomic_comp_swap:
+  nir_emit_shared_atomic(bld, BRW_AOP_CMPWR, instr);
+  break;
+
default:
   nir_emit_intrinsic(bld, instr);
   break;
@@ -2607,6 +2638,35 @@ fs_visitor::nir_emit_ssbo_atomic(const fs_builder &bld,
 }
 
 void
+fs_visitor::nir_emit_shared_atomic(const fs_builder &bld,
+   int op, nir_intrinsic_instr *instr)
+{
+   fs_reg dest;
+   if (nir_intrinsic_infos[instr->intrinsic].has_dest)
+  dest = get_nir_dest(instr->dest);
+
+   unsigned index = BRW_SLM_SURFACE_INDEX;
+   fs_reg surface = fs_reg(index);
+
+   fs_reg offset = get_nir_src(instr->src[0]);
+   fs_reg data1 = get_nir_src(instr->src[1]);
+   fs_reg data2;
+   if (op == BRW_AOP_CMPWR)
+  data2 = get_nir_src(instr->src[2]);
+
+   /* Emit the actual atomic operation */
+
+   fs_reg atomic_result =
+  surface_access::emit_untyped_atomic(bld, surface, offset,
+  data1, data2,
+  1 /* dims */, 1 /* rsize */,
+  op,
+  BRW_PREDICATE_NONE);
+   dest.type = atomic_result.type;
+   bld.MOV(dest, atomic_result);
+}
+
+void
 fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr)
 {
unsigned sampler = instr->sampler_index;
-- 
2.6.2

[Mesa-dev] [PATCH 34/36] glsl: Disable several optimizations on shared variables

2015-11-14 Thread Jordan Justen

Shared variables can be accessed by other threads within the same
local workgroup. This prevents us from performing certain
optimizations with shared variables.
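
The rule this patch applies in three passes can be sketched as one
predicate. The enum and the function name below are hypothetical
stand-ins for illustration only; the enumerator names merely mirror
the ir_var_shader_storage and ir_var_shader_shared modes tested in the
diff.

```cpp
// Hypothetical stand-in for the variable modes relevant here; this enum
// is not Mesa's ir_variable_mode.
enum class var_mode { temporary, uniform, shader_storage, shader_shared };

// Caching a variable's value (constant or copy propagation) is only safe
// when no other invocation can modify it between the write and the read.
bool can_cache_value(var_mode mode)
{
   return mode != var_mode::shader_storage &&
          mode != var_mode::shader_shared;
}
```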

Signed-off-by: Jordan Justen 
---
 src/glsl/opt_constant_propagation.cpp | 3 ++-
 src/glsl/opt_constant_variable.cpp| 3 ++-
 src/glsl/opt_copy_propagation.cpp | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/glsl/opt_constant_propagation.cpp b/src/glsl/opt_constant_propagation.cpp
index 184aaa1..fb24a4f 100644
--- a/src/glsl/opt_constant_propagation.cpp
+++ b/src/glsl/opt_constant_propagation.cpp
@@ -500,7 +500,8 @@ ir_constant_propagation_visitor::add_constant(ir_assignment *ir)
 * the variable value isn't modified between this assignment and the next
 * instruction where its value is read.
 */
-   if (deref->var->data.mode == ir_var_shader_storage)
+   if (deref->var->data.mode == ir_var_shader_storage ||
+   deref->var->data.mode == ir_var_shader_shared)
   return;
 
entry = new(this->mem_ctx) acp_entry(deref->var, ir->write_mask, constant);
diff --git a/src/glsl/opt_constant_variable.cpp b/src/glsl/opt_constant_variable.cpp
index cdfbc34..56f6a81 100644
--- a/src/glsl/opt_constant_variable.cpp
+++ b/src/glsl/opt_constant_variable.cpp
@@ -120,7 +120,8 @@ ir_constant_variable_visitor::visit_enter(ir_assignment *ir)
 * and we can't be sure that this variable won't be written by another
 * thread.
 */
-   if (var->data.mode == ir_var_shader_storage)
+   if (var->data.mode == ir_var_shader_storage ||
+   var->data.mode == ir_var_shader_shared)
   return visit_continue;
 
constval = ir->rhs->constant_expression_value();
diff --git a/src/glsl/opt_copy_propagation.cpp b/src/glsl/opt_copy_propagation.cpp
index f206995..5d4cb4f 100644
--- a/src/glsl/opt_copy_propagation.cpp
+++ b/src/glsl/opt_copy_propagation.cpp
@@ -330,7 +330,8 @@ ir_copy_propagation_visitor::add_copy(ir_assignment *ir)
  */
 ir->condition = new(ralloc_parent(ir)) ir_constant(false);
 this->progress = true;
-  } else if (lhs_var->data.mode != ir_var_shader_storage) {
+  } else if (lhs_var->data.mode != ir_var_shader_storage &&
+ lhs_var->data.mode != ir_var_shader_shared) {
 entry = new(this->acp) acp_entry(lhs_var, rhs_var);
 this->acp->push_tail(entry);
   }
-- 
2.6.2

[Mesa-dev] [PATCH 28/36] i965: Lower shared variable references to intrinsic calls

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index a438e18..14c37a0 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -152,6 +152,9 @@ brw_compiler_create(void *mem_ctx, const struct brw_device_info *devinfo)
   compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true;
}
 
+   compiler->glsl_compiler_options[MESA_SHADER_COMPUTE]
+  .LowerShaderSharedVariables = true;
+
return compiler;
 }
 
-- 
2.6.2

[Mesa-dev] [PATCH 29/36] glsl: Allow atomic functions to be used with shared variables

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/glsl/ast_function.cpp | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
index 466ece6..da1167a 100644
--- a/src/glsl/ast_function.cpp
+++ b/src/glsl/ast_function.cpp
@@ -143,19 +143,21 @@ verify_image_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state,
 }
 
 static bool
-verify_first_atomic_ssbo_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state,
+verify_first_atomic_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state,
ir_variable *var)
 {
-   if (!var || !var->is_in_shader_storage_block()) {
+   if (!var ||
+   (!var->is_in_shader_storage_block() &&
+var->data.mode != ir_var_shader_shared)) {
   _mesa_glsl_error(loc, state, "First argument to atomic function "
-   "must be a buffer variable");
+   "must be a buffer or shared variable");
   return false;
}
return true;
 }
 
 static bool
-is_atomic_ssbo_function(const char *func_name)
+is_atomic_function(const char *func_name)
 {
return !strcmp(func_name, "atomicAdd") ||
   !strcmp(func_name, "atomicMin") ||
@@ -276,16 +278,16 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
 
/* The first parameter of atomic functions must be a buffer variable */
const char *func_name = sig->function_name();
-   bool is_atomic_ssbo = is_atomic_ssbo_function(func_name);
-   if (is_atomic_ssbo) {
+   bool is_atomic = is_atomic_function(func_name);
+   if (is_atomic) {
   const ir_rvalue *const actual = (ir_rvalue *) actual_ir_parameters.head;
 
   const ast_expression *const actual_ast =
  exec_node_data(ast_expression, actual_ast_parameters.head, link);
   YYLTYPE loc = actual_ast->get_location();
 
-  if (!verify_first_atomic_ssbo_parameter(&loc, state,
-  actual->variable_referenced())) {
+  if (!verify_first_atomic_parameter(&loc, state,
+ actual->variable_referenced())) {
  return false;
   }
}
-- 
2.6.2

[Mesa-dev] [PATCH 10/36] i965: Implement L3 state atom.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

The L3 state atom calculates the target L3 partition weights when the
program bound to some shader stage is modified, and in case they are
far enough from the current partitioning it makes sure that the L3
state is re-emitted.
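
The "far enough" decision can be sketched as follows. The two
threshold values are taken from emit_l3_state() in the patch below;
the function should_reprogram_l3() itself is a hypothetical
simplification that takes the weight distance dw as an input and
leaves out the check for pipelined register writes.

```cpp
// Hypothetical simplification of the threshold logic in emit_l3_state().
// dw is the distance between the current and the target L3 weights.
bool should_reprogram_l3(float dw, bool new_batch)
{
   const float small_dw_threshold = 0.5f;  // used at the start of a batch
   const float large_dw_threshold = 2.0f;  // used mid-batch
   const float dw_threshold =
      new_batch ? small_dw_threshold : large_dw_threshold;

   return dw > dw_threshold;
}
```

With these values a distance of 1.0 triggers reprogramming at the
start of a new batch but not in the middle of one.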
---
 src/mesa/drivers/dri/i965/brw_context.h   |  6 
 src/mesa/drivers/dri/i965/brw_state.h |  1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 60 +++
 3 files changed, 67 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index ac05658..618d785 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -676,6 +676,8 @@ enum brw_predicate_state {
 
 struct shader_times;
 
+struct brw_l3_config;
+
 /**
  * brw_context is derived from gl_context.
  */
@@ -1218,6 +1220,10 @@ struct brw_context
int basevertex;
 
struct {
+  const struct brw_l3_config *config;
+   } l3;
+
+   struct {
   drm_intel_bo *bo;
   const char **names;
   int *ids;
diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index 94734ba..49f301a 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -129,6 +129,7 @@ extern const struct brw_tracked_state gen7_depthbuffer;
 extern const struct brw_tracked_state gen7_clip_state;
 extern const struct brw_tracked_state gen7_disable_stages;
 extern const struct brw_tracked_state gen7_gs_state;
+extern const struct brw_tracked_state gen7_l3_state;
 extern const struct brw_tracked_state gen7_ps_state;
 extern const struct brw_tracked_state gen7_push_constant_space;
 extern const struct brw_tracked_state gen7_sbe_state;
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 1a88261..58eb07b 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -418,3 +418,63 @@ setup_l3_config(struct brw_context *brw, const struct brw_l3_config *cfg)
   }
}
 }
+
+/**
+ * Update the URB size in the context state for the specified L3
+ * configuration.
+ */
+static void
+update_urb_size(struct brw_context *brw, const struct brw_l3_config *cfg)
+{
+   const unsigned sz = cfg->n[L3P_URB] *
+  get_l3_way_size(brw->intelScreen->devinfo);
+
+   if (brw->urb.size != sz) {
+  brw->urb.size = sz;
+  brw->ctx.NewDriverState |= BRW_NEW_URB_SIZE;
+   }
+}
+
+static void
+emit_l3_state(struct brw_context *brw)
+{
+   const struct brw_l3_weights w = get_pipeline_state_l3_weights(brw);
+   const float dw = diff_l3_weights(w, get_config_l3_weights(brw->l3.config));
+   /* The distance between any two compatible weight vectors cannot exceed two
+* due to the triangle inequality.
+*/
+   const float large_dw_threshold = 2.0;
+   /* Somewhat arbitrary, simply makes sure that there will be no repeated
+* transitions to the same L3 configuration, could probably do better here.
+*/
+   const float small_dw_threshold = 0.5;
+   /* If we're emitting a new batch the caches should already be clean and the
+* transition should be relatively cheap, so it shouldn't hurt much to use
+* the smaller threshold.  Otherwise use the larger threshold so that we
+* only reprogram the L3 mid-batch if the most recently programmed
+* configuration is incompatible with the current pipeline state.
+*/
+   const float dw_threshold = (brw->ctx.NewDriverState & BRW_NEW_BATCH ?
+   small_dw_threshold : large_dw_threshold);
+
+   if (dw > dw_threshold && brw->can_do_pipelined_register_writes) {
+  const struct brw_l3_config *const cfg =
+ get_l3_config(brw->intelScreen->devinfo, w);
+
+  setup_l3_config(brw, cfg);
+  update_urb_size(brw, cfg);
+  brw->l3.config = cfg;
+   }
+}
+
+const struct brw_tracked_state gen7_l3_state = {
+   .dirty = {
+  .mesa = 0,
+  .brw = BRW_NEW_BATCH |
+ BRW_NEW_VS_PROG_DATA |
+ BRW_NEW_GS_PROG_DATA |
+ BRW_NEW_FS_PROG_DATA |
+ BRW_NEW_CS_PROG_DATA,
+   },
+   .emit = emit_l3_state
+};
-- 
2.6.2

[Mesa-dev] [PATCH 22/36] nir: Translate glsl shared var load intrinsic to nir intrinsic

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/glsl/nir/glsl_to_nir.cpp  | 29 +
 src/glsl/nir/nir_intrinsics.h |  1 +
 2 files changed, 30 insertions(+)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index b10d192..67f1aed 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -732,6 +732,8 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_memory_barrier_image;
   } else if (strcmp(ir->callee_name(), "__intrinsic_memory_barrier_shared") == 0) {
  op = nir_intrinsic_memory_barrier_shared;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_load_shared") == 0) {
+ op = nir_intrinsic_load_shared;
   } else {
  unreachable("not reached");
   }
@@ -979,6 +981,33 @@ nir_visitor::visit(ir_call *ir)
  nir_builder_instr_insert(&b, &instr->instr);
  break;
   }
+  case nir_intrinsic_load_shared: {
+ exec_node *param = ir->actual_parameters.get_head();
+ ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
+
+ /* Check if we need the indirect version */
+ ir_constant *const_offset = offset->as_constant();
+ if (!const_offset) {
+op = nir_intrinsic_load_shared_indirect;
+ralloc_free(instr);
+instr = nir_intrinsic_instr_create(shader, op);
+instr->src[0] = nir_src_for_ssa(evaluate_rvalue(offset));
+instr->const_index[0] = 0;
+dest = &instr->dest;
+ } else {
+instr->const_index[0] = const_offset->value.u[0];
+ }
+
+ const glsl_type *type = ir->return_deref->var->type;
+ instr->num_components = type->vector_elements;
+
+ /* Setup destination register */
+ nir_ssa_dest_init(&instr->instr, &instr->dest,
+   type->vector_elements, NULL);
+
+ nir_builder_instr_insert(&b, &instr->instr);
+ break;
+  }
   default:
  unreachable("not reached");
   }
diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index b8d7d6c..de15128 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -257,6 +257,7 @@ LOAD(per_vertex_input, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REO
 LOAD(ssbo, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)
 LOAD(output, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
 LOAD(per_vertex_output, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)
+LOAD(shared, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
 
 /*
  * Stores work the same way as loads, except now the first register input is
-- 
2.6.2

[Mesa-dev] [PATCH 20/36] glsl: Don't lower_variable_index_to_cond_assign for shared variables

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/glsl/lower_variable_index_to_cond_assign.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/glsl/lower_variable_index_to_cond_assign.cpp b/src/glsl/lower_variable_index_to_cond_assign.cpp
index 1ab3afe..a1ba934 100644
--- a/src/glsl/lower_variable_index_to_cond_assign.cpp
+++ b/src/glsl/lower_variable_index_to_cond_assign.cpp
@@ -378,6 +378,9 @@ public:
   case ir_var_shader_storage:
 return this->lower_uniforms;
 
+  case ir_var_shader_shared:
+return false;
+
   case ir_var_function_in:
   case ir_var_const_in:
  return this->lower_temps;
-- 
2.6.2

[Mesa-dev] [PATCH 06/36] i965: Implement programming of the L3 configuration.

2015-11-14 Thread Jordan Justen

From: Francisco Jerez 

---
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 95 +++
 1 file changed, 95 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 8f9ba5b..48bca29 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -161,3 +161,98 @@ get_l3_way_size(const struct brw_device_info *devinfo)
else
   return 2 << devinfo->gt;
 }
+
+/**
+ * Program the hardware to use the specified L3 configuration.
+ */
+static void
+setup_l3_config(struct brw_context *brw, const struct brw_l3_config *cfg)
+{
+   const bool has_dc = cfg->n[L3P_DC] || cfg->n[L3P_ALL];
+   const bool has_is = cfg->n[L3P_IS] || cfg->n[L3P_RO] || cfg->n[L3P_ALL];
+   const bool has_c = cfg->n[L3P_C] || cfg->n[L3P_RO] || cfg->n[L3P_ALL];
+   const bool has_t = cfg->n[L3P_T] || cfg->n[L3P_RO] || cfg->n[L3P_ALL];
+   const bool has_slm = cfg->n[L3P_SLM];
+
+   /* According to the hardware docs, the L3 partitioning can only be changed
+* while the pipeline is completely drained and the caches are flushed,
+* which involves a first PIPE_CONTROL flush which stalls the pipeline and
+* initiates invalidation of the relevant caches...
+*/
+   brw_emit_pipe_control_flush(brw,
+   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
+   PIPE_CONTROL_CONST_CACHE_INVALIDATE |
+   PIPE_CONTROL_INSTRUCTION_INVALIDATE |
+   PIPE_CONTROL_DATA_CACHE_INVALIDATE |
+   PIPE_CONTROL_NO_WRITE |
+   PIPE_CONTROL_CS_STALL);
+
+   /* ...followed by a second stalling flush which guarantees that
+* invalidation is complete when the L3 configuration registers are
+* modified.
+*/
+   brw_emit_pipe_control_flush(brw,
+   PIPE_CONTROL_DATA_CACHE_INVALIDATE |
+   PIPE_CONTROL_NO_WRITE |
+   PIPE_CONTROL_CS_STALL);
+
+   if (brw->gen >= 8) {
+  assert(!cfg->n[L3P_IS] && !cfg->n[L3P_C] && !cfg->n[L3P_T]);
+
+  BEGIN_BATCH(3);
+  OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
+
+  /* Set up the L3 partitioning. */
+  OUT_BATCH(GEN8_L3CNTLREG);
+  OUT_BATCH((has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0) |
+SET_FIELD(cfg->n[L3P_URB], GEN8_L3CNTLREG_URB_ALLOC) |
+SET_FIELD(cfg->n[L3P_RO], GEN8_L3CNTLREG_RO_ALLOC) |
+SET_FIELD(cfg->n[L3P_DC], GEN8_L3CNTLREG_DC_ALLOC) |
+SET_FIELD(cfg->n[L3P_ALL], GEN8_L3CNTLREG_ALL_ALLOC));
+
+  ADVANCE_BATCH();
+
+   } else {
+  assert(!cfg->n[L3P_ALL]);
+
+  /* When enabled SLM only uses a portion of the L3 on half of the banks,
+   * the matching space on the remaining banks has to be allocated to a
+   * client (URB for all validated configurations) set to the
+   * lower-bandwidth 2-bank address hashing mode.
+   */
+  const bool urb_low_bw = has_slm && !brw->is_baytrail;
+  assert(!urb_low_bw || cfg->n[L3P_URB] == cfg->n[L3P_SLM]);
+
+  /* Minimum number of ways that can be allocated to the URB. */
+  const unsigned n0_urb = (brw->is_baytrail ? 32 : 0);
+  assert(cfg->n[L3P_URB] >= n0_urb);
+
+  BEGIN_BATCH(7);
+  OUT_BATCH(MI_LOAD_REGISTER_IMM | (7 - 2));
+
+  /* Demote any clients with no ways assigned to LLC. */
+  OUT_BATCH(GEN7_L3SQCREG1);
+  OUT_BATCH((brw->is_haswell ? HSW_L3SQCREG1_SQGHPCI_DEFAULT :
+ brw->is_baytrail ? VLV_L3SQCREG1_SQGHPCI_DEFAULT :
+ IVB_L3SQCREG1_SQGHPCI_DEFAULT) |
+(has_dc ? 0 : GEN7_L3SQCREG1_CONV_DC_UC) |
+(has_is ? 0 : GEN7_L3SQCREG1_CONV_IS_UC) |
+(has_c ? 0 : GEN7_L3SQCREG1_CONV_C_UC) |
+(has_t ? 0 : GEN7_L3SQCREG1_CONV_T_UC));
+
+  /* Set up the L3 partitioning. */
+  OUT_BATCH(GEN7_L3CNTLREG2);
+  OUT_BATCH((has_slm ? GEN7_L3CNTLREG2_SLM_ENABLE : 0) |
+SET_FIELD(cfg->n[L3P_URB] - n0_urb, GEN7_L3CNTLREG2_URB_ALLOC) |
+(urb_low_bw ? GEN7_L3CNTLREG2_URB_LOW_BW : 0) |
+SET_FIELD(cfg->n[L3P_ALL], GEN7_L3CNTLREG2_ALL_ALLOC) |
+SET_FIELD(cfg->n[L3P_RO], GEN7_L3CNTLREG2_RO_ALLOC) |
+SET_FIELD(cfg->n[L3P_DC], GEN7_L3CNTLREG2_DC_ALLOC));
+  OUT_BATCH(GEN7_L3CNTLREG3);
+  OUT_BATCH(SET_FIELD(cfg->n[L3P_IS], GEN7_L3CNTLREG3_IS_ALLOC) |
+SET_FIELD(cfg->n[L3P_C], GEN7_L3CNTLREG3_C_ALLOC) |
+SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC));
+
+  ADVANCE_BATCH();
+   }
+}
-- 
2.6.2

[Mesa-dev] [PATCH 17/36] glsl ubo/ssbo: Move is_dereferenced_thing_row_major into lower_buffer_access

2015-11-14 Thread Jordan Justen

Signed-off-by: Jordan Justen 
Cc: Samuel Iglesias Gonsalvez 
Cc: Iago Toral Quiroga 
---
 src/glsl/lower_buffer_access.cpp | 90 
 src/glsl/lower_buffer_access.h   |  2 +
 src/glsl/lower_ubo_reference.cpp | 90 
 3 files changed, 92 insertions(+), 90 deletions(-)

diff --git a/src/glsl/lower_buffer_access.cpp b/src/glsl/lower_buffer_access.cpp
index e0b5a2f..b7fc107 100644
--- a/src/glsl/lower_buffer_access.cpp
+++ b/src/glsl/lower_buffer_access.cpp
@@ -304,4 +304,94 @@ is_dereferenced_thing_row_major(const ir_dereference *deref)
return false;
 }
 
+/**
+ * Determine if a thing being dereferenced is row-major
+ *
+ * There is some trickery here.
+ *
+ * If the thing being dereferenced is a member of uniform block \b without an
+ * instance name, then the name of the \c ir_variable is the field name of an
+ * interface type.  If this field is row-major, then the thing referenced is
+ * row-major.
+ *
+ * If the thing being dereferenced is a member of uniform block \b with an
+ * instance name, then the last dereference in the tree will be an
+ * \c ir_dereference_record.  If that record field is row-major, then the
+ * thing referenced is row-major.
+ */
+bool
+lower_buffer_access::is_dereferenced_thing_row_major(const ir_rvalue *deref)
+{
+   bool matrix = false;
+   const ir_rvalue *ir = deref;
+
+   while (true) {
+  matrix = matrix || ir->type->without_array()->is_matrix();
+
+  switch (ir->ir_type) {
+  case ir_type_dereference_array: {
+ const ir_dereference_array *const array_deref =
+(const ir_dereference_array *) ir;
+
+ ir = array_deref->array;
+ break;
+  }
+
+  case ir_type_dereference_record: {
+ const ir_dereference_record *const record_deref =
+(const ir_dereference_record *) ir;
+
+ ir = record_deref->record;
+
+ const int idx = ir->type->field_index(record_deref->field);
+ assert(idx >= 0);
+
+ const enum glsl_matrix_layout matrix_layout =
+glsl_matrix_layout(ir->type->fields.structure[idx].matrix_layout);
+
+ switch (matrix_layout) {
+ case GLSL_MATRIX_LAYOUT_INHERITED:
+break;
+ case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR:
+return false;
+ case GLSL_MATRIX_LAYOUT_ROW_MAJOR:
+return matrix || deref->type->without_array()->is_record();
+ }
+
+ break;
+  }
+
+  case ir_type_dereference_variable: {
+ const ir_dereference_variable *const var_deref =
+(const ir_dereference_variable *) ir;
+
+ const enum glsl_matrix_layout matrix_layout =
+glsl_matrix_layout(var_deref->var->data.matrix_layout);
+
+ switch (matrix_layout) {
+ case GLSL_MATRIX_LAYOUT_INHERITED:
+assert(!matrix);
+return false;
+ case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR:
+return false;
+ case GLSL_MATRIX_LAYOUT_ROW_MAJOR:
+return matrix || deref->type->without_array()->is_record();
+ }
+
+ unreachable("invalid matrix layout");
+ break;
+  }
+
+  default:
+ return false;
+  }
+   }
+
+   /* The tree must have ended with a dereference that wasn't an
+* ir_dereference_variable.  That is invalid, and it should be impossible.
+*/
+   unreachable("invalid dereference tree");
+   return false;
+}
+
 } /* namespace lower_buffer_access */
diff --git a/src/glsl/lower_buffer_access.h b/src/glsl/lower_buffer_access.h
index 3138963..0698e22 100644
--- a/src/glsl/lower_buffer_access.h
+++ b/src/glsl/lower_buffer_access.h
@@ -48,6 +48,8 @@ public:
 bool row_major, int matrix_columns,
 unsigned int packing, unsigned int write_mask);
 
+   bool is_dereferenced_thing_row_major(const ir_rvalue *deref);
+
void *mem_ctx;
 };
 
diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
index 8de4f5e..7e1221b 100644
--- a/src/glsl/lower_ubo_reference.cpp
+++ b/src/glsl/lower_ubo_reference.cpp
@@ -42,96 +42,6 @@
 
 using namespace ir_builder;
 
-/**
- * Determine if a thing being dereferenced is row-major
- *
- * There is some trickery here.
- *
- * If the thing being dereferenced is a member of uniform block \b without an
- * instance name, then the name of the \c ir_variable is the field name of an
- * interface type.  If this field is row-major, then the thing referenced is
- * row-major.
- *
- * If the thing being dereferenced is a member of uniform block \b with an
- * instance name, then the last dereference in the tree will be an
- * \c ir_dereference_record.  If that record field is row-major, then the
- * thing referenced is row-major.
- */
-static bool
-is_dereferenced_thing_row_major(const ir_rvalue *deref)
-{
-   bool matrix = false;
-   const ir_rvalue *ir = deref;
-
-   while (true) {
-  matrix = matrix || ir->type->without_a

[Mesa-dev] [PATCH] nv50: free interpolation parameters in nv50_program_destroy()

2015-11-14 Thread Samuel Pitoiset

As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.
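
To illustrate the kind of leak being fixed, here is a minimal sketch of
a destroy routine that releases every buffer the object owns before
clearing it. The struct and function below are hypothetical,
trimmed-down stand-ins for struct nv50_program and
nv50_program_destroy(), and std::free takes the place of the FREE()
macro.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical, trimmed-down stand-in for struct nv50_program.
struct program {
   void *code;
   void *fixups;
   void *interps;
   void *so;
};

// Free every owned buffer, then zero the struct so stale pointers
// cannot be reused.
void program_destroy(program *p)
{
   std::free(p->code);
   std::free(p->fixups);
   std::free(p->interps);   // the buffer that used to be leaked
   std::free(p->so);

   std::memset(p, 0, sizeof(*p));
}
```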

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 89e7a33..707bf7a 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -489,7 +489,7 @@ nv50_program_destroy(struct nv50_context *nv50, struct 
nv50_program *p)
FREE(p->code);
 
FREE(p->fixups);
-
+   FREE(p->interps);
FREE(p->so);
 
memset(p, 0, sizeof(*p));
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nv50: free interpolation parameters in nv50_program_destroy()

2015-11-14 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

Thanks! I missed this in all the commotion of trying to get it
actually working :)

On Sat, Nov 14, 2015 at 5:00 PM, Samuel Pitoiset
 wrote:
> As for nvc0, we need to free memory allocated by interpolation
> parameters. This fixes a memory leak spotted by valgrind.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_program.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
> index 89e7a33..707bf7a 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
> @@ -489,7 +489,7 @@ nv50_program_destroy(struct nv50_context *nv50, struct nv50_program *p)
> FREE(p->code);
>
> FREE(p->fixups);
> -
> +   FREE(p->interps);
> FREE(p->so);
>
> memset(p, 0, sizeof(*p));
> --
> 2.5.3
>

[Mesa-dev] [Bug 92954] [softpipe] piglit drawbuffer-modes regression

2015-11-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=92954

            Bug ID: 92954
           Summary: [softpipe] piglit drawbuffer-modes regression
           Product: Mesa
           Version: git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Keywords: bisected, regression
          Severity: normal
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: v...@freedesktop.org
        QA Contact: mesa-dev@lists.freedesktop.org
                CC: airl...@freedesktop.org

mesa: 3694d58e6c4a39bd84e8aef0d8e67c3ae9447f33 (master 11.1.0-devel)

$ ./bin/drawbuffer-modes -auto
Probe color at (20,20)
  Expected: 0.10 0.20 0.30 1.00
  Observed: 1.00 1.00 1.00 1.00
Probe color at (20,20)
  Expected: 0.30 0.40 0.50 1.00
  Observed: 1.00 1.00 1.00 1.00
Probe color at (20,20)
  Expected: 0.40 0.50 0.60 1.00
  Observed: 1.00 1.00 1.00 1.00
Probe color at (20,20)
  Expected: 0.60 0.70 0.80 1.00
  Observed: 1.00 1.00 1.00 1.00
glDrawBuffer(GL_NONE) modifies GL_FRONT_LEFT buffer
PIGLIT: {"result": "fail" }


2b676570960277d47477822ffeccc672613f9142 is the first bad commit
commit 2b676570960277d47477822ffeccc672613f9142
Author: Dave Airlie 
Date:   Fri Oct 9 01:38:08 2015 +0100

gallium/swrast: fix front buffer blitting. (v2)

So I've known this was broken before, cogl has a workaround
for it from what I know, but with the gallium based swrast
drivers BlitFramebuffer from back to front or vice-versa
was pretty broken.

The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.

It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it,
this pointer is then used to decide on map/unmap if the
contents should be updated from the real frontbuffer using
get/put image.

This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git

v2: bump extension version,
check extension version before calling get image. (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930

Cc: 
Signed-off-by: Dave Airlie 

:040000 040000 1726a5f7f01d9ae1fc76de431bb24cce16cf69ee 75f18ded95766bac58d0c37c96ffa071a4a32071 M include
:040000 040000 ee333146b90c837460ceab252445e3b38fc528d5 dc0656326c3fb9750fda1f6d3e75aeed99003bc6 M src
bisect run success


Re: [Mesa-dev] [PATCH 02/36] i965: Keep track of whether LRI is allowed in the context struct.

2015-11-14 Thread Jordan Justen

On 2015-11-14 13:43:38, Jordan Justen wrote:
> From: Francisco Jerez 
> 
> This stores the result of can_do_pipelined_register_writes() in the
> context struct so we can find out later whether LRI can be used to
> program the L3 configuration.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h  | 5 +
>  src/mesa/drivers/dri/i965/intel_extensions.c | 8 +---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 4b2db61..20d2dd0 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -840,6 +840,11 @@ struct brw_context
> bool use_resource_streamer;
>  
> /**
> +* Whether LRI can be used to write register values from the batch buffer.
> +*/
> +   bool can_do_pipelined_register_writes;
> +
> +   /**
>  * Some versions of Gen hardware don't do centroid interpolation correctly
>  * on unlit pixels, causing incorrect values for derivatives near triangle
>  * edges.  Enabling this flag causes the fragment shader to use
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 386b63c..800e97c 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -40,8 +40,8 @@
>  static bool
>  can_do_pipelined_register_writes(struct brw_context *brw)
>  {
> -   /* Supposedly, Broadwell just works. */
> -   if (brw->gen >= 8)
> +   /* Supposedly Broadwell just works, and earlier gens are sloppy. */
> +   if (brw->gen != 7)

I think this change should be moved into a separate patch.

Also, what about something like this comment instead?

   /**
    * gen >= 8 specifically allows these writes. gen <= 6 also
    * doesn't block them.
    */

With this split out, the patch is

Reviewed-by: Jordan Justen 

>return true;
>  
> static int result = -1;
> @@ -319,6 +319,8 @@ intelInitExtensions(struct gl_context *ctx)
> }
>  
> brw->predicate.supported = false;
> +   brw->can_do_pipelined_register_writes =
> +  can_do_pipelined_register_writes(brw);
>  
> if (brw->gen >= 7) {
>ctx->Extensions.ARB_conservative_depth = true;
> @@ -334,7 +336,7 @@ intelInitExtensions(struct gl_context *ctx)
>ctx->Extensions.ARB_texture_view = true;
>ctx->Extensions.ARB_shader_storage_buffer_object = true;
>  
> -  if (can_do_pipelined_register_writes(brw)) {
> +  if (brw->can_do_pipelined_register_writes) {
>   ctx->Extensions.ARB_draw_indirect = true;
>   ctx->Extensions.ARB_transform_feedback2 = true;
>   ctx->Extensions.ARB_transform_feedback3 = true;
> -- 
> 2.6.2
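
For readers skimming the thread, the idea in the quoted patch (run the
capability check once at init time, keep the answer in the context, and
let later code read the stored flag) can be sketched roughly as below.
This is only an illustration under stated assumptions: toy_context, its
fields and every toy_* function are hypothetical stand-ins, not the real
brw_context or i965 code, and the "runtime test" is faked with a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver context; not the real brw_context. */
struct toy_context {
   int gen;                               /* hardware generation */
   bool kernel_accepts_lri;               /* pretend outcome of the runtime test */
   int probe_runs;                        /* how many times the test actually ran */
   bool can_do_pipelined_register_writes; /* cached answer */
};

static bool
toy_probe_pipelined_register_writes(struct toy_context *ctx)
{
   /* In this sketch only gen 7 needs the runtime test, mirroring the
    * "gen != 7" early return discussed in the review. */
   if (ctx->gen != 7)
      return true;

   ctx->probe_runs++;
   return ctx->kernel_accepts_lri;
}

static void
toy_init_extensions(struct toy_context *ctx)
{
   assert(ctx != NULL);

   /* Run the check once and remember the result in the context. */
   ctx->can_do_pipelined_register_writes =
      toy_probe_pipelined_register_writes(ctx);
}

/* Later consumers (for example L3 setup) only read the cached flag. */
static bool
toy_can_program_l3_with_lri(const struct toy_context *ctx)
{
   return ctx->can_do_pipelined_register_writes;
}
```

Querying toy_can_program_l3_with_lri() any number of times after init
never re-runs the probe, which is the point of storing the flag.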
> 

Re: [Mesa-dev] [PATCH 01/36] i965: Define symbolic constants for some useful L3 cache control registers.

2015-11-14 Thread Jordan Justen

Reviewed-by: Jordan Justen 

On 2015-11-14 13:43:37, Jordan Justen wrote:
> From: Francisco Jerez 
> 
> ---
>  src/mesa/drivers/dri/i965/intel_reg.h | 53 +++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
> index a261c2b..0b167d5 100644
> --- a/src/mesa/drivers/dri/i965/intel_reg.h
> +++ b/src/mesa/drivers/dri/i965/intel_reg.h
> @@ -192,3 +192,56 @@
>  #define MI_PREDICATE_RESULT 0x2418
>  #define MI_PREDICATE_RESULT_1   0x241C
>  #define MI_PREDICATE_RESULT_2   0x2214
> +
> +/* L3 cache control registers. */
> +#define GEN7_L3SQCREG1 0xb010
> +/* L3SQ general and high priority credit initialization. */
> +# define IVB_L3SQCREG1_SQGHPCI_DEFAULT 0x0073
> +# define VLV_L3SQCREG1_SQGHPCI_DEFAULT 0x00d3
> +# define HSW_L3SQCREG1_SQGHPCI_DEFAULT 0x0061
> +# define GEN7_L3SQCREG1_CONV_DC_UC (1 << 24)
> +# define GEN7_L3SQCREG1_CONV_IS_UC (1 << 25)
> +# define GEN7_L3SQCREG1_CONV_C_UC  (1 << 26)
> +# define GEN7_L3SQCREG1_CONV_T_UC  (1 << 27)
> +
> +#define GEN7_L3CNTLREG2                    0xb020
> +# define GEN7_L3CNTLREG2_SLM_ENABLE        (1 << 0)
> +# define GEN7_L3CNTLREG2_URB_ALLOC_SHIFT   1
> +# define GEN7_L3CNTLREG2_URB_ALLOC_MASK    INTEL_MASK(6, 1)
> +# define GEN7_L3CNTLREG2_URB_LOW_BW        (1 << 7)
> +# define GEN7_L3CNTLREG2_ALL_ALLOC_SHIFT   8
> +# define GEN7_L3CNTLREG2_ALL_ALLOC_MASK    INTEL_MASK(13, 8)
> +# define GEN7_L3CNTLREG2_RO_ALLOC_SHIFT    14
> +# define GEN7_L3CNTLREG2_RO_ALLOC_MASK     INTEL_MASK(19, 14)
> +# define GEN7_L3CNTLREG2_RO_LOW_BW         (1 << 20)
> +# define GEN7_L3CNTLREG2_DC_ALLOC_SHIFT    21
> +# define GEN7_L3CNTLREG2_DC_ALLOC_MASK     INTEL_MASK(26, 21)
> +# define GEN7_L3CNTLREG2_DC_LOW_BW         (1 << 27)
> +
> +#define GEN7_L3CNTLREG3                    0xb024
> +# define GEN7_L3CNTLREG3_IS_ALLOC_SHIFT    1
> +# define GEN7_L3CNTLREG3_IS_ALLOC_MASK INTEL_MASK(6, 1)
> +# define GEN7_L3CNTLREG3_IS_LOW_BW (1 << 7)
> +# define GEN7_L3CNTLREG3_C_ALLOC_SHIFT 8
> +# define GEN7_L3CNTLREG3_C_ALLOC_MASK  INTEL_MASK(13, 8)
> +# define GEN7_L3CNTLREG3_C_LOW_BW  (1 << 14)
> +# define GEN7_L3CNTLREG3_T_ALLOC_SHIFT 15
> +# define GEN7_L3CNTLREG3_T_ALLOC_MASK  INTEL_MASK(20, 15)
> +# define GEN7_L3CNTLREG3_T_LOW_BW  (1 << 21)
> +
> +#define HSW_SCRATCH1   0xb038
> +#define HSW_SCRATCH1_L3_ATOMIC_DISABLE (1 << 27)
> +
> +#define HSW_ROW_CHICKEN3   0xe49c
> +#define HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE (1 << 6)
> +
> +#define GEN8_L3CNTLREG 0x7034
> +# define GEN8_L3CNTLREG_SLM_ENABLE (1 << 0)
> +# define GEN8_L3CNTLREG_URB_ALLOC_SHIFT    1
> +# define GEN8_L3CNTLREG_URB_ALLOC_MASK     INTEL_MASK(7, 1)
> +# define GEN8_L3CNTLREG_RO_ALLOC_SHIFT     11
> +# define GEN8_L3CNTLREG_RO_ALLOC_MASK      INTEL_MASK(17, 11)
> +# define GEN8_L3CNTLREG_DC_ALLOC_SHIFT     18
> +# define GEN8_L3CNTLREG_DC_ALLOC_MASK      INTEL_MASK(24, 18)
> +# define GEN8_L3CNTLREG_ALL_ALLOC_SHIFT    25
> +# define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25)
> -- 
> 2.6.2
> 

[Mesa-dev] [PATCH] nir/glsl: Fix copy-n-paste mistakes from commit 213f864.

2015-11-14 Thread Matt Turner

---
 src/glsl/nir/glsl_to_nir.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index b10d192..62eedbf 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -1538,9 +1538,9 @@ nir_visitor::visit(ir_expression *ir)
   result = supports_ints ? nir_ior(&b, srcs[0], srcs[1])
  : nir_for(&b, srcs[0], srcs[1]);
   break;
-   case ir_binop_logic_xor: result = nir_ixor(&b, srcs[0], srcs[1]); break;
-  result = supports_ints ? nir_ior(&b, srcs[0], srcs[1])
- : nir_for(&b, srcs[0], srcs[1]);
+   case ir_binop_logic_xor:
+  result = supports_ints ? nir_ixor(&b, srcs[0], srcs[1])
+ : nir_fxor(&b, srcs[0], srcs[1]);
   break;
case ir_binop_lshift: result = nir_ishl(&b, srcs[0], srcs[1]); break;
case ir_binop_rshift:
-- 
2.4.9


Re: [Mesa-dev] [PATCH] nir/glsl: Fix copy-n-paste mistakes from commit 213f864.

2015-11-14 Thread Connor Abbott

Whoops.

Reviewed-by: Connor Abbott 

On Sat, Nov 14, 2015 at 8:49 PM, Matt Turner  wrote:
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index b10d192..62eedbf 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -1538,9 +1538,9 @@ nir_visitor::visit(ir_expression *ir)
>result = supports_ints ? nir_ior(&b, srcs[0], srcs[1])
>   : nir_for(&b, srcs[0], srcs[1]);
>break;
> -   case ir_binop_logic_xor: result = nir_ixor(&b, srcs[0], srcs[1]); break;
> -  result = supports_ints ? nir_ior(&b, srcs[0], srcs[1])
> - : nir_for(&b, srcs[0], srcs[1]);
> +   case ir_binop_logic_xor:
> +  result = supports_ints ? nir_ixor(&b, srcs[0], srcs[1])
> + : nir_fxor(&b, srcs[0], srcs[1]);
>break;
> case ir_binop_lshift: result = nir_ishl(&b, srcs[0], srcs[1]); break;
> case ir_binop_rshift:
> --
> 2.4.9
>

Re: [Mesa-dev] [PATCH 0/6] i965: Do textures swizzle work-arounds in NIR.

2015-11-14 Thread Connor Abbott

Series is

Reviewed-by: Connor Abbott 

Although I'm not as familiar now with the code touched in the last patch.

On Thu, Nov 12, 2015 at 3:13 PM, Jason Ekstrand  wrote:
> The subject says it all.  This little series adds texture swizzle support
> to nir_lower_tex and makes the i965 driver use that instead of its own
> code.  This approach has a few advantages:
>
>  1) It lets us share the swizzling code between both backends and with
> anyone else who wants it.
>
>  2) It simplifies the work-arounds we have to do for RG32F textures because
> everything is pre-swizzled by the time the backend sees it.
>
>  3) If someone wants to come along and implement texture write-masking in
> the FS backend, they'll have an easier time of it.  One of my
> approaches to doing so was substantially complicated by the fact that,
> thanks to the swizzle, the channels used that we can see from NIR don't
> apply to the actual texturing instruction.  Now that the swizzling is
> done in NIR, this should be substantially simpler.
>
> Jason Ekstrand (6):
>   nir/validate: Validated dests after sources
>   nir: Use instr/if_rewrite in nir_ssa_def_rewrite_uses
>   nir: Add a ssa_def_rewrite_uses_after helper
>   nir: Add a tex_instr_is_query helper
>   nir/lower_tex: Add support for lowering texture swizzle
>   i965: Use NIR for lowering texture swizzle
>
>  src/glsl/nir/nir.c |  63 ---
>  src/glsl/nir/nir.h |  40 ++
>  src/glsl/nir/nir_lower_tex.c   |  67 
>  src/glsl/nir/nir_validate.c|  18 ++---
>  src/mesa/drivers/dri/i965/brw_fs.h |   4 -
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 105 -
>  src/mesa/drivers/dri/i965/brw_nir.c|  14 +++-
>  src/mesa/drivers/dri/i965/brw_vec4.h   |   4 -
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  24 +++---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  93 ++
>  10 files changed, 215 insertions(+), 217 deletions(-)
>
> --
> 2.5.0.400.gff86faf
>

[Mesa-dev] [PATCH 2/5] util/hash_table: don't compare deleted entries

2015-11-14 Thread Connor Abbott

The equivalent of the last patch for the hash table. I'm not aware of
any issues this fixes.

Signed-off-by: Connor Abbott 
---
 src/util/hash_table.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/util/hash_table.c b/src/util/hash_table.c
index 3247593..466519f 100644
--- a/src/util/hash_table.c
+++ b/src/util/hash_table.c
@@ -300,7 +300,8 @@ hash_table_insert(struct hash_table *ht, uint32_t hash,
* required to avoid memory leaks, perform a search
* before inserting.
*/
-  if (entry->hash == hash &&
+  if (entry_is_present(ht, entry) &&
+  entry->hash == hash &&
   ht->key_equals_function(key, entry->key)) {
  entry->key = key;
  entry->data = data;
-- 
2.4.3


[Mesa-dev] [PATCH 1/5] util/set: don't compare against deleted entries

2015-11-14 Thread Connor Abbott

Not sure how this wasn't already caught by valgrind, but it fixes an
issue with the vectorizer.

Signed-off-by: Connor Abbott 
---
 src/util/set.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/util/set.c b/src/util/set.c
index f01f869..331ff58 100644
--- a/src/util/set.c
+++ b/src/util/set.c
@@ -282,7 +282,8 @@ set_add(struct set *ht, uint32_t hash, const void *key)
* If freeing of old keys is required to avoid memory leaks,
* perform a search before inserting.
*/
-  if (entry->hash == hash &&
+  if (entry_is_present(entry) &&
+  entry->hash == hash &&
   ht->key_equals_function(key, entry->key)) {
  entry->key = key;
  return entry;
-- 
2.4.3
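
As a rough illustration of why the presence check matters in an
open-addressing table with tombstones, here is a minimal sketch. It is
not Mesa's util/set: the toy_* names, the fixed 8-slot table and the
compare counter are all assumptions made up for this example. A deleted
slot keeps its stale hash but its key is only a sentinel, so handing it
to the key-equality callback would compare against something that is not
a real key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy set with linear probing; not Mesa's util/set. */
#define TOY_SLOTS 8

/* The address of this object marks a deleted slot (a tombstone). */
static const char toy_deleted_marker = 0;
#define TOY_DELETED (&toy_deleted_marker)

struct toy_entry { uint32_t hash; const char *key; };
struct toy_set { struct toy_entry slots[TOY_SLOTS]; unsigned compares; };

static bool toy_is_free(const struct toy_entry *e) { return e->key == NULL; }
static bool toy_is_deleted(const struct toy_entry *e) { return e->key == TOY_DELETED; }
static bool toy_is_present(const struct toy_entry *e)
{
   return !toy_is_free(e) && !toy_is_deleted(e);
}

/* Counts calls so a caller can see when keys were really compared. */
static bool toy_key_equals(struct toy_set *s, const char *a, const char *b)
{
   s->compares++;
   return strcmp(a, b) == 0;
}

/* Deleting leaves the old hash behind; only the key becomes the sentinel. */
static void toy_remove(struct toy_entry *e) { e->key = TOY_DELETED; }

static struct toy_entry *
toy_add(struct toy_set *s, uint32_t hash, const char *key)
{
   struct toy_entry *available = NULL;

   for (unsigned i = 0; i < TOY_SLOTS; i++) {
      struct toy_entry *e = &s->slots[(hash + i) % TOY_SLOTS];

      if (!toy_is_present(e)) {
         if (available == NULL)
            available = e;
         if (toy_is_free(e))
            break;
      }

      /* Check presence first: a tombstone may still carry a matching
       * hash, but its key must never reach toy_key_equals(). */
      if (toy_is_present(e) && e->hash == hash &&
          toy_key_equals(s, key, e->key)) {
         e->key = key;
         return e;
      }
   }

   if (available != NULL) {
      available->hash = hash;
      available->key = key;
   }
   return available;
}
```

Adding a key, removing it and adding it again reuses the tombstoned slot
without ever calling the comparison callback on the sentinel.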


[Mesa-dev] [PATCH 0/5] NIR vectorization

2015-11-14 Thread Connor Abbott

This series adds an optimization to detect things like:

foo.x = bar.x + baz.x;
foo.y = bar.y + baz.y;
foo.z = bar.z + baz.z;
foo.w = bar.w + baz.w;

and turn them into:

foo = bar + baz;

which shows up distressingly often in shaders translated from D3D
bytecode, or by people who seemingly don't know that GLSL can do all
that vector-y stuff for you. There is a similar pass in GLSL IR, but it
can only handle operations on uniforms/inputs since it would've been too
difficult to handle more complicated cases. But now that we have NIR,
and the i965 vec4 backend uses NIR, we can write a much better pass
using the magic of SSA. Shader-db results are in the last patch.

Series is also available as the 'nir-vectorize' branch of my FDO repository.

Connor Abbott (5):
  util/set: don't compare against deleted entries
  util/hash_table: don't compare deleted entries
  nir/array: add a few more helpers
  nir: add a vectorization pass
  i965/nir: use vectorization for non-scalar stages

 src/glsl/Makefile.sources   |   1 +
 src/glsl/nir/nir.h  |   2 +
 src/glsl/nir/nir_array.h|  21 ++
 src/glsl/nir/nir_opt_vectorize.c| 447 
 src/mesa/drivers/dri/i965/brw_nir.c |   8 +
 src/util/hash_table.c   |   3 +-
 src/util/set.c  |   3 +-
 7 files changed, 483 insertions(+), 2 deletions(-)
 create mode 100644 src/glsl/nir/nir_opt_vectorize.c

-- 
2.4.3


[Mesa-dev] [PATCH 3/5] nir/array: add a few more helpers

2015-11-14 Thread Connor Abbott

Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir_array.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/glsl/nir/nir_array.h b/src/glsl/nir/nir_array.h
index 1db4e8c..d704119 100644
--- a/src/glsl/nir/nir_array.h
+++ b/src/glsl/nir/nir_array.h
@@ -84,13 +84,34 @@ nir_array_grow(nir_array *arr, size_t additional)
return ptr;
 }
 
+static inline void *
+nir_array_shrink(nir_array *arr, size_t less)
+{
+   assert(arr->size >= less);
+   arr->size -= less;
+   return (void *)((char *)arr->data + arr->size);
+}
+
 #define nir_array_add(arr, type, elem) \
*(type *)nir_array_grow(arr, sizeof(type)) = (elem)
 
+#define nir_array_pop(arr, type) \
+   (*(type *)(nir_array_shrink(arr, sizeof(type))))
+
+#define nir_array_first(arr, type) \
+   ((type *)(arr)->data)
+
+#define nir_array_size(arr, type) \
+   ((arr)->size / sizeof(type))
+
 #define nir_array_foreach(arr, type, elem) \
for (type *elem = (type *)(arr)->data; \
 elem < (type *)((char *)(arr)->data + (arr)->size); elem++)
 
+#define nir_array_foreach_reverse(arr, type, elem) \
+   for (type *elem = (type *)((char*)(arr)->data + (arr)->size) - 1; \
+(arr)->data && elem >= (type *)((arr)->data); elem--)
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
-- 
2.4.3
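
To show how helpers of this shape are meant to be used together, here is
a small self-contained sketch. It does not use nir_array itself:
toy_array and the toy_array_* functions and macros are hypothetical
stand-ins (backed by realloc rather than ralloc), written only to
illustrate the grow/add and shrink/pop pairing and the size-in-elements
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for nir_array: a growable byte buffer. */
struct toy_array {
   size_t size;   /* bytes in use */
   size_t alloc;  /* bytes allocated */
   void *data;
};

/* Reserve `additional` bytes at the end and return a pointer to them. */
static void *
toy_array_grow(struct toy_array *arr, size_t additional)
{
   size_t new_size = arr->size + additional;

   if (new_size > arr->alloc) {
      size_t new_alloc = arr->alloc ? arr->alloc * 2 : 64;
      while (new_alloc < new_size)
         new_alloc *= 2;
      void *p = realloc(arr->data, new_alloc);
      assert(p != NULL);
      arr->data = p;
      arr->alloc = new_alloc;
   }

   void *ptr = (char *)arr->data + arr->size;
   arr->size = new_size;
   return ptr;
}

/* Drop `less` bytes from the end and return a pointer to the dropped
 * region. The size is unsigned, so check before subtracting. */
static void *
toy_array_shrink(struct toy_array *arr, size_t less)
{
   assert(arr->size >= less);
   arr->size -= less;
   return (char *)arr->data + arr->size;
}

#define toy_array_add(arr, type, elem) \
   (*(type *)toy_array_grow(arr, sizeof(type)) = (elem))

#define toy_array_pop(arr, type) \
   (*(type *)toy_array_shrink(arr, sizeof(type)))

#define toy_array_count(arr, type) \
   ((arr)->size / sizeof(type))

/* Push three ints, pop one, and return the popped value. */
static int
toy_array_demo(void)
{
   struct toy_array arr = { 0, 0, NULL };

   toy_array_add(&arr, int, 10);
   toy_array_add(&arr, int, 20);
   toy_array_add(&arr, int, 30);

   int last = toy_array_pop(&arr, int);
   free(arr.data);
   return last;
}
```

Used this way the array behaves like a stack, which is the kind of use
the vectorization pass in this series describes for its stack of
equivalent instructions.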


[Mesa-dev] [PATCH 4/5] nir: add a vectorization pass

2015-11-14 Thread Connor Abbott

This effectively does the opposite of nir_lower_alus_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a model
similar to CSE, where we take a depth-first approach and keep around a
hash set of instructions to be combined, but there are a few major
differences:

1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
   equivalent instructions, we keep a stack of equivalent instructions
   around, trying to combine new instructions with instructions on the
   stack.

The pass is far from comprehensive; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.
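
The core idea (treat ALU ops as equivalent when they match in everything
except the swizzle, then merge them) can be sketched in a few lines. This
is a toy model under stated assumptions, not the pass in this patch: the
toy_* types are invented, sources are plain integer ids standing in for
SSA values, a linear scan stands in for the hash set, and dominance,
write masks and the stack of candidates are all ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_ADD, TOY_MUL };

/* One per-component ALU op: result = op(src0.<swz0>, src1.<swz1>).
 * src0 and src1 are made-up ids standing in for SSA values. */
struct toy_scalar_alu {
   enum toy_op op;
   int src0, src1;
   int swz0, swz1;
};

/* A combined op covering up to four components. */
struct toy_vector_alu {
   enum toy_op op;
   int src0, src1;
   int swz0[4], swz1[4];
   int num_components;
};

/* Equality deliberately ignores the swizzles: ops that differ only in
 * swizzle are the candidates for merging. */
static bool
toy_can_merge(const struct toy_vector_alu *vec,
              const struct toy_scalar_alu *alu)
{
   return vec->num_components < 4 &&
          vec->op == alu->op &&
          vec->src0 == alu->src0 &&
          vec->src1 == alu->src1;
}

/* Merges the n ops in `in` and returns how many entries were written to
 * `out`, which must have room for n entries. */
static int
toy_vectorize(const struct toy_scalar_alu *in, int n,
              struct toy_vector_alu *out)
{
   int num_out = 0;

   for (int i = 0; i < n; i++) {
      struct toy_vector_alu *dst = NULL;

      /* A linear scan stands in for the hash set lookup. */
      for (int j = 0; j < num_out && dst == NULL; j++) {
         if (toy_can_merge(&out[j], &in[i]))
            dst = &out[j];
      }

      if (dst == NULL) {
         dst = &out[num_out++];
         dst->op = in[i].op;
         dst->src0 = in[i].src0;
         dst->src1 = in[i].src1;
         dst->num_components = 0;
      }

      /* Record this op's swizzles as the next component of the vector. */
      dst->swz0[dst->num_components] = in[i].swz0;
      dst->swz1[dst->num_components] = in[i].swz1;
      dst->num_components++;
   }

   return num_out;
}
```

Feeding it four adds on the same two sources with swizzles x, y, z and w
yields a single four-component add, which is the foo = bar + baz case
from the cover letter.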

Signed-off-by: Connor Abbott 
---
 src/glsl/Makefile.sources|   1 +
 src/glsl/nir/nir.h   |   2 +
 src/glsl/nir/nir_opt_vectorize.c | 447 +++
 3 files changed, 450 insertions(+)
 create mode 100644 src/glsl/nir/nir_opt_vectorize.c

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index d4b02c1..7390975 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -70,6 +70,7 @@ NIR_FILES = \
nir/nir_opt_peephole_select.c \
nir/nir_opt_remove_phis.c \
nir/nir_opt_undef.c \
+   nir/nir_opt_vectorize.c \
nir/nir_print.c \
nir/nir_remove_dead_variables.c \
nir/nir_search.c \
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index beabcaf..c1c04fd 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -2037,6 +2037,8 @@ bool nir_opt_remove_phis(nir_shader *shader);
 
 bool nir_opt_undef(nir_shader *shader);
 
+bool nir_opt_vectorize(nir_shader *shader);
+
 void nir_sweep(nir_shader *shader);
 
 nir_intrinsic_op nir_intrinsic_from_system_value(gl_system_value val);
diff --git a/src/glsl/nir/nir_opt_vectorize.c b/src/glsl/nir/nir_opt_vectorize.c
new file mode 100644
index 0000000..2a34a42
--- /dev/null
+++ b/src/glsl/nir/nir_opt_vectorize.c
@@ -0,0 +1,447 @@
+/*
+ * Copyright © 2015 Connor Abbott
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "nir.h"
+#include "nir_vla.h"
+#include "nir_builder.h"
+#include  "nir_array.h"
+
+#define HASH(hash, data) _mesa_fnv32_1a_accumulate((hash), (data))
+
+static uint32_t
+hash_src(uint32_t hash, const nir_src *src)
+{
+   assert(src->is_ssa);
+
+   return HASH(hash, src->ssa);
+}
+
+static uint32_t
+hash_alu_src(uint32_t hash, const nir_alu_src *src)
+{
+   assert(!src->abs && !src->negate);
+
+   /* intentionally don't hash swizzle */
+
+   return hash_src(hash, &src->src);
+}
+
+static uint32_t
+hash_alu(uint32_t hash, const nir_alu_instr *instr)
+{
+   hash = HASH(hash, instr->op);
+   
+   for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++)
+  hash = hash_alu_src(hash, &instr->src[i]);
+
+   return hash;
+}
+
+static uint32_t
+hash_instr(const nir_instr *instr)
+{
+   uint32_t hash = _mesa_fnv32_1a_offset_bias;
+
+   switch (instr->type) {
+   case nir_instr_type_alu:
+  return hash_alu(hash, nir_instr_as_alu(instr));
+   default:
+  unreachable("bad instruction type");
+   }
+}
+
+static bool
+srcs_equal(const nir_src *src1, const nir_src *src2)
+{
+   assert(src1->is_ssa);
+   assert(src2->is_ssa);
+
+   return src1->ssa == src2->ssa;
+}
+
+static bool
+alu_srcs_equal(const nir_alu_src *src1, const nir_alu_src *src2)
+{
+   assert(!src1->abs);
+   assert(!src1->negate);
+   assert(!src2->abs);
+   assert(!src2->negate);
+
+   return srcs_equal(&src1->src, &src2->src);
+}
+
+static bool
+instrs_equal(const nir_instr *instr1, const nir_instr *instr2)
+{
+   switch (instr1->type) {
+   case nir_instr_type_alu: {
+  nir_alu_instr *alu1 = nir_instr_as_alu(instr1);

[Mesa-dev] [PATCH 5/5] i965/nir: use vectorization for non-scalar stages

2015-11-14 Thread Connor Abbott

Shader-db results on bdw with INTEL_DEBUG=vec4:

total instructions in shared programs: 1634044 -> 1612936 (-1.29%)
instructions in affected programs: 802502 -> 781394 (-2.63%)
helped: 5036
HURT: 1442

total cycles in shared programs: 9397790 -> 9355382 (-0.45%)
cycles in affected programs: 5078600 -> 5036192 (-0.84%)
helped: 3875
HURT: 2554

LOST:   0
GAINED: 0

Most of the hurt programs seem to be because we generate extra MOV's due
to vectorizing things. For example, in
shaders/non-free/steam/anomaly-2/158.shader_test, this:

add(8)  g116<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
add(8)  g117<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
add(8)  g116<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
add(8)  g117<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };

Turns into this:

add(8)  g13<1>F g12<4,4,1>.xyxyF g1.4<0,4,1>F   { align16 1Q };
add(8)  g14<1>F g12<4,4,1>.xyxyF -g1.4<0,4,1>F  { align16 1Q };
mov(8)  g116<1>.xyD g13<4,4,1>.xyyyD { align16 NoDDClr 1Q };
mov(8)  g117<1>.xyD g13<4,4,1>.zwwwD { align16 NoDDClr 1Q };
mov(8)  g116<1>.zwD g14<4,4,1>.xxxyD { align16 NoDDChk 1Q };
mov(8)  g117<1>.zwD g14<4,4,1>.zzzwD { align16 NoDDChk 1Q };

So we eliminated two add's, but then had to introduce four mov's to
transpose the result. I don't think there's much we can do about this at
the NIR level, unfortunately.

Signed-off-by: Connor Abbott 
---
 src/mesa/drivers/dri/i965/brw_nir.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index fe5cad4..29cafe6 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -198,6 +198,14 @@ nir_optimize(nir_shader *nir, bool is_scalar)
   nir_validate_shader(nir);
   progress |= nir_opt_cse(nir);
   nir_validate_shader(nir);
+
+  if (!is_scalar) {
+ progress |= nir_opt_vectorize(nir);
+ nir_validate_shader(nir);
+ progress |= nir_copy_prop(nir);
+ nir_validate_shader(nir);
+  }
+
   progress |= nir_opt_peephole_select(nir);
   nir_validate_shader(nir);
   progress |= nir_opt_algebraic(nir);
-- 
2.4.3
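
The hunk above adds two passes to a loop that keeps running as long as
any pass reports progress. A minimal sketch of that "iterate to a fixed
point" pattern follows. Everything in it is hypothetical: toy_shader and
the toy_* passes are invented counters that only mimic the control flow
(vectorizing leaves copies behind, copy propagation cleans them up) and
say nothing about what the real NIR passes do internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shader summary: just two counters. */
struct toy_shader {
   int scalar_ops;  /* per-component ops still waiting to be merged */
   int copies;      /* copies left behind by merging */
};

/* Pretend pass: merges four scalar ops and leaves one copy behind. */
static bool
toy_opt_vectorize(struct toy_shader *s)
{
   if (s->scalar_ops < 4)
      return false;

   s->scalar_ops -= 4;
   s->copies += 1;
   return true;
}

/* Pretend pass: removes all copies, if there are any. */
static bool
toy_copy_prop(struct toy_shader *s)
{
   if (s->copies == 0)
      return false;

   s->copies = 0;
   return true;
}

/* Runs the passes until none of them reports progress and returns the
 * number of loop iterations. The vectorizer is skipped for scalar
 * stages, as in the hunk above. */
static int
toy_optimize(struct toy_shader *s, bool is_scalar)
{
   int iterations = 0;
   bool progress;

   assert(s != NULL);

   do {
      progress = false;

      if (!is_scalar) {
         progress |= toy_opt_vectorize(s);
         progress |= toy_copy_prop(s);
      }

      iterations++;
   } while (progress);

   return iterations;
}
```

The final iteration is always one in which nothing changes, so a stage
that gets no benefit still pays for one extra trip through the passes.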


Re: [Mesa-dev] [PATCH 0/6] i965: Do textures swizzle work-arounds in NIR.

2015-11-14 Thread Jason Ekstrand

On Nov 14, 2015 6:40 PM, "Connor Abbott"  wrote:
>
> Series is
>
> Reviewed-by: Connor Abbott 

Thanks!  I'll make sure to have ken or matt take a quick look at the last
one.

Now, if only someone would review the nir_shader_clone patch...

> Although I'm not as familiar now with the code touched in the last patch.
>
> On Thu, Nov 12, 2015 at 3:13 PM, Jason Ekstrand wrote:
> > The subject says it all.  This little series adds texture swizzle support
> > to nir_lower_tex and makes the i965 driver use that instead of its own
> > code.  This approach has a few advantages:
> >
> >  1) It lets us share the swizzling code between both backends and with
> > anyone else who wants it.
> >
> >  2) It simplifies the work-arounds we have to do for RG32F textures because
> > everything is pre-swizzled by the time the backend sees it.
> >
> >  3) If someone wants to come along and implement texture write-masking in
> > the FS backend, they'll have an easier time of it.  One of my
> > approaches to doing so was substantially complicated by the fact that,
> > thanks to the swizzle, the channels used that we can see from NIR don't
> > apply to the actual texturing instruction.  Now that the swizzling is
> > done in NIR, this should be substantially simpler.
> >
> > Jason Ekstrand (6):
> >   nir/validate: Validated dests after sources
> >   nir: Use instr/if_rewrite in nir_ssa_def_rewrite_uses
> >   nir: Add a ssa_def_rewrite_uses_after helper
> >   nir: Add a tex_instr_is_query helper
> >   nir/lower_tex: Add support for lowering texture swizzle
> >   i965: Use NIR for lowering texture swizzle
> >
> >  src/glsl/nir/nir.c |  63 ---
> >  src/glsl/nir/nir.h |  40 ++
> >  src/glsl/nir/nir_lower_tex.c   |  67 
> >  src/glsl/nir/nir_validate.c|  18 ++---
> >  src/mesa/drivers/dri/i965/brw_fs.h |   4 -
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 105 -
> >  src/mesa/drivers/dri/i965/brw_nir.c            |  14 +++-
> >  src/mesa/drivers/dri/i965/brw_vec4.h           |   4 -
> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp     |  24 +++---
> >  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  93 ++
> >  10 files changed, 215 insertions(+), 217 deletions(-)
> >
> > --
> > 2.5.0.400.gff86faf
> >
