Re: [Mesa-dev] [PATCH 5/7] mesa: enable enums for OES_geometry_shader

2015-09-24 Thread Lofstedt, Marta
> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Kenneth Graunke
> Sent: Wednesday, September 23, 2015 6:25 PM
> To: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 5/7] mesa: enable enums for
> OES_geometry_shader
> 
> On Wednesday, September 23, 2015 08:50:34 AM Matt Turner wrote:
> > On Wed, Sep 23, 2015 at 8:45 AM, Ilia Mirkin 
> wrote:
> > > These 2 should probably be in the below GL_CORE/GLES31 section. As
> > > probably should all or most of these, but ... meh. I guess someone
> > > still has ARB_gs4 aspirations.
> >
> > I don't think so actually -- reading through these patches made me
> > wonder if we should strip out the ARB_geometry_shader4 cruft.
> 
> +1.  I don't think we ever want to support it.

I would happily toss out the ARB_shader_geometry_shader4 stuff and rebase my 
GL_OES_geometry_shader patches on top of that, but I would need some 
confirmation that this is actually the way we should go forward.
I will wait a couple of days for feedback, if no one is against it will soon be 
gone.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/7] mesa: Add extension enable for GL_OES_geometry_shader

2015-09-24 Thread Lofstedt, Marta
> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Ilia Mirkin
> Sent: Wednesday, September 23, 2015 5:17 PM
> To: Marta Lofstedt
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 3/7] mesa: Add extension enable for
> GL_OES_geometry_shader
> 
> On Wed, Sep 23, 2015 at 4:42 AM, Marta Lofstedt
>  wrote:
> > From: Marta Lofstedt 
> >
> > Exposes the GL_OES_geometry_shader extension for OpenGL ES 3.1.
> >
> > Signed-off-by: Marta Lofstedt 
> > ---
> >  src/mesa/main/extensions.c  | 1 +
> >  src/mesa/main/mtypes.h  | 1 +
> >  src/mesa/main/tests/dispatch_sanity.cpp | 3 +++
> >  3 files changed, 5 insertions(+)
> >
> > diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> > index b2c88c3..5cfcd43 100644
> > --- a/src/mesa/main/extensions.c
> > +++ b/src/mesa/main/extensions.c
> > @@ -314,6 +314,7 @@ static const struct extension extension_table[] = {
> > { "GL_OES_fbo_render_mipmap",   o(dummy_true),
> ES1 | ES2, 2005 },
> > { "GL_OES_fixed_point", o(dummy_true),
> ES1,   2002 },
> > { "GL_OES_framebuffer_object",  o(dummy_true),
> ES1,   2005 },
> > +   { "GL_OES_geometry_shader", o(OES_geometry_shader),
> ES31, 2015 },
> > { "GL_OES_get_program_binary",  o(dummy_true),
> ES2, 2008 },
> > { "GL_OES_mapbuffer",   o(dummy_true),
> ES1 | ES2, 2005 },
> > { "GL_OES_packed_depth_stencil",o(dummy_true),
> ES1 | ES2, 2007 },
> > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index
> > fac45aa..2886f19 100644
> > --- a/src/mesa/main/mtypes.h
> > +++ b/src/mesa/main/mtypes.h
> > @@ -3770,6 +3770,7 @@ struct gl_extensions
> > GLboolean OES_texture_half_float;
> > GLboolean OES_texture_half_float_linear;
> > GLboolean OES_compressed_ETC1_RGB8_texture;
> > +   GLboolean OES_geometry_shader;
> > GLboolean extension_sentinel;
> > /** The extension string */
> > const GLubyte *String;
> > diff --git a/src/mesa/main/tests/dispatch_sanity.cpp
> > b/src/mesa/main/tests/dispatch_sanity.cpp
> > index 0ddda59..cce05d9 100644
> > --- a/src/mesa/main/tests/dispatch_sanity.cpp
> > +++ b/src/mesa/main/tests/dispatch_sanity.cpp
> > @@ -2481,5 +2481,8 @@ const struct function gles31_functions_possible[]
> = {
> > /* GL_OES_texture_storage_multisample_2d_array */
> > { "glTexStorage3DMultisampleOES", 31, -1 },
> >
> > +   /*GL_OES_geometry_shader*/
> > +   { "glFramebufferTextureOES", 31, -1},
> 
> Shouldn't this go into a previous patch, the one that adds the function in the
> glapi xml?

OK, thanks Ilia I will fix this for the V2s.

> 
> > +
> > { NULL, 0, -1 },
> >   };
> > --
> > 2.1.4
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] glsl: add support for GL_OES_geometry_shader

2015-09-24 Thread Lofstedt, Marta
> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Ilia Mirkin
> Sent: Wednesday, September 23, 2015 5:20 PM
> To: Marta Lofstedt
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 4/7] glsl: add support for
> GL_OES_geometry_shader
> 
> On Wed, Sep 23, 2015 at 4:42 AM, Marta Lofstedt
>  wrote:
> > From: Marta Lofstedt 
> >
> > This adds glsl support of GL_OES_geometry_shader for OpenGL ES 3.1.
> >
> > Signed-off-by: Marta Lofstedt 
> > ---
> >  src/glsl/builtin_variables.cpp  | 16 +---
> >  src/glsl/glsl_parser.yy |  4 ++--
> >  src/glsl/glsl_parser_extras.cpp |  1 +
> >  src/glsl/glsl_parser_extras.h   |  2 ++
> >  4 files changed, 10 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/glsl/builtin_variables.cpp
> > b/src/glsl/builtin_variables.cpp index b5e2908..cd8d61c 100644
> > --- a/src/glsl/builtin_variables.cpp
> > +++ b/src/glsl/builtin_variables.cpp
> > @@ -611,7 +611,7 @@ builtin_variable_generator::generate_constants()
> >add_const("gl_MaxVaryingComponents", state->ctx-
> >Const.MaxVarying * 4);
> > }
> >
> > -   if (state->is_version(150, 0)) {
> > +   if (state->is_version(150, 0) ||
> > + state->OES_geometry_shader_enable) {
> 
> While you're at it, throw in 320 as well? (both here and below)

Yeah, cool then there are no reason for objections about the having the 
GL_OES_geometry_shader under GLES 3.2 in docs/GL3.txt. 
Thanks Ilia!

> 
> >add_const("gl_MaxVertexOutputComponents",
> >  state->Const.MaxVertexOutputComponents);
> >add_const("gl_MaxGeometryInputComponents",
> > @@ -674,10 +674,7 @@ builtin_variable_generator::generate_constants()
> >add_const("gl_MaxAtomicCounterBindings",
> >  state->Const.MaxAtomicBufferBindings);
> >
> > -  /* When Mesa adds support for GL_OES_geometry_shader and
> > -   * GL_OES_tessellation_shader, this will need to change.
> > -   */
> > -  if (!state->es_shader) {
> > +  if (!state->es_shader || state->OES_geometry_shader_enable) {
> >   add_const("gl_MaxGeometryAtomicCounters",
> > state->Const.MaxGeometryAtomicCounters);
> >
> > @@ -701,10 +698,7 @@ builtin_variable_generator::generate_constants()
> >add_const("gl_MaxAtomicCounterBufferSize",
> >  state->Const.MaxAtomicCounterBufferSize);
> >
> > -  /* When Mesa adds support for GL_OES_geometry_shader and
> > -   * GL_OES_tessellation_shader, this will need to change.
> > -   */
> > -  if (!state->es_shader) {
> > +  if (!state->es_shader || state->OES_geometry_shader_enable) {
> >   add_const("gl_MaxGeometryAtomicCounterBuffers",
> > state->Const.MaxGeometryAtomicCounterBuffers);
> >   add_const("gl_MaxTessControlAtomicCounterBuffers",
> > @@ -762,7 +756,7 @@ builtin_variable_generator::generate_constants()
> >add_const("gl_MaxCombinedImageUniforms",
> >  state->Const.MaxCombinedImageUniforms);
> >
> > -  if (!state->es_shader) {
> > +  if (!state->es_shader || state->OES_geometry_shader_enable) {
> >   add_const("gl_MaxCombinedImageUnitsAndFragmentOutputs",
> > state->Const.MaxCombinedShaderOutputResources);
> >   add_const("gl_MaxImageSamples", @@ -993,7 +987,7 @@
> > builtin_variable_generator::generate_fs_special_vars()
> > if (state->is_version(120, 100))
> >add_input(VARYING_SLOT_PNTC, vec2_t, "gl_PointCoord");
> >
> > -   if (state->is_version(150, 0)) {
> > +   if (state->is_version(150, 0) ||
> > + state->OES_geometry_shader_enable) {
> >ir_variable *var =
> >   add_input(VARYING_SLOT_PRIMITIVE_ID, int_t, "gl_PrimitiveID");
> >var->data.interpolation = INTERP_QUALIFIER_FLAT; diff --git
> > a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index
> > 59e4527..ab50c4b 100644
> > --- a/src/glsl/glsl_parser.yy
> > +++ b/src/glsl/glsl_parser.yy
> > @@ -1250,7 +1250,7 @@ layout_qualifier_id:
> >  }
> >   }
> >
> > - if ($$.flags.i && !state->is_version(150, 0)) {
> > + if ($$.flags.i && !state->is_version(150, 310)) {
> 
> Should this be 320 + && !state->OES_geometry_shader_enable ? (Also
> below.)
> 
> >  _mesa_glsl_error(& @1, state, "#version 150 layout "
> >   "qualifier `%s' used", $1);
> >   }
> > @@ -1491,7 +1491,7 @@ layout_qualifier_id:
> >  YYERROR;
> >   } else {
> >  $$.max_vertices = $3;
> > -if (!state->is_version(150, 0)) {
> > +if (!state->is_version(150, 310)) {
> > _mesa_glsl_error(& @3, state,
> >  "#version 150 max_vertices qualifier "
> >  "specified", $3); diff --git
> > a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
> > index dae5261..1cb34df 1

[Mesa-dev] [PATCH v2] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Alejandro Piñeiro
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.

For example, this is what vec4_visitor::fix_3src_operand() is already
doing:

   if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
  return src;

Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped:7930
HURT:  0

Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped:2533
HURT:  0

v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
  Ekstrand suggested
- Clean up the condition
---

Removed RFC as both Matt Turner and Jason Ekstrand accepted the change.

I updated the patch as the condition was just ugly. As I made that change
I also included Jason suggestion to move the check after the swizzle
on the final value is computed. I know that the first version got a
"Reviewed" by Jason, but due all these change, I prefer to be conservative,
and ask for a new review.

Thanks in advance.

 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index d3f0ddd..ce4d39d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -325,9 +325,6 @@ try_copy_propagate(const struct brw_device_info *devinfo,
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
   return false;
 
-   if (inst->is_3src() && value.file == UNIFORM)
-  return false;
-
if (inst->is_send_from_grf())
   return false;
 
@@ -373,6 +370,13 @@ try_copy_propagate(const struct brw_device_info *devinfo,
}
 
/* Build the final value */
+   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
+   value.swizzle);
+   /* One last check based on the final swizzle */
+   if (inst->is_3src() && value.file == UNIFORM &&
+   !brw_is_single_value_swizzle(value.swizzle))
+  return false;
+
if (inst->src[arg].abs) {
   value.negate = false;
   value.abs = true;
@@ -380,8 +384,6 @@ try_copy_propagate(const struct brw_device_info *devinfo,
if (inst->src[arg].negate)
   value.negate = !value.negate;
 
-   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
-   value.swizzle);
if (has_source_modifiers &&
value.type != inst->src[arg].type) {
   assert(can_change_source_types(inst));
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 2/3] mesa: Move _mesa_base_tex_format() from teximage to glformats files

2015-09-24 Thread Eduardo Lima Mitev
This function will be needed as part of validating the combination of format,
type and internal format of texture pixel operations, which happens in
glformats files. Specifically, we want to be able to obtain the base format
of a resolved effective internal format, to compare it with the original
internal format passed.

Also, since this function deals solely with GL formats, it fits better in
glformats where the rest of similar format functionality rests.

The function is moved as-is, without any modification.

Reviewed-by: Jason Ekstrand 
Cc: "11.0" 
---
 src/mesa/main/glformats.c | 505 ++
 src/mesa/main/glformats.h |   2 +
 src/mesa/main/teximage.c  | 374 --
 src/mesa/main/teximage.h  |   4 -
 4 files changed, 507 insertions(+), 378 deletions(-)

diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
index 6cfffdb..515b06e 100644
--- a/src/mesa/main/glformats.c
+++ b/src/mesa/main/glformats.c
@@ -2139,6 +2139,511 @@ _mesa_es_error_check_format_and_type(GLenum format, 
GLenum type,
return type_valid ? GL_NO_ERROR : GL_INVALID_OPERATION;
 }
 
+/**
+ * Return the simple base format for a given internal texture format.
+ * For example, given GL_LUMINANCE12_ALPHA4, return GL_LUMINANCE_ALPHA.
+ *
+ * \param ctx GL context.
+ * \param internalFormat the internal texture format token or 1, 2, 3, or 4.
+ *
+ * \return the corresponding \u base internal format (GL_ALPHA, GL_LUMINANCE,
+ * GL_LUMANCE_ALPHA, GL_INTENSITY, GL_RGB, or GL_RGBA), or -1 if invalid enum.
+ *
+ * This is the format which is used during texture application (i.e. the
+ * texture format and env mode determine the arithmetic used.
+ */
+GLint
+_mesa_base_tex_format(const struct gl_context *ctx, GLint internalFormat)
+{
+   switch (internalFormat) {
+   case GL_ALPHA:
+   case GL_ALPHA4:
+   case GL_ALPHA8:
+   case GL_ALPHA12:
+   case GL_ALPHA16:
+  return (ctx->API != API_OPENGL_CORE) ? GL_ALPHA : -1;
+   case 1:
+   case GL_LUMINANCE:
+   case GL_LUMINANCE4:
+   case GL_LUMINANCE8:
+   case GL_LUMINANCE12:
+   case GL_LUMINANCE16:
+  return (ctx->API != API_OPENGL_CORE) ? GL_LUMINANCE : -1;
+   case 2:
+   case GL_LUMINANCE_ALPHA:
+   case GL_LUMINANCE4_ALPHA4:
+   case GL_LUMINANCE6_ALPHA2:
+   case GL_LUMINANCE8_ALPHA8:
+   case GL_LUMINANCE12_ALPHA4:
+   case GL_LUMINANCE12_ALPHA12:
+   case GL_LUMINANCE16_ALPHA16:
+  return (ctx->API != API_OPENGL_CORE) ? GL_LUMINANCE_ALPHA : -1;
+   case GL_INTENSITY:
+   case GL_INTENSITY4:
+   case GL_INTENSITY8:
+   case GL_INTENSITY12:
+   case GL_INTENSITY16:
+  return (ctx->API != API_OPENGL_CORE) ? GL_INTENSITY : -1;
+   case 3:
+  return (ctx->API != API_OPENGL_CORE) ? GL_RGB : -1;
+   case GL_RGB:
+   case GL_R3_G3_B2:
+   case GL_RGB4:
+   case GL_RGB5:
+   case GL_RGB8:
+   case GL_RGB10:
+   case GL_RGB12:
+   case GL_RGB16:
+  return GL_RGB;
+   case 4:
+  return (ctx->API != API_OPENGL_CORE) ? GL_RGBA : -1;
+   case GL_RGBA:
+   case GL_RGBA2:
+   case GL_RGBA4:
+   case GL_RGB5_A1:
+   case GL_RGBA8:
+   case GL_RGB10_A2:
+   case GL_RGBA12:
+   case GL_RGBA16:
+  return GL_RGBA;
+   default:
+  ; /* fallthrough */
+   }
+
+   /* GL_BGRA can be an internal format *only* in OpenGL ES (1.x or 2.0).
+*/
+   if (_mesa_is_gles(ctx)) {
+  switch (internalFormat) {
+  case GL_BGRA:
+ return GL_RGBA;
+  default:
+ ; /* fallthrough */
+  }
+   }
+
+   if (ctx->Extensions.ARB_ES2_compatibility) {
+  switch (internalFormat) {
+  case GL_RGB565:
+ return GL_RGB;
+  default:
+ ; /* fallthrough */
+  }
+   }
+
+   if (ctx->Extensions.ARB_depth_texture) {
+  switch (internalFormat) {
+  case GL_DEPTH_COMPONENT:
+  case GL_DEPTH_COMPONENT16:
+  case GL_DEPTH_COMPONENT24:
+  case GL_DEPTH_COMPONENT32:
+ return GL_DEPTH_COMPONENT;
+  case GL_DEPTH_STENCIL:
+  case GL_DEPTH24_STENCIL8:
+ return GL_DEPTH_STENCIL;
+  default:
+ ; /* fallthrough */
+  }
+   }
+
+   if (ctx->Extensions.ARB_texture_stencil8) {
+  switch (internalFormat) {
+  case GL_STENCIL_INDEX:
+  case GL_STENCIL_INDEX1:
+  case GL_STENCIL_INDEX4:
+  case GL_STENCIL_INDEX8:
+  case GL_STENCIL_INDEX16:
+ return GL_STENCIL_INDEX;
+  default:
+ ; /* fallthrough */
+  }
+   }
+
+   switch (internalFormat) {
+   case GL_COMPRESSED_ALPHA:
+  return GL_ALPHA;
+   case GL_COMPRESSED_LUMINANCE:
+  return GL_LUMINANCE;
+   case GL_COMPRESSED_LUMINANCE_ALPHA:
+  return GL_LUMINANCE_ALPHA;
+   case GL_COMPRESSED_INTENSITY:
+  return GL_INTENSITY;
+   case GL_COMPRESSED_RGB:
+  return GL_RGB;
+   case GL_COMPRESSED_RGBA:
+  return GL_RGBA;
+   default:
+  ; /* fallthrough */
+   }
+
+   if (ctx->Extensions.TDFX_texture_compression_FXT1) {
+  switch (internalFormat) {
+  case GL_COMPRESSED_RGB_FXT1_3DFX:
+ return

[Mesa-dev] [PATCH v5 3/3] mesa: Use the effective internal format instead for validation

2015-09-24 Thread Eduardo Lima Mitev
When validating format+type+internalFormat for texture pixel operations
on GLES3, the effective internal format should be used if the one
specified is an unsized internal format. Page 127, section "3.8 Texturing"
of the GLES 3.0.4 spec says:

"if internalformat is a base internal format, the effective internal
 format is a sized internal format that is derived from the format and
 type for internal use by the GL. Table 3.12 specifies the mapping of
 format and type to effective internal formats. The effective internal
 format is used by the GL for purposes such as texture completeness or
 type checks for CopyTex* commands. In these cases, the GL is required
 to operate as if the effective internal format was used as the
 internalformat when specifying the texture data."

v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
considered sized internal formats. Return the corresponding unsize format
instead.

v4: * Improved comments in
  _mesa_es3_effective_internal_format_for_format_and_type().
* Splitted patch to separate chunk about reordering of
  error_check_subtexture_dimensions() error check, which is not directly
  related with this patch.
v5: Dropped the splitted patch because it was actually a work around 3
dEQP tests that are buggy:

dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91582
---
 src/mesa/main/glformats.c | 151 ++
 1 file changed, 151 insertions(+)

diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
index 515b06e..7dab33c 100644
--- a/src/mesa/main/glformats.c
+++ b/src/mesa/main/glformats.c
@@ -2646,6 +2646,127 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
GLint internalFormat)
 }
 
 /**
+ * Returns the effective internal format from a texture format and type.
+ * This is used by texture image operations internally for validation, when
+ * the specified internal format is a base (unsized) format.
+ *
+ * This method will only return a valid effective internal format if the
+ * combination of format, type and internal format in base form, is acceptable.
+ *
+ * If a single sized internal format is defined in the spec (OpenGL-ES 3.0.4) 
or
+ * in extensions, to unambiguously correspond to the given base format, then
+ * that internal format is returned as the effective. Otherwise, if the
+ * combination is accepted but a single effective format is not defined, the
+ * passed base format will be returned instead.
+ *
+ * \param format the texture format
+ * \param type the texture type
+ */
+static GLenum
+_mesa_es3_effective_internal_format_for_format_and_type(GLenum format,
+GLenum type)
+{
+   switch (type) {
+   case GL_UNSIGNED_BYTE:
+  switch (format) {
+  case GL_RGBA:
+ return GL_RGBA8;
+  case GL_RGB:
+ return GL_RGB8;
+  /* Although LUMINANCE_ALPHA, LUMINANCE and ALPHA appear in table 3.12,
+   * (section 3.8 Texturing, page 128 of the OpenGL-ES 3.0.4) as effective
+   * internal formats, they do not correspond to GL constants, so the base
+   * format is returned instead.
+   */
+  case GL_LUMINANCE_ALPHA:
+  case GL_LUMINANCE:
+  case GL_ALPHA:
+ return format;
+  }
+  break;
+
+   case GL_UNSIGNED_SHORT_4_4_4_4:
+  if (format == GL_RGBA)
+ return GL_RGBA4;
+  break;
+
+   case GL_UNSIGNED_SHORT_5_5_5_1:
+  if (format == GL_RGBA)
+ return GL_RGB5_A1;
+  break;
+
+   case GL_UNSIGNED_SHORT_5_6_5:
+  if (format == GL_RGB)
+ return GL_RGB565;
+  break;
+
+   /* OES_packed_depth_stencil */
+   case GL_UNSIGNED_INT_24_8:
+  if (format == GL_DEPTH_STENCIL)
+ return GL_DEPTH24_STENCIL8;
+  break;
+
+   case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+  if (format == GL_DEPTH_STENCIL)
+ return GL_DEPTH32F_STENCIL8;
+  break;
+
+   case GL_UNSIGNED_SHORT:
+  if (format == GL_DEPTH_COMPONENT)
+ return GL_DEPTH_COMPONENT16;
+  break;
+
+   case GL_UNSIGNED_INT:
+  /* It can be DEPTH_COMPONENT16 or DEPTH_COMPONENT24, so just return
+   * the format.
+   */
+  if (format == GL_DEPTH_COMPONENT)
+ return format;
+  break;
+
+   /* OES_texture_float and OES_texture_half_float */
+   case GL_FLOAT:
+  if (format == GL_DEPTH_COMPONENT)
+ return GL_DEPTH_COMPONENT32F;
+  /* fall through */
+   case GL_HALF_FLOAT_OES:
+  switch (format) {
+  case GL_RGBA:
+  case GL_RGB:
+  case GL_LUMINANCE_ALPHA:
+  case GL_LUMINANCE:
+  case GL_ALPHA:
+  case GL_RED:
+  case GL_RG:
+ return format;
+  }
+  break;
+   case GL_HALF_FLOAT:
+

[Mesa-dev] [PATCH v5 0/3] A couple of fixes for Tex(Sub)Image error checks

2015-09-24 Thread Eduardo Lima Mitev
This is a new version of the series that attempt to fix the regression reported 
at:

https://bugs.freedesktop.org/show_bug.cgi?id=91582

The review by Jason helped me uncover the fact that the following 3 dEQP tests 
are buggy:

dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt

So the patch split I did in the previous version of the series (v4) was 
actually not necessary. It was just a work-around to the failure of these 
tests, which this series uncovered.

Now in this new version, I dropped the splitted patch and filed a bug against 
dEQP (together with a reference patch) to fix the above tests, which will start 
to fail once/if we merge this series.

"[dEQP] Buggy negative API tests that check dimensions args of glTexSubImage2D" 


I filed it against the AOSP project, where dEQP package is (under 
external/deqp). Lets see if that was correct.

Mark, in the mean time we can probably apply the patch I attached to the bug 
report, otherwise the regression originally reported won't go away. What do you 
think?

Notice that first two patches has R-b from Jason already. Only the patch 3/3 is 
pending review.

The question that remains is whether I should cc Mesa 10.6 stable too, apart 
from 11.0.

cheers,

Eduardo Lima Mitev (3):
  mesa: Fix order of format+type and internal format checks for
glTexImageXD ops
  mesa: Move _mesa_base_tex_format() from teximage to glformats files
  mesa: Use the effective internal format instead for validation

 src/mesa/main/glformats.c | 656 ++
 src/mesa/main/glformats.h |   2 +
 src/mesa/main/teximage.c  | 415 ++---
 src/mesa/main/teximage.h  |   4 -
 4 files changed, 683 insertions(+), 394 deletions(-)

-- 
2.4.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 1/3] mesa: Fix order of format+type and internal format checks for glTexImageXD ops

2015-09-24 Thread Eduardo Lima Mitev
The more specific GLES constrains should be checked after the general
validation performed by _mesa_error_check_format_and_type(). This is also
for consistency with the error checks order of glTexSubImage ops.

v3: The change of order uncovered a bug that regresses a couple of piglit
tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE
instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type()
when an invalid format is passed to glTexImage2D. This version of the patch
accounts for those cases.

Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.teximage2d

Reviewed-by: Jason Ekstrand 
Cc: "11.0" 
---
 src/mesa/main/teximage.c | 41 +
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 9bc176a..a0dbb48 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -2164,15 +2164,23 @@ texture_error_check( struct gl_context *ctx,
   return GL_TRUE;
}
 
-   /* OpenGL ES 1.x and OpenGL ES 2.0 impose additional restrictions on the
-* combinations of format, internalFormat, and type that can be used.
-* Formats and types that require additional extensions (e.g., GL_FLOAT
-* requires GL_OES_texture_float) are filtered elsewhere.
-*/
-   if (_mesa_is_gles(ctx) &&
-   texture_format_error_check_gles(ctx, format, type, internalFormat,
-   dimensions, "glTexImage%dD")) {
- return GL_TRUE;
+   /* Check incoming image format and type */
+   err = _mesa_error_check_format_and_type(ctx, format, type);
+   if (err != GL_NO_ERROR) {
+  /* Prior to OpenGL-ES 2.0, an INVALID_VALUE is expected instead of
+   * INVALID_ENUM. From page 73 OpenGL ES 1.1 spec:
+   *
+   * "Specifying a value for internalformat that is not one of the
+   *  above (acceptable) values generates the error INVALID VALUE."
+   */
+  if (err == GL_INVALID_ENUM && _mesa_is_gles(ctx) && ctx->Version < 20)
+ err = GL_INVALID_VALUE;
+
+  _mesa_error(ctx, err,
+  "glTexImage%dD(incompatible format = %s, type = %s)",
+  dimensions, _mesa_enum_to_string(format),
+  _mesa_enum_to_string(type));
+  return GL_TRUE;
}
 
/* Check internalFormat */
@@ -2183,13 +2191,14 @@ texture_error_check( struct gl_context *ctx,
   return GL_TRUE;
}
 
-   /* Check incoming image format and type */
-   err = _mesa_error_check_format_and_type(ctx, format, type);
-   if (err != GL_NO_ERROR) {
-  _mesa_error(ctx, err,
-  "glTexImage%dD(incompatible format = %s, type = %s)",
-  dimensions, _mesa_enum_to_string(format),
-  _mesa_enum_to_string(type));
+   /* OpenGL ES 1.x and OpenGL ES 2.0 impose additional restrictions on the
+* combinations of format, internalFormat, and type that can be used.
+* Formats and types that require additional extensions (e.g., GL_FLOAT
+* requires GL_OES_texture_float) are filtered elsewhere.
+*/
+   if (_mesa_is_gles(ctx) &&
+   texture_format_error_check_gles(ctx, format, type, internalFormat,
+   dimensions, "glTexImage%dD")) {
   return GL_TRUE;
}
 
-- 
2.4.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nir: Implement __intrinsic_load_ssbo

2015-09-24 Thread Samuel Iglesias Gonsálvez


On 23/09/15 12:07, Samuel Iglesias Gonsalvez wrote:
> From: Iago Toral Quiroga 
> 
> v2:
> - Fix ssbo loads with boolean variables.
> 
> v3:
> - Simplify the changes (Kristian)
> 
> Reviewed-by: Connor Abbott 
> ---
>  src/glsl/nir/glsl_to_nir.cpp| 66 
> +
>  src/glsl/nir/nir_intrinsics.h   |  2 +-
>  src/glsl/nir/nir_lower_phis_to_scalar.c |  2 +
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 93d5d23..906797e 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -651,6 +651,8 @@ nir_visitor::visit(ir_call *ir)
>   op = nir_intrinsic_image_samples;
>} else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == 0) {
>   op = nir_intrinsic_store_ssbo;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_load_ssbo") == 0) {
> + op = nir_intrinsic_load_ssbo;
>} else {
>   unreachable("not reached");
>}
> @@ -790,6 +792,70 @@ nir_visitor::visit(ir_call *ir)
>   nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>   break;
>}
> +  case nir_intrinsic_load_ssbo: {
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> + param = param->get_next();
> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> + /* Check if we need the indirect version */
> + ir_constant *const_offset = offset->as_constant();
> + if (!const_offset) {
> +op = nir_intrinsic_load_ssbo_indirect;
> +ralloc_free(instr);
> +instr = nir_intrinsic_instr_create(shader, op);
> +instr->src[1] = evaluate_rvalue(offset);
> +instr->const_index[0] = 0;

I forgot to update 'dest' with the new instruction here. The fix is just
an one-liner:

+dest = &instr->dest;

If you don't mind, I will include it as part of this patch before
pushing it to master.

The same happens to a similar code in nir_intrinsic_store_ssbo case. My
plan is to add the same one-line change in the previous patch (nir:
modify the instruction insertion in nir_visitor::visit(ir_call *ir))

Kristian, Do you agree with these changes?

Sam

> + } else {
> +instr->const_index[0] = const_offset->value.u[0];
> + }
> +
> + instr->src[0] = evaluate_rvalue(block);
> +
> + const glsl_type *type = ir->return_deref->var->type;
> + instr->num_components = type->vector_elements;
> +
> + /* Setup destination register */
> + nir_ssa_dest_init(&instr->instr, &instr->dest,
> +   type->vector_elements, NULL);
> +
> + /* Insert the created nir instruction now since in the case of 
> boolean
> +  * result we will need to emit another instruction after it
> +  */
> + nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
> +
> + /*
> +  * In SSBO/UBO's, a true boolean value is any non-zero value, but we
> +  * consider a true boolean to be ~0. Fix this up with a != 0
> +  * comparison.
> +  */
> + if (type->base_type == GLSL_TYPE_BOOL) {
> +nir_load_const_instr *const_zero =
> +   nir_load_const_instr_create(shader, 1);
> +const_zero->value.u[0] = 0;
> +nir_instr_insert_after_cf_list(this->cf_node_list,
> +   &const_zero->instr);
> +
> +nir_alu_instr *load_ssbo_compare =
> +   nir_alu_instr_create(shader, nir_op_ine);
> +load_ssbo_compare->src[0].src.is_ssa = true;
> +load_ssbo_compare->src[0].src.ssa = &instr->dest.ssa;
> +load_ssbo_compare->src[1].src.is_ssa = true;
> +load_ssbo_compare->src[1].src.ssa = &const_zero->def;
> +for (unsigned i = 0; i < type->vector_elements; i++)
> +   load_ssbo_compare->src[1].swizzle[i] = 0;
> +nir_ssa_dest_init(&load_ssbo_compare->instr,
> +  &load_ssbo_compare->dest.dest,
> +  type->vector_elements, NULL);
> +load_ssbo_compare->dest.write_mask = (1 << 
> type->vector_elements) - 1;
> +nir_instr_insert_after_cf_list(this->cf_node_list,
> +   &load_ssbo_compare->instr);
> +dest = &load_ssbo_compare->dest.dest;
> + }
> +
> + break;
> +  }
>default:
>   unreachable("not reached");
>}
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index 53f541b..4a2dd79 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -179,7 +179,7 @@ SYSTEM_VALUE(user_clip_plane, 4, 1) /* const_index[0] is 

Re: [Mesa-dev] [PATCH 1/2] mesa: Use the effective internal format instead for validation

2015-09-24 Thread Eduardo Lima Mitev
On 09/24/2015 01:04 AM, Mark Janes wrote:
> Hi Eduardo,
> 
> I can't get this patch to apply to any recent version of glformats.c
> 
> Can you double check it?
> 

Hi Mark,

Yes, I just rebased it and fixed the conflicts. After Jason's feedback,
I sent the whole series dropping the last patch, because it seems to
just work around 3 dEQP tests that are buggy.

http://lists.freedesktop.org/archives/mesa-dev/2015-September/095191.html

cheers,
Eduardo

> thanks,
> 
> Mark
> 
> Eduardo Lima Mitev  writes:
> 
>> When validating format+type+internalFormat for texture pixel operations
>> on GLES3, the effective internal format should be used if the one
>> specified is an unsized internal format. Page 127, section "3.8 Texturing"
>> of the GLES 3.0.4 spec says:
>>
>> "if internalformat is a base internal format, the effective internal
>>  format is a sized internal format that is derived from the format and
>>  type for internal use by the GL. Table 3.12 specifies the mapping of
>>  format and type to effective internal formats. The effective internal
>>  format is used by the GL for purposes such as texture completeness or
>>  type checks for CopyTex* commands. In these cases, the GL is required
>>  to operate as if the effective internal format was used as the
>>  internalformat when specifying the texture data."
>>
>> v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
>> considered sized internal formats. Return the corresponding unsize format
>> instead.
>>
>> v4: * Improved comments in
>>   _mesa_es3_effective_internal_format_for_format_and_type().
>> * Splitted patch to separate chunk about reordering of
>>   error_check_subtexture_dimensions() error check, which is not directly
>>   related with this patch.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91582
>> ---
>>  src/mesa/main/glformats.c | 151 
>> ++
>>  1 file changed, 151 insertions(+)
>>
>> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
>> index e4e0fdc..36d59bb 100644
>> --- a/src/mesa/main/glformats.c
>> +++ b/src/mesa/main/glformats.c
>> @@ -2680,6 +2680,127 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
>> GLint internalFormat)
>>  }
>>  
>>  /**
>> + * Returns the effective internal format from a texture format and type.
>> + * This is used by texture image operations internally for validation, when
>> + * the specified internal format is a base (unsized) format.
>> + *
>> + * This method will only return a valid effective internal format if the
>> + * combination of format, type and internal format in base form, is 
>> acceptable.
>> + *
>> + * If a single sized internal format is defined in the spec (OpenGL-ES 
>> 3.0.4) or
>> + * in extensions, to unambiguously correspond to the given base format, then
>> + * that internal format is returned as the effective. Otherwise, if the
>> + * combination is accepted but a single effective format is not defined, the
>> + * passed base format will be returned instead.
>> + *
>> + * \param format the texture format
>> + * \param type the texture type
>> + */
>> +static GLenum
>> +_mesa_es3_effective_internal_format_for_format_and_type(GLenum format,
>> +GLenum type)
>> +{
>> +   switch (type) {
>> +   case GL_UNSIGNED_BYTE:
>> +  switch (format) {
>> +  case GL_RGBA:
>> + return GL_RGBA8;
>> +  case GL_RGB:
>> + return GL_RGB8;
>> +  /* Although LUMINANCE_ALPHA, LUMINANCE and ALPHA appear in table 3.12,
>> +   * (section 3.8 Texturing, page 128 of the OpenGL-ES 3.0.4) as 
>> effective
>> +   * internal formats, they do not correspond to GL constants, so the 
>> base
>> +   * format is returned instead.
>> +   */
>> +  case GL_LUMINANCE_ALPHA:
>> +  case GL_LUMINANCE:
>> +  case GL_ALPHA:
>> + return format;
>> +  }
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT_4_4_4_4:
>> +  if (format == GL_RGBA)
>> + return GL_RGBA4;
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT_5_5_5_1:
>> +  if (format == GL_RGBA)
>> + return GL_RGB5_A1;
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT_5_6_5:
>> +  if (format == GL_RGB)
>> + return GL_RGB565;
>> +  break;
>> +
>> +   /* OES_packed_depth_stencil */
>> +   case GL_UNSIGNED_INT_24_8:
>> +  if (format == GL_DEPTH_STENCIL)
>> + return GL_DEPTH24_STENCIL8;
>> +  break;
>> +
>> +   case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
>> +  if (format == GL_DEPTH_STENCIL)
>> + return GL_DEPTH32F_STENCIL8;
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT:
>> +  if (format == GL_DEPTH_COMPONENT)
>> + return GL_DEPTH_COMPONENT16;
>> +  break;
>> +
>> +   case GL_UNSIGNED_INT:
>> +  /* It can be DEPTH_COMPONENT16 or DEPTH_COMPONENT24, so just return
>> +   * the format.
>> +   */
>> +  if (for

Re: [Mesa-dev] [PATCH 2/2] mesa: Moves up error check for subtexture dimensions

2015-09-24 Thread Eduardo Lima Mitev
On 09/23/2015 11:59 PM, Jason Ekstrand wrote:
> On Tue, Sep 15, 2015 at 3:01 PM, Eduardo Lima Mitev  wrote:
>> On 09/15/2015 09:23 PM, Jason Ekstrand wrote:
>>> On Tue, Sep 15, 2015 at 4:47 AM, Eduardo Lima Mitev  
>>> wrote:
 For consistency and efficiency, the (sub)texture dimension error check
 should go before the validation of format, type and internal format.
>>>
>>> You mentioned in another patch that this fixes a bug or, at the very
>>> least, prevents one.  What bug is that?  I ask because my personal
>>> inclination would be to keep them in the order they were before.
>>> --Jason
>>>
>>
>> These tests fail without this patch, because they fail the validation of
>> format+type+internalFormat (INVALID_OPERATION) which happens before the
>> check for texture dimensions (INVALID_VALUE, the expected error).
>>
>> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
>> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
>> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
>>
>> In all 3, the failing combination is:
>>
>> format = GL_RGB, type = GL_UNSIGNED_BYTE, internalFormat = GL_RGBA
>>
>> Following the spec, the effective internal format should be GL_RGB8 (per
>> table 3.12, page 128 of OpenGL-ES 3.0.4). So, it is actually failing the
>> double-check I added in the previous patch, in which I obtain again the
>> base format from the effective internal format to compare it with the
>> original internal format in base form passed to the function. So GL_RGB8
>> -> GL_RGB which != GL_RGBA.
>>
>> Now I'm puzzled. Is the combination above valid? As I interpret the
>> spec, it isn't. Which means the tests would be wrong, but that's strange
>> too.
> 
> I could easily see the test being wrong here.  As per table 3.2 in the
> ES 3.0 spec, you can't use GL_RGB with GL_UNSIGNED_BYTE and GL_RGBA.
> Just switching the format to GL_RGBA or the internalFormat to GL_RGB8
> in the test would probably fix it.  I think the best path forward
> there would be to make a patch to the test and try to get that
> upstream to dEQP.  I'm not sure what the procedure is there.
> 

Just sent a new version of the series dropping the last patch:

http://lists.freedesktop.org/archives/mesa-dev/2015-September/095191.html

And failed a bug against dEQP (oddly, to AOSP actually) for fixing the
buggy tests, including a reference patch:

https://code.google.com/p/android/issues/detail?id=187348&thanks=187348&ts=1443083425

Thanks a lot for the feedback!

Eduardo

> --Jason
> 
 ---
  src/mesa/main/teximage.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

 diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
 index d9453e3..09040d5 100644
 --- a/src/mesa/main/teximage.c
 +++ b/src/mesa/main/teximage.c
 @@ -2117,6 +2117,12 @@ texsubimage_error_check(struct gl_context *ctx, 
 GLuint dimensions,
return GL_TRUE;
 }

 +   if (error_check_subtexture_dimensions(ctx, dimensions,
 + texImage, xoffset, yoffset, 
 zoffset,
 + width, height, depth, 
 callerName)) {
 +  return GL_TRUE;
 +   }
 +
 err = _mesa_error_check_format_and_type(ctx, format, type);
 if (err != GL_NO_ERROR) {
_mesa_error(ctx, err,
 @@ -2145,12 +2151,6 @@ texsubimage_error_check(struct gl_context *ctx, 
 GLuint dimensions,
return GL_TRUE;
 }

 -   if (error_check_subtexture_dimensions(ctx, dimensions,
 - texImage, xoffset, yoffset, 
 zoffset,
 - width, height, depth, 
 callerName)) {
 -  return GL_TRUE;
 -   }
 -
 if (_mesa_is_format_compressed(texImage->TexFormat)) {
if (_mesa_format_no_online_compression(ctx, 
 texImage->InternalFormat)) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
 --
 2.4.6

>>>
>>
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/5] Enable up to 24 MRF registers in gen6

2015-09-24 Thread Iago Toral
On Mon, 2015-09-21 at 09:21 -0700, Matt Turner wrote:
> On Mon, Sep 21, 2015 at 8:00 AM, Iago Toral  wrote:
> > On Mon, 2015-09-21 at 07:49 -0700, Kenneth Graunke wrote:
> >> On Monday, September 21, 2015 09:46:24 AM Mark Janes wrote:
> >> > This series hits an assertion on ILK and G45:
> >> >
> >> > src/mesa/drivers/dri/i965/brw_eu_emit.c:150: brw_set_dest: Assertion
> >> > `dest.nr < (devinfo->gen == 6 ? 24 : 16)' failed.
> >> >
> >> > It triggers about 8k piglit assertions on those platforms.  I'm turning
> >> > off testing for G45 and ILK until it is resolved.
> >> >
> >> > https://bugs.freedesktop.org/show_bug.cgi?id=92066
> >> >
> >> > -Mark
> >>
> >> I've pushed a fix for this:
> >>
> >> commit c1070550c289d48ef389aeb8c564d1abd1123ad1
> >> Author: Kenneth Graunke 
> >> Date:   Mon Sep 21 07:42:27 2015 -0700
> >>
> >> i965: Fix MRF register number assertions for compr4.
> >>
> >> compr4 is represented by setting the high bit on the MRF number.
> >> We need to mask it out before sanity checking the register number.
> >>
> >> Fixes ~8000 assert fails on Ironlake and G45.
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066
> >> Signed-off-by: Kenneth Graunke 
> >>
> >> Easy mistake...I always forget about compr4.  Hopefully we should be
> >> good now.
> >
> > that was fast, thanks Ken! I was scratching my head over this...
> >
> > BTW, I just noticed that the ILK docs also say that they have 24 MRFs...
> > (volume 4, part 2, 5.3.3 MRF Registers). Assuming that we don't find any
> > other issues, would we want to extend the fix to ILK too?
> 
> The ILK docs are notoriously bad and often contain more information
> about Sandybridge than Ironlake. I suspect that information is
> actually about SNB, though I suppose it couldn't hurt to try on ILK,
> though I'm doubtful.

For the record: I tested this and it does not seem to work in ILK, so
you're probably right.

In the process, however, I noticed that forcing spilling on the vec4
backend with current master in ILK breaks a lot of tests and also
produces at least one GPU hang, so there is something seriously broken
there. I filed this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92100

I'll try to see what is going on with that...

Iago


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 91889] Planetary Anihilation: Titans display content of other processes buffers

2015-09-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91889

--- Comment #26 from Krzysztof A. Sobiecki  ---
Trace from wayland/Xwayland:
https://drive.google.com/file/d/0B3J0Mg89izcbR1BHWXEwREJhY00/view?usp=sharing

It shows only dark screen in menu.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 25/24] glsl: add std140 layout support for AoA

2015-09-24 Thread Samuel Iglesias Gonsálvez


On 24/09/15 02:16, Timothy Arceri wrote:
> On Wed, 2015-09-23 at 13:21 +0200, Samuel Iglesias Gonsálvez wrote:
>> On 22/09/15 15:30, Samuel Iglesias Gonsálvez wrote:
>>> Reviewed-by: Samuel Iglesias Gonsálvez 
>>>
>>
>> Forgot to say that we need to implement a similar patch for
>> std430_size() function, right?
> 
> Yeah I think so.
> 
>>
>> I will write a follow-up patch adding std430 support for AoA.
> 
> That would be great thanks.
> 
> I assume there will also be other places in the SSBO series that might
> need updates for AoA?
> 

As I am not very familiar with AoA, I will need to check how AoA affects
to SSBO-related code.

I plan to look at it soon and send follow-up patches (if needed). I
might ask you to review my patches as you have implemented AoA.

Thanks,

Sam

> 
>>
>> Sam
>>
>>> On 20/09/15 14:07, Timothy Arceri wrote:
 ---
  I noticed this problem after adding AoA support [1] to Ian's
 random UBO test
  script [2].

  [1] http://patchwork.freedesktop.org/patch/59956/
  [2] http://cgit.freedesktop.org/~idr/piglit/log/?h=ubo-lolz

  src/glsl/glsl_types.cpp | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

 diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
 index 86f0ea5..952bd0a 100644
 --- a/src/glsl/glsl_types.cpp
 +++ b/src/glsl/glsl_types.cpp
 @@ -1310,8 +1310,8 @@ glsl_type::std140_size(bool row_major)
 const
unsigned int array_len;
  
if (this->is_array()) {
 -   element_type = this->fields.array;
 -   array_len = this->length;
 +   element_type = this->without_array();
 +   array_len = this->arrays_of_arrays_size();
} else {
 element_type = this;
 array_len = 1;
 @@ -1344,12 +1344,13 @@ glsl_type::std140_size(bool row_major)
 const
  *  the array are laid out in order, according to rule
 (9).
  */
 if (this->is_array()) {
 -  if (this->fields.array->is_record()) {
 -   return this->length * this->fields.array
 ->std140_size(row_major);
 +  if (this->without_array()->is_record()) {
 +   return this->arrays_of_arrays_size() *
 +this->without_array()->std140_size(row_major);
} else {
 unsigned element_base_align =
 -  this->fields.array
 ->std140_base_alignment(row_major);
 -   return this->length * MAX2(element_base_align, 16);
 +  this->without_array()
 ->std140_base_alignment(row_major);
 +   return this->arrays_of_arrays_size() *
 MAX2(element_base_align, 16);
}
 }
  

> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't tell the hardware about our UAV access.

2015-09-24 Thread Emil Velikov
Hi all,

On 10 September 2015 at 00:30, Mark Janes  wrote:
> Mark Janes  writes:
>
>> Francisco Jerez  writes:
>>
>>> Mark Janes  writes:
>>>
 When I tested this, I saw an intermittent BSW gpu hang.  I haven't been
 able to confirm that it is due to the host-mem-barrier test.

>>> It would probably be useful to know if the hang is due to any of the
>>> image load/store tests or if it's something unrelated.
>>
>> Yes, you are right.  I will take some time tomorrow to catch the
>> specific test.  Given the low rate of gpu hangs on BSW lately, I expect
>> it will be due to image load/store tests.  However I need to confirm it.
>
> I spent time attempting to reproduce hangs with this patch on BSW today.
> I forced 3 hangs, but none of them could be attributed to image
> load/store tests.
>
With the above said, should we consider this as inconclusive fix ? If
so, what is the plan going forward - is this going to the bin or we'll
beat it into shape and merge it ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] Revert "i965: Stop aux data compare preventing program binary re-use"

2015-09-24 Thread Emil Velikov
Hi guys,

On 2 September 2015 at 20:10, Pohjolainen, Topi
 wrote:
> On Thu, Aug 27, 2015 at 10:05:14AM -0700, Ben Widawsky wrote:
>> On Thu, Aug 27, 2015 at 10:51:59AM +0300, Pohjolainen, Topi wrote:
>> > On Wed, Aug 26, 2015 at 03:46:05PM -0700, Ben Widawsky wrote:
>> > > This reverts commit 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf
>> > > Author: Topi Pohjolainen 
>> > > Date:   Thu Jun 25 14:00:41 2015 +0300
>> > >
>> > > i965: Stop aux data compare preventing program binary re-use
>> > >
>> > > This fixes an intermittent failure in
>> > > piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64 (maybe other
>> > > platforms as well, but it is harder to reproduce). I can usually hit the 
>> > > failure
>> > > within 10 runs of the test. This is a very hairy commit to debug. I'll 
>> > > let Topi
>> > > handle it, or else we should go with the revert. I am open to either. I 
>> > > got
>> > > lucky that Jenkins caught this on a run.
>> > >
>> > > Here was the script I used for bisect:
>> > >
>> > > i=0
>> > > while [ $i -lt 40 ] ; do
>> > >./bin/texsubimage pbo -auto -fbo > /dev/null 2>&1
>> > >[[ $? != 0 ]] && echo fail && exit 1
>> > >((i++))
>> > > done
>> > >
>> > > exit 0
>> >
>> > Should I use different piglit version than the current master? I'm asking
>> > because I get this with both Mesa master and my patch reverted.
>>
>> My piglit was pretty old. I just updated that, but mesa was master as of
>> yesterday (output is below)
>>
>> The one bit of advice I can add is that you make sure your system is updated 
>> to
>> the very latest.
>>
>> >
>> > testrunner@skl-y:~/topi/piglit$ ./bin/texsubimage pbo -auto -fbo
>> > Using test set: Core formats
>> > texsubimage failed
>> >   target: GL_TEXTURE_2D
>> >   internal format: GL_COMPRESSED_RGB_S3TC_DXT1_EXT
>> >   region: 68, 12  32 x 48
>> > texsubimage failed
>> >   target: GL_TEXTURE_2D
>> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
>> >   region: 0, 28  116 x 20
>> > texsubimage failed
>> >   target: GL_TEXTURE_2D
>> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
>> >   region: 16, 4  60 x 36
>> > texsubimage failed
>> >   target: GL_TEXTURE_2D
>> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
>> >   region: 8, 0  104 x 60
>> > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds 
>> > PBO access)
>> > PIGLIT: {"result": "fail" }
>>
>> Interesting. It so happens I have a patch that purports to fix some of these
>> things. I did not try this patch myself for this issue.
>> http://patchwork.freedesktop.org/patch/54025/
>>
>> (I've been hanging on to this since I needed to do a bit of research to 
>> address
>> Matt's feedback, specifically regarding render compression).
>>
>> >
>> >
>> > Could you include the console output of the failure you get?
>> >
>>
>> Here are two sample failures (occurred in 7 runs)
>>
>> Using test set: Core formats
>> texsubimage failed
>>   target: GL_TEXTURE_2D
>>   internal format: GL_INTENSITY
>>   region: 27, 1  13 x 61
>> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds PBO 
>> access)
>
> I got more and more a feeling that this is timing related. As the current
> logic fails to ever re-use the items in the cache, the comparisons are
> redundant and we can hardcode the search to always fail. This changes the
> runtime dynamics closer to the patch being reverted:
>
> diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
> b/src/mesa/drivers/dri/
> index f5f279c..746fe7b 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> @@ -224,6 +224,8 @@ brw_try_upload_using_copy(struct brw_cache *cache,
> continue;
>  }
>
> + return false;
> +
>   if (cache->aux_compare[result_item->cache_id]) {
>  if (!cache->aux_compare[result_item->cache_id](item_aux, aux))
> continue;
>
>
> I can reproduce the error on IVB when I disable fast clears (I noticed that
> the error is not SKL specific - it depends on the fast clears being disabled).
Based on the discussion so far, I take it that it's unlikely that
we'll revert the commit at this point ?
Should we drop this patch for now ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/4] mesa: Expose function to calculate whether a shader image unit is valid.

2015-09-24 Thread Emil Velikov
Hi all,

On 3 September 2015 at 14:53, Francisco Jerez  wrote:
> A future commit will remove all texture object-dependent derived state
> from the image unit struct to make validation unnecessary on texture
> state changes.  Instead of checking gl_image_unit::_Valid drivers will
> be required to call this function when needed to find out whether an
> image unit is in a valid state and whether access from the shader is
> allowed.
>
> Tested-by: Ye Tian 
> CC: "11.0" 

It seems that this series from Francisco fell through the cracks. Can
we get someone to review it please ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/fs: Fix hang on IVB and VLV with image format mismatch.

2015-09-24 Thread Emil Velikov
Hi Francisco,

On 9 September 2015 at 18:04, Ian Romanick  wrote:
> On 09/09/2015 05:30 AM, Francisco Jerez wrote:
>> Ian Romanick  writes:
>>
>>> On 09/03/2015 06:03 AM, Francisco Jerez wrote:
 IVB and VLV hang sporadically when an untyped surface read or write
 message is used to access a surface of format other than RAW, as may
 happen when there is a mismatch between the format qualifier of the
 image uniform and the format of the actual image bound to the
 pipeline.  According to the spec this condition gives undefined
 results but may not lead to program termination (which is one of the
 possible outcomes of the hang).  Fix it by checking at runtime whether
 the surface is of the right type.

 Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
 subtest.

 Reported-by: Mark Janes 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
 CC: mesa-sta...@lists.freedesktop.org
 ---
  .../drivers/dri/i965/brw_fs_surface_builder.cpp| 42 
 +++---
  1 file changed, 38 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
 index f60afc9..57ce87f 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
 @@ -313,12 +313,42 @@ namespace {

 namespace image_validity {
/**
 +   * Check whether the bound image is suitable for untyped access.
 +   */
 +  brw_predicate
 +  emit_untyped_image_check(const fs_builder &bld, const fs_reg &image,
 +   brw_predicate pred)
 +  {
 + const brw_device_info *devinfo = bld.shader->devinfo;
 + const fs_reg stride = offset(image, bld, 
 BRW_IMAGE_PARAM_STRIDE_OFFSET);
 +
 + if (devinfo->gen == 7 && !devinfo->is_haswell) {
 +/* Check whether the first stride component (i.e. the Bpp 
 value)
 + * is greater than four, what on Gen7 indicates that a 
 surface of
 + * type RAW has been bound for untyped access.  Reading or 
 writing
 + * to a surface of type other than RAW using untyped surface
 + * messages causes a hang on IVB and VLV.
 + */
 +set_predicate(pred,
 +  bld.CMP(bld.null_reg_ud(), stride, fs_reg(4),
 +  BRW_CONDITIONAL_G));
 +
 +return BRW_PREDICATE_NORMAL;
 + } else {
 +/* More recent generations handle the format mismatch
 + * gracefully.
 + */
 +return pred;
 + }
 +  }
 +
 +  /**
 * Check whether there is an image bound at the given index and 
 write
 * the comparison result to f0.0.  Returns an appropriate 
 predication
 * mode to use on subsequent image operations.
 */
brw_predicate
 -  emit_surface_check(const fs_builder &bld, const fs_reg &image)
 +  emit_typed_atomic_check(const fs_builder &bld, const fs_reg &image)
>>>
>>> This change seems spurious (and also reasonable).
>>>
>> The problem is that this patch introduces a new kind of surface check
>> applicable to untyped surface reads and writes only, so it would have
>> been confusing to keep the other surface check which is applicable to
>> atomics only with its previous rather unspecific name.
>>
{
   const brw_device_info *devinfo = bld.shader->devinfo;
   const fs_reg size = offset(image, bld, 
 BRW_IMAGE_PARAM_SIZE_OFFSET);
 @@ -895,7 +925,9 @@ namespace brw {
   * surface read on the result,
   */
  const brw_predicate pred =
 -   emit_bounds_check(bld, image, saddr, dims);
 +   emit_untyped_image_check(bld, image,
 +emit_bounds_check(bld, image,
 +  saddr, dims));
>>>
>>> These appear to be the only two users of emit_bounds_check... shouldn't
>>> the bounds still be tested?
>>>
>> Yes, they are still.
>
> Ah... I completely missed that emit_bounds_check was moved into a
> parameter of the call to emit_untyped_image_check.
>
> Reviewed-by: Ian Romanick 
>
Considering Ian's r-b, are there any obstacles why this hasn't landed
in master yet ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/skl: Use larger URB size where available.

2015-09-24 Thread Emil Velikov
Hi Ben,

On 11 September 2015 at 20:15, Ben Widawsky  wrote:
> On Fri, Sep 11, 2015 at 12:12:15PM -0700, Jordan Justen wrote:
>> On 2015-09-10 16:59:12, Ben Widawsky wrote:
>> > All SKL SKUs except the lowest one which has half the L3 size actually 
>> > have 384K
>>
>> These commit message lines seem to wrap a bit long. This first line is
>> 80 characters.
>>
>> > of URB per slice.
>> >
>> > For once, I can explain how this mistake was made and how it was missed in
>> > review...  Historically when we enable a platform and put the production 
>> > sizes,
>> > you can simply look at the "smallest" SKU and see what its URB size is 
>> > (and we
>> > assumed it was the 1 slice variant). Since on newer platforms the URB 
>> > sizes are
>> > scaled automatically by HW, this was sufficient. On SKL, this is a bit 
>> > different
>> > as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 
>> > slice (not
>> > GT1) variant and it has 384K.
>> >
>> > There are no Jenkins tests fixed (or regressions) and we don't expect any 
>> > fixes
>> > here because you can always run with less URB size - this potentially 
>> > improves
>> > performance.
>>
>> It would be nice if we were able to find a benchmark that improves
>> from this change. If we can't then maybe we should just remove this
>> paragraph. It seems like the right change regardless.
>>
>> Reviewed-by: Jordan Justen 
>
> I think what I'd like to do is run the perf data to make sure there are at 
> least
> no regressions since I am proposing it for stable... Maybe if I don't get 
> around
> to that before the next stable release, we'll bail on it for 10.6
>
Did you get the chance to give this a perf test ? I don't see the
commit landing in master, regardless of the mesa-stable tag.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] New stable-branch 11.0 candidate pushed

2015-09-24 Thread Emil Velikov
Hello list,

The candidate for the Mesa 11.0.1 is now available. Currently we have:
 - 23 queued
 - 29 nominated (outstanding)
 - and 0 rejected/obsolete patches

The present queue consists mostly of nouveau and i965 fixes, although we
have the odd llvmpipe (big endian) and gbm bugfix.


Note: we do have a sizeable list of nominated patches that have not been
queued. Many of these are lacking review while the odd ones have gone
stale despite the review and/or feedback given.


Take a look at section "Mesa stable queue" for more information.

Testing
---
The following results are against piglit 4b6848c131c.


Changes - classic i965(snb)
---
None.


Changes - swrast classic

None.


Changes - gallium softpipe
--
None.


Changes - gallium llvmpipe (LLVM 3.6.2)
---
None.


Testing reports/general approval

Any testing reports (or general approval of the state of the branch)
will be greatly appreciated.


Trivial merge conflicts
---
None.


The plan is to have 11.0.1 this Friday (25th of September) or shortly after.

If you have any questions or comments that you would like to share
before the release, please go ahead.


Cheers,
Emil


Mesa stable queue
-

Nominated (29)
==

Alejandro Piñeiro (2):
  i965/vec4: check writemask when bailing out at register coalesce
  i965/vec4: fill src_reg type using the constructor type parameter

Anuj Phogat (1):
  i965: Abort tiled_memcpy path for TexSubImage in case of transfer 
operations

Ben Widawsky (2):
  Revert "i965: Stop aux data compare preventing program binary re-use"
  i965/skl: Use larger URB size where available.

Benjamin Bellec (2):
  gallium/hud: temperature is displayed with a percentage symbol, remove it
  gallium/hud: display the Celsius temperature unit

Boyan Ding (1):
  i915: Add XRGB format to intel_screen_make_configs

Brian Paul (1):
  configure: don't try to build gallium DRI drivers if --disable-dri is set

Eduardo Lima Mitev (2):
  mesa: Fix order of format+type and internal format checks for 
glTexImageXD ops
  mesa: Move _mesa_base_tex_format() from teximage to glformats files

Francisco Jerez (6):
  i965: Don't tell the hardware about our UAV access.
  mesa: Expose function to calculate whether a shader image unit is valid.
  mesa: Skip redundant texture completeness checking during image 
validation.
  i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.
  mesa: Get rid of texture-dependent image unit derived state.
  i965/fs: Fix hang on IVB and VLV with image format mismatch.

Ian Romanick (1):
  meta: Handle array textures in scaled MSAA blits

Jean-Sébastien Pédron (1):
  ralloc: Use __attribute__((destructor)) instead of atexit(3)

Kristian Høgsberg Kristensen (1):
  i965: Respect stride and subreg_offset for ATTR registers

Leo Liu (1):
  radeon/vce: fix vui time_scale zero error

Matthew Waters (1):
  egl: rework handling EGL_CONTEXT_FLAGS

Matt Turner (1)
  glsl: Expose gl_MaxTess{Control, Evaluation}AtomicCounters.

Rob Clark (1):
  xa: add xa_surface_from_handle2

Roland Scheidegger (1):
  mesa: fix mipmap generation for immutable, compressed textures

Tom Stellard (4):
  clover: Call clBuildProgram() notification function when build completes 
v2
  gallium/drivers: Add threadsafe wrappers for pipe_context v2
  clover: Use threadsafe wrappers for pipe_context v2
  clover: Properly initialize LLVM targets when linking with component libs



Queued (23)
===

Antia Puentes (2):
  i965/vec4: Fix saturation errors when coalescing registers
  i965/vec4_nir: Load constants as integers

Anuj Phogat (1):
  meta: Abort meta pbo path if TexSubImage need signed unsigned conversion

Emil Velikov (1):
  docs: add sha256 checksums for 11.0.0

Iago Toral Quiroga (1):
  mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.

Ian Romanick (5):
  t_dd_dmatmp: Make "count" actually be the count
  t_dd_dmatmp: Clean up improper code formatting from previous patch
  t_dd_dmatmp: Use '& 3' instead of '% 4' everywhere
  t_dd_dmatmp: Pull out common 'count -= count & 3' code
  t_dd_dmatmp: Use addition instead of subtraction in loop bounds

Ilia Mirkin (6):
  st/mesa: avoid integer overflows with buffers >= 512MB
  nv50, nvc0: fix max texture buffer size to 128M elements
  freedreno/a3xx: fix blending of L8 format
  nv50,nvc0: detect underlying resource changes and update tic
  nv50,nvc0: flush texture cache in presence of coherent bufs
  radeonsi: load fmask ptr relative to the resources array

Jason Ekstrand (2):
  nir: Fix a bunch of ralloc parenting errors
  i965/vec4: Don't reswizzle hardware registers

Jeremy Huddleston (1):
 

Re: [Mesa-dev] [PATCH 1/2] st/vaapi: fix vaapi VC-1 simple/main corruption

2015-09-24 Thread Emil Velikov
On 23 September 2015 at 17:03, Christian König  wrote:
> On 23.09.2015 17:34, Matt Turner wrote:
>>
>> Should these patches go to the stable branch as well?
>
>
> Good point, most likely yes.
>
Please throw in a line or two of commit message and a bugzilla link,
if there is one.

Cc: "10.6 11.0" 

Thank you.
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radeon/uvd: fix VC-1 simple/main profile decode

2015-09-24 Thread Emil Velikov
Hi guys,

A couple of suggestions - feel free to take or leave them.

On 23 September 2015 at 09:11, Christian König  wrote:
> From: Boyuan Zhang 
>
A line or two lines of commit message would be appreciated.

> Signed-off-by: Boyuan Zhang 
> Reviewed-by: Christian König 
If you think that one can live with the intermittent glitches Grigori
mentioned, please throw in

Cc: "10.6 11.0" 

> ---
>  src/gallium/drivers/radeon/radeon_uvd.c   | 6 ++
>  src/gallium/drivers/radeon/radeon_video.c | 5 +
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
> b/src/gallium/drivers/radeon/radeon_uvd.c
> index 81f3f45..9edb511 100644
> --- a/src/gallium/drivers/radeon/radeon_uvd.c
> +++ b/src/gallium/drivers/radeon/radeon_uvd.c
> @@ -940,6 +940,12 @@ static void ruvd_end_frame(struct pipe_video_codec 
> *decoder,
> dec->msg->body.decode.width_in_samples = dec->base.width;
> dec->msg->body.decode.height_in_samples = dec->base.height;
>
> +   if ((picture->profile == PIPE_VIDEO_PROFILE_VC1_SIMPLE)
> +|| (picture->profile == PIPE_VIDEO_PROFILE_VC1_MAIN)) {
Too many brackets, place || on previous line (as elsewhere in the file)

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-09-24 Thread Bas Nieuwenhuizen
Hi Marek,

Thanks for the review and my apologies, it seems I underestimated the 
potential for regressions in this series.

See below for some comments in reaction to the review.

For a v2 is it preferred that I rebase it to master, or keep basing it on the 
old version? There are some function renames that impact this series.

On Thursday, September 24, 2015 01:36:31 AM Marek Olšák wrote:
> On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen
> 
>  wrote:
> > The flags to be enabled in the control registers have been taken from
> > Catalyst traces.
> > 
> > Signed-off-by: Bas Nieuwenhuizen 
> > ---
> > 
> >  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
> >  src/gallium/drivers/radeon/r600_texture.c |  2 ++
> >  src/gallium/drivers/radeon/r600d_common.h |  1 +
> >  src/gallium/drivers/radeonsi/si_blit.c| 16 
> >  src/gallium/drivers/radeonsi/si_pipe.c|  2 ++
> >  src/gallium/drivers/radeonsi/si_pipe.h|  1 +
> >  src/gallium/drivers/radeonsi/si_state.c   | 27
> >  +++ src/gallium/drivers/radeonsi/sid.h  
> >   |  1 +
> >  8 files changed, 43 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h
> > b/src/gallium/drivers/radeon/r600_pipe_common.h index f05318f..dac270e
> > 100644
> > --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> > +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> > @@ -244,6 +244,7 @@ struct r600_surface {
> > 
> > unsigned cb_color_dim;  /* EG only */
> > unsigned cb_color_pitch;/* EG and later */
> > unsigned cb_color_slice;/* EG and later */
> > 
> > +   unsigned cb_dcc_base;   /* VI and later */
> > 
> > unsigned cb_color_attrib;   /* EG and later */
> > unsigned cb_dcc_control;/* VI and later */
> > unsigned cb_color_fmask;/* CB_COLORn_FMASK (EG and later)
> > or CB_COLORn_FRAG (r600) */> 
> > diff --git a/src/gallium/drivers/radeon/r600_texture.c
> > b/src/gallium/drivers/radeon/r600_texture.c index 46e735e..017f5e7 100644
> > --- a/src/gallium/drivers/radeon/r600_texture.c
> > +++ b/src/gallium/drivers/radeon/r600_texture.c
> > @@ -495,6 +495,8 @@ static void vi_texture_alloc_dcc_separate(struct
> > r600_common_screen *rscreen,> 
> > r600_screen_clear_buffer(rscreen, &rtex->dcc_buffer->b.b, 0,
> > rtex->surface.dcc_size,> 
> >  0x, true);
> > 
> > +
> > +   rtex->cb_color_info |= VI_S_028C70_DCC_ENABLE(1);
> > 
> >  }
> >  
> >  static unsigned r600_texture_get_htile_size(struct r600_common_screen
> >  *rscreen,> 
> > diff --git a/src/gallium/drivers/radeon/r600d_common.h
> > b/src/gallium/drivers/radeon/r600d_common.h index 115042d..a3d182c 100644
> > --- a/src/gallium/drivers/radeon/r600d_common.h
> > +++ b/src/gallium/drivers/radeon/r600d_common.h
> > @@ -202,6 +202,7 @@
> > 
> >  #define   EG_S_028C70_FAST_CLEAR(x)   (((x) & 0x1) <<
> >  17) #define   SI_S_028C70_FAST_CLEAR(x)   (((x) &
> >  0x1) << 13)> 
> > +#define   VI_S_028C70_DCC_ENABLE(x)   (((x) & 0x1) <<
> > 28)> 
> >  /*CIK+*/
> >  #define R_0300FC_CP_STRMOUT_CNTL0x0300FC
> > 
> > diff --git a/src/gallium/drivers/radeonsi/si_blit.c
> > b/src/gallium/drivers/radeonsi/si_blit.c index 13bb457..98913e5 100644
> > --- a/src/gallium/drivers/radeonsi/si_blit.c
> > +++ b/src/gallium/drivers/radeonsi/si_blit.c
> > @@ -264,7 +264,7 @@ static void si_blit_decompress_color(struct
> > pipe_context *ctx,> 
> > return;
> > 
> > for (level = first_level; level <= last_level; level++) {
> > 
> > -   if (!(rtex->dirty_level_mask & (1 << level)))
> > +   if (!(rtex->dirty_level_mask & (1 << level)) &&
> > !(rtex->dcc_compressed_level_mask & (1 << level)))> 
> > continue;
> > 
> > /* The smaller the mipmap level, the less layers there are
> > 
> > @@ -274,6 +274,7 @@ static void si_blit_decompress_color(struct
> > pipe_context *ctx,> 
> > for (layer = first_layer; layer <= checked_last_layer;
> > layer++) {
> > 
> > struct pipe_surface *cbsurf, surf_tmpl;
> > 
> > +   void * custom_blend;
> > 
> > surf_tmpl.format = rtex->resource.b.b.format;
> > surf_tmpl.u.tex.level = level;
> > 
> > @@ -281,10 +282,17 @@ static void si_blit_decompress_color(struct
> > pipe_context *ctx,> 
> > surf_tmpl.u.tex.last_layer = layer;
> > cbsurf = ctx->create_surface(ctx,
> > &rtex->resource.b.b, &surf_tmpl);
> > 
> > +   if(rtex->fmask.size) {
> > +   custom_blend =
> > sctx->custom_blend_decompre

Re: [Mesa-dev] [PATCH v2] egl/dri2: enable EGL_KHR_gl_colorspace for swrast

2015-09-24 Thread Emil Velikov
Hi all,

On 15 September 2015 at 17:42, Emil Velikov  wrote:
> No driver changes needed for softpipe/llvmpipe - things just work.
>
> v2: Whitespace fixes.
>
> Signed-off-by: Emil Velikov 
> Reviewed-by: Boyan Ding 

Courtesy ping. If we don't get any objections/comments, over the next
few days I will be pushing these patches.

As mentioned, things just work and there are no regressions (according
to a full piglit run)

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/4] main/get: make KHR_debug enums available everywhere

2015-09-24 Thread Emil Velikov
Hi all,

On 16 September 2015 at 16:38, Emil Velikov  wrote:
> From: Matthew Waters 
>
> Move all the enums but CONTEXT_FLAGS. The spec seems quite explicit
> about the latter (wrt OpenGL ES)
>
> "In OpenGL ES versions prior to and including ES 3.1 there is no
> CONTEXT_FLAGS state and therefore the CONTEXT_FLAG_DEBUG_BIT cannot
> be queried."
>
> v2 [Emil Velikov] Rebase.
> v3 [Emil Veliokv] Drop the CONTEXT_FLAGS hunk - not applicable for GLES
>
> Signed-off-by: Matthew Waters 
> Signed-off-by: Emil Velikov 

With the only (?) controversial piece of the series gone, are people
happy with the patches ?
If you think something is fishy and/or you'd like to take a look but
you're short on time atm, please let me know.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/8] radeonsi: Invalidate the L2 cache on framebuffer change.

2015-09-24 Thread Bas Nieuwenhuizen
On Thursday, September 24, 2015 02:22:23 AM Marek Olšák wrote:
> On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen
> 
>  wrote:
> > This is needed by DCC when using compressed textures.
> > 
> > Signed-off-by: Bas Nieuwenhuizen 
> > ---
> > 
> >  src/gallium/drivers/radeonsi/si_state.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/src/gallium/drivers/radeonsi/si_state.c
> > b/src/gallium/drivers/radeonsi/si_state.c index 5c9c866..3e11922 100644
> > --- a/src/gallium/drivers/radeonsi/si_state.c
> > +++ b/src/gallium/drivers/radeonsi/si_state.c
> > @@ -2114,6 +2114,8 @@ static void si_set_framebuffer_state(struct
> > pipe_context *ctx,> 
> >  */
> > 
> > sctx->b.flags |= SI_CONTEXT_INV_TC_L1 |
> > 
> >  SI_CONTEXT_INV_TC_L2 |
> > 
> > +SI_CONTEXT_PS_PARTIAL_FLUSH |
> > +SI_CONTEXT_FLUSH_WITH_INV_L2 |
> 
> This may work, but I think it does too much.
> 
> The only hardware requirement I know of is that
> CACHE_FLUSH_AND_INV_EVENT must be emitted before and after
> DCC_DECOMPRESS. Is there any case where this is not sufficient?
> 
> Marek

Changing how the fast clear(i.e. moving away from the CMASK) works seems to 
remove the need for SI_CONTEXT_PS_PARTIAL_FLUSH, so I am going to retest what 
cache flushes are actually needed. 

Bas
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] ralloc: Use __attribute__((destructor)) instead of atexit(3)

2015-09-24 Thread Emil Velikov
On 7 September 2015 at 10:16, Jean-Sébastien Pédron
 wrote:
> On 06.09.2015 23:40, Emil Velikov wrote:
>> Hi Jean-Sébastien,
>
> Hi!
>
>> On 3 September 2015 at 19:07, Jean-Sébastien Pédron
>> We have 4(5) users of atexit() - EGL, gallium trace driver, core mesa
>> and util/ralloc. The latter of which is used almost everywhere in
>> mesa. So a bit I'm confused how you hit this only with OpenCL :-\
>> Perhaps the others should be updated as well ?
>
> That's what I'm wondering too.
>
> I didn't read the code thoroughly; my guess is that in the case of the
> OpenCL ICD loader, the loader explicitely dlclose() the backend. This
> must not be the case with OpenGL applications.
>
> I can update the patch to change all uses of atexit(3) if you are ok
> with it.
>
Upon a closer look it seems that
 - glx: both dri(1) and drisw dlclose the dri module.
 - gbm: doesn't use dlclose
 - egl/dri2: uses dlclose at eglTerminate time.
 - (some) opencl loaders use dlclose
 - the vdpau loader/wrapper uses a __attribute__((destructor)),  in
which it dlcloses the driver.
 - vaapi - uses dlclose at vaTerminate and in the error paths
 - omx ?

Long story short - although people haven't complained (yet) it might
be nice to get all cases in mesa sorted.

>>> __attribute__((destructor)) fixes the problem because such handlers are
>>> called when a library is unloaded and when the program exits.
>> Considering that atexit() (reportedly) works for Linux/Glibc, Solaris,
>> Android and Windows perhaps we should consider this as a workaround
>> for FreeBSD (and other *BSD) systems ?
>
> On one hand, my concern is this behavior is an extension to the standard
> and it could break for someone else. On the other hand, I don't like the
> standard behavior and would be glad to see FreeBSD implement this extension.
>
> I still think relying on this atexit(3) extension is fragile and I don't
> consider the use of the attribute a workaround.
>
>> Considering the multiple users (mentioned above) should one set
>> priority for the destructors ?
>
> You are right, I didn't think of the calling order of the destructors.
>
Considering the way the discussion is going, perhaps it's better to
add the destructor as a {Free,*}BSD only option for now ?

Bonus points for the "bootstrap"  suggestion from Erik, and checking
if gcc 4.2 (min. mesa requirement) supports ctor/dtor priorities and
converting everyone to use them :-)

We're quite early in the devel cycle, so the latter patch will get
some testing. And if it does not work out, we can revert it.

Thanks for bringing this topic up.
-Emil

P.S. I will be revisiting the loader/libudev sometime soon,
effectively pushing the chaos into libdrm. You should get a lovely
warning (missing implementation of XXX) when compiling libdrm, for
which patches will be gladly accepted :-)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] ralloc: Use __attribute__((destructor)) instead of atexit(3)

2015-09-24 Thread Jean-Sébastien Pédron
On 24.09.2015 14:55, Emil Velikov wrote:
 __attribute__((destructor)) fixes the problem because such handlers are
 called when a library is unloaded and when the program exits.
>>> Considering that atexit() (reportedly) works for Linux/Glibc, Solaris,
>>> Android and Windows perhaps we should consider this as a workaround
>>> for FreeBSD (and other *BSD) systems ?
>>
>> On one hand, my concern is this behavior is an extension to the standard
>> and it could break for someone else. On the other hand, I don't like the
>> standard behavior and would be glad to see FreeBSD implement this extension.
>>
>> I still think relying on this atexit(3) extension is fragile and I don't
>> consider the use of the attribute a workaround.
>>
>>> Considering the multiple users (mentioned above) should one set
>>> priority for the destructors ?
>>
>> You are right, I didn't think of the calling order of the destructors.
>
> Considering the way the discussion is going, perhaps it's better to
> add the destructor as a {Free,*}BSD only option for now ?

I will study the code more, especially regarding calling order of
destructors. If I don't find a satisfying solution, then a FreeBSD
specific solution is fine for the short term.

> We're quite early in the devel cycle, so the latter patch will get
> some testing. And if it does not work out, we can revert it.

Ok. I'm busy at the moment but will eventually come back to this later
in October.

> P.S. I will be revisiting the loader/libudev sometime soon,
> effectively pushing the chaos into libdrm. You should get a lovely
> warning (missing implementation of XXX) when compiling libdrm, for
> which patches will be gladly accepted :-)

Noted, thanks for the heads-up!

-- 
Jean-Sébastien Pédron



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] egl/dri2: enable EGL_KHR_gl_colorspace for swrast

2015-09-24 Thread Alex Deucher
On Thu, Sep 24, 2015 at 8:16 AM, Emil Velikov  wrote:
> Hi all,
>
> On 15 September 2015 at 17:42, Emil Velikov  wrote:
>> No driver changes needed for softpipe/llvmpipe - things just work.
>>
>> v2: Whitespace fixes.
>>
>> Signed-off-by: Emil Velikov 
>> Reviewed-by: Boyan Ding 
>
> Courtesy ping. If we don't get any objections/comments, over the next
> few days I will be pushing these patches.
>
> As mentioned, things just work and there are no regressions (according
> to a full piglit run)

They look ok to me.

Acked-by: Alex Deucher 

>
> Thanks
> Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nir: Implement __intrinsic_load_ssbo

2015-09-24 Thread Kristian Høgsberg
Yes, that's fine - we modify instr so we need to update dest. 

Kristian

> On Sep 24, 2015, at 1:59 AM, Samuel Iglesias Gonsálvez  
> wrote:
> 
> 
> 
>> On 23/09/15 12:07, Samuel Iglesias Gonsalvez wrote:
>> From: Iago Toral Quiroga 
>> 
>> v2:
>> - Fix ssbo loads with boolean variables.
>> 
>> v3:
>> - Simplify the changes (Kristian)
>> 
>> Reviewed-by: Connor Abbott 
>> ---
>> src/glsl/nir/glsl_to_nir.cpp| 66 
>> +
>> src/glsl/nir/nir_intrinsics.h   |  2 +-
>> src/glsl/nir/nir_lower_phis_to_scalar.c |  2 +
>> 3 files changed, 69 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>> index 93d5d23..906797e 100644
>> --- a/src/glsl/nir/glsl_to_nir.cpp
>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>> @@ -651,6 +651,8 @@ nir_visitor::visit(ir_call *ir)
>>  op = nir_intrinsic_image_samples;
>>   } else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == 0) {
>>  op = nir_intrinsic_store_ssbo;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_load_ssbo") == 0) {
>> + op = nir_intrinsic_load_ssbo;
>>   } else {
>>  unreachable("not reached");
>>   }
>> @@ -790,6 +792,70 @@ nir_visitor::visit(ir_call *ir)
>>  nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>>  break;
>>   }
>> +  case nir_intrinsic_load_ssbo: {
>> + exec_node *param = ir->actual_parameters.get_head();
>> + ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
>> +
>> + param = param->get_next();
>> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
>> +
>> + /* Check if we need the indirect version */
>> + ir_constant *const_offset = offset->as_constant();
>> + if (!const_offset) {
>> +op = nir_intrinsic_load_ssbo_indirect;
>> +ralloc_free(instr);
>> +instr = nir_intrinsic_instr_create(shader, op);
>> +instr->src[1] = evaluate_rvalue(offset);
>> +instr->const_index[0] = 0;
> 
> I forgot to update 'dest' with the new instruction here. The fix is just
> an one-liner:
> 
> +dest = &instr->dest;
> 
> If you don't mind, I will include it as part of this patch before
> pushing it to master.
> 
> The same happens to a similar code in nir_intrinsic_store_ssbo case. My
> plan is to add the same one-line change in the previous patch (nir:
> modify the instruction insertion in nir_visitor::visit(ir_call *ir))
> 
> Kristian, Do you agree with these changes?
> 
> Sam
> 
>> + } else {
>> +instr->const_index[0] = const_offset->value.u[0];
>> + }
>> +
>> + instr->src[0] = evaluate_rvalue(block);
>> +
>> + const glsl_type *type = ir->return_deref->var->type;
>> + instr->num_components = type->vector_elements;
>> +
>> + /* Setup destination register */
>> + nir_ssa_dest_init(&instr->instr, &instr->dest,
>> +   type->vector_elements, NULL);
>> +
>> + /* Insert the created nir instruction now since in the case of 
>> boolean
>> +  * result we will need to emit another instruction after it
>> +  */
>> + nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>> +
>> + /*
>> +  * In SSBO/UBO's, a true boolean value is any non-zero value, but 
>> we
>> +  * consider a true boolean to be ~0. Fix this up with a != 0
>> +  * comparison.
>> +  */
>> + if (type->base_type == GLSL_TYPE_BOOL) {
>> +nir_load_const_instr *const_zero =
>> +   nir_load_const_instr_create(shader, 1);
>> +const_zero->value.u[0] = 0;
>> +nir_instr_insert_after_cf_list(this->cf_node_list,
>> +   &const_zero->instr);
>> +
>> +nir_alu_instr *load_ssbo_compare =
>> +   nir_alu_instr_create(shader, nir_op_ine);
>> +load_ssbo_compare->src[0].src.is_ssa = true;
>> +load_ssbo_compare->src[0].src.ssa = &instr->dest.ssa;
>> +load_ssbo_compare->src[1].src.is_ssa = true;
>> +load_ssbo_compare->src[1].src.ssa = &const_zero->def;
>> +for (unsigned i = 0; i < type->vector_elements; i++)
>> +   load_ssbo_compare->src[1].swizzle[i] = 0;
>> +nir_ssa_dest_init(&load_ssbo_compare->instr,
>> +  &load_ssbo_compare->dest.dest,
>> +  type->vector_elements, NULL);
>> +load_ssbo_compare->dest.write_mask = (1 << 
>> type->vector_elements) - 1;
>> +nir_instr_insert_after_cf_list(this->cf_node_list,
>> +   &load_ssbo_compare->instr);
>> +dest = &load_ssbo_compare->dest.dest;
>> + }
>> +
>> + break;
>> +  }
>>   default:
>>  unreachable("not reached");

Re: [Mesa-dev] mesa: rework Driver.CopyImageSubData() and related code (v3)

2015-09-24 Thread Brian Paul

On 09/23/2015 07:03 PM, Nick Sarnie wrote:

Hi,

The game still works for me with your patch.

Tested-by: Nick Sarnie mailto:commendsar...@gmail.com>>


Thanks for testing, Nick and Kai!
I'll push the patch soon.

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5 70/70] glsl: Mark as active all elements of shared/std140 block arrays

2015-09-24 Thread Samuel Iglesias Gonsálvez


On 24/09/15 02:04, Timothy Arceri wrote:
> On Wed, 2015-09-23 at 17:02 +0200, Antía Puentes wrote:
>> Hi! Timothy,
>> thanks for your review.
>>
>> Seeing your patch in:
>> http://lists.freedesktop.org/archives/mesa-dev/2015-September/095070.
>> html
>>
>> The condition in my patch:
>> (var->type->interface_packing != GLSL_INTERFACE_PACKING_PACKED)
>> should also be changed to:
>> (var->get_interface_type()->interface_packing !=
>> GLSL_INTERFACE_PACKING_PACKED)
>>
>> Right?
> 
> You don't really need to do that because var is already the variable
> not the array and we know its the interface not an interface member
> because of the conditions above your change.
> 

'var->type' refers to the array of interface block here but the
interface packing information is hold inside the interface block type.

Because of that, we need to use
var->get_interface_type()->interface_packing to get the valid
information for the condition, so we don't do the wrong thing.

I even ran a GDB session locally with one of my programs to be sure
before replying :-)

If you agree, we are going to do that change before pushing the patch to
master.

Sam

[0]
http://lists.freedesktop.org/archives/mesa-dev/2015-September/094009.html

> 
>>
>> On mié, 2015-09-23 at 23:44 +1000, Timothy Arceri wrote:
>>> On Wed, 2015-09-23 at 12:25 +0200, Samuel Iglesias Gonsálvez wrote:

 On 21/09/15 13:11, Samuel Iglesias Gonsálvez wrote:
> On 21/09/15 09:41, Tapani Pälli wrote:
>> Seems like a nice fix, takes ES3 CTS failures from 116 to 64
>> on
>> Haswell
>> (most failing tests are with ubos).
>>
>> Tested-by: Tapani Pälli 
>>
>
> OK thanks!
>
>> This is individual patch not related to just SSBOs, maybe
>> this
>> could be
>> pushed separately from the rest?
>>
>
> Yes, this patch can be pushed separately from the rest of
> patches
> of the
> series: we just need an R-b, as usual.
>
> We are going to discuss the proper solution with Timothy [0],
> as he
> found that we are not covering one case.
>

 Timothy has sent a patch fixing the packed case [0] and he
 developed
 a
 piglit test for it [1].

 We are going to wait for an R-b of Antía's patch before pushing
 it.
>>>
>>> I sent a reply to this same email saying almost the same thing but
>>> it
>>> seems to have gone missing.
>>>
>>> Anyway I also sent my r-b, this patch looks good to me.
>>>
>>> Reviewed-by: Timothy Arceri 
>>>
>>>

 Sam

 [0]
 http://lists.freedesktop.org/archives/mesa-dev/2015-September/095
 070.
 html
 [1] 
 http://lists.freedesktop.org/archives/piglit/2015-September/01724
 7.html

> Sam
>
> [0] https://bugs.freedesktop.org/show_bug.cgi?id=83508
>
>
>> // Tapani
>>
>> On 09/10/2015 04:36 PM, Iago Toral Quiroga wrote:
>>> From: Antia Puentes 
>>>
>>> Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140'
>>> blocks
>>> or block members) considered as active 'shared' and
>>> 'std140'
>>> uniform
>>> blocks and uniform block arrays, but did not include the
>>> block
>>> array
>>> elements. Because of that, it was possible to have an
>>> active
>>> uniform
>>> block array without any elements marked as used, making the
>>> assertion
>>> ((b->num_array_elements > 0) == b->type->is_array())
>>> in link_uniform_blocks() fail.
>>>
>>> Fixes the following 5 dEQP tests:
>>>
>>>   * dEQP
>>> -GLES3.functional.ubo.random.nested_structs_instance_arrays
>>> .18
>>>   * dEQP
>>> -GLES3.functional.ubo.random.nested_structs_instance_arrays
>>> .24
>>>   *
>>> dEQP
>>> -GLES3.functional.ubo.random.nested_structs_arrays_instance
>>> _arr
>>> ays.19
>>>   * dEQP
>>> -GLES3.functional.ubo.random.all_per_block_buffers.49
>>>   * dEQP-GLES3.functional.ubo.random.all_shared_buffer.36
>>>
>>> Fixes bugzilla: 
>>> https://bugs.freedesktop.org/show_bug.cgi?id=83508
>>> ---
>>>   src/glsl/link_uniform_block_active_visitor.cpp | 23
>>> +++
>>>   1 file changed, 23 insertions(+)
>>>
>>> diff --git a/src/glsl/link_uniform_block_active_visitor.cpp
>>> b/src/glsl/link_uniform_block_active_visitor.cpp
>>> index 5102947..fbe79de 100644
>>> --- a/src/glsl/link_uniform_block_active_visitor.cpp
>>> +++ b/src/glsl/link_uniform_block_active_visitor.cpp
>>> @@ -106,6 +106,22 @@
>>> link_uniform_block_active_visitor::visit(ir_variable *var)
>>>  assert(b->num_array_elements == 0);
>>>  assert(b->array_elements == NULL);
>>>  assert(b->type != NULL);
>>> +   assert(!b->type->is_array() || b->has_instance_name);
>>> +
>>> +   /* For uniform block arrays declared with a shared or
>>> std140 layout
>>> +* qualifier, mark all its instance

[Mesa-dev] [PATCH v2] meta: Support 16x MSAA in the multisample scaled blit shader

2015-09-24 Thread Neil Roberts
v2: Fix the x_scale in the shader. Remove the doubts in the commit
message.
---

After some helpful explanation from Anuj and reading the code a bit
more, I think I understand this a bit better and I no longer think
there is an issue with the sample map array having out-of-bounds
indices. The texture coordinates are effectively multiplied by
‘scale’, converted to integers and then divided by scale again. That
means that fract(coord) can effectively only be a multiple of
1.0/scale. In the 8x MSAA case, 1.0/scale is vec2(0.5f, 0.25f) which
means the maximum array index is int(0.5*2.0 + 0.75*8.0) == 7.

In this v2 of the patch I've fixed scale.x in the shader. Now for the
16x MSAA case 1.0/scale is vec2(0.25f,0.25f). I've left the formula
for the array index the same as v1 so the maximum index is now
int(0.75*4.0 + 0.75*16) == 15 which makes sense.

 src/mesa/drivers/common/meta.h |  2 ++
 src/mesa/drivers/common/meta_blit.c| 29 ++
 src/mesa/drivers/dri/i965/gen6_multisample_state.c | 14 +++
 src/mesa/main/mtypes.h | 15 ++-
 4 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
index fe43915..e42ddc8 100644
--- a/src/mesa/drivers/common/meta.h
+++ b/src/mesa/drivers/common/meta.h
@@ -285,9 +285,11 @@ enum blit_msaa_shader {
BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
+   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
+   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
BLIT_MSAA_SHADER_COUNT,
 };
 
diff --git a/src/mesa/drivers/common/meta_blit.c 
b/src/mesa/drivers/common/meta_blit.c
index a41fe42..1f9515a 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -74,20 +74,25 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
const char *vs_source;
const char *sampler_array_suffix = "";
const char *texcoord_type = "vec2";
-   float y_scale;
+   float x_scale, y_scale;
enum blit_msaa_shader shader_index;
 
assert(src_rb);
samples = MAX2(src_rb->NumSamples, 1);
-   y_scale = samples * 0.5;
+
+   if (samples == 16)
+  x_scale = 4.0;
+   else
+  x_scale = 2.0;
+   y_scale = samples / x_scale;
 
/* We expect only power of 2 samples in source multisample buffer. */
assert(samples > 0 && _mesa_is_pow_two(samples));
while (samples >> (shader_offset + 1)) {
   shader_offset++;
}
-   /* Update the assert if we plan to support more than 8X MSAA. */
-   assert(shader_offset > 0 && shader_offset < 4);
+   /* Update the assert if we plan to support more than 16X MSAA. */
+   assert(shader_offset > 0 && shader_offset <= 4);
 
assert(target == GL_TEXTURE_2D_MULTISAMPLE ||
   target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY);
@@ -132,6 +137,10 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
   sample_number =  "sample_map[int(2 * fract(coord.x) + 8 * 
fract(coord.y))]";
   sample_map = ctx->Const.SampleMap8x;
   break;
+   case 16:
+  sample_number =  "sample_map[int(4 * fract(coord.x) + 16 * 
fract(coord.y))]";
+  sample_map = ctx->Const.SampleMap16x;
+  break;
default:
   sample_number = NULL;
   sample_map = NULL;
@@ -178,9 +187,9 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
"{\n"
"%s"
"   vec2 interp;\n"
-   "   const vec2 scale = vec2(2.0f, %ff);\n"
-   "   const vec2 scale_inv = vec2(0.5f, %ff);\n"
-   "   const vec2 s_0_offset = vec2(0.25f, %ff);\n"
+   "   const vec2 scale = vec2(%ff, %ff);\n"
+   "   const vec2 scale_inv = vec2(%ff, %ff);\n"
+   "   const vec2 s_0_offset = vec2(%ff, %ff);\n"
"   vec2 s_0_coord, s_1_coord, s_2_coord, 
s_3_coord;\n"
"   vec4 s_0_color, s_1_color, s_2_color, 
s_3_color;\n"
"   vec4 x_0_color, x_1_color;\n"
@@ -214,9 +223,9 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
sampler_array_suffix,
texcoord_type,
sample_map_expr,
-   y_scale,
-   1.0f / y_scale,
-   1.0f / samples,
+   x_scale, y_scale,
+   1.0f / x_scale, 1.0f / y_sc

Re: [Mesa-dev] [PATCH v2] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Jason Ekstrand
On Thu, Sep 24, 2015 at 1:32 AM, Alejandro Piñeiro  wrote:
> Without this commit, copy propagation is discarded if it involves
> a uniform with an instruction that has 3 sources. But 3 sourced
> instructions can access scalar values.
>
> For example, this is what vec4_visitor::fix_3src_operand() is already
> doing:
>
>if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
>   return src;
>
> Shader-db results (unfiltered) on NIR:
> total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
> instructions in affected programs: 812755 -> 795090 (-2.17%)
> helped:7930
> HURT:  0
>
> Shader-db results (unfiltered) on IR:
> total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
> instructions in affected programs: 296630 -> 292596 (-1.36%)
> helped:2533
> HURT:  0
>
> v2:
> - Updated commit message, using Matt Turner suggestions
> - Move the check after we've created the final value, as Jason
>   Ekstrand suggested
> - Clean up the condition
> ---
>
> Removed RFC as both Matt Turner and Jason Ekstrand accepted the change.
>
> I updated the patch as the condition was just ugly. As I made that change
> I also included Jason suggestion to move the check after the swizzle
> on the final value is computed. I know that the first version got a
> "Reviewed" by Jason, but due all these change, I prefer to be conservative,
> and ask for a new review.
>
> Thanks in advance.
>
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index d3f0ddd..ce4d39d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -325,9 +325,6 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>return false;
>
> -   if (inst->is_3src() && value.file == UNIFORM)
> -  return false;
> -
> if (inst->is_send_from_grf())
>return false;
>
> @@ -373,6 +370,13 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> }
>
> /* Build the final value */
> +   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> +   value.swizzle);

Why did you move this up here?  If we're going to have checks based on
the final value, they should probably go after we've constructed the
entire final value.  Does that seem reasonable?  So I'd put this back
and just put the UNIFORM 3-src check below it.  Otherwise, we're
breaking up the "build the final value" again.

If you don't like that, then I'd rather just go with the first patch.

With the requested reshuffle,

Reviewed-by: Jason Ekstrand 

Otherwise, I gave you an R-B on the first patch so you can just use
that one if you'd like.

--Jason

> +   /* One last check based on the final swizzle */
> +   if (inst->is_3src() && value.file == UNIFORM &&
> +   !brw_is_single_value_swizzle(value.swizzle))
> +  return false;
> +
> if (inst->src[arg].abs) {
>value.negate = false;
>value.abs = true;
> @@ -380,8 +384,6 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> if (inst->src[arg].negate)
>value.negate = !value.negate;
>
> -   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> -   value.swizzle);
> if (has_source_modifiers &&
> value.type != inst->src[arg].type) {
>assert(can_change_source_types(inst));
> --
> 2.1.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5 3/3] mesa: Use the effective internal format instead for validation

2015-09-24 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Thu, Sep 24, 2015 at 1:57 AM, Eduardo Lima Mitev  wrote:
> When validating format+type+internalFormat for texture pixel operations
> on GLES3, the effective internal format should be used if the one
> specified is an unsized internal format. Page 127, section "3.8 Texturing"
> of the GLES 3.0.4 spec says:
>
> "if internalformat is a base internal format, the effective internal
>  format is a sized internal format that is derived from the format and
>  type for internal use by the GL. Table 3.12 specifies the mapping of
>  format and type to effective internal formats. The effective internal
>  format is used by the GL for purposes such as texture completeness or
>  type checks for CopyTex* commands. In these cases, the GL is required
>  to operate as if the effective internal format was used as the
>  internalformat when specifying the texture data."
>
> v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
> considered sized internal formats. Return the corresponding unsize format
> instead.
>
> v4: * Improved comments in
>   _mesa_es3_effective_internal_format_for_format_and_type().
> * Splitted patch to separate chunk about reordering of
>   error_check_subtexture_dimensions() error check, which is not directly
>   related with this patch.
> v5: Dropped the splitted patch because it was actually a work around 3
> dEQP tests that are buggy:
>
> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91582
> ---
>  src/mesa/main/glformats.c | 151 
> ++
>  1 file changed, 151 insertions(+)
>
> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
> index 515b06e..7dab33c 100644
> --- a/src/mesa/main/glformats.c
> +++ b/src/mesa/main/glformats.c
> @@ -2646,6 +2646,127 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
> GLint internalFormat)
>  }
>
>  /**
> + * Returns the effective internal format from a texture format and type.
> + * This is used by texture image operations internally for validation, when
> + * the specified internal format is a base (unsized) format.
> + *
> + * This method will only return a valid effective internal format if the
> + * combination of format, type and internal format in base form, is 
> acceptable.
> + *
> + * If a single sized internal format is defined in the spec (OpenGL-ES 
> 3.0.4) or
> + * in extensions, to unambiguously correspond to the given base format, then
> + * that internal format is returned as the effective. Otherwise, if the
> + * combination is accepted but a single effective format is not defined, the
> + * passed base format will be returned instead.
> + *
> + * \param format the texture format
> + * \param type the texture type
> + */
> +static GLenum
> +_mesa_es3_effective_internal_format_for_format_and_type(GLenum format,
> +GLenum type)
> +{
> +   switch (type) {
> +   case GL_UNSIGNED_BYTE:
> +  switch (format) {
> +  case GL_RGBA:
> + return GL_RGBA8;
> +  case GL_RGB:
> + return GL_RGB8;
> +  /* Although LUMINANCE_ALPHA, LUMINANCE and ALPHA appear in table 3.12,
> +   * (section 3.8 Texturing, page 128 of the OpenGL-ES 3.0.4) as 
> effective
> +   * internal formats, they do not correspond to GL constants, so the 
> base
> +   * format is returned instead.
> +   */
> +  case GL_LUMINANCE_ALPHA:
> +  case GL_LUMINANCE:
> +  case GL_ALPHA:
> + return format;
> +  }
> +  break;
> +
> +   case GL_UNSIGNED_SHORT_4_4_4_4:
> +  if (format == GL_RGBA)
> + return GL_RGBA4;
> +  break;
> +
> +   case GL_UNSIGNED_SHORT_5_5_5_1:
> +  if (format == GL_RGBA)
> + return GL_RGB5_A1;
> +  break;
> +
> +   case GL_UNSIGNED_SHORT_5_6_5:
> +  if (format == GL_RGB)
> + return GL_RGB565;
> +  break;
> +
> +   /* OES_packed_depth_stencil */
> +   case GL_UNSIGNED_INT_24_8:
> +  if (format == GL_DEPTH_STENCIL)
> + return GL_DEPTH24_STENCIL8;
> +  break;
> +
> +   case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
> +  if (format == GL_DEPTH_STENCIL)
> + return GL_DEPTH32F_STENCIL8;
> +  break;
> +
> +   case GL_UNSIGNED_SHORT:
> +  if (format == GL_DEPTH_COMPONENT)
> + return GL_DEPTH_COMPONENT16;
> +  break;
> +
> +   case GL_UNSIGNED_INT:
> +  /* It can be DEPTH_COMPONENT16 or DEPTH_COMPONENT24, so just return
> +   * the format.
> +   */
> +  if (format == GL_DEPTH_COMPONENT)
> + return format;
> +  break;
> +
> +   /* OES_texture_float and OES_texture_half_float */
> +   case GL_FLOAT:
> +  if (format == GL_DEPTH_COMPONENT)
> +   

Re: [Mesa-dev] [PATCH] i965: Don't tell the hardware about our UAV access.

2015-09-24 Thread Mark Janes
Emil Velikov  writes:

> Hi all,
>
> On 10 September 2015 at 00:30, Mark Janes  wrote:
>> Mark Janes  writes:
>>
>>> Francisco Jerez  writes:
>>>
 Mark Janes  writes:

> When I tested this, I saw an intermittent BSW gpu hang.  I haven't been
> able to confirm that it is due to the host-mem-barrier test.
>
 It would probably be useful to know if the hang is due to any of the
 image load/store tests or if it's something unrelated.
>>>
>>> Yes, you are right.  I will take some time tomorrow to catch the
>>> specific test.  Given the low rate of gpu hangs on BSW lately, I expect
>>> it will be due to image load/store tests.  However I need to confirm it.
>>
>> I spent time attempting to reproduce hangs with this patch on BSW today.
>> I forced 3 hangs, but none of them could be attributed to image
>> load/store tests.
>>
> With the above said, should we consider this as inconclusive fix ? If
> so, what is the plan going forward - is this going to the bin or we'll
> beat it into shape and merge it ?
>

I think the fix correctly addresses gpu hangs related to the
host-mem-barrier test.

My initial report wrongly attributed gpu hangs to host-mem-barrier.
Subsequent testing showed that hangs were caused by other tests.  I
provided feedback with partial information because of the time consuming
nature of root-causing gpu hangs.

Curro felt confident in this fix, so I took the time to verify which
tests triggered the hangs.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Alejandro Piñeiro
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.

For example, this is what vec4_visitor::fix_3src_operand() is already
doing:

   if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
  return src;

Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped:7930
HURT:  0

Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped:2533
HURT:  0

v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
  Ekstrand suggested
- Clean up the condition

v3:
- Move the check back to the original place, to keep things
  tidy, as suggested by Jason Ekstrand
---

> Why did you move this up here?  If we're going to have checks based on
> the final value, they should probably go after we've constructed the
> entire final value.  Does that seem reasonable?  So I'd put this back
> and just put the UNIFORM 3-src check below it.  Otherwise, we're
> breaking up the "build the final value" again.

The problem is that at the end of try_copy_propagate we are not only
creating the entire final value, but also modifying some of the
instruction properties, based on the fact that it is safe at that
point.

> If you don't like that, then I'd rather just go with the first patch.

This commit is basically the original one, but with some cleanings, 
including Matt's suggestion on the if, plus calling compose_swizzle
just once. I personally find more readable this way, sorry for the noise
of asking another patch review.

 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index d3f0ddd..158b25d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -325,7 +325,11 @@ try_copy_propagate(const struct brw_device_info *devinfo,
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
   return false;
 
-   if (inst->is_3src() && value.file == UNIFORM)
+   unsigned composed_swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
+   value.swizzle);
+   if (inst->is_3src() &&
+   value.file == UNIFORM &&
+   !composed_swizzle)
   return false;
 
if (inst->is_send_from_grf())
@@ -380,8 +384,7 @@ try_copy_propagate(const struct brw_device_info *devinfo,
if (inst->src[arg].negate)
   value.negate = !value.negate;
 
-   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
-   value.swizzle);
+   value.swizzle = composed_swizzle;
if (has_source_modifiers &&
value.type != inst->src[arg].type) {
   assert(can_change_source_types(inst));
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5 70/70] glsl: Mark as active all elements of shared/std140 block arrays

2015-09-24 Thread Timothy Arceri


On 25 September 2015 12:54:38 am AEST, "Samuel Iglesias Gonsálvez" 
 wrote:
>
>
>On 24/09/15 02:04, Timothy Arceri wrote:
>> On Wed, 2015-09-23 at 17:02 +0200, Antía Puentes wrote:
>>> Hi! Timothy,
>>> thanks for your review.
>>>
>>> Seeing your patch in:
>>>
>http://lists.freedesktop.org/archives/mesa-dev/2015-September/095070.
>>> html
>>>
>>> The condition in my patch:
>>> (var->type->interface_packing != GLSL_INTERFACE_PACKING_PACKED)
>>> should also be changed to:
>>> (var->get_interface_type()->interface_packing !=
>>> GLSL_INTERFACE_PACKING_PACKED)
>>>
>>> Right?
>> 
>> You don't really need to do that because var is already the variable
>> not the array and we know its the interface not an interface member
>> because of the conditions above your change.
>> 
>
>'var->type' refers to the array of interface block here but the
>interface packing information is hold inside the interface block type.
>
>Because of that, we need to use
>var->get_interface_type()->interface_packing to get the valid
>information for the condition, so we don't do the wrong thing.
>
>I even ran a GDB session locally with one of my programs to be sure
>before replying :-)
>
>If you agree, we are going to do that change before pushing the patch
>to
>master.

After a second look I agree, this does need to change. Thanks for double 
checking.

>
>Sam
>
>[0]
>http://lists.freedesktop.org/archives/mesa-dev/2015-September/094009.html
>
>> 
>>>
>>> On mié, 2015-09-23 at 23:44 +1000, Timothy Arceri wrote:
 On Wed, 2015-09-23 at 12:25 +0200, Samuel Iglesias Gonsálvez wrote:
>
> On 21/09/15 13:11, Samuel Iglesias Gonsálvez wrote:
>> On 21/09/15 09:41, Tapani Pälli wrote:
>>> Seems like a nice fix, takes ES3 CTS failures from 116 to 64
>>> on
>>> Haswell
>>> (most failing tests are with ubos).
>>>
>>> Tested-by: Tapani Pälli 
>>>
>>
>> OK thanks!
>>
>>> This is individual patch not related to just SSBOs, maybe
>>> this
>>> could be
>>> pushed separately from the rest?
>>>
>>
>> Yes, this patch can be pushed separately from the rest of
>> patches
>> of the
>> series: we just need an R-b, as usual.
>>
>> We are going to discuss the proper solution with Timothy [0],
>> as he
>> found that we are not covering one case.
>>
>
> Timothy has sent a patch fixing the packed case [0] and he
> developed
> a
> piglit test for it [1].
>
> We are going to wait for an R-b of Antía's patch before pushing
> it.

 I sent a reply to this same email saying almost the same thing but
 it
 seems to have gone missing.

 Anyway I also sent my r-b, this patch looks good to me.

 Reviewed-by: Timothy Arceri 


>
> Sam
>
> [0]
> http://lists.freedesktop.org/archives/mesa-dev/2015-September/095
> 070.
> html
> [1] 
> http://lists.freedesktop.org/archives/piglit/2015-September/01724
> 7.html
>
>> Sam
>>
>> [0] https://bugs.freedesktop.org/show_bug.cgi?id=83508
>>
>>
>>> // Tapani
>>>
>>> On 09/10/2015 04:36 PM, Iago Toral Quiroga wrote:
 From: Antia Puentes 

 Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140'
 blocks
 or block members) considered as active 'shared' and
 'std140'
 uniform
 blocks and uniform block arrays, but did not include the
 block
 array
 elements. Because of that, it was possible to have an
 active
 uniform
 block array without any elements marked as used, making the
 assertion
 ((b->num_array_elements > 0) == b->type->is_array())
 in link_uniform_blocks() fail.

 Fixes the following 5 dEQP tests:

   * dEQP
 -GLES3.functional.ubo.random.nested_structs_instance_arrays
 .18
   * dEQP
 -GLES3.functional.ubo.random.nested_structs_instance_arrays
 .24
   *
 dEQP
 -GLES3.functional.ubo.random.nested_structs_arrays_instance
 _arr
 ays.19
   * dEQP
 -GLES3.functional.ubo.random.all_per_block_buffers.49
   * dEQP-GLES3.functional.ubo.random.all_shared_buffer.36

 Fixes bugzilla: 
 https://bugs.freedesktop.org/show_bug.cgi?id=83508
 ---
   src/glsl/link_uniform_block_active_visitor.cpp | 23
 +++
   1 file changed, 23 insertions(+)

 diff --git a/src/glsl/link_uniform_block_active_visitor.cpp
 b/src/glsl/link_uniform_block_active_visitor.cpp
 index 5102947..fbe79de 100644
 --- a/src/glsl/link_uniform_block_active_visitor.cpp
 +++ b/src/glsl/link_uniform_block_active_visitor.cpp
 @@ -106,6 +106,22 @@
 link_uniform_block_active_visitor::visit(ir_variable *var)
  assert

Re: [Mesa-dev] [PATCH v5 3/3] mesa: Use the effective internal format instead for validation

2015-09-24 Thread Eduardo Lima Mitev
On 09/24/2015 05:39 PM, Jason Ekstrand wrote:
> Reviewed-by: Jason Ekstrand 
> 

Thanks! I will wait for Mark to confirm if it is safe or not to push the
series, since it will effectively regress the 3 dEQP tests.

I should also remove the "Bugzilla:" tag from this commit, since it
doesn't fix the bug any more.

Eduardo

> On Thu, Sep 24, 2015 at 1:57 AM, Eduardo Lima Mitev  wrote:
>> When validating format+type+internalFormat for texture pixel operations
>> on GLES3, the effective internal format should be used if the one
>> specified is an unsized internal format. Page 127, section "3.8 Texturing"
>> of the GLES 3.0.4 spec says:
>>
>> "if internalformat is a base internal format, the effective internal
>>  format is a sized internal format that is derived from the format and
>>  type for internal use by the GL. Table 3.12 specifies the mapping of
>>  format and type to effective internal formats. The effective internal
>>  format is used by the GL for purposes such as texture completeness or
>>  type checks for CopyTex* commands. In these cases, the GL is required
>>  to operate as if the effective internal format was used as the
>>  internalformat when specifying the texture data."
>>
>> v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
>> considered sized internal formats. Return the corresponding unsize format
>> instead.
>>
>> v4: * Improved comments in
>>   _mesa_es3_effective_internal_format_for_format_and_type().
>> * Splitted patch to separate chunk about reordering of
>>   error_check_subtexture_dimensions() error check, which is not directly
>>   related with this patch.
>> v5: Dropped the splitted patch because it was actually a work around 3
>> dEQP tests that are buggy:
>>
>> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
>> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
>> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91582
>> ---
>>  src/mesa/main/glformats.c | 151 
>> ++
>>  1 file changed, 151 insertions(+)
>>
>> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
>> index 515b06e..7dab33c 100644
>> --- a/src/mesa/main/glformats.c
>> +++ b/src/mesa/main/glformats.c
>> @@ -2646,6 +2646,127 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
>> GLint internalFormat)
>>  }
>>
>>  /**
>> + * Returns the effective internal format from a texture format and type.
>> + * This is used by texture image operations internally for validation, when
>> + * the specified internal format is a base (unsized) format.
>> + *
>> + * This method will only return a valid effective internal format if the
>> + * combination of format, type and internal format in base form, is 
>> acceptable.
>> + *
>> + * If a single sized internal format is defined in the spec (OpenGL-ES 
>> 3.0.4) or
>> + * in extensions, to unambiguously correspond to the given base format, then
>> + * that internal format is returned as the effective. Otherwise, if the
>> + * combination is accepted but a single effective format is not defined, the
>> + * passed base format will be returned instead.
>> + *
>> + * \param format the texture format
>> + * \param type the texture type
>> + */
>> +static GLenum
>> +_mesa_es3_effective_internal_format_for_format_and_type(GLenum format,
>> +GLenum type)
>> +{
>> +   switch (type) {
>> +   case GL_UNSIGNED_BYTE:
>> +  switch (format) {
>> +  case GL_RGBA:
>> + return GL_RGBA8;
>> +  case GL_RGB:
>> + return GL_RGB8;
>> +  /* Although LUMINANCE_ALPHA, LUMINANCE and ALPHA appear in table 3.12,
>> +   * (section 3.8 Texturing, page 128 of the OpenGL-ES 3.0.4) as 
>> effective
>> +   * internal formats, they do not correspond to GL constants, so the 
>> base
>> +   * format is returned instead.
>> +   */
>> +  case GL_LUMINANCE_ALPHA:
>> +  case GL_LUMINANCE:
>> +  case GL_ALPHA:
>> + return format;
>> +  }
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT_4_4_4_4:
>> +  if (format == GL_RGBA)
>> + return GL_RGBA4;
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT_5_5_5_1:
>> +  if (format == GL_RGBA)
>> + return GL_RGB5_A1;
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT_5_6_5:
>> +  if (format == GL_RGB)
>> + return GL_RGB565;
>> +  break;
>> +
>> +   /* OES_packed_depth_stencil */
>> +   case GL_UNSIGNED_INT_24_8:
>> +  if (format == GL_DEPTH_STENCIL)
>> + return GL_DEPTH24_STENCIL8;
>> +  break;
>> +
>> +   case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
>> +  if (format == GL_DEPTH_STENCIL)
>> + return GL_DEPTH32F_STENCIL8;
>> +  break;
>> +
>> +   case GL_UNSIGNED_SHORT:
>> +  if (format == GL_DEPTH_COMPONENT)
>> + 

Re: [Mesa-dev] [PATCH v3] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Jason Ekstrand
On Thu, Sep 24, 2015 at 9:29 AM, Alejandro Piñeiro  wrote:
> Without this commit, copy propagation is discarded if it involves
> a uniform with an instruction that has 3 sources. But 3 sourced
> instructions can access scalar values.
>
> For example, this is what vec4_visitor::fix_3src_operand() is already
> doing:
>
>if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
>   return src;
>
> Shader-db results (unfiltered) on NIR:
> total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
> instructions in affected programs: 812755 -> 795090 (-2.17%)
> helped:7930
> HURT:  0
>
> Shader-db results (unfiltered) on IR:
> total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
> instructions in affected programs: 296630 -> 292596 (-1.36%)
> helped:2533
> HURT:  0
>
> v2:
> - Updated commit message, using Matt Turner suggestions
> - Move the check after we've created the final value, as Jason
>   Ekstrand suggested
> - Clean up the condition
>
> v3:
> - Move the check back to the original place, to keep things
>   tidy, as suggested by Jason Ekstrand
> ---
>
>> Why did you move this up here?  If we're going to have checks based on
>> the final value, they should probably go after we've constructed the
>> entire final value.  Does that seem reasonable?  So I'd put this back
>> and just put the UNIFORM 3-src check below it.  Otherwise, we're
>> breaking up the "build the final value" again.
>
> The problem is that at the end of try_copy_propagate we are not only
> creating the entire final value, but also modifying some of the
> instruction properties, based on the fact that it is safe at that
> point.
>
>> If you don't like that, then I'd rather just go with the first patch.
>
> This commit is basically the original one, but with some cleanings,
> including Matt's suggestion on the if, plus calling compose_swizzle
> just once. I personally find more readable this way, sorry for the noise
> of asking another patch review.
>
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index d3f0ddd..158b25d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -325,7 +325,11 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>return false;
>
> -   if (inst->is_3src() && value.file == UNIFORM)
> +   unsigned composed_swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> +   value.swizzle);
> +   if (inst->is_3src() &&
> +   value.file == UNIFORM &&
> +   !composed_swizzle)

Where did the is_single_value_swizzle() go?

>return false;
>
> if (inst->is_send_from_grf())
> @@ -380,8 +384,7 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> if (inst->src[arg].negate)
>value.negate = !value.negate;
>
> -   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> -   value.swizzle);
> +   value.swizzle = composed_swizzle;
> if (has_source_modifiers &&
> value.type != inst->src[arg].type) {
>assert(can_change_source_types(inst));
> --
> 2.1.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed

2015-09-24 Thread Jason Ekstrand
Cc'ing stable.

On Tue, Sep 22, 2015 at 9:53 AM, Antia Puentes  wrote:
> Gen6 MATH instructions can not execute in align16 mode, so swizzles or
> writemasking are not allowed.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
> ---
>
> I have tried to find an example where the writemask check is strictly needed
> in order to avoid an incorrect register coalescing, but I have failed.
>
> If I have something like this in my shader:
>
>   in vec4 attr_pos;
>
>   void main() {
> vec4 a;
> a.xy = sin(attr_pos);
> ..
>   }
>
> It is translated into the following vec4 instructions:
>   sin vgrf4.0:F, vgrf3.xyzw:F
>   mov vgrf0.0.xy:F, vgrf4.xyzw:F <-- (added by emit_math to fix the 
> writemasking issue)
>
> And converted by the opt_reduce_swizzle optimization into:
>   sin vgrf4.0:F, vgrf3.xyzw:F
>   mov vgrf0.0.xy:F, vgrf4.xyyy:F
>
> The swizzle in the MOV instruction is not the identity anymore, so the method 
> "can_reswizzle"
> in opt_register_coalesce will return FALSE because of the swizzle-check that 
> my first
> version of the patch added.
>
> Out of curiosity, I disabled the opt_reduce_swizzle optimization to see what 
> happens.
> Still "can_reswizzle" returns FALSE preventing the incorrect register 
> coalescing,
> because the math function sets more things than the MOV instruction; i.e.
> the check "(dst.writemask & ~swizzle_mask) is positive.
>
> Although I have not found an example where the writemask-check is crucial, 
> that does not neccessarily
> mean that it does not exists. I am sending the second version of the patch in 
> case you prefer to have
> that check in place.
>
>  src/mesa/drivers/dri/i965/brw_ir_vec4.h |  3 ++-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 11 +--
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> index 966a410..a48bb68 100644
> --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> @@ -175,7 +175,8 @@ public:
>
> bool is_send_from_grf();
> unsigned regs_read(unsigned arg) const;
> -   bool can_reswizzle(int dst_writemask, int swizzle, int swizzle_mask);
> +   bool can_reswizzle(const struct brw_device_info *devinfo, int 
> dst_writemask,
> +  int swizzle, int swizzle_mask);
> void reswizzle(int dst_writemask, int swizzle);
> bool can_do_source_mods(const struct brw_device_info *devinfo);
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index ed49cd3..ae695f1 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -941,10 +941,17 @@ vec4_visitor::opt_set_dependency_control()
>  }
>
>  bool
> -vec4_instruction::can_reswizzle(int dst_writemask,
> +vec4_instruction::can_reswizzle(const struct brw_device_info *devinfo,
> +int dst_writemask,
>  int swizzle,
>  int swizzle_mask)
>  {
> +   /* Gen6 MATH instructions can not execute in align16 mode, so swizzles
> +* or writemasking are not allowed. */
> +   if (devinfo->gen == 6 && is_math() &&
> +   (swizzle != BRW_SWIZZLE_XYZW || dst_writemask != WRITEMASK_XYZW))
> +  return false;
> +
> /* If this instruction sets anything not referenced by swizzle, then we'd
>  * totally break it when we reswizzle.
>  */
> @@ -1077,7 +1084,7 @@ vec4_visitor::opt_register_coalesce()
> break;
>
>  /* If we can't handle the swizzle, bail. */
> -if (!scan_inst->can_reswizzle(inst->dst.writemask,
> +if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask,
>inst->src[0].swizzle,
>chans_needed)) {
> break;
> --
> 2.1.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] meta: Support 16x MSAA in the multisample scaled blit shader

2015-09-24 Thread Anuj Phogat
On Thu, Sep 24, 2015 at 8:07 AM, Neil Roberts  wrote:
>
> v2: Fix the x_scale in the shader. Remove the doubts in the commit
> message.
> ---
>
> After some helpful explanation from Anuj and reading the code a bit
> more, I think I understand this a bit better and I no longer think
> there is an issue with the sample map array having out-of-bounds
> indices. The texture coordinates are effectively multiplied by
> ‘scale’, converted to integers and then divided by scale again. That
> means that fract(coord) can effectively only be a multiple of
> 1.0/scale. In the 8x MSAA case, 1.0/scale is vec2(0.5f, 0.25f) which
> means the maximum array index is int(0.5*2.0 + 0.75*8.0) == 7.
>
> In this v2 of the patch I've fixed scale.x in the shader. Now for the
> 16x MSAA case 1.0/scale is vec2(0.25f,0.25f). I've left the formula
> for the array index the same as v1 so the maximum index is now
> int(0.75*4.0 + 0.75*16) == 15 which makes sense.
>
>  src/mesa/drivers/common/meta.h |  2 ++
>  src/mesa/drivers/common/meta_blit.c| 29 
> ++
>  src/mesa/drivers/dri/i965/gen6_multisample_state.c | 14 +++
>  src/mesa/main/mtypes.h | 15 ++-
>  4 files changed, 49 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
> index fe43915..e42ddc8 100644
> --- a/src/mesa/drivers/common/meta.h
> +++ b/src/mesa/drivers/common/meta.h
> @@ -285,9 +285,11 @@ enum blit_msaa_shader {
> BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> +   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> +   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_MSAA_SHADER_COUNT,
>  };
>
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index a41fe42..1f9515a 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -74,20 +74,25 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
> const char *vs_source;
> const char *sampler_array_suffix = "";
> const char *texcoord_type = "vec2";
> -   float y_scale;
> +   float x_scale, y_scale;
> enum blit_msaa_shader shader_index;
>
> assert(src_rb);
> samples = MAX2(src_rb->NumSamples, 1);
> -   y_scale = samples * 0.5;
> +
> +   if (samples == 16)
> +  x_scale = 4.0;
> +   else
> +  x_scale = 2.0;
> +   y_scale = samples / x_scale;
>
> /* We expect only power of 2 samples in source multisample buffer. */
> assert(samples > 0 && _mesa_is_pow_two(samples));
> while (samples >> (shader_offset + 1)) {
>shader_offset++;
> }
> -   /* Update the assert if we plan to support more than 8X MSAA. */
> -   assert(shader_offset > 0 && shader_offset < 4);
> +   /* Update the assert if we plan to support more than 16X MSAA. */
> +   assert(shader_offset > 0 && shader_offset <= 4);
>
> assert(target == GL_TEXTURE_2D_MULTISAMPLE ||
>target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY);
> @@ -132,6 +137,10 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context 
> *ctx,
>sample_number =  "sample_map[int(2 * fract(coord.x) + 8 * 
> fract(coord.y))]";
>sample_map = ctx->Const.SampleMap8x;
>break;
> +   case 16:
> +  sample_number =  "sample_map[int(4 * fract(coord.x) + 16 * 
> fract(coord.y))]";
> +  sample_map = ctx->Const.SampleMap16x;
> +  break;
> default:
>sample_number = NULL;
>sample_map = NULL;
> @@ -178,9 +187,9 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
> "{\n"
> "%s"
> "   vec2 interp;\n"
> -   "   const vec2 scale = vec2(2.0f, %ff);\n"
> -   "   const vec2 scale_inv = vec2(0.5f, %ff);\n"
> -   "   const vec2 s_0_offset = vec2(0.25f, 
> %ff);\n"
> +   "   const vec2 scale = vec2(%ff, %ff);\n"
> +   "   const vec2 scale_inv = vec2(%ff, %ff);\n"
> +   "   const vec2 s_0_offset = vec2(%ff, %ff);\n"
> "   vec2 s_0_coord, s_1_coord, s_2_coord, 
> s_3_coord;\n"
> "   vec4 s_0_color, s_1_color, s_2_color, 
> s_3_color;\n"
> "   vec4 x_0_color, x_1_color;\n"
> @@ -214,9 +223,9 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
> sampler_array_suffix,
> texcoord_type,
> 

Re: [Mesa-dev] [Mesa-stable] [PATCH] Revert "i965: Stop aux data compare preventing program binary re-use"

2015-09-24 Thread Ben Widawsky
On Thu, Sep 24, 2015 at 11:41:15AM +0100, Emil Velikov wrote:
> Hi guys,
> 
> On 2 September 2015 at 20:10, Pohjolainen, Topi
>  wrote:
> > On Thu, Aug 27, 2015 at 10:05:14AM -0700, Ben Widawsky wrote:
> >> On Thu, Aug 27, 2015 at 10:51:59AM +0300, Pohjolainen, Topi wrote:
> >> > On Wed, Aug 26, 2015 at 03:46:05PM -0700, Ben Widawsky wrote:
> >> > > This reverts commit 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf
> >> > > Author: Topi Pohjolainen 
> >> > > Date:   Thu Jun 25 14:00:41 2015 +0300
> >> > >
> >> > > i965: Stop aux data compare preventing program binary re-use
> >> > >
> >> > > This fixes an intermittent failure in
> >> > > piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64 (maybe other
> >> > > platforms as well, but it is harder to reproduce). I can usually hit 
> >> > > the failure
> >> > > within 10 runs of the test. This is a very hairy commit to debug. I'll 
> >> > > let Topi
> >> > > handle it, or else we should go with the revert. I am open to either. 
> >> > > I got
> >> > > lucky that Jenkins caught this on a run.
> >> > >
> >> > > Here was the script I used for bisect:
> >> > >
> >> > > i=0
> >> > > while [ $i -lt 40 ] ; do
> >> > >./bin/texsubimage pbo -auto -fbo > /dev/null 2>&1
> >> > >[[ $? != 0 ]] && echo fail && exit 1
> >> > >((i++))
> >> > > done
> >> > >
> >> > > exit 0
> >> >
> >> > Should I use different piglit version than the current master? I'm asking
> >> > because I get this with both Mesa master and my patch reverted.
> >>
> >> My piglit was pretty old. I just updated that, but mesa was master as of
> >> yesterday (output is below)
> >>
> >> The one bit of advice I can add is that you make sure your system is 
> >> updated to
> >> the very latest.
> >>
> >> >
> >> > testrunner@skl-y:~/topi/piglit$ ./bin/texsubimage pbo -auto -fbo
> >> > Using test set: Core formats
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGB_S3TC_DXT1_EXT
> >> >   region: 68, 12  32 x 48
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
> >> >   region: 0, 28  116 x 20
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
> >> >   region: 16, 4  60 x 36
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
> >> >   region: 8, 0  104 x 60
> >> > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds 
> >> > PBO access)
> >> > PIGLIT: {"result": "fail" }
> >>
> >> Interesting. It so happens I have a patch that purports to fix some of 
> >> these
> >> things. I did not try this patch myself for this issue.
> >> http://patchwork.freedesktop.org/patch/54025/
> >>
> >> (I've been hanging on to this since I needed to do a bit of research to 
> >> address
> >> Matt's feedback, specifically regarding render compression).
> >>
> >> >
> >> >
> >> > Could you include the console output of the failure you get?
> >> >
> >>
> >> Here are two sample failures (occurred in 7 runs)
> >>
> >> Using test set: Core formats
> >> texsubimage failed
> >>   target: GL_TEXTURE_2D
> >>   internal format: GL_INTENSITY
> >>   region: 27, 1  13 x 61
> >> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds 
> >> PBO access)
> >
> > I got more and more a feeling that this is timing related. As the current
> > logic fails to ever re-use the items in the cache, the comparisons are
> > redundant and we can hardcode the search to always fail. This changes the
> > runtime dynamics closer to the patch being reverted:
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
> > b/src/mesa/drivers/dri/
> > index f5f279c..746fe7b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> > +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > @@ -224,6 +224,8 @@ brw_try_upload_using_copy(struct brw_cache *cache,
> > continue;
> >  }
> >
> > + return false;
> > +
> >   if (cache->aux_compare[result_item->cache_id]) {
> >  if (!cache->aux_compare[result_item->cache_id](item_aux, aux))
> > continue;
> >
> >
> > I can reproduce the error on IVB when I disable fast clears (I noticed that
> > the error is not SKL specific - it depends on the fast clears being 
> > disabled).
> Based on the discussion so far, I take it that it's unlikely that
> we'll revert the commit at this point ?
> Should we drop this patch for now ?
> 
> Thanks
> Emil

Yes. So far as we can tell, the regression is real and fixed by the revert - but
it's only the one test, and only on SKL. The bug is being tracked here:
https://bugs.freedesktop.org/show_bug.cgi?id=91926

Topi and I discussed a few things...
1. Merging the patch.
2. Merging the patch to stable only
3. Doing nothing. There is a bugzilla people can find.

I am in favor of #3
___
mesa-dev mailing l

Re: [Mesa-dev] [PATCH v5 0/3] A couple of fixes for Tex(Sub)Image error checks

2015-09-24 Thread Jason Ekstrand
On Thu, Sep 24, 2015 at 1:57 AM, Eduardo Lima Mitev  wrote:
> This is a new version of the series that attempt to fix the regression 
> reported at:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=91582
>
> The review by Jason helped me uncover the fact that the following 3 dEQP 
> tests are buggy:
>
> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
> dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
>
> So the patch split I did in the previous version of the series (v4) was 
> actually not necessary. It was just a work-around to the failure of these 
> tests, which this series uncovered.
>
> Now in this new version, I dropped the splitted patch and filed a bug against 
> dEQP (together with a reference patch) to fix the above tests, which will 
> start to fail once/if we merge this series.
>
> "[dEQP] Buggy negative API tests that check dimensions args of 
> glTexSubImage2D" 
> 
>
> I filed it against the AOSP project, where dEQP package is (under 
> external/deqp). Lets see if that was correct.
>
> Mark, in the mean time we can probably apply the patch I attached to the bug 
> report, otherwise the regression originally reported won't go away. What do 
> you think?
>
> Notice that first two patches has R-b from Jason already. Only the patch 3/3 
> is pending review.
>
> The question that remains is whether I should cc Mesa 10.6 stable too, apart 
> from 11.0.

I don't know how many 10.6 releases we're still going to have but
please double-check that we get 11.0 before we push.
--Jason

> cheers,
>
> Eduardo Lima Mitev (3):
>   mesa: Fix order of format+type and internal format checks for
> glTexImageXD ops
>   mesa: Move _mesa_base_tex_format() from teximage to glformats files
>   mesa: Use the effective internal format instead for validation
>
>  src/mesa/main/glformats.c | 656 
> ++
>  src/mesa/main/glformats.h |   2 +
>  src/mesa/main/teximage.c  | 415 ++---
>  src/mesa/main/teximage.h  |   4 -
>  4 files changed, 683 insertions(+), 394 deletions(-)
>
> --
> 2.4.6
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/skl: Use larger URB size where available.

2015-09-24 Thread Ben Widawsky
On Thu, Sep 24, 2015 at 11:54:57AM +0100, Emil Velikov wrote:
> Hi Ben,
> 
> On 11 September 2015 at 20:15, Ben Widawsky  wrote:
> > On Fri, Sep 11, 2015 at 12:12:15PM -0700, Jordan Justen wrote:
> >> On 2015-09-10 16:59:12, Ben Widawsky wrote:
> >> > All SKL SKUs except the lowest one which has half the L3 size actually 
> >> > have 384K
> >>
> >> These commit message lines seem to wrap a bit long. This first line is
> >> 80 characters.
> >>
> >> > of URB per slice.
> >> >
> >> > For once, I can explain how this mistake was made and how it was missed 
> >> > in
> >> > review...  Historically when we enable a platform and put the production 
> >> > sizes,
> >> > you can simply look at the "smallest" SKU and see what its URB size is 
> >> > (and we
> >> > assumed it was the 1 slice variant). Since on newer platforms the URB 
> >> > sizes are
> >> > scaled automatically by HW, this was sufficient. On SKL, this is a bit 
> >> > different
> >> > as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 
> >> > slice (not
> >> > GT1) variant and it has 384K.
> >> >
> >> > There are no Jenkins tests fixed (or regressions) and we don't expect 
> >> > any fixes
> >> > here because you can always run with less URB size - this potentially 
> >> > improves
> >> > performance.
> >>
> >> It would be nice if we were able to find a benchmark that improves
> >> from this change. If we can't then maybe we should just remove this
> >> paragraph. It seems like the right change regardless.
> >>
> >> Reviewed-by: Jordan Justen 
> >
> > I think what I'd like to do is run the perf data to make sure there are at 
> > least
> > no regressions since I am proposing it for stable... Maybe if I don't get 
> > around
> > to that before the next stable release, we'll bail on it for 10.6
> >
> Did you get the chance to give this a perf test ? I don't see the
> commit landing in master, regardless of the mesa-stable tag.
> 
> Thanks
> Emil

For other reasons I was unable to collect sufficient performance data. The
commit is in master (c1e38ad37042b0ec261eb0ba5631b7ff0ee7a9da). I think without
performance data we should probably leave it out of stable.

As a side note, if this patch does anything but help our current workloads it'd
be really interesting...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-09-24 Thread Marek Olšák
On Thu, Sep 24, 2015 at 2:15 PM, Bas Nieuwenhuizen
 wrote:
> Hi Marek,
>
> Thanks for the review and my apologies, it seems I underestimated the
> potential for regressions in this series.
>
> See below for some comments in reaction to the review.
>
> For a v2 is it preferred that I rebase it to master, or keep basing it on the
> old version? There are some function renames that impact this series.

We'd like to merge this eventually, so rebasing would be nice.

>
> On Thursday, September 24, 2015 01:36:31 AM Marek Olšák wrote:
>> On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen
>>
>>  wrote:
>> > The flags to be enabled in the control registers have been taken from
>> > Catalyst traces.
>> >
>> > Signed-off-by: Bas Nieuwenhuizen 
>> > ---
>> >
>> >  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
>> >  src/gallium/drivers/radeon/r600_texture.c |  2 ++
>> >  src/gallium/drivers/radeon/r600d_common.h |  1 +
>> >  src/gallium/drivers/radeonsi/si_blit.c| 16 
>> >  src/gallium/drivers/radeonsi/si_pipe.c|  2 ++
>> >  src/gallium/drivers/radeonsi/si_pipe.h|  1 +
>> >  src/gallium/drivers/radeonsi/si_state.c   | 27
>> >  +++ src/gallium/drivers/radeonsi/sid.h
>> >   |  1 +
>> >  8 files changed, 43 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h
>> > b/src/gallium/drivers/radeon/r600_pipe_common.h index f05318f..dac270e
>> > 100644
>> > --- a/src/gallium/drivers/radeon/r600_pipe_common.h
>> > +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
>> > @@ -244,6 +244,7 @@ struct r600_surface {
>> >
>> > unsigned cb_color_dim;  /* EG only */
>> > unsigned cb_color_pitch;/* EG and later */
>> > unsigned cb_color_slice;/* EG and later */
>> >
>> > +   unsigned cb_dcc_base;   /* VI and later */
>> >
>> > unsigned cb_color_attrib;   /* EG and later */
>> > unsigned cb_dcc_control;/* VI and later */
>> > unsigned cb_color_fmask;/* CB_COLORn_FMASK (EG and later)
>> > or CB_COLORn_FRAG (r600) */>
>> > diff --git a/src/gallium/drivers/radeon/r600_texture.c
>> > b/src/gallium/drivers/radeon/r600_texture.c index 46e735e..017f5e7 100644
>> > --- a/src/gallium/drivers/radeon/r600_texture.c
>> > +++ b/src/gallium/drivers/radeon/r600_texture.c
>> > @@ -495,6 +495,8 @@ static void vi_texture_alloc_dcc_separate(struct
>> > r600_common_screen *rscreen,>
>> > r600_screen_clear_buffer(rscreen, &rtex->dcc_buffer->b.b, 0,
>> > rtex->surface.dcc_size,>
>> >  0x, true);
>> >
>> > +
>> > +   rtex->cb_color_info |= VI_S_028C70_DCC_ENABLE(1);
>> >
>> >  }
>> >
>> >  static unsigned r600_texture_get_htile_size(struct r600_common_screen
>> >  *rscreen,>
>> > diff --git a/src/gallium/drivers/radeon/r600d_common.h
>> > b/src/gallium/drivers/radeon/r600d_common.h index 115042d..a3d182c 100644
>> > --- a/src/gallium/drivers/radeon/r600d_common.h
>> > +++ b/src/gallium/drivers/radeon/r600d_common.h
>> > @@ -202,6 +202,7 @@
>> >
>> >  #define   EG_S_028C70_FAST_CLEAR(x)   (((x) & 0x1) <<
>> >  17) #define   SI_S_028C70_FAST_CLEAR(x)   (((x) &
>> >  0x1) << 13)>
>> > +#define   VI_S_028C70_DCC_ENABLE(x)   (((x) & 0x1) <<
>> > 28)>
>> >  /*CIK+*/
>> >  #define R_0300FC_CP_STRMOUT_CNTL0x0300FC
>> >
>> > diff --git a/src/gallium/drivers/radeonsi/si_blit.c
>> > b/src/gallium/drivers/radeonsi/si_blit.c index 13bb457..98913e5 100644
>> > --- a/src/gallium/drivers/radeonsi/si_blit.c
>> > +++ b/src/gallium/drivers/radeonsi/si_blit.c
>> > @@ -264,7 +264,7 @@ static void si_blit_decompress_color(struct
>> > pipe_context *ctx,>
>> > return;
>> >
>> > for (level = first_level; level <= last_level; level++) {
>> >
>> > -   if (!(rtex->dirty_level_mask & (1 << level)))
>> > +   if (!(rtex->dirty_level_mask & (1 << level)) &&
>> > !(rtex->dcc_compressed_level_mask & (1 << level)))>
>> > continue;
>> >
>> > /* The smaller the mipmap level, the less layers there are
>> >
>> > @@ -274,6 +274,7 @@ static void si_blit_decompress_color(struct
>> > pipe_context *ctx,>
>> > for (layer = first_layer; layer <= checked_last_layer;
>> > layer++) {
>> >
>> > struct pipe_surface *cbsurf, surf_tmpl;
>> >
>> > +   void * custom_blend;
>> >
>> > surf_tmpl.format = rtex->resource.b.b.format;
>> > surf_tmpl.u.tex.level = level;
>> >
>> > @@ -281,10 +282,17 @@ static void si_blit_decompress_color(struct
>> > pipe_context *ctx,>
>> > surf_tmpl.u.tex.last_layer = layer;
>> > cbsurf = ctx->create_surface(ctx,
>> > &rtex->resource.b.b

[Mesa-dev] [PATCH 2/2] radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target

2015-09-24 Thread Tom Stellard
This fixes a race condition in the glx-multithreaded-shader-compile
test.

CC: "10.6 11.0" 
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 0002559..045665a 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -25,6 +25,7 @@
  */
 #include "radeon_llvm_emit.h"
 #include "radeon_elf_util.h"
+#include "gallivm/lp_bld_misc.h"
 #include "util/u_memory.h"
 #include "pipe/p_shader_tokens.h"
 
@@ -88,6 +89,7 @@ static void init_r600_target()
 {
static unsigned initialized = 0;
if (!initialized) {
+   gallivm_init_llvm_begin();
 #if HAVE_LLVM < 0x0307
LLVMInitializeR600TargetInfo();
LLVMInitializeR600Target();
@@ -101,6 +103,7 @@ static void init_r600_target()
 
 #endif
initialized = 1;
+   gallivm_init_llvm_end();
}
 }
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/2] gallium/hud: temperature is displayed with a percentage symbol, remove it

2015-09-24 Thread Marek Olšák
Pushed this patch. Thanks.

BTW, I don't think this is a candidate for stable.

Marek

On Fri, Sep 4, 2015 at 8:27 PM, Benjamin Bellec  wrote:
> Signed-off-by: Benjamin Bellec 
> ---
>  src/gallium/drivers/radeon/r600_pipe_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
> b/src/gallium/drivers/radeon/r600_pipe_common.c
> index ed5d1da..2a8d906 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -722,7 +722,7 @@ static int r600_get_driver_query_info(struct pipe_screen 
> *screen,
> {"VRAM-usage", R600_QUERY_VRAM_USAGE, 
> {rscreen->info.vram_size}, PIPE_DRIVER_QUERY_TYPE_BYTES},
> {"GTT-usage", R600_QUERY_GTT_USAGE, 
> {rscreen->info.gart_size}, PIPE_DRIVER_QUERY_TYPE_BYTES},
> {"GPU-load", R600_QUERY_GPU_LOAD, {100}},
> -   {"temperature", R600_QUERY_GPU_TEMPERATURE, {100}},
> +   {"temperature", R600_QUERY_GPU_TEMPERATURE, {125}},
> {"shader-clock", R600_QUERY_CURRENT_GPU_SCLK, {0}, 
> PIPE_DRIVER_QUERY_TYPE_HZ},
> {"memory-clock", R600_QUERY_CURRENT_GPU_MCLK, {0}, 
> PIPE_DRIVER_QUERY_TYPE_HZ},
> };
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium: remove support for predicates from TGSI

2015-09-24 Thread Marek Olšák
On Wed, Sep 2, 2015 at 12:55 AM, Jose Fonseca  wrote:
> I'm all for TGIS simplification, but I just checked, and we rely on TGSI
> predicates for some of our internal state trackers.  So I'll need more time
> to evaluate how much effort it would be to not rely on it, and when can it
> be done.

Any news? :)

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 91596] EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI

2015-09-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91596

--- Comment #4 from Marek Olšák  ---
What happens if you hide the extension on Android?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] android: radeonsi: fix sid_tables.h missing LOCAL_MODULE_CLASS

2015-09-24 Thread Marek Olšák
Pushed, thanks.

Marek

On Wed, Sep 23, 2015 at 10:30 PM, Mauro Rossi  wrote:
> ---
>  src/gallium/drivers/radeonsi/Android.mk | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/drivers/radeonsi/Android.mk 
> b/src/gallium/drivers/radeonsi/Android.mk
> index b469aca..7e5e54b 100644
> --- a/src/gallium/drivers/radeonsi/Android.mk
> +++ b/src/gallium/drivers/radeonsi/Android.mk
> @@ -34,6 +34,7 @@ LOCAL_SHARED_LIBRARIES := libdrm libdrm_radeon
>  LOCAL_MODULE := libmesa_pipe_radeonsi
>
>  # generate sources
> +LOCAL_MODULE_CLASS := STATIC_LIBRARIES
>  intermediates := $(call local-generated-sources-dir)
>  LOCAL_GENERATED_SOURCES := $(addprefix $(intermediates)/, 
> $(GENERATED_SOURCES))
>
> --
> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets

2015-09-24 Thread Tom Stellard
Drivers and state trackers that use LLVM for generating code, must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.

When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multihreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).

The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.

This patch allows users to force gallivm to register its targets by
adding two new functions:  gallivm_llvm_init_begin() which locks a mutex
guarding the TargetRegistry and initializes the gallivm targets, and
gallivm_llvm_init_end() which unlocks the mutex.

Drivers and state trackers should use this sequence to correctly
initialize the LLVM targets they use:

gallivm_init_llvm_begin();
LLVMInitializeFooTarget();
gallivm_init_llvm_end();

CC: "10.6 11.0" 
---
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 68 ---
 src/gallium/auxiliary/gallivm/lp_bld_misc.h   |  5 ++
 2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 5e25819..8bf833e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -81,6 +81,7 @@
 #  pragma pop_macro("DEBUG")
 #endif
 
+#include "os/os_thread.h"
 #include "pipe/p_config.h"
 #include "util/u_debug.h"
 #include "util/u_cpu_detect.h"
@@ -101,6 +102,64 @@ public:
 
 static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
 
+struct LLVMRegistryLock {
+private:
+   pipe_mutex mutex;
+
+public:
+   LLVMRegistryLock()
+   {
+  pipe_mutex_init(mutex);
+   }
+
+   void lock()
+   {
+  pipe_mutex_lock(mutex);
+   }
+
+   void unlock()
+   {
+  pipe_mutex_unlock(mutex);
+   }
+};
+
+static LLVMRegistryLock gallivm_llvm_registry_lock;
+
+}
+
+/**
+ * The llvm target registry is not thread-safe, so drivers and state-trackers
+ * that want to initialize targets should use the gallivm_init_llvm_begin() and
+ * gallivm_init_llvm_end() functions to safely initialize targets.  For 
example:
+ *
+ * gallivm_init_llvm_begin();
+ * LLVMInitializeFooTarget();
+ * gallivm_init_llvm_end();
+ *
+ * LLVM targets should be initialized before the driver tries to access the
+ * registry.
+ */
+extern "C" void
+gallivm_init_llvm_begin(void)
+{
+   static int init = 0;
+   gallivm_llvm_registry_lock.lock();
+   if (!init) {
+  // If we have a native target, initialize it to ensure it is linked in 
and
+  // usable by the JIT.
+  llvm::InitializeNativeTarget();
+
+  llvm::InitializeNativeTargetAsmPrinter();
+
+  llvm::InitializeNativeTargetDisassembler();
+  init = 1;
+}
+}
+
+extern "C" void
+gallivm_init_llvm_end(void)
+{
+   gallivm_llvm_registry_lock.unlock();
 }
 
 extern "C" void
@@ -115,13 +174,8 @@ lp_set_target_options(void)
llvm::DisablePrettyStackTrace = true;
 #endif
 
-   // If we have a native target, initialize it to ensure it is linked in and
-   // usable by the JIT.
-   llvm::InitializeNativeTarget();
-
-   llvm::InitializeNativeTargetAsmPrinter();
-
-   llvm::InitializeNativeTargetDisassembler();
+   gallivm_init_llvm_begin();
+   gallivm_init_llvm_end();
 }
 
 
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.h 
b/src/gallium/auxiliary/gallivm/lp_bld_misc.h
index 36923aa..f87bb24 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.h
@@ -41,6 +41,11 @@ extern "C" {
 
 struct lp_generated_code;
 
+extern void
+gallivm_init_llvm_begin(void);
+
+extern void
+gallivm_init_llvm_end(void);
 
 extern void
 lp_set_target_options(void);
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] gallium/radeon: add separate stencil level dirty flags

2015-09-24 Thread Marek Olšák
On Wed, Sep 9, 2015 at 10:07 AM, Michel Dänzer  wrote:
> On 07.09.2015 07:17, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> We will only do depth-only or stencil-only decompress blits, whichever is
>> needed by textures, instead of always doing both.
>> ---
>>  src/gallium/drivers/r600/evergreen_state.c| 4 ++--
>>  src/gallium/drivers/r600/r600_state_common.c  | 3 +++
>>  src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
>>  src/gallium/drivers/radeonsi/cik_sdma.c   | 4 ++--
>>  src/gallium/drivers/radeonsi/si_dma.c | 4 ++--
>>  src/gallium/drivers/radeonsi/si_state_draw.c  | 3 +++
>>  6 files changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
>> b/src/gallium/drivers/r600/evergreen_state.c
>> index 6f4cb55..8ecc498 100644
>> --- a/src/gallium/drivers/r600/evergreen_state.c
>> +++ b/src/gallium/drivers/r600/evergreen_state.c
>> @@ -3372,11 +3372,11 @@ static void evergreen_dma_copy(struct pipe_context 
>> *ctx,
>>   }
>>
>>   if (src->format != dst->format || src_box->depth > 1 ||
>> - rdst->dirty_level_mask != 0) {
>> + (rdst->dirty_level_mask | rdst->stencil_dirty_level_mask) & (1 << 
>> dst_level)) {
>>   goto fallback;
>>   }
>>
>> - if (rsrc->dirty_level_mask) {
>> + if ((rsrc->dirty_level_mask | rsrc->stencil_dirty_level_mask) & (1 << 
>> src_level)) {
>>   ctx->flush_resource(ctx, src);
>>   }
>
> AFAICT ctx->flush_resource only decompresses colour resources, so is the
> second change really necessary? Same in si_dma_copy().

Yes, you're right. I'll fix that.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Alejandro Piñeiro
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.

For example, this is what vec4_visitor::fix_3src_operand() is already
doing:

   if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
  return src;

Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped:7930
HURT:  0

Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped:2533
HURT:  0

v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
  Ekstrand suggested
- Clean up the condition

v3:
- Move the check back to the original place, to keep things
  tidy, as suggested by Jason Ekstrand

v4:
- Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand
---

> > +   value.file == UNIFORM &&
> > +   !composed_swizzle)

> Where did the is_single_value_swizzle() go?

I cleaned too much. Sorry for the noise.


 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index d3f0ddd..5b6444e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -325,7 +325,11 @@ try_copy_propagate(const struct brw_device_info *devinfo,
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
   return false;
 
-   if (inst->is_3src() && value.file == UNIFORM)
+   unsigned composed_swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
+   value.swizzle);
+   if (inst->is_3src() &&
+   value.file == UNIFORM &&
+   !brw_is_single_value_swizzle(composed_swizzle))
   return false;
 
if (inst->is_send_from_grf())
@@ -380,8 +384,7 @@ try_copy_propagate(const struct brw_device_info *devinfo,
if (inst->src[arg].negate)
   value.negate = !value.negate;
 
-   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
-   value.swizzle);
+   value.swizzle = composed_swizzle;
if (has_source_modifiers &&
value.type != inst->src[arg].type) {
   assert(can_change_source_types(inst));
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Matt Turner
Reviewed-by: Matt Turner 

Thanks!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target

2015-09-24 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Thu, Sep 24, 2015 at 6:32 PM, Tom Stellard  wrote:
> This fixes a race condition in the glx-multithreaded-shader-compile
> test.
>
> CC: "10.6 11.0" 
> ---
>  src/gallium/drivers/radeon/radeon_llvm_emit.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
> b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> index 0002559..045665a 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> @@ -25,6 +25,7 @@
>   */
>  #include "radeon_llvm_emit.h"
>  #include "radeon_elf_util.h"
> +#include "gallivm/lp_bld_misc.h"
>  #include "util/u_memory.h"
>  #include "pipe/p_shader_tokens.h"
>
> @@ -88,6 +89,7 @@ static void init_r600_target()
>  {
> static unsigned initialized = 0;
> if (!initialized) {
> +   gallivm_init_llvm_begin();
>  #if HAVE_LLVM < 0x0307
> LLVMInitializeR600TargetInfo();
> LLVMInitializeR600Target();
> @@ -101,6 +103,7 @@ static void init_r600_target()
>
>  #endif
> initialized = 1;
> +   gallivm_init_llvm_end();
> }
>  }
>
> --
> 2.0.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-09-24 Thread Bas Nieuwenhuizen
On Thursday, September 24, 2015 07:24:50 PM Marek Olšák wrote:
> On Thu, Sep 24, 2015 at 2:15 PM, Bas Nieuwenhuizen
> 
>  wrote:
> > Hi Marek,
> > 
> > Thanks for the review and my apologies, it seems I underestimated the
> > potential for regressions in this series.
> > 
> > See below for some comments in reaction to the review.
> > 
> > For a v2 is it preferred that I rebase it to master, or keep basing it on
> > the old version? There are some function renames that impact this series.
> We'd like to merge this eventually, so rebasing would be nice.
> 
> > On Thursday, September 24, 2015 01:36:31 AM Marek Olšák wrote:
> >> On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen
> >> 
> >>  wrote:
> >> > The flags to be enabled in the control registers have been taken from
> >> > Catalyst traces.
> >> > 
> >> > Signed-off-by: Bas Nieuwenhuizen 
> >> > ---
> >> > 
> >> >  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
> >> >  src/gallium/drivers/radeon/r600_texture.c |  2 ++
> >> >  src/gallium/drivers/radeon/r600d_common.h |  1 +
> >> >  src/gallium/drivers/radeonsi/si_blit.c| 16 
> >> >  src/gallium/drivers/radeonsi/si_pipe.c|  2 ++
> >> >  src/gallium/drivers/radeonsi/si_pipe.h|  1 +
> >> >  src/gallium/drivers/radeonsi/si_state.c   | 27
> >> >  +++ src/gallium/drivers/radeonsi/sid.h
> >> >  
> >> >   |  1 +
> >> >  
> >> >  8 files changed, 43 insertions(+), 8 deletions(-)
> >> > 
> >> > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h
> >> > b/src/gallium/drivers/radeon/r600_pipe_common.h index f05318f..dac270e
> >> > 100644
> >> > --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> >> > +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> > @@ -244,6 +244,7 @@ struct r600_surface {
> >> > 
> >> > unsigned cb_color_dim;  /* EG only */
> >> > unsigned cb_color_pitch;/* EG and later */
> >> > unsigned cb_color_slice;/* EG and later */
> >> > 
> >> > +   unsigned cb_dcc_base;   /* VI and later */
> >> > 
> >> > unsigned cb_color_attrib;   /* EG and later */
> >> > unsigned cb_dcc_control;/* VI and later */
> >> > unsigned cb_color_fmask;/* CB_COLORn_FMASK (EG and
> >> > later)
> >> > or CB_COLORn_FRAG (r600) */>
> >> > 
> >> > diff --git a/src/gallium/drivers/radeon/r600_texture.c
> >> > b/src/gallium/drivers/radeon/r600_texture.c index 46e735e..017f5e7
> >> > 100644
> >> > --- a/src/gallium/drivers/radeon/r600_texture.c
> >> > +++ b/src/gallium/drivers/radeon/r600_texture.c
> >> > @@ -495,6 +495,8 @@ static void vi_texture_alloc_dcc_separate(struct
> >> > r600_common_screen *rscreen,>
> >> > 
> >> > r600_screen_clear_buffer(rscreen, &rtex->dcc_buffer->b.b, 0,
> >> > rtex->surface.dcc_size,>
> >> > 
> >> >  0x, true);
> >> > 
> >> > +
> >> > +   rtex->cb_color_info |= VI_S_028C70_DCC_ENABLE(1);
> >> > 
> >> >  }
> >> >  
> >> >  static unsigned r600_texture_get_htile_size(struct r600_common_screen
> >> >  *rscreen,>
> >> > 
> >> > diff --git a/src/gallium/drivers/radeon/r600d_common.h
> >> > b/src/gallium/drivers/radeon/r600d_common.h index 115042d..a3d182c
> >> > 100644
> >> > --- a/src/gallium/drivers/radeon/r600d_common.h
> >> > +++ b/src/gallium/drivers/radeon/r600d_common.h
> >> > @@ -202,6 +202,7 @@
> >> > 
> >> >  #define   EG_S_028C70_FAST_CLEAR(x)   (((x) & 0x1)
> >> >  <<
> >> >  17) #define   SI_S_028C70_FAST_CLEAR(x)   (((x) &
> >> >  0x1) << 13)>
> >> > 
> >> > +#define   VI_S_028C70_DCC_ENABLE(x)   (((x) & 0x1)
> >> > <<
> >> > 28)>
> >> > 
> >> >  /*CIK+*/
> >> >  #define R_0300FC_CP_STRMOUT_CNTL0x0300FC
> >> > 
> >> > diff --git a/src/gallium/drivers/radeonsi/si_blit.c
> >> > b/src/gallium/drivers/radeonsi/si_blit.c index 13bb457..98913e5 100644
> >> > --- a/src/gallium/drivers/radeonsi/si_blit.c
> >> > +++ b/src/gallium/drivers/radeonsi/si_blit.c
> >> > @@ -264,7 +264,7 @@ static void si_blit_decompress_color(struct
> >> > pipe_context *ctx,>
> >> > 
> >> > return;
> >> > 
> >> > for (level = first_level; level <= last_level; level++) {
> >> > 
> >> > -   if (!(rtex->dirty_level_mask & (1 << level)))
> >> > +   if (!(rtex->dirty_level_mask & (1 << level)) &&
> >> > !(rtex->dcc_compressed_level_mask & (1 << level)))>
> >> > 
> >> > continue;
> >> > 
> >> > /* The smaller the mipmap level, the less layers there
> >> > are
> >> > 
> >> > @@ -274,6 +274,7 @@ static void si_blit_decompress_color(struct
> >> > pipe_context *ctx,>
> >> > 
> >> > for (layer = first_layer; layer <= checked_last_layer;
> >> > layer++) {
> >> > 
> >> > struct pipe_s

Re: [Mesa-dev] [Mesa-stable] New stable-branch 11.0 candidate pushed

2015-09-24 Thread Mark Janes
Can you please add this commit to the branch for 11.0.1?

commit 2ea16966ae66d4dd5c5dcb996d7996d9c734bbee
Author: Kristian Høgsberg Kristensen 
Date:   Wed Sep 23 16:57:47 2015 -0700

i965: Respect stride and subreg_offset for ATTR registers

It fixes a Braswell dEQP regression.

thanks,

Mark

Emil Velikov  writes:

> Hello list,
>
> The candidate for the Mesa 11.0.1 is now available. Currently we have:
>  - 23 queued
>  - 29 nominated (outstanding)
>  - and 0 rejected/obsolete patches
>
> The present queue consists mostly of nouveau and i965 fixes, although we
> have the odd llvmpipe (big endian) and gbm bugfix.
>
>
> Note: we do have a sizeable list of nominated patches that have not been
> queued. Many of these are lacking review while the odd ones have gone
> stale despite the review and/or feedback given.
>
>
> Take a look at section "Mesa stable queue" for more information.
>
> Testing
> ---
> The following results are against piglit 4b6848c131c.
>
>
> Changes - classic i965(snb)
> ---
> None.
>
>
> Changes - swrast classic
> 
> None.
>
>
> Changes - gallium softpipe
> --
> None.
>
>
> Changes - gallium llvmpipe (LLVM 3.6.2)
> ---
> None.
>
>
> Testing reports/general approval
> 
> Any testing reports (or general approval of the state of the branch)
> will be greatly appreciated.
>
>
> Trivial merge conflicts
> ---
> None.
>
>
> The plan is to have 11.0.1 this Friday (25th of September) or shortly after.
>
> If you have any questions or comments that you would like to share
> before the release, please go ahead.
>
>
> Cheers,
> Emil
>
>
> Mesa stable queue
> -
>
> Nominated (29)
> ==
>
> Alejandro Piñeiro (2):
>   i965/vec4: check writemask when bailing out at register coalesce
>   i965/vec4: fill src_reg type using the constructor type parameter
>
> Anuj Phogat (1):
>   i965: Abort tiled_memcpy path for TexSubImage in case of transfer 
> operations
>
> Ben Widawsky (2):
>   Revert "i965: Stop aux data compare preventing program binary re-use"
>   i965/skl: Use larger URB size where available.
>
> Benjamin Bellec (2):
>   gallium/hud: temperature is displayed with a percentage symbol, remove 
> it
>   gallium/hud: display the Celsius temperature unit
>
> Boyan Ding (1):
>   i915: Add XRGB format to intel_screen_make_configs
>
> Brian Paul (1):
>   configure: don't try to build gallium DRI drivers if --disable-dri is 
> set
>
> Eduardo Lima Mitev (2):
>   mesa: Fix order of format+type and internal format checks for 
> glTexImageXD ops
>   mesa: Move _mesa_base_tex_format() from teximage to glformats files
>
> Francisco Jerez (6):
>   i965: Don't tell the hardware about our UAV access.
>   mesa: Expose function to calculate whether a shader image unit is valid.
>   mesa: Skip redundant texture completeness checking during image 
> validation.
>   i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.
>   mesa: Get rid of texture-dependent image unit derived state.
>   i965/fs: Fix hang on IVB and VLV with image format mismatch.
>
> Ian Romanick (1):
>   meta: Handle array textures in scaled MSAA blits
>
> Jean-Sébastien Pédron (1):
>   ralloc: Use __attribute__((destructor)) instead of atexit(3)
>
> Kristian Høgsberg Kristensen (1):
>   i965: Respect stride and subreg_offset for ATTR registers
>
> Leo Liu (1):
>   radeon/vce: fix vui time_scale zero error
>
> Matthew Waters (1):
>   egl: rework handling EGL_CONTEXT_FLAGS
>
> Matt Turner (1)
>   glsl: Expose gl_MaxTess{Control, Evaluation}AtomicCounters.
>
> Rob Clark (1):
>   xa: add xa_surface_from_handle2
>
> Roland Scheidegger (1):
>   mesa: fix mipmap generation for immutable, compressed textures
>
> Tom Stellard (4):
>   clover: Call clBuildProgram() notification function when build 
> completes v2
>   gallium/drivers: Add threadsafe wrappers for pipe_context v2
>   clover: Use threadsafe wrappers for pipe_context v2
>   clover: Properly initialize LLVM targets when linking with component 
> libs
>
>
>
> Queued (23)
> ===
>
> Antia Puentes (2):
>   i965/vec4: Fix saturation errors when coalescing registers
>   i965/vec4_nir: Load constants as integers
>
> Anuj Phogat (1):
>   meta: Abort meta pbo path if TexSubImage need signed unsigned conversion
>
> Emil Velikov (1):
>   docs: add sha256 checksums for 11.0.0
>
> Iago Toral Quiroga (1):
>   mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.
>
> Ian Romanick (5):
>   t_dd_dmatmp: Make "count" actually be the count
>   t_dd_dmatmp: Clean up improper code formatting from previous patch
>   t_dd_dmatmp: Use '& 3' instead of '% 4' everywhere
>   t_dd_dmatmp: Pull out common 'count -= count & 3' code
>   

Re: [Mesa-dev] [Mesa-stable] New stable-branch 11.0 candidate pushed

2015-09-24 Thread Matt Turner
On Thu, Sep 24, 2015 at 1:18 PM, Mark Janes  wrote:
> Can you please add this commit to the branch for 11.0.1?
>
> commit 2ea16966ae66d4dd5c5dcb996d7996d9c734bbee
> Author: Kristian Høgsberg Kristensen 
> Date:   Wed Sep 23 16:57:47 2015 -0700
>
> i965: Respect stride and subreg_offset for ATTR registers
>
> It fixes a Braswell dEQP regression.

And

commit d6bb46bbe8e4ef90dedc5a04c7434a8113c10a8b
Author: Matt Turner 
Date:   Wed Sep 23 18:06:19 2015 -0700

glsl: Expose gl_MaxTess{Control,Evaluation}AtomicCounters.

as well.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-09-24 Thread Marek Olšák
On Thu, Sep 24, 2015 at 9:47 PM, Bas Nieuwenhuizen
 wrote:
> On Thursday, September 24, 2015 07:24:50 PM Marek Olšák wrote:
>> On Thu, Sep 24, 2015 at 2:15 PM, Bas Nieuwenhuizen
>>
>>  wrote:
>> > Hi Marek,
>> >
>> > Thanks for the review and my apologies, it seems I underestimated the
>> > potential for regressions in this series.
>> >
>> > See below for some comments in reaction to the review.
>> >
>> > For a v2 is it preferred that I rebase it to master, or keep basing it on
>> > the old version? There are some function renames that impact this series.
>> We'd like to merge this eventually, so rebasing would be nice.
>>
>> > On Thursday, September 24, 2015 01:36:31 AM Marek Olšák wrote:
>> >> On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen
>> >>
>> >>  wrote:
>> >> > The flags to be enabled in the control registers have been taken from
>> >> > Catalyst traces.
>> >> >
>> >> > Signed-off-by: Bas Nieuwenhuizen 
>> >> > ---
>> >> >
>> >> >  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
>> >> >  src/gallium/drivers/radeon/r600_texture.c |  2 ++
>> >> >  src/gallium/drivers/radeon/r600d_common.h |  1 +
>> >> >  src/gallium/drivers/radeonsi/si_blit.c| 16 
>> >> >  src/gallium/drivers/radeonsi/si_pipe.c|  2 ++
>> >> >  src/gallium/drivers/radeonsi/si_pipe.h|  1 +
>> >> >  src/gallium/drivers/radeonsi/si_state.c   | 27
>> >> >  +++ src/gallium/drivers/radeonsi/sid.h
>> >> >
>> >> >   |  1 +
>> >> >
>> >> >  8 files changed, 43 insertions(+), 8 deletions(-)
>> >> >
>> >> > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h
>> >> > b/src/gallium/drivers/radeon/r600_pipe_common.h index f05318f..dac270e
>> >> > 100644
>> >> > --- a/src/gallium/drivers/radeon/r600_pipe_common.h
>> >> > +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
>> >> > @@ -244,6 +244,7 @@ struct r600_surface {
>> >> >
>> >> > unsigned cb_color_dim;  /* EG only */
>> >> > unsigned cb_color_pitch;/* EG and later */
>> >> > unsigned cb_color_slice;/* EG and later */
>> >> >
>> >> > +   unsigned cb_dcc_base;   /* VI and later */
>> >> >
>> >> > unsigned cb_color_attrib;   /* EG and later */
>> >> > unsigned cb_dcc_control;/* VI and later */
>> >> > unsigned cb_color_fmask;/* CB_COLORn_FMASK (EG and
>> >> > later)
>> >> > or CB_COLORn_FRAG (r600) */>
>> >> >
>> >> > diff --git a/src/gallium/drivers/radeon/r600_texture.c
>> >> > b/src/gallium/drivers/radeon/r600_texture.c index 46e735e..017f5e7
>> >> > 100644
>> >> > --- a/src/gallium/drivers/radeon/r600_texture.c
>> >> > +++ b/src/gallium/drivers/radeon/r600_texture.c
>> >> > @@ -495,6 +495,8 @@ static void vi_texture_alloc_dcc_separate(struct
>> >> > r600_common_screen *rscreen,>
>> >> >
>> >> > r600_screen_clear_buffer(rscreen, &rtex->dcc_buffer->b.b, 0,
>> >> > rtex->surface.dcc_size,>
>> >> >
>> >> >  0x, true);
>> >> >
>> >> > +
>> >> > +   rtex->cb_color_info |= VI_S_028C70_DCC_ENABLE(1);
>> >> >
>> >> >  }
>> >> >
>> >> >  static unsigned r600_texture_get_htile_size(struct r600_common_screen
>> >> >  *rscreen,>
>> >> >
>> >> > diff --git a/src/gallium/drivers/radeon/r600d_common.h
>> >> > b/src/gallium/drivers/radeon/r600d_common.h index 115042d..a3d182c
>> >> > 100644
>> >> > --- a/src/gallium/drivers/radeon/r600d_common.h
>> >> > +++ b/src/gallium/drivers/radeon/r600d_common.h
>> >> > @@ -202,6 +202,7 @@
>> >> >
>> >> >  #define   EG_S_028C70_FAST_CLEAR(x)   (((x) & 0x1)
>> >> >  <<
>> >> >  17) #define   SI_S_028C70_FAST_CLEAR(x)   (((x) &
>> >> >  0x1) << 13)>
>> >> >
>> >> > +#define   VI_S_028C70_DCC_ENABLE(x)   (((x) & 0x1)
>> >> > <<
>> >> > 28)>
>> >> >
>> >> >  /*CIK+*/
>> >> >  #define R_0300FC_CP_STRMOUT_CNTL0x0300FC
>> >> >
>> >> > diff --git a/src/gallium/drivers/radeonsi/si_blit.c
>> >> > b/src/gallium/drivers/radeonsi/si_blit.c index 13bb457..98913e5 100644
>> >> > --- a/src/gallium/drivers/radeonsi/si_blit.c
>> >> > +++ b/src/gallium/drivers/radeonsi/si_blit.c
>> >> > @@ -264,7 +264,7 @@ static void si_blit_decompress_color(struct
>> >> > pipe_context *ctx,>
>> >> >
>> >> > return;
>> >> >
>> >> > for (level = first_level; level <= last_level; level++) {
>> >> >
>> >> > -   if (!(rtex->dirty_level_mask & (1 << level)))
>> >> > +   if (!(rtex->dirty_level_mask & (1 << level)) &&
>> >> > !(rtex->dcc_compressed_level_mask & (1 << level)))>
>> >> >
>> >> > continue;
>> >> >
>> >> > /* The smaller the mipmap level, the less layers there
>> >> > are
>> >> >
>> >> > @@ -274,6 +274,7 @@ static void si_blit_decompress_color(struct
>> >> > pipe_context *ctx,>
>> >> >
>> >> > for (layer = first_layer; layer <= checked_

Re: [Mesa-dev] [PATCH v4] i965/vec4: check swizzle before discarding a uniform on a 3src operand

2015-09-24 Thread Jason Ekstrand
On Thu, Sep 24, 2015 at 11:54 AM, Alejandro Piñeiro
 wrote:
> Without this commit, copy propagation is discarded if it involves
> a uniform with an instruction that has 3 sources. But 3 sourced
> instructions can access scalar values.
>
> For example, this is what vec4_visitor::fix_3src_operand() is already
> doing:
>
>if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
>   return src;
>
> Shader-db results (unfiltered) on NIR:
> total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
> instructions in affected programs: 812755 -> 795090 (-2.17%)
> helped:7930
> HURT:  0
>
> Shader-db results (unfiltered) on IR:
> total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
> instructions in affected programs: 296630 -> 292596 (-1.36%)
> helped:2533
> HURT:  0
>
> v2:
> - Updated commit message, using Matt Turner suggestions
> - Move the check after we've created the final value, as Jason
>   Ekstrand suggested
> - Clean up the condition
>
> v3:
> - Move the check back to the original place, to keep things
>   tidy, as suggested by Jason Ekstrand
>
> v4:
> - Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand

Good work!

Reviewed-by: Jason Ekstrand 

> ---
>
>> > +   value.file == UNIFORM &&
>> > +   !composed_swizzle)
>
>> Where did the is_single_value_swizzle() go?
>
> I cleaned too much. Sorry for the noise.
>
>
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index d3f0ddd..5b6444e 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -325,7 +325,11 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>return false;
>
> -   if (inst->is_3src() && value.file == UNIFORM)
> +   unsigned composed_swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> +   value.swizzle);
> +   if (inst->is_3src() &&
> +   value.file == UNIFORM &&
> +   !brw_is_single_value_swizzle(composed_swizzle))
>return false;
>
> if (inst->is_send_from_grf())
> @@ -380,8 +384,7 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> if (inst->src[arg].negate)
>value.negate = !value.negate;
>
> -   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> -   value.swizzle);
> +   value.swizzle = composed_swizzle;
> if (has_source_modifiers &&
> value.type != inst->src[arg].type) {
>assert(can_change_source_types(inst));
> --
> 2.1.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets

2015-09-24 Thread Emil Velikov
Hi Tom,

On 24 September 2015 at 17:31, Tom Stellard  wrote:
> Drivers and state trackers that use LLVM for generating code, must
> register the targets they use with LLVM's global TargetRegistry.
> The TargetRegistry is not thread-safe, so all targets must be added
> to the registry before it can be queried for target information.
>
> When drivers and state trackers initialize their own targets, they need
> a way to force gallivm to initialize its targets at the same time.
> Otherwise, there can be a race condition in some multi-threaded
> applications (e.g. glx-multihreaded-shader-compile in piglit),
> when one thread creates a context for a driver that uses LLVM (e.g.
> radeonsi) and another thread creates a gallivm context (glxContextCreate
> does this).
>
> The race happens when the driver thread initializes its LLVM targets and
> then starts using the registry before the gallivm thread has a chance to
> register its targets.
>
> This patch allows users to force gallivm to register its targets by
> adding two new functions:  gallivm_llvm_init_begin() which locks a mutex
> guarding the TargetRegistry and initializes the gallivm targets, and
> gallivm_llvm_init_end() which unlocks the mutex.
>
> Drivers and state trackers should use this sequence to correctly
> initialize the LLVM targets they use:
>
> gallivm_init_llvm_begin();
> LLVMInitializeFooTarget();
> gallivm_init_llvm_end();
>
> CC: "10.6 11.0" 
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 68 
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.h   |  5 ++
>  2 files changed, 66 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 5e25819..8bf833e 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -81,6 +81,7 @@
>  #  pragma pop_macro("DEBUG")
>  #endif
>
> +#include "os/os_thread.h"
>  #include "pipe/p_config.h"
>  #include "util/u_debug.h"
>  #include "util/u_cpu_detect.h"
> @@ -101,6 +102,64 @@ public:
>
>  static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
>
> +struct LLVMRegistryLock {
> +private:
> +   pipe_mutex mutex;
> +
> +public:
> +   LLVMRegistryLock()
> +   {
> +  pipe_mutex_init(mutex);
> +   }
> +
> +   void lock()
> +   {
> +  pipe_mutex_lock(mutex);
> +   }
> +
> +   void unlock()
> +   {
> +  pipe_mutex_unlock(mutex);
> +   }
> +};
> +
> +static LLVMRegistryLock gallivm_llvm_registry_lock;
> +
> +}
> +
> +/**
> + * The llvm target registry is not thread-safe, so drivers and state-trackers
> + * that want to initialize targets should use the gallivm_init_llvm_begin() 
> and
> + * gallivm_init_llvm_end() functions to safely initialize targets.  For 
> example:
> + *
> + * gallivm_init_llvm_begin();
> + * LLVMInitializeFooTarget();
> + * gallivm_init_llvm_end();
> + *
> + * LLVM targets should be initialized before the driver tries to access the
> + * registry.
> + */
> +extern "C" void
> +gallivm_init_llvm_begin(void)
> +{
> +   static int init = 0;
> +   gallivm_llvm_registry_lock.lock();
> +   if (!init) {
> +  // If we have a native target, initialize it to ensure it is linked in 
> and
> +  // usable by the JIT.
> +  llvm::InitializeNativeTarget();
> +
> +  llvm::InitializeNativeTargetAsmPrinter();
> +
> +  llvm::InitializeNativeTargetDisassembler();
> +  init = 1;
> +}
> +}
> +
> +extern "C" void
> +gallivm_init_llvm_end(void)
> +{
> +   gallivm_llvm_registry_lock.unlock();
>  }
>
>  extern "C" void
> @@ -115,13 +174,8 @@ lp_set_target_options(void)
> llvm::DisablePrettyStackTrace = true;
>  #endif
>
> -   // If we have a native target, initialize it to ensure it is linked in and
> -   // usable by the JIT.
> -   llvm::InitializeNativeTarget();
> -
> -   llvm::InitializeNativeTargetAsmPrinter();
> -
> -   llvm::InitializeNativeTargetDisassembler();
> +   gallivm_init_llvm_begin();
> +   gallivm_init_llvm_end();
>  }
>
What is the advantage of these wrappers instead of the (more obvious imho)

pipe_mutex_init(mutex); // wrap in call_once() ?
pipe_mutex_lock(mutex);

llvm::InitializeNativeTarget...

pipe_mutex_unlock(mutex);

By adding these you effectively have extra calls to
llvm::InitializeNativeTarget* for radeon. The second patch does not
mention anything on the topic, is that intentional ?

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/12] i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA

2015-09-24 Thread Anuj Phogat
On Tue, Sep 22, 2015 at 4:18 PM, Ben Widawsky  wrote:

> On Thu, Sep 17, 2015 at 05:00:11PM +0100, Neil Roberts wrote:
> > In order to accomodate 16x MSAA, the starting sample pair index is now
> > 3 bits rather than 2 on SKL+.
>
> Hooray for bad docs. "PS Thread Payload for Normal Dispatch": SSPI is 2
> bits
> (R0.0.7:6) with 1 reserved for expansion.
>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 82f49b4..bd9bcdd 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1300,9 +1300,16 @@ fs_visitor::emit_sampleid_setup()
> > * are sample 1 of subspan 0; the third group is sample 0 of
> > * subspan 1, and finally sample 1 of subspan 1.
> > */
> > +
> > +  /* SKL+ has an extra bit for the Starting Sample Pair Index to
> > +   * accomodate 16x MSAA.
> > +   */
> > +  unsigned sspi_bits = devinfo->gen >= 9 ? 3 : 2;
> > +  unsigned sspi_mask = ((1 << sspi_bits) - 1) << 6;
> > +
>
> I think using the bits makes this unnecessarily hard to read.
> unsigned sspi_mask = devinfo->gen >= 9 ? 0x1c0 : 0xc0;
>
> >abld.exec_all()
> >.AND(t1, fs_reg(retype(brw_vec1_grf(0, 0),
> BRW_REGISTER_TYPE_UD)),
> > -   fs_reg(0xc0));
> > +   fs_reg(sspi_mask));
> >abld.exec_all().SHR(t1, t1, fs_reg(5));
> >
> >/* This works for both SIMD8 and SIMD16 */
>
> I'm really sketchy on the details of how this actually works. Mostly, I
> get what
> this code is doing, and I agree that the only difference for 16x should be
> the
> extra bit (because SSPI needs to go up to 7 for SIMD8).
>
As per docs we're supposed to get the per slot SampleID written to 15:0
bits in
R1.0. I used SSPI to compute the SampleID because I never got anything
useful
in these bits on IVB. Things might have changed on later platforms. So, I
think
it's worth trying to do what docs say. I'll send out a patch to add above
reasoning
as a comment in this code. For now I'm happy with this patch:

Reviewed-by: Anuj Phogat 


>
> So take my:
> Reviewed-by: Ben Widawsky 
>
> with a grain of salt.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.

2015-09-24 Thread Kyle Brenneman
When USE_MGL_NAMESPACE is defined, _glapi_get_stub will check for the "m"
prefix before trying to skip it, so that "glFoo" and "mglFoo" are
equivalent.

This should let it work with all the places where something calls
_glapi_get_proc_offset with a hard-coded name that starts with the normal
"gl" prefix.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=2
---
 src/mapi/mapi_glapi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mapi/mapi_glapi.c b/src/mapi/mapi_glapi.c
index 7b9f1ae..c19bab4 100644
--- a/src/mapi/mapi_glapi.c
+++ b/src/mapi/mapi_glapi.c
@@ -174,7 +174,7 @@ _glapi_get_stub(const char *name, int generate)
const struct mapi_stub *stub;
 
 #ifdef USE_MGL_NAMESPACE
-   if (name)
+   if (name && name[0] == 'm')
   name++;
 #endif
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] glx: Fix build errors with --enable-mangling.

2015-09-24 Thread Kyle Brenneman
Rearranged the GLX_ALIAS macro in glextensions.h so that it will pick up
the renames from glx_mangle.h.

Fixed the alias attribute for glXGetProcAddress when USE_MGL_NAMESPACE is
defined.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=2
---
 src/glx/glxcmds.c   | 4 
 src/glx/glxextensions.h | 6 --
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/glx/glxcmds.c b/src/glx/glxcmds.c
index 8164cd9..976ee36 100644
--- a/src/glx/glxcmds.c
+++ b/src/glx/glxcmds.c
@@ -2666,7 +2666,11 @@ _X_EXPORT void (*glXGetProcAddressARB(const GLubyte * 
procName)) (void)
  */
 _X_EXPORT void (*glXGetProcAddress(const GLubyte * procName)) (void)
 #if defined(__GNUC__) && !defined(GLX_ALIAS_UNSUPPORTED)
+# if defined(USE_MGL_NAMESPACE)
+   __attribute__ ((alias("mglXGetProcAddressARB")));
+# else
__attribute__ ((alias("glXGetProcAddressARB")));
+# endif
 #else
 {
return glXGetProcAddressARB(procName);
diff --git a/src/glx/glxextensions.h b/src/glx/glxextensions.h
index 8436a94..c4d4a17 100644
--- a/src/glx/glxextensions.h
+++ b/src/glx/glxextensions.h
@@ -279,11 +279,13 @@ typedef void (*PFNGLXDISABLEEXTENSIONPROC) (const char 
*name);
 # define GLX_ALIAS_VOID(real_func, proto_args, args, aliased_func)
 #else
 # if defined(__GNUC__) && !defined(GLX_ALIAS_UNSUPPORTED)
-#  define GLX_ALIAS(return_type, real_func, proto_args, args, aliased_func) \
+#  define GLX_ALIAS2(return_type, real_func, proto_args, args, aliased_func) \
return_type  real_func  proto_args   \
__attribute__ ((alias( # aliased_func ) ));
+#  define GLX_ALIAS(return_type, real_func, proto_args, args, aliased_func) \
+   GLX_ALIAS2(return_type, real_func, proto_args, args, aliased_func)
 #  define GLX_ALIAS_VOID(real_func, proto_args, args, aliased_func) \
-   GLX_ALIAS(void, real_func, proto_args, args, aliased_func)
+   GLX_ALIAS2(void, real_func, proto_args, args, aliased_func)
 # else
 #  define GLX_ALIAS(return_type, real_func, proto_args, args, aliased_func) \
return_type  real_func  proto_args   \
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] nir: Add a function to count the number of vertices a GS emits.

2015-09-24 Thread Kenneth Graunke
Some hardware (such as Broadwell) can run geometry shaders more
efficiently when the number of vertices emitted is statically known.

This pass provides a way to obtain the constant vertex count, or
-1 indicating that the vertex count is unknown/non-constant.

Signed-off-by: Kenneth Graunke 
---
 src/glsl/Makefile.sources|  1 +
 src/glsl/nir/nir.h   |  2 +
 src/glsl/nir/nir_gs_count_vertices.c | 93 
 3 files changed, 96 insertions(+)
 create mode 100644 src/glsl/nir/nir_gs_count_vertices.c

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index a8f4994..32b6dba 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -30,6 +30,7 @@ NIR_FILES = \
nir/nir_control_flow_private.h \
nir/nir_dominance.c \
nir/nir_from_ssa.c \
+   nir/nir_gs_count_vertices.c \
nir/nir_intrinsics.c \
nir/nir_intrinsics.h \
nir/nir_live_variables.c \
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 4f45770..d0c7b04 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1804,6 +1804,8 @@ void nir_dump_dom_frontier(nir_shader *shader, FILE *fp);
 void nir_dump_cfg_impl(nir_function_impl *impl, FILE *fp);
 void nir_dump_cfg(nir_shader *shader, FILE *fp);
 
+int nir_gs_count_vertices(nir_shader *shader);
+
 bool nir_split_var_copies(nir_shader *shader);
 
 void nir_lower_var_copy_instr(nir_intrinsic_instr *copy, void *mem_ctx);
diff --git a/src/glsl/nir/nir_gs_count_vertices.c 
b/src/glsl/nir/nir_gs_count_vertices.c
new file mode 100644
index 000..e0bdf17
--- /dev/null
+++ b/src/glsl/nir/nir_gs_count_vertices.c
@@ -0,0 +1,93 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_builder.h"
+
+static nir_intrinsic_instr *
+as_intrinsic(nir_instr *instr, nir_intrinsic_op op)
+{
+   if (instr->type != nir_instr_type_intrinsic)
+  return NULL;
+
+   nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+   if (intrin->intrinsic != op)
+  return NULL;
+
+   return intrin;
+}
+
+static nir_intrinsic_instr *
+as_set_vertex_count(nir_instr *instr)
+{
+   return as_intrinsic(instr, nir_intrinsic_set_vertex_count);
+}
+
+/**
+ * If a geometry shader emits a constant number of vertices, return the
+ * number of vertices.  Otherwise, return -1 (unknown).
+ *
+ * This only works if you've used nir_lower_gs_intrinsics() to do vertex
+ * counting at the NIR level.
+ */
+int
+nir_gs_count_vertices(nir_shader *shader)
+{
+   int count = -1;
+
+   nir_foreach_overload(shader, overload) {
+  if (!overload->impl)
+ continue;
+
+  /* set_vertex_count intrinsics only appear in predecessors of the
+   * end block.  So we don't need to walk all of them.
+   */
+  struct set_entry *entry;
+  set_foreach(overload->impl->end_block->predecessors, entry) {
+ nir_block *block = (nir_block *) entry->key;
+
+ nir_foreach_instr_reverse(block, instr) {
+nir_intrinsic_instr *intrin = as_set_vertex_count(instr);
+if (!intrin)
+   continue;
+
+nir_const_value *val = nir_src_as_const_value(intrin->src[0]);
+/* We've found a non-constant value.  Bail. */
+if (!val)
+   return -1;
+
+if (count == -1)
+   count = val->i[0];
+
+/* We've found contradictory set_vertex_count intrinsics.
+ * This can happen if there are early-returns in main() and
+ * different paths emit different numbers of vertices.
+ */
+if (count != val->i[0])
+   return -1;
+ }
+  }
+   }
+
+   return count;
+}
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/ma

[Mesa-dev] [PATCH 3/3] i965: Implement "Static Vertex Count" geometry shader optimization.

2015-09-24 Thread Kenneth Graunke
Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization.  Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly.  However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.

Thanks to the new NIR GS intrinsics, this is easy.  We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.

Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3227 -> 3207 (-0.62%)
instructions in affected programs: 242 -> 222 (-8.26%)
helped:10

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_context.h   |  5 +
 src/mesa/drivers/dri/i965/brw_defines.h   |  5 +
 src/mesa/drivers/dri/i965/brw_gs.c|  5 +
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 11 +++
 src/mesa/drivers/dri/i965/gen8_gs_state.c |  6 ++
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index b05b8bd..5c31ba4 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -782,6 +782,11 @@ struct brw_gs_prog_data
 
bool include_primitive_id;
 
+   /**
+* The number of vertices emitted, if constant - otherwise -1.
+*/
+   int static_vertex_count;
+
int invocations;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index f9d8d1b..6d94a6f 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1960,6 +1960,11 @@ enum brw_message_target {
 # define GEN6_GS_SVBI_POSTINCREMENT_VALUE_MASK INTEL_MASK(25, 16)
 # define GEN6_GS_ENABLE(1 << 15)
 
+/* Gen8+ DW8 */
+# define GEN8_GS_STATIC_OUTPUT  (1 << 30)
+# define GEN8_GS_STATIC_VERTEX_COUNT_SHIFT  16
+# define GEN8_GS_STATIC_VERTEX_COUNT_MASK   INTEL_MASK(26, 16)
+
 /* Gen8+ DW9 */
 # define GEN8_GS_URB_ENTRY_OUTPUT_OFFSET_SHIFT  21
 # define GEN8_GS_URB_OUTPUT_LENGTH_SHIFT16
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index 16ea684..111cf93 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -73,6 +73,11 @@ brw_codegen_gs_prog(struct brw_context *brw,
c.prog_data.base.base.nr_params = param_count;
c.prog_data.base.base.nr_image_params = gs->NumImages;
 
+   if (brw->gen >= 8) {
+  c.prog_data.static_vertex_count =
+ nir_gs_count_vertices(gp->program.Base.nir);
+   }
+
if (brw->gen >= 7) {
   if (gp->program.OutputType == GL_POINTS) {
  /* When the output type is points, the geometry shader may output data
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index ff5bd98..acf0501 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -234,17 +234,20 @@ vec4_gs_visitor::emit_thread_end()
 */
int base_mrf = 1;
 
+   bool static_vertex_count = c->prog_data.static_vertex_count != -1;
+
current_annotation = "thread end";
dst_reg mrf_reg(MRF, base_mrf);
src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
vec4_instruction *inst = emit(MOV(mrf_reg, r0));
inst->force_writemask_all = true;
-   emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
+   if (devinfo->gen < 8 || !static_vertex_count)
+  emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   emit_shader_time_end();
inst = emit(GS_OPCODE_THREAD_END);
inst->base_mrf = base_mrf;
-   inst->mlen = devinfo->gen >= 8 ? 2 : 1;
+   inst->mlen = devinfo->gen >= 8 && !static_vertex_count ? 2 : 1;
 }
 
 
@@ -284,7 +287,7 @@ vec4_gs_visitor::emit_urb_write_opcode(bool complete)
/* We need to increment Global Offset by 1 to make room for Broadwell's
 * extra "Vertex Count" payload at the beginning of the URB entry.
 */
-   if (devinfo->gen >= 8)
+   if (devinfo->gen >= 8 && c->prog_data.static_vertex_count == -1)
   inst->offset++;
 
inst->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
@@ -421,7 +424,7 @@ vec4_gs_visitor::emit_control_data_bits()
 * URB entry.  Since this is an OWord message, Global Offset is counted
 * in 128-bit units, so we must set it to 2.
 */
-   if (devinfo->

[Mesa-dev] [PATCH 2/3] i965: Move GS_THREAD_END mlen calculations out of the generator.

2015-09-24 Thread Kenneth Graunke
The visitor was setting a mlen that was wrong for Broadwell, but the
generator was ignoring it and doing the right thing regardless.  We may
as well move the logic fully into the visitor.  This will be useful in
the next commit as well.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp  | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 05f2044..e69c067 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -505,7 +505,7 @@ vec4_generator::generate_gs_thread_end(vec4_instruction 
*inst)
  inst->base_mrf, /* starting mrf reg nr */
  src,
  BRW_URB_WRITE_EOT | inst->urb_write_flags,
- devinfo->gen >= 8 ? 2 : 1,/* message len */
+ inst->mlen,
  0,  /* response len */
  0,  /* urb destination offset */
  BRW_URB_SWIZZLE_INTERLEAVE);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 3cb1b4c..ff5bd98 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -244,7 +244,7 @@ vec4_gs_visitor::emit_thread_end()
   emit_shader_time_end();
inst = emit(GS_OPCODE_THREAD_END);
inst->base_mrf = base_mrf;
-   inst->mlen = 1;
+   inst->mlen = devinfo->gen >= 8 ? 2 : 1;
 }
 
 
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] glsl: add packed varyings to program resource list

2015-09-24 Thread Ilia Mirkin
On Fri, Sep 11, 2015 at 3:19 AM, Tapani Pälli  wrote:
> This makes sure that user is still able to query properties about
> variables that have gotten packed by lower_packed_varyings pass.
>
> Fixes following OpenGL ES 3.1 test:
>ES31-CTS.program_interface_query.separate-programs-vertex
>
> v2: fix 'name included in packed list' check (Ilia Mirkin)
> v3: iterate over instances of name using strtok_r (Ilia Mirkin)
>
> Signed-off-by: Tapani Pälli 
> ---
>  src/glsl/linker.cpp | 80 
> ++---
>  1 file changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index 94f847e..de499e2 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -3211,6 +3251,33 @@ add_interface_variables(struct gl_shader_program 
> *shProg,
> return true;
>  }
>
> +static bool
> +add_packed_varyings(struct gl_shader_program *shProg, int stage)
> +{
> +   struct gl_shader *sh = shProg->_LinkedShaders[stage];
> +   GLenum iface;
> +
> +   if (!sh || !sh->packed_varyings)
> +  return true;
> +
> +   foreach_in_list(ir_instruction, node, sh->packed_varyings) {
> +  ir_variable *var = node->as_variable();
> +  if (var) {
> + switch (var->data.mode) {
> + case ir_var_shader_in:
> +iface = GL_PROGRAM_INPUT;
> + case ir_var_shader_out:
> +iface = GL_PROGRAM_OUTPUT;

I happened to notice this when looking over pushed commits. The issue
above should be obvious... should probably fix that, and ideally add a
piglit test that would catch it?

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] glsl: add packed varyings to program resource list

2015-09-24 Thread Tapani Pälli



On 09/25/2015 08:45 AM, Ilia Mirkin wrote:

On Fri, Sep 11, 2015 at 3:19 AM, Tapani Pälli  wrote:

This makes sure that user is still able to query properties about
variables that have gotten packed by lower_packed_varyings pass.

Fixes following OpenGL ES 3.1 test:
ES31-CTS.program_interface_query.separate-programs-vertex

v2: fix 'name included in packed list' check (Ilia Mirkin)
v3: iterate over instances of name using strtok_r (Ilia Mirkin)

Signed-off-by: Tapani Pälli 
---
  src/glsl/linker.cpp | 80 ++---
  1 file changed, 76 insertions(+), 4 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 94f847e..de499e2 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3211,6 +3251,33 @@ add_interface_variables(struct gl_shader_program *shProg,
 return true;
  }

+static bool
+add_packed_varyings(struct gl_shader_program *shProg, int stage)
+{
+   struct gl_shader *sh = shProg->_LinkedShaders[stage];
+   GLenum iface;
+
+   if (!sh || !sh->packed_varyings)
+  return true;
+
+   foreach_in_list(ir_instruction, node, sh->packed_varyings) {
+  ir_variable *var = node->as_variable();
+  if (var) {
+ switch (var->data.mode) {
+ case ir_var_shader_in:
+iface = GL_PROGRAM_INPUT;
+ case ir_var_shader_out:
+iface = GL_PROGRAM_OUTPUT;


I happened to notice this when looking over pushed commits. The issue
above should be obvious... should probably fix that, and ideally add a
piglit test that would catch it?


Oh dear, yep I'll fix it soon and also add a default case (which should 
not be hit).


// Tapani
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] i965: Implement "Static Vertex Count" geometry shader optimization.

2015-09-24 Thread Jordan Justen
On 2015-09-24 22:20:40, Kenneth Graunke wrote:
> Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
> Count" fields, which control a new optimization.  Normally, geometry
> shaders can output arbitrary numbers of vertices, which means that
> resource allocation has to be done on the fly.  However, if the number
> of vertices is statically known, the hardware can pre-allocate resources
> up front, which is more efficient.
> 
> Thanks to the new NIR GS intrinsics, this is easy.  We just call the
> function introduced in the previous commit to get the vertex count.
> If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
> field in the VUE, and instead fill out the 3DSTATE_GS fields.
> 
> Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
> on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.
> 
> shader-db statistics for geometry shaders only:
> 
> total instructions in shared programs: 3227 -> 3207 (-0.62%)
> instructions in affected programs: 242 -> 222 (-8.26%)
> helped:10
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   |  5 +
>  src/mesa/drivers/dri/i965/brw_defines.h   |  5 +
>  src/mesa/drivers/dri/i965/brw_gs.c|  5 +
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 11 +++
>  src/mesa/drivers/dri/i965/gen8_gs_state.c |  6 ++
>  5 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index b05b8bd..5c31ba4 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -782,6 +782,11 @@ struct brw_gs_prog_data
>  
> bool include_primitive_id;
>  
> +   /**
> +* The number of vertices emitted, if constant - otherwise -1.
> +*/
> +   int static_vertex_count;
> +
> int invocations;
>  
> /**
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index f9d8d1b..6d94a6f 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1960,6 +1960,11 @@ enum brw_message_target {
>  # define GEN6_GS_SVBI_POSTINCREMENT_VALUE_MASK INTEL_MASK(25, 16)
>  # define GEN6_GS_ENABLE(1 << 15)
>  
> +/* Gen8+ DW8 */
> +# define GEN8_GS_STATIC_OUTPUT  (1 << 30)
> +# define GEN8_GS_STATIC_VERTEX_COUNT_SHIFT  16
> +# define GEN8_GS_STATIC_VERTEX_COUNT_MASK   INTEL_MASK(26, 16)
> +
>  /* Gen8+ DW9 */
>  # define GEN8_GS_URB_ENTRY_OUTPUT_OFFSET_SHIFT  21
>  # define GEN8_GS_URB_OUTPUT_LENGTH_SHIFT16
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index 16ea684..111cf93 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -73,6 +73,11 @@ brw_codegen_gs_prog(struct brw_context *brw,
> c.prog_data.base.base.nr_params = param_count;
> c.prog_data.base.base.nr_image_params = gs->NumImages;
>  
> +   if (brw->gen >= 8) {
> +  c.prog_data.static_vertex_count =
> + nir_gs_count_vertices(gp->program.Base.nir);
> +   }

It looks like static_vertex_count will always be 0 for gen < 8 which I
guess is an invalid value. It looks the code that uses it always
checks gen >= 8, so it should be fine.

Series Reviewed-by: Jordan Justen 

> if (brw->gen >= 7) {
>if (gp->program.OutputType == GL_POINTS) {
>   /* When the output type is points, the geometry shader may output 
> data
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> index ff5bd98..acf0501 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> @@ -234,17 +234,20 @@ vec4_gs_visitor::emit_thread_end()
>  */
> int base_mrf = 1;
>  
> +   bool static_vertex_count = c->prog_data.static_vertex_count != -1;
> +
> current_annotation = "thread end";
> dst_reg mrf_reg(MRF, base_mrf);
> src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
> vec4_instruction *inst = emit(MOV(mrf_reg, r0));
> inst->force_writemask_all = true;
> -   emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
> +   if (devinfo->gen < 8 || !static_vertex_count)
> +  emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
> if (INTEL_DEBUG & DEBUG_SHADER_TIME)
>emit_shader_time_end();
> inst = emit(GS_OPCODE_THREAD_END);
> inst->base_mrf = base_mrf;
> -   inst->mlen = devinfo->gen >= 8 ? 2 : 1;
> +   inst->mlen = devinfo->gen >= 8 && !static_vertex_count ? 2 : 1;
>  }
>  
>  
> @@ -284,7 +287,7 @@ vec4_gs_visitor::emit_urb_write_opcode(bool complete)
> /* We need to increment Global Offset by 1 to make room for Br

[Mesa-dev] [PATCH v2 3/3] i965: Implement "Static Vertex Count" geometry shader optimization.

2015-09-24 Thread Kenneth Graunke
Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization.  Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly.  However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.

Thanks to the new NIR GS intrinsics, this is easy.  We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.

Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3227 -> 3207 (-0.62%)
instructions in affected programs: 242 -> 222 (-8.26%)
helped:10

v2: Don't break non-NIR paths (just skip this optimization).

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_context.h   |  5 +
 src/mesa/drivers/dri/i965/brw_defines.h   |  5 +
 src/mesa/drivers/dri/i965/brw_gs.c|  5 +
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 11 +++
 src/mesa/drivers/dri/i965/gen8_gs_state.c |  6 ++
 5 files changed, 28 insertions(+), 4 deletions(-)

Brown bag fix...I broke the non-NIR path.

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index b05b8bd..5c31ba4 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -782,6 +782,11 @@ struct brw_gs_prog_data
 
bool include_primitive_id;
 
+   /**
+* The number of vertices emitted, if constant - otherwise -1.
+*/
+   int static_vertex_count;
+
int invocations;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index f9d8d1b..6d94a6f 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1960,6 +1960,11 @@ enum brw_message_target {
 # define GEN6_GS_SVBI_POSTINCREMENT_VALUE_MASK INTEL_MASK(25, 16)
 # define GEN6_GS_ENABLE(1 << 15)
 
+/* Gen8+ DW8 */
+# define GEN8_GS_STATIC_OUTPUT  (1 << 30)
+# define GEN8_GS_STATIC_VERTEX_COUNT_SHIFT  16
+# define GEN8_GS_STATIC_VERTEX_COUNT_MASK   INTEL_MASK(26, 16)
+
 /* Gen8+ DW9 */
 # define GEN8_GS_URB_ENTRY_OUTPUT_OFFSET_SHIFT  21
 # define GEN8_GS_URB_OUTPUT_LENGTH_SHIFT16
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index 16ea684..25480a0 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -73,6 +73,11 @@ brw_codegen_gs_prog(struct brw_context *brw,
c.prog_data.base.base.nr_params = param_count;
c.prog_data.base.base.nr_image_params = gs->NumImages;
 
+   if (brw->gen >= 8) {
+  c.prog_data.static_vertex_count = !gp->program.Base.nir ? -1 :
+ nir_gs_count_vertices(gp->program.Base.nir);
+   }
+
if (brw->gen >= 7) {
   if (gp->program.OutputType == GL_POINTS) {
  /* When the output type is points, the geometry shader may output data
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index ff5bd98..acf0501 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -234,17 +234,20 @@ vec4_gs_visitor::emit_thread_end()
 */
int base_mrf = 1;
 
+   bool static_vertex_count = c->prog_data.static_vertex_count != -1;
+
current_annotation = "thread end";
dst_reg mrf_reg(MRF, base_mrf);
src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
vec4_instruction *inst = emit(MOV(mrf_reg, r0));
inst->force_writemask_all = true;
-   emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
+   if (devinfo->gen < 8 || !static_vertex_count)
+  emit(GS_OPCODE_SET_VERTEX_COUNT, mrf_reg, this->vertex_count);
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   emit_shader_time_end();
inst = emit(GS_OPCODE_THREAD_END);
inst->base_mrf = base_mrf;
-   inst->mlen = devinfo->gen >= 8 ? 2 : 1;
+   inst->mlen = devinfo->gen >= 8 && !static_vertex_count ? 2 : 1;
 }
 
 
@@ -284,7 +287,7 @@ vec4_gs_visitor::emit_urb_write_opcode(bool complete)
/* We need to increment Global Offset by 1 to make room for Broadwell's
 * extra "Vertex Count" payload at the beginning of the URB entry.
 */
-   if (devinfo->gen >= 8)
+   if (devinfo->gen >= 8 && c->prog_data.static_vertex_count == -1)
   inst->offset++;
 
inst->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
@@ -421,7 +424,7 @@ vec4_gs_visitor::emit_control_data_bits()
 * URB entry