Re: [Mesa-dev] [PATCH v4 (part2) 52/56] mesa: Add getters for the GL_ARB_shader_storage_buffer_object max constants

2015-07-31 Thread Samuel Iglesias Gonsálvez
On Fri, 2015-07-31 at 09:26 +0300, Tapani Pälli wrote:
> 
> On 07/24/2015 08:30 AM, Samuel Iglesias Gonsálvez wrote:
> > 
> > On 23/07/15 08:42, Samuel Iglesias Gonsalvez wrote:
> > > v2:
> > > - Add tessellation shader constants support
> > > 
> > > Signed-off-by: Samuel Iglesias Gonsalvez 
> > > ---
> > >   src/mesa/main/get.c  |  1 +
> > >   src/mesa/main/get_hash_params.py | 14 ++
> > >   2 files changed, 15 insertions(+)
> > > 
> > > diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> > > index 56cc3f2..a75bea9 100644
> > > --- a/src/mesa/main/get.c
> > > +++ b/src/mesa/main/get.c
> > > @@ -414,6 +414,7 @@ EXTRA_EXT(ARB_clip_control);
> > >   EXTRA_EXT(EXT_polygon_offset_clamp);
> > >   EXTRA_EXT(ARB_framebuffer_no_attachments);
> > >   EXTRA_EXT(ARB_tessellation_shader);
> > > +EXTRA_EXT(ARB_shader_storage_buffer_object);
> > > 
> > >   static const int
> > >   extra_ARB_color_buffer_float_or_glcore[] = {
> > > diff --git a/src/mesa/main/get_hash_params.py 
> > > b/src/mesa/main/get_hash_params.py
> > > index 2cf06d6..f810155 100644
> > > --- a/src/mesa/main/get_hash_params.py
> > > +++ b/src/mesa/main/get_hash_params.py
> > > @@ -373,6 +373,20 @@ descriptor=[
> > > [ "UNIFORM_BUFFER_OFFSET_ALIGNMENT", 
> > > "CONTEXT_INT(Const.UniformBufferOffsetAlignment), 
> > > extra_ARB_uniform_buffer_object" ],
> > > [ "UNIFORM_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0, 
> > > extra_ARB_uniform_buffer_object" ],
> > > 
> > > +  # GL_ARB_shader_storage_buffer_object
> > > +  [ "MAX_VERTEX_SHADER_STORAGE_BLOCKS", 
> > > "CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxShaderStorageBl
> > > ocks), extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_GEOMETRY_SHADER_STORAGE_BLOCKS", "CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxShaderStorageBlocks), extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS", 
> > > "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxShaderStorag
> > > eBlocks), extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS", 
> > > "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxShaderStorag
> > > eBlocks), extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_FRAGMENT_SHADER_STORAGE_BLOCKS", 
> > > "CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxShaderStorage
> > > Blocks), extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_COMPUTE_SHADER_STORAGE_BLOCKS", 
> > > "CONTEXT_INT(Const.Program[MESA_SHADER_COMPUTE].MaxShaderStorageB
> > > locks), extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_COMBINED_SHADER_STORAGE_BLOCKS", 
> > > "CONTEXT_INT(Const.MaxCombinedShaderStorageBlocks), 
> > > extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_SHADER_STORAGE_BLOCK_SIZE", 
> > > "CONTEXT_INT(Const.MaxShaderStorageBlockSize), 
> > > extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_SHADER_STORAGE_BUFFER_BINDINGS", 
> > > "CONTEXT_INT(Const.MaxShaderStorageBufferBindings), 
> > > extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "MAX_COMBINED_SHADER_OUTPUT_RESOURCES", 
> > > "CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs), 
> > > extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT", 
> > > "CONTEXT_INT(Const.ShaderStorageBufferOffsetAlignment), 
> > > extra_ARB_shader_storage_buffer_object" ],
> > > +  [ "SHADER_STORAGE_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0, 
> > > extra_ARB_shader_storage_buffer_object" ],
> > > +
> > 
> > While I was writing ARB_shader_storage_buffer_object support for 
> > GLES
> > 3.1, I realized that this patch misplaced these lines. They should 
> > be at
> > the end of the file (inside "Enums restricted to OpenGL Core 
> > profile"
> > section).
> > 
> > Later, one of the GLES 3.1 patches will move the corresponding 
> > constants
> > to "Enums in OpenGL Core profile and ES 3.1" section.
> > 
> > I can send another version of this patch with that fixed, if you 
> > want.
> 
> Why not directly move these to "Enums in OpenGL Core profile and ES 
> 3.1" 
> section?
> 

Not all constants are defined in GLES 3.1. However, what I can do is to
merge the GLES 3.1 patch to this one, so we have one patch with all the
support.

Does this sound good to you?

Sam

> > Sam
> > 
> > >   # GL_ARB_vertex_shader
> > > [ "MAX_VERTEX_UNIFORM_COMPONENTS_ARB", 
> > > "CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxUniformComponen
> > > ts), extra_ARB_vertex_shader" ],
> > > [ "MAX_VARYING_FLOATS_ARB", "LOC_CUSTOM, TYPE_INT, 0, 
> > > extra_ARB_vertex_shader" ],
> > > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 



Re: [Mesa-dev] [PATCH v3 (part2) 49/56] main: Add SHADER_STORAGE_BLOCK and BUFFER_VARIABLE support for ARB_program_interface_query

2015-07-31 Thread Samuel Iglesias Gonsálvez
On Fri, 2015-07-31 at 09:09 +0300, Tapani Pälli wrote:
> On 07/14/2015 10:46 AM, Iago Toral Quiroga wrote:
> > From: Samuel Iglesias Gonsalvez 
> > 
> > Including TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE queries.
> > 
> > Signed-off-by: Samuel Iglesias Gonsalvez 
> > ---
> >   src/glsl/ir_uniform.h|   5 +
> >   src/glsl/link_uniforms.cpp   |  17 ++-
> >   src/glsl/linker.cpp  |  10 +-
> >   src/mesa/main/program_resource.c |   7 +-
> >   src/mesa/main/shader_query.cpp   | 265 
> > +--
> >   5 files changed, 289 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/glsl/ir_uniform.h b/src/glsl/ir_uniform.h
> > index e1b8014..71d894c 100644
> > --- a/src/glsl/ir_uniform.h
> > +++ b/src/glsl/ir_uniform.h
> > @@ -186,6 +186,11 @@ struct gl_uniform_storage {
> >   * This is a built-in uniform that should not be modified 
> > through any gl API.
> >   */
> >  bool builtin;
> > +
> > +   /**
> > +* This is a shader storage buffer variable, not a uniform.
> > +*/
> > +   bool is_shader_storage;
> >   };
> > 
> >   #ifdef __cplusplus
> > diff --git a/src/glsl/link_uniforms.cpp 
> > b/src/glsl/link_uniforms.cpp
> > index eefe7dc..29a8799 100644
> > --- a/src/glsl/link_uniforms.cpp
> > +++ b/src/glsl/link_uniforms.cpp
> > @@ -638,6 +638,9 @@ private:
> > if (!this->uniforms[id].builtin)
> >this->uniforms[id].storage = this->values;
> > 
> > +  this->uniforms[id].is_shader_storage =
> > + current_var->is_in_shader_storage_block();
> > +
> > if (this->ubo_block_index != -1) {
> >  this->uniforms[id].block_index = this->ubo_block_index;
> > 
> > @@ -647,8 +650,12 @@ private:
> >  this->ubo_byte_offset += type->std140_size(row_major);
> > 
> >  if (type->is_array()) {
> > -   this->uniforms[id].array_stride =
> > -  glsl_align(type->fields.array
> > ->std140_size(row_major), 16);
> > +   if (type->interface_packing == 
> > GLSL_INTERFACE_PACKING_STD430)
> > +  this->uniforms[id].array_stride =
> > + type->fields.array->std430_size(row_major);
> > +   else
> > +  this->uniforms[id].array_stride =
> > + glsl_align(type->fields.array
> > ->std140_size(row_major), 16);
> >  } else {
> > this->uniforms[id].array_stride = 0;
> >  }
> > @@ -659,7 +666,11 @@ private:
> >   const unsigned items = row_major ? matrix
> > ->matrix_columns : matrix->vector_elements;
> > 
> >   assert(items <= 4);
> > -this->uniforms[id].matrix_stride = glsl_align(items * 
> > N, 16);
> > +if (type->interface_packing == 
> > GLSL_INTERFACE_PACKING_STD430)
> > +   this->uniforms[id].matrix_stride = items < 3 ? 
> > items * N :
> > +  glsl_align(items 
> > * N, 16);
> > +else
> > +   this->uniforms[id].matrix_stride = glsl_align(items 
> > * N, 16);
> > this->uniforms[id].row_major = row_major;
> >  } else {
> > this->uniforms[id].matrix_stride = 0;
> > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > index 330ef56..e82aa61 100644
> > --- a/src/glsl/linker.cpp
> > +++ b/src/glsl/linker.cpp
> > @@ -2852,14 +2852,18 @@ build_program_resource_list(struct 
> > gl_context *ctx,
> >}
> > }
> > 
> > -  if (!add_program_resource(shProg, GL_UNIFORM,
> > +  bool is_shader_storage =  shProg
> > ->UniformStorage[i].is_shader_storage;
> > +  GLenum type = is_shader_storage ? GL_BUFFER_VARIABLE : 
> > GL_UNIFORM;
> > +  if (!add_program_resource(shProg, type,
> >   &shProg->UniformStorage[i], 
> > stageref))
> >return;
> >  }
> > 
> > -   /* Add program uniform blocks. */
> > +   /* Add program uniform blocks and shader storage blocks. */
> >  for (unsigned i = 0; i < shProg->NumUniformBlocks; i++) {
> > -  if (!add_program_resource(shProg, GL_UNIFORM_BLOCK,
> > +  bool is_shader_storage = shProg
> > ->UniformBlocks[i].IsShaderStorage;
> > +  GLenum type = is_shader_storage ? GL_SHADER_STORAGE_BLOCK : 
> > GL_UNIFORM_BLOCK;
> > +  if (!add_program_resource(shProg, type,
> > &shProg->UniformBlocks[i], 0))
> >return;
> >  }
> > diff --git a/src/mesa/main/program_resource.c 
> > b/src/mesa/main/program_resource.c
> > index d857b84..0444e3b 100644
> > --- a/src/mesa/main/program_resource.c
> > +++ b/src/mesa/main/program_resource.c
> > @@ -40,6 +40,8 @@ supported_interface_enum(GLenum iface)
> >  case GL_PROGRAM_OUTPUT:
> >  case GL_TRANSFORM_FEEDBACK_VARYING:
> >  case GL_ATOMIC_COUNTER_BUFFER:
> > +   case GL_BUFFER_VARIABLE:
> > +   case GL_SHADER_STORAGE_BLOCK:
> > return true;
> >  case GL_VERTEX_SUBROUTINE:
> >  case GL_TESS_CONTROL_SUBROUTINE:
> > @@ -53,8 +55,6 @@ supported_interface_enum(GLenum iface)
> >  case GL_GEOMETRY_
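
For reference, the matrix stride rule from the link_uniforms.cpp hunk above can be sketched in isolation. This is a minimal standalone illustration, not Mesa code: the enum, the helper names and the parameter `n` are hypothetical, and `n` is assumed to be the size in bytes of one matrix component (4 for a float).

```c
#include <assert.h>

/* Hypothetical stand-in for the GLSL interface packing enum. */
enum packing { PACKING_STD140, PACKING_STD430 };

/* Round value up to the next multiple of alignment (a power of two). */
static unsigned align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* items: components per column (or per row for row-major matrices).
 * n:     assumed size in bytes of one component.
 */
static unsigned matrix_stride(enum packing packing, unsigned items, unsigned n)
{
   assert(items >= 1 && items <= 4);

   /* std430 keeps 1- and 2-component vectors tightly packed. */
   if (packing == PACKING_STD430 && items < 3)
      return items * n;

   /* Everything else is rounded up to a 16-byte boundary. */
   return align_up(items * n, 16);
}
```

With 4-byte components, a two-component column gets a stride of 16 under std140 but 8 under std430, while three- and four-component columns get 16 in both layouts.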

Re: [Mesa-dev] [PATCH 0/9] Mostly trivial clean ups

2015-07-31 Thread Juha-Pekka Heikkila
this set is

Reviewed-by: Juha-Pekka Heikkila 

On 30.07.2015 17:14, Ian Romanick wrote:
> All but the last of these patches have been sitting in one tree or
> another for quite some time.  All of these files were touched as part of
> other work, but that work is stalled a bit.  At Ken's suggestion, I'm
> sending out the trivial bits just to prune my trees a bit.
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-31 Thread Iago Toral
On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > When we have code such as this:
> >
> > mov vgrf1.0.x:F, vgrf2.:F
> > mov vgrf3.0.x:F, vgrf1.:F
> > ...
> > mov vgrf3.0.x:F, vgrf1.:F
> >
> > And vgrf1 is chosen for spilling, we can emit this:
> >
> > mov vgrf1.0.x:F, vgrf2.:F
> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
> > mov vgrf3.0.x:F, vgrf1.:F
> > ...
> > gen4_scratch_read vgrf4.0.x:F, 22D
> > mov vgrf3.0.x:F, vgrf4.:F
> >
> > Instead of this:
> >
> > mov vgrf1.0.x:F, vgrf2.:F
> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
> > gen4_scratch_read vgrf4.0.x:F, 22D
> > mov vgrf3.0.x:F, vgrf4.:F
> > ...
> > gen4_scratch_read vgrf5.0.x:F, 22D
> > mov vgrf3.0.x:F, vgrf5.:F
> >
> > And save one scratch read while still preserving the benefits of
> > spilling the register.
> >
> > In general, we avoid emitting scratch reads for as long as the next 
> > instruction
> > keeps reading the spilled register. This should not harm the benefit of
> > spilling the register because gains for register allocation only come when 
> > we
> > have chunks of program code where the register is alive but not really used
> > (because these are the points where we could effectively use that register 
> > for
> > another purpose if we spilled it), so as long as consecutive instructions 
> > use
> > that register we can avoid the scratch reads without losing anything.
> > ---
> >  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 
> > +-
> >  1 file changed, 36 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > index cff5406..fd56dae 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
> > unsigned int spill_offset = last_scratch++;
> >  
> > /* Generate spill/unspill instructions for the objects being spilled. */
> > +   vec4_instruction *spill_write_inst = NULL;
> > foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
> > +  /* We don't spill registers used for scratch */
> > +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
> > +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
> > + continue;
> > +
> >int scratch_reg = -1;
> > +  bool spill_reg_was_read = false;
> >for (unsigned int i = 0; i < 3; i++) {
> >   if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) 
> > {
> > -if (scratch_reg == -1) {
> > +if (!spill_reg_was_read) {
> > +   spill_reg_was_read = (!inst->predicate ||
> > + inst->opcode == BRW_OPCODE_SEL);
> > +}
> > +
> > +/* If we are reading the spilled register right after writing
> > + * to it we can skip the scratch read and use directly the
> > + * register we used as source for the scratch write. For this
> > + * to work we must check that:
> > + *
> > + * 1) The write is unconditional, that is, it is not predicated or
> > + *it is a SEL.
> > + * 2) All the channels that we read have been written in that
> > + *last write instruction.
> > + *
> > + * We keep doing this for as long as the next instruction
> > + * keeps reading the spilled register and break as soon as we
> > + * find an instruction that doesn't.
> > + */
> > +if (spill_write_inst &&
> > +(!spill_write_inst->predicate ||
> > + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
> > +((brw_mask_for_swizzle(inst->src[i].swizzle) &
> > + ~spill_write_inst->dst.writemask) == 0)) {
> > +   scratch_reg = spill_write_inst->dst.reg;
> > +} else if (scratch_reg == -1) {
> 
> One suggestion: You could factor out the rather complex caching logic
> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
> vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
> would simply compare scratch_reg with the sources of the current
> instruction (up to src) and the sources and destination of the previous
> non-scratch_read/write instruction.  If there's a match it would check
> that the regioning is compatible with the i-th source and return true in
> that case.  This would have several benefits:

I think this might need to be a bit more complex. The previous inst's
src[i] might read only a subset of the channels that were loaded into
scratch_reg so comparing only against that can lead us to think that we
can't reuse scratch_reg when in fact we can.
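
To make the channel argument concrete, here is a small standalone sketch of the coverage test. It is not the i965 code: the 2-bits-per-component swizzle encoding and the names SWIZZLE4, mask_for_swizzle and read_covered_by_write are assumptions made for illustration. The point is that a read can reuse a register whenever the channels it needs are a subset of the channels that register actually holds, regardless of which channels an earlier read happened to use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical swizzle encoding: 2 bits per component, x=0 .. w=3. */
#define SWIZZLE4(a, b, c, d) ((a) | ((b) << 2) | ((c) << 4) | ((d) << 6))

/* Bitmask (bit 0 = x .. bit 3 = w) of the channels a swizzle reads. */
static unsigned mask_for_swizzle(unsigned swizzle)
{
   unsigned mask = 0;

   for (unsigned i = 0; i < 4; i++)
      mask |= 1u << ((swizzle >> (2 * i)) & 3);

   return mask;
}

/* True when every channel read through swizzle is present in available_mask,
 * i.e. the read needs nothing that the register does not already hold.
 */
static bool read_covered_by_write(unsigned swizzle, unsigned available_mask)
{
   assert(available_mask <= 0xf);
   return (mask_for_swizzle(swizzle) & ~available_mask) == 0;
}
```

For example, a read of .xyyx is covered by a register holding only x and y, even if the previous instruction read just .x from it.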

I think the process should be more like we loop back looking a

Re: [Mesa-dev] [PATCH v4 (part2) 52/56] mesa: Add getters for the GL_ARB_shader_storage_buffer_object max constants

2015-07-31 Thread Tapani Pälli



On 07/31/2015 10:31 AM, Samuel Iglesias Gonsálvez wrote:

On Fri, 2015-07-31 at 09:26 +0300, Tapani Pälli wrote:


On 07/24/2015 08:30 AM, Samuel Iglesias Gonsálvez wrote:


On 23/07/15 08:42, Samuel Iglesias Gonsalvez wrote:

v2:
- Add tessellation shader constants support

Signed-off-by: Samuel Iglesias Gonsalvez 
---
   src/mesa/main/get.c  |  1 +
   src/mesa/main/get_hash_params.py | 14 ++
   2 files changed, 15 insertions(+)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 56cc3f2..a75bea9 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -414,6 +414,7 @@ EXTRA_EXT(ARB_clip_control);
   EXTRA_EXT(EXT_polygon_offset_clamp);
   EXTRA_EXT(ARB_framebuffer_no_attachments);
   EXTRA_EXT(ARB_tessellation_shader);
+EXTRA_EXT(ARB_shader_storage_buffer_object);

   static const int
   extra_ARB_color_buffer_float_or_glcore[] = {
diff --git a/src/mesa/main/get_hash_params.py
b/src/mesa/main/get_hash_params.py
index 2cf06d6..f810155 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -373,6 +373,20 @@ descriptor=[
 [ "UNIFORM_BUFFER_OFFSET_ALIGNMENT",
"CONTEXT_INT(Const.UniformBufferOffsetAlignment),
extra_ARB_uniform_buffer_object" ],
 [ "UNIFORM_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0,
extra_ARB_uniform_buffer_object" ],

+  # GL_ARB_shader_storage_buffer_object
+  [ "MAX_VERTEX_SHADER_STORAGE_BLOCKS",
"CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxShaderStorageBl
ocks), extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_GEOMETRY_SHADER_STORAGE_BLOCKS", "CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxShaderStorageBlocks), extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS",
"CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxShaderStorag
eBlocks), extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS",
"CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxShaderStorag
eBlocks), extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_FRAGMENT_SHADER_STORAGE_BLOCKS",
"CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxShaderStorage
Blocks), extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_COMPUTE_SHADER_STORAGE_BLOCKS",
"CONTEXT_INT(Const.Program[MESA_SHADER_COMPUTE].MaxShaderStorageB
locks), extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_COMBINED_SHADER_STORAGE_BLOCKS",
"CONTEXT_INT(Const.MaxCombinedShaderStorageBlocks),
extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_SHADER_STORAGE_BLOCK_SIZE",
"CONTEXT_INT(Const.MaxShaderStorageBlockSize),
extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_SHADER_STORAGE_BUFFER_BINDINGS",
"CONTEXT_INT(Const.MaxShaderStorageBufferBindings),
extra_ARB_shader_storage_buffer_object" ],
+  [ "MAX_COMBINED_SHADER_OUTPUT_RESOURCES",
"CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs),
extra_ARB_shader_storage_buffer_object" ],
+  [ "SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT",
"CONTEXT_INT(Const.ShaderStorageBufferOffsetAlignment),
extra_ARB_shader_storage_buffer_object" ],
+  [ "SHADER_STORAGE_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0,
extra_ARB_shader_storage_buffer_object" ],
+


While I was writing ARB_shader_storage_buffer_object support for
GLES
3.1, I realized that this patch misplaced these lines. They should
be at
the end of the file (inside "Enums restricted to OpenGL Core
profile"
section).

Later, one of the GLES 3.1 patches will move the corresponding
constants
to "Enums in OpenGL Core profile and ES 3.1" section.

I can send another version of this patch with that fixed, if you
want.


Why not directly move these to "Enums in OpenGL Core profile and ES
3.1"
section?



Not all constants are defined in GLES 3.1. However, what I can do is to
merge the GLES 3.1 patch to this one, so we have one patch with all the
support.

Does this sound good to you?


Sounds perfect!


Sam


Sam


   # GL_ARB_vertex_shader
 [ "MAX_VERTEX_UNIFORM_COMPONENTS_ARB",
"CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxUniformComponen
ts), extra_ARB_vertex_shader" ],
 [ "MAX_VARYING_FLOATS_ARB", "LOC_CUSTOM, TYPE_INT, 0,
extra_ARB_vertex_shader" ],




Re: [Mesa-dev] [PATCH 3/3] dri: set the __DRI_API_OPENGL bit based on max gl compat version

2015-07-31 Thread Frank Binns
On 30/07/15 17:16, Emil Velikov wrote:
> On 30 July 2015 at 10:27, Frank Binns  wrote:
>> This matches similar behaviour for the __DRI_API_OPENGL_CORE bit.
>>
>> Signed-off-by: Frank Binns 
>> ---
>>  src/mesa/drivers/dri/common/dri_util.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/common/dri_util.c 
>> b/src/mesa/drivers/dri/common/dri_util.c
>> index 884a7e0..d35ac26 100644
>> --- a/src/mesa/drivers/dri/common/dri_util.c
>> +++ b/src/mesa/drivers/dri/common/dri_util.c
>> @@ -163,7 +163,9 @@ driCreateNewScreen2(int scrn, int fd,
>> }
>>  }
>>
>> -psp->api_mask = (1 << __DRI_API_OPENGL);
>> +psp->api_mask = 0;
>> +if (psp->max_gl_compat_version > 0)
>> +   psp->api_mask |= (1 << __DRI_API_OPENGL);
> It's almost as if there is a DRI module that doesn't do OpenGL but
> GLES alone (cough, cough).
>
> But seriously, the series looks great imho, and is Reviewed-by: Emil
> Velikov 
>
> I do wonder if we should pull patch#1 (and #2) for the stable branch ?
>
> -Emil
Sure, I'll resend CCing stable for the first two patches.

Thanks
Frank


[Mesa-dev] [PATCH 1/3] egl: Add eglQuerySurface surface type check for EGL_LARGEST_PBUFFER attrib

2015-07-31 Thread Frank Binns
Calling eglQuerySurface on a window or pixmap with the EGL_LARGEST_PBUFFER
attribute resulted in the contents of the 'value' parameter being modified.
This is the wrong behaviour according to the EGL spec, which states:

"Querying EGL_LARGEST_PBUFFER for a pbuffer surface returns the
 same attribute value specified when the surface was created with
 eglCreatePbufferSurface. For a window or pixmap surface, the
 contents of value are not modified."

Prevent this by checking that the surface type is EGL_PBUFFER_BIT
before modifying the contents of the parameter.

Cc: 
Signed-off-by: Frank Binns 
Reviewed-by: Emil Velikov 
---
 src/egl/main/eglsurface.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/egl/main/eglsurface.c b/src/egl/main/eglsurface.c
index 541353f..4fa43f3 100644
--- a/src/egl/main/eglsurface.c
+++ b/src/egl/main/eglsurface.c
@@ -326,7 +326,8 @@ _eglQuerySurface(_EGLDriver *drv, _EGLDisplay *dpy, 
_EGLSurface *surface,
   *value = surface->Config->ConfigID;
   break;
case EGL_LARGEST_PBUFFER:
-  *value = surface->LargestPbuffer;
+  if (surface->Type == EGL_PBUFFER_BIT)
+ *value = surface->LargestPbuffer;
   break;
case EGL_TEXTURE_FORMAT:
   /* texture attributes: only for pbuffers, no error otherwise */
-- 
1.8.5.4.gfd2
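
The behaviour this patch aims for can be sketched without any EGL headers. The following is a minimal illustration only: the surface struct, the enum and the function name are hypothetical stand-ins, not the Mesa types. It shows a query that reports success for every surface type but writes the output parameter only for pbuffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EGL surface types. */
enum surface_type { SURFACE_WINDOW, SURFACE_PIXMAP, SURFACE_PBUFFER };

struct surface {
   enum surface_type type;
   int largest_pbuffer;
};

/* Writes *value only for pbuffer surfaces. For window and pixmap surfaces
 * the query still succeeds, but *value is left exactly as the caller set it.
 */
static bool query_largest_pbuffer(const struct surface *surf, int *value)
{
   assert(surf != NULL && value != NULL);

   if (surf->type == SURFACE_PBUFFER)
      *value = surf->largest_pbuffer;

   return true;
}
```

A caller that pre-initialises the output to a sentinel can therefore tell that a window surface did not touch it.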



[Mesa-dev] [PATCH 2/3] egl/dri: Add error info needed for EGL_EXT_image_dma_buf_import extension

2015-07-31 Thread Frank Binns
Update the DRI image interface error codes to reflect the needs of the
EGL_EXT_image_dma_buf_import extension. This means updating the existing error
code documentation and adding a new __DRI_IMAGE_ERROR_BAD_ACCESS error code
so that drivers can correctly reject unsupported pitches and offsets. Hook
the new error code up in EGL to return EGL_BAD_ACCESS.

Cc: 
Signed-off-by: Frank Binns 
Reviewed-by: Emil Velikov 
---
 include/GL/internal/dri_interface.h | 8 ++--
 src/egl/drivers/dri2/egl_dri2.c | 4 
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index e7cf50d..a0f155a 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1178,7 +1178,8 @@ enum __DRIChromaSiting {
 };
 
 /**
- * \name Reasons that __DRIimageExtensionRec::createImageFromTexture might fail
+ * \name Reasons that __DRIimageExtensionRec::createImageFromTexture or
+ * __DRIimageExtensionRec::createImageFromDmaBufs might fail
  */
 /*@{*/
 /** Success! */
@@ -1187,11 +1188,14 @@ enum __DRIChromaSiting {
 /** Memory allocation failure */
 #define __DRI_IMAGE_ERROR_BAD_ALLOC 1
 
-/** Client requested an invalid attribute for a texture object  */
+/** Client requested an invalid attribute */
 #define __DRI_IMAGE_ERROR_BAD_MATCH 2
 
 /** Client requested an invalid texture object */
 #define __DRI_IMAGE_ERROR_BAD_PARAMETER 3
+
+/** Client requested an invalid pitch and/or offset */
+#define __DRI_IMAGE_ERROR_BAD_ACCESS4
 /*@}*/
 
 /**
diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index e3afab4..432260b 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1592,6 +1592,10 @@ dri2_create_image_khr_texture_error(int dri_error)
   egl_error = EGL_BAD_PARAMETER;
   break;
 
+   case __DRI_IMAGE_ERROR_BAD_ACCESS:
+  egl_error = EGL_BAD_ACCESS;
+  break;
+
default:
   assert(0);
   egl_error = EGL_BAD_MATCH;
-- 
1.8.5.4.gfd2
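
As a rough illustration of the error translation, here is a standalone sketch. It does not use the real headers: both enums and the function name are hypothetical stand-ins. The DRI-side values 1 to 4 follow the hunk above, the value 0 for success is an assumption because that line falls outside the quoted hunk, and the EGL-side values are arbitrary. Only the BAD_PARAMETER, BAD_ACCESS and default cases are visible in the diff; the remaining cases are assumed by analogy.

```c
#include <assert.h>

/* Stand-ins for the DRI image error codes (SUCCESS = 0 is assumed). */
enum image_error {
   IMAGE_ERROR_SUCCESS = 0,
   IMAGE_ERROR_BAD_ALLOC = 1,
   IMAGE_ERROR_BAD_MATCH = 2,
   IMAGE_ERROR_BAD_PARAMETER = 3,
   IMAGE_ERROR_BAD_ACCESS = 4
};

/* Stand-ins for the EGL error codes; names and values are illustrative. */
enum egl_error {
   ERR_SUCCESS,
   ERR_BAD_ALLOC,
   ERR_BAD_MATCH,
   ERR_BAD_PARAMETER,
   ERR_BAD_ACCESS
};

static enum egl_error translate_image_error(int dri_error)
{
   switch (dri_error) {
   case IMAGE_ERROR_SUCCESS:
      return ERR_SUCCESS;
   case IMAGE_ERROR_BAD_ALLOC:
      return ERR_BAD_ALLOC;
   case IMAGE_ERROR_BAD_MATCH:
      return ERR_BAD_MATCH;
   case IMAGE_ERROR_BAD_PARAMETER:
      return ERR_BAD_PARAMETER;
   case IMAGE_ERROR_BAD_ACCESS:
      /* New in this patch: unsupported pitch and/or offset. */
      return ERR_BAD_ACCESS;
   default:
      /* Unknown code: trap in debug builds, fall back to BAD_MATCH otherwise. */
      assert(0 && "unknown image error");
      return ERR_BAD_MATCH;
   }
}
```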



[Mesa-dev] [PATCH 3/3] dri: set the __DRI_API_OPENGL bit based on max gl compat version

2015-07-31 Thread Frank Binns
This matches similar behaviour for the __DRI_API_OPENGL_CORE bit.

Signed-off-by: Frank Binns 
Reviewed-by: Emil Velikov 
---
 src/mesa/drivers/dri/common/dri_util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index 884a7e0..d35ac26 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -163,7 +163,9 @@ driCreateNewScreen2(int scrn, int fd,
}
 }
 
-psp->api_mask = (1 << __DRI_API_OPENGL);
+psp->api_mask = 0;
+if (psp->max_gl_compat_version > 0)
+   psp->api_mask |= (1 << __DRI_API_OPENGL);
 if (psp->max_gl_core_version > 0)
psp->api_mask |= (1 << __DRI_API_OPENGL_CORE);
 if (psp->max_gl_es1_version > 0)
-- 
1.8.5.4.gfd2
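
The effect of the change can be sketched as a small standalone function. The enum values and the struct below are hypothetical stand-ins for the __DRI_API_* constants and the screen fields, limited to the three version fields visible in the hunk. The mask starts empty, and each API bit is set only if the driver reports a non-zero maximum version for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit positions standing in for the __DRI_API_* constants. */
enum api_bit { API_OPENGL = 0, API_GLES = 1, API_OPENGL_CORE = 3 };

struct screen_versions {
   int max_gl_compat_version;
   int max_gl_core_version;
   int max_gl_es1_version;
};

static unsigned build_api_mask(const struct screen_versions *s)
{
   unsigned mask = 0;

   assert(s != NULL);

   /* Previously the compat bit was set unconditionally. */
   if (s->max_gl_compat_version > 0)
      mask |= 1u << API_OPENGL;
   if (s->max_gl_core_version > 0)
      mask |= 1u << API_OPENGL_CORE;
   if (s->max_gl_es1_version > 0)
      mask |= 1u << API_GLES;

   return mask;
}
```

A driver that exposes only GLES then no longer advertises desktop OpenGL.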



Re: [Mesa-dev] [Mesa-dev, 1/9] mesa/es3.1: Allow binding GL_DRAW_INDIRECT_BUFFER with gles 3.1

2015-07-31 Thread Tapani Pälli
I've gone through this set (and I'm using the patches regularly to be able to run
the ES 3.1 conformance tests).


Patches 1,2,3,4,5,7,8

Reviewed-by: Tapani Pälli 

I believe patches 6 and 9 require changes.

On 05/11/2015 04:03 PM, Marta Löfstedt wrote:

From: Marta Lofstedt 

Signed-off-by: Marta Lofstedt 

---
src/mesa/main/bufferobj.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
index 66dee68..07f82cd 100644
--- a/src/mesa/main/bufferobj.c
+++ b/src/mesa/main/bufferobj.c
@@ -91,8 +91,9 @@ get_buffer_target(struct gl_context *ctx, GLenum target)
 case GL_COPY_WRITE_BUFFER:
return &ctx->CopyWriteBuffer;
 case GL_DRAW_INDIRECT_BUFFER:
-  if (ctx->API == API_OPENGL_CORE &&
-  ctx->Extensions.ARB_draw_indirect) {
+  if ((ctx->API == API_OPENGL_CORE &&
+   ctx->Extensions.ARB_draw_indirect) ||
+   _mesa_is_gles31(ctx)) {
   return &ctx->DrawIndirectBuffer;
}
break;




[Mesa-dev] [PATCH] r600, compute: force tiling on 2D and 3D texture compute resources

2015-07-31 Thread Zoltan Gilian
To circumvent a problem occurring when LINEAR_ALIGNED array mode is
selected on a TEXTURE_2D RAT.
This configuration causes MEM_RAT STORE_TYPED to write to incorrect
locations.
---
 src/gallium/drivers/radeon/r600_texture.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 8169e05..f1fce04 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -706,6 +706,7 @@ static unsigned r600_choose_tiling(struct 
r600_common_screen *rscreen,
   const struct pipe_resource *templ)
 {
const struct util_format_description *desc = 
util_format_description(templ->format);
+   bool force_tiling = templ->flags & R600_RESOURCE_FLAG_FORCE_TILING;
 
/* MSAA resources must be 2D tiled. */
if (templ->nr_samples > 1)
@@ -715,10 +716,15 @@ static unsigned r600_choose_tiling(struct 
r600_common_screen *rscreen,
if (templ->flags & R600_RESOURCE_FLAG_TRANSFER)
return RADEON_SURF_MODE_LINEAR_ALIGNED;
 
+   /* Force tiling on TEXTURE_2D and TEXTURE_3D compute resources. */
+   if ((templ->bind & PIPE_BIND_COMPUTE_RESOURCE) &&
+   (templ->target == PIPE_TEXTURE_2D ||
+templ->target == PIPE_TEXTURE_3D))
+   force_tiling = true;
+
/* Handle common candidates for the linear mode.
 * Compressed textures must always be tiled. */
-   if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
-   !util_format_is_compressed(templ->format)) {
+   if (!force_tiling && !util_format_is_compressed(templ->format)) {
/* Not everything can be linear, so we cannot enforce it
 * for all textures. */
if ((rscreen->debug_flags & DBG_NO_TILING) &&
-- 
2.4.6
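
The decision this patch changes can be sketched in isolation. This is a simplified standalone illustration, not the r600 code: the flag values, the enum, the struct and the function name are all hypothetical, and the sketch reduces the function to the single question of whether the linear path may be considered at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits and targets standing in for the gallium ones. */
#define BIND_COMPUTE_RESOURCE (1u << 0)
#define FLAG_FORCE_TILING     (1u << 0)

enum target { TARGET_TEXTURE_1D, TARGET_TEXTURE_2D, TARGET_TEXTURE_3D };

struct templ {
   enum target target;
   unsigned bind;
   unsigned flags;
   bool compressed;
};

/* True when linear layouts may still be considered for this resource. */
static bool linear_allowed(const struct templ *t)
{
   assert(t != NULL);

   bool force_tiling = (t->flags & FLAG_FORCE_TILING) != 0;

   /* 2D and 3D compute resources are always tiled; bind is tested first. */
   if ((t->bind & BIND_COMPUTE_RESOURCE) &&
       (t->target == TARGET_TEXTURE_2D || t->target == TARGET_TEXTURE_3D))
      force_tiling = true;

   /* Compressed textures must always be tiled as well. */
   return !force_tiling && !t->compressed;
}
```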



Re: [Mesa-dev] [PATCH] r600, compute: force tiling on 2D and 3D texture compute resources

2015-07-31 Thread Zoltán Gilián
> Also, does the same restriction apply to SI and newer GPUs?

I don't know. I can only test this on an evergreen (juniper) GPU.

On Fri, Jul 31, 2015 at 5:15 AM, Michel Dänzer  wrote:
> On 31.07.2015 07:06, Zoltan Gilian wrote:
>> To circumvent a problem occurring when LINEAR_ALIGNED array mode is
>> selected on a TEXTURE_2D RAT.
>> This configuration causes MEM_RAT STORE_TYPED to write to incorrect
>> locations.
>
> [...]
>
>> @@ -715,10 +716,15 @@ static unsigned r600_choose_tiling(struct 
>> r600_common_screen *rscreen,
>>   if (templ->flags & R600_RESOURCE_FLAG_TRANSFER)
>>   return RADEON_SURF_MODE_LINEAR_ALIGNED;
>>
>> + /* Force tiling on TEXTURE_2D and TEXTURE_3D compute resources. */
>> + if ((templ->target == PIPE_TEXTURE_2D ||
>> +  templ->target == PIPE_TEXTURE_3D) &&
>> + (templ->bind & PIPE_BIND_COMPUTE_RESOURCE))
>> + force_tiling = true;
>
> This should test (templ->bind & PIPE_BIND_COMPUTE_RESOURCE) first.
>
> Also, does the same restriction apply to SI and newer GPUs?
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH v2 00/78] i965: A new vec4 backend based on NIR

2015-07-31 Thread Eduardo Lima Mitev
On 07/30/2015 09:48 PM, Jason Ekstrand wrote:
> 
> On Jul 27, 2015 3:39 PM, "Jason Ekstrand" wrote:
>>
>> On Mon, Jul 27, 2015 at 2:07 PM, Eduardo Lima Mitev wrote:
>> > On 07/25/2015 03:08 AM, Jason Ekstrand wrote:
>> >> Alright, I got through it again...
>> >>
>> >> I asked for a few trivial changes on a few of the patches.  With those
>> >> fixed, everything except patch 65 and 66 are
>> >>
>> >
>> > Thanks! we have fixed most of the patches already.
>> >
>> >> Reviewed-by: Jason Ekstrand
>> >>
>> >> While the requested changes on the texturing patches are not
>> >> complicated, I would like to see the updated version of those two
>> >> patches before I give a formal R-B.
>> >>
>> >
>> > Sure, we will send new versions of patch 65 and 66 as soon as we have
>> > them ready and tested.
>> >
>> >> There is one caveat to the above review: Something is badly broken on
>> >> HSW.  I'm not sure what, but one of the patches badly breaks HSW even
>> >> in the non-NIR case.  We need to figure out what that is before
>> >> pushing anything.  That said, please don't re-send the full series;
>> >> just figure out where the bug is, fix it, and re-send the one patch.
>> >>
>> >
>> > We have done extensive testing today, trying to find something strange
>> > that explains this breakage you are experiencing. Most of us have HSW
>> > laptops so it is our normal development environments.
>> >
>> > However, we have not been able to reproduce any failure that would suggest
>> > our branch breaks something. Using current master as baseline, I have
>> > tested both nir-vec4 and vec4_visitor against it (piglit and dEQP), and
>> > we see no regressions other than the one we described in the cover letter.
>> >
>> > Please, could you be more specific about the symptoms of this breakage
>> > and give us any possible info that would help us reproduce it locally?
>> > Maybe details of your system, kernel and gcc version; anything that
>> > might be different to a standard linux system.
>>
>> I'm not sure what happened there.  I may have had a bad rebase or
>> something.  I pulled your branch, rebased on master, and tried it
>> again.  Now it seems to be ok.  On Broadwell and Cherryview, however,
>> it's not so ok.  It looks like you may have accidentally broken scalar
>> VS or something like that.  I can't even run glxgears on BDW without
>> hitting an assert.  If your dev machine is a HSW, you can use
>> INTEL_DEVID_OVERRIDE to tell it to compile shaders as if it's a BDW
>> and figure out why it's crashing.  The patch titled "nir/nir_lower_io:
>> Add vec4 support" makes glxgears render black on BDW.  It seems like a
>> later patch is causing the crash.  In any case, you should be able to
>> use INTEL_DEVID_OVERRIDE and compare the shader results.
>>
>> --Jason
> 
> Any progress on this?
> 

Yes, I think I found the issue and fixed it (at least, the only one I
have been able to find so far), and it was indeed the one causing the
assert in glxgears.

Basically, it was a silly mistake in one of the patches I authored:
https://github.com/Igalia/mesa/commit/2d72ae5f9e26c3c11dce9e779407984ef9774a52

You can see that I passed 'false' as the 'is_scalar' argument of
brw_create_nir() for ARB_vertex_programs, ignoring that on BDW those are
scalar too. The fix was to pass 'brw->intelScreen->compiler->scalar_vs'
instead of 'false' there.

Indirectly, this error was messing up the lowering passes in brw_nir,
which assumed non-scalar when the shader was actually scalar. Specifically,
the pass nir_lower_alu_to_scalar() was not called because it is disabled
for non-scalar shaders.

I will send a new version of the patch I changed, in case you want to
have a look. I also updated the git-tree of the branch with the fix and
rebased against master:

https://github.com/Igalia/mesa/tree/nir-vec4-v2

Please, let us know if you find more issues.


Right now we have ~8 new piglit regressions (on HSW) after rebasing on
master. I'm currently analyzing them and hope to fix them soon. They are
in the "arb_shader_atomic_counters" set, in case you come across
them.

Thank you Jason!

Eduardo

>> > Thanks again for the review and patience.
>> >
>> > Eduardo
>> >
>> >> --Jason
>> >>
>> >> On Thu, Jul 23, 2015 at 3:16 AM, Eduardo Lima Mitev wrote:
>> >>> Hi,
>> >>>
>> >>> This is the second version of the series that adds a new vec4
>> >>> backend for i965 based on NIR. The first series was sent some weeks
>> >>> ago [1] and went through review by Jason and Kenneth (thanks a lot!).
>> >>> This series is a result of addressing issues detected during that
>> >>> feedback.
>> >>>
>> >>> This series also adds support for the NIR->vec4 pass on geometry
>> >>> shaders and on ARB_vertex_programs. Both supports were
>> >>> work-in-progress by the time we sent the first version, and are now
>> >>> completed. The patch-sets for GS and ARB_vertex_program were added at
>> >>> the end of the series.
>> >>>
>> >>> Like the firs

Re: [Mesa-dev] Properly exposing limits with gallium

2015-07-31 Thread Marek Olšák
On Thu, Jul 30, 2015 at 10:21 PM, Ilia Mirkin  wrote:
> So I'm trying to get these things to line up, especially for nvc0.
>
> Here are the limits exposed by the blob drivers:
>
> http://people.freedesktop.org/~imirkin/glxinfo/glxinfo.html#v=Vendor
>
> and they reflect what the hardware is capable of. More specifically, I need
>
> GL_MAX_VERTEX_OUTPUT_COMPONENTS = 128
> GL_MAX_VARYING_FLOATS_ARB = 124
> GL_MAX_FRAGMENT_INPUT_COMPONENTS = 128
> GL_MAX_GEOMETRY_INPUT_COMPONENTS = 128
> GL_MAX_GEOMETRY_OUTPUT_COMPONENTS = 128
>
> What would the proper way to expose this be? Should we always just say
> MAX_VARYING_FLOATS = MAX_VERTEX_OUTPUT_COMPONENTS - 4? Or a separate
> PIPE_SHADER_CAP for the maxvaryingfloats?

According to the spec, the former. It looks like all outputs are
counted against both limits, but one of the limits doesn't include
gl_Position.
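
As a rough sketch of "the former" (the helper name and the constant are
made up for illustration; the only facts used are that gl_Position is a
vec4 and the numbers quoted above):

```c
#include <assert.h>

/* gl_Position is a vec4, so it accounts for 4 components. */
#define SKETCH_POSITION_COMPONENTS 4u

/* Hypothetical helper: derive the varying-floats limit from the vertex
 * output components limit by leaving out gl_Position. */
static unsigned
sketch_max_varying_floats(unsigned max_vertex_output_components)
{
   assert(max_vertex_output_components >= SKETCH_POSITION_COMPONENTS);
   return max_vertex_output_components - SKETCH_POSITION_COMPONENTS;
}
```

Fed the 128 output components quoted above, this gives the 124 varying
floats that the blob reports.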

Marek


Re: [Mesa-dev] [PATCH] r600, compute: force tiling on 2D and 3D texture compute resources

2015-07-31 Thread Michel Dänzer
On 31.07.2015 17:39, Zoltan Gilian wrote:
> To circumvent a problem occurring when LINEAR_ALIGNED array mode is
> selected on a TEXTURE_2D RAT.
> This configuration causes MEM_RAT STORE_TYPED to write to incorrect
> locations.

[...]

> @@ -715,10 +716,15 @@ static unsigned r600_choose_tiling(struct 
> r600_common_screen *rscreen,
>   if (templ->flags & R600_RESOURCE_FLAG_TRANSFER)
>   return RADEON_SURF_MODE_LINEAR_ALIGNED;
>  
> + /* Force tiling on TEXTURE_2D and TEXTURE_3D compute resources. */
> + if ((templ->bind & PIPE_BIND_COMPUTE_RESOURCE) &&
> + (templ->target == PIPE_TEXTURE_2D ||
> +  templ->target == PIPE_TEXTURE_3D))

The indentation of the second and third lines of the if statement should
be aligned with the first line. With that fixed,

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] i965/fs: Fix regression with SIMD8 VS since b5f1a48e234d47b24df38cb562cffb8941d43795.

2015-07-31 Thread Lofstedt, Marta
In the hope of speeding up the review and merge of this bug fix,
I confirm that the patch below fixes the regression I have been seeing
since "b5f1a48e234d47b24df38cb562cffb8941d43795" on Gen8.

/Marta

> -Original Message-
> From: Francisco Jerez [mailto:curroje...@riseup.net]
> Sent: Thursday, July 30, 2015 2:23 PM
> To: mesa-dev@lists.freedesktop.org
> Cc: Lofstedt, Marta
> Subject: [PATCH] i965/fs: Fix regression with SIMD8 VS since
> b5f1a48e234d47b24df38cb562cffb8941d43795.
> 
> With num_direct_uniforms == 0 there's no space allocated in the param_size
> array for the one block of direct uniforms -- On the FS stage this would be a
> harmless no-op because it would simply re-set one of the param_size entries
> allocated for the sampler units to zero, but on the VS stage it has been
> reported to cause memory corruption followed by a crash -- Surprising how a
> full piglit run on Gen8 didn't catch it.
> 
> Reported-by: Marta Lofstedt 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 722e4e7..9cb7b0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -182,7 +182,8 @@ fs_visitor::nir_setup_uniforms(nir_shader *shader)
> /* We split the uniform register file in half.  The first half is
>  * entirely direct uniforms.  The second half is indirect.
>  */
> -   param_size[0] = num_direct_uniforms;
> +   if (num_direct_uniforms > 0)
> +  param_size[0] = num_direct_uniforms;
> if (shader->num_uniforms > num_direct_uniforms)
>param_size[num_direct_uniforms] = shader->num_uniforms -
> num_direct_uniforms;
> 
> --
> 2.4.6
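
For readers skimming the thread, the effect of the guard can be sketched
in isolation. This is only an illustration under assumptions: the
function name, the explicit 'count' argument and the asserts are
hypothetical; the two conditional writes follow the quoted diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone version of the two writes in the patch.
 * 'count' is the number of entries actually allocated in param_size. */
static void
sketch_setup_param_size(unsigned *param_size, size_t count,
                        unsigned num_direct_uniforms,
                        unsigned num_uniforms)
{
   assert(param_size != NULL || count == 0);

   /* Only touch entry 0 when a block of direct uniforms exists. */
   if (num_direct_uniforms > 0) {
      assert(count > 0);
      param_size[0] = num_direct_uniforms;
   }

   /* The indirect block, if any, starts right after the direct one. */
   if (num_uniforms > num_direct_uniforms) {
      assert(num_direct_uniforms < count);
      param_size[num_direct_uniforms] = num_uniforms - num_direct_uniforms;
   }
}
```

With no uniforms at all, the function writes nothing, so it is safe to
call even when no entries were allocated.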




Re: [Mesa-dev] [PATCH] r600, compute: force tiling on 2D and 3D texture compute resources

2015-07-31 Thread Marek Olšák
Please apply this behavior to r600g only. See rscreen->chip_class.

Marek

On Fri, Jul 31, 2015 at 12:06 AM, Zoltan Gilian  wrote:
> To circumvent a problem occurring when LINEAR_ALIGNED array mode is
> selected on a TEXTURE_2D RAT.
> This configuration causes MEM_RAT STORE_TYPED to write to incorrect
> locations.
> ---
>  src/gallium/drivers/radeon/r600_texture.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_texture.c 
> b/src/gallium/drivers/radeon/r600_texture.c
> index 8169e05..8979a98 100644
> --- a/src/gallium/drivers/radeon/r600_texture.c
> +++ b/src/gallium/drivers/radeon/r600_texture.c
> @@ -706,6 +706,7 @@ static unsigned r600_choose_tiling(struct 
> r600_common_screen *rscreen,
>const struct pipe_resource *templ)
>  {
> const struct util_format_description *desc = 
> util_format_description(templ->format);
> +   bool force_tiling = templ->flags & R600_RESOURCE_FLAG_FORCE_TILING;
>
> /* MSAA resources must be 2D tiled. */
> if (templ->nr_samples > 1)
> @@ -715,10 +716,15 @@ static unsigned r600_choose_tiling(struct 
> r600_common_screen *rscreen,
> if (templ->flags & R600_RESOURCE_FLAG_TRANSFER)
> return RADEON_SURF_MODE_LINEAR_ALIGNED;
>
> +   /* Force tiling on TEXTURE_2D and TEXTURE_3D compute resources. */
> +   if ((templ->target == PIPE_TEXTURE_2D ||
> +templ->target == PIPE_TEXTURE_3D) &&
> +   (templ->bind & PIPE_BIND_COMPUTE_RESOURCE))
> +   force_tiling = true;
> +
> /* Handle common candidates for the linear mode.
>  * Compressed textures must always be tiled. */
> -   if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
> -   !util_format_is_compressed(templ->format)) {
> +   if (!force_tiling && !util_format_is_compressed(templ->format)) {
> /* Not everything can be linear, so we cannot enforce it
>  * for all textures. */
> if ((rscreen->debug_flags & DBG_NO_TILING) &&
> --
> 2.4.6
>


Re: [Mesa-dev] [PATCH v3 (part2) 49/56] main: Add SHADER_STORAGE_BLOCK and BUFFER_VARIABLE support for ARB_program_interface_query

2015-07-31 Thread Samuel Iglesias Gonsálvez
On Fri, 2015-07-31 at 09:32 +0200, Samuel Iglesias Gonsálvez wrote:
> On Fri, 2015-07-31 at 09:09 +0300, Tapani Pälli wrote:
> > On 07/14/2015 10:46 AM, Iago Toral Quiroga wrote:
> > > From: Samuel Iglesias Gonsalvez 
> > > 
> > > Including TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE 
> > > queries.
> > > 
> > > Signed-off-by: Samuel Iglesias Gonsalvez 
> > > ---
> > >   src/glsl/ir_uniform.h|   5 +
> > >   src/glsl/link_uniforms.cpp   |  17 ++-
> > >   src/glsl/linker.cpp  |  10 +-
> > >   src/mesa/main/program_resource.c |   7 +-
> > >   src/mesa/main/shader_query.cpp   | 265 
> > > +--
> > >   5 files changed, 289 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/src/glsl/ir_uniform.h b/src/glsl/ir_uniform.h
> > > index e1b8014..71d894c 100644
> > > --- a/src/glsl/ir_uniform.h
> > > +++ b/src/glsl/ir_uniform.h
> > > @@ -186,6 +186,11 @@ struct gl_uniform_storage {
> > >   * This is a built-in uniform that should not be modified 
> > > through any gl API.
> > >   */
> > >  bool builtin;
> > > +
> > > +   /**
> > > +* This is a shader storage buffer variable, not an uniform.
> > > +*/
> > > +   bool is_shader_storage;
> > >   };
> > > 
> > >   #ifdef __cplusplus
> > > diff --git a/src/glsl/link_uniforms.cpp 
> > > b/src/glsl/link_uniforms.cpp
> > > index eefe7dc..29a8799 100644
> > > --- a/src/glsl/link_uniforms.cpp
> > > +++ b/src/glsl/link_uniforms.cpp
> > > @@ -638,6 +638,9 @@ private:
> > > if (!this->uniforms[id].builtin)
> > >this->uniforms[id].storage = this->values;
> > > 
> > > +  this->uniforms[id].is_shader_storage =
> > > + current_var->is_in_shader_storage_block();
> > > +
> > > if (this->ubo_block_index != -1) {
> > >this->uniforms[id].block_index = this
> > > ->ubo_block_index;
> > > 
> > > @@ -647,8 +650,12 @@ private:
> > >this->ubo_byte_offset += type->std140_size(row_major);
> > > 
> > >if (type->is_array()) {
> > > - this->uniforms[id].array_stride =
> > > -glsl_align(type->fields.array
> > > ->std140_size(row_major), 16);
> > > + if (type->interface_packing == 
> > > GLSL_INTERFACE_PACKING_STD430)
> > > +this->uniforms[id].array_stride =
> > > +   type->fields.array->std430_size(row_major);
> > > + else
> > > +this->uniforms[id].array_stride =
> > > +   glsl_align(type->fields.array
> > > ->std140_size(row_major), 16);
> > >} else {
> > >   this->uniforms[id].array_stride = 0;
> > >}
> > > @@ -659,7 +666,11 @@ private:
> > >   const unsigned items = row_major ? matrix
> > > ->matrix_columns : matrix->vector_elements;
> > > 
> > >   assert(items <= 4);
> > > -this->uniforms[id].matrix_stride = glsl_align(items 
> > > * 
> > > N, 16);
> > > +if (type->interface_packing == 
> > > GLSL_INTERFACE_PACKING_STD430)
> > > +   this->uniforms[id].matrix_stride = items < 3 ? 
> > > items * N :
> > > + 
> > >  glsl_align(items 
> > > * N, 16);
> > > +else
> > > +   this->uniforms[id].matrix_stride = 
> > > glsl_align(items 
> > > * N, 16);
> > >   this->uniforms[id].row_major = row_major;
> > >} else {
> > >   this->uniforms[id].matrix_stride = 0;
> > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > > index 330ef56..e82aa61 100644
> > > --- a/src/glsl/linker.cpp
> > > +++ b/src/glsl/linker.cpp
> > > @@ -2852,14 +2852,18 @@ build_program_resource_list(struct 
> > > gl_context *ctx,
> > >}
> > > }
> > > 
> > > -  if (!add_program_resource(shProg, GL_UNIFORM,
> > > +  bool is_shader_storage =  shProg
> > > ->UniformStorage[i].is_shader_storage;
> > > +  GLenum type = is_shader_storage ? GL_BUFFER_VARIABLE : 
> > > GL_UNIFORM;
> > > +  if (!add_program_resource(shProg, type,
> > >   &shProg->UniformStorage[i], 
> > > stageref))
> > >return;
> > >  }
> > > 
> > > -   /* Add program uniform blocks. */
> > > +   /* Add program uniform blocks and shader storage blocks. */
> > >  for (unsigned i = 0; i < shProg->NumUniformBlocks; i++) {
> > > -  if (!add_program_resource(shProg, GL_UNIFORM_BLOCK,
> > > +  bool is_shader_storage = shProg
> > > ->UniformBlocks[i].IsShaderStorage;
> > > +  GLenum type = is_shader_storage ? GL_SHADER_STORAGE_BLOCK 
> > > : 
> > > GL_UNIFORM_BLOCK;
> > > +  if (!add_program_resource(shProg, type,
> > > &shProg->UniformBlocks[i], 0))
> > >return;
> > >  }
> > > diff --git a/src/mesa/main/program_resource.c 
> > > b/src/mesa/main/program_resource.c
> > > index d857b84..0444e3b 100644
> > > --- a/src/mesa/main/program_resource.c
> > > +++ b/src/mesa/main/program_resource.c
> > > @@ -40,6 +40,8 @@ supported_interface_enum

Re: [Mesa-dev] [PATCH 13/18] radeonsi: completely rework updating descriptors without CP DMA

2015-07-31 Thread Michel Dänzer
On 28.07.2015 19:05, Marek Olšák wrote:
> From: Marek Olšák 
> 
> The patch has a better explanation. Just a summary here:
> - The CPU always uploads a whole descriptor array to previously-unused memory.
> - CP DMA isn't used.
> - No caches need to be flushed.
> - All descriptors are always up-to-date in memory even after a hang, because
>   CP DMA doesn't serve as a middle man to update them.
> 
> This should bring:
> - better hang recovery (descriptors are always up-to-date)
> - better GPU performance (no KCACHE and TC flushes)
> - worse CPU performance for partial updates (only whole arrays are uploaded)
> - less used IB space (no CP_DMA and WRITE_DATA packets)
> - simpler code
> - hopefully, some of the corruption issues with SI cards will go away.
>   If not, we'll know the issue is not here.

[...]

> @@ -24,14 +24,23 @@
>   *  Marek Olšák 
>   */
>  
> -/* Resource binding slots and sampler states (each described with 8 or 4 
> dwords)
> - * live in memory on SI.
> +/* Resource binding slots and sampler states (each described with 8 or
> + * 4 dwords) are stored in lists in memory which is accessed by shaders
> + * using scalar load instructions.

I'd call them arrays instead of lists, but either way:

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[Mesa-dev] [PATCH] r600, compute: force tiling on 2D and 3D texture compute resources

2015-07-31 Thread Zoltan Gilian
To circumvent a problem occurring when LINEAR_ALIGNED array mode is
selected on a TEXTURE_2D RAT.
This configuration causes MEM_RAT STORE_TYPED to write to incorrect
locations.
---
 src/gallium/drivers/radeon/r600_texture.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 8169e05..d79f50a 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -706,6 +706,7 @@ static unsigned r600_choose_tiling(struct 
r600_common_screen *rscreen,
   const struct pipe_resource *templ)
 {
const struct util_format_description *desc = 
util_format_description(templ->format);
+   bool force_tiling = templ->flags & R600_RESOURCE_FLAG_FORCE_TILING;
 
/* MSAA resources must be 2D tiled. */
if (templ->nr_samples > 1)
@@ -715,10 +716,16 @@ static unsigned r600_choose_tiling(struct 
r600_common_screen *rscreen,
if (templ->flags & R600_RESOURCE_FLAG_TRANSFER)
return RADEON_SURF_MODE_LINEAR_ALIGNED;
 
+   /* r600g: force tiling on TEXTURE_2D and TEXTURE_3D compute resources. 
*/
+   if (rscreen->chip_class >= R600 && rscreen->chip_class <= CAYMAN &&
+   (templ->bind & PIPE_BIND_COMPUTE_RESOURCE) &&
+   (templ->target == PIPE_TEXTURE_2D ||
+templ->target == PIPE_TEXTURE_3D))
+   force_tiling = true;
+
/* Handle common candidates for the linear mode.
 * Compressed textures must always be tiled. */
-   if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
-   !util_format_is_compressed(templ->format)) {
+   if (!force_tiling && !util_format_is_compressed(templ->format)) {
/* Not everything can be linear, so we cannot enforce it
 * for all textures. */
if ((rscreen->debug_flags & DBG_NO_TILING) &&
-- 
2.4.6
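
The condition added by this patch can be read as a small predicate. The
sketch below is standalone and uses made-up enums and a made-up bind
flag instead of the real gallium definitions; only the shape of the
condition is taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in enums; only the relative order of the chip classes matters. */
enum sketch_chip_class {
   SKETCH_R600 = 1, SKETCH_R700, SKETCH_EVERGREEN, SKETCH_CAYMAN, SKETCH_SI
};
enum sketch_target { SKETCH_TEXTURE_1D, SKETCH_TEXTURE_2D, SKETCH_TEXTURE_3D };

#define SKETCH_BIND_COMPUTE_RESOURCE (1u << 0)   /* hypothetical bit value */

/* Returns true when tiling must be forced: either the resource asked
 * for it explicitly, or it is a 2D/3D compute resource on an r600g
 * chip (R600 up to and including CAYMAN). */
static bool
sketch_force_tiling(enum sketch_chip_class chip_class, unsigned bind,
                    enum sketch_target target, bool flag_force_tiling)
{
   assert(chip_class >= SKETCH_R600 && chip_class <= SKETCH_SI);

   if (flag_force_tiling)
      return true;

   return chip_class <= SKETCH_CAYMAN &&
          (bind & SKETCH_BIND_COMPUTE_RESOURCE) != 0 &&
          (target == SKETCH_TEXTURE_2D || target == SKETCH_TEXTURE_3D);
}
```

Chips newer than CAYMAN, non-compute resources and 1D textures keep the
previous behavior unless the force-tiling flag was already set.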



[Mesa-dev] [PATCH 2/3] egl/x11: set EGL_BAD_NATIVE_(PIXMAP|WINDOW) for invalid pixmaps/windows

2015-07-31 Thread Frank Binns
Both eglCreatePixmapSurface and eglCreateWindowSurface were incorrectly
setting the EGL error to EGL_BAD_ALLOC when an invalid native drawable
handle was passed in. The EGL spec states the following for
eglCreatePixmapSurface:

"If pixmap is not a valid native pixmap handle, then an
 EGL_BAD_NATIVE_PIXMAP error should be generated."

(eglCreateWindowSurface has similar text)

Correctly set the EGL error value based on xcb_get_geometry_reply returning
an error structure.

Signed-off-by: Frank Binns 
---
 src/egl/drivers/dri2/platform_x11.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_x11.c 
b/src/egl/drivers/dri2/platform_x11.c
index eb8d185..d35e9e2 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -265,10 +265,16 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay 
*disp, EGLint type,
if (type != EGL_PBUFFER_BIT) {
   cookie = xcb_get_geometry (dri2_dpy->conn, dri2_surf->drawable);
   reply = xcb_get_geometry_reply (dri2_dpy->conn, cookie, &error);
-  if (reply == NULL || error != NULL) {
-_eglError(EGL_BAD_ALLOC, "xcb_get_geometry");
-free(error);
-goto cleanup_dri_drawable;
+  if (error != NULL) {
+ if (type == EGL_WINDOW_BIT)
+_eglError(EGL_BAD_NATIVE_WINDOW, "xcb_get_geometry");
+ else
+_eglError(EGL_BAD_NATIVE_PIXMAP, "xcb_get_geometry");
+ free(error);
+ goto cleanup_dri_drawable;
+  } else if (reply == NULL) {
+ _eglError(EGL_BAD_ALLOC, "xcb_get_geometry");
+ goto cleanup_dri_drawable;
   }
 
   dri2_surf->base.Width = reply->width;
-- 
1.8.5.4.gfd2
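
The error selection in this patch can be summarized with a small
standalone sketch. The enums and the function are hypothetical stand-ins
(no EGL or xcb headers are used); the two booleans model whether
xcb_get_geometry_reply filled in an error structure and whether it
returned a reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the surface types and EGL error codes involved. */
enum sketch_surface_type { SKETCH_WINDOW, SKETCH_PIXMAP, SKETCH_PBUFFER };
enum sketch_egl_error {
   SKETCH_SUCCESS,
   SKETCH_BAD_ALLOC,
   SKETCH_BAD_NATIVE_PIXMAP,
   SKETCH_BAD_NATIVE_WINDOW
};

/* Maps the outcome of the geometry query to the error to report. */
static enum sketch_egl_error
sketch_geometry_error(enum sketch_surface_type type,
                      bool have_error, bool have_reply)
{
   /* The geometry query is only issued for windows and pixmaps. */
   assert(type != SKETCH_PBUFFER);

   if (have_error)
      return type == SKETCH_WINDOW ? SKETCH_BAD_NATIVE_WINDOW
                                   : SKETCH_BAD_NATIVE_PIXMAP;
   if (!have_reply)
      return SKETCH_BAD_ALLOC;

   return SKETCH_SUCCESS;
}
```

An X error is treated as an invalid native handle, while a missing reply
without an error keeps the old EGL_BAD_ALLOC behavior.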



[Mesa-dev] [PATCH 3/3] egl/x11: don't abort when creating a DRI2 drawable fails

2015-07-31 Thread Frank Binns
When calling either eglCreateWindowSurface or eglCreatePixmapSurface, an
application could be aborted if creating the DRI2 drawable on the server
failed. This could happen, for example, when the application passed in an
invalid native drawable handle. Use the checked variant of the request and
set an EGL error instead.

Signed-off-by: Frank Binns 
---
 src/egl/drivers/dri2/platform_x11.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/platform_x11.c 
b/src/egl/drivers/dri2/platform_x11.c
index d35e9e2..830e643 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -284,7 +284,23 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay 
*disp, EGLint type,
}
 
if (dri2_dpy->dri2) {
-  xcb_dri2_create_drawable (dri2_dpy->conn, dri2_surf->drawable);
+  xcb_void_cookie_t cookie;
+
+  cookie = xcb_dri2_create_drawable_checked(dri2_dpy->conn,
+dri2_surf->drawable);
+  error = xcb_request_check(dri2_dpy->conn, cookie);
+  if (error != NULL) {
+ if (error->error_code == BadAlloc || type == EGL_PBUFFER_BIT)
+_eglError(EGL_BAD_ALLOC, "xcb_dri2_create_drawable_checked");
+ else if (type == EGL_WINDOW_BIT)
+_eglError(EGL_BAD_NATIVE_WINDOW,
+  "xcb_dri2_create_drawable_checked");
+ else
+_eglError(EGL_BAD_NATIVE_PIXMAP,
+  "xcb_dri2_create_drawable_checked");
+ free(error);
+ goto cleanup_dri_drawable;
+  }
} else {
   if (type == EGL_PBUFFER_BIT) {
  dri2_surf->depth = _eglGetConfigKey(conf, EGL_BUFFER_SIZE);
-- 
1.8.5.4.gfd2
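
A standalone sketch of how the checked request's result is turned into
an EGL error. Everything here is a stand-in: the enums, the struct and
the function are hypothetical, and the only external fact relied on is
that BadAlloc is error code 11 in the core X11 protocol.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_X_BAD_ALLOC 11   /* BadAlloc in the core X11 protocol */

enum sketch_dri2_type {
   SKETCH_DRI2_WINDOW, SKETCH_DRI2_PIXMAP, SKETCH_DRI2_PBUFFER
};
enum sketch_dri2_error {
   SKETCH_DRI2_OK,
   SKETCH_DRI2_BAD_ALLOC,
   SKETCH_DRI2_BAD_NATIVE_WINDOW,
   SKETCH_DRI2_BAD_NATIVE_PIXMAP
};

/* Minimal stand-in for the X error structure; NULL means no error. */
struct sketch_x_error {
   unsigned char error_code;
};

static enum sketch_dri2_error
sketch_create_drawable_error(enum sketch_dri2_type type,
                             const struct sketch_x_error *error)
{
   assert(type == SKETCH_DRI2_WINDOW || type == SKETCH_DRI2_PIXMAP ||
          type == SKETCH_DRI2_PBUFFER);

   if (error == NULL)
      return SKETCH_DRI2_OK;

   /* Allocation failures and pbuffers never blame the native handle. */
   if (error->error_code == SKETCH_X_BAD_ALLOC || type == SKETCH_DRI2_PBUFFER)
      return SKETCH_DRI2_BAD_ALLOC;

   return type == SKETCH_DRI2_WINDOW ? SKETCH_DRI2_BAD_NATIVE_WINDOW
                                     : SKETCH_DRI2_BAD_NATIVE_PIXMAP;
}
```

Any non-allocation error on a window or pixmap is reported as a bad
native handle, and the caller returns an error instead of being aborted.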



[Mesa-dev] [PATCH 1/3] egl/x11: fix use of EGL_BAD_NATIVE_WINDOW

2015-07-31 Thread Frank Binns
Commit 4ed23fd590 introduced some calls to _eglError that inappropriately
pass it EGL_BAD_NATIVE_WINDOW. This was actually harmless in two of the
cases, as _eglError gets called later on with a more appropriate error
code, but (just to be safe) switch these to _eglLog calls instead. In the
remaining case, change the error to EGL_BAD_ALLOC, as this seems
more appropriate.

Signed-off-by: Frank Binns 
---
 src/egl/drivers/dri2/platform_x11.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_x11.c 
b/src/egl/drivers/dri2/platform_x11.c
index fecd36b..eb8d185 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -226,7 +226,7 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay *disp, 
EGLint type,
   s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
   screen = get_xcb_screen(s, dri2_dpy->screen);
   if (!screen) {
- _eglError(EGL_BAD_NATIVE_WINDOW, "dri2_create_surface");
+ _eglError(EGL_BAD_ALLOC, "failed to get xcb screen");
  goto cleanup_surf;
   }
 
@@ -544,7 +544,7 @@ dri2_x11_connect(struct dri2_egl_display *dri2_dpy)
s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
screen = get_xcb_screen(s, dri2_dpy->screen);
if (!screen) {
-  _eglError(EGL_BAD_NATIVE_WINDOW, "dri2_x11_connect");
+  _eglLog(_EGL_WARNING, "DRI2: failed to get xcb screen");
   return EGL_FALSE;
}
connect_cookie = xcb_dri2_connect_unchecked(dri2_dpy->conn, screen->root,
@@ -615,7 +615,7 @@ dri2_x11_authenticate(_EGLDisplay *disp, uint32_t id)
 
screen = get_xcb_screen(s, dri2_dpy->screen);
if (!screen) {
-  _eglError(EGL_BAD_NATIVE_WINDOW, "dri2_x11_authenticate");
+  _eglLog(_EGL_WARNING, "DRI2: failed to get xcb screen");
   return -1;
}
 
-- 
1.8.5.4.gfd2



Re: [Mesa-dev] [PATCH 16/18] radeonsi: fix a regression since the resource_copy_region cleanup

2015-07-31 Thread Michel Dänzer
On 28.07.2015 19:05, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Broken since:
> 46b2b3b - radeonsi: don't change pipe_resource in resource_copy_region
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91444
> ---
>  src/gallium/drivers/radeonsi/si_state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_state.c 
> b/src/gallium/drivers/radeonsi/si_state.c
> index ab5c3ca..6c7170d 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -2511,7 +2511,7 @@ si_create_sampler_view_custom(struct pipe_context *ctx,
> S_008F1C_LAST_LEVEL(texture->nr_samples > 1 ?
> 
> util_logbase2(texture->nr_samples) :
> last_level) |
> -   S_008F1C_TILING_INDEX(si_tile_mode_index(tmp, 0, 
> false)) |
> +   S_008F1C_TILING_INDEX(si_tile_mode_index(tmp, 
> base_level, false)) |
> S_008F1C_POW2_PAD(texture->last_level > 0) |
> S_008F1C_TYPE(si_tex_dim(texture->target, 
> texture->nr_samples)));
>   view->state[4] = (S_008F20_DEPTH(depth - 1) | S_008F20_PITCH(pitch - 
> 1));
> 

Reviewed-and-Tested-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH 14/18] radeonsi: move CP DMA functions to their own file

2015-07-31 Thread Michel Dänzer
On 28.07.2015 19:05, Marek Olšák wrote:
> From: Marek Olšák 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH 17/18] radeonsi: early exist in si_clear if there's nothing to do

2015-07-31 Thread Michel Dänzer
On 28.07.2015 19:05, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/gallium/drivers/radeonsi/si_blit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
> b/src/gallium/drivers/radeonsi/si_blit.c
> index c3591a7..c892623 100644
> --- a/src/gallium/drivers/radeonsi/si_blit.c
> +++ b/src/gallium/drivers/radeonsi/si_blit.c
> @@ -342,6 +342,8 @@ static void si_clear(struct pipe_context *ctx, unsigned 
> buffers,
>   if (buffers & PIPE_CLEAR_COLOR) {
>   evergreen_do_fast_color_clear(&sctx->b, fb, 
> &sctx->framebuffer.atom,
> &buffers, color);
> + if (!buffers)
> + return; /* all buffers have been fast cleared */
>   }
>  
>   if (buffers & PIPE_CLEAR_COLOR) {
> 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH 18/18] r600g: early exist in r600_clear if there's nothing to do

2015-07-31 Thread Michel Dänzer
On 28.07.2015 19:05, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/gallium/drivers/r600/r600_blit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/gallium/drivers/r600/r600_blit.c 
> b/src/gallium/drivers/r600/r600_blit.c
> index 8e553a8..1c59230 100644
> --- a/src/gallium/drivers/r600/r600_blit.c
> +++ b/src/gallium/drivers/r600/r600_blit.c
> @@ -401,6 +401,8 @@ static void r600_clear(struct pipe_context *ctx, unsigned 
> buffers,
>   rctx->framebuffer.nr_samples > 1) {
>   evergreen_do_fast_color_clear(&rctx->b, fb, 
> &rctx->framebuffer.atom,
> &buffers, color);
> + if (!buffers)
> + return; /* all buffers have been fast cleared */
>   }
>  
>   if (buffers & PIPE_CLEAR_COLOR) {
> 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] r600, compute: force tiling on 2D and 3D texture compute resources

2015-07-31 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, Jul 31, 2015 at 11:59 AM, Zoltan Gilian  wrote:
> To circumvent a problem occurring when LINEAR_ALIGNED array mode is
> selected on a TEXTURE_2D RAT.
> This configuration causes MEM_RAT STORE_TYPED to write to incorrect
> locations.
> ---
>  src/gallium/drivers/radeon/r600_texture.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_texture.c 
> b/src/gallium/drivers/radeon/r600_texture.c
> index 8169e05..d79f50a 100644
> --- a/src/gallium/drivers/radeon/r600_texture.c
> +++ b/src/gallium/drivers/radeon/r600_texture.c
> @@ -706,6 +706,7 @@ static unsigned r600_choose_tiling(struct 
> r600_common_screen *rscreen,
>const struct pipe_resource *templ)
>  {
> const struct util_format_description *desc = 
> util_format_description(templ->format);
> +   bool force_tiling = templ->flags & R600_RESOURCE_FLAG_FORCE_TILING;
>
> /* MSAA resources must be 2D tiled. */
> if (templ->nr_samples > 1)
> @@ -715,10 +716,16 @@ static unsigned r600_choose_tiling(struct 
> r600_common_screen *rscreen,
> if (templ->flags & R600_RESOURCE_FLAG_TRANSFER)
> return RADEON_SURF_MODE_LINEAR_ALIGNED;
>
> +   /* r600g: force tiling on TEXTURE_2D and TEXTURE_3D compute 
> resources. */
> +   if (rscreen->chip_class >= R600 && rscreen->chip_class <= CAYMAN &&
> +   (templ->bind & PIPE_BIND_COMPUTE_RESOURCE) &&
> +   (templ->target == PIPE_TEXTURE_2D ||
> +templ->target == PIPE_TEXTURE_3D))
> +   force_tiling = true;
> +
> /* Handle common candidates for the linear mode.
>  * Compressed textures must always be tiled. */
> -   if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
> -   !util_format_is_compressed(templ->format)) {
> +   if (!force_tiling && !util_format_is_compressed(templ->format)) {
> /* Not everything can be linear, so we cannot enforce it
>  * for all textures. */
> if ((rscreen->debug_flags & DBG_NO_TILING) &&
> --
> 2.4.6
>


Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-31 Thread Francisco Jerez
Iago Toral  writes:

> On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > When we have code such as this:
>> >
>> > mov vgrf1.0.x:F, vgrf2.:F
>> > mov vgrf3.0.x:F, vgrf1.:F
>> > ...
>> > mov vgrf3.0.x:F, vgrf1.:F
>> >
>> > And vgrf1 is chosen for spilling, we can emit this:
>> >
>> > mov vgrf1.0.x:F, vgrf2.:F
>> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
>> > mov vgrf3.0.x:F, vgrf1.:F
>> > ...
>> > gen4_scratch_read vgrf4.0.x:F, 22D
>> > mov vgrf3.0.x:F, vgrf4.:F
>> >
>> > Instead of this:
>> >
>> > mov vgrf1.0.x:F, vgrf2.:F
>> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
>> > gen4_scratch_read vgrf4.0.x:F, 22D
>> > mov vgrf3.0.x:F, vgrf4.:F
>> > ...
>> > gen4_scratch_read vgrf5.0.x:F, 22D
>> > mov vgrf3.0.x:F, vgrf5.:F
>> >
>> > And save one scratch read while still preserving the benefits of
>> > spilling the register.
>> >
>> > In general, we avoid emitting scratch reads for as long as the next
>> > instruction keeps reading the spilled register. This should not harm the
>> > benefit of spilling the register because gains for register allocation
>> > only come when we have chunks of program code where the register is alive
>> > but not really used (because these are the points where we could
>> > effectively use that register for another purpose if we spilled it), so
>> > as long as consecutive instructions use that register we can avoid the
>> > scratch reads without losing anything.
>> > ---
>> >  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 
>> > +-
>> >  1 file changed, 36 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> > index cff5406..fd56dae 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>> > unsigned int spill_offset = last_scratch++;
>> >  
>> > /* Generate spill/unspill instructions for the objects being spilled. 
>> > */
>> > +   vec4_instruction *spill_write_inst = NULL;
>> > foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
>> > +  /* We don't spill registers used for scratch */
>> > +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
>> > +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>> > + continue;
>> > +
>> >int scratch_reg = -1;
>> > +  bool spill_reg_was_read = false;
>> >for (unsigned int i = 0; i < 3; i++) {
>> >   if (inst->src[i].file == GRF && inst->src[i].reg == 
>> > spill_reg_nr) {
>> > -if (scratch_reg == -1) {
>> > +if (!spill_reg_was_read) {
>> > +   spill_reg_was_read = (!inst->predicate ||
>> > + inst->opcode == BRW_OPCODE_SEL);
>> > +}
>> > +
>> > +/* If we are reading the spilled register right after writing
>> > + * to it we can skip the scratch read and use directly the
>> > + * register we used as source for the scratch write. For this
>> > + * to work we must check that:
>> > + *
>> > + * 1) The write is unconditional, that is, it is not predicated or
>> > + *    it is a SEL.
>> > + * 2) All the channels that we read have been written in that
>> > + *last write instruction.
>> > + *
>> > + * We keep doing this for as long as the next instruction
>> > + * keeps reading the spilled register and break as soon as we
>> > + * find an instruction that doesn't.
>> > + */
>> > +if (spill_write_inst &&
>> > +(!spill_write_inst->predicate ||
>> > + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
>> > +((brw_mask_for_swizzle(inst->src[i].swizzle) &
>> > + ~spill_write_inst->dst.writemask) == 0)) {
>> > +   scratch_reg = spill_write_inst->dst.reg;
>> > +} else if (scratch_reg == -1) {
>> 
>> One suggestion: You could factor out the rather complex caching logic
>> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
>> vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
>> would simply compare scratch_reg with the sources of the current
>> instruction (up to src) and the sources and destination of the previous
>> non-scratch_read/write instruction.  If there's a match it would check
>> that the regioning is compatible with the i-th source and return true in
>> that case.  This would have several benefits:
>
> I think this might need to be a bit more complex. The previous inst's
> src[i] might read only a subset of the channels that were loaded into
> scratch_reg so comparing only against t

Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-31 Thread Francisco Jerez
Iago Toral  writes:

> On Thu, 2015-07-30 at 17:14 +0300, Francisco Jerez wrote:
>> Francisco Jerez  writes:
>> 
>> > Iago Toral Quiroga  writes:
>> >
>> >> When we have code such as this:
>> >>
>> >> mov vgrf1.0.x:F, vgrf2.:F
>> >> mov vgrf3.0.x:F, vgrf1.:F
>> >> ...
>> >> mov vgrf3.0.x:F, vgrf1.:F
>> >>
>> >> And vgrf1 is chosen for spilling, we can emit this:
>> >>
>> >> mov vgrf1.0.x:F, vgrf2.:F
>> >> gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
>> >> mov vgrf3.0.x:F, vgrf1.:F
>> >> ...
>> >> gen4_scratch_read vgrf4.0.x:F, 22D
>> >> mov vgrf3.0.x:F, vgrf4.:F
>> >>
>> >> Instead of this:
>> >>
>> >> mov vgrf1.0.x:F, vgrf2.:F
>> >> gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
>> >> gen4_scratch_read vgrf4.0.x:F, 22D
>> >> mov vgrf3.0.x:F, vgrf4.:F
>> >> ...
>> >> gen4_scratch_read vgrf5.0.x:F, 22D
>> >> mov vgrf3.0.x:F, vgrf5.:F
>> >>
>> >> And save one scratch read while still preserving the benefits of
>> >> spilling the register.
>> >>
>> >> In general, we avoid emitting scratch reads for as long as the next 
>> >> instruction
>> >> keeps reading the spilled register. This should not harm the benefit of
>> >> spilling the register because gains for register allocation only come 
>> >> when we
>> >> have chunks of program code where the register is alive but not really 
>> >> used
>> >> (because these are the points where we could effectively use that 
>> >> register for
>> >> another purpose if we spilled it), so as long as consecutive instructions 
>> >> use
>> >> that register we can avoid the scratch reads without losing anything.
>> >> ---
>> >>  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 
>> >> +-
>> >>  1 file changed, 36 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
>> >> b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> >> index cff5406..fd56dae 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> >> @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>> >> unsigned int spill_offset = last_scratch++;
>> >>  
>> >> /* Generate spill/unspill instructions for the objects being spilled. 
>> >> */
>> >> +   vec4_instruction *spill_write_inst = NULL;
>> >> foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
>> >> +  /* We don't spill registers used for scratch */
>> >> +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
>> >> +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>> >> + continue;
>> >> +
>> >>int scratch_reg = -1;
>> >> +  bool spill_reg_was_read = false;
>> >>for (unsigned int i = 0; i < 3; i++) {
>> >>   if (inst->src[i].file == GRF && inst->src[i].reg == 
>> >> spill_reg_nr) {
>> >> -if (scratch_reg == -1) {
>> >> +if (!spill_reg_was_read) {
>> >> +   spill_reg_was_read = (!inst->predicate ||
>> >> + inst->opcode == BRW_OPCODE_SEL);
>> >> +}
>> >> +
>> >> +/* If we are reading the spilled register right after writing
>> >> + * to it we can skip the scratch read and use directly the
>> >> + * register we used as source for the scratch write. For this
>> >> + * to work we must check that:
>> >> + *
>> >> + * 1) The write is unconditional, that is, it is not predicated or
>> >> + *    it is a SEL.
>> >> + * 2) All the channels that we read have been written in that
>> >> + *last write instruction.
>> >> + *
>> >> + * We keep doing this for as long as the next instruction
>> >> + * keeps reading the spilled register and break as soon as we
>> >> + * find an instruction that doesn't.
>> >> + */
>> >> +if (spill_write_inst &&
>> >> +(!spill_write_inst->predicate ||
>> >> + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
>> >> +((brw_mask_for_swizzle(inst->src[i].swizzle) &
>> >> + ~spill_write_inst->dst.writemask) == 0)) {
>> >> +   scratch_reg = spill_write_inst->dst.reg;
>> >> +} else if (scratch_reg == -1) {
>> >
>> > One suggestion: You could factor out the rather complex caching logic
>> > into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
>> > vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
>> > would simply compare scratch_reg with the sources of the current
>> > instruction (up to src) and the sources and destination of the previous
>> 
>> Huh, of course "up to i" is what I intended to write.
>
> Up to i? Shouldn't we just check source i only? The way I understand
> this is that we would have a loop for each source in the current
> instruction and call this function f

Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-31 Thread Francisco Jerez
Francisco Jerez  writes:

> Iago Toral  writes:
>
>> On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote:
>>> Iago Toral Quiroga  writes:
>>> 
>>> > When we have code such as this:
>>> >
>>> > mov vgrf1.0.x:F, vgrf2.:F
>>> > mov vgrf3.0.x:F, vgrf1.:F
>>> > ...
>>> > mov vgrf3.0.x:F, vgrf1.:F
>>> >
>>> > And vgrf1 is chosen for spilling, we can emit this:
>>> >
>>> > mov vgrf1.0.x:F, vgrf2.:F
>>> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
>>> > mov vgrf3.0.x:F, vgrf1.:F
>>> > ...
>>> > gen4_scratch_read vgrf4.0.x:F, 22D
>>> > mov vgrf3.0.x:F, vgrf4.:F
>>> >
>>> > Instead of this:
>>> >
>>> > mov vgrf1.0.x:F, vgrf2.:F
>>> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
>>> > gen4_scratch_read vgrf4.0.x:F, 22D
>>> > mov vgrf3.0.x:F, vgrf4.:F
>>> > ...
>>> > gen4_scratch_read vgrf5.0.x:F, 22D
>>> > mov vgrf3.0.x:F, vgrf5.:F
>>> >
>>> > And save one scratch read while still preserving the benefits of
>>> > spilling the register.
>>> >
>>> > In general, we avoid emitting scratch reads for as long as the next 
>>> > instruction
>>> > keeps reading the spilled register. This should not harm the benefit of
>>> > spilling the register because gains for register allocation only come 
>>> > when we
>>> > have chunks of program code where the register is alive but not really 
>>> > used
>>> > (because these are the points where we could effectively use that 
>>> > register for
>>> > another purpose if we spilled it), so as long as consecutive instructions 
>>> > use
>>> > that register we can avoid the scratch reads without losing anything.
>>> > ---
>>> >  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 
>>> > +-
>>> >  1 file changed, 36 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
>>> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>>> > index cff5406..fd56dae 100644
>>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>>> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>>> > unsigned int spill_offset = last_scratch++;
>>> >  
>>> > /* Generate spill/unspill instructions for the objects being spilled. 
>>> > */
>>> > +   vec4_instruction *spill_write_inst = NULL;
>>> > foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
>>> > +  /* We don't spill registers used for scratch */
>>> > +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
>>> > +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>>> > + continue;
>>> > +
>>> >int scratch_reg = -1;
>>> > +  bool spill_reg_was_read = false;
>>> >for (unsigned int i = 0; i < 3; i++) {
>>> >   if (inst->src[i].file == GRF && inst->src[i].reg == 
>>> > spill_reg_nr) {
>>> > -if (scratch_reg == -1) {
>>> > +if (!spill_reg_was_read) {
>>> > +   spill_reg_was_read = (!inst->predicate ||
>>> > + inst->opcode == BRW_OPCODE_SEL);
>>> > +}
>>> > +
>>> > +/* If we are reading the spilled register right after writing
>>> > + * to it we can skip the scratch read and use directly the
>>> > + * register we used as source for the scratch write. For this
>>> > + * to work we must check that:
>>> > + *
>>> > + * 1) The write is unconditional, that is, it is not predicated or
>>> > + *    it is a SEL.
>>> > + * 2) All the channels that we read have been written in that
>>> > + *last write instruction.
>>> > + *
>>> > + * We keep doing this for as long as the next instruction
>>> > + * keeps reading the spilled register and break as soon as we
>>> > + * find an instruction that doesn't.
>>> > + */
>>> > +if (spill_write_inst &&
>>> > +(!spill_write_inst->predicate ||
>>> > + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
>>> > +((brw_mask_for_swizzle(inst->src[i].swizzle) &
>>> > + ~spill_write_inst->dst.writemask) == 0)) {
>>> > +   scratch_reg = spill_write_inst->dst.reg;
>>> > +} else if (scratch_reg == -1) {
>>> 
>>> One suggestion: You could factor out the rather complex caching logic
>>> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
>>> vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
>>> would simply compare scratch_reg with the sources of the current
>>> instruction (up to src) and the sources and destination of the previous
>>> non-scratch_read/write instruction.  If there's a match it would check
>>> that the regioning is compatible with the i-th source and return true in
>>> that case.  This would have several benefits:
>>
>> I think this might need to be a bit mor

Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-31 Thread Iago Toral
On Fri, 2015-07-31 at 13:12 +0300, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote:
> >> Iago Toral Quiroga  writes:
> >> 
> >> > When we have code such as this:
> >> >
> >> > mov vgrf1.0.x:F, vgrf2.:F
> >> > mov vgrf3.0.x:F, vgrf1.:F
> >> > ...
> >> > mov vgrf3.0.x:F, vgrf1.:F
> >> >
> >> > And vgrf1 is chosen for spilling, we can emit this:
> >> >
> >> > mov vgrf1.0.x:F, vgrf2.:F
> >> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
> >> > mov vgrf3.0.x:F, vgrf1.:F
> >> > ...
> >> > gen4_scratch_read vgrf4.0.x:F, 22D
> >> > mov vgrf3.0.x:F, vgrf4.:F
> >> >
> >> > Instead of this:
> >> >
> >> > mov vgrf1.0.x:F, vgrf2.:F
> >> > gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
> >> > gen4_scratch_read vgrf4.0.x:F, 22D
> >> > mov vgrf3.0.x:F, vgrf4.:F
> >> > ...
> >> > gen4_scratch_read vgrf5.0.x:F, 22D
> >> > mov vgrf3.0.x:F, vgrf5.:F
> >> >
> >> > And save one scratch read while still preserving the benefits of
> >> > spilling the register.
> >> >
> >> > In general, we avoid emitting scratch reads for as long as the next 
> >> > instruction
> >> > keeps reading the spilled register. This should not harm the benefit of
> >> > spilling the register because gains for register allocation only come 
> >> > when we
> >> > have chunks of program code where the register is alive but not really 
> >> > used
> >> > (because these are the points where we could effectively use that 
> >> > register for
> >> > another purpose if we spilled it), so as long as consecutive 
> >> > instructions use
> >> > that register we can avoid the scratch reads without losing anything.
> >> > ---
> >> >  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 
> >> > +-
> >> >  1 file changed, 36 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
> >> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> > index cff5406..fd56dae 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
> >> > unsigned int spill_offset = last_scratch++;
> >> >  
> >> > /* Generate spill/unspill instructions for the objects being 
> >> > spilled. */
> >> > +   vec4_instruction *spill_write_inst = NULL;
> >> > foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
> >> > +  /* We don't spill registers used for scratch */
> >> > +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
> >> > +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
> >> > + continue;
> >> > +
> >> >int scratch_reg = -1;
> >> > +  bool spill_reg_was_read = false;
> >> >for (unsigned int i = 0; i < 3; i++) {
> >> >   if (inst->src[i].file == GRF && inst->src[i].reg == 
> >> > spill_reg_nr) {
> >> > -if (scratch_reg == -1) {
> >> > +if (!spill_reg_was_read) {
> >> > +   spill_reg_was_read = (!inst->predicate ||
> >> > + inst->opcode == BRW_OPCODE_SEL);
> >> > +}
> >> > +
> >> > +/* If we are reading the spilled register right after 
> >> > writing
> >> > + * to it we can skip the scratch read and use directly the
> >> > + * register we used as source for the scratch write. For 
> >> > this
> >> > + * to work we must check that:
> >> > + *
> >> > + * 1) The write is unconditional, that is, it is not predicated or
> >> > + *    it is a SEL.
> >> > + * 2) All the channels that we read have been written in 
> >> > that
> >> > + *last write instruction.
> >> > + *
> >> > + * We keep doing this for as long as the next instruction
> >> > + * keeps reading the spilled register and break as soon as 
> >> > we
> >> > + * find an instruction that doesn't.
> >> > + */
> >> > +if (spill_write_inst &&
> >> > +(!spill_write_inst->predicate ||
> >> > + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
> >> > +((brw_mask_for_swizzle(inst->src[i].swizzle) &
> >> > + ~spill_write_inst->dst.writemask) == 0)) {
> >> > +   scratch_reg = spill_write_inst->dst.reg;
> >> > +} else if (scratch_reg == -1) {
> >> 
> >> One suggestion: You could factor out the rather complex caching logic
> >> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
> >> vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
> >> would simply compare scratch_reg with the sources of the current
> >> instruction (up to src) and the sources and destination of the previous
> >> non-scratch_read/write instruction.  If there's a match it would check
>
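
For readers following the thread, the reuse condition in the quoted patch
(the previous write is unconditional, and every channel the source reads was
covered by that write's writemask) can be sketched in isolation. This is a
minimal sketch and not the i965 code: SWIZZLE4, channels_read() and
can_reuse_written_reg() are hypothetical names, and the 2-bits-per-channel
swizzle packing is an assumption of the example.

```c
#include <stdbool.h>

/* Pack four 2-bit channel selectors (0 = x, 1 = y, 2 = z, 3 = w).
 * This packing is an assumption made for the sketch. */
#define SWIZZLE4(a, b, c, d) ((a) | ((b) << 2) | ((c) << 4) | ((d) << 6))

/* Return a 4-bit mask of the channels a swizzle reads
 * (bit 0 = x ... bit 3 = w). Hypothetical stand-in for the
 * brw_mask_for_swizzle() call quoted in the patch. */
static unsigned
channels_read(unsigned swizzle)
{
   unsigned mask = 0;
   for (unsigned i = 0; i < 4; i++)
      mask |= 1u << ((swizzle >> (2 * i)) & 3);
   return mask;
}

/* True when a source can use the register that was just written instead
 * of a fresh scratch read:
 *  1) the write was unconditional (not predicated, or a SEL), and
 *  2) no channel read by the source lies outside the write's writemask. */
static bool
can_reuse_written_reg(bool write_predicated, bool write_is_sel,
                      unsigned writemask, unsigned src_swizzle)
{
   if (write_predicated && !write_is_sel)
      return false;
   return (channels_read(src_swizzle) & ~writemask) == 0;
}
```

A source reading .xxxx after a write to .x passes the check, while a source
reading .xyxx after the same write does not, since y was never written.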

Re: [Mesa-dev] [PATCH 1/3] egl/x11: fix use of EGL_BAD_NATIVE_WINDOW

2015-07-31 Thread Emil Velikov
On 31 July 2015 at 11:02, Frank Binns  wrote:
> Commit 4ed23fd590 introduced some calls to _eglError inappropriately
> passing it EGL_BAD_NATIVE_WINDOW. This was actually harmless in two of the
> cases as _eglError gets called later on with a more appropriate error code
> but (just to be safe) switch these to _eglLog calls instead. In the
> remaining case, change the error to EGL_BAD_ALLOC instead as this seems
> more appropriate.
>
Hmm... EGL_BAD_NATIVE_WINDOW might not be the best, but EGL_BAD_ALLOC
doesn't seem much better.

There is no memory allocation happening there, so I'm not sure why you
believe it more appropriate. I'm wondering about EGL_BAD_PARAMETER,
yet it's very unlikely we'll ever get here. Did you see any uses that
trigger this?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Fix regression with SIMD8 VS since b5f1a48e234d47b24df38cb562cffb8941d43795.

2015-07-31 Thread Francisco Jerez
"Lofstedt, Marta"  writes:

> In the hope of speeding up the review and merge of this bug fix,
> I confirm that the patch below fixes the regression I have had from
> "b5f1a48e234d47b24df38cb562cffb8941d43795" on Gen8.
>
Wouldn't you feel like reviewing it yourself? :)

> /Marta
>
>> -Original Message-
>> From: Francisco Jerez [mailto:curroje...@riseup.net]
>> Sent: Thursday, July 30, 2015 2:23 PM
>> To: mesa-dev@lists.freedesktop.org
>> Cc: Lofstedt, Marta
>> Subject: [PATCH] i965/fs: Fix regression with SIMD8 VS since
>> b5f1a48e234d47b24df38cb562cffb8941d43795.
>> 
>> With num_direct_uniforms == 0 there's no space allocated in the param_size
>> array for the one block of direct uniforms -- On the FS stage this would be a
>> harmless no-op because it would simply re-set one of the param_size entries
>> allocated for the sampler units to zero, but on the VS stage it has been
>> reported to cause memory corruption followed by a crash -- Surprising how a
>> full piglit run on Gen8 didn't catch it.
>> 
>> Reported-by: Marta Lofstedt 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 722e4e7..9cb7b0d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -182,7 +182,8 @@ fs_visitor::nir_setup_uniforms(nir_shader *shader)
>> /* We split the uniform register file in half.  The first half is
>>  * entirely direct uniforms.  The second half is indirect.
>>  */
>> -   param_size[0] = num_direct_uniforms;
>> +   if (num_direct_uniforms > 0)
>> +  param_size[0] = num_direct_uniforms;
>> if (shader->num_uniforms > num_direct_uniforms)
>>param_size[num_direct_uniforms] = shader->num_uniforms -
>> num_direct_uniforms;
>> 
>> --
>> 2.4.6
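
The guarded write in the quoted patch can be illustrated outside the driver.
The following is a minimal sketch, assuming a plain unsigned array in place
of the real param_size storage; setup_param_sizes() is a hypothetical helper
and not a function in brw_fs_nir.cpp.

```c
/* Record the sizes of the direct and indirect uniform blocks.
 * param_size[0] is written only when a direct block exists, so a shader
 * with no uniforms at all never touches the array. */
static void
setup_param_sizes(unsigned *param_size,
                  unsigned num_direct_uniforms,
                  unsigned num_uniforms)
{
   if (num_direct_uniforms > 0)
      param_size[0] = num_direct_uniforms;

   if (num_uniforms > num_direct_uniforms)
      param_size[num_direct_uniforms] = num_uniforms - num_direct_uniforms;
}
```

With both counts at zero neither store executes, which is the case the
commit message describes as corrupting memory on the VS stage before the fix.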




[Mesa-dev] [PATCH 2/4] mesa/es3.1: Add driver interface for glMemoryBarrierByRegion

2015-07-31 Thread Marta Lofstedt
From: Marta Lofstedt 

Signed-off-by: Marta Lofstedt 
---
 src/mesa/main/dd.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 87eb63e..4b41141 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -1017,6 +1017,13 @@ struct dd_function_table {
/*@{*/
void (*DispatchCompute)(struct gl_context *ctx, const GLuint *num_groups);
/*@}*/
+
+   /**
+* \name MemoryBarrierByRegion
+*/
+   /*@{*/
+   void (*MemoryBarrierByRegion)(struct gl_context *ctx, GLbitfield barriers);
+   /*@}*/
 };
 
 
-- 
1.9.1



[Mesa-dev] [PATCH 1/4] gles/es3.1: Enable dispatch of glMemoryBarrierByRegion

2015-07-31 Thread Marta Lofstedt
From: Marta Lofstedt 

Signed-off-by: Marta Lofstedt 
---
 src/mapi/glapi/gen/gl_API.xml   | 4 
 src/mesa/main/tests/dispatch_sanity.cpp | 3 +--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 658efa4..3db4349 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -2966,6 +2966,10 @@
 
 
 
+
+
+
+
 
 
 
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp 
b/src/mesa/main/tests/dispatch_sanity.cpp
index af89d2c..14c9eda 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -2461,8 +2461,7 @@ const struct function gles31_functions_possible[] = {
{ "glGetBooleani_v", 31, -1 },
{ "glMemoryBarrier", 31, -1 },
 
-   // FINISHME: This function has not been implemented yet.
-   // { "glMemoryBarrierByRegion", 31, -1 },
+   { "glMemoryBarrierByRegion", 31, -1 },
 
{ "glTexStorage2DMultisample", 31, -1 },
{ "glGetMultisamplefv", 31, -1 },
-- 
1.9.1



[Mesa-dev] [PATCH 4/4] i965/es3.1: Implement glMemoryBarrierByRegion

2015-07-31 Thread Marta Lofstedt
From: Marta Lofstedt 

Signed-off-by: Marta Lofstedt 
---
 src/mesa/drivers/dri/i965/brw_program.c | 34 +
 1 file changed, 34 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 85e271d..332d84e 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -226,6 +226,39 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield 
barriers)
brw_emit_pipe_control_flush(brw, bits);
 }
 
+static void
+brw_memory_barrier_by_region(struct gl_context *ctx, GLbitfield barriers)
+{
+   GLbitfield all_allowed_bits = GL_ATOMIC_COUNTER_BARRIER_BIT |
+  GL_FRAMEBUFFER_BARRIER_BIT |
+  GL_SHADER_IMAGE_ACCESS_BARRIER_BIT |
+  GL_SHADER_STORAGE_BARRIER_BIT |
+  GL_TEXTURE_FETCH_BARRIER_BIT |
+  GL_UNIFORM_BARRIER_BIT;
+   /*
+    * According to OpenGL ES 3.1 spec. April 29, 2015, 7.11.2:
+    * "When barriers is ALL_BARRIER_BITS, shader memory accesses
+    * will be synchronized relative to all these barrier bits,
+    * but not to other barrier bits specific to MemoryBarrier."
+    * I.e., if barriers is the special value GL_ALL_BARRIER_BITS,
+    * then all barriers allowed by glMemoryBarrierByRegion
+    * should be activated.
+    */
+   if (barriers == GL_ALL_BARRIER_BITS)
+  return brw_memory_barrier(ctx, all_allowed_bits);
+
+   /*
+    * If barriers contains a bit that is not allowed
+    * for glMemoryBarrierByRegion, a GL_INVALID_VALUE
+    * error should be generated.
+    */
+   if ((all_allowed_bits | barriers) ^ all_allowed_bits)
+      _mesa_error(ctx, GL_INVALID_VALUE,
+                  "glMemoryBarrierByRegion(unsupported barrier bit)");
+
+   return brw_memory_barrier(ctx, barriers);
+}
+
 void
 brw_add_texrect_params(struct gl_program *prog)
 {
@@ -285,6 +318,7 @@ void brwInitFragProgFuncs( struct dd_function_table 
*functions )
functions->LinkShader = brw_link_shader;
 
functions->MemoryBarrier = brw_memory_barrier;
+   functions->MemoryBarrierByRegion = brw_memory_barrier_by_region;
 }
 
 struct shader_times {
-- 
1.9.1
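
A note on the validation in the patch above: the test
(all_allowed_bits | barriers) ^ all_allowed_bits is equivalent to
barriers & ~all_allowed_bits, and as posted the error path has no return,
so the barrier is still emitted after the error is recorded. Separately, C
does not permit "return expr;" in a void function, although C++ does. A
minimal sketch of the check with an early exit follows. The BARRIER_* values
are stand-ins invented for this example (real code would use the
GL_*_BARRIER_BIT constants), and region_barrier_bits() is a hypothetical
helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for this sketch only. */
enum {
   BARRIER_UNIFORM        = 1u << 0,
   BARRIER_TEXTURE_FETCH  = 1u << 1,
   BARRIER_IMAGE_ACCESS   = 1u << 2,
   BARRIER_FRAMEBUFFER    = 1u << 3,
   BARRIER_ATOMIC_COUNTER = 1u << 4,
   BARRIER_SHADER_STORAGE = 1u << 5,
   BARRIER_COMMAND        = 1u << 6  /* example of a bit not accepted here */
};
#define BARRIER_ALL 0xffffffffu

/* The subset of bits accepted by the by-region variant. */
static const uint32_t region_allowed_bits =
   BARRIER_UNIFORM | BARRIER_TEXTURE_FETCH | BARRIER_IMAGE_ACCESS |
   BARRIER_FRAMEBUFFER | BARRIER_ATOMIC_COUNTER | BARRIER_SHADER_STORAGE;

/* Validate the requested bits. Returns false (leaving *out untouched) when
 * an unsupported bit is set, so the caller can raise the error and return
 * without emitting a barrier. Otherwise stores the bits to forward. */
static bool
region_barrier_bits(uint32_t barriers, uint32_t *out)
{
   if (barriers == BARRIER_ALL) {
      *out = region_allowed_bits;
      return true;
   }

   if (barriers & ~region_allowed_bits)
      return false;

   *out = barriers;
   return true;
}
```

Keeping the validation separate from the flush makes the error path explicit:
the caller reports GL_INVALID_VALUE and returns when the helper fails.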



[Mesa-dev] [PATCH 0/4] Implementation of glMemoryBarrierByRegion

2015-07-31 Thread Marta Lofstedt
This series provides an i965 implementation of glMemoryBarrierByRegion,
a function required by OpenGL ES 3.1.

Marta Lofstedt (4):
  gles/es3.1: Enable dispatch of glMemoryBarrierByRegion
  mesa/es3.1: Add driver interface for glMemoryBarrierByRegion
  mesa/es3.1: Implement the entry point of MemoryBarrierByRegion
  i965/es3.1: Implement glMemoryBarrierByRegion

 src/mapi/glapi/gen/gl_API.xml   |  4 
 src/mesa/drivers/dri/i965/brw_program.c | 34 +
 src/mesa/main/dd.h  |  7 +++
 src/mesa/main/shaderimage.c |  9 +
 src/mesa/main/shaderimage.h |  3 +++
 src/mesa/main/tests/dispatch_sanity.cpp |  3 +--
 6 files changed, 58 insertions(+), 2 deletions(-)

-- 
1.9.1



[Mesa-dev] [PATCH 3/4] mesa/es3.1: Implement the entry point of MemoryBarrierByRegion

2015-07-31 Thread Marta Lofstedt
From: Marta Lofstedt 

Signed-off-by: Marta Lofstedt 
---
 src/mesa/main/shaderimage.c | 9 +
 src/mesa/main/shaderimage.h | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/src/mesa/main/shaderimage.c b/src/mesa/main/shaderimage.c
index a348cdb..be66a6c 100644
--- a/src/mesa/main/shaderimage.c
+++ b/src/mesa/main/shaderimage.c
@@ -653,3 +653,12 @@ _mesa_MemoryBarrier(GLbitfield barriers)
if (ctx->Driver.MemoryBarrier)
   ctx->Driver.MemoryBarrier(ctx, barriers);
 }
+
+void GLAPIENTRY
+_mesa_MemoryBarrierByRegion(GLbitfield barriers)
+{
+   GET_CURRENT_CONTEXT(ctx);
+
+   if (ctx->Driver.MemoryBarrierByRegion)
+  ctx->Driver.MemoryBarrierByRegion(ctx, barriers);
+}
diff --git a/src/mesa/main/shaderimage.h b/src/mesa/main/shaderimage.h
index 33d8a1e..d08ece8 100644
--- a/src/mesa/main/shaderimage.h
+++ b/src/mesa/main/shaderimage.h
@@ -68,6 +68,9 @@ _mesa_BindImageTextures(GLuint first, GLsizei count, const 
GLuint *textures);
 void GLAPIENTRY
 _mesa_MemoryBarrier(GLbitfield barriers);
 
+void GLAPIENTRY
+_mesa_MemoryBarrierByRegion(GLbitfield barriers);
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.9.1
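
The entry point above follows the optional-hook pattern: the driver table
slot may be NULL, in which case the call is silently skipped. A minimal
sketch of that pattern, with a hypothetical struct hooks standing in for
dd_function_table and a void pointer standing in for the GL context:

```c
#include <stddef.h>

/* Hypothetical, reduced driver table with one optional hook. */
struct hooks {
   void (*memory_barrier_by_region)(void *ctx, unsigned barriers);
};

/* Example hook that records the bits it was called with. */
static unsigned last_barriers;

static void
record_barrier(void *ctx, unsigned barriers)
{
   (void) ctx;
   last_barriers = barriers;
}

/* Call the hook if the driver provides one.
 * Returns 1 when the hook ran and 0 when it was absent. */
static int
dispatch_region_barrier(const struct hooks *h, void *ctx, unsigned barriers)
{
   if (!h->memory_barrier_by_region)
      return 0;

   h->memory_barrier_by_region(ctx, barriers);
   return 1;
}
```

A driver that leaves the slot unset gets a no-op, which is why patch 4/4
has to install brw_memory_barrier_by_region for the call to have any effect.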



Re: [Mesa-dev] i965 implementation of the ARB_shader_image_load_store built-ins. (v4)

2015-07-31 Thread Francisco Jerez
Jason Ekstrand  writes:

> Curro,
> What are we still waiting on for the image_load_store extension?  I
> think I've given you R-B's on all but one or two of the compiler
> patches.  Is the state setup stuff reviewed?  Is there anything else
> that needs review?

I've made a list of the patches still pending review that are required
for us to expose ARB_shader_image_load_store -- They're not that many so
I hope we can make it before the feature freeze ;).  From this series:

 - i965/fs: Import code to transform image coordinates into surface coordinates.
 - i965/fs: Implement image load, store and atomic.
 - i965: Teach type_size() about the size of an image uniform.
 - i965: Implement logic to set up and upload an image uniform.

From the state upload series [1]:
 - i965: Implement surface state set-up for shader images.
 - i965: Define and initialize image parameter structure.
 - i965: Reserve enough parameter entries for all image uniforms used in the program.
 - i965: Hook up image state upload.

From a two-patch series of SKL-specific fixes [2]:
 - i965: Fix brw_memory_barrier() for SKL.
 - i965: Add SKL support to brw_miptree_get_horizontal_slice_pitch().

[1] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076392.html
[2] http://lists.freedesktop.org/archives/mesa-dev/2015-May/084141.html

> --Jason
>
> On Thu, Jul 23, 2015 at 6:58 AM, Francisco Jerez  
> wrote:
>> Jason Ekstrand  writes:
>>
>>> *whew*, I've made it through the entire series...
>>>
>> Thanks!
>>
>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez  
>>> wrote:
>>>> Another resend of the i965 compiler-related changes for
>>>> ARB_shader_image_load_store, reworked to make use of the SIMD lowering
>>>> infrastructure introduced in a previous series [1].  For a
>>>> self-contained branch in testable form see [2].
>>>>
>>>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-July/089009.html
>>>> [2] http://cgit.freedesktop.org/~currojerez/mesa/log/?h=image-load-store-lower
>>>>
>>>> [PATCH 01/20] i965/fs: Define logical typed and untyped surface opcodes.
>>>> [PATCH 02/20] i965/fs: Hook up SIMD lowering to unroll surface instructions of unsupported width.
>>>> [PATCH 03/20] i965/fs: Implement lowering of logical surface instructions.
>>>> [PATCH 04/20] i965/fs: Handle zero-size allocations in fs_builder::vgrf().
>>>> [PATCH 05/20] i965/fs: Import surface message builder helper functions.
>>>> [PATCH 06/20] i965/fs: Import image access validity checks.
>>>
>>> The above are all
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>> [PATCH 07/20] i965/fs: Import image memory offset calculation code.
>>>
>>> As I said in the e-mail, this needs a *lot* more comments.  I'll
>>> re-try the review once you've actually explained what it's doing.
>>>
>>>> [PATCH 08/20] i965/fs: Import image format metadata queries.
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>> [PATCH 09/20] i965/fs: Import image format conversion primitives.
>>>
>>> I had some minor concerns here.
>>>
>>>> [PATCH 10/20] i965/fs: Implement image load, store and atomic.
>>>
>>> This looks pretty good.  However, I had some concerns about things
>>> getting out-of-sync with state setup that I'd like to see addressed.
>>>
>>>> [PATCH 11/20] i965/fs: Revisit NIR atomic counter intrinsic translation.
>>>> [PATCH 12/20] i965/fs: Drop unused untyped surface read and atomic emit methods.
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>> [PATCH 13/20] i965: Teach type_size() about the size of an image uniform.
>>>> [PATCH 14/20] i965: Implement logic to set up and upload an image uniform.
>>>
>>> I'm not particularly familiar with how mesa's uniform setup works.  Ken?
>>>
>>>> [PATCH 15/20] i965/fs: Don't overwrite fs_visitor::uniforms and ::param_size during the SIMD16 run.
>>>> [PATCH 16/20] i965/fs: Execute nir_setup_uniforms, _inputs and _outputs unconditionally.
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>> [PATCH 17/20] i965/fs: Handle image uniforms in NIR programs.
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>> [PATCH 18/20] i965/fs: Translate image load, store and atomic NIR intrinsics.
>>>
>>> I had some comments on this one regarding encapsulation that I'd like
>>> to see addressed.
>>>
>>>> [PATCH 19/20] i965/fs: Translate memory barrier NIR intrinsics.
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>> [PATCH 20/20] i965: Expose ARB_shader_image_load_store.
>>>
>>> I'm going to wait until the rest of it is reviewed to add mine to that
>>> little patch. :-)
>>>




Re: [Mesa-dev] [PATCH] i965/fs: Fix regression with SIMD8 VS since b5f1a48e234d47b24df38cb562cffb8941d43795.

2015-07-31 Thread Lofstedt, Marta
Well, I could try, but I don't believe I have enough cred yet to make any
difference.

Reviewed-by: "Lofstedt, Marta" 

> -Original Message-
> From: Francisco Jerez [mailto:curroje...@riseup.net]
> Sent: Friday, July 31, 2015 2:07 PM
> To: Lofstedt, Marta; mesa-dev@lists.freedesktop.org
> Subject: RE: [PATCH] i965/fs: Fix regression with SIMD8 VS since
> b5f1a48e234d47b24df38cb562cffb8941d43795.
> 
> "Lofstedt, Marta"  writes:
> 
> > In the hope of speeding up the review and merge of this bug fix, I
> > confirm that the patch below fixes the regression I have had from
> > "b5f1a48e234d47b24df38cb562cffb8941d43795" on Gen8.
> >
> Wouldn't you feel like reviewing it yourself? :)
> 
> > /Marta
> >
> >> -Original Message-
> >> From: Francisco Jerez [mailto:curroje...@riseup.net]
> >> Sent: Thursday, July 30, 2015 2:23 PM
> >> To: mesa-dev@lists.freedesktop.org
> >> Cc: Lofstedt, Marta
> >> Subject: [PATCH] i965/fs: Fix regression with SIMD8 VS since
> >> b5f1a48e234d47b24df38cb562cffb8941d43795.
> >>
> >> With num_direct_uniforms == 0 there's no space allocated in the
> >> param_size array for the one block of direct uniforms -- On the FS
> >> stage this would be a harmless no-op because it would simply re-set
> >> one of the param_size entries allocated for the sampler units to
> >> zero, but on the VS stage it has been reported to cause memory
> >> corruption followed by a crash -- Surprising how a full piglit run on Gen8
> >> didn't catch it.
> >>
> >> Reported-by: Marta Lofstedt 
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> index 722e4e7..9cb7b0d 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> @@ -182,7 +182,8 @@ fs_visitor::nir_setup_uniforms(nir_shader *shader)
> >>     /* We split the uniform register file in half.  The first half is
> >>      * entirely direct uniforms.  The second half is indirect.
> >>      */
> >> -   param_size[0] = num_direct_uniforms;
> >> +   if (num_direct_uniforms > 0)
> >> +      param_size[0] = num_direct_uniforms;
> >>     if (shader->num_uniforms > num_direct_uniforms)
> >>        param_size[num_direct_uniforms] = shader->num_uniforms - num_direct_uniforms;
> >>
> >> --
> >> 2.4.6
> >
> > --
> > Intel Sweden AB
> > Registered Office: Knarrarnasgatan 15, 164 40 Kista, Stockholm, Sweden
> > Registration Number: 556189-6027
> >
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). Any review or distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.


Re: [Mesa-dev] [PATCH 1/3] egl/x11: fix use of EGL_BAD_NATIVE_WINDOW

2015-07-31 Thread Frank Binns


On 31/07/15 12:53, Emil Velikov wrote:
> On 31 July 2015 at 11:02, Frank Binns  wrote:
>> Commit 4ed23fd590 introduced some calls to _eglError inappropriately
>> passing it EGL_BAD_NATIVE_WINDOW. This was actually harmless in two of the
>> cases as _eglError gets called later on with a more appropriate error code
>> but (just to be safe) switch these to _eglLog calls instead. In the
>> remaining case, change the error to EGL_BAD_ALLOC instead as this seems
>> more appropriate.
>>
> Hmm... EGL_BAD_NATIVE_WINDOW might not be the best, but EGL_BAD_ALLOC
> doesn't seem much better.
>
> There is no memory allocation happening there, so I'm not sure why you
> believe it more appropriate. I'm wondering about EGL_BAD_PARAMETER,
> yet it's very unlikely we'll ever get here. Did you see any uses that
> trigger this?
>
> -Emil
The errors available to us, i.e. the ones mentioned in the EGL spec, are:
- EGL_BAD_MATCH
- EGL_BAD_CONFIG
- EGL_BAD_NATIVE_WINDOW
- EGL_BAD_ALLOC

I believe there are two possible ways we can end up going down this path:
1) xcb_get_setup fails due to a previous error on the connection. The
error that appears most in the xcb code is XCB_CONN_CLOSED_MEM_INSUFFICIENT.

2) The 'rem' field of xcb_screen_iterator_t is 0.

In both cases EGL_BAD_ALLOC seems like the most appropriate error.
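
To make the two paths concrete, here is a minimal self-contained sketch.
It does not use the real xcb or EGL headers: egl_error, fake_screen_iterator
and pick_screen_error are stand-ins made up for illustration, and only the
'rem' field name is taken from xcb_screen_iterator_t.

```cpp
// Stand-in types: NOT the real xcb/EGL declarations.
enum class egl_error { success, bad_alloc };

struct fake_screen_iterator {
   int rem;   // mirrors the 'rem' field of xcb_screen_iterator_t
};

// setup_ok models whether the setup query on the connection succeeded.
egl_error
pick_screen_error(bool setup_ok, const fake_screen_iterator &it)
{
   if (!setup_ok)
      return egl_error::bad_alloc;   // case 1: connection already in error
   if (it.rem == 0)
      return egl_error::bad_alloc;   // case 2: no screen left to iterate over
   return egl_error::success;
}
```

Both failure paths report the same error, which is all the patch changes
for this case.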

I've not actually hit this error path but spotted it when making the
fixes in patches 2 and 3.

Thanks
Frank


Re: [Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.

2015-07-31 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez wrote:
>> Jason Ekstrand  writes:
>>
>>> This all looks correct as far as I can tell.  However, I'm very
>>> concerned about the number of checks such as
>>> has_matching_typed_format() that are built-in to the compiler (via
>>> surface_builder) where we then go on to do something that is highly
>>> dependent on state setup doing the exact same check (not via
>>> surface_builder) and doing the right thing.  Another example would be
>>> the interplay with has_split_bit_layout.  How do we know these won't
>>> get out of sync?
>>>
>> They cannot get out of sync because the policy which surface format to
>> use for a given GLSL format is shared between the compiler and the
>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>> surface state set-up for shader images.".
>
> I came back to this today and looked at it again.  I think my main
> objection is that all of these metadata functions work in terms of a
> devinfo and a format and then immediately call
> brw_lower_mesa_image_format.  Why don't they work instead in terms of
> comparing the lowered format to the nominal format?  Then it would be
> a lot more obvious that you're doing a transformation from one format
> to another.  As is, it's a bunch of magic "what do I have to do next"
> queries 

I don't think any of these functions implies what to do next, they all
encode clearly defined device-specific properties of the formats:

 - Does the device support any format with the same bit layout?
   (has_supported_bit_layout())

 - Does the device support any format with the same bit layout *and*
   encoding? (is_conversion_trivial())

 - Does the device support a typed format of the same size?
   (has_matching_typed_format())

 - Does the device split the pixel data into a number of discontiguous
   segments? (has_split_bit_layout())

 - Does the device return garbage in the unused bits when reading from
   the given format? (has_undefined_high_bits())

 - Do all fields of the format have the same size? (This is not even
   device-specific)

 - Is the format represented as a signed integer in memory? (Ditto)

You may have noticed that except for the last two the natural subject
and object of all questions are "device" and "format" respectively --
I'm not sure I see how rephrasing them to involve the lowered format
(that in fact a number of them don't care about) would make any of these
questions easier to understand.
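
To illustrate the shape of these queries, a toy sketch follows.  It is not
the code from the series: toy_device, toy_format, format_bytes(),
all_fields_same_size() and is_signed_in_memory() are hypothetical stand-ins,
and the rule inside has_matching_typed_format() is invented for the example
rather than taken from any hardware documentation.

```cpp
// Hypothetical, heavily simplified stand-ins for the device info and the
// image format description.
struct toy_device { unsigned max_typed_bytes; };
struct toy_format { unsigned bits[4]; bool is_signed; };

// Total size of one pixel in bytes.
unsigned
format_bytes(const toy_format &f)
{
   unsigned total = 0;
   for (unsigned b : f.bits)
      total += b;
   return total / 8;
}

// Device + format question (made-up rule): is there a typed format of the
// same size?
bool
has_matching_typed_format(const toy_device &dev, const toy_format &f)
{
   return format_bytes(f) <= dev.max_typed_bytes;
}

// Format-only question: do all present fields have the same size?
bool
all_fields_same_size(const toy_format &f)
{
   unsigned first = 0;
   for (unsigned b : f.bits) {
      if (b == 0)
         continue;               // field not present in this format
      if (first == 0)
         first = b;
      else if (b != first)
         return false;
   }
   return true;
}

// Format-only question: is the data stored as a signed integer?
bool
is_signed_in_memory(const toy_format &f)
{
   return f.is_signed;
}
```

Each predicate takes exactly the inputs its question is about, and none of
them needs the lowered format as an argument.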

> which, while in the same namespace, are in a different file.

I happen to have cleaned that up to be in the same file a short while
ago -- They're no longer used by anything else outside of
brw_fs_surface_builder.cpp:
http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200

> It would also be more efficient because you would only call
> brw_lower_mesa_image_format once.

A more elegant way to "fix" that would be to mark
brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it
would make any measurable difference in practice though.
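
A minimal sketch of the first option, assuming GCC or Clang (the attribute
is a compiler extension).  lower_format(), its toy body and the two helper
predicates are made-up stand-ins, not the real brw_lower_mesa_image_format()
or the surface builder helpers.

```cpp
// Declared pure: the result depends only on the arguments (and readable
// global state), and the call has no side effects.
__attribute__((pure)) int lower_format(int gen, int format);

// Toy lowering rule, invented for this example.
int
lower_format(int gen, int format)
{
   return gen >= 9 ? format : (format & ~1);
}

bool
needs_conversion(int gen, int format)
{
   return lower_format(gen, format) != format;
}

bool
is_native(int gen, int format)
{
   return lower_format(gen, format) == format;
}

int
classify(int gen, int format)
{
   // Both helpers call lower_format() with identical arguments.  Because
   // the function is pure, the compiler is allowed to merge those calls
   // once the helpers are inlined; whether it does is up to the optimizer.
   return (needs_conversion(gen, format) ? 1 : 0) +
          (is_native(gen, format) ? 2 : 0);
}
```

The source stays written in terms of the nominal format, and removing the
repeated calls is left to the compiler.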

>
> Does that make more sense?
> --Jason
>
>> Thanks.
>>
>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez wrote:
 v2: Drop VEC4 support.
 v3: Rebase.
 ---
  .../drivers/dri/i965/brw_fs_surface_builder.cpp| 216 
 +
  src/mesa/drivers/dri/i965/brw_fs_surface_builder.h |  17 ++
  2 files changed, 233 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
 index ea1c4aa..46b449f 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
 @@ -587,3 +587,219 @@ namespace {
}
 }
  }
 +
 +namespace brw {
 +   namespace image_access {
 +  /**
 +   * Load a vector from a surface of the given format and 
 dimensionality
 +   * at the given coordinates.
 +   */
 +  fs_reg
 +  emit_image_load(const fs_builder &bld,
 +  const fs_reg &image, const fs_reg &addr,
 +  mesa_format format, unsigned dims)
 +  {
 + using namespace image_format_info;
 + using namespace image_format_conversion;
 + using namespace image_validity;
 + using namespace image_coordinates;
 + using namespace surface_access;
 + const brw_device_info *devinfo = bld.shader->devinfo;
 + const mesa_format lower_format =
 +brw_lower_mesa_image_format(devinfo, format);
 + fs_reg tmp;
 +
 + if (has_matching_typed_format(devinfo, format)) {
 +/* Hopefully we get here most of the time... */
 +tmp = emit_typed_read(bld, image, addr, dims,
 +  

Re: [Mesa-dev] [Mesa-dev, 1/9] mesa/es3.1: Allow binding GL_DRAW_INDIRECT_BUFFER with gles 3.1

2015-07-31 Thread Lofstedt, Marta
Thanks Tapani,

However, for patch 9 there is a V3 for which I can't see any new objections:
http://patchwork.freedesktop.org/patch/51879/

If you have any new ones, could you please clarify?

My interpretation of the comments on patch 6 is that it was OK as it is.
Please clarify if you don't agree.

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Tapani Pälli
> Sent: Friday, July 31, 2015 10:28 AM
> To: Marta Löfstedt; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [Mesa-dev, 1/9] mesa/es3.1: Allow binding
> GL_DRAW_INDIRECT_BUFFER with gles 3.1
> 
> I've gone through this set (and have been using the patches regularly to be
> able to run the GLES 3.1 conformance tests);
> 
> Patches 1,2,3,4,5,7,8
> 
> Reviewed-by: Tapani Pälli 
> 
> I believe with 6 and 9 there are changes required.
> 
> On 05/11/2015 04:03 PM, Marta Löfstedt wrote:
> > From: Marta Lofstedt 
> >
> > Signed-off-by: Marta Lofstedt 
> >
> > ---
> > src/mesa/main/bufferobj.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
> > index 66dee68..07f82cd 100644
> > --- a/src/mesa/main/bufferobj.c
> > +++ b/src/mesa/main/bufferobj.c
> > @@ -91,8 +91,9 @@ get_buffer_target(struct gl_context *ctx, GLenum target)
> >  case GL_COPY_WRITE_BUFFER:
> > return &ctx->CopyWriteBuffer;
> >  case GL_DRAW_INDIRECT_BUFFER:
> > -  if (ctx->API == API_OPENGL_CORE &&
> > -  ctx->Extensions.ARB_draw_indirect) {
> > +  if ((ctx->API == API_OPENGL_CORE &&
> > +   ctx->Extensions.ARB_draw_indirect) ||
> > +   _mesa_is_gles31(ctx)) {
> >return &ctx->DrawIndirectBuffer;
> > }
> > break;
> >


Re: [Mesa-dev] [PATCH] i965/fs: Fix regression with SIMD8 VS since b5f1a48e234d47b24df38cb562cffb8941d43795.

2015-07-31 Thread Francisco Jerez
"Lofstedt, Marta"  writes:

> Well, I could try, but I don't believe I have enough cred yet, to make any 
> difference.
>
> Reviewed-by: "Lofstedt, Marta" 
>
It's a fine R-b, and this seems important enough that we wouldn't want
to delay the fix any further.  I'll push it shortly. :)
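
For anyone skimming the thread, the problem fixed by the quoted patch can
be shown in miniature.  This is only a sketch: setup_param_sizes() and the
use of std::vector are stand-ins made up for the example, not the actual
fs_visitor code.

```cpp
#include <vector>

// param_size holds one entry per uniform slot; it may be empty when the
// shader has no uniforms at all.
void
setup_param_sizes(std::vector<unsigned> &param_size,
                  unsigned num_direct_uniforms, unsigned num_uniforms)
{
   // Without this guard, num_direct_uniforms == 0 would still write
   // slot 0, even when no slot was allocated for the direct block.
   if (num_direct_uniforms > 0)
      param_size.at(0) = num_direct_uniforms;

   // The indirect block, if any, starts right after the direct one.
   if (num_uniforms > num_direct_uniforms)
      param_size.at(num_direct_uniforms) = num_uniforms - num_direct_uniforms;
}
```

The sketch uses at() so that an out-of-range store would throw instead of
silently corrupting memory, which is what the unguarded store in the
original code could do.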

>> -Original Message-
>> From: Francisco Jerez [mailto:curroje...@riseup.net]
>> Sent: Friday, July 31, 2015 2:07 PM
>> To: Lofstedt, Marta; mesa-dev@lists.freedesktop.org
>> Subject: RE: [PATCH] i965/fs: Fix regression with SIMD8 VS since
>> b5f1a48e234d47b24df38cb562cffb8941d43795.
>> 
>> "Lofstedt, Marta"  writes:
>> 
>> > In the hope of speeding up the review and merge of this bug fix, I
>> > confirm that the patch below fixes the regression I have had since
>> > "b5f1a48e234d47b24df38cb562cffb8941d43795" on Gen8.
>> >
>> Wouldn't you feel like reviewing it yourself? :)
>> 
>> > /Marta
>> >
>> >> -Original Message-
>> >> From: Francisco Jerez [mailto:curroje...@riseup.net]
>> >> Sent: Thursday, July 30, 2015 2:23 PM
>> >> To: mesa-dev@lists.freedesktop.org
>> >> Cc: Lofstedt, Marta
>> >> Subject: [PATCH] i965/fs: Fix regression with SIMD8 VS since
>> >> b5f1a48e234d47b24df38cb562cffb8941d43795.
>> >>
>> >> With num_direct_uniforms == 0 there's no space allocated in the
>> >> param_size array for the one block of direct uniforms -- On the FS
>> >> stage this would be a harmless no-op because it would simply re-set
>> >> one of the param_size entries allocated for the sampler units to
>> >> zero, but on the VS stage it has been reported to cause memory
>> >> corruption followed by a crash -- Surprising how a full piglit run on Gen8
>> >> didn't catch it.
>> >>
>> >> Reported-by: Marta Lofstedt 
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++-
>> >>  1 file changed, 2 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> index 722e4e7..9cb7b0d 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> @@ -182,7 +182,8 @@ fs_visitor::nir_setup_uniforms(nir_shader *shader)
>> >>     /* We split the uniform register file in half.  The first half is
>> >>      * entirely direct uniforms.  The second half is indirect.
>> >>      */
>> >> -   param_size[0] = num_direct_uniforms;
>> >> +   if (num_direct_uniforms > 0)
>> >> +      param_size[0] = num_direct_uniforms;
>> >>     if (shader->num_uniforms > num_direct_uniforms)
>> >>        param_size[num_direct_uniforms] = shader->num_uniforms - num_direct_uniforms;
>> >>
>> >> --
>> >> 2.4.6
>> >


Re: [Mesa-dev] [PATCH] clover: handle setKernelArg errors

2015-07-31 Thread Francisco Jerez
Zoltan Gilian  writes:

> ---
>  src/gallium/state_trackers/clover/core/kernel.cpp | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp
> index a23cd2b..820a80a 100644
> --- a/src/gallium/state_trackers/clover/core/kernel.cpp
> +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
> @@ -366,6 +366,9 @@ kernel::scalar_argument::scalar_argument(size_t size) : size(size) {
>  
>  void
>  kernel::scalar_argument::set(size_t size, const void *value) {
> +   if (!value)
> +  throw error(CL_INVALID_ARG_VALUE);
> +
> if (size != this->size)
>throw error(CL_INVALID_ARG_SIZE);
>  
> @@ -434,6 +437,9 @@ kernel::local_argument::set(size_t size, const void *value) {
> if (value)
>throw error(CL_INVALID_ARG_VALUE);
>  
> +   if (!size)
> +  throw error(CL_INVALID_ARG_SIZE);
> +
> _storage = size;
> _set = true;
>  }
> @@ -493,6 +499,9 @@ kernel::constant_argument::unbind(exec_context &ctx) {
>  
>  void
>  kernel::image_rd_argument::set(size_t size, const void *value) {
> +   if (!value)
> +  throw error(CL_INVALID_ARG_VALUE);
> +
> if (size != sizeof(cl_mem))
>throw error(CL_INVALID_ARG_SIZE);
>  
> @@ -521,6 +530,9 @@ kernel::image_rd_argument::unbind(exec_context &ctx) {
>  
>  void
>  kernel::image_wr_argument::set(size_t size, const void *value) {
> +   if (!value)
> +  throw error(CL_INVALID_ARG_VALUE);
> +
> if (size != sizeof(cl_mem))
>throw error(CL_INVALID_ARG_SIZE);
>  
> @@ -549,6 +561,9 @@ kernel::image_wr_argument::unbind(exec_context &ctx) {
>  
>  void
>  kernel::sampler_argument::set(size_t size, const void *value) {
> +   if (!value)
> +  throw error(CL_INVALID_SAMPLER);
> +
> if (size != sizeof(cl_sampler))
>throw error(CL_INVALID_ARG_SIZE);
>  
> -- 
> 2.4.6

Reviewed-by: Francisco Jerez 




Re: [Mesa-dev] [PATCH] clover: handle setKernelArg errors

2015-07-31 Thread Zoltán Gilián
Could you please commit this? I don't have permissions.

On Fri, Jul 31, 2015 at 3:55 PM, Francisco Jerez  wrote:
> Zoltan Gilian  writes:
>
>> ---
>>  src/gallium/state_trackers/clover/core/kernel.cpp | 15 +++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp
>> index a23cd2b..820a80a 100644
>> --- a/src/gallium/state_trackers/clover/core/kernel.cpp
>> +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
>> @@ -366,6 +366,9 @@ kernel::scalar_argument::scalar_argument(size_t size) : size(size) {
>>
>>  void
>>  kernel::scalar_argument::set(size_t size, const void *value) {
>> +   if (!value)
>> +  throw error(CL_INVALID_ARG_VALUE);
>> +
>> if (size != this->size)
>>throw error(CL_INVALID_ARG_SIZE);
>>
>> @@ -434,6 +437,9 @@ kernel::local_argument::set(size_t size, const void *value) {
>> if (value)
>>throw error(CL_INVALID_ARG_VALUE);
>>
>> +   if (!size)
>> +  throw error(CL_INVALID_ARG_SIZE);
>> +
>> _storage = size;
>> _set = true;
>>  }
>> @@ -493,6 +499,9 @@ kernel::constant_argument::unbind(exec_context &ctx) {
>>
>>  void
>>  kernel::image_rd_argument::set(size_t size, const void *value) {
>> +   if (!value)
>> +  throw error(CL_INVALID_ARG_VALUE);
>> +
>> if (size != sizeof(cl_mem))
>>throw error(CL_INVALID_ARG_SIZE);
>>
>> @@ -521,6 +530,9 @@ kernel::image_rd_argument::unbind(exec_context &ctx) {
>>
>>  void
>>  kernel::image_wr_argument::set(size_t size, const void *value) {
>> +   if (!value)
>> +  throw error(CL_INVALID_ARG_VALUE);
>> +
>> if (size != sizeof(cl_mem))
>>throw error(CL_INVALID_ARG_SIZE);
>>
>> @@ -549,6 +561,9 @@ kernel::image_wr_argument::unbind(exec_context &ctx) {
>>
>>  void
>>  kernel::sampler_argument::set(size_t size, const void *value) {
>> +   if (!value)
>> +  throw error(CL_INVALID_SAMPLER);
>> +
>> if (size != sizeof(cl_sampler))
>>throw error(CL_INVALID_ARG_SIZE);
>>
>> --
>> 2.4.6
>
> Reviewed-by: Francisco Jerez 


Re: [Mesa-dev] [PATCH 1/2] clover: move find_kernels to functions

2015-07-31 Thread Zoltán Gilián
Could you please commit this?

On Mon, Jul 27, 2015 at 1:20 PM, Francisco Jerez  wrote:
> Zoltan Gilian  writes:
>
>> ---
>>  .../state_trackers/clover/llvm/invocation.cpp  | 28 
>> --
>>  1 file changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
>> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> index 967284d..924cb36 100644
>> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> @@ -269,17 +269,19 @@ namespace {
>>  #endif
>> }
>>
>> -   void
>> -   find_kernels(llvm::Module *mod, std::vector &kernels) {
>> +   std::vector
>> +   find_kernels(const llvm::Module *mod) {
>>const llvm::NamedMDNode *kernel_node =
>>   mod->getNamedMetadata("opencl.kernels");
>>// This means there are no kernels in the program.  The spec does not
>>// require that we return an error here, but there will be an error if
>>// the user tries to pass this program to a clCreateKernel() call.
>>if (!kernel_node) {
>> - return;
>> + return std::vector();
>>}
>>
>> +  std::vector kernels;
>> +  kernels.reserve(kernel_node->getNumOperands());
>>for (unsigned i = 0; i < kernel_node->getNumOperands(); ++i) {
>>  #if HAVE_LLVM >= 0x0306
>>   kernels.push_back(llvm::mdconst::dyn_extract(
>> @@ -288,11 +290,11 @@ namespace {
>>  #endif
>>  
>> kernel_node->getOperand(i)->getOperand(0)));
>>}
>> +  return kernels;
>> }
>>
>> void
>> -   optimize(llvm::Module *mod, unsigned optimization_level,
>> -const std::vector &kernels) {
>> +   optimize(llvm::Module *mod, unsigned optimization_level) {
>>
>>  #if HAVE_LLVM >= 0x0307
>>llvm::legacy::PassManager PM;
>> @@ -300,6 +302,8 @@ namespace {
>>llvm::PassManager PM;
>>  #endif
>>
>> +  const std::vector kernels = find_kernels(mod);
>> +
>>// Add a function internalizer pass.
>>//
>>// By default, the function internalizer pass will look for a function
>> @@ -435,7 +439,6 @@ namespace {
>>
>> module
>> build_module_llvm(llvm::Module *mod,
>> - const std::vector &kernels,
>>   clang::LangAS::Map& address_spaces) {
>>
>>module m;
>> @@ -447,6 +450,7 @@ namespace {
>>llvm::WriteBitcodeToFile(mod, bitcode_ostream);
>>bitcode_ostream.flush();
>>
>> +  const std::vector kernels = find_kernels(mod);
>>for (unsigned i = 0; i < kernels.size(); ++i) {
>>   std::string kernel_name = kernels[i]->getName();
>>   std::vector args =
>> @@ -610,10 +614,11 @@ namespace {
>> module
>> build_module_native(std::vector &code,
>> const llvm::Module *mod,
>> -   const std::vector &kernels,
>> const clang::LangAS::Map &address_spaces,
>> std::string &r_log) {
>>
>> +  const std::vector kernels = find_kernels(mod);
>> +
>>std::map kernel_offsets =
>>  get_kernel_offsets(code, kernels, r_log);
>>
>> @@ -697,7 +702,6 @@ clover::compile_program_llvm(const std::string &source,
>>
>> init_targets();
>>
>> -   std::vector kernels;
>> size_t processor_str_len = std::string(target).find_first_of("-");
>> std::string processor(target, 0, processor_str_len);
>> std::string triple(target, processor_str_len + 1,
>> @@ -717,9 +721,7 @@ clover::compile_program_llvm(const std::string &source,
>>  triple, processor, opts, address_spaces,
>>  optimization_level, r_log);
>>
>> -   find_kernels(mod, kernels);
>> -
>> -   optimize(mod, optimization_level, kernels);
>> +   optimize(mod, optimization_level);
>>
>> if (get_debug_flags() & DBG_LLVM) {
>>std::string log;
>> @@ -738,13 +740,13 @@ clover::compile_program_llvm(const std::string &source,
>>   m = module();
>>   break;
>>case PIPE_SHADER_IR_LLVM:
>> - m = build_module_llvm(mod, kernels, address_spaces);
>> + m = build_module_llvm(mod, address_spaces);
>>   break;
>>case PIPE_SHADER_IR_NATIVE: {
>>   std::vector code = compile_native(mod, triple, processor,
>>   get_debug_flags() & 
>> DBG_ASM,
>>   r_log);
>> - m = build_module_native(code, mod, kernels, address_spaces, r_log);
>> + m = build_module_native(code, mod, address_spaces, r_log);
>>   break;
>>}
>> }
>> --
>> 2.4.6
>
> Looks good,
> Reviewed-by: Francisco Jerez 

Re: [Mesa-dev] [PATCH 2/2] clover: pass image attributes to the kernel

2015-07-31 Thread Zoltán Gilián
Could you please commit this?

On Mon, Jul 27, 2015 at 1:28 PM, Francisco Jerez  wrote:
> Zoltan Gilian  writes:
>
>> Read-only and write-only image arguments are recognized and
>> distinguished.
>> Attributes of the image arguments are passed to the kernel as implicit
>> arguments.
>> ---
>>  src/gallium/state_trackers/clover/core/kernel.cpp  |  28 +
>>  src/gallium/state_trackers/clover/core/kernel.hpp  |  15 ++-
>>  src/gallium/state_trackers/clover/core/module.hpp  |   4 +-
>>  .../state_trackers/clover/llvm/invocation.cpp  | 135 
>> -
>>  4 files changed, 171 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp 
>> b/src/gallium/state_trackers/clover/core/kernel.cpp
>> index 0756f06..955ff7b 100644
>> --- a/src/gallium/state_trackers/clover/core/kernel.cpp
>> +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
>> @@ -182,6 +182,34 @@ kernel::exec_context::bind(intrusive_ptr 
>> _q,
>>   }
>>   break;
>>}
>> +  case module::argument::image_size: {
>> + auto img = dynamic_cast(**(explicit_arg - 
>> 1)).get();
>> + std::vector image_size{
>> +   static_cast(img->width()),
>> +   static_cast(img->height()),
>> +   static_cast(img->depth())};
>> + for (auto x : image_size) {
>> +auto arg = argument::create(marg);
>> +
>> +arg->set(sizeof(x), &x);
>> +arg->bind(*this, marg);
>> + }
>> + break;
>> +  }
>> +  case module::argument::image_format: {
>> + auto img = dynamic_cast(**(explicit_arg - 
>> 1)).get();
>> + cl_image_format fmt = img->format();
>> + std::vector image_format{
>> +   static_cast(fmt.image_channel_data_type),
>> +   static_cast(fmt.image_channel_order)};
>> + for (auto x : image_format) {
>> +auto arg = argument::create(marg);
>> +
>> +arg->set(sizeof(x), &x);
>> +arg->bind(*this, marg);
>> + }
>> + break;
>> +  }
>>}
>> }
>>
>> diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp 
>> b/src/gallium/state_trackers/clover/core/kernel.hpp
>> index d6432a4..4ba6ff4 100644
>> --- a/src/gallium/state_trackers/clover/core/kernel.hpp
>> +++ b/src/gallium/state_trackers/clover/core/kernel.hpp
>> @@ -190,7 +190,16 @@ namespace clover {
>>   pipe_surface *st;
>>};
>>
>> -  class image_rd_argument : public argument {
>> +  class image_argument : public argument {
>> +  public:
>> + const image *get() const {
>> +return img;
>> + }
>> +  protected:
>> + image *img;
>> +  };
>> +
>> +  class image_rd_argument : public image_argument {
>>public:
>>   virtual void set(size_t size, const void *value);
>>   virtual void bind(exec_context &ctx,
>> @@ -198,11 +207,10 @@ namespace clover {
>>   virtual void unbind(exec_context &ctx);
>>
>>private:
>> - image *img;
>>   pipe_sampler_view *st;
>>};
>>
>> -  class image_wr_argument : public argument {
>> +  class image_wr_argument : public image_argument {
>>public:
>>   virtual void set(size_t size, const void *value);
>>   virtual void bind(exec_context &ctx,
>> @@ -210,7 +218,6 @@ namespace clover {
>>   virtual void unbind(exec_context &ctx);
>>
>>private:
>> - image *img;
>>   pipe_surface *st;
>>};
>>
>> diff --git a/src/gallium/state_trackers/clover/core/module.hpp 
>> b/src/gallium/state_trackers/clover/core/module.hpp
>> index 9d65688..5db0548 100644
>> --- a/src/gallium/state_trackers/clover/core/module.hpp
>> +++ b/src/gallium/state_trackers/clover/core/module.hpp
>> @@ -72,7 +72,9 @@ namespace clover {
>>   enum semantic {
>>  general,
>>  grid_dimension,
>> -grid_offset
>> +grid_offset,
>> +image_size,
>> +image_format
>>   };
>>
>>   argument(enum type type, size_t size,
>> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
>> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> index 924cb36..86859af 100644
>> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> @@ -344,18 +344,91 @@ namespace {
>>PM.run(*mod);
>> }
>>
>> +   // Kernel metadata
>> +
>> +   const llvm::MDNode *
>> +   get_kernel_metadata(const llvm::Function *kernel_func) {
>> +  auto mod = kernel_func->getParent();
>> +  auto kernels_node = mod->getNamedMetadata("opencl.kernels");
>> +  if (!kernels_node) {
>> + return nullptr;
>> +  }
>> +
>> +  const llvm::MDNode *kernel_node = nullptr;
>> +  for (unsigned i = 0; i < kernels_node->getNumOperands(); ++i) {
>> +#if HAVE_LLVM >= 0x0306
>> +

Re: [Mesa-dev] [PATCH 1/2] clover: make dispatch matches functions def

2015-07-31 Thread Francisco Jerez
EdB  writes:

> ---
>  src/gallium/state_trackers/clover/api/dispatch.hpp | 23 
> +-
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/dispatch.hpp 
> b/src/gallium/state_trackers/clover/api/dispatch.hpp
> index ffae1ae..781b54e 100644
> --- a/src/gallium/state_trackers/clover/api/dispatch.hpp
> +++ b/src/gallium/state_trackers/clover/api/dispatch.hpp
> @@ -693,7 +693,13 @@ struct _cl_icd_dispatch {
> CL_API_ENTRY cl_int (CL_API_CALL *clUnloadPlatformCompiler)(
>cl_platform_id platform);
>  
> -   void *clGetKernelArgInfo;
> +   CL_API_ENTRY cl_int (CL_API_CALL *clGetKernelArgInfo)(
> +  cl_kernel kernel,
> +  cl_uint arg_indx,
> +  cl_kernel_arg_info  param_name,
> +  size_t param_value_size,
> +  void * param_value,

No space after '*'.

> +  size_t * param_value_size_ret);
>  
Same here.  With these fixed this patch is:

Reviewed-by: Francisco Jerez 

> CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueFillBuffer)(
>cl_command_queue command_queue,
> @@ -701,7 +707,7 @@ struct _cl_icd_dispatch {
>const void *pattern,
>size_t pattern_size,
>size_t offset,
> -  size_t cb,
> +  size_t size,
>cl_uint num_events_in_wait_list,
>const cl_event *event_wait_list,
>cl_event *event);
> @@ -710,13 +716,20 @@ struct _cl_icd_dispatch {
>cl_command_queue command_queue,
>cl_mem image,
>const void *fill_color,
> -  const size_t origin[3],
> -  const size_t region[3],
> +  const size_t *origin,
> +  const size_t *region,
>cl_uint num_events_in_wait_list,
>const cl_event *event_wait_list,
>cl_event *event);
>  
> -   void *clEnqueueMigrateMemObjects;
> +   CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueMigrateMemObjects)(
> +  cl_command_queue command_queue,
> +  cl_uint num_mem_objects,
> +  const cl_mem *mem_objects,
> +  cl_mem_migration_flags flags,
> +  cl_uint num_events_in_wait_list,
> +  const cl_event *event_wait_list,
> +  cl_event *event);
>  
> CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueMarkerWithWaitList)(
>cl_command_queue command_queue,
> -- 
> 2.5.0.rc2




Re: [Mesa-dev] [PATCH] clover: handle setKernelArg errors

2015-07-31 Thread Francisco Jerez
Zoltán Gilián  writes:

> Could you please commit this? I don't have permissions.
>
Sure, I'll put them into my queue.

> On Fri, Jul 31, 2015 at 3:55 PM, Francisco Jerez wrote:
>> Zoltan Gilian  writes:
>>
>>> ---
>>>  src/gallium/state_trackers/clover/core/kernel.cpp | 15 +++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp
>>> index a23cd2b..820a80a 100644
>>> --- a/src/gallium/state_trackers/clover/core/kernel.cpp
>>> +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
>>> @@ -366,6 +366,9 @@ kernel::scalar_argument::scalar_argument(size_t size) : size(size) {
>>>
>>>  void
>>>  kernel::scalar_argument::set(size_t size, const void *value) {
>>> +   if (!value)
>>> +  throw error(CL_INVALID_ARG_VALUE);
>>> +
>>> if (size != this->size)
>>>throw error(CL_INVALID_ARG_SIZE);
>>>
>>> @@ -434,6 +437,9 @@ kernel::local_argument::set(size_t size, const void *value) {
>>> if (value)
>>>throw error(CL_INVALID_ARG_VALUE);
>>>
>>> +   if (!size)
>>> +  throw error(CL_INVALID_ARG_SIZE);
>>> +
>>> _storage = size;
>>> _set = true;
>>>  }
>>> @@ -493,6 +499,9 @@ kernel::constant_argument::unbind(exec_context &ctx) {
>>>
>>>  void
>>>  kernel::image_rd_argument::set(size_t size, const void *value) {
>>> +   if (!value)
>>> +  throw error(CL_INVALID_ARG_VALUE);
>>> +
>>> if (size != sizeof(cl_mem))
>>>throw error(CL_INVALID_ARG_SIZE);
>>>
>>> @@ -521,6 +530,9 @@ kernel::image_rd_argument::unbind(exec_context &ctx) {
>>>
>>>  void
>>>  kernel::image_wr_argument::set(size_t size, const void *value) {
>>> +   if (!value)
>>> +  throw error(CL_INVALID_ARG_VALUE);
>>> +
>>> if (size != sizeof(cl_mem))
>>>throw error(CL_INVALID_ARG_SIZE);
>>>
>>> @@ -549,6 +561,9 @@ kernel::image_wr_argument::unbind(exec_context &ctx) {
>>>
>>>  void
>>>  kernel::sampler_argument::set(size_t size, const void *value) {
>>> +   if (!value)
>>> +  throw error(CL_INVALID_SAMPLER);
>>> +
>>> if (size != sizeof(cl_sampler))
>>>throw error(CL_INVALID_ARG_SIZE);
>>>
>>> --
>>> 2.4.6
>>
>> Reviewed-by: Francisco Jerez 




[Mesa-dev] IROUND, math errors, etc. (was: Re: proposed patch optimized F_TO_I for powerpc platforms)

2015-07-31 Thread Roland Scheidegger
CC mesa-dev.

This looks good to me. I am starting to wonder though why we don't just
use lrintf() and let the compiler sort it out (for x86 too).
Though actually some quick experiments show that:
- llvm's clang will always use libm lrintf call. Which then will do
(x86_64) cvtss2si %xmm0,%rax as expected. Meaning the cost is probably
twice as high as it could be due to the unnecessary library call.
- gcc will also use the same library call. Unless you specify
-fno-math-errno (or some more aggressive math optimizing stuff), in
which case it will do the cvtss2si on its own. Which is fairly stupid,
because this function doesn't set errno in any case, so it could be used
independent of -fno-math-errno.

Speaking of -fno-math-errno, why don't we use that in mesa? I know the
fast math stuff can be problematic, but no one is _ever_ interested in
math error numbers.

Speaking of which, I'm not really sure why IROUND isn't doing the same.
Yes it rounds away from zero, but I doubt that matters - would probably
be better to match whatever rounding is used in hw (GL doesn't seem to
specify tie-breaker rules for round to nearest afaict).

FWIW IROUND along with even the 64bit sibling IROUND64 (and IROUND_POS)
is not even really correct in any case. There exist floats where f +
0.5f will round up to the next integer incorrectly. e.g. something like
"largest float smaller than 63.5f", 63.499f or so, if you add +0.5f
the resulting number for the hw is right between that largest float
smaller than 63.5f and 64.0f, and thus it will use the tie-breaker rule
(round to nearest even for your typical hw with typical rounding mode
set) making this 64.0, thus the rounded integer will be 64, which is
just plain wrong no matter the round-to-nearest tie breaker rule.
There are ways to fix it (the obvious one is to add 0.5 as double), but
I don't think we should even try that, and assume lrintf can do a decent
job on hw we care about (compiler not doing its job right is a pity but
might not be too bad even if it uses lib call).


Roland


Am 31.07.2015 um 11:39 schrieb Jochen Rollwagen:
> Hi,
> 
> i've produced and tested the following mesa patch for powerpc platforms
> (based on/inspired by commit 989d2e370993c87d1bbda4950657bfcc5b0a58dd
> 
> "Add an accelerated version of F_TO_I for x86_64"):
> 
> diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
> index 09e55eb..e4feb83 100644
> --- a/src/mesa/main/imports.h
> +++ b/src/mesa/main/imports.h
> @@ -296,6 +296,14 @@ static inline int F_TO_I(float f)
> return r;
>  #elif defined(__x86_64__)
> return _mm_cvt_ss2si(_mm_load_ss(&f));
> +#elif defined(__GNUC__) && defined(__PPC__)  
> +   long res [2] ;
> +  
> +   __asm__( "fctiw %0,%0\n\t"
> +"stfd %0,%1\n\t"
> +: "=f" (f), "=o" (res): );
> +   
> +   return res [1] ;
>  #else
> return IROUND(f);
>  #endif
> 
> 
> any chance to get this into mesa for the few other powerpc hangouts
> still around ? performance is markedly improved (although i didn't
> really measure it :-) )
> 
> 



Re: [Mesa-dev] [PATCH 08/18] radeonsi: don't use llvm.AMDIL.fraction for FRC and DFRAC

2015-07-31 Thread Tom Stellard
On Tue, Jul 28, 2015 at 12:05:43PM +0200, Marek Olšák wrote:
> From: Marek Olšák 
> 
> There are 2 reasons for this:
> - LLVM optimization passes can work with floor
> - there are patterns to select v_fract from floor anyway
> 
> There is no change in the generated code.
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 319380f..5c08cf5 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -1170,6 +1170,20 @@ static void emit_dneg(
>   emit_data->args[0], "");
>  }
>  
> +static void emit_frac(
> + const struct lp_build_tgsi_action * action,
> + struct lp_build_tgsi_context * bld_base,
> + struct lp_build_emit_data * emit_data)
> +{
> + LLVMBuilderRef builder = bld_base->base.gallivm->builder;
> +
> + LLVMValueRef floor = lp_build_intrinsic(builder, "floor", 
> emit_data->dst_type,

The intrinsic name should be "llvm.floor.f32" for float and "llvm.floor.f64"
for double.

With that fixed, this is:
Reviewed-by: Tom Stellard 

> + &emit_data->args[0], 1,
> + LLVMReadNoneAttribute);
> + emit_data->output[emit_data->chan] = LLVMBuildFSub(builder,
> + emit_data->args[0], floor, "");
> +}
> +
>  static void emit_f2i(
>   const struct lp_build_tgsi_action * action,
>   struct lp_build_tgsi_context * bld_base,
> @@ -1432,8 +1446,7 @@ void radeon_llvm_context_init(struct 
> radeon_llvm_context * ctx)
>   bld_base->op_actions[TGSI_OPCODE_DABS].intr_name = "fabs";
>   bld_base->op_actions[TGSI_OPCODE_DFMA].emit = 
> build_tgsi_intrinsic_nomem;
>   bld_base->op_actions[TGSI_OPCODE_DFMA].intr_name = "llvm.fma.f64";
> - bld_base->op_actions[TGSI_OPCODE_DFRAC].emit = 
> build_tgsi_intrinsic_nomem;
> - bld_base->op_actions[TGSI_OPCODE_DFRAC].intr_name = 
> "llvm.AMDIL.fraction.";
> + bld_base->op_actions[TGSI_OPCODE_DFRAC].emit = emit_frac;
>   bld_base->op_actions[TGSI_OPCODE_DNEG].emit = emit_dneg;
>   bld_base->op_actions[TGSI_OPCODE_DSEQ].emit = emit_dcmp;
>   bld_base->op_actions[TGSI_OPCODE_DSGE].emit = emit_dcmp;
> @@ -1452,8 +1465,7 @@ void radeon_llvm_context_init(struct 
> radeon_llvm_context * ctx)
>   bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "floor";
>   bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;
>   bld_base->op_actions[TGSI_OPCODE_FMA].intr_name = "llvm.fma.f32";
> - bld_base->op_actions[TGSI_OPCODE_FRC].emit = build_tgsi_intrinsic_nomem;
> - bld_base->op_actions[TGSI_OPCODE_FRC].intr_name = 
> "llvm.AMDIL.fraction.";
> + bld_base->op_actions[TGSI_OPCODE_FRC].emit = emit_frac;
>   bld_base->op_actions[TGSI_OPCODE_F2I].emit = emit_f2i;
>   bld_base->op_actions[TGSI_OPCODE_F2U].emit = emit_f2u;
>   bld_base->op_actions[TGSI_OPCODE_FSEQ].emit = emit_fcmp;
> -- 
> 2.1.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.

2015-07-31 Thread Jason Ekstrand
On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez  wrote:
> Jason Ekstrand  writes:
>
>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez  
>> wrote:
>>> Jason Ekstrand  writes:
>>>
>>>> This all looks correct as far as I can tell.  However, I'm very
>>>> concerned about the number of checks such as
>>>> has_matching_typed_format() that are built-in to the compiler (via
>>>> surface_builder) where we then go on to do something that is highly
>>>> dependent on state setup doing the exact same check (not via
>>>> surface_builder) and doing the right thing.  Another example would be
>>>> the interplay with has_split_bit_layout.  How do we know these won't
>>>> get out of sync?

>>> They cannot get out of sync because the policy which surface format to
>>> use for a given GLSL format is shared between the compiler and the
>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>>> surface state set-up for shader images.".
>>
>> I came back to this today and looked at it again.  I think my main
>> objection is that all of these metadata functions work in terms of a
>> devinfo and a format and then immediately call
>> brw_lower_mesa_image_format.  Why don't they work instead in terms
>> comparing the lowered format to the nominal format?  Then it would be
>> a lot more obvious that you're doing a transformation from one format
>> to another.  As is, it's a bunch of magic "what do I have to do next"
>> queries
>
> I don't think any of these functions implies what to do next, they all
> encode clearly defined device-specific properties of the formats:
>
>  - Does the device support any format with the same bit layout?
>(has_supported_bit_layout())
>
>  - Does the device support any format with the same bit layout *and*
>encoding? (is_conversion_trivial())
>
>  - Does the device support a typed format of the same size?
>(has_matching_typed_format())
>
>  - Does the device split the pixel data into a number of discontiguous
>segments? (has_split_bit_layout())
>
>  - Does the device return garbage in the unused bits when reading from
>the given format? (has_undefined_high_bits())
>
>  - Do all fields of the format have the same size? (This is not even
>device-specific)
>
>  - Is the format represented as a signed integer in memory? (Ditto)
>
> You may have noticed that except for the last two the natural subject
> and object of all questions are "device" and "format" respectively --
> I'm not sure I see how rephrasing them to involve the lowered format
> (that in fact a number of them don't care about) would make any of these
> questions easier to understand.

I *understand* the questions being asked.  My claim is this is the
*wrong* set of questions.  By the time you get to this stage, someone
higher up has made the decision that they're not going to give you
format A but will rather give you format B which they know will also
work.  (This decision is made by code that understands the hardware
limitations and you're guaranteed that B is a format you can read and
transform into A.)  The fact that that decision is passed down through
brw_lower_mesa_image_format is secondary.  At this stage, you have two
formats, A and B and need to figure out how to get from one to the
other.

For example, instead of "Does the device support a typed format of the
same size?" the better question would be "Are these two formats the
same size?".  Instead of asking "Does the device support any format
with the same bit layout *and* encoding?" you should ask "Do these two
formats have the same bit layout *and* encoding?"

Does the distinction I'm making make sense?  In the current code, it
looks like you're doing device queries and making decisions about how
to work around device limitations.  Hence my original comment about
things getting out of sync.  In reality, all of those decisions have
already been made by brw_lower_mesa_image_format; this code is just
doing what it needs to do to go from format A to format B.
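
To make the "two formats, A and B" phrasing concrete, here is a rough
sketch of what such pairwise queries could look like. Everything in it
(struct fmt_desc, enum fmt_encoding, the fmt_* helpers) is hypothetical
and heavily simplified. It is not the i965 code and says nothing about
how brw_lower_mesa_image_format actually picks B.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified format description. */
enum fmt_encoding { FMT_UNORM, FMT_SNORM, FMT_UINT, FMT_SINT, FMT_FLOAT };

struct fmt_desc {
    unsigned bits[4];            /* width of each channel in bits, 0 = absent */
    enum fmt_encoding encoding;  /* how the channel bits are interpreted */
};

static unsigned fmt_size_bits(const struct fmt_desc *f)
{
    return f->bits[0] + f->bits[1] + f->bits[2] + f->bits[3];
}

/* "Are these two formats the same size?" */
static bool fmt_same_size(const struct fmt_desc *a, const struct fmt_desc *b)
{
    return fmt_size_bits(a) == fmt_size_bits(b);
}

/* "Do these two formats have the same bit layout?" */
static bool fmt_same_bit_layout(const struct fmt_desc *a,
                                const struct fmt_desc *b)
{
    return memcmp(a->bits, b->bits, sizeof(a->bits)) == 0;
}

/* "Do these two formats have the same bit layout *and* encoding?" */
static bool fmt_conversion_trivial(const struct fmt_desc *a,
                                   const struct fmt_desc *b)
{
    return fmt_same_bit_layout(a, b) && a->encoding == b->encoding;
}

/* Usage example: nominal format A is 4x8-bit UNORM, lowered format B is a
 * single 32-bit UINT channel. Same size, but a conversion is needed. */
static void fmt_example(void)
{
    const struct fmt_desc a = { { 8, 8, 8, 8 }, FMT_UNORM };
    const struct fmt_desc b = { { 32, 0, 0, 0 }, FMT_UINT };

    assert(fmt_same_size(&a, &b));
    assert(!fmt_same_bit_layout(&a, &b));
    assert(!fmt_conversion_trivial(&a, &b));
}
```

The point of the sketch is only that each question takes both formats as
arguments, so the A-to-B transformation is visible at the call site.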

>> which, while in the same namespace, are in a different file.
>
> I happen to have cleaned that up to be in the same file a short while
> ago -- They're no longer used by anything else outside of
> brw_fs_surface_builder.cpp:
> http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200
>
>> It would also be more efficient because you would only call
>> brw_lower_mesa_image_format once.
>
> A more elegant way to "fix" that would be to mark
> brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it
> would make any measurable difference in practice though.
>
>>
>> Does that make more sense?
>> --Jason
>>
>>> Thanks.
>>>
>>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez  
>>>> wrote:
>>>>> v2: Drop VEC4 support.
>>>>> v3: Rebase.
>>>>> ---
>>>>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp| 216 
>>>>> +
>>>>>  src/mesa/drivers/dri/i965/brw_fs_surface_builder.h |  17 ++
>>>>>

Re: [Mesa-dev] [PATCH 08/18] radeonsi: don't use llvm.AMDIL.fraction for FRC and DFRAC

2015-07-31 Thread Marek Olšák
On Fri, Jul 31, 2015 at 4:18 PM, Tom Stellard  wrote:
> On Tue, Jul 28, 2015 at 12:05:43PM +0200, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> There are 2 reasons for this:
>> - LLVM optimization passes can work with floor
>> - there are patterns to select v_fract from floor anyway
>>
>> There is no change in the generated code.
>> ---
>>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 20 
>> 
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
>> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> index 319380f..5c08cf5 100644
>> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> @@ -1170,6 +1170,20 @@ static void emit_dneg(
>>   emit_data->args[0], "");
>>  }
>>
>> +static void emit_frac(
>> + const struct lp_build_tgsi_action * action,
>> + struct lp_build_tgsi_context * bld_base,
>> + struct lp_build_emit_data * emit_data)
>> +{
>> + LLVMBuilderRef builder = bld_base->base.gallivm->builder;
>> +
>> + LLVMValueRef floor = lp_build_intrinsic(builder, "floor", 
>> emit_data->dst_type,
>
> The intrinsics name should be "llvm.floor.f32" for float and "llvm.floor.f64"
> for double.
>
> With that fixed, this is:
> Reviewed-by: Tom Stellard 

Sorry, I have pushed this already. Is it really required to use
"llvm.floor.f*"? We've been using "floor" for FLR forever. We've also
been using "fabs" and "ceil". Are those wrong too?

Marek


Re: [Mesa-dev] [PATCH 08/18] radeonsi: don't use llvm.AMDIL.fraction for FRC and DFRAC

2015-07-31 Thread Tom Stellard
On Fri, Jul 31, 2015 at 04:59:19PM +0200, Marek Olšák wrote:
> On Fri, Jul 31, 2015 at 4:18 PM, Tom Stellard  wrote:
> > On Tue, Jul 28, 2015 at 12:05:43PM +0200, Marek Olšák wrote:
> >> From: Marek Olšák 
> >>
> >> There are 2 reasons for this:
> >> - LLVM optimization passes can work with floor
> >> - there are patterns to select v_fract from floor anyway
> >>
> >> There is no change in the generated code.
> >> ---
> >>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 20 
> >> 
> >>  1 file changed, 16 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> >> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >> index 319380f..5c08cf5 100644
> >> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >> @@ -1170,6 +1170,20 @@ static void emit_dneg(
> >>   emit_data->args[0], "");
> >>  }
> >>
> >> +static void emit_frac(
> >> + const struct lp_build_tgsi_action * action,
> >> + struct lp_build_tgsi_context * bld_base,
> >> + struct lp_build_emit_data * emit_data)
> >> +{
> >> + LLVMBuilderRef builder = bld_base->base.gallivm->builder;
> >> +
> >> + LLVMValueRef floor = lp_build_intrinsic(builder, "floor", 
> >> emit_data->dst_type,
> >
> > The intrinsics name should be "llvm.floor.f32" for float and 
> > "llvm.floor.f64"
> > for double.
> >
> > With that fixed, this is:
> > Reviewed-by: Tom Stellard 
> 
> Sorry, I have pushed this already. Is it really required to use
> "llvm.floor.f*"? We've been using "floor" for FLR forever. We've also
> been using "fabs" and "ceil". Are those wrong too?
> 

It is better to use the intrinsic (i.e. llvm.*) functions, because
they don't have side effects like the libm calls, so they can be optimized
better.

-Tom

> Marek


[Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Marek Olšák
From: Marek Olšák 

A lot of GPUs allocate separate depth and stencil buffers, so clearing them
together doesn't make much sense. If some GPUs don't allocate separate
depth & stencil, it's still beneficial to clear the HiZ / HiS information
for only one of the two.
---
 src/mesa/state_tracker/st_cb_clear.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_clear.c 
b/src/mesa/state_tracker/st_cb_clear.c
index 137fac8..1e404a2 100644
--- a/src/mesa/state_tracker/st_cb_clear.c
+++ b/src/mesa/state_tracker/st_cb_clear.c
@@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
   }
}
 
-   /* Always clear depth and stencil together.
-* This can only happen when the stencil writemask is not a full mask.
-*/
-   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
-   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
-  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
-  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
-   }
-
/* Only use quad-based clearing for the renderbuffers which cannot
 * use pipe->clear. We want to always use pipe->clear for the other
 * renderbuffers, because it's likely to be faster.
-- 
2.1.4



[Mesa-dev] [PATCH] gallium/radeon: suspend timer queries between IBs

2015-07-31 Thread Marek Olšák
From: Marek Olšák 

When we are measuring the time spent in a draw call, an unexpected flush
can distort the result.
---
 src/gallium/drivers/r600/r600_hw_context.c|  3 +-
 src/gallium/drivers/radeon/r600_pipe_common.c |  8 ++--
 src/gallium/drivers/radeon/r600_pipe_common.h |  9 +++-
 src/gallium/drivers/radeon/r600_query.c   | 68 ---
 src/gallium/drivers/radeonsi/si_hw_context.c  |  3 +-
 5 files changed, 66 insertions(+), 25 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb0c68..9155707 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -68,7 +68,8 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
}
 
/* Count in queries_suspend. */
-   num_dw += ctx->b.num_cs_dw_nontimer_queries_suspend;
+   num_dw += ctx->b.num_cs_dw_nontimer_queries_suspend +
+ ctx->b.num_cs_dw_timer_queries_suspend;
 
/* Count in streamout_end at the end of CS. */
if (ctx->b.streamout.begin_emitted) {
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 75e8201..c940f6d 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -132,10 +132,11 @@ void r600_preflush_suspend_features(struct 
r600_common_context *ctx)
}
 
/* suspend queries */
-   ctx->nontimer_queries_suspended = false;
+   ctx->queries_suspended_for_flush = false;
if (ctx->num_cs_dw_nontimer_queries_suspend) {
r600_suspend_nontimer_queries(ctx);
-   ctx->nontimer_queries_suspended = true;
+   r600_suspend_timer_queries(ctx);
+   ctx->queries_suspended_for_flush = true;
}
 
ctx->streamout.suspended = false;
@@ -153,8 +154,9 @@ void r600_postflush_resume_features(struct 
r600_common_context *ctx)
}
 
/* resume queries */
-   if (ctx->nontimer_queries_suspended) {
+   if (ctx->queries_suspended_for_flush) {
r600_resume_nontimer_queries(ctx);
+   r600_resume_timer_queries(ctx);
}
 
/* Re-enable render condition. */
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index e2a60c5..fbd2a21 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -384,11 +384,14 @@ struct r600_common_context {
int num_occlusion_queries;
/* Keep track of non-timer queries, because they should be suspended
 * during context flushing.
-* The timer queries (TIME_ELAPSED) shouldn't be suspended. */
+* The timer queries (TIME_ELAPSED) shouldn't be suspended for blits,
+* but they should be suspended between IBs. */
struct list_headactive_nontimer_queries;
+   struct list_headactive_timer_queries;
unsignednum_cs_dw_nontimer_queries_suspend;
+   unsignednum_cs_dw_timer_queries_suspend;
/* If queries have been suspended. */
-   boolnontimer_queries_suspended;
+   boolqueries_suspended_for_flush;
/* Additional hardware info. */
unsignedbackend_mask;
unsignedmax_db; /* for OQ */
@@ -503,6 +506,8 @@ unsigned r600_gpu_load_end(struct r600_common_screen 
*rscreen, uint64_t begin);
 void r600_query_init(struct r600_common_context *rctx);
 void r600_suspend_nontimer_queries(struct r600_common_context *ctx);
 void r600_resume_nontimer_queries(struct r600_common_context *ctx);
+void r600_suspend_timer_queries(struct r600_common_context *ctx);
+void r600_resume_timer_queries(struct r600_common_context *ctx);
 void r600_query_init_backend_mask(struct r600_common_context *ctx);
 
 /* r600_streamout.c */
diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index 909d502..7fc2253 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -226,9 +226,10 @@ static void r600_emit_query_begin(struct 
r600_common_context *ctx, struct r600_q
r600_emit_reloc(ctx, &ctx->rings.gfx, query->buffer.buf, 
RADEON_USAGE_WRITE,
RADEON_PRIO_MIN);
 
-   if (!r600_is_timer_query(query->type)) {
+   if (r600_is_timer_query(query->type))
+   ctx->num_cs_dw_timer_queries_suspend += query->num_cs_dw;
+   else
ctx->num_cs_dw_nontimer_queries_suspend += query->num_cs_dw;
-   }
 }
 
 static void r600_emit_query_end(struct r600_common_context *ctx, struct 
r600_query *query)
@@ -290,9 +291,10 @@ static void r600_emit_query_end(struct r

[Mesa-dev] [PATCH 2/2] radeon/winsys: increase the IB size for VM

2015-07-31 Thread Marek Olšák
From: Marek Olšák 

Luckily, there is a kernel query, so use the size from that.
It currently returns 256KB. It can be increased in the kernel.
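
A small sketch of the unit handling, since the query reports dwords while
the winsys field holds bytes. The helper names and the "0 means the query
is unavailable" convention are assumptions made for this example only,
and the 65536-dword figure is simply 256KB divided by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback size in bytes, matching the 64 * 1024 default in the patch. */
#define DEFAULT_IB_MAX_SIZE_BYTES (64u * 1024u)

/* Hypothetical helper: kernel_dwords is the value reported by the kernel
 * query in dwords, or 0 if the query is unavailable (an assumption of this
 * sketch). Returns the IB size in bytes. */
static uint32_t ib_max_size_bytes(uint32_t kernel_dwords)
{
    if (kernel_dwords == 0)
        return DEFAULT_IB_MAX_SIZE_BYTES;
    return kernel_dwords * 4u;
}

/* Number of dwords that fit in an IB of the given size in bytes. */
static uint32_t ib_max_dw(uint32_t size_bytes)
{
    return size_bytes / 4u;
}

/* Usage example. */
static void ib_size_example(void)
{
    /* Without the query: 64KB, i.e. the old 16 * 1024 dword array. */
    assert(ib_max_dw(ib_max_size_bytes(0)) == 16u * 1024u);

    /* With a reported 65536 dwords: 256KB. */
    assert(ib_max_size_bytes(65536u) == 256u * 1024u);
}
```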
---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c |  8 +++-
 src/gallium/winsys/radeon/drm/radeon_drm_cs.h |  2 +-
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 12 
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.h |  1 +
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index e363cc0..45eef29 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -85,17 +85,22 @@ static boolean radeon_init_cs_context(struct 
radeon_cs_context *csc,
 {
 int i;
 
+csc->buf = MALLOC(ws->ib_max_size);
+if (!csc->buf)
+return FALSE;
 csc->fd = ws->fd;
 csc->nrelocs = 512;
 csc->relocs_bo = (struct radeon_bo**)
  CALLOC(1, csc->nrelocs * sizeof(struct radeon_bo*));
 if (!csc->relocs_bo) {
+FREE(csc->buf);
 return FALSE;
 }
 
 csc->relocs = (struct drm_radeon_cs_reloc*)
   CALLOC(1, csc->nrelocs * sizeof(struct drm_radeon_cs_reloc));
 if (!csc->relocs) {
+FREE(csc->buf);
 FREE(csc->relocs_bo);
 return FALSE;
 }
@@ -148,6 +153,7 @@ static void radeon_destroy_cs_context(struct 
radeon_cs_context *csc)
 radeon_cs_context_cleanup(csc);
 FREE(csc->relocs_bo);
 FREE(csc->relocs);
+FREE(csc->buf);
 }
 
 
@@ -188,7 +194,7 @@ radeon_drm_cs_create(struct radeon_winsys *rws,
 cs->cst = &cs->csc2;
 cs->base.buf = cs->csc->buf;
 cs->base.ring_type = ring_type;
-cs->base.max_dw = ARRAY_SIZE(cs->csc->buf);
+cs->base.max_dw = ws->ib_max_size / 4;
 
 p_atomic_inc(&ws->num_cs);
 return &cs->base;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
index 6ceb8e9..ab15494 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
@@ -30,7 +30,7 @@
 #include "radeon_drm_bo.h"
 
 struct radeon_cs_context {
-uint32_tbuf[16 * 1024];
+uint32_t*buf;
 
 int fd;
 struct drm_radeon_cscs;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index 41f8826..a652223 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -395,16 +395,20 @@ static boolean do_winsys_init(struct radeon_drm_winsys 
*ws)
 }
 
 ws->info.r600_virtual_address = FALSE;
-if (ws->info.drm_minor >= 13) {
-uint32_t ib_vm_max_size;
+ws->ib_max_size = 64 * 1024;
 
+if (ws->info.drm_minor >= 13) {
 ws->info.r600_virtual_address = TRUE;
 if (!radeon_get_drm_value(ws->fd, RADEON_INFO_VA_START, NULL,
   &ws->va_start))
 ws->info.r600_virtual_address = FALSE;
-if (!radeon_get_drm_value(ws->fd, RADEON_INFO_IB_VM_MAX_SIZE, NULL,
-  &ib_vm_max_size))
+
+if (radeon_get_drm_value(ws->fd, RADEON_INFO_IB_VM_MAX_SIZE, NULL,
+ &ws->ib_max_size))
+ws->ib_max_size *= 4; /* the kernel returns the size in dwords 
*/
+else
 ws->info.r600_virtual_address = FALSE;
+
 radeon_get_drm_value(ws->fd, RADEON_INFO_VA_UNMAP_WORKING, NULL,
  &ws->va_unmap_working);
 }
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
index 308b5bd..c1a8d6a 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
@@ -73,6 +73,7 @@ struct radeon_drm_winsys {
 
 enum radeon_generation gen;
 struct radeon_info info;
+uint32_t ib_max_size;
 uint32_t va_start;
 uint32_t va_unmap_working;
 uint32_t accel_working2;
-- 
2.1.4



[Mesa-dev] [PATCH 1/2] gallium/radeon: allow the winsys to choose the IB size

2015-07-31 Thread Marek Olšák
From: Marek Olšák 

Picked from the amdgpu branch.

Reviewed-by: Alex Deucher 
---
 src/gallium/drivers/r300/r300_blit.c  | 2 +-
 src/gallium/drivers/r300/r300_cs.h| 2 +-
 src/gallium/drivers/r300/r300_render.c| 2 +-
 src/gallium/drivers/r600/r600_hw_context.c| 2 +-
 src/gallium/drivers/r600/r600_pipe.h  | 4 ++--
 src/gallium/drivers/radeon/r600_cs.h  | 8 
 src/gallium/drivers/radeon/r600_pipe_common.c | 4 ++--
 src/gallium/drivers/radeon/radeon_winsys.h| 3 +--
 src/gallium/drivers/radeonsi/si_hw_context.c  | 2 +-
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 5 +++--
 src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 2 +-
 11 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/r300/r300_blit.c 
b/src/gallium/drivers/r300/r300_blit.c
index baf05ce..6ea8f24 100644
--- a/src/gallium/drivers/r300/r300_blit.c
+++ b/src/gallium/drivers/r300/r300_blit.c
@@ -382,7 +382,7 @@ static void r300_clear(struct pipe_context* pipe,
 r300_get_num_cs_end_dwords(r300);
 
 /* Reserve CS space. */
-if (dwords > (RADEON_MAX_CMDBUF_DWORDS - r300->cs->cdw)) {
+if (dwords > (r300->cs->max_dw - r300->cs->cdw)) {
 r300_flush(&r300->context, RADEON_FLUSH_ASYNC, NULL);
 }
 
diff --git a/src/gallium/drivers/r300/r300_cs.h 
b/src/gallium/drivers/r300/r300_cs.h
index 37f9641..fc15054 100644
--- a/src/gallium/drivers/r300/r300_cs.h
+++ b/src/gallium/drivers/r300/r300_cs.h
@@ -46,7 +46,7 @@
 #ifdef DEBUG
 
 #define BEGIN_CS(size) do { \
-assert(size <= (RADEON_MAX_CMDBUF_DWORDS - cs_copy->cdw)); \
+assert(size <= (cs_copy->max_dw - cs_copy->cdw)); \
 cs_count = size; \
 } while (0)
 
diff --git a/src/gallium/drivers/r300/r300_render.c 
b/src/gallium/drivers/r300/r300_render.c
index 9dffa49..0487b11 100644
--- a/src/gallium/drivers/r300/r300_render.c
+++ b/src/gallium/drivers/r300/r300_render.c
@@ -215,7 +215,7 @@ static boolean r300_reserve_cs_dwords(struct r300_context 
*r300,
 cs_dwords += r300_get_num_cs_end_dwords(r300);
 
 /* Reserve requested CS space. */
-if (cs_dwords > (RADEON_MAX_CMDBUF_DWORDS - r300->cs->cdw)) {
+if (cs_dwords > (r300->cs->max_dw - r300->cs->cdw)) {
 r300_flush(&r300->context, RADEON_FLUSH_ASYNC, NULL);
 flushed = TRUE;
 }
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 9155707..ad11e76 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -93,7 +93,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
num_dw += 10;
 
/* Flush if there's not enough space. */
-   if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
+   if (num_dw > ctx->b.rings.gfx.cs->max_dw) {
ctx->b.rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
}
 }
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 991fda8..bcb9390 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -493,7 +493,7 @@ struct r600_context {
 static inline void r600_emit_command_buffer(struct radeon_winsys_cs *cs,
struct r600_command_buffer *cb)
 {
-   assert(cs->cdw + cb->num_dw <= RADEON_MAX_CMDBUF_DWORDS);
+   assert(cs->cdw + cb->num_dw <= cs->max_dw);
memcpy(cs->buf + cs->cdw, cb->buf, 4 * cb->num_dw);
cs->cdw += cb->num_dw;
 }
@@ -826,7 +826,7 @@ static inline void 
r600_write_compute_context_reg_seq(struct radeon_winsys_cs *c
 static inline void r600_write_ctl_const_seq(struct radeon_winsys_cs *cs, 
unsigned reg, unsigned num)
 {
assert(reg >= R600_CTL_CONST_OFFSET);
-   assert(cs->cdw+2+num <= RADEON_MAX_CMDBUF_DWORDS);
+   assert(cs->cdw+2+num <= cs->max_dw);
cs->buf[cs->cdw++] = PKT3(PKT3_SET_CTL_CONST, num, 0);
cs->buf[cs->cdw++] = (reg - R600_CTL_CONST_OFFSET) >> 2;
 }
diff --git a/src/gallium/drivers/radeon/r600_cs.h 
b/src/gallium/drivers/radeon/r600_cs.h
index d6d4c88..03a04b7 100644
--- a/src/gallium/drivers/radeon/r600_cs.h
+++ b/src/gallium/drivers/radeon/r600_cs.h
@@ -77,7 +77,7 @@ static inline void r600_emit_reloc(struct r600_common_context 
*rctx,
 static inline void r600_write_config_reg_seq(struct radeon_winsys_cs *cs, 
unsigned reg, unsigned num)
 {
assert(reg < R600_CONTEXT_REG_OFFSET);
-   assert(cs->cdw+2+num <= RADEON_MAX_CMDBUF_DWORDS);
+   assert(cs->cdw+2+num <= cs->max_dw);
radeon_emit(cs, PKT3(PKT3_SET_CONFIG_REG, num, 0));
radeon_emit(cs, (reg - R600_CONFIG_REG_OFFSET) >> 2);
 }
@@ -91,7 +91,7 @@ static inline void r600_write_config_reg(struct 
radeon_winsys_cs *cs, unsigned r
 static inline void r600_write_context_reg_seq(struct radeon_winsys_cs *cs, 
unsigned reg, unsigned num)
 {
assert(reg >= R600_CONTEXT_REG_OFFSET);
-   assert(cs->cdw+2+num <= RADEON_M

Re: [Mesa-dev] [PATCH 08/18] radeonsi: don't use llvm.AMDIL.fraction for FRC and DFRAC

2015-07-31 Thread Marek Olšák
On Fri, Jul 31, 2015 at 5:13 PM, Tom Stellard  wrote:
> On Fri, Jul 31, 2015 at 04:59:19PM +0200, Marek Olšák wrote:
>> On Fri, Jul 31, 2015 at 4:18 PM, Tom Stellard  wrote:
>> > On Tue, Jul 28, 2015 at 12:05:43PM +0200, Marek Olšák wrote:
>> >> From: Marek Olšák 
>> >>
>> >> There are 2 reasons for this:
>> >> - LLVM optimization passes can work with floor
>> >> - there are patterns to select v_fract from floor anyway
>> >>
>> >> There is no change in the generated code.
>> >> ---
>> >>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 20 
>> >> 
>> >>  1 file changed, 16 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
>> >> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> >> index 319380f..5c08cf5 100644
>> >> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> >> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> >> @@ -1170,6 +1170,20 @@ static void emit_dneg(
>> >>   emit_data->args[0], "");
>> >>  }
>> >>
>> >> +static void emit_frac(
>> >> + const struct lp_build_tgsi_action * action,
>> >> + struct lp_build_tgsi_context * bld_base,
>> >> + struct lp_build_emit_data * emit_data)
>> >> +{
>> >> + LLVMBuilderRef builder = bld_base->base.gallivm->builder;
>> >> +
>> >> + LLVMValueRef floor = lp_build_intrinsic(builder, "floor", 
>> >> emit_data->dst_type,
>> >
>> > The intrinsic name should be "llvm.floor.f32" for float and 
>> > "llvm.floor.f64"
>> > for double.
>> >
>> > With that fixed, this is:
>> > Reviewed-by: Tom Stellard 
>>
>> Sorry, I have pushed this already. Is it really required to use
>> "llvm.floor.f*"? We've been using "floor" for FLR forever. We've also
>> been using "fabs" and "ceil". Are those wrong too?
>>
>
> It is better to use the intrinsic (i.e. llvm.*) functions, because
> they don't have side-effects like the libm calls, so they can be optimized
> better.

OK, thanks for the explanation.

Marek
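
As a small illustration of the naming convention discussed above: the
overloaded LLVM math intrinsics carry a type suffix, so float and double
need different names, while the bare names ("floor", "fabs", "ceil") are
the libm spellings. The helpers below (typed_intrinsic_name,
has_llvm_prefix) are hypothetical and are not part of Mesa or LLVM; this
is only a sketch of the string convention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "llvm.<op>.f32" or "llvm.<op>.f64" into buf. */
static const char *typed_intrinsic_name(char *buf, size_t size,
                                        const char *op, unsigned bits)
{
   assert(bits == 32 || bits == 64);
   int n = snprintf(buf, size, "llvm.%s.f%u", op, bits);
   assert(n > 0 && (size_t) n < size);   /* name must fit in the buffer */
   return buf;
}

/* Hypothetical check: does the name use the "llvm." intrinsic prefix? */
static int has_llvm_prefix(const char *name)
{
   return strncmp(name, "llvm.", 5) == 0;
}
```

With this, the FRC case would map to ("floor", 32) and the DFRAC case to
("floor", 64).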
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glx: Fix __glXWireToEvent for BufferSwapComplete

2015-07-31 Thread Adam Jackson
In the DRI2 path this event is magically synthesized from the
corresponding DRI2 event, but with Present, the server sends us the
event itself. The DRI2 path fills in the serial number, send_event, and
display fields of the XEvent struct that the app sees, but the Present
path did not.

This is likely related to a class of crashes seen in gtk/clutter apps:

https://bugzilla.redhat.com/attachment.cgi?id=1032631

Note that the crashing instruction is looking up the lock_fns slot in
the Display *, and %rdi (holding the Display *) is 0x1.
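
A minimal sketch of the send_event derivation used in the patch below,
assuming the usual core-protocol convention that the top bit of the wire
event type marks events generated by SendEvent. The struct and function
names are made up for illustration and are not Xlib or Mesa API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the common header fields of an XEvent. */
struct toy_event {
   int send_event;   /* nonzero if the event came from SendEvent */
   uint8_t code;     /* event code with the SendEvent bit masked off */
   void *display;    /* stand-in for the Display pointer */
};

static void toy_fill_common_fields(struct toy_event *ev, uint8_t wire_type,
                                   void *dpy)
{
   assert(ev != NULL);
   ev->send_event = (wire_type & 0x80) != 0;   /* same test as in the patch */
   ev->code = wire_type & 0x7f;
   ev->display = dpy;
}
```

If a conversion path skips this step, the display field keeps whatever
value happened to be in the struct, which would be consistent with the
bogus Display * seen in the crash.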

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Adam Jackson 
---
 src/glx/glxext.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/glx/glxext.c b/src/glx/glxext.c
index fdc24d4..dc87fb9 100644
--- a/src/glx/glxext.c
+++ b/src/glx/glxext.c
@@ -138,6 +138,9 @@ __glXWireToEvent(Display *dpy, XEvent *event, xEvent *wire)
   if (!glxDraw)
 return False;
 
+  aevent->serial = _XSetLastRequestRead(dpy, (xGenericReply *) wire);
+  aevent->send_event = (awire->type & 0x80) != 0;
+  aevent->display = dpy;
   aevent->event_type = awire->event_type;
   aevent->drawable = glxDraw->xDrawable;
   aevent->ust = ((CARD64)awire->ust_hi << 32) | awire->ust_lo;
-- 
2.4.3



[Mesa-dev] [PATCH] gallium/radeon: always use the llvm. prefix in intrinsic names

2015-07-31 Thread Marek Olšák
From: Marek Olšák 

---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 24 +++---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 5c08cf5..ac645b7 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -767,7 +767,7 @@ static void radeon_llvm_cube_to_2d_coords(struct 
lp_build_tgsi_context *bld_base
coords[i] = LLVMBuildExtractElement(builder, v,

lp_build_const_int32(gallivm, i), "");
 
-   coords[2] = lp_build_intrinsic(builder, "fabs",
+   coords[2] = lp_build_intrinsic(builder, "llvm.fabs.f32",
type, &coords[2], 1, LLVMReadNoneAttribute);
coords[2] = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_RCP, 
coords[2]);
 
@@ -1176,8 +1176,18 @@ static void emit_frac(
struct lp_build_emit_data * emit_data)
 {
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
-
-   LLVMValueRef floor = lp_build_intrinsic(builder, "floor", 
emit_data->dst_type,
+char *intr;
+
+if (emit_data->info->opcode == TGSI_OPCODE_FRC)
+intr = "llvm.floor.f32";
+else if (emit_data->info->opcode == TGSI_OPCODE_DFRAC)
+intr = "llvm.floor.f64";
+else {
+assert(0);
+return;
+}
+
+   LLVMValueRef floor = lp_build_intrinsic(builder, intr, 
emit_data->dst_type,
&emit_data->args[0], 1,
LLVMReadNoneAttribute);
emit_data->output[emit_data->chan] = LLVMBuildFSub(builder,
@@ -1425,7 +1435,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
lp_set_default_actions(bld_base);
 
bld_base->op_actions[TGSI_OPCODE_ABS].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_ABS].intr_name = "fabs";
+   bld_base->op_actions[TGSI_OPCODE_ABS].intr_name = "llvm.fabs.f32";
bld_base->op_actions[TGSI_OPCODE_AND].emit = emit_and;
bld_base->op_actions[TGSI_OPCODE_ARL].emit = emit_arl;
bld_base->op_actions[TGSI_OPCODE_BFI].emit = emit_bfi;
@@ -1434,7 +1444,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_BREV].intr_name = "llvm.AMDGPU.brev";
bld_base->op_actions[TGSI_OPCODE_BRK].emit = brk_emit;
bld_base->op_actions[TGSI_OPCODE_CEIL].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "ceil";
+   bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "llvm.ceil.f32";
bld_base->op_actions[TGSI_OPCODE_CLAMP].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_CLAMP].intr_name = "llvm.AMDIL.clamp.";
bld_base->op_actions[TGSI_OPCODE_CMP].emit = build_tgsi_intrinsic_nomem;
@@ -1443,7 +1453,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_COS].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_COS].intr_name = "llvm.cos.f32";
bld_base->op_actions[TGSI_OPCODE_DABS].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_DABS].intr_name = "fabs";
+   bld_base->op_actions[TGSI_OPCODE_DABS].intr_name = "llvm.fabs.f64";
bld_base->op_actions[TGSI_OPCODE_DFMA].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_DFMA].intr_name = "llvm.fma.f64";
bld_base->op_actions[TGSI_OPCODE_DFRAC].emit = emit_frac;
@@ -1462,7 +1472,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";
bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "floor";
+   bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32";
bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_FMA].intr_name = "llvm.fma.f32";
bld_base->op_actions[TGSI_OPCODE_FRC].emit = emit_frac;
-- 
2.1.4



Re: [Mesa-dev] [PATCH 14/18] util/register_allocate: Compute transitive conflicts using 2-passes

2015-07-31 Thread Jason Ekstrand
This patch looks really sketchy to me.  First, you fundamentally
changed some of register_allocate's internal data structures with no
explanation as to what the change is and why it still works.  Maybe
the difference should be obvious to me, but it isn't.  Second, it
appears that your changes assume some intel-specific details of
register allocation.  The register_allocate code is *not* Intel
specific and we are *not* the only ones using it.
--Jason
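
For anyone following the thread, here is a toy sketch of what a
"transitive" conflict means, as I understand the existing one-call
behaviour (not the two-pass scheme in the patch): the new register
conflicts with the base register and with everything the base register
already conflicts with. The toy_* names and the fixed 64-register bitmask
are assumptions made for the example; the real code in
src/util/register_allocate.c uses its own data structures.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_REGS 64

/* conflicts[r] is a bitmask of the registers that overlap register r. */
struct toy_regs {
   unsigned count;
   uint64_t conflicts[TOY_MAX_REGS];
};

static void toy_init(struct toy_regs *regs, unsigned count)
{
   assert(count <= TOY_MAX_REGS);
   regs->count = count;
   for (unsigned r = 0; r < count; r++)
      regs->conflicts[r] = UINT64_C(1) << r;   /* a register conflicts with itself */
}

static void toy_add_conflict(struct toy_regs *regs, unsigned a, unsigned b)
{
   regs->conflicts[a] |= UINT64_C(1) << b;
   regs->conflicts[b] |= UINT64_C(1) << a;
}

/* reg inherits base_reg's conflicts (which include base_reg itself). */
static void toy_add_transitive_conflict(struct toy_regs *regs,
                                        unsigned base_reg, unsigned reg)
{
   /* Snapshot first: the loop below modifies conflicts[base_reg]. */
   uint64_t inherited = regs->conflicts[base_reg];

   for (unsigned r = 0; r < regs->count; r++) {
      if (inherited & (UINT64_C(1) << r))
         toy_add_conflict(regs, reg, r);
   }
}
```

For example, with base registers 0 and 1, a pair register 2 covering both,
and a register 3 aliasing base register 1, register 3 ends up conflicting
with register 2 as well, but not with register 0.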

On Mon, Jul 6, 2015 at 3:33 AM, Chris Wilson  wrote:
> Avoid frequent use of reralloc() for tracking the conflicts list, and
> walking that list every time we add a transitive conflict, by making the
> observation we apply the indirect conflicts by combining the conflicts
> of a conflicting register in a second pass.
>
> Reduces brw_compiler_create() from 18351.5us to 4787.1us on my ivb
> i7-3720QM (in context that 18ms represents about 50% of the time it takes
> to start X, though why X instantiates an intel_screen at all remains a
> mystery).
>
> Signed-off-by: Chris Wilson 
> Cc: Matt Turner 
> Cc: Jason Ekstrand 
> Cc: Martin Peres 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 18 +++-
>  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 16 ++-
>  src/util/register_allocate.c   | 53 
> +-
>  src/util/register_allocate.h   |  2 +
>  4 files changed, 64 insertions(+), 25 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 8e5621d..7f87221 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -223,7 +223,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
> reg_width)
>  for (int base_reg = j;
>   base_reg < j + (class_sizes[i] + 1) / 2;
>   base_reg++) {
> -   ra_add_transitive_reg_conflict(regs, base_reg, reg);
> +   ra_mark_transitive_reg_conflict(regs, base_reg, reg);
>  }
>
>  reg++;
> @@ -237,7 +237,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
> reg_width)
>  for (int base_reg = j;
>   base_reg < j + class_sizes[i];
>   base_reg++) {
> -   ra_add_transitive_reg_conflict(regs, base_reg, reg);
> +   ra_mark_transitive_reg_conflict(regs, base_reg, reg);
>  }
>
>  reg++;
> @@ -246,6 +246,20 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
> reg_width)
> }
> assert(reg == ra_reg_count);
>
> +   reg = 0;
> +   for (int i = 0; i < class_count; i++) {
> +  int class_size = class_sizes[i];
> +  int class_reg_count = base_reg_count - (class_size - 1);
> +  if (devinfo->gen <= 5 && reg_width == 2)
> + class_size = (class_size + 1) / 2;
> +  for (int j = 0; j < class_reg_count; j++) {
> +for (int base_reg = j; base_reg < j + class_size; base_reg++)
> +   ra_add_transitive_reg_conflict(regs, base_reg, reg);
> +reg++;
> +  }
> +   }
> +   assert(reg == ra_reg_count);
> +
> /* Add a special class for aligned pairs, which we'll put delta_xy
>  * in on Gen <= 6 so that we can do PLN.
>  */
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> index 555c42e..93b7297 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> @@ -140,7 +140,7 @@ brw_vec4_alloc_reg_set(struct brw_compiler *compiler)
>  for (int base_reg = j;
>   base_reg < j + class_sizes[i];
>   base_reg++) {
> -   ra_add_transitive_reg_conflict(compiler->vec4_reg_set.regs, 
> base_reg, reg);
> +   ra_mark_transitive_reg_conflict(compiler->vec4_reg_set.regs, 
> base_reg, reg);
>  }
>
>  reg++;
> @@ -158,6 +158,20 @@ brw_vec4_alloc_reg_set(struct brw_compiler *compiler)
> }
> assert(reg == ra_reg_count);
>
> +   reg = 0;
> +   for (int i = 0; i < class_count; i++) {
> +  int class_reg_count = base_reg_count - (class_sizes[i] - 1);
> +  for (int j = 0; j < class_reg_count; j++) {
> +for (int base_reg = j;
> + base_reg < j + class_sizes[i];
> + base_reg++) {
> +   ra_add_transitive_reg_conflict(compiler->vec4_reg_set.regs, 
> base_reg, reg);
> +}
> +reg++;
> +  }
> +   }
> +   assert(reg == ra_reg_count);
> +
> ra_set_finalize(compiler->vec4_reg_set.regs, q_values);
>
> for (int i = 0; i < MAX_VGRF_SIZE; i++)
> diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
> index f5f7c04..2bbab7f 100644
> --- a/src/util/register_allocate.c
> +++ b/src/util/register_allocate.c
> @@ -83,19 +83,17 @@
>
>  struct ra_reg {
> BITSET_WORD *conflicts;
> -   unsigned int *conflict_list;
> -   u

Re: [Mesa-dev] [PATCH 14/18] util/register_allocate: Compute transitive conflicts using 2-passes

2015-07-31 Thread Chris Wilson
On Fri, Jul 31, 2015 at 09:30:36AM -0700, Jason Ekstrand wrote:
> This patch looks really sketchy to me.  First, you fundamentally
> changed some of register_allocate's internal data structures with no
> explanation as to what the change is and why it still works.  Maybe
> the difference should be obvious to me, but it isn't.  Second, it
> appears that your changes assume some intel-specific details of
> register allocation.  The register_allocate code is *not* Intel
> specific and we are *not* the only ones using it.

You'll note the generic interface is maintained and the custom interface
for Intel is where our overhead lies. And yes, we are the only ones
using this interface.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Mesa-dev] [PATCH 14/18] util/register_allocate: Compute transitive conflicts using 2-passes

2015-07-31 Thread Jason Ekstrand
On Fri, Jul 31, 2015 at 9:38 AM, Chris Wilson  wrote:
> On Fri, Jul 31, 2015 at 09:30:36AM -0700, Jason Ekstrand wrote:
>> This patch looks really sketchy to me.  First, you fundamentally
>> changed some of register_allocate's internal data structures with no
>> explanation as to what the change is and why it still works.  Maybe
>> the difference should be obvious to me, but it isn't.  Second, it
>> appears that your changes assume some intel-specific details of
>> register allocation.  The register_allocate code is *not* Intel
>> specific and we are *not* the only ones using it.
>
> You'll note the generic interface is maintained and the custom interface
> for Intel is where our overhead lies. And yes, we are the only ones
> using this interface.

Even if that is the case, you still haven't documented what the change
does, how it works, and why it produces the same results.
--Jason

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [Mesa-dev] IROUND, math errors, etc. (was: Re: proposed patch optimized F_TO_I for powerpc platforms)

2015-07-31 Thread Matt Turner
On Fri, Jul 31, 2015 at 7:13 AM, Roland Scheidegger  wrote:
> CC mesa-dev.
>
> This looks good to me. I am starting to wonder though why we don't just
> use lrintf() and let the compiler sort it out (for x86 too).
> Though actually some quick experiments show that:
> - llvm's clang will always use libm lrintf call. Which then will do
> (x86_64) cvtss2si %xmm0,%rax as expected. Meaning the cost is probably
> twice as high as it could be due to the unnecessary library call.
> - gcc will also use the same library call. Unless you specify
> -fno-math-errno (or some more aggressive math optimizing stuff), in
> which case it will do the cvtss2si on its own. Which is fairly stupid,
> because this function doesn't set errno in any case, so it could be used
> independent of -fno-math-errno.
>
> Speaking of -fno-math-errno, why don't we use that in mesa? I know the
> fast math stuff can be problematic, but no one is _ever_ interested in
> math error numbers.
>
> Speaking of which, I'm not really sure why IROUND isn't doing the same.
> Yes it rounds away from zero, but I doubt that matters - would probably
> be better to match whatever rounding is used in hw (GL doesn't seem to
> specify tie-breaker rules for round to nearest afaict).
>
> FWIW IROUND along with even the 64bit sibling IROUND64 (and IROUND_POS)
> is not even really correct in any case. There exist floats where f +
> 0.5f will round up to the next integer incorrectly. e.g. something like
> "largest float smaller than 63.5f", 63.499f or so, if you add +0.5f
> the resulting number for the hw is right between that largest float
> smaller than 63.5f and 64.0f, and thus it will use the tie-breaker rule
> (round to nearest even for your typical hw with typical rounding mode
> set) making this 64.0, thus the rounded integer will be 64, which is
> just plain wrong no matter the round-to-nearest tie breaker rule.
> There are ways to fix it (the obvious one is to add 0.5 as double), but
> I don't think we should even try that, and assume lrintf can do a decent
> job on hw we care about (compiler not doing its job right is a pity but
> might not be too bad even if it uses lib call).

I've actually got a branch to get rid of F_TO_I (and I want to remove
IROUND as well) in favor of libm rounding functions.

I agree that we don't care about errno and traps and such, so I tried
a few things to get the code we want from rintf, etc. I tried marking
a wrapper around rintf with __attribute__((optimize("-ffast-math")))
but just today a gcc developer confirmed that this cannot work because
when the function is inlined it loses the optimization attribute. I'll
do some tests with -fno-math-errno and friends.

I'll finish this branch up very soon.
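
To make the "f + 0.5f" problem from the quoted mail concrete, here is a
minimal sketch. It uses the largest float below 0.5f as the input (rather
than the value near 63.5 mentioned above) because the arithmetic is easy
to check by hand: the float sum lands exactly halfway between two
representable values and rounds up to 1.0f. The function names are
hypothetical, and the double variant is just the "add 0.5 as double" fix
mentioned in the quote, not what the branch does.

```c
#include <assert.h>

/* Naive round-to-nearest for non-negative floats: add 0.5f, then truncate. */
static int iround_pos_float(float f)
{
   assert(f >= 0.0f);
   volatile float sum = f + 0.5f;   /* volatile forces rounding to float precision */
   return (int) sum;
}

/* Same idea, but the addition is done in double, where it is exact here. */
static int iround_pos_double(float f)
{
   assert(f >= 0.0f);
   return (int) ((double) f + 0.5);
}
```

With f = 0x1.fffffep-2f (just below 0.5), the float version returns 1 and
the double version returns 0, assuming IEEE 754 single precision and the
default round-to-nearest-even mode.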


Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Roland Scheidegger
I don't think that's quite true in general.
For GPUs which have combined ds buffers I can't see why you'd want to do
separate clears for depth and stencil in this case (i.e. doing
pipe->clear for depth, then drawing a quad to clear stencil).
At least for "simple" hw like llvmpipe which don't have special depth
clear, this clearly seems to be much worse (you have to go through the
memory twice).

I vaguely remember something like this being proposed some time ago with
some discussion that not the same thing is optimal depending on the
hw... I don't think though there's anything at the moment where you
could figure out what is better.
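
To make the trade-off concrete, here is a toy sketch of the merging step
that the quoted patch below removes: when part of depth/stencil needs the
quad fallback and the rest could use pipe->clear, everything is moved to
the quad path so the buffer is touched once. The TOY_* bit values and the
struct are assumptions for the example and are not taken from the Gallium
headers.

```c
#include <assert.h>

#define TOY_CLEAR_DEPTH        (1u << 0)
#define TOY_CLEAR_STENCIL      (1u << 1)
#define TOY_CLEAR_DEPTHSTENCIL (TOY_CLEAR_DEPTH | TOY_CLEAR_STENCIL)

struct toy_clear_split {
   unsigned quad_buffers;    /* cleared by drawing a quad */
   unsigned clear_buffers;   /* cleared via the fast clear path */
};

/* Mirror of the removed block: never split depth and stencil across paths. */
static struct toy_clear_split toy_merge_depth_stencil(unsigned quad_buffers,
                                                      unsigned clear_buffers)
{
   /* At most two bits are involved here; this guards against stray flags. */
   assert(((quad_buffers | clear_buffers) & ~TOY_CLEAR_DEPTHSTENCIL) == 0);

   if ((quad_buffers & TOY_CLEAR_DEPTHSTENCIL) &&
       (clear_buffers & TOY_CLEAR_DEPTHSTENCIL)) {
      quad_buffers |= clear_buffers & TOY_CLEAR_DEPTHSTENCIL;
      clear_buffers &= ~TOY_CLEAR_DEPTHSTENCIL;
   }

   struct toy_clear_split out = { quad_buffers, clear_buffers };
   return out;
}
```

With stencil on the quad path and depth on the fast path, the merged
result clears both with a single quad; without the merge, a combined
buffer is walked twice.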

Roland



Am 31.07.2015 um 17:15 schrieb Marek Olšák:
> From: Marek Olšák 
> 
> A lot of GPUs allocate separate depth and stencil buffers, so clearing them
> together doesn't make much sense. If some GPUs don't allocate separate
> depth & stencil, it's still beneficial to clear the HiZ / HiS information
> for only one of the two.
> ---
>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_cb_clear.c 
> b/src/mesa/state_tracker/st_cb_clear.c
> index 137fac8..1e404a2 100644
> --- a/src/mesa/state_tracker/st_cb_clear.c
> +++ b/src/mesa/state_tracker/st_cb_clear.c
> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>}
> }
>  
> -   /* Always clear depth and stencil together.
> -* This can only happen when the stencil writemask is not a full mask.
> -*/
> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
> -   }
> -
> /* Only use quad-based clearing for the renderbuffers which cannot
>  * use pipe->clear. We want to always use pipe->clear for the other
>  * renderbuffers, because it's likely to be faster.
> 



Re: [Mesa-dev] [PATCH 0/4] Implementation of glMemoryBarrierByRegion

2015-07-31 Thread Matt Turner
On Fri, Jul 31, 2015 at 5:15 AM, Marta Lofstedt
 wrote:
> This provides an i965 implementation of the
> OpenGL ES 3.1 needed function, glMemoryBarrierByRegion.
>
> Marta Lofstedt (4):
>   gles/es3.1: Enable dispatch of glMemoryBarrierByRegion
>   mesa/es3.1: Add driver interface for glMemoryBarrierByRegion
>   mesa/es3.1: Implement the entry point of MemoryBarrierByRegion
>   i965/es3.1: Implement glMemoryBarrierByRegion

Patches 1-3 are

Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH 1/4] gles/es3.1: Enable dispatch of glMemoryBarrierByRegion

2015-07-31 Thread Ilia Mirkin
Won't this cause a compilation failure in mesa? I thought it'd start
looking for the _mesa_MemoryBarrierByRegion function. Normally this is
added as a stub in the first commit, and then the real impl comes
later.

On Fri, Jul 31, 2015 at 8:15 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mapi/glapi/gen/gl_API.xml   | 4 
>  src/mesa/main/tests/dispatch_sanity.cpp | 3 +--
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index 658efa4..3db4349 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -2966,6 +2966,10 @@
>  
>  
>  
> +
> +
> +
> +
>  
>
>  
> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp 
> b/src/mesa/main/tests/dispatch_sanity.cpp
> index af89d2c..14c9eda 100644
> --- a/src/mesa/main/tests/dispatch_sanity.cpp
> +++ b/src/mesa/main/tests/dispatch_sanity.cpp
> @@ -2461,8 +2461,7 @@ const struct function gles31_functions_possible[] = {
> { "glGetBooleani_v", 31, -1 },
> { "glMemoryBarrier", 31, -1 },
>
> -   // FINISHME: This function has not been implemented yet.
> -   // { "glMemoryBarrierByRegion", 31, -1 },
> +   { "glMemoryBarrierByRegion", 31, -1 },
>
> { "glTexStorage2DMultisample", 31, -1 },
> { "glGetMultisamplefv", 31, -1 },
> --
> 1.9.1
>


Re: [Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.

2015-07-31 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez  
> wrote:
>> Jason Ekstrand  writes:
>>
>>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez  
>>> wrote:
>>>> Jason Ekstrand  writes:
>>>>
>>>>> This all looks correct as far as I can tell.  However, I'm very
>>>>> concerned about the number of checks such as
>>>>> has_matching_typed_format() that are built-in to the compiler (via
>>>>> surface_builder) where we then go on to do something that is highly
>>>>> dependent on state setup doing the exact same check (not via
>>>>> surface_builder) and doing the right thing.  Another example would be
>>>>> the interplay with has_split_bit_layout.  How do we know these won't
>>>>> get out of sync?
>>>>>
>>>> They cannot get out of sync because the policy which surface format to
>>>> use for a given GLSL format is shared between the compiler and the
>>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>>>> surface state set-up for shader images.".
>>>
>>> I came back to this today and looked at it again.  I think my main
>>> objection is that all of these metadata functions work in terms of a
>>> devinfo and a format and then immediately call
>>> brw_lower_mesa_image_format.  Why don't they work instead in terms
>>> comparing the lowered format to the nominal format?  Then it would be
>>> a lot more obvious that you're doing a transformation from one format
>>> to another.  As is, it's a bunch of magic "what do I have to do next"
>>> queries
>>
>> I don't think any of these functions implies what to do next, they all
>> encode clearly defined device-specific properties of the formats:
>>
>>  - Does the device support any format with the same bit layout?
>>(has_supported_bit_layout())
>>
>>  - Does the device support any format with the same bit layout *and*
>>encoding? (is_conversion_trivial())
>>
>>  - Does the device support a typed format of the same size?
>>(has_matching_typed_format())
>>
>>  - Does the device split the pixel data into a number of discontiguous
>>segments? (has_split_bit_layout())
>>
>>  - Does the device return garbage in the unused bits when reading from
>>the given format? (has_undefined_high_bits())
>>
>>  - Do all fields of the format have the same size? (This is not even
>>device-specific)
>>
>>  - Is the format represented as a signed integer in memory? (Ditto)
>>
>> You may have noticed that except for the last two the natural subject
>> and object of all questions are "device" and "format" respectively --
>> I'm not sure I see how rephrasing them to involve the lowered format
>> (that in fact a number of them don't care about) would make any of these
>> questions easier to understand.
>
> I *understand* the questions being asked.  My claim is this is the
> *wrong* set of questions.  By the time you get to this stage, someone
> higher up has made the decision that they're not going to give you
> format A but will rather give you format B which they know will also
> work.  (This decision is made by code that understands the hardware
> limitations and you're guaranteed that B is a format you can read and
> transform into A.)  The fact that that decision is passed down through
> brw_lower_mesa_image_format is secondary.  At this stage, you have two
> formats, A and B and need to figure out how to get from one to the
> other.
>
> For example, instead of "Does the device support a typed format of the
> same size?" the better question would be "Are these two formats the
> same size?".

Those two questions are not equivalent, lowered formats are always the
same size as the original, whether a format is supported for typed
surface access or not is inherently devinfo-specific, the devinfo
argument couldn't be replaced with a lower_format argument.  The same
goes for "Does the device return garbage in the unused bits when reading
from the given format?".

> Instead of asking "Does the device support any format with the same
> bit layout *and* encoding?" you should ask "Do these two formats have
> the same bit layout *and* encoding?"
>
*Shrug*, I think they're only wrong questions if you start from the
assumption that the policy that computes the lowered format from a given
image format is broken -- E.g. that it gives you a suboptimal format of
different encoding even though the hardware supports an identically
encoded one.  You can quickly rule out that assumption by looking up the
definition of is_conversion_trivial() and seeing that it does little
more than call brw_lower_mesa_image_format().

I'm afraid that I was slightly misleading earlier:
is_conversion_trivial() is really about the encoding and only has
indirect bit layout implications.  The current implementation is based
on the assumption that the lowered format we're converting from or to is
just the canonical one -- I.e. it cannot be readily extended to check
whether the conversion between any two arbitrary formats is trivial, so
even though replacing the devin

[Mesa-dev] [PATCH 1/5] ra: Refactor ra_set_finalize

2015-07-31 Thread Jason Ekstrand
All this commit does is change an early return to an if with an else
clause.
---
 src/util/register_allocate.c | 51 ++--
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 95be20f..129d58d 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -321,32 +321,31 @@ ra_set_finalize(struct ra_regs *regs, unsigned int 
**q_values)
 regs->classes[b]->q[c] = q_values[b][c];
 }
   }
-  return;
-   }
-
-   /* Compute, for each class B and C, how many regs of B an
-* allocation to C could conflict with.
-*/
-   for (b = 0; b < regs->class_count; b++) {
-  for (c = 0; c < regs->class_count; c++) {
-unsigned int rc;
-int max_conflicts = 0;
-
-for (rc = 0; rc < regs->count; rc++) {
-   int conflicts = 0;
-   unsigned int i;
-
-if (!reg_belongs_to_class(rc, regs->classes[c]))
-  continue;
-
-   for (i = 0; i < regs->regs[rc].num_conflicts; i++) {
-  unsigned int rb = regs->regs[rc].conflict_list[i];
-  if (reg_belongs_to_class(rb, regs->classes[b]))
- conflicts++;
-   }
-   max_conflicts = MAX2(max_conflicts, conflicts);
-}
-regs->classes[b]->q[c] = max_conflicts;
+   } else {
+  /* Compute, for each class B and C, how many regs of B an
+   * allocation to C could conflict with.
+   */
+  for (b = 0; b < regs->class_count; b++) {
+ for (c = 0; c < regs->class_count; c++) {
+unsigned int rc;
+int max_conflicts = 0;
+
+for (rc = 0; rc < regs->count; rc++) {
+   int conflicts = 0;
+   unsigned int i;
+
+   if (!reg_belongs_to_class(rc, regs->classes[c]))
+  continue;
+
+   for (i = 0; i < regs->regs[rc].num_conflicts; i++) {
+  unsigned int rb = regs->regs[rc].conflict_list[i];
+  if (reg_belongs_to_class(rb, regs->classes[b]))
+ conflicts++;
+   }
+   max_conflicts = MAX2(max_conflicts, conflicts);
+}
+regs->classes[b]->q[c] = max_conflicts;
+ }
   }
}
 }
-- 
2.4.3



[Mesa-dev] [PATCH 0/5] Some register allocation improvements

2015-07-31 Thread Jason Ekstrand
The following 5 patches contain a few register allocation cleanups and
performance improvements.  Chris Wilson noticed that setting up register
sets on i965 calls reralloc an absurd number of times.  I did a little
hacking and found out that the initial size for the collision lists is way
too low.  This series also contains a patch to avoid setting up registers
more times than needed on platforms where RA is the same for SIMD8 vs
SIMD16.

The whole series seems to cut about 4 minutes off a piglit run on BYT.  It
usually takes around 31 minutes and this time it ran in 27.

Jason Ekstrand (5):
  ra: Refactor ra_set_finalize
  ra: Delete the conflict lists in ra_set_finalize
  ra: Allocate bigger initial conflict lists
  i965/fs: Use dispatch_width instead of reg_width in alloc_reg_sets
  i965/fs: Don't do redundant RA setup on IVB+

 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 25 ++
 src/util/register_allocate.c  | 58 ---
 2 files changed, 48 insertions(+), 35 deletions(-)

-- 
2.4.3



[Mesa-dev] [PATCH 2/5] ra: Delete the conflict lists in ra_set_finalize

2015-07-31 Thread Jason Ekstrand
I'm not 100% sure that this is the right patch.  Instead of baking 256 into
the allocator, we could allow the user to pass in an initial constant.
Since the maximum is statically known, we could also make said constant a
hard limit and allocate everything up-front in a single array and save all
the allocations.

-->8-

Subject: [PATCH 3/5] ra: Allocate bigger initial conflict lists

We allocate register sets three times: once each for SIMD4x2, SIMD8, and
SIMD16.  On Broadwell, hacking up the register allocate code yields:

   Average number of conflicts: 241.892120 (16 min, 376 max)
   846 over 256, 86 under 128, 1928 total
   Average number of conflicts: 241.892120 (16 min, 376 max)
   846 over 256, 86 under 128, 1928 total
   Average number of conflicts: 239.732056 (16 min, 376 max)
   718 over 256, 86 under 128, 1672 total

where the first two are for SIMD8/16 and the last is for SIMD4x2.  On
Ironlake, however, the numbers are a bit different for FS:

   Average number of conflicts: 30.161085 (5 min, 53 max)
   Average number of conflicts: 14.596154 (2 min, 25 max)
   Average number of conflicts: 241.892120 (16 min, 376 max)

This is because we don't support send-from-GRF on ILK and so we don't need
as much wiggle-room in the allocator.  That said, once we add support for
indirect addressing in the FS backend, these numbers will jump back to
the level we have for BDW.

According to callgrind, this takes the number of reralloc_size calls during
screen creation from 35,343 to 2,467 on BDW on a glxgears run.
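
A back-of-the-envelope sketch of why the initial size matters, assuming
the conflict list doubles its capacity each time it fills up (that growth
policy is an assumption of this example, and count_growth_steps is a
made-up helper, not allocator code):

```c
#include <assert.h>

/* Number of capacity doublings needed before a list that starts with
 * `initial` slots can hold `needed` entries. */
static unsigned count_growth_steps(unsigned initial, unsigned needed)
{
   assert(initial > 0);
   unsigned steps = 0;
   for (unsigned capacity = initial; capacity < needed; capacity *= 2)
      steps++;
   return steps;
}
```

Under that assumption, a register with the maximum of 376 conflicts needs
7 reallocations starting from 4 slots but only 1 starting from 256, and a
register near the average of about 241 needs 6 versus none.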

Cc: Chris Wilson 
Cc: Eric Anholt 
Cc: Matt Turner 
Cc: Kenneth Graunke 

---
 src/util/register_allocate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 436e008..570e13f 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -197,8 +197,8 @@ ra_alloc_reg_set(void *mem_ctx, unsigned int count)
   BITSET_WORDS(count));
   BITSET_SET(regs->regs[i].conflicts, i);
 
-  regs->regs[i].conflict_list = ralloc_array(regs->regs, unsigned int, 4);
-  regs->regs[i].conflict_list_size = 4;
+  regs->regs[i].conflict_list = ralloc_array(regs->regs, unsigned int, 
256);
+  regs->regs[i].conflict_list_size = 256;
   regs->regs[i].conflict_list[0] = i;
   regs->regs[i].num_conflicts = 1;
}
-- 
2.4.3



[Mesa-dev] [PATCH 4/5] i965/fs: Use dispatch_width instead of reg_width in alloc_reg_sets

2015-07-31 Thread Jason Ekstrand
reg_width is kind of an outdated concept.
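
For reference, a tiny sketch of the mapping the patch relies on: the old
reg_width of 1 or 2 corresponds to a dispatch width of 8 or 16, so the
register-set index becomes (dispatch_width / 8) - 1. The helper name is
hypothetical.

```c
#include <assert.h>

/* Register-set slot for a given SIMD dispatch width (8 -> 0, 16 -> 1). */
static int toy_reg_set_index(int dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   return (dispatch_width / 8) - 1;
}
```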
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 6a7ed64..211f70e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -73,11 +73,11 @@ fs_visitor::assign_regs_trivial()
 }
 
 static void
-brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width)
+brw_alloc_reg_set(struct brw_compiler *compiler, int dispatch_width)
 {
const struct brw_device_info *devinfo = compiler->devinfo;
int base_reg_count = BRW_MAX_GRF;
-   int index = reg_width - 1;
+   int index = (dispatch_width / 8) - 1;
 
/* The registers used to make up almost all values handled in the compiler
 * are a scalar value occupying a single register (or 2 registers in the
@@ -121,7 +121,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width)
/* Compute the total number of registers across all classes. */
int ra_reg_count = 0;
for (int i = 0; i < class_count; i++) {
-  if (devinfo->gen <= 5 && reg_width == 2) {
+  if (devinfo->gen <= 5 && dispatch_width == 16) {
  /* From the G45 PRM:
   *
   * In order to reduce the hardware complexity, the following
@@ -168,7 +168,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width)
int pairs_reg_count = 0;
for (int i = 0; i < class_count; i++) {
   int class_reg_count;
-  if (devinfo->gen <= 5 && reg_width == 2) {
+  if (devinfo->gen <= 5 && dispatch_width == 16) {
  class_reg_count = (base_reg_count - (class_sizes[i] - 1)) / 2;
 
  /* See comment below.  The only difference here is that we are
@@ -214,7 +214,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width)
  pairs_reg_count = class_reg_count;
   }
 
-  if (devinfo->gen <= 5 && reg_width == 2) {
+  if (devinfo->gen <= 5 && dispatch_width == 16) {
  for (int j = 0; j < class_reg_count; j++) {
 ra_class_add_reg(regs, classes[i], reg);
 
@@ -249,7 +249,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width)
/* Add a special class for aligned pairs, which we'll put delta_xy
 * in on Gen <= 6 so that we can do PLN.
 */
-   if (devinfo->has_pln && reg_width == 1 && devinfo->gen <= 6) {
+   if (devinfo->has_pln && dispatch_width == 8 && devinfo->gen <= 6) {
   aligned_pairs_class = ra_alloc_reg_class(regs);
 
   for (int i = 0; i < pairs_reg_count; i++) {
@@ -287,8 +287,8 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width)
 void
 brw_fs_alloc_reg_sets(struct brw_compiler *compiler)
 {
-   brw_alloc_reg_set(compiler, 1);
-   brw_alloc_reg_set(compiler, 2);
+   brw_alloc_reg_set(compiler, 8);
+   brw_alloc_reg_set(compiler, 16);
 }
 
 static int
-- 
2.4.3



[Mesa-dev] [PATCH 5/5] i965/fs: Don't do redundant RA setup on IVB+

2015-07-31 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 211f70e..512da22 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -79,6 +79,15 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int dispatch_width)
int base_reg_count = BRW_MAX_GRF;
int index = (dispatch_width / 8) - 1;
 
+   if (dispatch_width > 8 && devinfo->gen >= 7) {
+  /* For IVB+, we don't need the PLN hacks or the 2-reg alignment in
+   * SIMD16.  Therefore, we can use the exact same register sets for
+   * SIMD16 as we do for SIMD8 and we don't need to recalculate them.
+   */
+  compiler->fs_reg_sets[index] = compiler->fs_reg_sets[0];
+  return;
+   }
+
/* The registers used to make up almost all values handled in the compiler
 * are a scalar value occupying a single register (or 2 registers in the
 * case of SIMD16, which is handled by dividing base_reg_count by 2 and
-- 
2.4.3
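
The early return added above can be sketched in isolation. Everything
here is a stand-in: sketch_compiler, its fields, and the placeholder
values are hypothetical and only model "build once, reuse for SIMD16 on
gen >= 7"; they are not the real brw_compiler layout.

```c
#include <assert.h>

struct sketch_compiler {
   int gen;          /* hardware generation, e.g. 7 for IVB */
   int reg_sets[2];  /* stand-in for the per-width register sets */
   int builds;       /* how many times the expensive setup ran */
};

static void
sketch_alloc_reg_set(struct sketch_compiler *c, int dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   int index = (dispatch_width / 8) - 1;

   if (dispatch_width > 8 && c->gen >= 7) {
      /* Reuse the SIMD8 set instead of recomputing it. */
      c->reg_sets[index] = c->reg_sets[0];
      return;
   }

   /* Placeholder for the expensive register-set construction. */
   c->builds++;
   c->reg_sets[index] = dispatch_width * 100 + c->builds;
}

static void
sketch_alloc_reg_sets(struct sketch_compiler *c)
{
   /* SIMD8 has to be built first so that SIMD16 can alias it. */
   sketch_alloc_reg_set(c, 8);
   sketch_alloc_reg_set(c, 16);
}
```

Note that the shortcut depends on call order: the SIMD8 set must already
exist when the SIMD16 call is made, as it does in brw_fs_alloc_reg_sets.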



Re: [Mesa-dev] IROUND, math errors, etc.

2015-07-31 Thread Roland Scheidegger
Am 31.07.2015 um 18:44 schrieb Matt Turner:
> On Fri, Jul 31, 2015 at 7:13 AM, Roland Scheidegger  
> wrote:
>> CC mesa-dev.
>>
>> This looks good to me. I am starting to wonder though why we don't just
>> use lrintf() and let the compiler sort it out (for x86 too).
>> Though actually some quick experiments show that:
>> - llvm's clang will always use libm lrintf call. Which then will do
>> (x86_64) cvtss2si %xmm0,%rax as expected. Meaning the cost is probably
>> twice as high as it could be due to the unnecessary library call.
>> - gcc will also use the same library call. Unless you specify
>> -fno-math-errno (or some more aggressive math optimizing stuff), in
>> which case it will do the cvtss2si on its own. Which is fairly stupid,
>> because this function doesn't set errno in any case, so it could be used
>> independent of -fno-math-errno.
>>
>> Speaking of -fno-math-errno, why don't we use that in mesa? I know the
>> fast math stuff can be problematic, but no one is _ever_ interested in
>> math error numbers.
>>
>> Speaking of which, I'm not really sure why IROUND isn't doing the same.
>> Yes it rounds away from zero, but I doubt that matters - would probably
>> be better to match whatever rounding is used in hw (GL doesn't seem to
>> specify tie-breaker rules for round to nearest afaict).
>>
>> FWIW IROUND along with even the 64bit sibling IROUND64 (and IROUND_POS)
>> is not even really correct in any case. There exist floats where f +
>> 0.5f will round up to the next integer incorrectly. e.g. something like
>> "largest float smaller than 63.5f", 63.499f or so, if you add +0.5f
>> the resulting number for the hw is right between that largest float
>> smaller than 63.5f and 64.0f, and thus it will use the tie-breaker rule
>> (round to nearest even for your typical hw with typical rounding mode
>> set) making this 64.0, thus the rounded integer will be 64, which is
>> just plain wrong no matter the round-to-nearest tie breaker rule.
>> There are ways to fix it (the obvious one is to add 0.5 as double), but
>> I don't think we should even try that, and assume lrintf can do a decent
>> job on hw we care about (compiler not doing its job right is a pity but
>> might not be too bad even if it uses lib call).
> 
> I've actually got a branch to get rid of F_TO_I (and I want to remove
> IROUND as well) in favor of libm rounding functions.
> 
> I agree that we don't care about errno and traps and such, so I tried
> a few things to get the code we want from rintf, etc. I tried marking
> a wrapper around rintf with __attribute__((optimize("-ffast-math")))
> but just today a gcc developer confirmed that this cannot work because
> when the function is inlined it loses the optimization attribute. I'll
> do some tests with -fno-math-errno and friends.
> 
> I'll finish this branch up very soon.
> 

Well for F_TO_I and IROUND we really want lrintf, not rintf (which is
much easier on the (x86) cpu - rintf requires sse4.1 for a trivial
implementation, lrintf just sse so at least on x86_64 you always get it
which is quite a benefit for "standard" compilation). I assume though
gcc only inlining it with -fno-math-errno is really a gcc bug - the
function doesn't set error number, and exceptions will happen just the
same no matter if you use the libm call or do it directly, both depend
on current exception settings. Though I don't know why clang didn't want
to inline it at all no matter what, seems kind of silly.
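
The f + 0.5f hazard mentioned in the quoted text can be reproduced with a
small sketch. It assumes IEEE 754 single precision and the default
round-to-nearest-even mode. The helper names are hypothetical and only
model the "add 0.5 and truncate" idea for non-negative inputs, not the
actual IROUND macro. As far as I can tell the largest float below 63.5f
still sums exactly, so the sketch uses the largest float below 0.5f,
where the sum lands exactly between two floats.

```c
#include <assert.h>

/* Naive rounding: add 0.5f in single precision and truncate.
 * The volatile store forces the sum to be rounded to float even on
 * targets that evaluate with excess precision. */
static int
iround_pos_naive(float f)
{
   assert(f >= 0.0f);
   volatile float sum = f + 0.5f;
   return (int) sum;
}

/* Same idea, but the addition is done in double, where the sum of a
 * float and 0.5 is exact for inputs of this magnitude. */
static int
iround_pos_double(float f)
{
   assert(f >= 0.0f);
   return (int) ((double) f + 0.5);
}
```

For 0.49999997f (the largest float below 0.5f) the naive version returns
1 because the single-precision sum is a tie that rounds to 1.0f, while
the double version returns 0.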

Thanks for looking into it.

Roland



[Mesa-dev] [PATCH 2/5] ra: Delete the conflict lists in ra_set_finalize

2015-07-31 Thread Jason Ekstrand
They are never used after the set is finalized so there's no reason to keep
them around.
---
 src/util/register_allocate.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 129d58d..436e008 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -348,6 +348,11 @@ ra_set_finalize(struct ra_regs *regs, unsigned int **q_values)
  }
   }
}
+
+   for (b = 0; b < regs->count; b++) {
+  ralloc_free(regs->regs[b].conflict_list);
+  regs->regs[b].conflict_list = NULL;
+   }
 }
 
 static void
-- 
2.4.3



[Mesa-dev] [PATCH 1/2 v1.1] clover: make dispatch matches functions def

2015-07-31 Thread EdB
---
 src/gallium/state_trackers/clover/api/dispatch.hpp | 23 +-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/dispatch.hpp b/src/gallium/state_trackers/clover/api/dispatch.hpp
index ffae1ae..781b54e 100644
--- a/src/gallium/state_trackers/clover/api/dispatch.hpp
+++ b/src/gallium/state_trackers/clover/api/dispatch.hpp
@@ -693,7 +693,13 @@ struct _cl_icd_dispatch {
CL_API_ENTRY cl_int (CL_API_CALL *clUnloadPlatformCompiler)(
   cl_platform_id platform);
 
-   void *clGetKernelArgInfo;
+   CL_API_ENTRY cl_int (CL_API_CALL *clGetKernelArgInfo)(
+  cl_kernel kernel,
+  cl_uint arg_indx,
+  cl_kernel_arg_info  param_name,
+  size_t param_value_size,
+  void *param_value,
+  size_t *param_value_size_ret);
 
CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueFillBuffer)(
   cl_command_queue command_queue,
@@ -701,7 +707,7 @@ struct _cl_icd_dispatch {
   const void *pattern,
   size_t pattern_size,
   size_t offset,
-  size_t cb,
+  size_t size,
   cl_uint num_events_in_wait_list,
   const cl_event *event_wait_list,
   cl_event *event);
@@ -710,13 +716,20 @@ struct _cl_icd_dispatch {
   cl_command_queue command_queue,
   cl_mem image,
   const void *fill_color,
-  const size_t origin[3],
-  const size_t region[3],
+  const size_t *origin,
+  const size_t *region,
   cl_uint num_events_in_wait_list,
   const cl_event *event_wait_list,
   cl_event *event);
 
-   void *clEnqueueMigrateMemObjects;
+   CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueMigrateMemObjects)(
+  cl_command_queue command_queue,
+  cl_uint num_mem_objects,
+  const cl_mem *mem_objects,
+  cl_mem_migration_flags flags,
+  cl_uint num_events_in_wait_list,
+  const cl_event *event_wait_list,
+  cl_event *event);
 
CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueMarkerWithWaitList)(
   cl_command_queue command_queue,
-- 
2.5.0



Re: [Mesa-dev] [PATCH 4/4] i965/es3.1: Implement glMemoryBarrierByRegion

2015-07-31 Thread Matt Turner
On Fri, Jul 31, 2015 at 5:15 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 34 
> +
>  1 file changed, 34 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> index 85e271d..332d84e 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -226,6 +226,39 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield barriers)
> brw_emit_pipe_control_flush(brw, bits);
>  }
>
> +static void
> +brw_memory_barrier_by_region(struct gl_context *ctx, GLbitfield barriers)
> +{
> +   GLbitfield all_allowed_bits = GL_ATOMIC_COUNTER_BARRIER_BIT |
> +  GL_FRAMEBUFFER_BARRIER_BIT |
> +  GL_SHADER_IMAGE_ACCESS_BARRIER_BIT |
> +  GL_SHADER_STORAGE_BARRIER_BIT |
> +  GL_TEXTURE_FETCH_BARRIER_BIT |
> +  GL_UNIFORM_BARRIER_BIT;

Indent these to match GL_ATOMIC_COUNTER_BARRIER_BIT.

Also, GL_ATOMIC_COUNTER_BARRIER_BIT and GL_SHADER_STORAGE_BARRIER_BIT
are not handled by brw_memory_barrier(). I know the latter is part of
an in-progress feature, but are we missing something for atomic
counters?

> +   /*
> +* According to OpenGL ES 3.1 spec. April 29, 2015, 7.11.2:
> +* "When barriers are ALL_BARRIERS_BIT, shader memory access
> +* will be synchronized realtive to all theese barrier bits,
> +* but not to other barrier bits specific to MemoryBarrier."

Just copy and paste from the spec to avoid s/is/are/ and typos.

> +* I.e if bariiers is the special value GL_ALL_BARRIER_BITS,
> +* then all barriers allowed by glMemoryBarrierByRegion
> +* should be activated.
> +   */
> +   if (barriers == GL_ALL_BARRIER_BITS)
> +  return brw_memory_barrier(ctx, all_allowed_bits);
> +
> +   /*
> +* If barriers contain a value that is not allowed
> +* for glMemoryBarrierByRegion an GL_INVALID_VALUE
> +* should be generated.
> +   */
> +   if ((all_allowed_bits | barriers) ^ all_allowed_bits)

This might work but I have a hard time thinking about it. The usual pattern is

   (barriers & ~all_allowed_bits) != 0
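
A minimal sketch showing that the two checks agree. The function names
are hypothetical and the mask is passed in as a plain integer rather than
built from the GL enums.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The form used in the patch under review. */
static bool
has_disallowed_bits_xor(uint32_t bits, uint32_t allowed)
{
   return ((allowed | bits) ^ allowed) != 0;
}

/* The more common form: keep only the bits outside the allowed mask. */
static bool
has_disallowed_bits_mask(uint32_t bits, uint32_t allowed)
{
   return (bits & ~allowed) != 0;
}
```

Both reduce to the same thing, since (a | b) ^ a leaves exactly the bits
of b that are not set in a.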

> +   _mesa_error(ctx, GL_INVALID_VALUE,
> +"glMemoryBarrierByRegion(unsupported barrier bit");

Indent.

I've attached a patch that cleans up the things I mentioned. Please
squash it in. With that, the patch looks good. I suppose the only
pending question is about the unhandled GL_ATOMIC_COUNTER_BARRIER_BIT
bit.
diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index 332d84e..eb8ae3f 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -230,31 +230,34 @@ static void
 brw_memory_barrier_by_region(struct gl_context *ctx, GLbitfield barriers)
 {
GLbitfield all_allowed_bits = GL_ATOMIC_COUNTER_BARRIER_BIT |
-  GL_FRAMEBUFFER_BARRIER_BIT |
-  GL_SHADER_IMAGE_ACCESS_BARRIER_BIT |
-  GL_SHADER_STORAGE_BARRIER_BIT |
-  GL_TEXTURE_FETCH_BARRIER_BIT |
-  GL_UNIFORM_BARRIER_BIT;
-   /*
-* According to OpenGL ES 3.1 spec. April 29, 2015, 7.11.2:
-* "When barriers are ALL_BARRIERS_BIT, shader memory access
-* will be synchronized realtive to all theese barrier bits,
-* but not to other barrier bits specific to MemoryBarrier."
-* I.e if bariiers is the special value GL_ALL_BARRIER_BITS,
-* then all barriers allowed by glMemoryBarrierByRegion
-* should be activated.
-   */
+ GL_FRAMEBUFFER_BARRIER_BIT |
+ GL_SHADER_IMAGE_ACCESS_BARRIER_BIT |
+ GL_SHADER_STORAGE_BARRIER_BIT |
+ GL_TEXTURE_FETCH_BARRIER_BIT |
+ GL_UNIFORM_BARRIER_BIT;
+
+   /* From section 7.11.2 of the OpenGL ES 3.1 specification:
+*
+*"When barriers is ALL_BARRIER_BITS, shader memory accesses will be
+* synchronized relative to all these barrier bits, but not to other
+* barrier bits specific to MemoryBarrier."
+*
+* That is, if barriers is the special value GL_ALL_BARRIER_BITS, then all
+* barriers allowed by glMemoryBarrierByRegion should be activated."
+*/
if (barriers == GL_ALL_BARRIER_BITS)
   return brw_memory_barrier(ctx, all_allowed_bits);
 
-   /*
-* If barriers contain a value that is not allowed
-* for glMemoryBarrierByRegion an GL_INVALID_VALUE
-* should be generated.
-   */
-   if ((all_allowed_bits | barriers) ^ all_allowed_bits)
+   /* From section 7.11.2 of the OpenGL ES 3.1 specification:
+*
+*"An INVALID_VALUE error is generated if barriers is not the special
+* value ALL_BARRIER_BITS, and has any bits set other than those
+* described above."
+*/
+   if ((barriers & ~all_allowed_bits) != 0) {
_mesa_error(ctx, GL_INVALID_VALUE,
-"glMemoryBarr

Re: [Mesa-dev] [PATCH 1/4] gles/es3.1: Enable dispatch of glMemoryBarrierByRegion

2015-07-31 Thread Matt Turner
On Fri, Jul 31, 2015 at 9:57 AM, Ilia Mirkin  wrote:
> Won't this cause a compilation failure in mesa? I thought it'd start
> looking for the _mesa_MemoryBarrierByRegion function. Normally this is
> added as a stub in the first commit, and then the real impl comes
> later.

Yeah, that's probably true.

Marta, you might just want to squash the first three patches together.


Re: [Mesa-dev] [PATCH 4/4] i965/es3.1: Implement glMemoryBarrierByRegion

2015-07-31 Thread Ilia Mirkin
On Fri, Jul 31, 2015 at 8:15 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 34 
> +
>  1 file changed, 34 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> index 85e271d..332d84e 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -226,6 +226,39 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield barriers)
> brw_emit_pipe_control_flush(brw, bits);
>  }
>
> +static void
> +brw_memory_barrier_by_region(struct gl_context *ctx, GLbitfield barriers)
> +{
> +   GLbitfield all_allowed_bits = GL_ATOMIC_COUNTER_BARRIER_BIT |
> +  GL_FRAMEBUFFER_BARRIER_BIT |
> +  GL_SHADER_IMAGE_ACCESS_BARRIER_BIT |
> +  GL_SHADER_STORAGE_BARRIER_BIT |
> +  GL_TEXTURE_FETCH_BARRIER_BIT |
> +  GL_UNIFORM_BARRIER_BIT;
> +   /*
> +* According to OpenGL ES 3.1 spec. April 29, 2015, 7.11.2:
> +* "When barriers are ALL_BARRIERS_BIT, shader memory access
> +* will be synchronized realtive to all theese barrier bits,
> +* but not to other barrier bits specific to MemoryBarrier."
> +* I.e if bariiers is the special value GL_ALL_BARRIER_BITS,
> +* then all barriers allowed by glMemoryBarrierByRegion
> +* should be activated.
> +   */
> +   if (barriers == GL_ALL_BARRIER_BITS)
> +  return brw_memory_barrier(ctx, all_allowed_bits);
> +
> +   /*
> +* If barriers contain a value that is not allowed
> +* for glMemoryBarrierByRegion an GL_INVALID_VALUE
> +* should be generated.
> +   */
> +   if ((all_allowed_bits | barriers) ^ all_allowed_bits)
> +   _mesa_error(ctx, GL_INVALID_VALUE,
> +"glMemoryBarrierByRegion(unsupported barrier bit");

It's fairly unusual to do _mesa_error() in the driver. It's done for
some texture stuff, but in general such checking is left to the common
implementation.

Is the list of allowed bits a per-driver thing? If so, perhaps there
should be a ctx->Const.MemoryBarriers bitfield?

And then you wouldn't even need this separate callback (at least for
now) and just use functions->MemoryBarrier().

Also as Matt pointed out, this is a very unusual (for mesa) way of
checking bits.
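
Something along these lines is what I have in mind for doing the check in
common code and reusing the existing driver hook. This is only a sketch:
the sketch_ctx struct, the function names, and the constant values are
all stand-ins invented for illustration, not the real gl_context,
dd_function_table, or GL enum values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values; the real enums come from the GL headers. */
#define SKETCH_ALL_BARRIER_BITS    0xFFFFFFFFu
#define SKETCH_REGION_ALLOWED_BITS 0x0000003Fu /* placeholder for the six allowed bits */

struct sketch_ctx {
   bool invalid_value;            /* recorded instead of calling _mesa_error() */
   uint32_t last_driver_barriers; /* what the driver hook received */
   unsigned driver_calls;
};

/* Stand-in for the driver's existing MemoryBarrier hook. */
static void
sketch_driver_memory_barrier(struct sketch_ctx *ctx, uint32_t barriers)
{
   ctx->last_driver_barriers = barriers;
   ctx->driver_calls++;
}

/* Hypothetical common-code entry point: validate, then reuse the hook. */
static void
sketch_memory_barrier_by_region(struct sketch_ctx *ctx, uint32_t barriers)
{
   assert(ctx);

   if (barriers == SKETCH_ALL_BARRIER_BITS) {
      barriers = SKETCH_REGION_ALLOWED_BITS;
   } else if ((barriers & ~SKETCH_REGION_ALLOWED_BITS) != 0) {
      ctx->invalid_value = true;
      return; /* do not reach the driver with invalid bits */
   }

   sketch_driver_memory_barrier(ctx, barriers);
}
```

The early return on the error path matters: the patch as posted reports
the error and then still falls through to brw_memory_barrier().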

> +
> +   return brw_memory_barrier(ctx, barriers);
> +}
> +
>  void
>  brw_add_texrect_params(struct gl_program *prog)
>  {
> @@ -285,6 +318,7 @@ void brwInitFragProgFuncs( struct dd_function_table *functions )
> functions->LinkShader = brw_link_shader;
>
> functions->MemoryBarrier = brw_memory_barrier;
> +   functions->MemoryBarrierByRegion = brw_memory_barrier_by_region;
>  }
>
>  struct shader_times {
> --
> 1.9.1
>


Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Roland Scheidegger
Actually, since the code says it can only happen with a non-full stencil
mask, isn't clearing depth/stencil with a non-full stencil mask
incredibly rare?
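
For reference, the block the patch removes behaves roughly like the
sketch below. The SKETCH_CLEAR_* values and the function name are
stand-ins for illustration, not the real PIPE_CLEAR_* definitions.

```c
#include <assert.h>

/* Stand-in bit values for illustration only. */
#define SKETCH_CLEAR_DEPTH        (1u << 0)
#define SKETCH_CLEAR_STENCIL      (1u << 1)
#define SKETCH_CLEAR_DEPTHSTENCIL (SKETCH_CLEAR_DEPTH | SKETCH_CLEAR_STENCIL)
#define SKETCH_CLEAR_COLOR0       (1u << 2)

/* If depth and stencil are split between the quad path and the
 * pipe->clear path, move both to the quad path. */
static void
sketch_merge_depth_stencil(unsigned *quad_buffers, unsigned *clear_buffers)
{
   assert(quad_buffers && clear_buffers);

   if ((*quad_buffers & SKETCH_CLEAR_DEPTHSTENCIL) &&
       (*clear_buffers & SKETCH_CLEAR_DEPTHSTENCIL)) {
      *quad_buffers |= *clear_buffers & SKETCH_CLEAR_DEPTHSTENCIL;
      *clear_buffers &= ~SKETCH_CLEAR_DEPTHSTENCIL;
   }
}
```

The merge only triggers when one of the two ends up on the quad path,
which per the code comment only happens with a partial stencil writemask.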

Roland

Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
> I don't think that's quite true in general.
> For gpus which have combined ds buffers I can't see why you'd want to do
> separate clears for depth and stencil in this case (i.e. doing
> pipe->clear for depth, then draw a quad for clearing stencil).
> At least for "simple" hw like llvmpipe which don't have special depth
> clear, this clearly seems to be much worse (you have to go through the
> memory twice).
> 
> I vaguely remember something like this being proposed some time ago with
> some discussion that not the same thing is optimal depending on the
> hw... I don't think though there's anything at the moment where you
> could figure out what is better.
> 
> Roland
> 
> 
> 
> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
>> From: Marek Olšák 
>>
>> A lot of GPUs allocate separate depth and stencil buffers, so clearing them
>> together doesn't make much sense. If some GPUs don't allocate separate
>> depth & stencil, it's still beneficial to clear the HiZ / HiS information
>> for only one of the two.
>> ---
>>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>>  1 file changed, 9 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_clear.c b/src/mesa/state_tracker/st_cb_clear.c
>> index 137fac8..1e404a2 100644
>> --- a/src/mesa/state_tracker/st_cb_clear.c
>> +++ b/src/mesa/state_tracker/st_cb_clear.c
>> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>>}
>> }
>>  
>> -   /* Always clear depth and stencil together.
>> -* This can only happen when the stencil writemask is not a full mask.
>> -*/
>> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
>> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
>> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
>> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
>> -   }
>> -
>> /* Only use quad-based clearing for the renderbuffers which cannot
>>  * use pipe->clear. We want to always use pipe->clear for the other
>>  * renderbuffers, because it's likely to be faster.
>>
> 
> 



[Mesa-dev] [PATCH] i965/cs: Setup push constant data for uniforms

2015-07-31 Thread Jordan Justen
brw_upload_cs_push_constants was based on gen6_upload_push_constants.
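
As a rough sketch of the size arithmetic in this patch, where the uniform
data is duplicated once per thread: the helper and struct names below are
hypothetical, and the 4-byte element size is an assumption standing in
for sizeof(gl_constant_value).

```c
#include <assert.h>

struct sketch_push_sizes {
   unsigned regs_per_thread; /* 32-byte registers read by each thread */
   unsigned total_regs;      /* CURBE allocation across all threads */
   unsigned total_bytes;     /* bytes loaded by the CURBE load */
};

/* Round value up to a power-of-two alignment. */
static unsigned
sketch_align(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

static struct sketch_push_sizes
sketch_cs_push_sizes(unsigned nr_params, unsigned threads)
{
   /* Assumes each push constant parameter is 4 bytes. */
   unsigned data_size = nr_params * 4u;
   unsigned aligned_size = sketch_align(data_size, 32u);

   struct sketch_push_sizes s;
   s.regs_per_thread = aligned_size / 32u;
   s.total_regs = s.regs_per_thread * threads;
   s.total_bytes = aligned_size * threads;
   return s;
}
```

For example, 9 parameters and 8 threads give 36 bytes of data, padded to
64 bytes, so 2 registers per thread, 16 registers in total, and 512 bytes
uploaded.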

v2:
 * Add FINISHME comments about more efficient ways to push uniforms

Signed-off-by: Jordan Justen 
Cc: Ben Widawsky 
---
 Ben, Regarding your v1 feedback:

 * I looked into the other mechanisms for uploading uniform data once,
   rather than once per local workgroup thread. They look compelling,
   but for now I just added 'FINISHME' comments to document them.

 * I think that the MI_ATOMIC workaround is only needed if the new bdw
   "Indirect Payload Storage" is used.

 src/mesa/drivers/dri/i965/brw_context.h  |   2 +-
 src/mesa/drivers/dri/i965/brw_cs.cpp | 131 ++-
 src/mesa/drivers/dri/i965/brw_defines.h  |   6 ++
 src/mesa/drivers/dri/i965/brw_state.h|   1 +
 src/mesa/drivers/dri/i965/brw_state_upload.c |   2 +
 5 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index cd43ac5..0bc497b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1460,7 +1460,7 @@ struct brw_context
 
int num_atoms[BRW_NUM_PIPELINES];
const struct brw_tracked_state render_atoms[57];
-   const struct brw_tracked_state compute_atoms[3];
+   const struct brw_tracked_state compute_atoms[4];
 
/* If (INTEL_DEBUG & DEBUG_BATCH) */
struct {
diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp b/src/mesa/drivers/dri/i965/brw_cs.cpp
index 29ee75b..28eddfc 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
@@ -327,6 +327,10 @@ brw_upload_cs_state(struct brw_context *brw)
 
prog_data->binding_table.size_bytes,
 32, &stage_state->bind_bo_offset);
 
+   unsigned push_constant_data_size =
+  prog_data->nr_params * sizeof(gl_constant_value);
+   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
+   unsigned push_constant_regs = reg_aligned_constant_size / 32;
unsigned threads = get_cs_thread_count(cs_prog_data);
 
uint32_t dwords = brw->gen < 8 ? 8 : 9;
@@ -359,12 +363,41 @@ brw_upload_cs_state(struct brw_context *brw)
 
OUT_BATCH(0);
const uint32_t vfe_urb_allocation = brw->gen >= 8 ? 2 : 0;
-   OUT_BATCH(SET_FIELD(vfe_urb_allocation, MEDIA_VFE_STATE_URB_ALLOC));
+
+   /* We are uploading duplicated copies of push constant uniforms for each
+* thread. Although the local id data needs to vary per thread, it won't
+* change for other uniform data. Unfortunately this duplication is
+* required for gen7. As of Haswell, this duplication can be avoided, but
+* this older mechanism with duplicated data continues to work.
+*
+* FINISHME: As of Haswell, we could make use of the
+* INTERFACE_DESCRIPTOR_DATA "Cross-Thread Constant Data Read Length" field
+* to only store one copy of uniform data.
+*
+* FINISHME: Broadwell adds a new alternative "Indirect Payload Storage"
+* which is described in the GPGPU_WALKER command and in the Broadwell PRM
+* Volume 7: 3D Media GPGPU, under Media GPGPU Pipeline => Mode of
+* Operations => GPGPU Mode => Indirect Payload Storage.
+*
+* Note: The constant data is built in brw_upload_cs_push_constants below.
+*/
+   const uint32_t vfe_curbe_allocation = push_constant_regs * threads;
+   OUT_BATCH(SET_FIELD(vfe_urb_allocation, MEDIA_VFE_STATE_URB_ALLOC) |
+ SET_FIELD(vfe_curbe_allocation, MEDIA_VFE_STATE_CURBE_ALLOC));
OUT_BATCH(0);
OUT_BATCH(0);
OUT_BATCH(0);
ADVANCE_BATCH();
 
+   if (reg_aligned_constant_size > 0) {
+  BEGIN_BATCH(4);
+  OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
+  OUT_BATCH(0);
+  OUT_BATCH(reg_aligned_constant_size * threads);
+  OUT_BATCH(stage_state->push_const_offset);
+  ADVANCE_BATCH();
+   }
+
/* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
memcpy(bind, stage_state->surf_offset,
   prog_data->binding_table.size_bytes);
@@ -378,7 +411,7 @@ brw_upload_cs_state(struct brw_context *brw)
desc[dw++] = 0;
desc[dw++] = 0;
desc[dw++] = stage_state->bind_bo_offset;
-   desc[dw++] = 0;
+   desc[dw++] = SET_FIELD(push_constant_regs, MEDIA_CURBE_READ_LENGTH);
const uint32_t media_threads =
   brw->gen >= 8 ?
   SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
@@ -400,8 +433,98 @@ const struct brw_tracked_state brw_cs_state = {
/* explicit initialisers aren't valid C++, comment
 * them for documentation purposes */
/* .dirty = */{
-  /* .mesa = */ 0,
-  /* .brw = */  BRW_NEW_CS_PROG_DATA,
+  /* .mesa = */ _NEW_PROGRAM_CONSTANTS,
+  /* .brw = */  BRW_NEW_CS_PROG_DATA |
+BRW_NEW_PUSH_CONSTANT_ALLOCATION,
},
/* .emit = */ brw_upload_cs_state
 };
+
+
+/**
+ * Creates a region containing the push constants for the CS on gen7+.
+ *
+ * Push constants a

Re: [Mesa-dev] [PATCH 4/4] i965/es3.1: Implement glMemoryBarrierByRegion

2015-07-31 Thread Matt Turner
On Fri, Jul 31, 2015 at 10:25 AM, Ilia Mirkin  wrote:
> On Fri, Jul 31, 2015 at 8:15 AM, Marta Lofstedt
>  wrote:
>> From: Marta Lofstedt 
>>
>> Signed-off-by: Marta Lofstedt 
>> ---
>>  src/mesa/drivers/dri/i965/brw_program.c | 34 
>> +
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
>> index 85e271d..332d84e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_program.c
>> +++ b/src/mesa/drivers/dri/i965/brw_program.c
>> @@ -226,6 +226,39 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield barriers)
>> brw_emit_pipe_control_flush(brw, bits);
>>  }
>>
>> +static void
>> +brw_memory_barrier_by_region(struct gl_context *ctx, GLbitfield barriers)
>> +{
>> +   GLbitfield all_allowed_bits = GL_ATOMIC_COUNTER_BARRIER_BIT |
>> +  GL_FRAMEBUFFER_BARRIER_BIT |
>> +  GL_SHADER_IMAGE_ACCESS_BARRIER_BIT |
>> +  GL_SHADER_STORAGE_BARRIER_BIT |
>> +  GL_TEXTURE_FETCH_BARRIER_BIT |
>> +  GL_UNIFORM_BARRIER_BIT;
>> +   /*
>> +* According to OpenGL ES 3.1 spec. April 29, 2015, 7.11.2:
>> +* "When barriers are ALL_BARRIERS_BIT, shader memory access
>> +* will be synchronized realtive to all theese barrier bits,
>> +* but not to other barrier bits specific to MemoryBarrier."
>> +* I.e if bariiers is the special value GL_ALL_BARRIER_BITS,
>> +* then all barriers allowed by glMemoryBarrierByRegion
>> +* should be activated.
>> +   */
>> +   if (barriers == GL_ALL_BARRIER_BITS)
>> +  return brw_memory_barrier(ctx, all_allowed_bits);
>> +
>> +   /*
>> +* If barriers contain a value that is not allowed
>> +* for glMemoryBarrierByRegion an GL_INVALID_VALUE
>> +* should be generated.
>> +   */
>> +   if ((all_allowed_bits | barriers) ^ all_allowed_bits)
>> +   _mesa_error(ctx, GL_INVALID_VALUE,
>> +"glMemoryBarrierByRegion(unsupported barrier bit");
>
> It's fairly unusual to do _mesa_error() in the driver. It's done for
> some texture stuff, but in general such checking is left to the common
> implementation.
>
> Is the list of allowed bits a per-driver thing? If so, perhaps there
> should be a ctx->Const.MemoryBarriers bitfield?

I don't think so. It seems to be that exact list, according to the spec.

> And then you wouldn't even need this separate callback (at least for
> now) and just use functions->MemoryBarrier().

Oh, that's a good point.


Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Marek Olšák
Indeed, it is rare. I thought this was hit more often, but apparently
not. Never mind.

Marek

On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  wrote:
> Actually, since the code says it can only happen with a non-full stencil
> mask, isn't clearing depth/stencil with a non-full stencil mask
> incredibly rare?
>
> Roland
>
> Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
>> I don't think that's quite true in general.
>> For gpus which have combined ds buffers I can't see why you'd want to do
>> separate clears for depth and stencil in this case (i.e. doing
>> pipe->clear for depth, then draw a quad for clearing stencil).
>> At least for "simple" hw like llvmpipe which don't have special depth
>> clear, this clearly seems to be much worse (you have to go through the
>> memory twice).
>>
>> I vaguely remember something like this being proposed some time ago with
>> some discussion that not the same thing is optimal depending on the
>> hw... I don't think though there's anything at the moment where you
>> could figure out what is better.
>>
>> Roland
>>
>>
>>
>> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
>>> From: Marek Olšák 
>>>
>>> A lot of GPUs allocate separate depth and stencil buffers, so clearing them
>>> together doesn't make much sense. If some GPUs don't allocate separate
>>> depth & stencil, it's still beneficial to clear the HiZ / HiS information
>>> for only one of the two.
>>> ---
>>>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>>>  1 file changed, 9 deletions(-)
>>>
>>> diff --git a/src/mesa/state_tracker/st_cb_clear.c b/src/mesa/state_tracker/st_cb_clear.c
>>> index 137fac8..1e404a2 100644
>>> --- a/src/mesa/state_tracker/st_cb_clear.c
>>> +++ b/src/mesa/state_tracker/st_cb_clear.c
>>> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>>>}
>>> }
>>>
>>> -   /* Always clear depth and stencil together.
>>> -* This can only happen when the stencil writemask is not a full mask.
>>> -*/
>>> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
>>> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
>>> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
>>> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
>>> -   }
>>> -
>>> /* Only use quad-based clearing for the renderbuffers which cannot
>>>  * use pipe->clear. We want to always use pipe->clear for the other
>>>  * renderbuffers, because it's likely to be faster.
>>>
>>
>>
>


Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Ilia Mirkin
Don't know the situation on other hardware, but at least nvidia
supports both scissors and stencil for its "fast" clear (it's fast at
least in terms of the number of commands submitted and lack of state
changes, no comment on actual execution speed).

I was thinking of adding a few caps for it.

On Fri, Jul 31, 2015 at 2:30 PM, Marek Olšák  wrote:
> Indeed, it is rare. I thought this was hit more often, but apparently
> not. Nevermind.
>
> Marek
>
> On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  
> wrote:
>> Actually, since the code says it can only happen with a non-full stencil
>> mask, isn't clearing depth/stencil with a non-full stencil mask
>> incredibly rare?
>>
>> Roland
>>
>> Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
>>> I don't think that's quite true in general.
>>> For gpus which have combined ds buffers I can't see why you'd want to do
>>> separate clears for depth and stencil in this case (i.e. doing
>>> pipe->clear for depth, then draw a quad for clearing stencil).
>>> At least for "simple" hw like llvmpipe which don't have special depth
>>> clear, this clearly seems to be much worse (you have to go through the
>>> memory twice).
>>>
>>> I vaguely remember something like this being proposed some time ago with
>>> some discussion that not the same thing is optimal depending on the
>>> hw... I don't think though there's anything at the moment where you
>>> could figure out what is better.
>>>
>>> Roland
>>>
>>>
>>>
>>> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
 From: Marek Olšák 

 A lot of GPUs allocate separate depth and stencil buffers, so clearing them
 together doesn't make much sense. If some GPUs don't allocate separate
 depth & stencil, it's still beneficial to clear the HiZ / HiS information
 for only one of the two.
 ---
  src/mesa/state_tracker/st_cb_clear.c | 9 -
  1 file changed, 9 deletions(-)

 diff --git a/src/mesa/state_tracker/st_cb_clear.c 
 b/src/mesa/state_tracker/st_cb_clear.c
 index 137fac8..1e404a2 100644
 --- a/src/mesa/state_tracker/st_cb_clear.c
 +++ b/src/mesa/state_tracker/st_cb_clear.c
 @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
}
 }

 -   /* Always clear depth and stencil together.
 -* This can only happen when the stencil writemask is not a full mask.
 -*/
 -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
 -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
 -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
 -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
 -   }
 -
 /* Only use quad-based clearing for the renderbuffers which cannot
  * use pipe->clear. We want to always use pipe->clear for the other
  * renderbuffers, because it's likely to be faster.



Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Roland Scheidegger
Though arguably, because it should be so rare, it shouldn't really matter
much if it is removed, even for those drivers which want that...
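
For reference, here is a minimal sketch of what the removed block does to the two buffer masks. The bit values and the struct are assumptions for illustration only (the real PIPE_CLEAR_* definitions live in the gallium headers); the if-body is the logic from the quoted diff.

```c
#include <assert.h>

/* Assumed bit values, for illustration only. */
#define CLEAR_DEPTH        (1u << 0)
#define CLEAR_STENCIL      (1u << 1)
#define CLEAR_DEPTHSTENCIL (CLEAR_DEPTH | CLEAR_STENCIL)

/* Hypothetical container for the two masks used in st_Clear(). */
struct clear_split {
   unsigned quad_buffers;   /* cleared by drawing a quad */
   unsigned clear_buffers;  /* cleared through pipe->clear */
};

/* If depth/stencil bits are split across both paths, move all of
 * them to the quad path so depth and stencil are cleared together. */
static struct clear_split
merge_depth_stencil(struct clear_split s)
{
   if ((s.quad_buffers & CLEAR_DEPTHSTENCIL) &&
       (s.clear_buffers & CLEAR_DEPTHSTENCIL)) {
      s.quad_buffers |= s.clear_buffers & CLEAR_DEPTHSTENCIL;
      s.clear_buffers &= ~CLEAR_DEPTHSTENCIL;
   }
   return s;
}
```

With stencil on the quad path (partial writemask) and depth on the pipe->clear path, the merge sends both to the quad path. Without the block, depth stays on pipe->clear.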

Roland


Am 31.07.2015 um 20:30 schrieb Marek Olšák:
> Indeed, it is rare. I thought this was hit more often, but apparently
> not. Nevermind.
> 
> Marek
> 
> On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  
> wrote:
>> Actually, since the code says it can only happen with a non-full stencil
>> mask, isn't clearing depth/stencil with a non-full stencil mask
>> incredibly rare?
>>
>> Roland
>>
>> Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
>>> I don't think that's quite true in general.
>>> For gpus which have combined ds buffers I can't see why you'd want to do
>>> separate clears for depth and stencil in this case (i.e. doing
>>> pipe->clear for depth, then draw a quad for clearing stencil).
>>> At least for "simple" hw like llvmpipe which don't have special depth
>>> clear, this clearly seems to be much worse (you have to go through the
>>> memory twice).
>>>
>>> I vaguely remember something like this being proposed some time ago with
>>> some discussion that not the same thing is optimal depending on the
>>> hw... I don't think though there's anything at the moment where you
>>> could figure out what is better.
>>>
>>> Roland
>>>
>>>
>>>
>>> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
 From: Marek Olšák 

 A lot of GPUs allocate separate depth and stencil buffers, so clearing them
 together doesn't make much sense. If some GPUs don't allocate separate
 depth & stencil, it's still beneficial to clear the HiZ / HiS information
 for only one of the two.
 ---
  src/mesa/state_tracker/st_cb_clear.c | 9 -
  1 file changed, 9 deletions(-)

 diff --git a/src/mesa/state_tracker/st_cb_clear.c 
 b/src/mesa/state_tracker/st_cb_clear.c
 index 137fac8..1e404a2 100644
 --- a/src/mesa/state_tracker/st_cb_clear.c
 +++ b/src/mesa/state_tracker/st_cb_clear.c
 @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
}
 }

 -   /* Always clear depth and stencil together.
 -* This can only happen when the stencil writemask is not a full mask.
 -*/
 -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
 -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
 -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
 -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
 -   }
 -
 /* Only use quad-based clearing for the renderbuffers which cannot
  * use pipe->clear. We want to always use pipe->clear for the other
  * renderbuffers, because it's likely to be faster.



Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Marek Olšák
I wouldn't mind moving scissored clears to drivers. u_blitter can do
it in the same way after the support is added and drivers will have
more control over the states and how they are saved and restored. The
catch is the person who will do that will also have to fix it for all
drivers.

Marek

On Fri, Jul 31, 2015 at 8:40 PM, Ilia Mirkin  wrote:
> Don't know the situation on other hardware, but at least nvidia
> supports both scissors and stencil for its "fast" clear (it's fast at
> least in terms of the number of commands submitted and lack of state
> changes, no comment on actual execution speed).
>
> I was thinking of adding a few caps for it.
>
> On Fri, Jul 31, 2015 at 2:30 PM, Marek Olšák  wrote:
>> Indeed, it is rare. I thought this was hit more often, but apparently
>> not. Nevermind.
>>
>> Marek
>>
>> On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  
>> wrote:
>>> Actually, since the code says it can only happen with a non-full stencil
>>> mask, isn't clearing depth/stencil with a non-full stencil mask
>>> incredibly rare?
>>>
>>> Roland
>>>
>>> Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
 I don't think that's quite true in general.
 For gpus which have combined ds buffers I can't see why you'd want to do
 separate clears for depth and stencil in this case (i.e. doing
 pipe->clear for depth, then draw a quad for clearing stencil).
 At least for "simple" hw like llvmpipe which don't have special depth
 clear, this clearly seems to be much worse (you have to go through the
 memory twice).

 I vaguely remember something like this being proposed some time ago with
 some discussion that not the same thing is optimal depending on the
 hw... I don't think though there's anything at the moment where you
 could figure out what is better.

 Roland



 Am 31.07.2015 um 17:15 schrieb Marek Olšák:
> From: Marek Olšák 
>
> A lot of GPUs allocate separate depth and stencil buffers, so clearing 
> them
> together doesn't make much sense. If some GPUs don't allocate separate
> depth & stencil, it's still beneficial to clear the HiZ / HiS information
> for only one of the two.
> ---
>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>  1 file changed, 9 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_cb_clear.c 
> b/src/mesa/state_tracker/st_cb_clear.c
> index 137fac8..1e404a2 100644
> --- a/src/mesa/state_tracker/st_cb_clear.c
> +++ b/src/mesa/state_tracker/st_cb_clear.c
> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>}
> }
>
> -   /* Always clear depth and stencil together.
> -* This can only happen when the stencil writemask is not a full mask.
> -*/
> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
> -   }
> -
> /* Only use quad-based clearing for the renderbuffers which cannot
>  * use pipe->clear. We want to always use pipe->clear for the other
>  * renderbuffers, because it's likely to be faster.
>



Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Ilia Mirkin
Unless that person sticks it behind a pipe cap :) Which is how *this*
person was planning on doing it... nouveau doesn't use u_blitter by
the way.

  -ilia

On Fri, Jul 31, 2015 at 4:00 PM, Marek Olšák  wrote:
> I wouldn't mind moving scissored clears to drivers. u_blitter can do
> it in the same way after the support is added and drivers will have
> more control over the states and how they are saved and restored. The
> catch is the person who will do that will also have to fix it for all
> drivers.
>
> Marek
>
> On Fri, Jul 31, 2015 at 8:40 PM, Ilia Mirkin  wrote:
>> Don't know the situation on other hardware, but at least nvidia
>> supports both scissors and stencil for its "fast" clear (it's fast at
>> least in terms of the number of commands submitted and lack of state
>> changes, no comment on actual execution speed).
>>
>> I was thinking of adding a few caps for it.
>>
>> On Fri, Jul 31, 2015 at 2:30 PM, Marek Olšák  wrote:
>>> Indeed, it is rare. I thought this was hit more often, but apparently
>>> not. Nevermind.
>>>
>>> Marek
>>>
>>> On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  
>>> wrote:
 Actually, since the code says it can only happen with a non-full stencil
 mask, isn't clearing depth/stencil with a non-full stencil mask
 incredibly rare?

 Roland

 Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
> I don't think that's quite true in general.
> For gpus which have combined ds buffers I can't see why you'd want to do
> separate clears for depth and stencil in this case (i.e. doing
> pipe->clear for depth, then draw a quad for clearing stencil).
> At least for "simple" hw like llvmpipe which don't have special depth
> clear, this clearly seems to be much worse (you have to go through the
> memory twice).
>
> I vaguely remember something like this being proposed some time ago with
> some discussion that not the same thing is optimal depending on the
> hw... I don't think though there's anything at the moment where you
> could figure out what is better.
>
> Roland
>
>
>
> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
>> From: Marek Olšák 
>>
>> A lot of GPUs allocate separate depth and stencil buffers, so clearing 
>> them
>> together doesn't make much sense. If some GPUs don't allocate separate
>> depth & stencil, it's still beneficial to clear the HiZ / HiS information
>> for only one of the two.
>> ---
>>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>>  1 file changed, 9 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_clear.c 
>> b/src/mesa/state_tracker/st_cb_clear.c
>> index 137fac8..1e404a2 100644
>> --- a/src/mesa/state_tracker/st_cb_clear.c
>> +++ b/src/mesa/state_tracker/st_cb_clear.c
>> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>>}
>> }
>>
>> -   /* Always clear depth and stencil together.
>> -* This can only happen when the stencil writemask is not a full 
>> mask.
>> -*/
>> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
>> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
>> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
>> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
>> -   }
>> -
>> /* Only use quad-based clearing for the renderbuffers which cannot
>>  * use pipe->clear. We want to always use pipe->clear for the other
>>  * renderbuffers, because it's likely to be faster.
>>
>


Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Roland Scheidegger
You could use pipe clear_render_target and clear_depth_stencil for that.

Roland

Am 31.07.2015 um 22:00 schrieb Marek Olšák:
> I wouldn't mind moving scissored clears to drivers. u_blitter can do
> it in the same way after the support is added and drivers will have
> more control over the states and how they are saved and restored. The
> catch is the person who will do that will also have to fix it for all
> drivers.
> 
> Marek
> 
> On Fri, Jul 31, 2015 at 8:40 PM, Ilia Mirkin  wrote:
>> Don't know the situation on other hardware, but at least nvidia
>> supports both scissors and stencil for its "fast" clear (it's fast at
>> least in terms of the number of commands submitted and lack of state
>> changes, no comment on actual execution speed).
>>
>> I was thinking of adding a few caps for it.
>>
>> On Fri, Jul 31, 2015 at 2:30 PM, Marek Olšák  wrote:
>>> Indeed, it is rare. I thought this was hit more often, but apparently
>>> not. Nevermind.
>>>
>>> Marek
>>>
>>> On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  
>>> wrote:
 Actually, since the code says it can only happen with a non-full stencil
 mask, isn't clearing depth/stencil with a non-full stencil mask
 incredibly rare?

 Roland

 Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
> I don't think that's quite true in general.
> For gpus which have combined ds buffers I can't see why you'd want to do
> separate clears for depth and stencil in this case (i.e. doing
> pipe->clear for depth, then draw a quad for clearing stencil).
> At least for "simple" hw like llvmpipe which don't have special depth
> clear, this clearly seems to be much worse (you have to go through the
> memory twice).
>
> I vaguely remember something like this being proposed some time ago with
> some discussion that not the same thing is optimal depending on the
> hw... I don't think though there's anything at the moment where you
> could figure out what is better.
>
> Roland
>
>
>
> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
>> From: Marek Olšák 
>>
>> A lot of GPUs allocate separate depth and stencil buffers, so clearing 
>> them
>> together doesn't make much sense. If some GPUs don't allocate separate
>> depth & stencil, it's still beneficial to clear the HiZ / HiS information
>> for only one of the two.
>> ---
>>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>>  1 file changed, 9 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_clear.c 
>> b/src/mesa/state_tracker/st_cb_clear.c
>> index 137fac8..1e404a2 100644
>> --- a/src/mesa/state_tracker/st_cb_clear.c
>> +++ b/src/mesa/state_tracker/st_cb_clear.c
>> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>>}
>> }
>>
>> -   /* Always clear depth and stencil together.
>> -* This can only happen when the stencil writemask is not a full 
>> mask.
>> -*/
>> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
>> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
>> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
>> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
>> -   }
>> -
>> /* Only use quad-based clearing for the renderbuffers which cannot
>>  * use pipe->clear. We want to always use pipe->clear for the other
>>  * renderbuffers, because it's likely to be faster.
>>
>


Re: [Mesa-dev] List of unsupported extensions per driver

2015-07-31 Thread Ilia Mirkin
OK, I believe I've fixed my list up. Note that you may have to
shift-reload to get the updates, I think fd.o isn't setting the proper
cache headers or something else is messed up.

On Wed, Jul 29, 2015 at 5:50 PM, Marek Olšák  wrote:
> R600/R700 can also do:
> - ARB_conditional_render_inverted
> - ARB_cull_distance
>
> Marek
>
> On Wed, Jul 29, 2015 at 11:46 PM, Marek Olšák  wrote:
>> Hi Ilia,
>>
>> R600/R700:
>> - can't do ARB_shader_storage_buffer_object
>> - can do ARB_robust_buffer_access_behavior (I think this one can
>> really be exposed unconditionally on all GL>=3 hardware)
>> - can do ARB_framebuffer_no_attachments
>> - ARB_query_buffer_object can be emulated with a vertex shader to read
>> the results, sum them up, and write them to the query buffer using
>> transform feedback.
>>
>> Marek
>>
>> On Wed, Jul 29, 2015 at 3:12 PM, Ilia Mirkin  wrote:
>>> I think the reality is that all the drivers you're listing are
>>> GL4.5-capable except nv50 (all but nv50 drive, in part, DX11/DX12
>>> hardware). Use my GT21x list to figure out which exts nv50 will just
>>> never be able to do.
>>>
>>> On Wed, Jul 29, 2015 at 9:09 AM, Romain Failliot
>>>  wrote:
 I'll have a look at that. I'll try to see which driver is for which
 GPU and then make a hardcoded list too.

 Thanks Ilia!

 2015-07-29 4:25 GMT-04:00 Ilia Mirkin :
> Feel free to lift the info from my glxinfo page...
>
> http://people.freedesktop.org/~imirkin/glxinfo/glxinfo.html
>
> Note the grayed out bits... it's just a hardcoded list, search for
> "UNSUPPORTED" in
> http://people.freedesktop.org/~imirkin/glxinfo/glxinfo.js . Note that
> it's per hardware group, not per driver (which is a fundamental
> problem with the approach mesamatrix has taken). For NVIDIA, I filled
> the list in myself, for radeon, I think GlennK provided it. I asked
> the intel guys a while back, but they didn't seem interested in
> providing the info for their HW (and I didn't push for it).
>
> Hope this helps,
>
>   -ilia
>
> On Wed, Jul 29, 2015 at 12:35 AM, Romain Failliot
>  wrote:
>> Hi!
>>
>> First, I wanted to thank you for the incredible work you've done to
>> achieve OpenGL 4.1. The number of visitors exploded on mesamatrix,
>> while the average was around 100 visits per day, I had more than 2500
>> visits just for the day of the news! This simple fact shows how much
>> this news was expected by the community ;)
>>
>> Anyhow, I'd like to improve the information displayed on the site by
>> graying the cells of the drivers that can't support some specific
>> extensions (if it's still the case since the old r300 and swrast have
>> been removed). But for that I'd need a list of pairs
>> "extension-driver" that shows which extension is invalid for which
>> driver.
>>
>> Do you know that? Or do you know where I could find this information?
>>
>> Thanks a lot!
>> Romain


Re: [Mesa-dev] [PATCH] st/mesa: don't try to clear depth and stencil together in glClear fallback

2015-07-31 Thread Marek Olšák
Not really. Those two don't use the current framebuffer state, which
makes them not nice to drivers.

The only difference for us would be that radeonsi wouldn't have to
touch and restore some states, and I think the depth could be cleared
fast if the scissor was aligned to 8x8.
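
A rough sketch of the alignment check that last point implies. The struct and function names are hypothetical, and a real driver would likely also have to treat a scissor that ends at the framebuffer edge as acceptable, which is ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scissor rectangle; maxx/maxy are exclusive. */
struct scissor_rect {
   unsigned minx, miny, maxx, maxy;
};

/* True if every edge of the scissor falls on a tile boundary,
 * e.g. tile = 8 for the 8x8 case. */
static bool
scissor_is_tile_aligned(const struct scissor_rect *s, unsigned tile)
{
   return s->minx % tile == 0 && s->miny % tile == 0 &&
          s->maxx % tile == 0 && s->maxy % tile == 0;
}
```

A driver could use a test like this to choose between a fast depth clear and the slower fallback.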

Marek

On Fri, Jul 31, 2015 at 10:30 PM, Roland Scheidegger  wrote:
> You could use pipe clear_render_target and clear_depth_stencil for that.
>
> Roland
>
> Am 31.07.2015 um 22:00 schrieb Marek Olšák:
>> I wouldn't mind moving scissored clears to drivers. u_blitter can do
>> it in the same way after the support is added and drivers will have
>> more control over the states and how they are saved and restored. The
>> catch is the person who will do that will also have to fix it for all
>> drivers.
>>
>> Marek
>>
>> On Fri, Jul 31, 2015 at 8:40 PM, Ilia Mirkin  wrote:
>>> Don't know the situation on other hardware, but at least nvidia
>>> supports both scissors and stencil for its "fast" clear (it's fast at
>>> least in terms of the number of commands submitted and lack of state
>>> changes, no comment on actual execution speed).
>>>
>>> I was thinking of adding a few caps for it.
>>>
>>> On Fri, Jul 31, 2015 at 2:30 PM, Marek Olšák  wrote:
 Indeed, it is rare. I thought this was hit more often, but apparently
 not. Nevermind.

 Marek

 On Fri, Jul 31, 2015 at 7:44 PM, Roland Scheidegger  
 wrote:
> Actually, since the code says it can only happen with a non-full stencil
> mask, isn't clearing depth/stencil with a non-full stencil mask
> incredibly rare?
>
> Roland
>
> Am 31.07.2015 um 18:51 schrieb Roland Scheidegger:
>> I don't think that's quite true in general.
>> For gpus which have combined ds buffers I can't see why you'd want to do
>> separate clears for depth and stencil in this case (i.e. doing
>> pipe->clear for depth, then draw a quad for clearing stencil).
>> At least for "simple" hw like llvmpipe which don't have special depth
>> clear, this clearly seems to be much worse (you have to go through the
>> memory twice).
>>
>> I vaguely remember something like this being proposed some time ago with
>> some discussion that not the same thing is optimal depending on the
>> hw... I don't think though there's anything at the moment where you
>> could figure out what is better.
>>
>> Roland
>>
>>
>>
>> Am 31.07.2015 um 17:15 schrieb Marek Olšák:
>>> From: Marek Olšák 
>>>
>>> A lot of GPUs allocate separate depth and stencil buffers, so clearing 
>>> them
>>> together doesn't make much sense. If some GPUs don't allocate separate
>>> depth & stencil, it's still beneficial to clear the HiZ / HiS 
>>> information
>>> for only one of the two.
>>> ---
>>>  src/mesa/state_tracker/st_cb_clear.c | 9 -
>>>  1 file changed, 9 deletions(-)
>>>
>>> diff --git a/src/mesa/state_tracker/st_cb_clear.c 
>>> b/src/mesa/state_tracker/st_cb_clear.c
>>> index 137fac8..1e404a2 100644
>>> --- a/src/mesa/state_tracker/st_cb_clear.c
>>> +++ b/src/mesa/state_tracker/st_cb_clear.c
>>> @@ -515,15 +515,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
>>>}
>>> }
>>>
>>> -   /* Always clear depth and stencil together.
>>> -* This can only happen when the stencil writemask is not a full 
>>> mask.
>>> -*/
>>> -   if (quad_buffers & PIPE_CLEAR_DEPTHSTENCIL &&
>>> -   clear_buffers & PIPE_CLEAR_DEPTHSTENCIL) {
>>> -  quad_buffers |= clear_buffers & PIPE_CLEAR_DEPTHSTENCIL;
>>> -  clear_buffers &= ~PIPE_CLEAR_DEPTHSTENCIL;
>>> -   }
>>> -
>>> /* Only use quad-based clearing for the renderbuffers which cannot
>>>  * use pipe->clear. We want to always use pipe->clear for the other
>>>  * renderbuffers, because it's likely to be faster.
>>>
>>

[Mesa-dev] RadeonSI 10.6 candidates

2015-07-31 Thread Marek Olšák
Hi Emil,

I'm nominating the attached patches for 10.6. They are quite big, but
they fix a lot of graphical corruption on SI cards, especially in
games from Valve (Portal, CS:GO, etc). I've checked with piglit that
there are no regressions.

There was a comment on the Phoronix forums that this might fix about
half of the radeonsi bugs in Bugzilla. (I sincerely hope that's true,
but I don't know.)

Marek
From d4c55a4bc4f64cca22c5eb267d2e2dabaef5e67d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= 
Date: Mon, 15 Sep 2014 23:34:28 +0200
Subject: [PATCH 1/2] radeonsi: rework how shader pointers to descriptors are
 set
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For 10.6: This is a prerequisite for the next fix. The below comment is from
the original commit.

This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluation shader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.

The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.

There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.
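
As a rough illustration of the scheme described above (not the code in this patch), the mapping, the per-pointer dirty flags and the single emit function could look like the sketch below. The stage enum, struct and function names are hypothetical stand-ins. The register offsets are read off the R_00Bxxx names that appear in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PIPE_SHADER_x. */
enum shader_stage { STAGE_VERTEX, STAGE_GEOMETRY, NUM_STAGES };

/* Offsets follow the R_00Bxxx_SPI_SHADER_USER_DATA_* names in the diff. */
#define USER_DATA_VS_0 0x00B130u
#define USER_DATA_GS_0 0x00B230u
#define USER_DATA_ES_0 0x00B330u

struct userdata_state {
   uint32_t base[NUM_STAGES];       /* stage -> user-data register base */
   bool pointer_dirty[NUM_STAGES];  /* one dirty flag per shader pointer */
};

/* Update the mapping after a geometry shader is bound or unbound:
 * with a GS present, the API vertex shader runs as ES. */
static void
shader_change_notify(struct userdata_state *s, bool gs_bound)
{
   uint32_t vs_base = gs_bound ? USER_DATA_ES_0 : USER_DATA_VS_0;

   if (s->base[STAGE_VERTEX] != vs_base) {
      s->base[STAGE_VERTEX] = vs_base;
      s->pointer_dirty[STAGE_VERTEX] = true;
   }
   s->base[STAGE_GEOMETRY] = USER_DATA_GS_0;
}

/* Single emit function for all pointers: visits each dirty pointer,
 * clears its flag and returns how many were emitted. */
static unsigned
emit_shader_pointers(struct userdata_state *s)
{
   unsigned emitted = 0;

   for (int i = 0; i < NUM_STAGES; i++) {
      if (s->pointer_dirty[i]) {
         /* A real driver would write the descriptor address
          * to the register at s->base[i] here. */
         s->pointer_dirty[i] = false;
         emitted++;
      }
   }
   return emitted;
}
```

Keeping the emit step separate from the descriptor code means a rebind only marks a pointer dirty, and the next draw re-emits it at whatever register the stage currently maps to.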

Reviewed-by: Michel Dänzer 
(cherry picked from commit 3ce91c727f2a00a05f414351266b0b45d677611e)
---
 src/gallium/drivers/radeonsi/si_descriptors.c   | 222 +++-
 src/gallium/drivers/radeonsi/si_pipe.h  |   5 +-
 src/gallium/drivers/radeonsi/si_state.h |  13 +-
 src/gallium/drivers/radeonsi/si_state_draw.c|   3 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c |   4 +
 5 files changed, 156 insertions(+), 91 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index bbfd36d..f31cccb 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -141,7 +141,7 @@ static void si_emit_cp_dma_clear_buffer(struct si_context *sctx,
 
 static void si_init_descriptors(struct si_context *sctx,
 struct si_descriptors *desc,
-unsigned shader_userdata_reg,
+unsigned shader_userdata_index,
 unsigned element_dw_size,
 unsigned num_elements,
 void (*emit_func)(struct si_context *ctx, struct r600_atom *state))
@@ -150,7 +150,7 @@ static void si_init_descriptors(struct si_context *sctx,
 	assert(num_elements <= sizeof(desc->dirty_mask)*8);
 
 	desc->atom.emit = (void*)emit_func;
-	desc->shader_userdata_reg = shader_userdata_reg;
+	desc->shader_userdata_offset = shader_userdata_index * 4;
 	desc->element_dw_size = element_dw_size;
 	desc->num_elements = num_elements;
 	desc->context_size = num_elements * element_dw_size * 4;
@@ -181,14 +181,11 @@ static void si_update_descriptors(struct si_context *sctx,
 	if (desc->dirty_mask) {
 		desc->atom.num_dw =
 			7 + /* copy */
-			(4 + desc->element_dw_size) * util_bitcount64(desc->dirty_mask) + /* update */
-			4; /* pointer update */
-
-		if (desc->shader_userdata_reg >= R_00B130_SPI_SHADER_USER_DATA_VS_0 &&
-		desc->shader_userdata_reg < R_00B230_SPI_SHADER_USER_DATA_GS_0)
-			desc->atom.num_dw += 4; /* second pointer update */
+			(4 + desc->element_dw_size) * util_bitcount(desc->dirty_mask); /* update */
 
 		desc->atom.dirty = true;
+		desc->pointer_dirty = true;
+		sctx->shader_userdata.atom.dirty = true;
 
 		/* TODO: Investigate if these flushes can be removed after
 		 * adding CE support. */
@@ -206,32 +203,6 @@ static void si_update_descriptors(struct si_context *sctx,
 	}
 }
 
-static void si_emit_shader_pointer(struct si_context *sctx,
-   struct r600_atom *atom)
-{
-	struct si_descriptors *desc = (struct si_descriptors*)atom;
-	struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
-	uint64_t va = desc->buffer->gpu_address +
-		  desc->current_context_id * desc->context_size +
-		  desc->buffer_offset;
-
-	radeon_emit(cs, PKT3(PKT3_SET_SH_REG, 2, 0));
-	radeon_emit(cs, (desc->shader_userdata_reg - SI_SH_REG_OFFSET) >> 2);
-	radeon_emit(cs, va);
-	radeon_emit(cs, va >> 32);
-
-	if (desc->shader_userdata_reg >= R_00B130_SPI_SHADER_USER_DATA_VS_0 &&
-	desc->shader_userdata_reg < R_00B230_SPI_SHADER_USER_DATA_GS_0) {
-		radeon_emit(cs, PKT3(PKT3_SET_SH_REG, 2, 0));
-		radeon_emit(cs, (desc->shader_userdata_reg +
- (R_00B330_SPI_SHADER_USER_DATA_ES_0 -
-  R_00B130_SPI_SHADER_USER_DATA_VS_0) -
- SI_SH_REG_OFFSET) >> 2);
-		radeon_emit(cs, va);
-		radeon_emit(cs, va >> 32);
-	}
-}
-
 static void si_emit_descriptors(struct si_context *sctx,
 struct si_descriptors *desc,
 uint32_t **descriptors)
@@ -295,24 +266,6 @@ static void si_emit_descriptors(struct si_context *sctx,
 
 	desc->dirty_mask = 0;
 	desc->current_context_id = new_context_id;
-

[Mesa-dev] [PATCH 3/7] mesa: Replace F_TO_I() with _mesa_lroundevenf().

2015-07-31 Thread Matt Turner
I'm not sure what the true meaning of "The rounding mode may vary." is,
but it is the case that the IROUND() path rounds differently than the
other paths (and does it wrong, at that).

Like _mesa_roundeven{f,}(), just add and use _mesa_lroundeven{f,}() that
has known semantics.
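
To make the rounding difference concrete, here is a small portable sketch. round_half_even() is a hypothetical stand-in written for illustration; it is not how the helper in src/util/rounding.h is implemented, and it only handles values that fit in a long. round_add_half() mirrors the add-0.5-and-truncate approach of IROUND_POS() shown in the diff context.

```c
#include <assert.h>

/* Round to nearest integer, ties to even (hypothetical stand-in). */
static long
round_half_even(float x)
{
   long i = (long) x;           /* truncates toward zero */
   float frac = x - (float) i;  /* fractional part, same sign as x */

   if (frac > 0.5f || (frac == 0.5f && (i & 1)))
      return i + 1;
   if (frac < -0.5f || (frac == -0.5f && (i & 1)))
      return i - 1;
   return i;
}

/* Add 0.5 and truncate; only meaningful for non-negative input. */
static int
round_add_half(float f)
{
   return (int) (f + 0.5F);
}
```

The two agree except on exact ties: 2.5 becomes 2 with round-half-even but 3 with add-and-truncate, so which path F_TO_I() takes can change a converted value by one.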
---
 src/mesa/main/format_utils.h  |  5 +++--
 src/mesa/main/imports.h   | 28 
 src/mesa/main/macros.h| 11 ++-
 src/mesa/main/pack.c  |  4 ++--
 src/mesa/main/pixeltransfer.c | 11 ++-
 src/util/rounding.h   | 25 +
 6 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/src/mesa/main/format_utils.h b/src/mesa/main/format_utils.h
index 7f500ec..00ec777 100644
--- a/src/mesa/main/format_utils.h
+++ b/src/mesa/main/format_utils.h
@@ -33,6 +33,7 @@
 
 #include "imports.h"
 #include "macros.h"
+#include "util/rounding.h"
 
 extern const mesa_array_format RGBA32_FLOAT;
 extern const mesa_array_format RGBA8_UBYTE;
@@ -84,7 +85,7 @@ _mesa_float_to_unorm(float x, unsigned dst_bits)
else if (x > 1.0f)
   return MAX_UINT(dst_bits);
else
-  return F_TO_I(x * MAX_UINT(dst_bits));
+  return _mesa_lroundevenf(x * MAX_UINT(dst_bits));
 }
 
 static inline unsigned
@@ -128,7 +129,7 @@ _mesa_float_to_snorm(float x, unsigned dst_bits)
else if (x > 1.0f)
   return MAX_INT(dst_bits);
else
-  return F_TO_I(x * MAX_INT(dst_bits));
+  return _mesa_lroundevenf(x * MAX_INT(dst_bits));
 }
 
 static inline int
diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
index 9ffe3de..d61279a 100644
--- a/src/mesa/main/imports.h
+++ b/src/mesa/main/imports.h
@@ -170,34 +170,6 @@ static inline int IROUND_POS(float f)
return (int) (f + 0.5F);
 }
 
-#ifdef __x86_64__
-#  include 
-#endif
-
-/**
- * Convert float to int using a fast method.  The rounding mode may vary.
- */
-static inline int F_TO_I(float f)
-{
-#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
-   int r;
-   __asm__ ("fistpl %0" : "=m" (r) : "t" (f) : "st");
-   return r;
-#elif defined(USE_X86_ASM) && defined(_MSC_VER)
-   int r;
-   _asm {
-fld f
-fistp r
-   }
-   return r;
-#elif defined(__x86_64__)
-   return _mm_cvt_ss2si(_mm_load_ss(&f));
-#else
-   return IROUND(f);
-#endif
-}
-
-
 /** Return (as an integer) floor of float */
 static inline int IFLOOR(float f)
 {
diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h
index 07919a6..54df50c 100644
--- a/src/mesa/main/macros.h
+++ b/src/mesa/main/macros.h
@@ -33,6 +33,7 @@
 
 #include "util/macros.h"
 #include "util/u_math.h"
+#include "util/rounding.h"
 #include "imports.h"
 
 
@@ -131,12 +132,12 @@ extern GLfloat _mesa_ubyte_to_float_color_tab[256];
 #define INT_TO_USHORT(i)   ((i) < 0 ? 0 : ((GLushort) ((i) >> 15)))
 #define UINT_TO_USHORT(i)  ((i) < 0 ? 0 : ((GLushort) ((i) >> 16)))
 #define UNCLAMPED_FLOAT_TO_USHORT(us, f)  \
-us = ( (GLushort) F_TO_I( CLAMP((f), 0.0F, 1.0F) * 65535.0F) )
+us = ( (GLushort) _mesa_lroundevenf( CLAMP((f), 0.0F, 1.0F) * 65535.0F) )
 #define CLAMPED_FLOAT_TO_USHORT(us, f)  \
-us = ( (GLushort) F_TO_I( (f) * 65535.0F) )
+us = ( (GLushort) _mesa_lroundevenf( (f) * 65535.0F) )
 
 #define UNCLAMPED_FLOAT_TO_SHORT(s, f)  \
-s = ( (GLshort) F_TO_I( CLAMP((f), -1.0F, 1.0F) * 32767.0F) )
+s = ( (GLshort) _mesa_lroundevenf( CLAMP((f), -1.0F, 1.0F) * 32767.0F) )
 
 /***
  *** UNCLAMPED_FLOAT_TO_UBYTE: clamp float to [0,1] and map to ubyte in [0,255]
@@ -167,9 +168,9 @@ extern GLfloat _mesa_ubyte_to_float_color_tab[256];
 } while (0)
 #else
 #define UNCLAMPED_FLOAT_TO_UBYTE(ub, f) \
-   ub = ((GLubyte) F_TO_I(CLAMP((f), 0.0F, 1.0F) * 255.0F))
+   ub = ((GLubyte) _mesa_lroundevenf(CLAMP((f), 0.0F, 1.0F) * 255.0F))
 #define CLAMPED_FLOAT_TO_UBYTE(ub, f) \
-   ub = ((GLubyte) F_TO_I((f) * 255.0F))
+   ub = ((GLubyte) _mesa_lroundevenf((f) * 255.0F))
 #endif
 
 static fi_type UINT_AS_UNION(GLuint u)
diff --git a/src/mesa/main/pack.c b/src/mesa/main/pack.c
index fb642b8..7147fd6 100644
--- a/src/mesa/main/pack.c
+++ b/src/mesa/main/pack.c
@@ -470,7 +470,7 @@ extract_uint_indexes(GLuint n, GLuint indexes[],
 static inline GLuint
 clamp_float_to_uint(GLfloat f)
 {
-   return f < 0.0F ? 0 : F_TO_I(f);
+   return f < 0.0F ? 0 : _mesa_lroundevenf(f);
 }
 
 
@@ -478,7 +478,7 @@ static inline GLuint
 clamp_half_to_uint(GLhalfARB h)
 {
GLfloat f = _mesa_half_to_float(h);
-   return f < 0.0F ? 0 : F_TO_I(f);
+   return f < 0.0F ? 0 : _mesa_lroundevenf(f);
 }
 
 
diff --git a/src/mesa/main/pixeltransfer.c b/src/mesa/main/pixeltransfer.c
index 51f2ebf..22eac00 100644
--- a/src/mesa/main/pixeltransfer.c
+++ b/src/mesa/main/pixeltransfer.c
@@ -35,6 +35,7 @@
 #include "pixeltransfer.h"
 #include "imports.h"
 #include "mtypes.h"
+#include "util/rounding.h"
 
 
 /*
@@ -94,10 +95,10 @@ _mesa_map_rgba( const struct gl_context *ctx, GLuint n, GLfloat rgba[][4] )
   GLflo

[Mesa-dev] [PATCH 1/7] mesa: Add -fno-math-errno to CFLAGS.

2015-07-31 Thread Matt Turner
Cuts about 9k of .text size.

   text    data     bss     dec     hex filename
4992804  197808   26328 5216940  4f9aac i965_dri.so before
4983676  197808   26328 5207812  4f7704 i965_dri.so after

Also, Darwin's libm does not ever set errno, so if we care about those
systems we shouldn't rely on errno anyway.
---
 configure.ac | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/configure.ac b/configure.ac
index 480018a..15a9a3d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -284,6 +284,9 @@ if test "x$GCC" = xyes; then
 # Work around aliasing bugs - developers should comment this out
 CFLAGS="$CFLAGS -fno-strict-aliasing"
 
+# We don't want floating-point math functions to set errno
+CFLAGS="$CFLAGS -fno-math-errno"
+
 # gcc's builtin memcmp is slower than glibc's
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
 CFLAGS="$CFLAGS -fno-builtin-memcmp"
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/7] mesa: Replace uses of IROUND{, 64} with libm functions.

2015-07-31 Thread Matt Turner
lroundf is the most common replacement. I replaced uses of IROUND()
where there was a comment saying "rounded to nearest integer" with
_mesa_lroundevenf.

IROUND64 is replaced with llroundf.
---
 src/mesa/main/drawpix.c | 21 +++--
 src/mesa/main/eval.c| 14 +++---
 src/mesa/main/get.c | 28 ++--
 src/mesa/main/imports.h | 18 --
 src/mesa/main/light.c   |  8 
 src/mesa/main/pixel.c   |  2 +-
 src/mesa/main/pixelstore.c  |  3 ++-
 src/mesa/main/samplerobj.c  |  8 
 src/mesa/main/texparam.c|  8 
 src/mesa/main/uniform_query.cpp |  2 +-
 src/mesa/swrast/s_blit.c|  2 +-
 src/mesa/swrast/s_context.h |  2 +-
 12 files changed, 50 insertions(+), 66 deletions(-)

diff --git a/src/mesa/main/drawpix.c b/src/mesa/main/drawpix.c
index 720a082..025cf7e 100644
--- a/src/mesa/main/drawpix.c
+++ b/src/mesa/main/drawpix.c
@@ -22,6 +22,7 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include "c99_math.h"
 #include "glheader.h"
 #include "imports.h"
 #include "bufferobj.h"
@@ -51,14 +52,14 @@ _mesa_DrawPixels( GLsizei width, GLsizei height,
FLUSH_VERTICES(ctx, 0);
 
if (MESA_VERBOSE & VERBOSE_API)
-  _mesa_debug(ctx, "glDrawPixels(%d, %d, %s, %s, %p) // to %s at %d, %d\n",
+  _mesa_debug(ctx, "glDrawPixels(%d, %d, %s, %s, %p) // to %s at %ld, %ld\n",
   width, height,
   _mesa_enum_to_string(format),
   _mesa_enum_to_string(type),
   pixels,
   _mesa_enum_to_string(ctx->DrawBuffer->ColorDrawBuffer[0]),
-  IROUND(ctx->Current.RasterPos[0]),
-  IROUND(ctx->Current.RasterPos[1]));
+  lroundf(ctx->Current.RasterPos[0]),
+  lroundf(ctx->Current.RasterPos[1]));
 
 
if (width < 0 || height < 0) {
@@ -140,8 +141,8 @@ _mesa_DrawPixels( GLsizei width, GLsizei height,
if (ctx->RenderMode == GL_RENDER) {
   if (width > 0 && height > 0) {
  /* Round, to satisfy conformance tests (matches SGI's OpenGL) */
- GLint x = IROUND(ctx->Current.RasterPos[0]);
- GLint y = IROUND(ctx->Current.RasterPos[1]);
+ GLint x = lroundf(ctx->Current.RasterPos[0]);
+ GLint y = lroundf(ctx->Current.RasterPos[1]);
 
  if (_mesa_is_bufferobj(ctx->Unpack.BufferObj)) {
 /* unpack from PBO */
@@ -196,13 +197,13 @@ _mesa_CopyPixels( GLint srcx, GLint srcy, GLsizei width, GLsizei height,
 
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx,
-  "glCopyPixels(%d, %d, %d, %d, %s) // from %s to %s at %d, %d\n",
+  "glCopyPixels(%d, %d, %d, %d, %s) // from %s to %s at %ld, %ld\n",
   srcx, srcy, width, height,
   _mesa_enum_to_string(type),
   _mesa_enum_to_string(ctx->ReadBuffer->ColorReadBuffer),
   _mesa_enum_to_string(ctx->DrawBuffer->ColorDrawBuffer[0]),
-  IROUND(ctx->Current.RasterPos[0]),
-  IROUND(ctx->Current.RasterPos[1]));
+  lroundf(ctx->Current.RasterPos[0]),
+  lroundf(ctx->Current.RasterPos[1]));
 
if (width < 0 || height < 0) {
   _mesa_error(ctx, GL_INVALID_VALUE, "glCopyPixels(width or height < 0)");
@@ -264,8 +265,8 @@ _mesa_CopyPixels( GLint srcx, GLint srcy, GLsizei width, GLsizei height,
if (ctx->RenderMode == GL_RENDER) {
   /* Round to satisfy conformance tests (matches SGI's OpenGL) */
   if (width > 0 && height > 0) {
- GLint destx = IROUND(ctx->Current.RasterPos[0]);
- GLint desty = IROUND(ctx->Current.RasterPos[1]);
+ GLint destx = lroundf(ctx->Current.RasterPos[0]);
+ GLint desty = lroundf(ctx->Current.RasterPos[1]);
  ctx->Driver.CopyPixels( ctx, srcx, srcy, width, height, destx, desty,
  type );
   }
diff --git a/src/mesa/main/eval.c b/src/mesa/main/eval.c
index 86c8f75..b21a90d 100644
--- a/src/mesa/main/eval.c
+++ b/src/mesa/main/eval.c
@@ -704,7 +704,7 @@ _mesa_GetnMapivARB( GLenum target, GLenum query, GLsizei bufSize, GLint *v )
 if (bufSize < numBytes)
goto overflow;
for (i=0;iu1);
-v[1] = IROUND(map1d->u2);
+v[0] = lroundf(map1d->u1);
+v[1] = lroundf(map1d->u2);
  }
  else {
 numBytes = 4 * sizeof *v;
 if (bufSize < numBytes)
goto overflow;
-v[0] = IROUND(map2d->u1);
-v[1] = IROUND(map2d->u2);
-v[2] = IROUND(map2d->v1);
-v[3] = IROUND(map2d->v2);
+v[0] = lroundf(map2d->u1);
+v[1] = lroundf(map2d->u2);
+v[2] = lroundf(map2d->v1);
+v[3] = lroundf(map2d->v2);
  }
  break;
   default:
diff --git a/src/mesa/main/get.c b/src/mesa/main/ge

[Mesa-dev] [PATCH 5/7] util: Use SSE intrinsics in _mesa_lroundeven{f, }.

2015-07-31 Thread Matt Turner
gcc actually generates this for us now that we use -fno-math-errno
(which is weird, since lrintf()/lrint() don't set errno) but clang still
does not. Presumably helps MSVC as well.

Reduced .text size by 8.5k with gcc before -fno-math-errno.
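
For readers unfamiliar with the intrinsic, here is a minimal standalone sketch of the same idea. The function name is hypothetical; the x86-64 path assumes MXCSR is still in its default round-to-nearest-even mode, and the cast to long assumes an LP64 target where long is 64 bits.

```c
#include <math.h>
#ifdef __x86_64__
#include <xmmintrin.h>
#endif

/* Hypothetical standalone version of a round-to-even float -> long helper. */
static inline long lroundeven_sketch(float x)
{
#ifdef __x86_64__
   /* CVTSS2SI converts using the current MXCSR rounding mode, which is
    * round-to-nearest-even unless the program has changed it. */
   return (long) _mm_cvtss_si64(_mm_load_ss(&x));
#else
   /* Portable fallback: lrintf() honors the current rounding mode. */
   return lrintf(x);
#endif
}
```

Both paths should agree on ties, for example 2.5f gives 2 and 3.5f gives 4.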

   text    data     bss     dec     hex filename
4935850  195136   26192 5157178  4eb13a i965_dri.so before
4927225  195128   26192 5148545  4e8f81 i965_dri.so after
---
 src/util/rounding.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/util/rounding.h b/src/util/rounding.h
index 2d00760..e546c9f 100644
--- a/src/util/rounding.h
+++ b/src/util/rounding.h
@@ -26,6 +26,11 @@
 
 #include 
 
+#ifdef __x86_64__
+#include 
+#include 
+#endif
+
 #ifdef __SSE4_1__
 #include 
 #endif
@@ -87,7 +92,11 @@ _mesa_roundeven(double x)
 static inline long
 _mesa_lroundevenf(float x)
 {
+#ifdef __x86_64__
+   return _mm_cvtss_si64(_mm_load_ss(&x));
+#else
return lrintf(x);
+#endif
 }
 
 /**
@@ -97,7 +106,11 @@ _mesa_lroundevenf(float x)
 static inline long
 _mesa_lroundeven(double x)
 {
+#ifdef __x86_64__
+   return _mm_cvtsd_si64(_mm_load_sd(&x));
+#else
return lrint(x);
+#endif
 }
 
 #endif
-- 
2.3.6



[Mesa-dev] [PATCH 2/7] mesa: Add -fno-trapping-math to CFLAGS.

2015-07-31 Thread Matt Turner
Cuts about 1k of .text size.

   text    data     bss     dec     hex filename
4983676  197808   26328 5207812  4f7704 i965_dri.so before
4982522  197800   26328 5206650  4f727a i965_dri.so after
---
 configure.ac | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 15a9a3d..65d195f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -284,8 +284,8 @@ if test "x$GCC" = xyes; then
 # Work around aliasing bugs - developers should comment this out
 CFLAGS="$CFLAGS -fno-strict-aliasing"
 
-# We don't want floating-point math functions to set errno
-CFLAGS="$CFLAGS -fno-math-errno"
+# We don't want floating-point math functions to set errno or trap
+CFLAGS="$CFLAGS -fno-math-errno -fno-trapping-math"
 
 # gcc's builtin memcmp is slower than glibc's
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
-- 
2.3.6



[Mesa-dev] [PATCH 4/7] mesa: Use _mesa_lroundevenf() in some more places.

2015-07-31 Thread Matt Turner
---
 src/glsl/ir_constant_expression.cpp | 14 --
 src/mesa/main/imports.c |  4 ++--
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/src/glsl/ir_constant_expression.cpp b/src/glsl/ir_constant_expression.cpp
index 2853c16..309b6b7 100644
--- a/src/glsl/ir_constant_expression.cpp
+++ b/src/glsl/ir_constant_expression.cpp
@@ -230,12 +230,9 @@ pack_snorm_1x8(float x)
  *follows:
  *
  *  packSnorm4x8: round(clamp(c, -1, +1) * 127.0)
- *
- * We must first cast the float to an int, because casting a negative
- * float to a uint is undefined.
  */
-   return (uint8_t) (int)
-  _mesa_roundevenf(CLAMP(x, -1.0f, +1.0f) * 127.0f);
+   return (uint8_t)
+  _mesa_lroundevenf(CLAMP(x, -1.0f, +1.0f) * 127.0f);
 }
 
 /**
@@ -252,12 +249,9 @@ pack_snorm_1x16(float x)
  *follows:
  *
  *  packSnorm2x16: round(clamp(c, -1, +1) * 32767.0)
- *
- * We must first cast the float to an int, because casting a negative
- * float to a uint is undefined.
  */
-   return (uint16_t) (int)
-  _mesa_roundevenf(CLAMP(x, -1.0f, +1.0f) * 32767.0f);
+   return (uint16_t)
+  _mesa_lroundevenf(CLAMP(x, -1.0f, +1.0f) * 32767.0f);
 }
 
 /**
diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c
index 68c7316..350e675 100644
--- a/src/mesa/main/imports.c
+++ b/src/mesa/main/imports.c
@@ -369,7 +369,7 @@ _mesa_float_to_half(float val)
   * or normal.
   */
  e = 0;
- m = (int) _mesa_roundevenf((1 << 24) * fabsf(fi.f));
+ m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f));
   }
   else if (new_exp > 15) {
  /* map this value to infinity */
@@ -383,7 +383,7 @@ _mesa_float_to_half(float val)
   * either normal or infinite.
   */
  e = new_exp + 15;
- m = (int) _mesa_roundevenf(flt_m / (float) (1 << 13));
+ m = _mesa_lroundevenf(flt_m / (float) (1 << 13));
   }
}
 
-- 
2.3.6


