from:"Pohjolainen, Topi"

Re: [Mesa-dev] [PATCH V3 14/22] i965/gen9: Set vertical and horizontal surface alignments

2015-06-15 Thread Pohjolainen, Topi

On Tue, Jun 09, 2015 at 02:30:02PM -0700, Anuj Phogat wrote:
> On Tue, Jun 2, 2015 at 2:51 PM, Anuj Phogat  wrote:
> > Patch sets the alignments for texture and renderbuffer surfaces.
> >
> > V3: Make changes inside horizontal_alignment() and
> > vertical_alignment() (Topi)
> >
> > Signed-off-by: Anuj Phogat 
> > Cc: Topi Pohjolainen 
> > ---
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c | 32 
> > +-
> >  1 file changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > index bb0c464..62ed4e0 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > @@ -83,8 +83,18 @@ surface_tiling_mode(uint32_t tiling)
> >  }
> >
> >  static unsigned
> > -vertical_alignment(const struct intel_mipmap_tree *mt)
> > +vertical_alignment(const struct brw_context *brw,
> > +   const struct intel_mipmap_tree *mt,
> > +   uint32_t surf_type)
> >  {
> > +   /* On Gen9+ vertical alignment is ignored for 1D surfaces and when
> > +* tr_mode is not TRMODE_NONE.
> > +*/
> > +   if (brw->gen > 8 &&
> > +   (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
> > +surf_type == BRW_SURFACE_1D))
> > +  return 0;
> > +
> > switch (mt->align_h) {
> > case 4:
> >return GEN8_SURFACE_VALIGN_4;
> > @@ -98,8 +108,18 @@ vertical_alignment(const struct intel_mipmap_tree *mt)
> >  }
> >
> >  static unsigned
> > -horizontal_alignment(const struct intel_mipmap_tree *mt)
> > +horizontal_alignment(const struct brw_context *brw,
> > + const struct intel_mipmap_tree *mt,
> > + uint32_t surf_type)
> >  {
> > +   /* On Gen9+ horizontal alignment is ignored when tr_mode is not
> > +* TRMODE_NONE.
> > +*/
> > +   if (brw->gen > 8 &&
> > +   (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
> > +gen9_use_linear_1d_layout(brw, mt)))
> > +  return 0;
> > +
> > switch (mt->align_w) {
> > case 4:
> >return GEN8_SURFACE_HALIGN_4;
> > @@ -199,8 +219,8 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
> >
> > surf[0] = SET_FIELD(surf_type, BRW_SURFACE_TYPE) |
> >   format << BRW_SURFACE_FORMAT_SHIFT |
> > - vertical_alignment(mt) |
> > - horizontal_alignment(mt) |
> > + vertical_alignment(brw, mt, surf_type) |
> > + horizontal_alignment(brw, mt, surf_type) |
> >   tiling_mode;
> >
> > if (surf_type == BRW_SURFACE_CUBE) {
> > @@ -416,8 +436,8 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
> > surf[0] = (surf_type << BRW_SURFACE_TYPE_SHIFT) |
> >   (is_array ? GEN7_SURFACE_IS_ARRAY : 0) |
> >   (format << BRW_SURFACE_FORMAT_SHIFT) |
> > - vertical_alignment(mt) |
> > - horizontal_alignment(mt) |
> > + vertical_alignment(brw, mt, surf_type) |
> > + horizontal_alignment(brw, mt, surf_type) |
> >   surface_tiling_mode(tiling);
> >
> > surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
> > --
> > 1.9.3
> >
> 
> Topi, I'm pushing a few of the Yf/Ys patches upstream. If you don't have any
> more comments on this one, may I use your r-b?

Sure, go ahead and push.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] i965: Split VUE map handling out of brw_vs.c into brw_vue_map.c.

2015-06-17 Thread Pohjolainen, Topi

On Wed, Jun 17, 2015 at 10:36:04PM -0700, Kenneth Graunke wrote:
> This was originally only used by the vertex shader, but it's now used by
> the geometry shader as well, and will also eventually be used for
> tessellation control and evaluation shaders.
> 
> I suspect it will be easier to find in a file named after the concept.

I like this. Both patches are:

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads

2015-06-22 Thread Pohjolainen, Topi

On Thu, Jun 04, 2015 at 05:35:11PM -0700, Ben Widawsky wrote:
> On Wed, Jun 03, 2015 at 09:32:55PM +0300, Pohjolainen, Topi wrote:
> > On Wed, Jun 03, 2015 at 09:21:11PM +0300, Topi Pohjolainen wrote:
> > > Shader programs are cached per stage (FS, VS, GS) using the
> > > corresponding shader source identifier and compile time choices
> > > as key. However, one not only stores the program binary but
> > > a pair consisting of program binary and program data. The latter
> > > represents the store of constants (such as uniforms) used by
> > > the program.
> > > 
> > > However, when programs are searched in the cache for reloading
> > > only the program key representing the binary is considered
> > > (see for example, brw_upload_wm_prog() and brw_search_cache()).
> > > Hence, when programs are re-loaded from cache the first program
> > > binary, program data pair is extracted without considering if
> > > the program data matches the currently in use uniform storage
> > > as well.
> > > 
> > > My reasoning for why this actually works is that the key
> > > contains the identifier of the corresponding gl_program that
> > > represents the source code for the shader program. Hence,
> > > two programs having identical source code still have unique
> > > keys, and therefore brw_try_upload_using_copy() never encounters
> > > a case where a matching binary is found but the program data
> > > doesn't match.
> > 
> > In fact, thinking some more, I think this is possible when the
> > same shader, say a fragment shader, is used with two different
> > vertex shaders. This results in there being matching binaries but
> > program data pointing to different storage. Looking at
> > brw_upload_cache() I still can't see how failing
> > brw_try_upload_using_copy() makes a difference. We only upload
> > the program binary again (even though that is the part that
> > actually matches). And then proceed the same way regardless
> > of the result of brw_try_upload_using_copy(). The program data
> > gets augmented with the key.
> > 
> > But the point remains that when a program is reloaded through
> > the brw_search_cache() only the key (and not the program data)
> > is considered returning the first matching pair.
> > 
> > I probably need to write a piglit test for this.
> > 
> > > 
> > > My ultimate goal is to stop storing pointers to the individual
> > > components of a uniform but to store only a pointer to the
> > > "struct gl_uniform_storage" instead, and allow
> > > gen6_upload_push_constants() to iterate over individual
> > > components and array elements. This is needed to be able to
> > > convert 32-bit floats to fp16; otherwise there is only a
> > > pointer to 32 bits without knowing its type (int, float, etc.),
> > > let alone its target precision.
> > > 
> > > No regression in jenkins. However, we talked about this with
> > > Ken and this doesn't really tell much as piglit doesn't really
> > > re-use shader sources during one execution.
> > > 
> > > Signed-off-by: Topi Pohjolainen 
> > > CC: Kenneth Graunke 
> > > CC: Tapani Pälli
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_program.c | 6 --
> > >  1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> > > index e5c0d3c..7f5fde8 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > > @@ -576,12 +576,6 @@ brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
> > > if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param)))
> > >return false;
> > >  
> > > -   if (memcmp(a->param, b->param, a->nr_params * sizeof(void *)))
> > > -  return false;
> > > -
> > > -   if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void *)))
> > > -  return false;
> > > -
> > > return true;
> > >  }
> > >  
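
For illustration only, here is a minimal sketch in plain C of the lookup
behaviour described above: a search that compares keys and nothing else
returns the first stored pair whose key matches, whatever program data sits
next to it. The sketch_item struct and sketch_search() are hypothetical names
made up for this example; this is not the real brw_search_cache() code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry: a key plus the program data stored with it. */
struct sketch_item {
   const void *key;
   size_t key_size;
   const void *prog_data;
};

/* Return the first entry whose key matches. prog_data is never compared,
 * so two entries with equal keys but different prog_data are
 * indistinguishable to the caller.
 */
static const struct sketch_item *
sketch_search(const struct sketch_item *items, size_t count,
              const void *key, size_t key_size)
{
   assert(items != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (items[i].key_size == key_size &&
          memcmp(items[i].key, key, key_size) == 0)
         return &items[i];
   }
   return NULL;
}
```

If two entries share a key, the second one is unreachable through this kind
of search, which is the situation a piglit test would have to provoke.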
> 
> I am looking at a lot of this code for the first time, and I have a kind of
> wild guess.
> 
> The first time you upload a program, the program (kinda annoying that
> brw_upload_item_data doesn't seem to actually do that). Malloc a pointer (tmp,
> item->key), store the program and aux there. Set that

Re: [Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads

2015-06-22 Thread Pohjolainen, Topi

On Mon, Jun 22, 2015 at 01:28:12PM +0300, Pohjolainen, Topi wrote:

Re: [Mesa-dev] [PATCH 06/17] i965/blorp: Explicitly set execution sizes for new'd instructions

2015-06-23 Thread Pohjolainen, Topi

On Thu, Jun 18, 2015 at 05:51:35PM -0700, Jason Ekstrand wrote:
> This doesn't affect instructions allocated using the builder.
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

Reviewed-by: Topi Pohjolainen 

> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
> index c1b7609..f655a0c 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
> @@ -72,7 +72,7 @@ brw_blorp_eu_emitter::emit_kill_if_outside_rect(const 
> struct brw_reg &x,
> emit_cmp(BRW_CONDITIONAL_L, x, dst_x1)->predicate = BRW_PREDICATE_NORMAL;
> emit_cmp(BRW_CONDITIONAL_L, y, dst_y1)->predicate = BRW_PREDICATE_NORMAL;
>  
> -   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, g1, f0, g1);
> +   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, 16, g1, f0, g1);
> inst->force_writemask_all = true;
> insts.push_tail(inst);
>  }
> @@ -83,7 +83,7 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
> brw_reg &dst,
>unsigned base_mrf,
>unsigned msg_length)
>  {
> -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
> +   fs_inst *inst = new (mem_ctx) fs_inst(op, 16, dst, 
> brw_message_reg(base_mrf),
>   fs_reg(0u));
>  
> inst->base_mrf = base_mrf;
> @@ -118,7 +118,8 @@ brw_blorp_eu_emitter::emit_combine(enum opcode 
> combine_opcode,
>  {
> assert(combine_opcode == BRW_OPCODE_ADD || combine_opcode == BRW_OPCODE_AVG);
>  
> -   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, dst, src_1, src_2));
> +   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, 16, dst,
> + src_1, src_2));
>  }
>  
>  fs_inst *
> @@ -126,7 +127,7 @@ brw_blorp_eu_emitter::emit_cmp(enum brw_conditional_mod 
> op,
> const struct brw_reg &x,
> const struct brw_reg &y)
>  {
> -   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP,
> +   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP, 16,
>  vec16(brw_null_reg()), x, y);
> cmp->conditional_mod = op;
> insts.push_tail(cmp);
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCH 07/17] i965/fs: Move offset() and half() to the fs_builder

2015-06-23 Thread Pohjolainen, Topi

On Thu, Jun 18, 2015 at 05:51:36PM -0700, Jason Ekstrand wrote:
> We want to move these into the builder so that they know the current
> builder's dispatch width.  This will be needed by a later commit.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp |  52 ++
>  src/mesa/drivers/dri/i965/brw_fs_builder.h   |  46 +
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp |   2 +-
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  60 +--
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 149 
> ++-
>  src/mesa/drivers/dri/i965/brw_ir_fs.h|  51 -
>  6 files changed, 182 insertions(+), 178 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4f98d63..c13ac7d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -267,7 +267,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
> &bld,
>   inst->mlen = 1 + dispatch_width / 8;
> }
>  
> -   bld.MOV(dst, offset(vec4_result, (const_offset & 3) * scale));
> +   bld.MOV(dst, bld.offset(vec4_result, (const_offset & 3) * scale));
>  }
>  
>  /**
> @@ -361,7 +361,12 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
> &grf_alloc) const
>reg.width = this->src[i].width;
>if (!this->src[i].equals(reg))
>   return false;
> -  reg = ::offset(reg, 1);
> +
> +  if (i < this->header_size) {
> + reg.reg_offset += 1;
> +  } else {
> + reg.reg_offset += this->exec_size / 8;
> +  }

The latter branch is new functionality, isn't it? There is no consideration
for header_size in the offset() utility.
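
To make the question concrete, here is a small sketch of the stepping in the
hunk above. It assumes 32-byte registers and 32-bit channels, so that eight
channels fill one register. The helper names are hypothetical and are not
Mesa functions.

```c
#include <assert.h>

/* How far the expected register advances after payload source i,
 * following the quoted hunk: one register per header source, and
 * exec_size / 8 registers for every other source.
 */
static unsigned
sketch_payload_step(unsigned i, unsigned header_size, unsigned exec_size)
{
   assert(exec_size % 8 == 0);

   if (i < header_size)
      return 1;
   return exec_size / 8;
}

/* Total registers covered by a payload with the given number of sources. */
static unsigned
sketch_payload_regs(unsigned sources, unsigned header_size,
                    unsigned exec_size)
{
   unsigned reg_offset = 0;

   for (unsigned i = 0; i < sources; i++)
      reg_offset += sketch_payload_step(i, header_size, exec_size);
   return reg_offset;
}
```

Under these assumptions a SIMD16 payload with two header sources and two
regular sources covers 1 + 1 + 2 + 2 = 6 registers, whereas stepping every
source by the same amount would not tell the header apart from the rest.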

> }
>  
> return true;
> @@ -963,7 +968,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
> } else {
>bld.ADD(wpos, this->pixel_x, fs_reg(0.5f));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = bld.offset(wpos, 1);
>  
> /* gl_FragCoord.y */
> if (!flip && pixel_center_integer) {
> @@ -979,7 +984,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
>  
>bld.ADD(wpos, pixel_y, fs_reg(offset));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = bld.offset(wpos, 1);
>  
> /* gl_FragCoord.z */
> if (devinfo->gen >= 6) {
> @@ -989,7 +994,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
> this->delta_xy[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
> interp_reg(VARYING_SLOT_POS, 2));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = bld.offset(wpos, 1);
>  
> /* gl_FragCoord.w: Already set up in emit_interpolation */
> bld.MOV(wpos, this->wpos_w);
> @@ -1072,7 +1077,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
>   /* If there's no incoming setup data for this slot, don't
>* emit interpolation for it.
>*/
> - attr = offset(attr, type->vector_elements);
> + attr = bld.offset(attr, type->vector_elements);
>   location++;
>   continue;
>}
> @@ -1087,7 +1092,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
>  interp = suboffset(interp, 3);
> interp.type = attr.type;
> bld.emit(FS_OPCODE_CINTERP, attr, fs_reg(interp));
> -attr = offset(attr, 1);
> +attr = bld.offset(attr, 1);
>   }
>} else {
>   /* Smooth/noperspective interpolation case. */
> @@ -1125,7 +1130,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
> if (devinfo->gen < 6 && interpolation_mode == 
> INTERP_QUALIFIER_SMOOTH) {
>bld.MUL(attr, attr, this->pixel_w);
> }
> -attr = offset(attr, 1);
> +attr = bld.offset(attr, 1);
>   }
>  
>}
> @@ -1227,19 +1232,19 @@ fs_visitor::emit_samplepos_setup()
> if (dispatch_width == 8) {
>abld.MOV(int_sample_x, fs_reg(sample_pos_reg));
> } else {
> -  abld.half(0).MOV(half(int_sample_x, 0), fs_reg(sample_pos_reg));
> -  abld.half(1).MOV(half(int_sample_x, 1),
> +  abld.half(0).MOV(abld.half(int_sample_x, 0), fs_reg(sample_pos_reg));
> +  abld.half(1).MOV(abld.half(int_sample_x, 1),
> fs_reg(suboffset(sample_pos_reg, 16)));
> }
> /* Compute gl_SamplePosition.x */
> compute_sample_position(pos, int_sample_x);
> -   pos = offset(pos, 1);
> +   pos = abld.offset(pos, 1);
> if (dispatch_width == 8) {
>abld.MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1)));
> } else {
> -  abld.half(0).MOV(half(int_sample_y, 0),
> +  abld.half(0).MOV(abld.half(int_sample_y, 0),
> fs_reg(suboffset(sample_pos_reg, 1)));
> -  abld.half(1).MOV(half(int_sample_y, 1),
> +  abld.half(1).MOV(abld.half(int_sample_y, 1),
> fs_reg(suboffset(sample_po

Re: [Mesa-dev] [PATCH 10/17] i965/fs: Use exec_size for determining regs read/written and partial writes

2015-06-23 Thread Pohjolainen, Topi

On Thu, Jun 18, 2015 at 05:51:39PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Topi Pohjolainen 
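
As a rough illustration of the arithmetic in the hunks below, here is a
self-contained sketch in C. It assumes 32-byte registers, which is what the
division by 32 in the patch suggests. The SKETCH_ macros and function names
are stand-ins for illustration, not the driver's own definitions.

```c
#include <assert.h>

#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Registers written by a destination: bytes touched, rounded up to whole
 * 32-byte registers. A stride of 0 still counts as one element.
 */
static unsigned
sketch_regs_written(unsigned exec_size, unsigned stride, unsigned type_size)
{
   assert(type_size > 0);
   return SKETCH_DIV_ROUND_UP(SKETCH_MAX2(exec_size * stride, 1) * type_size,
                              32);
}

/* The size part of the partial-write test: fewer than 32 bytes written. */
static int
sketch_is_narrow_write(unsigned exec_size, unsigned type_size)
{
   return exec_size * type_size < 32;
}
```

With exec_size 16, stride 1 and a 4-byte type this gives 2 registers, and a
stride of 0 still yields 1 register because of the MAX2. An exec_size of 4
with a 4-byte type writes 16 bytes and so counts as narrow.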

> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 61235d7..cff27e7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -101,7 +101,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, 
> const fs_reg &dst,
> case MRF:
> case ATTR:
>this->regs_written =
> - DIV_ROUND_UP(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
> + DIV_ROUND_UP(MAX2(exec_size * dst.stride, 1) * type_sz(dst.type), 32);
>break;
> case BAD_FILE:
>this->regs_written = 0;
> @@ -718,7 +718,7 @@ bool
>  fs_inst::is_partial_write() const
>  {
> return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
> -   (this->dst.width * type_sz(this->dst.type)) < 32 ||
> +   (this->exec_size * type_sz(this->dst.type)) < 32 ||
> !this->dst.is_contiguous());
>  }
>  
> @@ -772,8 +772,8 @@ fs_inst::regs_read(int arg) const
>if (src[arg].stride == 0) {
>   return 1;
>} else {
> - int size = src[arg].width * src[arg].stride * type_sz(src[arg].type);
> - return (size + 31) / 32;
> + int size = this->exec_size * src[arg].stride * type_sz(src[arg].type);
> + return DIV_ROUND_UP(size, 32);
>}
> case MRF:
>unreachable("MRF registers are not allowed as sources");
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCH 14/17] i965/fs_builder: Use dispatch_width instead of reg.width for offset and half

2015-06-23 Thread Pohjolainen, Topi

On Thu, Jun 18, 2015 at 05:51:43PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index 7d3c8ab..58519d7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -161,7 +161,7 @@ namespace brw {
>   case MRF:
>   case ATTR:

How would you feel about an assertion here:

assert(reg.stride);
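
A small sketch of why the assertion seems worthwhile, written as plain C with
hypothetical helper names (these are not the builder's functions). The old
expression clamps the element count to at least 1, while the new one
multiplies by the stride directly.

```c
#include <assert.h>

/* Byte offset as computed before the patch: MAX2(width * stride, 1). */
static unsigned
sketch_offset_old(unsigned delta, unsigned width, unsigned stride,
                  unsigned type_size)
{
   unsigned elems = width * stride;

   return delta * (elems > 1 ? elems : 1) * type_size;
}

/* Byte offset as computed after the patch, with the suggested assertion. */
static unsigned
sketch_offset_new(unsigned delta, unsigned dispatch_width, unsigned stride,
                  unsigned type_size)
{
   assert(stride);
   return delta * dispatch_width * stride * type_size;
}
```

For a stride of 1 both versions agree. For a stride of 0 the old expression
still advances by one element per delta, but the new one would evaluate to 0
for every delta, so successive offsets would all land on the same location.
The assertion would catch that case instead of letting it pass silently.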

>  return byte_offset(reg,
> -   delta * MAX2(reg.width * reg.stride, 1) *
> +   delta * dispatch_width() * reg.stride *
> type_sz(reg.type));
>   case UNIFORM:
>  reg.reg_offset += delta;
> @@ -185,9 +185,9 @@ namespace brw {
>  
>   case GRF:
>   case MRF:
> -assert(reg.width == 16);
> -reg.width = 8;
> -return horiz_offset(reg, 8 * idx);
> +assert(dispatch_width() == 16);
> +reg.width = dispatch_width() / 2;
> +return horiz_offset(reg, (dispatch_width() / 2) * idx);
>  
>   case ATTR:
>   case HW_REG:
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCH 08/17] i965/fs: Make better use of the builder in shader_time

2015-06-23 Thread Pohjolainen, Topi

On Thu, Jun 18, 2015 at 05:51:37PM -0700, Jason Ekstrand wrote:
> Previously, we were just depending on register widths to ensure that
> various things were exec_size of 1 etc.  Now, we do so explicitly using the
> builder.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index c13ac7d..740b51d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -557,7 +557,7 @@ fs_visitor::get_timestamp(const fs_builder &bld)
> /* We want to read the 3 fields we care about even if it's not enabled in
>  * the dispatch.
>  */
> -   bld.exec_all().MOV(dst, ts);
> +   bld.group(4, 0).exec_all().MOV(dst, ts);

Just to make sure I understand correctly: we want SIMD4 in order to read wide
enough to get all three of the mentioned fields?

>  
> /* The caller wants the low 32 bits of the timestamp.  Since it's running
>  * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
> @@ -637,17 +637,19 @@ fs_visitor::emit_shader_time_end()
> start.negate = true;
> fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1);
> diff.set_smear(0);
> -   ibld.ADD(diff, start, shader_end_time);
> +
> +   const fs_builder cbld = ibld.group(1, 0);
> +   cbld.group(1, 0).ADD(diff, start, shader_end_time);
>  
> /* If there were no instructions between the two timestamp gets, the diff
>  * is 2 cycles.  Remove that overhead, so I can forget about that when
>  * trying to determine the time taken for single instructions.
>  */
> -   ibld.ADD(diff, diff, fs_reg(-2u));
> -   SHADER_TIME_ADD(ibld, type, diff);
> -   SHADER_TIME_ADD(ibld, written_type, fs_reg(1u));
> +   cbld.ADD(diff, diff, fs_reg(-2u));
> +   SHADER_TIME_ADD(cbld, type, diff);
> +   SHADER_TIME_ADD(cbld, written_type, fs_reg(1u));
> ibld.emit(BRW_OPCODE_ELSE);
> -   SHADER_TIME_ADD(ibld, reset_type, fs_reg(1u));
> +   SHADER_TIME_ADD(cbld, reset_type, fs_reg(1u));
> ibld.emit(BRW_OPCODE_ENDIF);
>  }
>  
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCH 17/17] i965/fs: Remove the width field from fs_reg

2015-06-23 Thread Pohjolainen, Topi

On Thu, Jun 18, 2015 at 05:51:46PM -0700, Jason Ekstrand wrote:
> As of now, the width field is no longer used for anything.  The width field
> "seemed like a good idea at the time" but is actually entirely redundant
> with the instruction's execution size.  Initially, it gave us the ability
> to easily set the instruction's execution size based entirely on register
> widths.  With the builder, we can easily set the sizes explicitly and the
> width field doesn't have as much purpose.  At this point, it's just
> redundant information that can get out of sync so it really needs to go.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 62 
> --
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 21 ++--
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  4 --
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  6 +--
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  4 +-
>  .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |  1 -
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 26 -
>  src/mesa/drivers/dri/i965/brw_ir_fs.h  | 13 +
>  8 files changed, 30 insertions(+), 107 deletions(-)
> 

I started tagging one by one, but apart from patch seven, where I doubt
that I understand the copy-payload related logic well enough, all the rest is:

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value

2015-06-25 Thread Pohjolainen, Topi

On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote:
> On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery  wrote:
> > From: Nanley Chery 
> >
> > Although the horizontal and vertical alignment fields are ignored here,
> > 0 is a reserved value for them and may cause undefined behavior. Change
> > the default value to an arbitrary valid one.
> >
> > Signed-off-by: Nanley Chery 
> > ---
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > index b2d1a57..22ae960 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw,
> > if (brw->gen > 8 &&
> > (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
> >  surf_type == BRW_SURFACE_1D))
> > -  return 0;
> > +  return GEN8_SURFACE_VALIGN_4;
> >
> > switch (mt->align_h) {
> > case 4:
> > @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw,
> > if (brw->gen > 8 &&
> > (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
> >  gen9_use_linear_1d_layout(brw, mt)))
> > -  return 0;
> > +  return GEN8_SURFACE_HALIGN_4;
> >
> > switch (mt->align_w) {
> > case 4:
> > --
> > 2.4.4
> >
> 
> Good find Nanley. We had no known issues with value 0 but it's
> always nice to avoid undefined behavior :).

Right, I thought about this when I reviewed the original. The spec says
it is ignored in these cases and hence the reserved value seemed fine. Now
that we put something meaningful there, somebody is going to compare the
spec and wonder why we set it to 4. If we also added a comment here that
says this is just an arbitrary (non-reserved) value and really ignored
by the hardware, it would prevent misunderstandings. What do you guys think?
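
Roughly what I have in mind, as a self-contained sketch rather than the
actual driver code. The struct, the function name and the SKETCH_VALIGN_*
values are made up for illustration; the real encodings live in the i965
headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GEN8_SURFACE_VALIGN_* encodings. */
enum { SKETCH_VALIGN_4 = 1, SKETCH_VALIGN_8 = 2, SKETCH_VALIGN_16 = 3 };

/* Hypothetical reduction of the inputs the real function looks at. */
struct sketch_surface {
   int gen;          /* hardware generation */
   bool tr_mode_set; /* true when tr_mode is not TRMODE_NONE */
   bool is_1d;       /* true for 1D surfaces */
   unsigned align_h; /* vertical alignment chosen by the miptree layout */
};

static unsigned
sketch_vertical_alignment(const struct sketch_surface *s)
{
   assert(s != NULL);

   /* On Gen9+ vertical alignment is ignored for 1D surfaces and when
    * tr_mode is not TRMODE_NONE. Set to an arbitrary non-reserved value.
    */
   if (s->gen > 8 && (s->tr_mode_set || s->is_1d))
      return SKETCH_VALIGN_4;

   switch (s->align_h) {
   case 4:
      return SKETCH_VALIGN_4;
   case 8:
      return SKETCH_VALIGN_8;
   case 16:
      return SKETCH_VALIGN_16;
   default:
      return 0; /* unsupported alignment in this sketch */
   }
}
```

The only point is the wording of the comment: a reader who checks the spec
sees straight away that the 4 is a placeholder and not a layout decision.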

> 
> Reviewed-by: Anuj Phogat 

Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value

2015-06-25 Thread Pohjolainen, Topi

On Thu, Jun 25, 2015 at 08:40:33AM -0700, Nanley Chery wrote:
> On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi
>  wrote:
> > On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote:
> >> On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery  
> >> wrote:
> >> > From: Nanley Chery 
> >> >
> >> > Although the horizontal and vertical alignment fields are ignored here,
> >> > 0 is a reserved value for them and may cause undefined behavior. Change
> >> > the default value to an arbitrary valid one.
> >> >
> >> > Signed-off-by: Nanley Chery 
> >> > ---
> >> >  src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
> >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> > index b2d1a57..22ae960 100644
> >> > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> > @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw,
> >> > if (brw->gen > 8 &&
> >> > (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
> >> >  surf_type == BRW_SURFACE_1D))
> >> > -  return 0;
> >> > +  return GEN8_SURFACE_VALIGN_4;
> >> >
> >> > switch (mt->align_h) {
> >> > case 4:
> >> > @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw,
> >> > if (brw->gen > 8 &&
> >> > (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
> >> >  gen9_use_linear_1d_layout(brw, mt)))
> >> > -  return 0;
> >> > +  return GEN8_SURFACE_HALIGN_4;
> >> >
> >> > switch (mt->align_w) {
> >> > case 4:
> >> > --
> >> > 2.4.4
> >> >
> >>
> >> Good find Nanley. We had no known issues with value 0 but it's
> >> always nice to avoid undefined behavior :).
> >
> > Right, I thought about this when I reviewed the original. The spec says
> > it is ignored in these cases and hence the reserved value seemed fine. Now
> > that we put something meaningful there, somebody is going to compare the
> > spec and wonder why we set it to 4. If we also added a comment here that
> > says this is just an arbitrary (non-reserved) value and really ignored
> > by the hardware, it would prevent misunderstandings. What do you guys think?
> >
> There's enough space to insert "Set to an arbitrary non-reserved
> value." in both of the comments preceding the conditional without
> adding an extra line. I wouldn't mind including it.

Sounds great, thanks!

Re: [Mesa-dev] [PATCH v2 09/19] i965/fs: Add a builder argument to offset()

2015-06-26 Thread Pohjolainen, Topi

On Thu, Jun 25, 2015 at 01:24:53PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp |  42 
>  src/mesa/drivers/dri/i965/brw_fs.h   |   2 +-
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp |   2 +-
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  58 +--
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 143 ++-
>  5 files changed, 128 insertions(+), 119 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 6cf9e96..9855bfb 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -267,7 +267,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
> &bld,
>   inst->mlen = 1 + dispatch_width / 8;
> }
>  
> -   bld.MOV(dst, offset(vec4_result, (const_offset & 3) * scale));
> +   bld.MOV(dst, offset(vec4_result, bld, (const_offset & 3) * scale));
>  }
>  
>  /**
> @@ -361,7 +361,12 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
> &grf_alloc) const
>reg.width = this->src[i].width;
>if (!this->src[i].equals(reg))
>   return false;
> -  reg = ::offset(reg, 1);
> +
> +  if (i < this->header_size) {
> + reg.reg_offset += 1;
> +  } else {
> + reg.reg_offset += this->exec_size / 8;
> +  }
> }

After studying this some more, and with your explanation of the earlier
version (thanks), I can now see why this change and the additional builder
in lower_load_payload() are needed. The rest was already fine in the previous
version:

Reviewed-by: Topi Pohjolainen 
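
The offset rule in the hunk above can be sketched in isolation. This is a
standalone illustration with hypothetical names, not driver code: it assumes
only what the hunk shows, namely that each header source advances the
register offset by one and every other source advances it by exec_size / 8.

```cpp
#include <cassert>
#include <vector>

// Returns the register offset at which each payload source is expected to
// start: header sources occupy one register each, the remaining sources
// occupy exec_size / 8 registers each.
std::vector<unsigned>
sketch_payload_offsets(unsigned sources, unsigned header_size,
                       unsigned exec_size)
{
   assert(exec_size % 8 == 0);

   std::vector<unsigned> offsets;
   unsigned reg_offset = 0;
   for (unsigned i = 0; i < sources; i++) {
      offsets.push_back(reg_offset);
      if (i < header_size)
         reg_offset += 1;
      else
         reg_offset += exec_size / 8;
   }
   return offsets;
}
```

For example, four sources with a two-register header at exec_size 16 start
at offsets 0, 1, 2 and 4.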

>  
> return true;
> @@ -920,7 +925,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
> } else {
>bld.ADD(wpos, this->pixel_x, fs_reg(0.5f));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = offset(wpos, bld, 1);
>  
> /* gl_FragCoord.y */
> if (!flip && pixel_center_integer) {
> @@ -936,7 +941,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
>  
>bld.ADD(wpos, pixel_y, fs_reg(offset));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = offset(wpos, bld, 1);
>  
> /* gl_FragCoord.z */
> if (devinfo->gen >= 6) {
> @@ -946,7 +951,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
> this->delta_xy[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
> interp_reg(VARYING_SLOT_POS, 2));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = offset(wpos, bld, 1);
>  
> /* gl_FragCoord.w: Already set up in emit_interpolation */
> bld.MOV(wpos, this->wpos_w);
> @@ -1029,7 +1034,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
>   /* If there's no incoming setup data for this slot, don't
>* emit interpolation for it.
>*/
> - attr = offset(attr, type->vector_elements);
> + attr = offset(attr, bld, type->vector_elements);
>   location++;
>   continue;
>}
> @@ -1044,7 +1049,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
>  interp = suboffset(interp, 3);
> interp.type = attr.type;
> bld.emit(FS_OPCODE_CINTERP, attr, fs_reg(interp));
> -attr = offset(attr, 1);
> +attr = offset(attr, bld, 1);
>   }
>} else {
>   /* Smooth/noperspective interpolation case. */
> @@ -1082,7 +1087,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
> if (devinfo->gen < 6 && interpolation_mode == INTERP_QUALIFIER_SMOOTH) {
>bld.MUL(attr, attr, this->pixel_w);
> }
> -attr = offset(attr, 1);
> +attr = offset(attr, bld, 1);
>   }
>  
>}
> @@ -1190,7 +1195,7 @@ fs_visitor::emit_samplepos_setup()
> }
> /* Compute gl_SamplePosition.x */
> compute_sample_position(pos, int_sample_x);
> -   pos = offset(pos, 1);
> +   pos = offset(pos, abld, 1);
> if (dispatch_width == 8) {
>abld.MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1)));
> } else {
> @@ -2980,10 +2985,6 @@ fs_visitor::lower_load_payload()
>  
>assert(inst->dst.file == MRF || inst->dst.file == GRF);
>assert(inst->saturate == false);
> -
> -  const fs_builder ibld = bld.group(inst->exec_size, inst->force_sechalf)
> - .exec_all(inst->force_writemask_all)
> - .at(block, inst);
>fs_reg dst = inst->dst;
>  
>/* Get rid of COMPR4.  We'll add it back in if we need it */
> @@ -2991,17 +2992,23 @@ fs_visitor::lower_load_payload()
>   dst.reg = dst.reg & ~BRW_MRF_COMPR4;
>  
>dst.width = 8;
> +  const fs_builder hbld = bld.group(8, 0).exec_all().at(block, inst);
> +
>for (uint8_t i = 0; i < inst->header_size; i++) {
>   if (inst->src[i].file != BAD_FILE

Re: [Mesa-dev] [PATCH] i965/gen6: Set up layer constraints properly for depth buffers.

2015-06-26 Thread Pohjolainen, Topi

On Thu, Jun 25, 2015 at 09:17:38AM -0700, Kenneth Graunke wrote:
> This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c
> from commit 77d55ef4819436ebbf9786a1e720ec00707bbb19.
> 
> No Piglit changes on Sandybridge.
> 
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Topi Pohjolainen 

> ---
>  src/mesa/drivers/dri/i965/gen6_depth_state.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c 
> b/src/mesa/drivers/dri/i965/gen6_depth_state.c
> index 8f0d7dc..febd478 100644
> --- a/src/mesa/drivers/dri/i965/gen6_depth_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c
> @@ -73,7 +73,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
> rb = (struct gl_renderbuffer*) irb;
>  
> if (rb) {
> -  depth = MAX2(rb->Depth, 1);
> +  depth = MAX2(irb->layer_count, 1);
>if (rb->TexImage)
>   gl_target = rb->TexImage->TexObject->Target;
> }
> @@ -89,6 +89,10 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
>surftype = BRW_SURFACE_2D;
>depth *= 6;
>break;
> +   case GL_TEXTURE_3D:
> +  assert(mt);
> +  depth = MAX2(mt->logical_depth0, 1);
> +  /* fallthrough */
> default:
>surftype = translate_tex_target(gl_target);
>break;
> -- 
> 2.4.0
> 

Re: [Mesa-dev] [PATCH] gl-2.0: Add test for re-using shader objects

2015-06-26 Thread Pohjolainen, Topi

On Fri, Jun 26, 2015 at 10:54:48AM -0400, Ilia Mirkin wrote:
> On Fri, Jun 26, 2015 at 10:50 AM, Topi Pohjolainen
>  wrote:
> > > This exercises the shader binary re-use logic in i965 buffer
> > > object uploading. Without relaxing the current constraints there
> > > will be two identical copies in the cache. With the three patches
> > > I sent, the logic shares a single copy between the two entries
> > > in the cache.
> >
> > CC: Kenneth Graunke 
> > Signed-off-by: Topi Pohjolainen 
> > ---
> >  tests/spec/gl-2.0/CMakeLists.gl.txt   |   1 +
> > >  tests/spec/gl-2.0/reuse_fragment_shader.c | 105 ++
> 
> also add to all.py?

Yes, and need to send to piglit list as well :)

I suspect I need to write a few more as well. I just wanted to show
something that demonstrates the logic I proposed in my i965 patches.

Re: [Mesa-dev] [PATCH 1/6] i965/vec4: Plumb log_data through so the backend_shader field gets set.

2015-07-03 Thread Pohjolainen, Topi

On Wed, Jul 01, 2015 at 03:03:31PM -0700, Kenneth Graunke wrote:
> Jason plumbed this through a while back in the FS backend, but
> apparently we were just passing NULL in the vec4 backend.
> 
> This patch passes brw in as intended.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp|  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4.h  |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 10 ++
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  3 ++-
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |  4 +++-
>  src/mesa/drivers/dri/i965/brw_vs.h|  1 +
>  src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  4 +++-
>  8 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index a5c686c..2a56564 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1940,7 +1940,7 @@ brw_vs_emit(struct brw_context *brw,
> if (!assembly) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
> -  vec4_vs_visitor v(brw->intelScreen->compiler,
> +  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
>  c, prog_data, prog, mem_ctx, st_index,
>  !_mesa_is_gles3(&brw->ctx));
>if (!v.run(brw_select_clip_planes(&brw->ctx))) {
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 2ac1693..043557b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -77,6 +77,7 @@ class vec4_visitor : public backend_shader, public 
> ir_visitor
>  {
>  public:
> vec4_visitor(const struct brw_compiler *compiler,
> +void *log_data,

As far as I can see, all the constructors touched by this patch are
"struct brw_context" aware. Could we use the type "struct brw_context *"
instead of "void *"? The pointer is ultimately passed to
shader_perf_log_mesa(), which in turn unconditionally casts it to
"struct brw_context *".
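
The trade-off can be illustrated with a small standalone sketch. All names
below are hypothetical stand-ins, not the i965 declarations: the opaque
variant needs a cast that the compiler cannot check, while the typed variant
turns a wrong argument into a compile error.

```cpp
#include <cassert>
#include <string>

// Stand-in for the context that ends up receiving the log message.
struct sketch_context { std::string last_message; };

// Variant A: opaque pointer. The callee casts blindly, so passing a pointer
// to any other type compiles and then misbehaves at run time.
void sketch_log_opaque(void *log_data, const char *msg)
{
   sketch_context *ctx = static_cast<sketch_context *>(log_data);
   ctx->last_message = msg;
}

// Variant B: typed pointer. No cast is needed, and passing anything other
// than a sketch_context * is rejected by the compiler.
void sketch_log_typed(sketch_context *ctx, const char *msg)
{
   assert(ctx != nullptr);
   ctx->last_message = msg;
}
```

The opaque form does keep the callee independent of the context type, which
may well be the reason for choosing it; the sketch only shows what is given
up in exchange.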

>  struct brw_vec4_compile *c,
>  struct gl_program *prog,
>  const struct brw_vue_prog_key *key,
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> index 69bcf5a..80c59af 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> @@ -35,12 +35,14 @@ const unsigned MAX_GS_INPUT_VERTICES = 6;
>  namespace brw {
>  
>  vec4_gs_visitor::vec4_gs_visitor(const struct brw_compiler *compiler,
> + void *log_data,
>   struct brw_gs_compile *c,
>   struct gl_shader_program *prog,
>   void *mem_ctx,
>   bool no_spills,
>   int shader_time_index)
> -   : vec4_visitor(compiler, &c->base, &c->gp->program.Base, &c->key.base,
> +   : vec4_visitor(compiler, log_data,
> +  &c->base, &c->gp->program.Base, &c->key.base,
>&c->prog_data.base, prog, MESA_SHADER_GEOMETRY, mem_ctx,
>no_spills, shader_time_index),
>   c(c)
> @@ -662,7 +664,7 @@ brw_gs_emit(struct brw_context *brw,
>likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
>   c->prog_data.base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
> - vec4_gs_visitor v(brw->intelScreen->compiler,
> + vec4_gs_visitor v(brw->intelScreen->compiler, brw,
> c, prog, mem_ctx, true /* no_spills */, st_index);
>   if (v.run(NULL /* clip planes */)) {
>  return generate_assembly(brw, prog, &c->gp->program.Base,
> @@ -704,11 +706,11 @@ brw_gs_emit(struct brw_context *brw,
> const unsigned *ret = NULL;
>  
> if (brw->gen >= 7)
> -  gs = new vec4_gs_visitor(brw->intelScreen->compiler,
> +  gs = new vec4_gs_visitor(brw->intelScreen->compiler, brw,
> c, prog, mem_ctx, false /* no_spills */,
> st_index);
> else
> -  gs = new gen6_gs_visitor(brw->intelScreen->compiler,
> +  gs = new gen6_gs_visitor(brw->intelScreen->compiler, brw,
> c, prog, mem_ctx, false /* no_spills */,
> st_index);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h 
> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> index e693c56..e48d861 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> @@ -69,6 +69,7 @@ class vec4_gs_visitor : public vec4_visitor
>  {
>  public:
> vec4_gs_visitor(const struct brw_c

Re: [Mesa-dev] [PATCH 2/6] i965/vec4: Move perf_debug about register spilling into the visitor.

2015-07-03 Thread Pohjolainen, Topi

On Wed, Jul 01, 2015 at 03:03:32PM -0700, Kenneth Graunke wrote:
> This patch makes us only issue the performance warning about register
> spilling if we actually spilled registers.  We also use scratch space
> for indirect addressing and the like.
> 
> This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for
> the vec4 backend.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_gs.c |  4 
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 +---
>  src/mesa/drivers/dri/i965/brw_vs.c |  4 
>  3 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index 52c7303..7f947e0 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -268,10 +268,6 @@ brw_codegen_gs_prog(struct brw_context *brw,
>  
> /* Scratch space is used for register spilling */
> if (c.base.last_scratch) {
> -  perf_debug("Geometry shader triggered register spilling.  "
> - "Try reducing the number of live vec4 values to "
> - "improve performance.\n");
> -
>c.prog_data.base.base.total_scratch
>   = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 2a56564..60f73e2 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1827,9 +1827,19 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
>}
> }
>  
> -   while (!reg_allocate()) {
> -  if (failed)
> - return false;
> +   bool allocated_without_spills = reg_allocate();
> +
> +   if (!allocated_without_spills) {
> +  compiler->shader_perf_log(log_data,
> +"%s shader triggered register spilling.  "
> +"Try reducing the number of live vec4 values "
> +"to improve performance.\n",
> +stage_name);
> +
> +  while (!reg_allocate()) {

I tried to understand how repeated calls to reg_allocate() differ in their
results from the earlier ones. I didn't really get it, but that doesn't
prevent me from reviewing this patch: it preserves the logic and matches
the intent stated in the commit message.

Reviewed-by: Topi Pohjolainen 
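
The control flow being preserved can be sketched in isolation. This is a toy
model with hypothetical names. It assumes, which the patch itself does not
state, that each failed attempt spills something so that a later attempt can
succeed, and that "failed" is set once no further progress is possible.

```cpp
#include <cassert>

struct sketch_visitor {
   int pressure;     // toy stand-in for register pressure
   int limit;        // registers available
   int spills_left;  // how many values may still be spilled
   bool failed;      // set when allocation cannot make progress
   int warnings;     // how many times the spill warning was logged

   sketch_visitor(int pressure, int limit, int spills_left)
      : pressure(pressure), limit(limit), spills_left(spills_left),
        failed(false), warnings(0) {}

   // One allocation attempt: succeeds if the pressure fits; otherwise
   // spills one value (lowering the pressure) and reports failure.
   bool reg_allocate()
   {
      if (pressure <= limit)
         return true;
      if (spills_left == 0) {
         failed = true;
         return false;
      }
      spills_left--;
      pressure--;
      return false;
   }

   // Same shape as the new hunk: warn once if the first attempt fails,
   // then keep retrying until allocation succeeds or gives up.
   bool run()
   {
      bool allocated_without_spills = reg_allocate();

      if (!allocated_without_spills) {
         warnings++;
         while (!reg_allocate()) {
            if (failed)
               return false;
         }
      }
      return true;
   }
};
```

In this model the warning is emitted exactly once, and only when the first
attempt fails, which matches the intent described in the commit message.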

> + if (failed)
> +return false;
> +  }
> }
>  
> opt_schedule_instructions();
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
> b/src/mesa/drivers/dri/i965/brw_vs.c
> index 6e9848f..edbcbcf 100644
> --- a/src/mesa/drivers/dri/i965/brw_vs.c
> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> @@ -196,10 +196,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
>  
> /* Scratch space is used for register spilling */
> if (c.base.last_scratch) {
> -  perf_debug("Vertex shader triggered register spilling.  "
> - "Try reducing the number of live vec4 values to "
> - "improve performance.\n");
> -
>prog_data.base.base.total_scratch
>   = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
>  
> -- 
> 2.4.4
> 

Re: [Mesa-dev] [PATCH 3/6] i965/vec4: Move total_scratch calculation into the visitor.

2015-07-03 Thread Pohjolainen, Topi

On Wed, Jul 01, 2015 at 03:03:33PM -0700, Kenneth Graunke wrote:
> This is more consistent with how we do it in the FS backend, and reduces
> a tiny bit of duplication.  It'll also allow for a bit more tidying.

And it also makes it clearer that code generation doesn't have anything to do
with the scratch space allocation. Setting a value as soon as it is available
is always better.

Reviewed-by: Topi Pohjolainen 
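
A sketch of the calculation that moves into the visitor. The rounding helper
below is an assumption made for illustration (round up to a power of two
with a 1 KB floor); this thread does not show the body of
brw_get_scratch_size(), and the 32-byte register size is likewise a
stand-in. All names are hypothetical.

```cpp
#include <cassert>

const int SKETCH_REG_SIZE = 32; // assumed register size in bytes

// Hypothetical stand-in for brw_get_scratch_size(): rounds the request up
// to the next power of two, with a minimum of 1024 bytes.
int sketch_scratch_size(int size)
{
   int i;
   for (i = 1024; i < size; i <<= 1)
      ;
   return i;
}

struct sketch_prog_data {
   int total_scratch;
   sketch_prog_data() : total_scratch(0) {}
};

// Mirrors the hunk added to vec4_visitor::run(): total_scratch is only
// written when scratch space was actually used.
void sketch_set_total_scratch(sketch_prog_data *prog_data, int last_scratch)
{
   assert(last_scratch >= 0);
   if (last_scratch > 0) {
      prog_data->total_scratch =
         sketch_scratch_size(last_scratch * SKETCH_REG_SIZE);
   }
}
```

Callers then only need to test total_scratch, as the brw_gs.c and brw_vs.c
hunks do, and no longer need to know about last_scratch at all.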

> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_gs.c | 5 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 7 +--
>  src/mesa/drivers/dri/i965/brw_vs.c | 5 +
>  3 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index 7f947e0..9c59c8a 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -267,10 +267,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
> }
>  
> /* Scratch space is used for register spilling */
> -   if (c.base.last_scratch) {
> -  c.prog_data.base.base.total_scratch
> - = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
> -
> +   if (c.prog_data.base.base.total_scratch) {
>brw_get_scratch_bo(brw, &stage_state->scratch_bo,
>c.prog_data.base.base.total_scratch *
>   brw->max_gs_threads);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 60f73e2..7b367ec 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1846,6 +1846,11 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
>  
> opt_set_dependency_control();
>  
> +   if (c->last_scratch > 0) {
> +  prog_data->base.total_scratch =
> + brw_get_scratch_size(c->last_scratch * REG_SIZE);
> +   }
> +
> /* If any state parameters were appended, then ParameterValues could have
>  * been realloced, in which case the driver uniform storage set up by
>  * _mesa_associate_uniform_storage() would point to freed memory.  Make
> @@ -1943,8 +1948,6 @@ brw_vs_emit(struct brw_context *brw,
>}
>g.generate_code(v.cfg, 8);
>assembly = g.get_assembly(final_assembly_size);
> -
> -  c->base.last_scratch = v.last_scratch;
> }
>  
> if (!assembly) {
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
> b/src/mesa/drivers/dri/i965/brw_vs.c
> index edbcbcf..ee3f664 100644
> --- a/src/mesa/drivers/dri/i965/brw_vs.c
> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> @@ -195,10 +195,7 @@ brw_codegen_vs_prog(struct brw_context *brw,
> }
>  
> /* Scratch space is used for register spilling */
> -   if (c.base.last_scratch) {
> -  prog_data.base.base.total_scratch
> - = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
> -
> +   if (prog_data.base.base.total_scratch) {
>brw_get_scratch_bo(brw, &brw->vs.base.scratch_bo,
>prog_data.base.base.total_scratch *
>   brw->max_vs_threads);
> -- 
> 2.4.4
> 

Re: [Mesa-dev] [PATCH 4/6] i965/vec4: Move c->last_scratch into vec4_visitor.

2015-07-03 Thread Pohjolainen, Topi

On Wed, Jul 01, 2015 at 03:03:34PM -0700, Kenneth Graunke wrote:
> Nothing outside of vec4_visitor uses it, so we may as well keep it
> internal.
> 
> Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend.
> 
> (The empty class will be going away soon.)
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp  |  4 ++--
>  src/mesa/drivers/dri/i965/brw_vec4.h|  8 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h |  1 -
>  src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp  | 17 -
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/brw_vs.h  |  1 -
>  8 files changed, 15 insertions(+), 22 deletions(-)

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 5/6] i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor.

2015-07-03 Thread Pohjolainen, Topi

On Wed, Jul 01, 2015 at 03:03:35PM -0700, Kenneth Graunke wrote:
> At this point, the brw_vs_compile structure only contains the key and
> gl_vertex_program pointer.  We may as well pass and store them directly;
> it's simpler and more convenient (key-> instead of vs_compile->key...).
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp|  4 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_vp.cpp |  9 +++--
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 11 ++-
>  src/mesa/drivers/dri/i965/brw_vs.h|  6 --
>  4 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index e5db268..42d014c 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1953,8 +1953,8 @@ brw_vs_emit(struct brw_context *brw,
> if (!assembly) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
> -  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
> -c, prog_data, prog, mem_ctx, st_index,
> +  vec4_vs_visitor v(brw->intelScreen->compiler, brw, &c->key, prog_data,
> +&c->vp->program, prog, mem_ctx, st_index,
>  !_mesa_is_gles3(&brw->ctx));
>if (!v.run(brw_select_clip_planes(&brw->ctx))) {
>   if (prog) {
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
> index dcbd240..d1a72d7 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
> @@ -394,8 +394,7 @@ vec4_vs_visitor::emit_program_code()
>  * pull constants.  Do that now.
>  */
> if (this->need_all_constants_in_pull_buffer) {
> -  const struct gl_program_parameter_list *params =
> - vs_compile->vp->program.Base.Parameters;
> +  const struct gl_program_parameter_list *params = vp->Base.Parameters;
>unsigned i;
>for (i = 0; i < params->NumParameters * 4; i++) {
>   stage_prog_data->pull_param[i] =
> @@ -415,8 +414,7 @@ vec4_vs_visitor::setup_vp_regs()
>vp_temp_regs[i] = src_reg(this, glsl_type::vec4_type);
>  
> /* PROGRAM_STATE_VAR etc. */
> -   struct gl_program_parameter_list *plist =
> -  vs_compile->vp->program.Base.Parameters;
> +   struct gl_program_parameter_list *plist = vp->Base.Parameters;
> for (unsigned p = 0; p < plist->NumParameters; p++) {
>unsigned components = plist->Parameters[p].Size;
>  
> @@ -486,8 +484,7 @@ vec4_vs_visitor::get_vp_dst_reg(const prog_dst_register 
> &dst)
>  src_reg
>  vec4_vs_visitor::get_vp_src_reg(const prog_src_register &src)
>  {
> -   struct gl_program_parameter_list *plist =
> -  vs_compile->vp->program.Base.Parameters;
> +   struct gl_program_parameter_list *plist = vp->Base.Parameters;
>  
> src_reg result;
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> index 35b601a..b7ec8b9 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> @@ -36,7 +36,7 @@ vec4_vs_visitor::emit_prolog()
>  
> for (int i = 0; i < VERT_ATTRIB_MAX; i++) {
>if (vs_prog_data->inputs_read & BITFIELD64_BIT(i)) {
> - uint8_t wa_flags = vs_compile->key.gl_attrib_wa_flags[i];
> + uint8_t wa_flags = key->gl_attrib_wa_flags[i];
>   dst_reg reg(ATTR, i);
>   dst_reg reg_d = reg;
>   reg_d.type = BRW_REGISTER_TYPE_D;
> @@ -213,20 +213,21 @@ vec4_vs_visitor::emit_thread_end()
>  
>  vec4_vs_visitor::vec4_vs_visitor(const struct brw_compiler *compiler,
>   void *log_data,
> - struct brw_vs_compile *vs_compile,
> + const struct brw_vs_prog_key *key,
>   struct brw_vs_prog_data *vs_prog_data,
> + struct gl_vertex_program *vp,
>   struct gl_shader_program *prog,
>   void *mem_ctx,
>   int shader_time_index,
>   bool use_legacy_snorm_formula)
> : vec4_visitor(compiler, log_data,
> -  &vs_compile->vp->program.Base,
> -  &vs_compile->key.base, &vs_prog_data->base, prog,
> +  &vp->Base, &key->base, &vs_prog_data->base, prog,
>MESA_SHADER_VERTEX,
>mem_ctx, false /* no_spills */,
>shader_time_index),
> - vs_compile(vs_compile),
> + key(key),
>   vs_prog_data(vs_prog_data),
> + vp(vp),
>   use_legacy_snorm_formula(use_legacy_snorm_formula)
>  {
>  }
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.h 
> b/src/mesa/drivers/dri/i965/brw_vs.h
>

Re: [Mesa-dev] [PATCH 5/6] i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor.

2015-07-03 Thread Pohjolainen, Topi

On Fri, Jul 03, 2015 at 11:29:33AM +0300, Pohjolainen, Topi wrote:
> On Wed, Jul 01, 2015 at 03:03:35PM -0700, Kenneth Graunke wrote:
> > At this point, the brw_vs_compile structure only contains the key and
> > gl_vertex_program pointer.  We may as well pass and store them directly;
> > it's simpler and more convenient (key-> instead of vs_compile->key...).
> > 
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp|  4 ++--
> >  src/mesa/drivers/dri/i965/brw_vec4_vp.cpp |  9 +++--
> >  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 11 ++-
> >  src/mesa/drivers/dri/i965/brw_vs.h|  6 --
> >  4 files changed, 15 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > index e5db268..42d014c 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > @@ -1953,8 +1953,8 @@ brw_vs_emit(struct brw_context *brw,
> > if (!assembly) {
> >prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
> >  
> > -  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
> > -c, prog_data, prog, mem_ctx, st_index,
> > > +  vec4_vs_visitor v(brw->intelScreen->compiler, brw, &c->key, prog_data,
> > +&c->vp->program, prog, mem_ctx, st_index,
> >  !_mesa_is_gles3(&brw->ctx));
> >if (!v.run(brw_select_clip_planes(&brw->ctx))) {
> >   if (prog) {
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
> > index dcbd240..d1a72d7 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
> > @@ -394,8 +394,7 @@ vec4_vs_visitor::emit_program_code()
> >  * pull constants.  Do that now.
> >  */
> > if (this->need_all_constants_in_pull_buffer) {
> > -  const struct gl_program_parameter_list *params =
> > - vs_compile->vp->program.Base.Parameters;
> > +  const struct gl_program_parameter_list *params = vp->Base.Parameters;
> >unsigned i;
> >for (i = 0; i < params->NumParameters * 4; i++) {
> >   stage_prog_data->pull_param[i] =
> > @@ -415,8 +414,7 @@ vec4_vs_visitor::setup_vp_regs()
> >vp_temp_regs[i] = src_reg(this, glsl_type::vec4_type);
> >  
> > /* PROGRAM_STATE_VAR etc. */
> > -   struct gl_program_parameter_list *plist =
> > -  vs_compile->vp->program.Base.Parameters;
> > +   struct gl_program_parameter_list *plist = vp->Base.Parameters;
> > for (unsigned p = 0; p < plist->NumParameters; p++) {
> >unsigned components = plist->Parameters[p].Size;
> >  
> > @@ -486,8 +484,7 @@ vec4_vs_visitor::get_vp_dst_reg(const prog_dst_register 
> > &dst)
> >  src_reg
> >  vec4_vs_visitor::get_vp_src_reg(const prog_src_register &src)
> >  {
> > -   struct gl_program_parameter_list *plist =
> > -  vs_compile->vp->program.Base.Parameters;
> > +   struct gl_program_parameter_list *plist = vp->Base.Parameters;
> >  
> > src_reg result;
> >  
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> > index 35b601a..b7ec8b9 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> > @@ -36,7 +36,7 @@ vec4_vs_visitor::emit_prolog()
> >  
> > for (int i = 0; i < VERT_ATTRIB_MAX; i++) {
> >if (vs_prog_data->inputs_read & BITFIELD64_BIT(i)) {
> > - uint8_t wa_flags = vs_compile->key.gl_attrib_wa_flags[i];
> > + uint8_t wa_flags = key->gl_attrib_wa_flags[i];
> >   dst_reg reg(ATTR, i);
> >   dst_reg reg_d = reg;
> >   reg_d.type = BRW_REGISTER_TYPE_D;
> > @@ -213,20 +213,21 @@ vec4_vs_visitor::emit_thread_end()
> >  
> >  vec4_vs_visitor::vec4_vs_visitor(const struct brw_compiler *compiler,
> >   void *log_data,
> > - struct brw_vs_compile *vs_compile,
> > + const struct brw_vs_prog_key *key,
> >   struct brw_vs_prog_data *vs_prog_data,
> >

Re: [Mesa-dev] [PATCH 6/6] i965/vs: Get rid of brw_vs_compile completely.

2015-07-03 Thread Pohjolainen, Topi

On Wed, Jul 01, 2015 at 03:03:36PM -0700, Kenneth Graunke wrote:
> After tearing it out another level or two, and just passing the key and
> vp directly, we can finally remove this struct.  It also eliminates a
> pointless memcpy() of the key.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 37 +-
>  src/mesa/drivers/dri/i965/brw_vs.c | 20 --
>  src/mesa/drivers/dri/i965/brw_vs.h | 13 
>  3 files changed, 31 insertions(+), 39 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 42d014c..39715c4 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1872,10 +1872,11 @@ extern "C" {
>   */
>  const unsigned *
>  brw_vs_emit(struct brw_context *brw,
> -struct gl_shader_program *prog,
> -struct brw_vs_compile *c,
> -struct brw_vs_prog_data *prog_data,
>  void *mem_ctx,
> +const struct brw_vs_prog_key *key,
> +struct brw_vs_prog_data *prog_data,
> +struct gl_vertex_program *vp,
> +struct gl_shader_program *prog,
>  unsigned *final_assembly_size)
>  {
> bool start_busy = false;
> @@ -1894,29 +1895,29 @@ brw_vs_emit(struct brw_context *brw,
>  
> int st_index = -1;
> if (INTEL_DEBUG & DEBUG_SHADER_TIME)
> -  st_index = brw_get_shader_time_index(brw, prog, &c->vp->program.Base,
> +  st_index = brw_get_shader_time_index(brw, prog, &vp->Base,
> ST_VS);

This would now fit into the previous line. Two similar insignificant
formatting nits further down, either way:

Reviewed-by: Topi Pohjolainen 

>  
> if (unlikely(INTEL_DEBUG & DEBUG_VS))
> -  brw_dump_ir("vertex", prog, &shader->base, &c->vp->program.Base);
> +  brw_dump_ir("vertex", prog, &shader->base, &vp->Base);
>  
> if (brw->intelScreen->compiler->scalar_vs) {
> -  if (!c->vp->program.Base.nir) {
> +  if (!vp->Base.nir) {
>   /* Normally we generate NIR in LinkShader() or
>* ProgramStringNotify(), but Mesa's fixed-function vertex program
>* handling doesn't notify the driver at all.  Just do it here, at
>* the last minute, even though it's lame.
>*/
> - assert(c->vp->program.Base.Id == 0 && prog == NULL);
> - c->vp->program.Base.nir =
> -brw_create_nir(brw, NULL, &c->vp->program.Base, MESA_SHADER_VERTEX);
> + assert(vp->Base.Id == 0 && prog == NULL);
> + vp->Base.nir =
> +brw_create_nir(brw, NULL, &vp->Base, MESA_SHADER_VERTEX);
>}
>  
>prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>  
>fs_visitor v(brw->intelScreen->compiler, brw,
> -   mem_ctx, MESA_SHADER_VERTEX, &c->key,
> -   &prog_data->base.base, prog, &c->vp->program.Base,
> +   mem_ctx, MESA_SHADER_VERTEX, key,
> +   &prog_data->base.base, prog, &vp->Base,
> 8, st_index);
>if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
>   if (prog) {
> @@ -1931,8 +1932,8 @@ brw_vs_emit(struct brw_context *brw,
>}
>  
>fs_generator g(brw->intelScreen->compiler, brw,
> - mem_ctx, (void *) &c->key, &prog_data->base.base,
> - &c->vp->program.Base, v.promoted_constants,
> + mem_ctx, (void *) key, &prog_data->base.base,

You could drop the extra space before 'key'.

> + &vp->Base, v.promoted_constants,
>   v.runtime_check_aads_emit, "VS");
>if (INTEL_DEBUG & DEBUG_VS) {
>   char *name;
> @@ -1942,7 +1943,7 @@ brw_vs_emit(struct brw_context *brw,
> prog->Name);
>   } else {
>  name = ralloc_asprintf(mem_ctx, "vertex program %d",
> -   c->vp->program.Base.Id);
> +   vp->Base.Id);
>   }
>   g.enable_debug(name);
>}
> @@ -1953,8 +1954,8 @@ brw_vs_emit(struct brw_context *brw,
> if (!assembly) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
> -  vec4_vs_visitor v(brw->intelScreen->compiler, brw, &c->key, prog_data,
> -&c->vp->program, prog, mem_ctx, st_index,
> +  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
> +vp, prog, mem_ctx, st_index,
>  !_mesa_is_gles3(&brw->ctx));
>if (!v.run(brw_select_clip_planes(&brw->ctx))) {
>   if (prog) {
> @@ -1969,14 +1970,14 @@ brw_vs_emit(struct brw_context *brw,
>}
>  
>vec4_generator g(brw->intelScreen->compiler, brw,
> -   prog, &c->vp->program.Base, &prog_data->base,
> +

Re: [Mesa-dev] [PATCH 1/6] i965/vec4: Plumb log_data through so the backend_shader field gets set.

2015-07-05 Thread Pohjolainen, Topi

On Fri, Jul 03, 2015 at 09:29:16AM -0700, Kenneth Graunke wrote:
> On Friday, July 03, 2015 10:50:52 AM Pohjolainen, Topi wrote:
> > On Wed, Jul 01, 2015 at 03:03:31PM -0700, Kenneth Graunke wrote:
> > > Jason plumbed this through a while back in the FS backend, but
> > > apparently we were just passing NULL in the vec4 backend.
> > > 
> > > This patch passes brw in as intended.
> > > 
> > > Signed-off-by: Kenneth Graunke 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_vec4.cpp|  2 +-
> > >  src/mesa/drivers/dri/i965/brw_vec4.h  |  1 +
> > >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 10 ++
> > >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |  1 +
> > >  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  3 ++-
> > >  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |  4 +++-
> > >  src/mesa/drivers/dri/i965/brw_vs.h|  1 +
> > >  src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  4 +++-
> > >  8 files changed, 18 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> > > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > index a5c686c..2a56564 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > @@ -1940,7 +1940,7 @@ brw_vs_emit(struct brw_context *brw,
> > > if (!assembly) {
> > >prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
> > >  
> > > -  vec4_vs_visitor v(brw->intelScreen->compiler,
> > > +  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
> > >  c, prog_data, prog, mem_ctx, st_index,
> > >  !_mesa_is_gles3(&brw->ctx));
> > >if (!v.run(brw_select_clip_planes(&brw->ctx))) {
> > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> > > b/src/mesa/drivers/dri/i965/brw_vec4.h
> > > index 2ac1693..043557b 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> > > @@ -77,6 +77,7 @@ class vec4_visitor : public backend_shader, public 
> > > ir_visitor
> > >  {
> > >  public:
> > > vec4_visitor(const struct brw_compiler *compiler,
> > > +void *log_data,
> > 
> > As far as I can see, all the constructors addressed in this patch are
> > "struct brw_context" aware. Could we use the type "struct brw_context *"
> > instead of "void *"? The pointer is in the end given to
> > shader_perf_log_mesa(), which in turn unconditionally casts it to
> > "struct brw_context *".
> 
> Jason is trying to separate the compiler backend from the OpenGL driver,
> so we can more easily reuse it...elsewhere :)  "elsewhere" does not have
> a brw_context, but will have some other structure.  So instead he made the
> logging functions pass a void * closure.
> 
> I'm also concerned that if we pass in brw_context that people will start
> using it everywhere.  Admittedly, having to type log_data-> might deter
> them, though...

Okay, there is a specific reason for it, and in that case I'm fine with
the interface as is.
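
For the archive, here is a minimal sketch of the opaque closure pattern described above, assuming nothing about the real i965 types. All of the example_* names are hypothetical stand-ins: the backend only forwards the pointer, and the driver-side callback is the single place that knows what it points to.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a driver context; not the real brw_context. */
struct example_driver_ctx {
   char last_message[128];
   unsigned message_count;
};

/* The backend sees only an opaque pointer plus a callback. */
typedef void (*example_log_fn)(void *log_data, const char *msg);

struct example_compiler {
   example_log_fn perf_log;
};

/* Backend code: never dereferences log_data, only hands it back. */
static void
example_backend_compile(const struct example_compiler *compiler,
                        void *log_data)
{
   compiler->perf_log(log_data, "spilled 3 registers");
}

/* Driver side: the only place that casts the closure to its real type. */
static void
example_driver_log(void *log_data, const char *msg)
{
   struct example_driver_ctx *ctx = log_data;

   snprintf(ctx->last_message, sizeof(ctx->last_message), "%s", msg);
   ctx->message_count++;
}
```

A different user of the backend would supply its own callback and its own structure behind the pointer, without the backend changing.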
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Mark constant static data as const.

2015-07-14 Thread Pohjolainen, Topi

On Mon, Jul 13, 2015 at 04:15:02PM -0700, Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_curbe.c   |  2 +-
>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 44 
> ++---
>  2 files changed, 23 insertions(+), 23 deletions(-)

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 02/10] i965: Reduce the scope of input in buffer tex setup

2015-07-15 Thread Pohjolainen, Topi

On Tue, Jul 14, 2015 at 04:48:19PM -0700, Ben Widawsky wrote:
> On Wed, Jul 01, 2015 at 02:46:32PM +0300, Topi Pohjolainen wrote:
> > Signed-off-by: Topi Pohjolainen 
> 
> I swear I am not trying to nitpick but I don't actually understand what your
> goal of the patch is. Could you maybe elaborate a bit on what "reduce the
> scope of input in.."

That does deserve an explanation. Before this patch the call was given
the texture unit and the call resolved the texture object using it. Now
the texture object is given directly, and hence the scope of input arguments
can be thought to shrink as the call isn't aware of the set of texture
objects anymore.

I agree that the title is misleading and should instead be something along
the lines of:

i965: Pass the tex object directly to surface setup
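
To illustrate the difference in plain C, here is a rough sketch with hypothetical example_* types rather than the real gl_context and gl_texture_object. The first helper needs the whole table of units plus an index; the second only needs the object it actually operates on.

```c
#include <stddef.h>

#define EXAMPLE_MAX_UNITS 4

/* Hypothetical stand-ins for the texture object and the GL context. */
struct example_tex_object {
   unsigned buffer_size;
};

struct example_context {
   struct example_tex_object *units[EXAMPLE_MAX_UNITS];
};

/* Before: the callee is handed the context and resolves the object itself,
 * so it can see every bound texture object.
 */
static unsigned
example_buffer_size_by_unit(const struct example_context *ctx, unsigned unit)
{
   const struct example_tex_object *obj = ctx->units[unit];

   return obj->buffer_size;
}

/* After: the caller resolves the object and the callee sees only that. */
static unsigned
example_buffer_size(const struct example_tex_object *obj)
{
   return obj->buffer_size;
}
```

The callers in the patch already have the object at hand, so the second form does not cost them an extra lookup.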

> 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h   | 4 ++--
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 8 +++-
> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 2 +-
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c| 2 +-
> >  4 files changed, 7 insertions(+), 9 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index ae29798..da018bf 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1710,8 +1710,8 @@ void brw_create_constant_surface(struct brw_context 
> > *brw,
> >   uint32_t size,
> >   uint32_t *out_offset,
> >   bool dword_pitch);
> > -void brw_update_buffer_texture_surface(struct gl_context *ctx,
> > -   unsigned unit,
> > +void brw_update_buffer_texture_surface(struct brw_context *brw,
> > +   struct gl_texture_object *tObj,
> > uint32_t *surf_offset);
> >  void
> >  brw_update_sol_surface(struct brw_context *brw,
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> > b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > index 72aad96..73aa719 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > @@ -276,12 +276,10 @@ gen4_emit_buffer_surface_state(struct brw_context 
> > *brw,
> >  }
> >  
> >  void
> > -brw_update_buffer_texture_surface(struct gl_context *ctx,
> > -  unsigned unit,
> > +brw_update_buffer_texture_surface(struct brw_context *brw,
> > +  struct gl_texture_object *tObj,
> >uint32_t *surf_offset)
> >  {
> > -   struct brw_context *brw = brw_context(ctx);
> > -   struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current;
> > struct intel_buffer_object *intel_obj =
> >intel_buffer_object(tObj->BufferObject);
> > uint32_t size = tObj->BufferSize;
> > @@ -323,7 +321,7 @@ brw_update_texture_surface(struct gl_context *ctx,
> >  
> > /* BRW_NEW_TEXTURE_BUFFER */
> > if (tObj->Target == GL_TEXTURE_BUFFER) {
> > -  brw_update_buffer_texture_surface(ctx, unit, surf_offset);
> > +  brw_update_buffer_texture_surface(brw, tObj, surf_offset);
> >return;
> > }
> >  
> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
> > b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> > index 494bc22..6aa8299 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> > @@ -357,7 +357,7 @@ gen7_update_texture_surface(struct gl_context *ctx,
> > struct gl_texture_object *obj = ctx->Texture.Unit[unit]._Current;
> >  
> > if (obj->Target == GL_TEXTURE_BUFFER) {
> > -  brw_update_buffer_texture_surface(ctx, unit, surf_offset);
> > +  brw_update_buffer_texture_surface(brw, obj, surf_offset);
> >  
> > } else {
> >struct intel_texture_object *intel_obj = intel_texture_object(obj);
> > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> > b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > index c595ec3..11defd1 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> > @@ -308,7 +308,7 @@ gen8_update_texture_surface(struct gl_context *ctx,
> > struct gl_texture_object *obj = ctx->Texture.Unit[unit]._Current;
> >  
> > if (obj->Target == GL_TEXTURE_BUFFER) {
> > -  brw_update_buffer_texture_surface(ctx, unit, surf_offset);
> > +  brw_update_buffer_texture_surface(brw, obj, surf_offset);
> >  
> > } else {
> >struct gl_texture_image *firstImage = obj->Image[0][obj->BaseLevel];
> > -- 
> > 1.9.3
> > 

Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-14 Thread Pohjolainen, Topi

On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
> This commit removes all dependence on GL state by getting rid of the
> brw_context parameter and the GL data structures.
> 
> v2 (Jason Ekstrand):
>- Patch use_legacy_snorm_formula through as a function argument rather
>  than trying to go through the shader key.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> +-
>  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
>  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
>  3 files changed, 49 insertions(+), 49 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 4b8390f..8e38729 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1937,51 +1937,42 @@ extern "C" {
>   * Returns the final assembly and the program's size.
>   */
>  const unsigned *
> -brw_vs_emit(struct brw_context *brw,
> +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
>  void *mem_ctx,
>  const struct brw_vs_prog_key *key,
>  struct brw_vs_prog_data *prog_data,
> -struct gl_vertex_program *vp,
> -struct gl_shader_program *prog,
> +const nir_shader *shader,
> +gl_clip_plane *clip_planes,
> +bool use_legacy_snorm_formula,
>  int shader_time_index,
> -unsigned *final_assembly_size)
> +unsigned *final_assembly_size,
> +char **error_str)
>  {
> const unsigned *assembly = NULL;
>  
> -   if (brw->intelScreen->compiler->scalar_vs) {
> +   if (compiler->scalar_vs) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>  
> -  fs_visitor v(brw->intelScreen->compiler, brw,
> -   mem_ctx, key, &prog_data->base.base,
> +  fs_visitor v(compiler, log_data, mem_ctx, key, &prog_data->base.base,
> NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 
> */
> -   vp->Base.nir, 8, shader_time_index);
> -  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
> - if (prog) {
> -prog->LinkStatus = false;
> -ralloc_strcat(&prog->InfoLog, v.fail_msg);
> - }
> -
> - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> -   v.fail_msg);
> +   shader, 8, shader_time_index);
> +  if (!v.run_vs(clip_planes)) {
> + if (error_str)
> +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);

I don't particularly like the complexity of the error reporting mechanism.
First vec4_visitor::fail() uses ralloc_asprintf() to create one string, then
we make a copy of it here and finally the caller of brw_vs_emit() makes yet
another copy using ralloc_strcat().
I wonder if we could pass the final destination all the way for the
vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all
the indirection in the middle. What do you think?
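
Roughly what I have in mind, as a sketch only: the visitor is handed the final log and its fail() appends to it in place. To keep the example self-contained it uses a fixed-size buffer and vsnprintf() as a stand-in for ralloc, and every example_* name is made up for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size log standing in for a ralloc'ed info log. */
struct example_info_log {
   char text[256];
};

/* Append a formatted message to the log in place; silently truncates
 * when the buffer is full.
 */
static void
example_log_append(struct example_info_log *log, const char *fmt, ...)
{
   size_t used = strlen(log->text);
   va_list args;

   va_start(args, fmt);
   vsnprintf(log->text + used, sizeof(log->text) - used, fmt, args);
   va_end(args);
}

struct example_visitor {
   struct example_info_log *log;   /* final destination, owned by the caller */
   int failed;
};

/* fail() writes straight into the caller's log, so no intermediate
 * string needs to be duplicated on the way out.
 */
static void
example_visitor_fail(struct example_visitor *v, const char *reason, int reg)
{
   v->failed = 1;
   example_log_append(v->log, "%s (register %d)\n", reason, reg);
}
```

The trade-off is that the visitor now depends on the caller's log object instead of owning its own message.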

>  
>   return NULL;
>}
>  
> -  fs_generator g(brw->intelScreen->compiler, brw,
> - mem_ctx, (void *) key, &prog_data->base.base,
> - v.promoted_constants,
> +  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
> + &prog_data->base.base, v.promoted_constants,
>   v.runtime_check_aads_emit, "VS");
>if (INTEL_DEBUG & DEBUG_VS) {
> - char *name;
> - if (prog) {
> -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
> -   prog->Label ? prog->Label : "unnamed",
> -   prog->Name);
> - } else {
> -name = ralloc_asprintf(mem_ctx, "vertex program %d",
> -   vp->Base.Id);
> - }
> - g.enable_debug(name);
> + const char *debug_name =
> +ralloc_asprintf(mem_ctx, "%s vertex shader %s",
> +shader->info.label ? shader->info.label : 
> "unnamed",
> +shader->info.name);
> +
> + g.enable_debug(debug_name);
>}
>g.generate_code(v.cfg, 8);
>assembly = g.get_assembly(final_assembly_size);
> @@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw,
> if (!assembly) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
> -  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
> -vp->Base.nir, brw_select_clip_planes(&brw->ctx),
> -mem_ctx, shader_time_index,
> -!_mesa_is_gles3(&brw->ctx));
> +  vec4_vs_visitor v(compiler, log_data, key, prog_data,
> +shader, clip_planes, mem_ctx,
> +shader_time_index, use_legacy_snorm_formula);
>if (!v.run()) {
> - if (prog) {
>

Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-14 Thread Pohjolainen, Topi

On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
> > This commit removes all dependence on GL state by getting rid of the
> > brw_context parameter and the GL data structures.
> > 
> > v2 (Jason Ekstrand):
> >- Patch use_legacy_snorm_formula through as a function argument rather
> >  than trying to go through the shader key.
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> > +-
> >  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
> >  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
> >  3 files changed, 49 insertions(+), 49 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > index 4b8390f..8e38729 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > @@ -1937,51 +1937,42 @@ extern "C" {
> >   * Returns the final assembly and the program's size.
> >   */
> >  const unsigned *
> > -brw_vs_emit(struct brw_context *brw,
> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
> >  void *mem_ctx,
> >  const struct brw_vs_prog_key *key,
> >  struct brw_vs_prog_data *prog_data,
> > -struct gl_vertex_program *vp,
> > -struct gl_shader_program *prog,
> > +const nir_shader *shader,
> > +gl_clip_plane *clip_planes,
> > +bool use_legacy_snorm_formula,
> >  int shader_time_index,
> > -unsigned *final_assembly_size)
> > +unsigned *final_assembly_size,
> > +char **error_str)
> >  {
> > const unsigned *assembly = NULL;
> >  
> > -   if (brw->intelScreen->compiler->scalar_vs) {
> > +   if (compiler->scalar_vs) {
> >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
> >  
> > -  fs_visitor v(brw->intelScreen->compiler, brw,
> > -   mem_ctx, key, &prog_data->base.base,
> > +  fs_visitor v(compiler, log_data, mem_ctx, key, &prog_data->base.base,
> > NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 
> > 8 */
> > -   vp->Base.nir, 8, shader_time_index);
> > -  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
> > - if (prog) {
> > -prog->LinkStatus = false;
> > -ralloc_strcat(&prog->InfoLog, v.fail_msg);
> > - }
> > -
> > - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> > -   v.fail_msg);
> > +   shader, 8, shader_time_index);
> > +  if (!v.run_vs(clip_planes)) {
> > + if (error_str)
> > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> 
> I don't particularly like the complexity of the error reporting mechanism.
> First vec4_visitor::fail() uses ralloc_asprintf() to create one string, then
> we make a copy of it here and finally the caller of brw_vs_emit() makes yet
> another copy using ralloc_strcat().
> I wonder if we could pass the final destination all the way for the
> vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all

Or more appropriately using ralloc_asprintf_append()...

> the indirection in the middle. What do you think?
> 
> >  
> >   return NULL;
> >}
> >  
> > -  fs_generator g(brw->intelScreen->compiler, brw,
> > - mem_ctx, (void *) key, &prog_data->base.base,
> > - v.promoted_constants,
> > +  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
> > + &prog_data->base.base, v.promoted_constants,
> >   v.runtime_check_aads_emit, "VS");
> >if (INTEL_DEBUG & DEBUG_VS) {
> > - char *name;
> > - if (prog) {
> > -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
> > -   prog->Label ? prog->Label : "unnamed",
> > -   prog->Name);
> > - } else {
> > -name = ralloc_asprintf(mem_ctx, "vertex program %d",
> > -   vp->Base.Id);
> > - }
> > - g.enable_debug(name);
> > + const char *debug_name

Re: [Mesa-dev] [PATCH v2 18/17 (was 10/17)] i965/vs: Move use_legacy_snorm_formula into the shader key

2015-10-14 Thread Pohjolainen, Topi

On Sat, Oct 10, 2015 at 08:05:59AM -0700, Jason Ekstrand wrote:
> This is really an input into the shader compiler so it kind of makes sense
> in the key.  Also, given where it's placed into the key, it doesn't
> actually make it any bigger.
> 
> v2 (Jason Ekstrand):
>- Rebase on top of the compiler clean-ups so the effects of this patch
>  can better be studied without being in the middle of a series.

I guess you are planning to check the fixed hangs before pushing. The patch
itself looks good:

Reviewed-by: Topi Pohjolainen 
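
On the size claim, here is a small sketch of why an extra one-bit flag next to an existing one is normally free. The structs are hypothetical and much smaller than the real brw_vs_prog_key. Bit-field layout is implementation-defined, so this shows what common ABIs do rather than a guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key with a single one-bit flag between two words. */
struct example_key_before {
   unsigned id;
   bool clamp_vertex_color:1;
   unsigned plane_count;
};

/* Same key with a second one-bit flag placed right after the first. */
struct example_key_after {
   unsigned id;
   bool clamp_vertex_color:1;
   bool use_legacy_snorm_formula:1;
   unsigned plane_count;
};

/* True when the new flag fits into the storage unit (and padding) that
 * the first flag already occupied.
 */
static bool
example_key_size_unchanged(void)
{
   return sizeof(struct example_key_before) ==
          sizeof(struct example_key_after);
}
```

If the flag were instead added after a full word with no neighbouring bit-field, the struct would typically grow by a padded word.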

> ---
>  src/mesa/drivers/dri/i965/brw_compiler.h  | 3 ++-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp| 4 +---
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 9 -
>  src/mesa/drivers/dri/i965/brw_vs.c| 3 ++-
>  src/mesa/drivers/dri/i965/brw_vs.h| 5 +
>  5 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
> b/src/mesa/drivers/dri/i965/brw_compiler.h
> index 4bc1caa..153e381 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> @@ -161,6 +161,8 @@ struct brw_vs_prog_key {
>  
> bool clamp_vertex_color:1;
>  
> +   bool use_legacy_snorm_formula:1;
> +
> /**
>  * How many user clipping planes are being uploaded to the vertex shader 
> as
>  * push constants.
> @@ -585,7 +587,6 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
> *log_data,
> struct brw_vs_prog_data *prog_data,
> const struct nir_shader *shader,
> gl_clip_plane *clip_planes,
> -   bool use_legacy_snorm_formula,
> int shader_time_index,
> unsigned *final_assembly_size,
> char **error_str);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 8636323..5336590 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1943,7 +1943,6 @@ brw_compile_vs(const struct brw_compiler *compiler, 
> void *log_data,
> struct brw_vs_prog_data *prog_data,
> const nir_shader *shader,
> gl_clip_plane *clip_planes,
> -   bool use_legacy_snorm_formula,
> int shader_time_index,
> unsigned *final_assembly_size,
> char **error_str)
> @@ -1982,8 +1981,7 @@ brw_compile_vs(const struct brw_compiler *compiler, 
> void *log_data,
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
>vec4_vs_visitor v(compiler, log_data, key, prog_data,
> -shader, clip_planes, mem_ctx,
> -shader_time_index, use_legacy_snorm_formula);
> +shader, clip_planes, mem_ctx, shader_time_index);
>if (!v.run()) {
>   if (error_str)
>  *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> index 485a80e..9cf04cd 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> @@ -77,7 +77,8 @@ vec4_vs_visitor::emit_prolog()
>  /* ES 3.0 has different rules for converting signed normalized
>   * fixed-point numbers than desktop GL.
>   */
> -if ((wa_flags & BRW_ATTRIB_WA_SIGN) && 
> !use_legacy_snorm_formula) {
> +if ((wa_flags & BRW_ATTRIB_WA_SIGN) &&
> +!key->use_legacy_snorm_formula) {
> /* According to equation 2.2 of the ES 3.0 specification,
>  * signed normalization conversion is done by:
>  *
> @@ -304,14 +305,12 @@ vec4_vs_visitor::vec4_vs_visitor(const struct 
> brw_compiler *compiler,
>   const nir_shader *shader,
>   gl_clip_plane *clip_planes,
>   void *mem_ctx,
> - int shader_time_index,
> - bool use_legacy_snorm_formula)
> + int shader_time_index)
> : vec4_visitor(compiler, log_data, &key->tex, &vs_prog_data->base, shader,
>mem_ctx, false /* no_spills */, shader_time_index),
>   key(key),
>   vs_prog_data(vs_prog_data),
> - clip_planes(clip_planes),
> - use_legacy_snorm_formula(use_legacy_snorm_formula)
> + clip_planes(clip_planes)
>  {
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
> b/src/mesa/drivers/dri/i965/brw_vs.c
> index 9c9b83b..3b3eb8b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vs.c
> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> @@ -184,7 +184,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
> program = brw_compile_vs(brw->intelScreen->compiler, brw, mem_ctx,

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Pohjolainen, Topi

On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> Ben Widawsky  writes:
> 
> > The impetus for this patch comes from a seemingly benign statement within 
> > the
> > spec (quoted within the patch). For me, this patch was at some point 
> > critical
> > for getting stable piglit results (though this did not seem to be the case 
> > on a
> > branch Chad was working on).
> >
> > It is very important for clearing multiple color buffer attachments and can 
> > be
> > observed in the following piglit tests:
> > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > +
> >  1 file changed, 84 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > index 7bf52f0..9e6711e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> >  }
> >  
> > +/**
> > + * Individually fast clear each color buffer attachment. On previous gens 
> > this
> > + * isn't required. The motivation for this comes from one line (which 
> > seems to
> > + * be specific to SKL+). The list item is in section titled _MCS Buffer for
> > + * Render Target(s)_
> > + *
> > + *   "Since only one RT is bound with a clear pass, only one RT can be 
> > cleared
> > + *   at a time. To clear multiple RTs, multiple clear passes are required."
> 
> This sentence also appears in the HSW PRM so it seems a bit odd if it's
> only causing problems on SKL. I guess if we get Piglit regressions
> without it then it makes sense to have the patch. It might be worth just
> double checking whether this patch is completely necessary. The wording
> in the commit message seems a little unsure.

The spec seems to be missing something, as the section discussing "Render
Target Fast Clear" suggests the opposite:

"The render target(s) is/are bound as they normally would be, with the MCS
 surface defined in SURFACE_STATE."

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Pohjolainen, Topi

On Wed, Oct 14, 2015 at 09:54:43AM -0700, Ben Widawsky wrote:
> On Wed, Oct 14, 2015 at 02:43:24PM +0300, Pohjolainen, Topi wrote:
> > On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> > > Ben Widawsky  writes:
> > > 
> > > > The impetus for this patch comes from a seemingly benign statement 
> > > > within the
> > > > spec (quoted within the patch). For me, this patch was at some point 
> > > > critical
> > > > for getting stable piglit results (though this did not seem to be the 
> > > > case on a
> > > > branch Chad was working on).
> > > >
> > > > It is very important for clearing multiple color buffer attachments and 
> > > > can be
> > > > observed in the following piglit tests:
> > > > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > > > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > > >
> > > > Signed-off-by: Ben Widawsky 
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > > > +
> > > >  1 file changed, 84 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > > > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > index 7bf52f0..9e6711e 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> > > > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> > > >  }
> > > >  
> > > > +/**
> > > > + * Individually fast clear each color buffer attachment. On previous 
> > > > gens this
> > > > + * isn't required. The motivation for this comes from one line (which 
> > > > seems to
> > > > + * be specific to SKL+). The list item is in section titled _MCS 
> > > > Buffer for
> > > > + * Render Target(s)_
> > > > + *
> > > > + *   "Since only one RT is bound with a clear pass, only one RT can be 
> > > > cleared
> > > > + *   at a time. To clear multiple RTs, multiple clear passes are 
> > > > required."
> > > 
> > > This sentence also appears in the HSW PRM so it seems a bit odd if it's
> > > only causing problems on SKL. I guess if we get Piglit regressions
> > > without it then it makes sense to have the patch. It might be worth just
> > > double checking whether this patch is completely necessary. The wording
> > > in the commit message seems a little unsure.
> > 
> > The spec seems to be missing something as the section discussing "Render
> > Target Fast Clear" seems to suggest the opposite:
> > 
> > "The render target(s) is/are bound as they normally would be, with the MCS
> >  surface defined in SURFACE_STATE."
> 
> I am aware of all this. Neil, yes, it is completely necessary for piglit (I
> don't know if anything in the real world does this or not).
>
> You are both asking me to provide something which may be impossible: an
> explanation of why the docs and/or hardware are behaving this way. Let me
> respond in kind, please provide an alternate patch which fixes:
> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear (all subtests)
> 
> FWIW Topi, it's also contradicted in 3DSTATE_PS definition.

I think you misunderstood me. I'm not questioning your patch or your
interpretation, nor asking you to provide information that just isn't
there in the spec. We talked about this quite a bit. I'm just saying that I
feel that something is missing in the spec.

Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-14 Thread Pohjolainen, Topi

On Wed, Oct 14, 2015 at 11:53:37AM -0700, Jason Ekstrand wrote:
> On Wed, Oct 14, 2015 at 1:41 AM, Pohjolainen, Topi
>  wrote:
> > On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
> >> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
> >> > This commit removes all dependence on GL state by getting rid of the
> >> > brw_context parameter and the GL data structures.
> >> >
> >> > v2 (Jason Ekstrand):
> >> >- Patch use_legacy_snorm_formula through as a function argument rather
> >> >  than trying to go through the shader key.
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> >> > +-
> >> >  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
> >> >  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
> >> >  3 files changed, 49 insertions(+), 49 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> >> > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> > index 4b8390f..8e38729 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> > @@ -1937,51 +1937,42 @@ extern "C" {
> >> >   * Returns the final assembly and the program's size.
> >> >   */
> >> >  const unsigned *
> >> > -brw_vs_emit(struct brw_context *brw,
> >> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
> >> >  void *mem_ctx,
> >> >  const struct brw_vs_prog_key *key,
> >> >  struct brw_vs_prog_data *prog_data,
> >> > -struct gl_vertex_program *vp,
> >> > -struct gl_shader_program *prog,
> >> > +const nir_shader *shader,
> >> > +gl_clip_plane *clip_planes,
> >> > +bool use_legacy_snorm_formula,
> >> >  int shader_time_index,
> >> > -unsigned *final_assembly_size)
> >> > +unsigned *final_assembly_size,
> >> > +char **error_str)
> >> >  {
> >> > const unsigned *assembly = NULL;
> >> >
> >> > -   if (brw->intelScreen->compiler->scalar_vs) {
> >> > +   if (compiler->scalar_vs) {
> >> >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
> >> >
> >> > -  fs_visitor v(brw->intelScreen->compiler, brw,
> >> > -   mem_ctx, key, &prog_data->base.base,
> >> > +  fs_visitor v(compiler, log_data, mem_ctx, key, 
> >> > &prog_data->base.base,
> >> > NULL, /* prog; Only used for TEXTURE_RECTANGLE on 
> >> > gen < 8 */
> >> > -   vp->Base.nir, 8, shader_time_index);
> >> > -  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
> >> > - if (prog) {
> >> > -prog->LinkStatus = false;
> >> > -ralloc_strcat(&prog->InfoLog, v.fail_msg);
> >> > - }
> >> > -
> >> > - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> >> > -   v.fail_msg);
> >> > +   shader, 8, shader_time_index);
> >> > +  if (!v.run_vs(clip_planes)) {
> >> > + if (error_str)
> >> > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> >>
> >> I don't particularly like the complexity of the error reporting mechanism.
> >> First vec4_visitor::fail() uses ralloc_asprintf() to create one string, 
> >> then
> >> we make a copy of it here and finally the caller of brw_vs_emit() makes yet
> >> another copy using ralloc_strcat().
> >> I wonder if we could pass the final destination all the way for the
> >> vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all
> >
> > Or more appropriately using ralloc_asprintf_append()...
> >
> >> the indirection in the middle. What do you think?
> 
> I'd be moderately ok with just doing "*error_str = v.fail_msg" and
> avoiding the extra copy.  I'm not a big fan of the extra copy, but I
> decided to leave it in for a couple of reasons
> 
> 1) It only happens on the error path so it's not a big deal.

I wasn't concerned about the overhead either, as you said

Re: [Mesa-dev] [PATCH 15/17] i965/fs: Move some of the prog_data setup into brw_wm_emit

2015-10-16 Thread Pohjolainen, Topi

On Fri, Oct 09, 2015 at 05:50:22AM -0700, Jason Ekstrand wrote:
> On Fri, Oct 9, 2015 at 12:10 AM, Pohjolainen, Topi
>  wrote:
> > On Thu, Oct 08, 2015 at 05:22:47PM -0700, Jason Ekstrand wrote:
> >> This commit moves the common/modern stuff.  Some legacy stuff such as
> >> setting use_alt_mode was left because it needs to know whether or not we're
> >> an ARB program.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs.cpp | 98 
> >> 
> >>  src/mesa/drivers/dri/i965/brw_wm.c   | 98 
> >> 
> >>  2 files changed, 98 insertions(+), 98 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> >> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> index 146f4b4..0e39b50 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> @@ -5114,6 +5114,90 @@ fs_visitor::run_cs()
> >> return !failed;
> >>  }
> >>
> >> +/**
> >> + * Return a bitfield where bit n is set if barycentric interpolation mode 
> >> n
> >> + * (see enum brw_wm_barycentric_interp_mode) is needed by the fragment 
> >> shader.
> >> + */
> >> +static unsigned
> >> +brw_compute_barycentric_interp_modes(const struct brw_device_info 
> >> *devinfo,
> >> + bool shade_model_flat,
> >> + bool persample_shading,
> >> + const nir_shader *shader)
> >> +{
> >> +   unsigned barycentric_interp_modes = 0;
> >> +
> >> +   nir_foreach_variable(var, &shader->inputs) {
> >> +  enum glsl_interp_qualifier interp_qualifier =
> >> + (enum glsl_interp_qualifier)var->data.interpolation;
> >> +  bool is_centroid = var->data.centroid && !persample_shading;
> >> +  bool is_sample = var->data.sample || persample_shading;
> >> +  bool is_gl_Color = (var->data.location == VARYING_SLOT_COL0) ||
> >> + (var->data.location == VARYING_SLOT_COL1);
> >> +
> >> +  /* Ignore WPOS and FACE, because they don't require interpolation. 
> >> */
> >> +  if (var->data.location == VARYING_SLOT_POS ||
> >> +  var->data.location == VARYING_SLOT_FACE)
> >> + continue;
> >> +
> >> +  /* Determine the set (or sets) of barycentric coordinates needed to
> >> +   * interpolate this variable.  Note that when
> >> +   * brw->needs_unlit_centroid_workaround is set, centroid 
> >> interpolation
> >> +   * uses PIXEL interpolation for unlit pixels and CENTROID 
> >> interpolation
> >> +   * for lit pixels, so we need both sets of barycentric coordinates.
> >> +   */
> >> +  if (interp_qualifier == INTERP_QUALIFIER_NOPERSPECTIVE) {
> >> + if (is_centroid) {
> >> +barycentric_interp_modes |=
> >> +   1 << BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
> >> + } else if (is_sample) {
> >> +barycentric_interp_modes |=
> >> +   1 << BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC;
> >> + }
> >> + if ((!is_centroid && !is_sample) ||
> >> + devinfo->needs_unlit_centroid_workaround) {
> >> +barycentric_interp_modes |=
> >> +   1 << BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC;
> >> + }
> >> +  } else if (interp_qualifier == INTERP_QUALIFIER_SMOOTH ||
> >> + (!(shade_model_flat && is_gl_Color) &&
> >> +  interp_qualifier == INTERP_QUALIFIER_NONE)) {
> >> + if (is_centroid) {
> >> +barycentric_interp_modes |=
> >> +   1 << BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
> >> + } else if (is_sample) {
> >> +barycentric_interp_modes |=
> >> +   1 << BRW_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC;
> >> + }
> >> + if ((!is_centroid && !is_sample) ||
> >> + devinfo->needs_unlit_centroid_workaround) {
> >> +barycentric_interp_modes |=
> >> +   1 << BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC;
> >> + }
> >> +  }
> >> +   }
> >> +
> >>

Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-16 Thread Pohjolainen, Topi

On Thu, Oct 15, 2015 at 07:29:31AM -0700, Jason Ekstrand wrote:
>On Oct 14, 2015 10:48 PM, "Pohjolainen, Topi" 
>wrote:
>>
>> On Wed, Oct 14, 2015 at 11:53:37AM -0700, Jason Ekstrand wrote:
>> > On Wed, Oct 14, 2015 at 1:41 AM, Pohjolainen, Topi
>> >  wrote:
>> > > On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
>> > >> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
>> > >> > This commit removes all dependence on GL state by getting rid of
>the
>> > >> > brw_context parameter and the GL data structures.
>> > >> >
>> > >> > v2 (Jason Ekstrand):
>> > >> >- Patch use_legacy_snorm_formula through as a function
>argument rather
>> > >> >  than trying to go through the shader key.
>> > >> > ---
>> > >> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70
>+-
>> > >> >  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
>> > >> >  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
>> > >> >  3 files changed, 49 insertions(+), 49 deletions(-)
>> > >> >
>> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > index 4b8390f..8e38729 100644
>> > >> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > @@ -1937,51 +1937,42 @@ extern "C" {
>> > >> >   * Returns the final assembly and the program's size.
>> > >> >   */
>> > >> >  const unsigned *
>> > >> > -brw_vs_emit(struct brw_context *brw,
>> > >> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
>> > >> >  void *mem_ctx,
>> > >> >  const struct brw_vs_prog_key *key,
>> > >> >  struct brw_vs_prog_data *prog_data,
>> > >> > -struct gl_vertex_program *vp,
>> > >> > -struct gl_shader_program *prog,
>> > >> > +const nir_shader *shader,
>> > >> > +gl_clip_plane *clip_planes,
>> > >> > +bool use_legacy_snorm_formula,
>> > >> >  int shader_time_index,
>> > >> > -unsigned *final_assembly_size)
>> > >> > +unsigned *final_assembly_size,
>> > >> > +char **error_str)
>> > >> >  {
>> > >> > const unsigned *assembly = NULL;
>> > >> >
>> > >> > -   if (brw->intelScreen->compiler->scalar_vs) {
>> > >> > +   if (compiler->scalar_vs) {
>> > >> >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>> > >> >
>> > >> > -  fs_visitor v(brw->intelScreen->compiler, brw,
>> > >> > -   mem_ctx, key, &prog_data->base.base,
>> > >> > +  fs_visitor v(compiler, log_data, mem_ctx, key,
>&prog_data->base.base,
>> > >> > NULL, /* prog; Only used for
>TEXTURE_RECTANGLE on gen < 8 */
>> > >> > -   vp->Base.nir, 8, shader_time_index);
>> > >> > -  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
>> > >> > - if (prog) {
>> > >> > -prog->LinkStatus = false;
>> > >> > -ralloc_strcat(&prog->InfoLog, v.fail_msg);
>> > >> > - }
>> > >> > -
>> > >> > - _mesa_problem(NULL, "Failed to compile vertex shader:
>%s\n",
>> > >> > -   v.fail_msg);
>> > >> > +   shader, 8, shader_time_index);
>> > >> > +  if (!v.run_vs(clip_planes)) {
>> > >> > + if (error_str)
>> > >> > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
>> > >>
>> > >> I don't particularly like

Re: [Mesa-dev] [PATCH 15/17] i965/fs: Move some of the prog_data setup into brw_wm_emit

2015-10-19 Thread Pohjolainen, Topi

On Fri, Oct 16, 2015 at 08:24:11AM -0700, Jason Ekstrand wrote:
> On Fri, Oct 16, 2015 at 12:35 AM, Pohjolainen, Topi
>  wrote:
> > On Fri, Oct 09, 2015 at 05:50:22AM -0700, Jason Ekstrand wrote:
> >> On Fri, Oct 9, 2015 at 12:10 AM, Pohjolainen, Topi
> >>  wrote:
> >> > On Thu, Oct 08, 2015 at 05:22:47PM -0700, Jason Ekstrand wrote:
> >> >> This commit moves the common/modern stuff.  Some legacy stuff such as
> >> >> setting use_alt_mode was left because it needs to know whether or not 
> >> >> we're
> >> >> an ARB program.
> >> >> ---
> >> >>  src/mesa/drivers/dri/i965/brw_fs.cpp | 98 
> >> >> 
> >> >>  src/mesa/drivers/dri/i965/brw_wm.c   | 98 
> >> >> 
> >> >>  2 files changed, 98 insertions(+), 98 deletions(-)
> >> >>
> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> >> >> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> >> index 146f4b4..0e39b50 100644
> >> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> >> @@ -5114,6 +5114,90 @@ fs_visitor::run_cs()
> >> >> return !failed;
> >> >>  }
> >> >>
> >> >> +/**
> >> >> + * Return a bitfield where bit n is set if barycentric interpolation 
> >> >> mode n
> >> >> + * (see enum brw_wm_barycentric_interp_mode) is needed by the fragment 
> >> >> shader.
> >> >> + */
> >> >> +static unsigned
> >> >> +brw_compute_barycentric_interp_modes(const struct brw_device_info 
> >> >> *devinfo,
> >> >> + bool shade_model_flat,
> >> >> + bool persample_shading,
> >> >> + const nir_shader *shader)
> >> >> +{
> >> >> +   unsigned barycentric_interp_modes = 0;
> >> >> +
> >> >> +   nir_foreach_variable(var, &shader->inputs) {
> >> >> +  enum glsl_interp_qualifier interp_qualifier =
> >> >> + (enum glsl_interp_qualifier)var->data.interpolation;
> >> >> +  bool is_centroid = var->data.centroid && !persample_shading;
> >> >> +  bool is_sample = var->data.sample || persample_shading;
> >> >> +  bool is_gl_Color = (var->data.location == VARYING_SLOT_COL0) ||
> >> >> + (var->data.location == VARYING_SLOT_COL1);
> >> >> +
> >> >> +  /* Ignore WPOS and FACE, because they don't require 
> >> >> interpolation. */
> >> >> +  if (var->data.location == VARYING_SLOT_POS ||
> >> >> +  var->data.location == VARYING_SLOT_FACE)
> >> >> + continue;
> >> >> +
> >> >> +  /* Determine the set (or sets) of barycentric coordinates needed 
> >> >> to
> >> >> +   * interpolate this variable.  Note that when
> >> >> +   * brw->needs_unlit_centroid_workaround is set, centroid 
> >> >> interpolation
> >> >> +   * uses PIXEL interpolation for unlit pixels and CENTROID 
> >> >> interpolation
> >> >> +   * for lit pixels, so we need both sets of barycentric 
> >> >> coordinates.
> >> >> +   */
> >> >> +  if (interp_qualifier == INTERP_QUALIFIER_NOPERSPECTIVE) {
> >> >> + if (is_centroid) {
> >> >> +barycentric_interp_modes |=
> >> >> +   1 << BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
> >> >> + } else if (is_sample) {
> >> >> +barycentric_interp_modes |=
> >> >> +   1 << BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC;
> >> >> + }
> >> >> + if ((!is_centroid && !is_sample) ||
> >> >> + devinfo->needs_unlit_centroid_workaround) {
> >> >> +barycentric_interp_modes |=
> >> >> +   1 << BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC;
> >> >> + }
> >> >> +  } else if (interp_qualifier == INTERP_QUALIFIER_SMOOTH ||
> >> >> +

Re: [Mesa-dev] [PATCH 2/7] i965/gen9: Don't call tr_mode_vertical_texture_alignment() for 1D textures

2015-10-20 Thread Pohjolainen, Topi

On Mon, Oct 19, 2015 at 02:29:04PM -0700, Anuj Phogat wrote:
> On Thu, Aug 13, 2015 at 2:51 PM, Anuj Phogat  wrote:
> > Vertical alignment is not applicable to 1D textures.
> >
> > Signed-off-by: Anuj Phogat 
> > ---
> >  src/mesa/drivers/dri/i965/brw_tex_layout.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> > b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > index 4e44b15..edd7518 100644
> > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > @@ -277,7 +277,9 @@ intel_vertical_texture_alignment_unit(struct 
> > brw_context *brw,
> > if (mt->format == MESA_FORMAT_S_UINT8)
> >return brw->gen >= 7 ? 8 : 4;
> >
> > -   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> > +   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE &&
> > +   mt->target != GL_TEXTURE_1D &&
> > +   mt->target != GL_TEXTURE_1D_ARRAY) {

There is the exact same assertion in tr_mode_vertical_texture_alignment(),
and therefore this makes sense.

> >uint32_t align = tr_mode_vertical_texture_alignment(brw, mt);
> >/* XY_FAST_COPY_BLT doesn't support vertical alignment < 64 */
> >return align < 64 ? 64 : align;
> > --
> > 2.4.3
> >
> 
> Patches 2, 4, 5-7 of this series are waiting for review. These patches are 
> doing
> simple changes and should be easy to review. Here is a patchwork link to the
> list of patches:
> http://patchwork.freedesktop.org/project/mesa/patches/?submitter=10862

Along with patch 2, patch 5 is also a clear improvement. Replacing the
static alignment table with a multiplier (targeting another alignment table)
in patch 4 did not look like a clear improvement to me at first. But combined
with patch 6 I can see them reducing the number of lines of code, and
therefore I'm in favor of committing them as well.
In patch 4 you could name "multiplier" "multiplier_ys" instead, as it is
specifically used to derive YS alignment from YF alignment.
In patch 6 (or later in patch 7) you could declare the variable "i" as const
and initialize it right away with the correct value.

Using "mt->cpp" instead of "_mesa_get_format_bytes(mt->format) * 8" in patch
7 looks better to me also.

So 2 and 4-7 are:

Reviewed-by: Topi Pohjolainen 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/9] i965: Set annotation_info's mem_ctx.

2015-10-26 Thread Pohjolainen, Topi

On Wed, Oct 21, 2015 at 03:58:12PM -0700, Matt Turner wrote:
> It was being memset to 0 previously.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +-
>  src/mesa/drivers/dri/i965/intel_asm_annotation.c | 3 +++
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index aed4adb..8ab57f7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -2187,7 +2187,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>  
>dump_assembly(p->store, annotation.ann_count, annotation.ann,
>  p->devinfo);
> -  ralloc_free(annotation.ann);
> +  ralloc_free(annotation.mem_ctx);

I had to check how "annotation.ann" gets allocated - that is by reralloc()
against "annotation.mem_ctx".
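
The ownership relation behind this check can be illustrated with a toy
parent/child allocator. This is only a sketch of the idea that memory
allocated against a context is released when the context is freed; the
toy_* names are hypothetical and this is not ralloc's implementation or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy parent/child allocator: every allocation is linked to a context, and
 * freeing the context frees everything hanging off it. Hypothetical names,
 * not ralloc itself. */
struct toy_block {
   struct toy_block *next;
   max_align_t payload[];   /* flexible array keeps the payload aligned */
};

struct toy_ctx {
   struct toy_block *blocks;
};

static struct toy_ctx *
toy_ctx_create(void)
{
   return calloc(1, sizeof(struct toy_ctx));
}

/* Allocate 'size' bytes owned by 'ctx'. */
static void *
toy_alloc(struct toy_ctx *ctx, size_t size)
{
   assert(ctx != NULL);

   struct toy_block *b = malloc(sizeof(*b) + size);
   if (b == NULL)
      return NULL;

   b->next = ctx->blocks;
   ctx->blocks = b;
   return b->payload;
}

/* Free the context and every allocation made against it. Returns the number
 * of allocations that were released. */
static int
toy_ctx_free(struct toy_ctx *ctx)
{
   int released = 0;

   if (ctx == NULL)
      return 0;

   struct toy_block *b = ctx->blocks;
   while (b != NULL) {
      struct toy_block *next = b->next;
      free(b);
      released++;
      b = next;
   }
   free(ctx);
   return released;
}
```

In this toy model, freeing the context is enough to release an array that
was allocated against it, which mirrors why freeing "annotation.mem_ctx"
also covers "annotation.ann".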

Reviewed-by: Topi Pohjolainen 

> }
>  
> compiler->shader_debug_log(log_data,
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index a84f6c4..6ac8591 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -1659,7 +1659,7 @@ vec4_generator::generate_code(const cfg_t *cfg, const 
> nir_shader *nir)
>  
>dump_assembly(p->store, annotation.ann_count, annotation.ann,
>  p->devinfo);
> -  ralloc_free(annotation.ann);
> +  ralloc_free(annotation.mem_ctx);
> }
>  
> compiler->shader_debug_log(log_data,
> diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c 
> b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> index b3d6324..f87a9bb 100644
> --- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> +++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> @@ -86,6 +86,9 @@ void annotate(const struct brw_device_info *devinfo,
>struct annotation_info *annotation, const struct cfg_t *cfg,
>struct backend_instruction *inst, unsigned offset)
>  {
> +   if (annotation->mem_ctx == NULL)
> +  annotation->mem_ctx = ralloc_context(NULL);
> +
> if (annotation->ann_size <= annotation->ann_count) {
>int old_size = annotation->ann_size;
>annotation->ann_size = MAX2(1024, annotation->ann_size * 2);
> -- 
> 2.4.9
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/9] i965: Combine assembly annotations if possible.

2015-10-26 Thread Pohjolainen, Topi

On Wed, Oct 21, 2015 at 03:58:13PM -0700, Matt Turner wrote:
> Often annotations are identical between sets of consecutive
> instructions. We can perhaps avoid some memory allocations by reusing
> the previous annotation.
> ---
>  src/mesa/drivers/dri/i965/intel_asm_annotation.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c 
> b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> index f87a9bb..58830db 100644
> --- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> +++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> @@ -112,6 +112,20 @@ void annotate(const struct brw_device_info *devinfo,
>ann->block_start = cfg->blocks[annotation->cur_block];
> }
>  
> +   if (bblock_end(cfg->blocks[annotation->cur_block]) == inst) {
> +  ann->block_end = cfg->blocks[annotation->cur_block];
> +  annotation->cur_block++;
> +   }
> +
> +   /* Merge this annotation with the previous if possible. */
> +   struct annotation *prev = &annotation->ann[annotation->ann_count - 2];

What guarantees that annotation->ann_count is always at least two at this
point?
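
To make the concern concrete, here is a minimal sketch of a guarded merge,
assuming simplified stand-in structures (the toy_* types and
toy_try_merge_last() are hypothetical, not the Mesa ones). With fewer than
two entries there is no previous annotation to compare against, so the
helper refuses to merge instead of reading before the start of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the Mesa structures; only the fields used by
 * the merge check are modeled here. */
struct toy_annotation {
   const void *ir;
   const char *annotation;
   const void *block_start;
};

struct toy_annotation_info {
   struct toy_annotation *ann;
   int ann_count;
};

/* Returns true and drops the newest entry when it duplicates the previous
 * one. With fewer than two entries nothing is read and nothing is merged. */
static bool
toy_try_merge_last(struct toy_annotation_info *info)
{
   assert(info != NULL);

   if (info->ann_count < 2)
      return false;

   const struct toy_annotation *cur = &info->ann[info->ann_count - 1];
   const struct toy_annotation *prev = &info->ann[info->ann_count - 2];

   if (cur->ir == prev->ir &&
       cur->annotation == prev->annotation &&
       cur->block_start == NULL) {
      info->ann_count--;
      return true;
   }

   return false;
}
```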

> +   if (ann->ir == prev->ir &&
> +   ann->annotation == prev->annotation &&
> +   ann->block_start == NULL) {
> +  annotation->ann_count--;
> +  return;
> +   }
> +
> /* There is no hardware DO instruction on Gen6+, so since DO always
>  * starts a basic block, we need to set the .block_start of the next
>  * instruction's annotation with a pointer to the bblock started by
> @@ -123,11 +137,6 @@ void annotate(const struct brw_device_info *devinfo,
> if (devinfo->gen >= 6 && inst->opcode == BRW_OPCODE_DO) {
>annotation->ann_count--;
> }
> -
> -   if (bblock_end(cfg->blocks[annotation->cur_block]) == inst) {
> -  ann->block_end = cfg->blocks[annotation->cur_block];
> -  annotation->cur_block++;
> -   }
>  }
>  
>  void
> -- 
> 2.4.9
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/9] i965: Add annotation_insert_error() and support for printing errors.

2015-10-26 Thread Pohjolainen, Topi

On Wed, Oct 21, 2015 at 03:58:14PM -0700, Matt Turner wrote:
> Will allow annotations to contain error messages (indicating an
> instruction violates a rule for instance) that are printed after the
> disassembly of the block.
> ---
>  src/mesa/drivers/dri/i965/intel_asm_annotation.c | 60 
> 
>  src/mesa/drivers/dri/i965/intel_asm_annotation.h |  7 +++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c 
> b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> index 58830db..eaee386 100644
> --- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> +++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
> @@ -69,6 +69,10 @@ dump_assembly(void *assembly, int num_annotations, struct 
> annotation *annotation
>  
>brw_disassemble(devinfo, assembly, start_offset, end_offset, stderr);
>  
> +  if (annotation[i].error) {
> + fputs(annotation[i].error, stderr);
> +  }
> +
>if (annotation[i].block_end) {
>   fprintf(stderr, "   END B%d", annotation[i].block_end->num);
>   foreach_list_typed(struct bblock_link, successor_link, link,
> @@ -152,3 +156,59 @@ annotation_finalize(struct annotation_info *annotation,
> }
> annotation->ann[annotation->ann_count].offset = next_inst_offset;
>  }
> +
> +void
> +annotation_insert_error(struct annotation_info *annotation, unsigned offset,
> +const char *error)
> +{
> +   struct annotation *ann = NULL;
> +
> +   if (!annotation->ann_count)
> +  return;
> +
> +   /* We may have to split an annotation, so ensure we have enough space
> +* allocated for that case up front.
> +*/
> +   if (annotation->ann_size <= annotation->ann_count) {
> +  int old_size = annotation->ann_size;
> +  annotation->ann_size = MAX2(1024, annotation->ann_size * 2);
> +  annotation->ann = reralloc(annotation->mem_ctx, annotation->ann,
> + struct annotation, annotation->ann_size);
> +  if (!annotation->ann)
> + return;
> +
> +  memset(annotation->ann + old_size, 0,
> + (annotation->ann_size - old_size) * sizeof(struct annotation));
> +   }

This same block can already be found in "annotate()". Would it make sense to
refactor and re-use?
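
A rough sketch of what such a shared helper could look like, assuming
simplified stand-in structures and plain realloc() in place of reralloc()
(toy_annotation_reserve() and the toy_* types are hypothetical names, not
part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the Mesa structures. */
struct toy_annotation {
   int offset;
   char *error;
};

struct toy_annotation_info {
   struct toy_annotation *ann;
   int ann_count;
   int ann_size;
};

/* Make sure there is room for at least one more entry. Grows the array to
 * max(1024, 2 * old size) and zeroes the new tail, as both callers do
 * today. Returns false if the allocation fails. */
static bool
toy_annotation_reserve(struct toy_annotation_info *info)
{
   assert(info != NULL);

   if (info->ann_count < info->ann_size)
      return true;

   const int old_size = info->ann_size;
   const int new_size = old_size * 2 > 1024 ? old_size * 2 : 1024;

   struct toy_annotation *grown =
      realloc(info->ann, (size_t)new_size * sizeof(*grown));
   if (grown == NULL)
      return false;   /* the old array is still valid and owned by info */

   memset(grown + old_size, 0,
          (size_t)(new_size - old_size) * sizeof(*grown));
   info->ann = grown;
   info->ann_size = new_size;
   return true;
}
```

Both callers would then reduce to a single call followed by an early return
on failure.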

> +
> +   for (int i = 0; i <= annotation->ann_count; i++) {
> +  if (annotation->ann[i].offset <= offset)
> + continue;
> +
> +  struct annotation *cur = &annotation->ann[i - 1];
> +  struct annotation *next = &annotation->ann[i];
> +  ann = cur;
> +
> +  if (offset + sizeof(brw_inst) != next->offset) {
> + memmove(next, cur,
> + (annotation->ann_count - i + 2) * sizeof(struct 
> annotation));

I guess the same question here as in patch five, does "i >= 2" always hold?

> + cur->error = NULL;
> + cur->error_length = 0;
> + cur->block_end = NULL;
> + next->offset = offset + sizeof(brw_inst);
> + next->block_start = NULL;
> + annotation->ann_count++;
> +  }
> +  break;
> +   }
> +
> +   assume(ann != NULL);
> +
> +   ralloc_asprintf_rewrite_tail(&ann->error, &ann->error_length, error);
> +
> +   /* FIXME: ralloc_vasprintf_rewrite_tail() allocates memory out of the
> +* null context. We have to reparent the it if we want it to be freed
> +* with the rest of the annotation context.
> +*/
> +   ralloc_steal(annotation->mem_ctx, ann->error);
> +}
> diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.h 
> b/src/mesa/drivers/dri/i965/intel_asm_annotation.h
> index 6c72326..662a4b4 100644
> --- a/src/mesa/drivers/dri/i965/intel_asm_annotation.h
> +++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.h
> @@ -37,6 +37,9 @@ struct cfg_t;
>  struct annotation {
> int offset;
>  
> +   size_t error_length;
> +   char *error;
> +
> /* Pointers to the basic block in the CFG if the instruction group starts
>  * or ends a basic block.
>  */
> @@ -69,6 +72,10 @@ annotate(const struct brw_device_info *devinfo,
>  void
>  annotation_finalize(struct annotation_info *annotation, unsigned offset);
>  
> +void
> +annotation_insert_error(struct annotation_info *annotation, unsigned offset,
> +const char *error);
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> -- 
> 2.4.9
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] glsl: add fragdata arrays to program resource list

2015-10-27 Thread Pohjolainen, Topi

On Tue, Oct 27, 2015 at 01:18:42PM +0200, Tapani Pälli wrote:
> This makes sure that user is still able to query properties about
> variables that have gotten removed by opt_dead_builtin_varyings pass.
> 
> Fixes following OpenGL ES 3.1 test:
>ES31-CTS.program_interface_query.output-layout
> 
> No Piglit regressions.
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/glsl/linker.cpp | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index cfd8f81..e9660fc 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -3386,6 +3386,12 @@ add_interface_variables(struct gl_shader_program 
> *shProg,
>if (strncmp(var->name, "packed:", 7) == 0)
>   continue;
>  
> +  /* Skip fragdata arrays, these are handled separately
> +   * by add_fragdata_arrays.
> +   */
> +  if (strncmp(var->name, "gl_out_FragData", 15) == 0)
> + continue;
> +
>if (!add_program_resource(shProg, programInterface, var,
>  build_stageref(shProg, var->name,
> var->data.mode) | mask))
> @@ -3425,6 +3431,26 @@ add_packed_varyings(struct gl_shader_program *shProg, 
> int stage)
> return true;
>  }
>  
> +static bool
> +add_fragdata_arrays(struct gl_shader_program *shProg)
> +{
> +   struct gl_shader *sh = shProg->_LinkedShaders[MESA_SHADER_FRAGMENT];
> +
> +   if (!sh || !sh->fragdata_arrays)
> +  return true;
> +
> +   foreach_in_list(ir_instruction, node, sh->fragdata_arrays) {
> +  ir_variable *var = node->as_variable();
> +  if (var) {
> + assert(var->data.mode == ir_var_shader_out);
> + if (!add_program_resource(shProg, GL_PROGRAM_OUTPUT, var,
> +   (1 << MESA_SHADER_FRAGMENT)))

You can drop the extra ().

Re: [Mesa-dev] [PATCH 1/4] nir/instr_set: Add an allow_loads field

2015-10-27 Thread Pohjolainen, Topi

On Tue, Oct 27, 2015 at 10:28:58AM +0100, Iago Toral Quiroga wrote:
> We need this so we can configure different behaviors for passes that
> cannot deal with side-effectful instructions (CSE) and passes that can
> (we will add a load-combine pass shortly).
> 
> For now, when allow_loads is true, we let the instruction set rewrite
> SSBO loads.
> ---
>  src/glsl/nir/nir_instr_set.c | 51 
> 
>  src/glsl/nir/nir_instr_set.h | 20 -
>  src/glsl/nir/nir_opt_cse.c   |  4 ++--
>  3 files changed, 50 insertions(+), 25 deletions(-)
> 
> diff --git a/src/glsl/nir/nir_instr_set.c b/src/glsl/nir/nir_instr_set.c
> index d3f939f..583618f 100644
> --- a/src/glsl/nir/nir_instr_set.c
> +++ b/src/glsl/nir/nir_instr_set.c
> @@ -398,6 +398,13 @@ dest_is_ssa(nir_dest *dest, void *data)
> return dest->is_ssa;
>  }
>  
> +static bool
> +is_load(nir_intrinsic_instr *instr)
> +{
> +   return instr->intrinsic == nir_intrinsic_load_ssbo ||
> +  instr->intrinsic == nir_intrinsic_load_ssbo_indirect;
> +}
> +
>  /* This function determines if uses of an instruction can safely be rewritten
>   * to use another identical instruction instead. Note that this function must
>   * be kept in sync with hash_instr() and nir_instrs_equal() -- only
> @@ -406,7 +413,7 @@ dest_is_ssa(nir_dest *dest, void *data)
>   */
>  
>  static bool
> -instr_can_rewrite(nir_instr *instr)
> +instr_can_rewrite(nir_instr *instr, bool allow_loads)
>  {
> /* We only handle SSA. */
> if (!nir_foreach_dest(instr, dest_is_ssa, NULL) ||
> @@ -428,11 +435,15 @@ instr_can_rewrite(nir_instr *instr)
>return true;
> }
> case nir_instr_type_intrinsic: {
> +  nir_intrinsic_instr *intrinsic = nir_instr_as_intrinsic(instr);
>const nir_intrinsic_info *info =
> - &nir_intrinsic_infos[nir_instr_as_intrinsic(instr)->intrinsic];
> -  return (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) &&
> - (info->flags & NIR_INTRINSIC_CAN_REORDER) &&
> - info->num_variables == 0; /* not implemented yet */
> + &nir_intrinsic_infos[intrinsic->intrinsic];
> +  bool can_eliminate_and_reorder =
> + (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) &&
> + (info->flags & NIR_INTRINSIC_CAN_REORDER) &&
> + info->num_variables == 0; /* not implemented yet */
> +  return can_eliminate_and_reorder ?
> + true: allow_loads && is_load(intrinsic);

Isn't this just:

 return can_eliminate_and_reorder ||
(allow_loads && is_load(intrinsic));
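
The two forms can be checked against each other exhaustively. The sketch
below uses plain bools in place of the real flags and the is_load() result;
the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Form used in the patch: a ternary that yields true or falls back. */
static bool
rewrite_ternary(bool can_eliminate_and_reorder, bool allow_loads, bool is_load)
{
   return can_eliminate_and_reorder ? true : allow_loads && is_load;
}

/* Suggested form: a plain logical OR. */
static bool
rewrite_or(bool can_eliminate_and_reorder, bool allow_loads, bool is_load)
{
   return can_eliminate_and_reorder || (allow_loads && is_load);
}

/* Walk all eight input combinations and count the ones where the two forms
 * disagree. */
static int
count_mismatches(void)
{
   int mismatches = 0;

   for (int bits = 0; bits < 8; bits++) {
      const bool a = (bits & 1) != 0;
      const bool b = (bits & 2) != 0;
      const bool c = (bits & 4) != 0;

      if (rewrite_ternary(a, b, c) != rewrite_or(a, b, c))
         mismatches++;
   }

   return mismatches;
}
```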

Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.

2015-10-28 Thread Pohjolainen, Topi

On Wed, Oct 28, 2015 at 01:58:41AM -0700, Kenneth Graunke wrote:
> Consider the case of two nearly identical GLSL fragment shaders:
> 
>out vec4 color;
>void main() { color = vec4(1); }
> 
> and
> 
>layout(early_fragment_tests) in;
>out vec4 color;
>void main() { color = vec4(1); }
> 
> These shaders compile to the exact same assembly, but have distinct
> values for brw_wm_prog_data::early_fragment_tests.
> 
> Since these are two independent GLSL shaders, they have different
> program keys - notably, brw_wm_prog_key::program_string_id differs.
> 
> When uploading the second, brw_upload_cache will find an existing copy
> of the assembly in the cache BO, which means matching_data will be
> non-NULL.  Although we create a second cache item (with the new key
> and prog_data), we set item->offset to the existing copy and avoid
> re-uploading duplicate assembly.
> 
> However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
> item->offset differed from the supplied offset.  With reuse, both
> programs have the same offset, but prog_data changed.  We have to
> flag it, but failed to.
> 
> To fix this, we simply need to check if the aux (prog_data) pointer
> changed.  If either the assembly or the prog_data differs, flag it.
> 
> This fixes a regression since 1bba29ed403e735ba0bf04ed8aa2e571884f,
> where Topi fixed brw_upload_cache() to actually reuse identical
> assembly.  Prior to that, reuse basically never happened due to bugs.
> Unfortunately, this code apparently wasn't prepared to handle reuse!
> 
> Fixes GPU hangs in Dolphin on Broadwell.
> 
> Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
> and helping track down the real issue.

And thanks for the quick fix Ken!

Reviewed-by: Topi Pohjolainen 

> 
> Cc: Topi Pohjolainen 
> Cc: "11.0" 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
> Tested-by: Pierre Bourdon 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_state.h   | 2 +-
>  src/mesa/drivers/dri/i965/brw_state_cache.c | 7 ---
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
> b/src/mesa/drivers/dri/i965/brw_state.h
> index dc2b941..6fc9c14 100644
> --- a/src/mesa/drivers/dri/i965/brw_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_state.h
> @@ -220,7 +220,7 @@ bool brw_search_cache(struct brw_cache *cache,
> enum brw_cache_id cache_id,
> const void *key,
> GLuint key_size,
> -   uint32_t *inout_offset, void *out_aux);
> +   uint32_t *inout_offset, void *inout_aux);
>  void brw_state_cache_check_size( struct brw_context *brw );
>  
>  void brw_init_caches( struct brw_context *brw );
> diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
> b/src/mesa/drivers/dri/i965/brw_state_cache.c
> index 2fbcd14..f9a1918 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> @@ -137,7 +137,7 @@ bool
>  brw_search_cache(struct brw_cache *cache,
>   enum brw_cache_id cache_id,
>   const void *key, GLuint key_size,
> - uint32_t *inout_offset, void *out_aux)
> + uint32_t *inout_offset, void *inout_aux)
>  {
> struct brw_context *brw = cache->brw;
> struct brw_cache_item *item;
> @@ -155,11 +155,12 @@ brw_search_cache(struct brw_cache *cache,
> if (item == NULL)
>return false;
>  
> -   *(void **)out_aux = ((char *)item->key + item->key_size);
> +   void *aux = ((char *) item->key) + item->key_size;
>  
> -   if (item->offset != *inout_offset) {
> +   if (item->offset != *inout_offset || aux != *((void **) inout_aux)) {
>brw->ctx.NewDriverState |= (1 << cache_id);
>*inout_offset = item->offset;
> +  *((void **) inout_aux) = aux;
> }
>  
> return true;
> -- 
> 2.6.2
> 

Re: [Mesa-dev] [PATCH] i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits.

2015-11-10 Thread Pohjolainen, Topi

On Tue, Nov 10, 2015 at 06:33:44PM -0800, Kenneth Graunke wrote:
> A while back, we moved to directly emitting the Gen7+ state when
> constructing the binding tables.  These flags are only used on
> Gen4-6, which emit all the binding table pointers at once.
> 
> We gain nothing by having separate flags, so combine them.
> 
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Topi Pohjolainen 

> ---
>  src/mesa/drivers/dri/i965/brw_binding_tables.c | 21 -
>  src/mesa/drivers/dri/i965/brw_context.h|  8 ++--
>  src/mesa/drivers/dri/i965/brw_state.h  |  1 -
>  src/mesa/drivers/dri/i965/brw_state_upload.c   |  4 +---
>  src/mesa/drivers/dri/i965/gen6_sol.c   |  6 +++---
>  5 files changed, 14 insertions(+), 26 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
> b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> index 508f1f0..d8226e0 100644
> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> @@ -88,7 +88,6 @@ reserve_hw_bt_space(struct brw_context *brw, unsigned bytes)
>  void
>  brw_upload_binding_table(struct brw_context *brw,
>   uint32_t packet_name,
> - GLbitfield brw_new_binding_table,
>   const struct brw_stage_prog_data *prog_data,
>   struct brw_stage_state *stage_state)
>  {
> @@ -127,7 +126,7 @@ brw_upload_binding_table(struct brw_context *brw,
>}
> }
>  
> -   brw->ctx.NewDriverState |= brw_new_binding_table;
> +   brw->ctx.NewDriverState |= BRW_NEW_BINDING_TABLE_POINTERS;
>  
> if (brw->gen >= 7) {
>if (brw->use_resource_streamer) {
> @@ -159,7 +158,7 @@ brw_vs_upload_binding_table(struct brw_context *brw)
> const struct brw_stage_prog_data *prog_data = brw->vs.base.prog_data;
> brw_upload_binding_table(brw,
>  _3DSTATE_BINDING_TABLE_POINTERS_VS,
> -BRW_NEW_VS_BINDING_TABLE, prog_data,
> +prog_data,
>  &brw->vs.base);
>  }
>  
> @@ -183,7 +182,7 @@ brw_upload_wm_binding_table(struct brw_context *brw)
> const struct brw_stage_prog_data *prog_data = brw->wm.base.prog_data;
> brw_upload_binding_table(brw,
>  _3DSTATE_BINDING_TABLE_POINTERS_PS,
> -BRW_NEW_PS_BINDING_TABLE, prog_data,
> +prog_data,
>  &brw->wm.base);
>  }
>  
> @@ -209,7 +208,7 @@ brw_gs_upload_binding_table(struct brw_context *brw)
> const struct brw_stage_prog_data *prog_data = brw->gs.base.prog_data;
> brw_upload_binding_table(brw,
>  _3DSTATE_BINDING_TABLE_POINTERS_GS,
> -BRW_NEW_GS_BINDING_TABLE, prog_data,
> +prog_data,
>  &brw->gs.base);
>  }
>  
> @@ -406,10 +405,8 @@ const struct brw_tracked_state 
> brw_binding_table_pointers = {
> .dirty = {
>.mesa = 0,
>.brw = BRW_NEW_BATCH |
> - BRW_NEW_GS_BINDING_TABLE |
> - BRW_NEW_PS_BINDING_TABLE |
> - BRW_NEW_STATE_BASE_ADDRESS |
> - BRW_NEW_VS_BINDING_TABLE,
> + BRW_NEW_BINDING_TABLE_POINTERS |
> + BRW_NEW_STATE_BASE_ADDRESS,
> },
> .emit = gen4_upload_binding_table_pointers,
>  };
> @@ -442,10 +439,8 @@ const struct brw_tracked_state 
> gen6_binding_table_pointers = {
> .dirty = {
>.mesa = 0,
>.brw = BRW_NEW_BATCH |
> - BRW_NEW_GS_BINDING_TABLE |
> - BRW_NEW_PS_BINDING_TABLE |
> - BRW_NEW_STATE_BASE_ADDRESS |
> - BRW_NEW_VS_BINDING_TABLE,
> + BRW_NEW_BINDING_TABLE_POINTERS |
> + BRW_NEW_STATE_BASE_ADDRESS,
> },
> .emit = gen6_upload_binding_table_pointers,
>  };
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index c83f47b..4b2db61 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -184,9 +184,7 @@ enum brw_state_id {
> BRW_STATE_CONTEXT,
> BRW_STATE_PSP,
> BRW_STATE_SURFACES,
> -   BRW_STATE_VS_BINDING_TABLE,
> -   BRW_STATE_GS_BINDING_TABLE,
> -   BRW_STATE_PS_BINDING_TABLE,
> +   BRW_STATE_BINDING_TABLE_POINTERS,
> BRW_STATE_INDICES,
> BRW_STATE_VERTICES,
> BRW_STATE_BATCH,
> @@ -261,9 +259,7 @@ enum brw_state_id {
>  #define BRW_NEW_CONTEXT (1ull << BRW_STATE_CONTEXT)
>  #define BRW_NEW_PSP (1ull << BRW_STATE_PSP)
>  #define BRW_NEW_SURFACES(1ull << BRW_STATE_SURFACES)
> -#define BRW_NEW_VS_BINDING_TABLE(1ull << BRW_STATE_VS_BINDING_TABLE)
> -#define BRW_NEW_GS_BINDING_TABLE(1ull << BRW_STATE_GS_BINDING_TABLE)
> -#define BRW_NEW_PS_BINDING_TABLE(1ull << BRW_STATE_PS_BINDING_T

Re: [Mesa-dev] [PATCH 1/2] i965: Convert scalar_* flags to a scalar_stage array.

2015-11-13 Thread Pohjolainen, Topi

On Thu, Nov 12, 2015 at 03:38:51PM -0800, Kenneth Graunke wrote:
> I was going to add scalar_tcs and scalar_tes flags, and then thought
> better of it and decided to convert this to an array.  Simpler.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_compiler.h  |  3 +--
>  src/mesa/drivers/dri/i965/brw_context.c   |  2 +-
>  src/mesa/drivers/dri/i965/brw_gs.c|  3 ++-
>  src/mesa/drivers/dri/i965/brw_link.cpp| 11 +---
>  src/mesa/drivers/dri/i965/brw_program.c   |  3 ++-
>  src/mesa/drivers/dri/i965/brw_shader.cpp  | 31 
> ++-
>  src/mesa/drivers/dri/i965/brw_shader.h|  2 --
>  src/mesa/drivers/dri/i965/brw_vec4.cpp|  4 +--
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  2 +-
>  src/mesa/drivers/dri/i965/brw_vs.c|  7 ++---
>  10 files changed, 28 insertions(+), 40 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
> b/src/mesa/drivers/dri/i965/brw_compiler.h
> index e3a26d6..3f54616 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> @@ -89,8 +89,7 @@ struct brw_compiler {
> void (*shader_debug_log)(void *, const char *str, ...) PRINTFLIKE(2, 3);
> void (*shader_perf_log)(void *, const char *str, ...) PRINTFLIKE(2, 3);
>  
> -   bool scalar_vs;
> -   bool scalar_gs;
> +   bool scalar_stage[MESA_SHADER_STAGES];
> struct gl_shader_compiler_options 
> glsl_compiler_options[MESA_SHADER_STAGES];
>  };
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index ac6045d..2db99c7 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -525,7 +525,7 @@ brw_initialize_context_constants(struct brw_context *brw)
>ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms =
>   BRW_MAX_IMAGES;
>ctx->Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms =
> - (brw->intelScreen->compiler->scalar_vs ? BRW_MAX_IMAGES : 0);
> + (brw->intelScreen->compiler->scalar_stage[MESA_SHADER_VERTEX] ? 
> BRW_MAX_IMAGES : 0);
>ctx->Const.Program[MESA_SHADER_COMPUTE].MaxImageUniforms =
>   BRW_MAX_IMAGES;
>ctx->Const.MaxImageUnits = MAX_IMAGE_UNITS;
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index ed0890f..ad5b242 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -87,7 +87,8 @@ brw_codegen_gs_prog(struct brw_context *brw,
> prog_data.base.base.nr_image_params = gs->NumImages;
>  
> brw_nir_setup_glsl_uniforms(gp->program.Base.nir, prog, &gp->program.Base,
> -   &prog_data.base.base, compiler->scalar_gs);
> +   &prog_data.base.base,
> +   compiler->scalar_stage[MESA_SHADER_GEOMETRY]);
>  
> GLbitfield64 outputs_written = gp->program.Base.OutputsWritten;
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
> b/src/mesa/drivers/dri/i965/brw_link.cpp
> index 2991173..14421d4 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -66,12 +66,14 @@ brw_lower_packing_builtins(struct brw_context *brw,
> gl_shader_stage shader_type,
> exec_list *ir)
>  {
> +   const struct brw_compiler *compiler = brw->intelScreen->compiler;
> +
> int ops = LOWER_PACK_SNORM_2x16
> | LOWER_UNPACK_SNORM_2x16
> | LOWER_PACK_UNORM_2x16
> | LOWER_UNPACK_UNORM_2x16;
>  
> -   if (is_scalar_shader_stage(brw->intelScreen->compiler, shader_type)) {
> +   if (compiler->scalar_stage[shader_type]) {
>ops |= LOWER_UNPACK_UNORM_4x8
> | LOWER_UNPACK_SNORM_4x8
> | LOWER_PACK_UNORM_4x8
> @@ -84,7 +86,7 @@ brw_lower_packing_builtins(struct brw_context *brw,
> * lowering is needed. For SOA code, the Half2x16 ops must be
> * scalarized.
> */
> -  if (is_scalar_shader_stage(brw->intelScreen->compiler, shader_type)) {
> +  if (compiler->scalar_stage[shader_type]) {
>   ops |= LOWER_PACK_HALF_2x16_TO_SPLIT
>   |  LOWER_UNPACK_HALF_2x16_TO_SPLIT;
>}
> @@ -103,6 +105,7 @@ process_glsl_ir(gl_shader_stage stage,
>  struct gl_shader *shader)
>  {
> struct gl_context *ctx = &brw->ctx;
> +   const struct brw_compiler *compiler = brw->intelScreen->compiler;
> const struct gl_shader_compiler_options *options =
>&ctx->Const.ShaderCompilerOptions[shader->Stage];
>  
> @@ -161,7 +164,7 @@ process_glsl_ir(gl_shader_stage stage,
> do {
>progress = false;
>  
> -  if (is_scalar_shader_stage(brw->intelScreen->compiler, shader->Stage)) 
> {
> +  if (compiler->scalar_stage[shader->Stage]) {
>   brw_do_channel_e

Re: [Mesa-dev] [PATCH 2/2] i965: Clean up context constant initialization code.

2015-11-13 Thread Pohjolainen, Topi

On Thu, Nov 12, 2015 at 03:38:52PM -0800, Kenneth Graunke wrote:
> This was getting pretty out of hand, and with compute partially in place
> and tessellation on the way, it was only going to get worse.
> 
> This patch makes a "stage exists?" predicate and a "number of stages"
> count and uses them to clean up a lot of calculations.  We can just
> loop over shader stages and set things for the ones that exist.  For
> combined counts, we can just multiply by the number of stages.
> 
> It also tries to organize a little bit.
> 
> We should probably use _mesa_has_geometry_shaders/tessellation/compute
> here, but we can't because ctx->Version isn't initialized yet.  Perhaps
> that could be fixed in the future.
> 
> No change in "glxinfo -l" on Broadwell.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_context.c | 138 
> ++--
>  1 file changed, 58 insertions(+), 80 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index 2db99c7..89533ae 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -322,64 +322,85 @@ static void
>  brw_initialize_context_constants(struct brw_context *brw)
>  {
> struct gl_context *ctx = &brw->ctx;
> +   const struct brw_compiler *compiler = brw->intelScreen->compiler;
> +
> +   bool stage_exists[MESA_SHADER_STAGES] = {

This could be const.

> +  [MESA_SHADER_VERTEX] = true,
> +  [MESA_SHADER_TESS_CTRL] = false,
> +  [MESA_SHADER_TESS_EVAL] = false,
> +  [MESA_SHADER_GEOMETRY] = brw->gen >= 6,
> +  [MESA_SHADER_FRAGMENT] = true,
> +  [MESA_SHADER_COMPUTE] = 
> _mesa_extension_override_enables.ARB_compute_shader,
> +   };
> +
> +   unsigned num_stages = 0;
> +   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
> +  if (stage_exists[i])
> + num_stages++;
> +   }
>  
> unsigned max_samplers =
>brw->gen >= 8 || brw->is_haswell ? BRW_MAX_TEX_UNIT : 16;
>  
> +   ctx->Const.MaxDualSourceDrawBuffers = 1;
> +   ctx->Const.MaxDrawBuffers = BRW_MAX_DRAW_BUFFERS;
> +   ctx->Const.MaxCombinedShaderOutputResources =
> +  MAX_IMAGE_UNITS + BRW_MAX_DRAW_BUFFERS;
> +
> ctx->Const.QueryCounterBits.Timestamp = 36;
>  
> +   ctx->Const.MaxTextureCoordUnits = 8; /* Mesa limit */
> +   ctx->Const.MaxImageUnits = MAX_IMAGE_UNITS;
> +   ctx->Const.MaxRenderbufferSize = 8192;
> +   ctx->Const.MaxTextureLevels = MIN2(14 /* 8192 */, MAX_TEXTURE_LEVELS);
> +   ctx->Const.Max3DTextureLevels = 12; /* 2048 */
> +   ctx->Const.MaxCubeTextureLevels = 14; /* 8192 */
> +   ctx->Const.MaxArrayTextureLayers = brw->gen >= 7 ? 2048 : 512;
> +   ctx->Const.MaxTextureMbytes = 1536;
> +   ctx->Const.MaxTextureRectSize = 1 << 12;
> +   ctx->Const.MaxTextureMaxAnisotropy = 16.0;
> ctx->Const.StripTextureBorder = true;
> +   if (brw->gen >= 7)
> +  ctx->Const.MaxProgramTextureGatherComponents = 4;
> +   else if (brw->gen == 6)
> +  ctx->Const.MaxProgramTextureGatherComponents = 1;
>  
> ctx->Const.MaxUniformBlockSize = 65536;
> +
> for (int i = 0; i < MESA_SHADER_STAGES; i++) {
>struct gl_program_constants *prog = &ctx->Const.Program[i];
> +
> +  if (!stage_exists[i])
> + continue;
> +
> +  prog->MaxTextureImageUnits = max_samplers;
> +
>prog->MaxUniformBlocks = BRW_MAX_UBO;
>prog->MaxCombinedUniformComponents =
>   prog->MaxUniformComponents +
>   ctx->Const.MaxUniformBlockSize / 4 * prog->MaxUniformBlocks;
> +
> +  prog->MaxAtomicCounters = MAX_ATOMIC_COUNTERS;
> +  prog->MaxAtomicBuffers = BRW_MAX_ABO;
> +  prog->MaxImageUniforms = compiler->scalar_stage[i] ? BRW_MAX_IMAGES : 
> 0;
> +  prog->MaxShaderStorageBlocks = BRW_MAX_SSBO;
> }
>  
> -   ctx->Const.MaxDualSourceDrawBuffers = 1;
> -   ctx->Const.MaxDrawBuffers = BRW_MAX_DRAW_BUFFERS;
> -   ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits = 
> max_samplers;
> -   ctx->Const.MaxTextureCoordUnits = 8; /* Mesa limit */
> +   if (ctx->Extensions.ARB_compute_shader)
> +  ctx->Const.MaxShaderStorageBufferBindings += BRW_MAX_SSBO;
> +
> +
> ctx->Const.MaxTextureUnits =
>MIN2(ctx->Const.MaxTextureCoordUnits,
> ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
> -   ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = 
> max_samplers;
> -   if (brw->gen >= 6)
> -  ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = 
> max_samplers;
> -   else
> -  ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = 0;
> -   if (_mesa_extension_override_enables.ARB_compute_shader) {
> -  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 
> BRW_MAX_TEX_UNIT;
> -  ctx->Const.MaxUniformBufferBindings += BRW_MAX_UBO;
> -   } else {
> -  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 0;
> -   }
> -   ctx->Const.MaxCombinedTextureImageUnits =
>

Re: [Mesa-dev] Split version of 07/13 glsl: add double support

2015-02-05 Thread Pohjolainen, Topi

On Thu, Feb 05, 2015 at 10:23:27AM -0500, Ilia Mirkin wrote:
> Topi, this is awesome! I wanted to do something like that last night,
> but tiredness and laziness got in the way. Can I find these in git
> form somewhere so that I'll be able to integrate when doing a resend?
> (Also, I think it's fine to drop the "(was other patch)" in there.)

I agree, I just thought it would help to make it super clear that these were
just part of something else.

Just pushed it in: git://people.freedesktop.org/~tpohjola/mesa:fp64_split

Thanks,
Topi
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 06/17] glsl/ir: Add cloning support for doubles (was: add double support)

2015-02-05 Thread Pohjolainen, Topi

On Thu, Feb 05, 2015 at 10:39:31AM -0800, Matt Turner wrote:
> Maybe squash this somewhere? I'm not sure.

I felt silly leaving it alone but I didn't really have a good squash
candidate for it. Perhaps somebody comes up with an idea, or we can just
toss a coin :)

Re: [Mesa-dev] [PATCH v2 11/28] glsl/ir: Add cloning support for doubles

2015-02-05 Thread Pohjolainen, Topi

On Thu, Feb 05, 2015 at 11:56:33PM -0500, Ilia Mirkin wrote:
> From: Dave Airlie 
> 
> Signed-off-by: Dave Airlie 
> Reviewed-by: Matt Turner 

If we want to squash this somewhere, the first patch of the split
(glsl: Add double builtin type) could be a candidate - we get rid of
one warning.

> ---
>  src/glsl/ir_clone.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp
> index dffa578..5c7279c 100644
> --- a/src/glsl/ir_clone.cpp
> +++ b/src/glsl/ir_clone.cpp
> @@ -327,6 +327,7 @@ ir_constant::clone(void *mem_ctx, struct hash_table *ht) 
> const
> case GLSL_TYPE_UINT:
> case GLSL_TYPE_INT:
> case GLSL_TYPE_FLOAT:
> +   case GLSL_TYPE_DOUBLE:
> case GLSL_TYPE_BOOL:
>return new(mem_ctx) ir_constant(this->type, &this->value);
>  
> -- 
> 2.0.5
> 

Re: [Mesa-dev] [PATCH v2 04/28] mesa: add double uniform support. (v5)

2015-02-05 Thread Pohjolainen, Topi

On Fri, Feb 06, 2015 at 09:18:35AM +0200, Ian Romanick wrote:
> On 02/06/2015 06:56 AM, Ilia Mirkin wrote:
> > From: Dave Airlie 
> > 
> > This adds support for the new uniform interfaces
> > from ARB_gpu_shader_fp64.
> > 
> > v2:
> > support ARB_separate_shader_objects ProgramUniform*d* (Ian)
> > don't allow boolean uniforms to be updated (issue 15) (Ian)
> > 
> > v3: fix size_mul
> > v4: Teach uniform update to take into account double precision (Topi)
> > v5: add transpose for double case (Ilia)
> > 
> > Signed-off-by: Dave Airlie 
> > ---
> >  src/mesa/main/uniform_query.cpp   |  47 +++---
> >  src/mesa/main/uniforms.c  | 185 
> > ++
> >  src/mesa/main/uniforms.h  |   3 +-
> >  src/mesa/program/ir_to_mesa.cpp   |  17 +++-
> >  src/mesa/program/prog_parameter.c |  16 ++--
> >  5 files changed, 229 insertions(+), 39 deletions(-)
> > 
> > diff --git a/src/mesa/main/uniform_query.cpp 
> > b/src/mesa/main/uniform_query.cpp
> > index d36f506..2dc272e 100644
> > --- a/src/mesa/main/uniform_query.cpp
> > +++ b/src/mesa/main/uniform_query.cpp
> > @@ -469,6 +469,9 @@ log_uniform(const void *values, enum glsl_base_type 
> > basicType,
> >case GLSL_TYPE_FLOAT:
> >  printf("%g ", v[i].f);
> >  break;
> > +  case GLSL_TYPE_DOUBLE:

This won't compile: GLSL_TYPE_DOUBLE is not introduced until
"glsl: Add double builtin type". There are some more occurrences in the rest
of the patch. It looks to me like we need to split this in two, one dealing with
core state handling (src/mesa/main/uniforms.c) and another updating the
compiler frontend, and move the latter further along in the series.

> > +printf("%g ", *(double* )&v[i * 2].f);
> > +break;
> 
> I know the rest of the code here uses tabs, but new code should not.
> 
> >default:
> >  assert(!"Should not get here.");
> >  break;
> > @@ -529,11 +532,11 @@ _mesa_propagate_uniforms_to_driver_storage(struct 
> > gl_uniform_storage *uni,
> >  */
> > const unsigned components = MAX2(1, uni->type->vector_elements);
> > const unsigned vectors = MAX2(1, uni->type->matrix_columns);
> > -
> > +   const int dmul = uni->type->base_type == GLSL_TYPE_DOUBLE ? 2 : 1;
> 
> Blank line here.
> 
> > /* Store the data in the driver's requested type in the driver's storage
> >  * areas.
> >  */
> > -   unsigned src_vector_byte_stride = components * 4;
> > +   unsigned src_vector_byte_stride = components * 4 * dmul;
> >  
> > for (i = 0; i < uni->num_driver_storage; i++) {
> >struct gl_uniform_driver_storage *const store = 
> > &uni->driver_storage[i];
> > @@ -608,6 +611,7 @@ _mesa_uniform(struct gl_context *ctx, struct 
> > gl_shader_program *shProg,
> >unsigned src_components)
> >  {
> > unsigned offset;
> > +   int size_mul = basicType == GLSL_TYPE_DOUBLE ? 2 : 1;
> >  
> > struct gl_uniform_storage *const uni =
> >validate_uniform_parameters(ctx, shProg, location, count,
> > @@ -615,15 +619,13 @@ _mesa_uniform(struct gl_context *ctx, struct 
> > gl_shader_program *shProg,
> > if (uni == NULL)
> >return;
> >  
> > -   /* Verify that the types are compatible.
> > -*/
> 
> Why is this comment removed?  We're still verifying that the types are
> compatible.  Right?
> 
> > const unsigned components = uni->type->is_sampler()
> >? 1 : uni->type->vector_elements;
> >  
> > bool match;
> > switch (uni->type->base_type) {
> > case GLSL_TYPE_BOOL:
> > -  match = true;
> > +  match = (basicType != GLSL_TYPE_DOUBLE);
> >break;
> > case GLSL_TYPE_SAMPLER:
> > case GLSL_TYPE_IMAGE:
> > @@ -710,8 +712,8 @@ _mesa_uniform(struct gl_context *ctx, struct 
> > gl_shader_program *shProg,
> > /* Store the data in the "actual type" backing storage for the uniform.
> >  */
> > if (!uni->type->is_boolean()) {
> > -  memcpy(&uni->storage[components * offset], values,
> > -sizeof(uni->storage[0]) * components * count);
> > +  memcpy(&uni->storage[size_mul * components * offset], values,
> > +sizeof(uni->storage[0]) * components * count * size_mul);
> > } else {
> >const union gl_constant_value *src =
> >  (const union gl_constant_value *) values;
> > @@ -808,13 +810,14 @@ extern "C" void
> >  _mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program 
> > *shProg,
> >  GLuint cols, GLuint rows,
> >   GLint location, GLsizei count,
> > - GLboolean transpose, const GLfloat *values)
> > + GLboolean transpose,
> > + const GLvoid *values, GLenum type)
> >  {
> > unsigned offset;
> > unsigned vectors;
> > unsigned components;
> > unsigned elements;
> > -
> > +   int size_mul;
> > struct gl_uniform_storage *const uni =
> >validate_uniform_parameters(ctx, shProg, location, count,
> >&of

Re: [Mesa-dev] [PATCH v2 06/28] glsl: Add double builtin type

2015-02-06 Thread Pohjolainen, Topi

On Thu, Feb 05, 2015 at 11:56:28PM -0500, Ilia Mirkin wrote:
> From: Dave Airlie 
> 
> This causes a lot of warnings about unchecked type in
> switch statements - fix them later.

The rest of the series fixes things in the compiler frontend but leaves
a lot unchecked in the compiler backend (at least in i965). Also we now
get complaints in NIR.
Putting something intermediate to silence things that get fixed by the
series itself probably doesn't make sense but I wonder if we should
address NIR at least. Thoughts?

There are also a number of warnings originating from earlier patches in
the series.

Re: [Mesa-dev] [PATCH v2 06/28] glsl: Add double builtin type

2015-02-06 Thread Pohjolainen, Topi

On Fri, Feb 06, 2015 at 10:04:13AM +0200, Pohjolainen, Topi wrote:
> On Thu, Feb 05, 2015 at 11:56:28PM -0500, Ilia Mirkin wrote:
> > From: Dave Airlie 
> > 
> > This causes a lot of warnings about unchecked type in
> > switch statements - fix them later.
> 
> The rest of the series fixes things in the compiler frontend but leaves
> a lot unchecked in the compiler backend (at least in i965). Also we now
> get complaints in NIR.
> Putting something intermediate to silence things that get fixed by the
> series itself probably doesn't make sense but I wonder if we should
> address NIR at least. Thoughts?

Furthermore this patch introduces warnings in the uniform module test that
are not fixed by the rest of the series either:

tests/uniform_initializer_utils.cpp: In function 'void generate_data_element(void*, const glsl_type*, ir_constant*&, unsigned int)':
tests/uniform_initializer_utils.cpp:83:14: warning: enumeration value 'GLSL_TYPE_DOUBLE' not handled in switch [-Wswitch]
   switch (type->base_type) {
  ^
tests/uniform_initializer_utils.cpp:83:14: warning: enumeration value 'GLSL_TYPE_IMAGE' not handled in switch [-Wswitch]
tests/uniform_initializer_utils.cpp:111:14: warning: enumeration value 'GLSL_TYPE_DOUBLE' not handled in switch [-Wswitch]
   switch (type->base_type) {
  ^
tests/uniform_initializer_utils.cpp:111:14: warning: enumeration value 'GLSL_TYPE_IMAGE' not handled in switch [-Wswitch]
tests/uniform_initializer_utils.cpp: In function 'void verify_data(gl_constant_value*, unsigned int, ir_constant*, unsigned int, unsigned int)':
tests/uniform_initializer_utils.cpp:211:10: warning: enumeration value 'GLSL_TYPE_DOUBLE' not handled in switch [-Wswitch]
   switch (val->type->base_type) {
  ^
tests/uniform_initializer_utils.cpp:211:10: warning: enumeration value 'GLSL_TYPE_IMAGE' not handled in switch [-Wswitch]
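
For reference, a minimal sketch of what -Wswitch is reporting in the output
above. The enum and function names are hypothetical, reduced stand-ins (not
Mesa's glsl_base_type): a switch over an enum without a default label warns
for every enumerator that lacks a case, so each new base type has to be
handled, or deliberately grouped, in every such switch.

```c
/* Hypothetical, reduced stand-in for glsl_base_type. */
enum sketch_base_type {
   SKETCH_TYPE_UINT,
   SKETCH_TYPE_INT,
   SKETCH_TYPE_FLOAT,
   SKETCH_TYPE_DOUBLE,
   SKETCH_TYPE_BOOL
};

/* Size in bytes of one component.  Every enumerator has a case and there
 * is no default label, so -Wswitch keeps flagging any enumerator that is
 * added later without a matching case.
 */
static unsigned
sketch_component_size(enum sketch_base_type type)
{
   switch (type) {
   case SKETCH_TYPE_UINT:
   case SKETCH_TYPE_INT:
   case SKETCH_TYPE_FLOAT:
   case SKETCH_TYPE_BOOL:
      return 4;
   case SKETCH_TYPE_DOUBLE:   /* dropping this case brings the warning back */
      return 8;
   }

   return 0; /* not reached for valid enumerators */
}
```

Removing the SKETCH_TYPE_DOUBLE case and building with -Wall should reproduce
the same class of warning as in the test output.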


Re: [Mesa-dev] [PATCH] i965: Prefer Meta over the BLT for BlitFramebuffer.

2015-02-18 Thread Pohjolainen, Topi

On Tue, Feb 17, 2015 at 05:39:33PM -0800, Kenneth Graunke wrote:
> There's some debate about whether we should use Meta or BLORP,
> but either should run circles around the BLT engine.
> 
> In particular, this means that Gen8+ will use the 3D engine for blits,
> like we do on Gen6-7.
> 
> Improves performance in "copypixrate -blit -back" (from Mesa demos)
> by 232.037% +/- 3.15795% (n=10) on Broadwell GT3e.

I've also been experimenting with the same test by disabling blorp for
fb-blits on IVB. I'm a little confused since even without your patch the blit
goes through meta instead of the BLT engine. It is the test for scissor
that prevents intel_blit_framebuffer_with_blitter() from doing the blit:

...
   /* If the source and destination are the same size with no mirroring,
    * the rectangles are within the size of the texture and there is no
    * scissor, then we can probably use the blit engine.
    */
   if (!(srcX0 - srcX1 == dstX0 - dstX1 &&
         srcY0 - srcY1 == dstY0 - dstY1 &&
         srcX1 >= srcX0 &&
         srcY1 >= srcY0 &&
         srcX0 >= 0 && srcX1 <= readFb->Width &&
         srcY0 >= 0 && srcY1 <= readFb->Height &&
         dstX0 >= 0 && dstX1 <= drawFb->Width &&
         dstY0 >= 0 && dstY1 <= drawFb->Height &&
         !(ctx->Scissor.EnableFlags))) {
      perf_debug("glBlitFramebuffer(): non-1:1 blit.  "
                 "Falling back to software rendering.\n");
...

I wonder where the performance difference actually comes from. Could it
be the intel_prepare_render() that intel_blit_framebuffer_with_blitter()
calls before bailing out? With your patch that won't get called anymore.
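
To make the condition easier to poke at, here is a standalone sketch of the
same test. The names are hypothetical: struct sketch_fb stands in for the
framebuffer Width/Height fields and scissor_enabled for
ctx->Scissor.EnableFlags. It is only an illustration of the predicate, not
the driver code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the framebuffer size fields used by the check. */
struct sketch_fb {
   int width;
   int height;
};

/* Returns true when the blit is a plain 1:1 copy: same extents, no
 * mirroring, both rectangles inside their framebuffers, and no scissor.
 */
static bool
sketch_is_simple_blit(const struct sketch_fb *read_fb,
                      const struct sketch_fb *draw_fb,
                      int srcX0, int srcY0, int srcX1, int srcY1,
                      int dstX0, int dstY0, int dstX1, int dstY1,
                      bool scissor_enabled)
{
   return srcX0 - srcX1 == dstX0 - dstX1 &&
          srcY0 - srcY1 == dstY0 - dstY1 &&
          srcX1 >= srcX0 &&
          srcY1 >= srcY0 &&
          srcX0 >= 0 && srcX1 <= read_fb->width &&
          srcY0 >= 0 && srcY1 <= read_fb->height &&
          dstX0 >= 0 && dstX1 <= draw_fb->width &&
          dstY0 >= 0 && dstY1 <= draw_fb->height &&
          !scissor_enabled;
}
```

With a scissor enabled the predicate is false even for an otherwise 1:1 copy,
which matches the blitter path bailing out as described.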

Re: [Mesa-dev] [PATCH 2/4] i965/fs: Make get_timestamp() return an fs_inst * rather than emitting.

2015-02-27 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 12:06:00AM -0800, Kenneth Graunke wrote:
> This makes another part of the INTEL_DEBUG=shader_time code emittable
> at arbitrary locations, rather than just at the end of the instruction
> stream.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 14 --
>  src/mesa/drivers/dri/i965/brw_fs.h   |  2 +-
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 9c6f084..faa6f3f 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -680,8 +680,8 @@ fs_visitor::type_size(const struct glsl_type *type)
> return 0;
>  }
>  
> -fs_reg
> -fs_visitor::get_timestamp()
> +fs_inst *
> +fs_visitor::timestamp_read()
>  {
> assert(brw->gen >= 7);
>  
> @@ -692,7 +692,7 @@ fs_visitor::get_timestamp()
>  
> fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 4);
>  
> -   fs_inst *mov = emit(MOV(dst, ts));
> +   fs_inst *mov = MOV(dst, ts);
> /* We want to read the 3 fields we care about even if it's not enabled in
>  * the dispatch.
>  */
> @@ -710,14 +710,15 @@ fs_visitor::get_timestamp()
>  */
> dst.set_smear(0);

This smear is now lost. Callers of timestamp_read() will from now on consult
the destination register of the instruction while this stack copy won't be
consumed by anything.
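
A minimal sketch of the copy semantics behind this, using hypothetical
reduced stand-ins for fs_reg and fs_inst (only a stride field is modelled).
It assumes, as in the patch, that the instruction stores its own copy of the
destination register when it is built.

```c
/* Hypothetical, reduced stand-ins for fs_reg and fs_inst. */
struct sketch_reg {
   int stride;
};

struct sketch_inst {
   struct sketch_reg dst;   /* holds its own copy of the register */
};

/* Build an instruction; dst is passed by value, so the instruction keeps
 * a snapshot of the register as it was at this point.
 */
static struct sketch_inst
sketch_mov(struct sketch_reg dst)
{
   struct sketch_inst inst;
   inst.dst = dst;
   return inst;
}

/* Mirrors the ordering in the patch: the register is modified only after
 * the instruction was built.  Returns the stride a caller would see
 * through the instruction's destination.
 */
static int
sketch_stride_seen_by_caller(void)
{
   struct sketch_reg dst = { 1 };
   struct sketch_inst mov = sketch_mov(dst);

   dst.stride = 0;          /* stands in for dst.set_smear(0); local only */

   return mov.dst.stride;
}
```

The caller still sees the initial stride because the later change only
touches the local copy.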

>  
> -   return dst;
> +   return mov;
>  }
>  
>  void
>  fs_visitor::emit_shader_time_begin()
>  {
> current_annotation = "shader time start";
> -   shader_start_time = get_timestamp();
> +   fs_inst *tm_read = emit(timestamp_read());
> +   shader_start_time = tm_read->dst;
>  }
>  
>  void
> @@ -753,7 +754,8 @@ fs_visitor::emit_shader_time_end()
>unreachable("fs_visitor::emit_shader_time_end missing code");
> }
>  
> -   fs_reg shader_end_time = get_timestamp();
> +   fs_inst *tm_read = emit(timestamp_read());
> +   fs_reg shader_end_time = tm_read->dst;
>  
> /* Check that there weren't any timestamp reset events (assuming these
>  * were the only two timestamp reads that happened).
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index be1c8a1..f8044f8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -399,7 +399,7 @@ public:
> void resolve_ud_negate(fs_reg *reg);
> void resolve_bool_comparison(ir_rvalue *rvalue, fs_reg *reg);
>  
> -   fs_reg get_timestamp();
> +   fs_inst *timestamp_read();
>  
> struct brw_reg interp_reg(int location, int channel);
> void setup_uniform_values(ir_variable *ir);
> -- 
> 2.2.2
> 

Re: [Mesa-dev] [PATCH 1/4] i965: Make emit_shader_time_write return rather than emit.

2015-02-27 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 12:05:59AM -0800, Kenneth Graunke wrote:
> Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)).
> The advantage is that we can also insert a shader time write at an
> arbitrary location in the instruction stream, rather than being
> restricted to emitting at the end.
> 
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 3/4] i965/fs: Make emit_shader_time_end() insert before EOT.

2015-02-27 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 12:06:01AM -0800, Kenneth Graunke wrote:
> Previously, we emitted the shader-time epilogue from emit_fb_writes(),
> during the middle of looping through color regions (or emit_urb_writes
> for the VS).  This is duplicated several times and rather awkward.
> 
> I need to fix a bug in our FB write handling, and it will be a lot
> easier if we move emit_shader_time_end() out of there.
> 
> Now, we simply emit FB writes/URB writes, and subsequently have
> emit_shader_time_end() insert instructions before the final SEND with
> EOT.  Not only is this simpler, it's actually a slight improvement:
> we now include the MOVs to set up the final FB write payload in our
> shader-time measurements.
> 
> Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses
> send-from-GRF.  (In the past, we might have hit trouble where both
> attempt to use MRFs for messages; that's not a problem now.)
> 
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH v2 2/4] i965/fs: Make get_timestamp() return an fs_inst * rather than emitting.

2015-02-27 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 11:15:35AM -0800, Kenneth Graunke wrote:
> This makes another part of the INTEL_DEBUG=shader_time code emittable
> at arbitrary locations, rather than just at the end of the instruction
> stream.
> 
> v2: Don't lose smear!  Caught by Topi Pohjolainen.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 24 +---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  2 +-
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> Yikes, good catch!  Thanks for the review, Topi!
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 9c6f084..d65f1f1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -680,8 +680,8 @@ fs_visitor::type_size(const struct glsl_type *type)
> return 0;
>  }
>  
> -fs_reg
> -fs_visitor::get_timestamp()
> +fs_inst *
> +fs_visitor::timestamp_read()
>  {
> assert(brw->gen >= 7);
>  
> @@ -692,12 +692,6 @@ fs_visitor::get_timestamp()
>  
> fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 4);
>  
> -   fs_inst *mov = emit(MOV(dst, ts));
> -   /* We want to read the 3 fields we care about even if it's not enabled in
> -* the dispatch.
> -*/
> -   mov->force_writemask_all = true;
> -
> /* The caller wants the low 32 bits of the timestamp.  Since it's running
>  * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
>  * which is plenty of time for our purposes.  It is identical across the
> @@ -710,14 +704,21 @@ fs_visitor::get_timestamp()
>  */
> dst.set_smear(0);
>  
> -   return dst;
> +   fs_inst *mov = MOV(dst, ts);

Previously the smear wasn't set for the destination in the instruction
itself. I had to check what set_smear() really does. It also sets stride to
zero which the original logic left at the init value of one. I guess this is
not what you intended?
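
A small sketch of the difference between the two orderings, again with
hypothetical stand-in types. sketch_set_smear() models only the stride side
effect mentioned here, not the rest of what set_smear() does.

```c
/* Hypothetical, reduced stand-ins for fs_reg and fs_inst. */
struct sketch_reg {
   int stride;
};

struct sketch_inst {
   struct sketch_reg dst;
};

/* Models only the side effect discussed here: the stride is forced to 0. */
static struct sketch_reg
sketch_set_smear(struct sketch_reg reg)
{
   reg.stride = 0;
   return reg;
}

/* Returns the stride stored in the MOV's destination.
 * smear_before_mov != 0: smear is applied first, then the MOV is built.
 * smear_before_mov == 0: the MOV is built first, smear only hits the
 *                        local register afterwards.
 */
static int
sketch_mov_dst_stride(int smear_before_mov)
{
   struct sketch_reg dst = { 1 };   /* init value of one */
   struct sketch_inst mov;

   if (smear_before_mov)
      dst = sketch_set_smear(dst);

   mov.dst = dst;

   if (!smear_before_mov)
      dst = sketch_set_smear(dst);

   return mov.dst.stride;
}
```

So building the MOV after the smear changes the stride in the instruction's
destination from one to zero, which is the behaviour change being asked
about.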

> +   /* We want to read the 3 fields we care about even if it's not enabled in
> +* the dispatch.
> +*/
> +   mov->force_writemask_all = true;
> +
> +   return mov;
>  }
>  
>  void
>  fs_visitor::emit_shader_time_begin()
>  {
> current_annotation = "shader time start";
> -   shader_start_time = get_timestamp();
> +   fs_inst *tm_read = emit(timestamp_read());
> +   shader_start_time = tm_read->dst;
>  }
>  
>  void
> @@ -753,7 +754,8 @@ fs_visitor::emit_shader_time_end()
>unreachable("fs_visitor::emit_shader_time_end missing code");
> }
>  
> -   fs_reg shader_end_time = get_timestamp();
> +   fs_inst *tm_read = emit(timestamp_read());
> +   fs_reg shader_end_time = tm_read->dst;
>  
> /* Check that there weren't any timestamp reset events (assuming these
>  * were the only two timestamp reads that happened).
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index be1c8a1..f8044f8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -399,7 +399,7 @@ public:
> void resolve_ud_negate(fs_reg *reg);
> void resolve_bool_comparison(ir_rvalue *rvalue, fs_reg *reg);
>  
> -   fs_reg get_timestamp();
> +   fs_inst *timestamp_read();
>  
> struct brw_reg interp_reg(int location, int channel);
> void setup_uniform_values(ir_variable *ir);
> -- 
> 2.2.2
> 

Re: [Mesa-dev] [PATCH] main/base_tex_format: Properly handle STENCIL_INDEX1/4/16

2015-03-04 Thread Pohjolainen, Topi

On Tue, Mar 03, 2015 at 05:08:23PM +, Neil Roberts wrote:
> Jason Ekstrand  writes:
> 
> > On Mon, Mar 2, 2015 at 11:33 AM, Ilia Mirkin  wrote:
> >
> >> On Mon, Mar 2, 2015 at 2:32 PM, Jason Ekstrand 
> >> wrote:
> >> >
> >> >
> >> > On Mon, Mar 2, 2015 at 11:18 AM, Ilia Mirkin 
> >> wrote:
> >> >>
> >> >> Hmmm... I was just looking at this code in connection to attempting to
> >> >> enable ARB_texture_stencil8, and it _seems_ like that should be if
> >> >> (ARB_texture_stencil8) -- I didn't see what in ARB_stencil_texturing
> >> >> had to do with being able to have a GL_STENCIL_INDEX* internal
> >> >> format...
> >> >
> >> >
> >> > I just pushed it because it does fix a bug...  However, you do raise a
> >> good
> >> > point.  Maybe we should change the i965 driver to use GL_STENCIL_INDEX
> >> for
> >> > the internal format for our internal stencil buffers.
> >>
> >> How do you end up with a GL_STENCIL_INDEX internal format in the first
> >> place? Without ARB_texture_stencil8 that's not a thing, is it?
> >>
> >
> > We're using it internally for doing stencil blits.  I'm going to Cc Neil
> > and he can explain what's going on there better than I can as he was the
> > one who wrote most of that code.
> 
> Do you mean the code to do the meta stencil blit? I haven't touched
> that. Maybe I can pass the buck on to Topi :)

I may not be answering the question but I'll explain what the meta path
does. It sets the texturing mode using:

   _mesa_TexParameteri(target, GL_DEPTH_STENCIL_TEXTURE_MODE,
  GL_STENCIL_INDEX);

This is done in order to follow the spec:

"Texture lookups involving texture objects with an internal format of
 DEPTH_STENCIL can read the stencil value as described in section
 3.10.18 by setting the DEPTH_STENCIL_TEXTURE_MODE to STENCIL_INDEX."

But this should only affect gl_texture_object::StencilSampling, which
is considered by the surface state setup later on in i965. I don't believe
I have altered any internal-format-related logic for this.
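
As a rough illustration of that separation, here is a minimal sketch. All
names in it are hypothetical (struct tex_object, the enum values and
set_depth_stencil_texture_mode() are stand-ins, not Mesa's definitions); it
only shows that changing the texture mode flips a sampling flag and leaves
the internal format untouched:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GL enums; the values are arbitrary. */
enum {
   FMT_DEPTH_STENCIL = 1,
   MODE_DEPTH_COMPONENT,
   MODE_STENCIL_INDEX
};

/* Simplified texture object: one field for the internal format and one
 * flag playing the role of gl_texture_object::StencilSampling.
 */
struct tex_object {
   int internal_format;
   bool stencil_sampling;
};

/* Models the effect of setting DEPTH_STENCIL_TEXTURE_MODE: only the
 * sampling flag changes, the internal format is not written at all.
 */
static void
set_depth_stencil_texture_mode(struct tex_object *tex, int mode)
{
   assert(mode == MODE_DEPTH_COMPONENT || mode == MODE_STENCIL_INDEX);
   tex->stencil_sampling = (mode == MODE_STENCIL_INDEX);
}
```

The surface state setup would then look at the flag, not at the internal
format, when deciding how to sample the texture.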

Re: [Mesa-dev] [PATCH] i965: Split Gen4-5 BlitFramebuffer code; prefer BLT over Meta.

2015-03-05 Thread Pohjolainen, Topi

On Wed, Mar 04, 2015 at 08:44:19PM -0800, Kenneth Graunke wrote:
> A while back I switched intel_blit_framebuffer to prefer Meta over the
> BLT.  This meant that Gen8 platforms would start using the 3D engine
> for blits, just like we do on Gen6-7.5.
> 
> However, I hadn't considered Gen4-5 when making that change.  The BLT
> engine appears to be substantially faster on 965GM than using Meta to
> drive the 3D engine.  This isn't too surprising: original Gen4 doesn't
> support tile offsets (that came on G45), and the level/layer fields
> don't work for cubemap rendering, so for inconvenient miplevel
> alignments, we end up blitting or copying data to/from temporaries
> in order to render to it.  We may as well just use the blitter.
> 
> I chose to use the BLT on Gen4-5 because they use the same ring for
> both 3D and BLT; Gen6+ splits it out.
> 
> Fixes regressions on 965GM due to botched tile offset code (we should
> fix those properly as well, but they're longstanding bugs - for now,
> put things back to the status quo).
> 
> Signed-off-by: Kenneth Graunke 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89430
> Cc: "10.5" 
> Cc: Mark Janes 

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 01/13] i965: Factor out logic to build a send message instruction with indirect descriptor.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:44PM +0200, Francisco Jerez wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_eu.h   | 19 ++--
>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 58 ++--
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 55 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 37 ---
>  4 files changed, 77 insertions(+), 92 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
> index 1b954c8..9b1e0e2 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> @@ -205,11 +205,6 @@ void brw_set_sampler_message(struct brw_compile *p,
>   unsigned simd_mode,
>   unsigned return_format);
>  
> -void brw_set_indirect_send_descriptor(struct brw_compile *p,
> -  brw_inst *insn,
> -  unsigned sfid,
> -  struct brw_reg descriptor);
> -
>  void brw_set_dp_read_message(struct brw_compile *p,
>brw_inst *insn,
>unsigned binding_table_index,
> @@ -243,6 +238,20 @@ void brw_urb_WRITE(struct brw_compile *p,
>  unsigned offset,
>  unsigned swizzle);
>  
> +/**
> + * Send message to shared unit \p sfid with a possibly indirect descriptor \p
> + * desc.  If the descriptor is not an immediate it will be transparently
> + * loaded to an address register using an OR instruction that will be returned
> + * to the caller so additional descriptor bits can be specified with the usual
> + * brw_set_*_message() helper functions.
> + */
> +struct brw_inst *
> +brw_send_indirect_message(struct brw_compile *p,
> +  unsigned sfid,
> +  struct brw_reg dst,
> +  struct brw_reg payload,
> +  struct brw_reg desc);
> +
>  void brw_ff_sync(struct brw_compile *p,
>  struct brw_reg dest,
>  unsigned msg_reg_nr,
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index e69840a..cd2ce92 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -751,21 +751,6 @@ brw_set_sampler_message(struct brw_compile *p,
> }
>  }
>  
> -void brw_set_indirect_send_descriptor(struct brw_compile *p,
> -  brw_inst *insn,
> -  unsigned sfid,
> -  struct brw_reg descriptor)
> -{
> -   /* Only a0.0 may be used as SEND's descriptor operand. */
> -   assert(descriptor.file == BRW_ARCHITECTURE_REGISTER_FILE);
> -   assert(descriptor.type == BRW_REGISTER_TYPE_UD);
> -   assert(descriptor.nr == BRW_ARF_ADDRESS);
> -   assert(descriptor.subnr == 0);
> -
> -   brw_set_message_descriptor(p, insn, sfid, 0, 0, false, false);
> -   brw_set_src1(p, insn, descriptor);
> -}
> -
>  static void
>  gen7_set_dp_scratch_message(struct brw_compile *p,
>  brw_inst *inst,
> @@ -2490,6 +2475,49 @@ void brw_urb_WRITE(struct brw_compile *p,
>  swizzle);
>  }
>  
> +struct brw_inst *
> +brw_send_indirect_message(struct brw_compile *p,
> +  unsigned sfid,
> +  struct brw_reg dst,
> +  struct brw_reg payload,
> +  struct brw_reg desc)
> +{
> +   const struct brw_context *brw = p->brw;
> +   struct brw_inst *send, *setup;
> +
> +   assert(desc.type == BRW_REGISTER_TYPE_UD);
> +
> +   if (desc.file == BRW_IMMEDIATE_VALUE) {
> +  setup = send = next_insn(p, BRW_OPCODE_SEND);

If I'm reading this correctly, all the callers in this patch pass a 'desc'
that is not a BRW_IMMEDIATE_VALUE. Hence none of the logic modified in this
patch needs the actual send instruction to be returned as the descriptor
instruction. Do we really need to do this, or could we just return NULL,
since in this case there isn't any OR instruction setting the descriptor
bits? (Your documentation above says that the returned instruction is an OR
setting the descriptor. Returning the SEND instead is not really the same.)
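
To make the concern concrete, here is a minimal sketch of the alternative I
have in mind. Everything in it is hypothetical: struct inst, struct emitter
and send_indirect() are simplified stand-ins, not the real brw_eu code. The
helper returns the OR only when one is actually emitted and hands the SEND
back through a separate out parameter:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for brw_reg and brw_inst. */
enum reg_file { FILE_IMMEDIATE, FILE_GRF };
enum { OP_SEND = 1, OP_OR = 2 };

struct reg { enum reg_file file; unsigned value; };
struct inst { int opcode; };

/* Tiny instruction store playing the role of the brw_compile context. */
struct emitter {
   struct inst store[8];
   unsigned count;
};

static struct inst *
next_inst(struct emitter *e, int opcode)
{
   assert(e->count < 8);
   struct inst *i = &e->store[e->count++];
   i->opcode = opcode;
   return i;
}

/* Returns the OR that loads the descriptor, or NULL if the descriptor is
 * an immediate and no OR was needed.  The SEND itself is always returned
 * through send_out, so the two never get confused.
 */
static struct inst *
send_indirect(struct emitter *e, struct reg desc, struct inst **send_out)
{
   struct inst *setup = NULL;

   if (desc.file != FILE_IMMEDIATE)
      setup = next_inst(e, OP_OR);

   *send_out = next_inst(e, OP_SEND);
   return setup;
}
```

With an interface of this shape a caller has to check for NULL before
adding descriptor bits to the OR, which makes the two cases explicit at the
call site.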

> +  brw_set_src1(p, send, desc);
> +
> +   } else {
> +  struct brw_reg addr = retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD);
> +
> +  brw_push_insn_state(p);
> +  brw_set_default_access_mode(p, BRW_ALIGN_1);
> +  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> +  brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
> +
> +  /* Load the indirect descriptor to an address register using OR so the
> +   * caller can specify additional descriptor bits with the usual
> +   * brw_set_*_message() helper functions.
> +   */
> +  setup = brw_OR(p, ad

Re: [Mesa-dev] [PATCH 02/13] i965: Don't disable exec masking for sampler message sends.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:45PM +0200, Francisco Jerez wrote:
> This was telling the sampler to do texture fetches for *all* channels
> in the non-constant surface index case, which could have reduced
> throughput unnecessarily when some of the channels were disabled by
> control flow.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 12 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  8 
>  2 files changed, 10 insertions(+), 10 deletions(-)

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 04/13] i965: Mask out unused Align16 components in brw_untyped_atomic.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:47PM +0200, Francisco Jerez wrote:
> This is currently not a problem because the vec4 visitor happens to
> mask out unused components from the destination, but it might become
> an issue when we start using atomics without writeback message.  In
> any case it seems sensible to set it again here because the
> consequences of setting the wrong writemask (random graphics memory
> corruption) are difficult to debug and can easily go unnoticed.

I started wondering whether this should be an assertion here instead, with
the logic in the visitor forced to set the writemask correctly. I don't have
a strong opinion, I'm merely wondering aloud.
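
The two options can be sketched side by side. This is only an illustration
under assumptions: the mask constants and both helper names are
hypothetical, and I'm assuming that applying a writemask narrows the
destination's existing mask rather than replacing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical writemask bits, one per vector component. */
enum {
   MASK_X = 1,
   MASK_Y = 2,
   MASK_Z = 4,
   MASK_W = 8,
   MASK_XYZW = 15
};

/* Option A (what the patch does): silently narrow the destination mask,
 * keeping only X when the instruction is not in Align1 mode.
 */
static unsigned
force_atomic_writemask(unsigned dest_mask, bool align1)
{
   return dest_mask & (align1 ? MASK_XYZW : MASK_X);
}

/* Option B (the assertion variant): the predicate an assert() would
 * check, leaving it to the visitor to pass a correct mask.
 */
static bool
atomic_writemask_is_valid(unsigned dest_mask, bool align1)
{
   return align1 || (dest_mask & ~(unsigned)MASK_X) == 0;
}
```

Option A hides a sloppy caller, option B catches it in debug builds but
does nothing in release builds, where a wrong mask would still corrupt
memory.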

> ---
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 2b1d6ff..0b655d4 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2799,16 +2799,25 @@ brw_untyped_atomic(struct brw_compile *p,
> bool response_expected)
>  {
> const struct brw_context *brw = p->brw;
> +   const bool align1 = (brw_inst_access_mode(brw, p->current) == BRW_ALIGN_1);
> +   /* Mask out unused components -- This is especially important in Align16
> +* mode on generations that don't have native support for SIMD4x2 atomics,
> +* because unused but enabled components will cause the dataport to perform
> +* additional atomic operations on the addresses that happen to be in the
> +* uninitialized Y, Z and W coordinates of the payload.
> +*/
> +   const unsigned mask = (align1 ? WRITEMASK_XYZW : WRITEMASK_X);
> brw_inst *insn = brw_next_insn(p, BRW_OPCODE_SEND);
>  
> -   brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));
> +   brw_set_dest(p, insn, retype(brw_writemask(dest, mask),
> +BRW_REGISTER_TYPE_UD));
> brw_set_src0(p, insn, retype(payload, BRW_REGISTER_TYPE_UD));
> brw_set_src1(p, insn, brw_imm_d(0));
> brw_set_dp_untyped_atomic_message(
>p, insn, atomic_op, bind_table_index, msg_length,
>brw_surface_payload_size(p, response_expected,
> brw->gen >= 8 || brw->is_haswell, true),
> -  brw_inst_access_mode(brw, insn) == BRW_ALIGN_1);
> +  align1);
>  }
>  
>  static void
> -- 
> 2.1.3
> 

Re: [Mesa-dev] [PATCH 06/13] i965: Simplify generator code for untyped surface messages.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:49PM +0200, Francisco Jerez wrote:
> The generate_untyped_*() methods do nothing useful other than calling
> the corresponding function from brw_eu_emit.c.  The calls to
> brw_mark_surface_used() will go away too in a future commit.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   | 11 --
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 42 +--
>  src/mesa/drivers/dri/i965/brw_vec4.h |  9 -
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 43 +---
>  4 files changed, 18 insertions(+), 87 deletions(-)

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 07/13] i965: Don't request untyped atomic writeback message if the destination is null.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:50PM +0200, Francisco Jerez wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 08/13] i965/vec4: Add support for untyped surface message sends from GRF.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:51PM +0200, Francisco Jerez wrote:
> This doesn't actually enable untyped surface message sends from GRF
> yet, the upcoming atomic counter and image intrinsic lowering code
> will.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  7 ---
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 16 +++-
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   |  5 +++--
>  3 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index e19..0004b10 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -256,6 +256,8 @@ vec4_instruction::is_send_from_grf()
> switch (opcode) {
> case SHADER_OPCODE_SHADER_TIME_ADD:
> case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
>return true;
> default:
>return false;
> @@ -270,6 +272,8 @@ vec4_instruction::regs_read(unsigned arg) const
>  
> switch (opcode) {
> case SHADER_OPCODE_SHADER_TIME_ADD:
> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
>return arg == 0 ? mlen : 1;

Before, the logic always fell back to returning one. Now we may return
one, two or three, I think. I may be mistaken though; I'm just reading
vec4_visitor::emit_untyped_atomic(), and it can produce message lengths up
to three.
Does this affect the instruction scheduling logic, and if not, can you
explain why not?
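
For reference, the behaviour I'm describing boils down to the following
sketch. It is a simplified model, not the real vec4_instruction: the opcode
enum and struct vinst are hypothetical and only cover the cases visible in
the hunk above.

```c
#include <assert.h>

/* Hypothetical opcode list covering only the cases from the hunk. */
enum opcode {
   OP_SHADER_TIME_ADD,
   OP_UNTYPED_ATOMIC,
   OP_UNTYPED_SURFACE_READ,
   OP_OTHER
};

/* Simplified instruction: just the opcode and the message length. */
struct vinst {
   enum opcode opcode;
   unsigned mlen;
};

/* Number of registers read by source 'arg'.  With the patch applied the
 * untyped opcodes report the whole payload (mlen registers) for source
 * zero, where previously they fell through to the default of one.
 */
static unsigned
regs_read(const struct vinst *inst, unsigned arg)
{
   switch (inst->opcode) {
   case OP_SHADER_TIME_ADD:
   case OP_UNTYPED_ATOMIC:
   case OP_UNTYPED_SURFACE_READ:
      return arg == 0 ? inst->mlen : 1;
   default:
      return 1;
   }
}
```

So for an atomic with mlen of three, anything that consumes regs_read(),
such as dependency tracking, now sees three registers read by the first
source.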

>  
> case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
> @@ -347,9 +351,6 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
> case SHADER_OPCODE_TG4:
> case SHADER_OPCODE_TG4_OFFSET:
>return inst->header_present ? 1 : 0;
> -   case SHADER_OPCODE_UNTYPED_ATOMIC:
> -   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> -  return 0;
> default:
>unreachable("not reached");
> }
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index 22fdd63..ef0cde9 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -1459,19 +1459,17 @@ vec4_generator::generate_code(const cfg_t *cfg)
>   break;
>  
>case SHADER_OPCODE_UNTYPED_ATOMIC:
> - assert(src[0].file == BRW_IMMEDIATE_VALUE &&
> -src[1].file == BRW_IMMEDIATE_VALUE);
> - brw_untyped_atomic(p, dst, brw_message_reg(inst->base_mrf),
> -src[1], src[0].dw1.ud, inst->mlen,
> + assert(src[1].file == BRW_IMMEDIATE_VALUE &&
> +src[2].file == BRW_IMMEDIATE_VALUE);
> + brw_untyped_atomic(p, dst, src[0], src[2], src[1].dw1.ud, inst->mlen,
>  !inst->dst.is_null());
> - brw_mark_surface_used(&prog_data->base, src[1].dw1.ud);
> + brw_mark_surface_used(&prog_data->base, src[2].dw1.ud);
>   break;
>  
>case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> - assert(src[0].file == BRW_IMMEDIATE_VALUE);
> - brw_untyped_surface_read(p, dst, brw_message_reg(inst->base_mrf),
> -  src[0], inst->mlen, 1);
> - brw_mark_surface_used(&prog_data->base, src[0].dw1.ud);
> + assert(src[1].file == BRW_IMMEDIATE_VALUE);
> + brw_untyped_surface_read(p, dst, src[0], src[1], inst->mlen, 1);
> + brw_mark_surface_used(&prog_data->base, src[1].dw1.ud);
>   break;
>  
>case SHADER_OPCODE_FIND_LIVE_CHANNEL:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index f25bff9..b8cfe8f 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -2953,6 +2953,7 @@ vec4_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index,
>  * unused channels will be masked out.
>  */
> vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_ATOMIC, dst,
> + brw_message_reg(0),
>   src_reg(atomic_op), src_reg(surf_index));
> inst->base_mrf = 0;
> inst->mlen = mlen;
> @@ -2969,8 +2970,8 @@ vec4_visitor::emit_untyped_surface_read(unsigned surf_index, dst_reg dst,
>  * untyped surface read message, but that's OK because unused
>  * channels will be masked out.
>  */
> -   vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_SURFACE_READ,
> - dst, src_reg(surf_index));
> +   vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_SURFACE_READ, dst,
> + brw_message_reg(0), src_reg(surf_index));
> inst->base_mrf = 0;
> inst->mlen = 1;
>  }
> -- 
> 2.1.3
> 

Re: [Mesa-dev] [PATCH 09/13] i965: Pass the number of components as a source of the untyped surface read opcode.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:52PM +0200, Francisco Jerez wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 5 +++--
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 --
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 3 ++-
>  4 files changed, 10 insertions(+), 6 deletions(-)
> 

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 10/13] i965: Reorder sources of the untyped atomic opcode.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:53PM +0200, Francisco Jerez wrote:
> This is consistent with the untyped surface read opcode.  From now on
> all typed and untyped surface access opcodes will follow the same
> pattern: src[0] will be the message payload, src[1] will be the
> surface index and src[2] will be a control immediate (atomic operation
> for atomic opcodes and number of vector components for surface read
> and write opcodes).
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 4 ++--
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 03/13] i965: Pass number of components explicitly to brw_untyped_atomic and _surface_read.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:46PM +0200, Francisco Jerez wrote:
> And calculate the message response size based on the number of
> components rather than the other way around.  This simplifies their
> interface somewhat and allows the caller to request a writeback
> message with more than one vector component in SIMD4x2 mode.
> ---
>  src/mesa/drivers/dri/i965/brw_eu.h   |  4 ++--
>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 30 +++-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  9 ---
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  5 ++--
>  4 files changed, 32 insertions(+), 16 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
> index 9b1e0e2..87a9f3f 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> @@ -403,7 +403,7 @@ brw_untyped_atomic(struct brw_compile *p,
> unsigned atomic_op,
> unsigned bind_table_index,
> unsigned msg_length,
> -   unsigned response_length);
> +   bool response_expected);

I had to think about this somewhat, but after reading the rest of the series
I think this makes sense.

Reviewed-by: Topi Pohjolainen 

>  
>  void
>  brw_untyped_surface_read(struct brw_compile *p,
> @@ -411,7 +411,7 @@ brw_untyped_surface_read(struct brw_compile *p,
>   struct brw_reg mrf,
>   unsigned bind_table_index,
>   unsigned msg_length,
> - unsigned response_length);
> + unsigned num_channels);
>  
>  void
>  brw_pixel_interpolator_query(struct brw_compile *p,
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index cd2ce92..2b1d6ff 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2729,6 +2729,20 @@ brw_svb_write(struct brw_compile *p,
>  send_commit_msg); /* send_commit_msg */
>  }
>  
> +static unsigned
> +brw_surface_payload_size(struct brw_compile *p,
> + unsigned num_channels,
> + bool has_simd4x2,
> + bool has_simd16)
> +{
> +   if (has_simd4x2 && brw_inst_access_mode(p->brw, p->current) == BRW_ALIGN_16)
> +  return 1;
> +   else if (has_simd16 && p->compressed)
> +  return 2 * num_channels;
> +   else
> +  return num_channels;
> +}
> +
>  static void
>  brw_set_dp_untyped_atomic_message(struct brw_compile *p,
>brw_inst *insn,
> @@ -2782,7 +2796,8 @@ brw_untyped_atomic(struct brw_compile *p,
> unsigned atomic_op,
> unsigned bind_table_index,
> unsigned msg_length,
> -   unsigned response_length) {
> +   bool response_expected)
> +{
> const struct brw_context *brw = p->brw;
> brw_inst *insn = brw_next_insn(p, BRW_OPCODE_SEND);
>  
> @@ -2790,7 +2805,9 @@ brw_untyped_atomic(struct brw_compile *p,
> brw_set_src0(p, insn, retype(payload, BRW_REGISTER_TYPE_UD));
> brw_set_src1(p, insn, brw_imm_d(0));
> brw_set_dp_untyped_atomic_message(
> -  p, insn, atomic_op, bind_table_index, msg_length, response_length,
> +  p, insn, atomic_op, bind_table_index, msg_length,
> +  brw_surface_payload_size(p, response_expected,
> +   brw->gen >= 8 || brw->is_haswell, true),
>brw_inst_access_mode(brw, insn) == BRW_ALIGN_1);
>  }
>  
> @@ -2800,12 +2817,12 @@ brw_set_dp_untyped_surface_read_message(struct brw_compile *p,
>  unsigned bind_table_index,
>  unsigned msg_length,
>  unsigned response_length,
> +unsigned num_channels,
>  bool header_present)
>  {
> const struct brw_context *brw = p->brw;
> const unsigned dispatch_width =
>(brw_inst_exec_size(brw, insn) == BRW_EXECUTE_16 ? 16 : 8);
> -   const unsigned num_channels = response_length / (dispatch_width / 8);
>  
> if (brw->gen >= 8 || brw->is_haswell) {
>brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,
> @@ -2843,7 +2860,7 @@ brw_untyped_surface_read(struct brw_compile *p,
>   struct brw_reg mrf,
>   unsigned bind_table_index,
>   unsigned msg_length,
> - unsigned response_length)
> + unsigned num_channels)
>  {
> const struct brw_context *brw = p->brw;
> brw_inst *insn = next_insn(p, BRW_OPCODE_SEND);
> @@ -2851,8 +2868,9 @@ brw_untyped_surface_read(struct brw_compile *p,
> brw_set_dest(p, insn, retype(de

Re: [Mesa-dev] [PATCH 01/13] i965: Factor out logic to build a send message instruction with indirect descriptor.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Mar 06, 2015 at 10:37:06AM +0200, Pohjolainen, Topi wrote:
> On Fri, Feb 27, 2015 at 05:34:44PM +0200, Francisco Jerez wrote:
> > ---
> >  src/mesa/drivers/dri/i965/brw_eu.h   | 19 ++--
> >  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 58 ++--
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 55 +-
> >  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 37 ---
> >  4 files changed, 77 insertions(+), 92 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
> > index 1b954c8..9b1e0e2 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu.h
> > +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> > @@ -205,11 +205,6 @@ void brw_set_sampler_message(struct brw_compile *p,
> >   unsigned simd_mode,
> >   unsigned return_format);
> >  
> > -void brw_set_indirect_send_descriptor(struct brw_compile *p,
> > -  brw_inst *insn,
> > -  unsigned sfid,
> > -  struct brw_reg descriptor);
> > -
> >  void brw_set_dp_read_message(struct brw_compile *p,
> >  brw_inst *insn,
> >  unsigned binding_table_index,
> > @@ -243,6 +238,20 @@ void brw_urb_WRITE(struct brw_compile *p,
> >unsigned offset,
> >unsigned swizzle);
> >  
> > +/**
> > + * Send message to shared unit \p sfid with a possibly indirect descriptor \p
> > + * desc.  If the descriptor is not an immediate it will be transparently
> > + * loaded to an address register using an OR instruction that will be returned
> > + * to the caller so additional descriptor bits can be specified with the usual
> > + * brw_set_*_message() helper functions.
> > + */

Right, you exploit this in patch number five. I think at least this comment
is misleading as it doesn't say anything about the returned instruction
in case the given descriptor is an immediate.

All in all I'm not too happy about the return value having such differing
semantics depending on the given descriptor type.

> > +struct brw_inst *
> > +brw_send_indirect_message(struct brw_compile *p,
> > +  unsigned sfid,
> > +  struct brw_reg dst,
> > +  struct brw_reg payload,
> > +  struct brw_reg desc);
> > +
> >  void brw_ff_sync(struct brw_compile *p,
> >struct brw_reg dest,
> >unsigned msg_reg_nr,
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > index e69840a..cd2ce92 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > @@ -751,21 +751,6 @@ brw_set_sampler_message(struct brw_compile *p,
> > }
> >  }
> >  
> > -void brw_set_indirect_send_descriptor(struct brw_compile *p,
> > -  brw_inst *insn,
> > -  unsigned sfid,
> > -  struct brw_reg descriptor)
> > -{
> > -   /* Only a0.0 may be used as SEND's descriptor operand. */
> > -   assert(descriptor.file == BRW_ARCHITECTURE_REGISTER_FILE);
> > -   assert(descriptor.type == BRW_REGISTER_TYPE_UD);
> > -   assert(descriptor.nr == BRW_ARF_ADDRESS);
> > -   assert(descriptor.subnr == 0);
> > -
> > -   brw_set_message_descriptor(p, insn, sfid, 0, 0, false, false);
> > -   brw_set_src1(p, insn, descriptor);
> > -}
> > -
> >  static void
> >  gen7_set_dp_scratch_message(struct brw_compile *p,
> >  brw_inst *inst,
> > @@ -2490,6 +2475,49 @@ void brw_urb_WRITE(struct brw_compile *p,
> >swizzle);
> >  }
> >  
> > +struct brw_inst *
> > +brw_send_indirect_message(struct brw_compile *p,
> > +  unsigned sfid,
> > +  struct brw_reg dst,
> > +  struct brw_reg payload,
> > +  struct brw_reg desc)
> > +{
> > +   const struct brw_context *brw = p->brw;
> > +   struct brw_inst *send, *setup;
> > +
> > +   assert(desc.type == BRW_REGISTER_TYPE_UD);
> > +
> > +   if (desc.file == BRW_IMMEDIATE_VALUE) {
> > +  setu

Re: [Mesa-dev] [PATCH] i965/nir: Resolve source modifiers on Gen8+ logic operations.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Mar 06, 2015 at 01:33:05AM -0800, Kenneth Graunke wrote:
> On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and
> negate changes meaning to bitwise-not (~, not -).  This isn't what NIR
> expects, so we should resolve the source modifiers via a MOV.
> 
> +30 Piglits (fs-op-bit{and,or,xor}-not-abs-*).
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 +++
>  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 +++
>  3 files changed, 27 insertions(+)

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 05/13] i965: Fix the untyped surface opcodes to deal with indirect surface access.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:48PM +0200, Francisco Jerez wrote:
> Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
> surface index as a register instead of a constant and to use
> brw_send_indirect_message() to emit the indirect variant of send with
> a dynamically calculated message descriptor.  This will be required to
> support variable indexing of image arrays for
> ARB_shader_image_load_store.
> ---
>  src/mesa/drivers/dri/i965/brw_eu.h   |  10 +-
>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 158 +--
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |   4 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |   4 +-
>  4 files changed, 96 insertions(+), 80 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
> index 87a9f3f..9cc9123 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> @@ -398,18 +398,18 @@ void brw_CMP(struct brw_compile *p,
>  
>  void
>  brw_untyped_atomic(struct brw_compile *p,
> -   struct brw_reg dest,
> +   struct brw_reg dst,
> struct brw_reg payload,
> +   struct brw_reg surface,
> unsigned atomic_op,
> -   unsigned bind_table_index,
> unsigned msg_length,
> bool response_expected);
>  
>  void
>  brw_untyped_surface_read(struct brw_compile *p,
> - struct brw_reg dest,
> - struct brw_reg mrf,
> - unsigned bind_table_index,
> + struct brw_reg dst,
> + struct brw_reg payload,
> + struct brw_reg surface,
>   unsigned msg_length,
>   unsigned num_channels);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 0b655d4..34695bf 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2518,6 +2518,48 @@ brw_send_indirect_message(struct brw_compile *p,
> return setup;
>  }
>  
> +static struct brw_inst *
> +brw_send_indirect_surface_message(struct brw_compile *p,
> +  unsigned sfid,
> +  struct brw_reg dst,
> +  struct brw_reg payload,
> +  struct brw_reg surface,
> +  unsigned message_len,
> +  unsigned response_len,
> +  bool header_present)
> +{
> +   const struct brw_context *brw = p->brw;
> +   struct brw_inst *insn;
> +
> +   if (surface.file != BRW_IMMEDIATE_VALUE) {
> +  struct brw_reg addr = retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD);
> +
> +  brw_push_insn_state(p);
> +  brw_set_default_access_mode(p, BRW_ALIGN_1);
> +  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> +  brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
> +
> +  /* Mask out invalid bits from the surface index to avoid hangs e.g. when
> +   * some surface array is accessed out of bounds.
> +   */
> +  insn = brw_AND(p, addr,
> + suboffset(vec1(retype(surface, BRW_REGISTER_TYPE_UD)),
> +   BRW_GET_SWZ(surface.dw1.bits.swizzle, 0)),
> + brw_imm_ud(0xff));
> +
> +  brw_pop_insn_state(p);
> +
> +  surface = addr;
> +   }
> +
> +   insn = brw_send_indirect_message(p, sfid, dst, payload, surface);
> +   brw_inst_set_mlen(brw, insn, message_len);
> +   brw_inst_set_rlen(brw, insn, response_len);
> +   brw_inst_set_header_present(brw, insn, header_present);

I'll continue the discussion we started with patch number one here if you
don't mind. What I find confusing is that in case 'surface' is not an
immediate then these three calls modify the OR-instruction. Otherwise they
modify the send. Or am I missing something?
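
One possible reading, going by the doc comment in patch one, is that the
fields written into the OR end up in its immediate operand and are then
combined with the masked surface index at run time. The sketch below only
models that arithmetic. The helper names are hypothetical and the bit
positions (message length at 28:25, response length at 24:20, header
present at bit 19) are my assumption about the descriptor layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed descriptor layout: mlen in bits 28:25, rlen in bits 24:20 and
 * the header-present flag in bit 19.
 */
static uint32_t
desc_bits(unsigned mlen, unsigned rlen, bool header_present)
{
   return ((uint32_t)(mlen & 0xf) << 25) |
          ((uint32_t)(rlen & 0x1f) << 20) |
          ((uint32_t)(header_present ? 1 : 0) << 19);
}

/* Models the AND + OR pair for a non-immediate surface index: the AND
 * keeps the low eight bits of the index, the OR merges in whatever was
 * accumulated in its immediate operand.
 */
static uint32_t
indirect_descriptor(uint32_t surface, uint32_t or_immediate)
{
   uint32_t a0 = surface & 0xff;
   return a0 | or_immediate;
}
```

If that reading is right, setting the three fields on the OR or on the SEND
yields the same final descriptor, but the interface still doesn't tell the
caller which of the two it is holding.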

Re: [Mesa-dev] [PATCH 05/13] i965: Fix the untyped surface opcodes to deal with indirect surface access.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Mar 06, 2015 at 02:29:15PM +0200, Francisco Jerez wrote:
> "Pohjolainen, Topi"  writes:
> 
> > On Fri, Feb 27, 2015 at 05:34:48PM +0200, Francisco Jerez wrote:
> >> Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
> >> surface index as a register instead of a constant and to use
> >> brw_send_indirect_message() to emit the indirect variant of send with
> >> a dynamically calculated message descriptor.  This will be required to
> >> support variable indexing of image arrays for
> >> ARB_shader_image_load_store.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_eu.h   |  10 +-
> >>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 158 +--
> >>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |   4 +-
> >>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |   4 +-
> >>  4 files changed, 96 insertions(+), 80 deletions(-)
> >> 
> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
> >> index 87a9f3f..9cc9123 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> >> @@ -398,18 +398,18 @@ void brw_CMP(struct brw_compile *p,
> >>  
> >>  void
> >>  brw_untyped_atomic(struct brw_compile *p,
> >> -   struct brw_reg dest,
> >> +   struct brw_reg dst,
> >> struct brw_reg payload,
> >> +   struct brw_reg surface,
> >> unsigned atomic_op,
> >> -   unsigned bind_table_index,
> >> unsigned msg_length,
> >> bool response_expected);
> >>  
> >>  void
> >>  brw_untyped_surface_read(struct brw_compile *p,
> >> - struct brw_reg dest,
> >> - struct brw_reg mrf,
> >> - unsigned bind_table_index,
> >> + struct brw_reg dst,
> >> + struct brw_reg payload,
> >> + struct brw_reg surface,
> >>   unsigned msg_length,
> >>   unsigned num_channels);
> >>  
> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> index 0b655d4..34695bf 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> @@ -2518,6 +2518,48 @@ brw_send_indirect_message(struct brw_compile *p,
> >> return setup;
> >>  }
> >>  
> >> +static struct brw_inst *
> >> +brw_send_indirect_surface_message(struct brw_compile *p,
> >> +  unsigned sfid,
> >> +  struct brw_reg dst,
> >> +  struct brw_reg payload,
> >> +  struct brw_reg surface,
> >> +  unsigned message_len,
> >> +  unsigned response_len,
> >> +  bool header_present)
> >> +{
> >> +   const struct brw_context *brw = p->brw;
> >> +   struct brw_inst *insn;
> >> +
> >> +   if (surface.file != BRW_IMMEDIATE_VALUE) {
> >> +  struct brw_reg addr = retype(brw_address_reg(0), 
> >> BRW_REGISTER_TYPE_UD);
> >> +
> >> +  brw_push_insn_state(p);
> >> +  brw_set_default_access_mode(p, BRW_ALIGN_1);
> >> +  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> >> +  brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
> >> +
> >> +  /* Mask out invalid bits from the surface index to avoid hangs e.g. 
> >> when
> >> +   * some surface array is accessed out of bounds.
> >> +   */
> >> +  insn = brw_AND(p, addr,
> >> + suboffset(vec1(retype(surface, 
> >> BRW_REGISTER_TYPE_UD)),
> >> +   BRW_GET_SWZ(surface.dw1.bits.swizzle, 0)),
> >> + brw_imm_ud(0xff));
> >> +
> >> +  brw_pop_insn_state(p);
> >> +
> >> +  surface = addr;
> >> +   }
> >> +
> >> +   insn = brw_send_indirect_message(p, sfid, dst, payload, surface);
> >> +   brw_inst_set_mle

Re: [Mesa-dev] [PATCH 05/13] i965: Fix the untyped surface opcodes to deal with indirect surface access.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Mar 06, 2015 at 02:46:51PM +0200, Francisco Jerez wrote:
> "Pohjolainen, Topi"  writes:
> 
> > On Fri, Mar 06, 2015 at 02:29:15PM +0200, Francisco Jerez wrote:
> >> "Pohjolainen, Topi"  writes:
> >> 
> >> > On Fri, Feb 27, 2015 at 05:34:48PM +0200, Francisco Jerez wrote:
> >> >> Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
> >> >> surface index as a register instead of a constant and to use
> >> >> brw_send_indirect_message() to emit the indirect variant of send with
> >> >> a dynamically calculated message descriptor.  This will be required to
> >> >> support variable indexing of image arrays for
> >> >> ARB_shader_image_load_store.
> >> >> ---
> >> >>  src/mesa/drivers/dri/i965/brw_eu.h   |  10 +-
> >> >>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 158 
> >> >> +--
> >> >>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |   4 +-
> >> >>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |   4 +-
> >> >>  4 files changed, 96 insertions(+), 80 deletions(-)
> >> >> 
> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
> >> >> b/src/mesa/drivers/dri/i965/brw_eu.h
> >> >> index 87a9f3f..9cc9123 100644
> >> >> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> >> >> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> >> >> @@ -398,18 +398,18 @@ void brw_CMP(struct brw_compile *p,
> >> >>  
> >> >>  void
> >> >>  brw_untyped_atomic(struct brw_compile *p,
> >> >> -   struct brw_reg dest,
> >> >> +   struct brw_reg dst,
> >> >> struct brw_reg payload,
> >> >> +   struct brw_reg surface,
> >> >> unsigned atomic_op,
> >> >> -   unsigned bind_table_index,
> >> >> unsigned msg_length,
> >> >> bool response_expected);
> >> >>  
> >> >>  void
> >> >>  brw_untyped_surface_read(struct brw_compile *p,
> >> >> - struct brw_reg dest,
> >> >> - struct brw_reg mrf,
> >> >> - unsigned bind_table_index,
> >> >> + struct brw_reg dst,
> >> >> + struct brw_reg payload,
> >> >> + struct brw_reg surface,
> >> >>   unsigned msg_length,
> >> >>   unsigned num_channels);
> >> >>  
> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> >> >> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> >> index 0b655d4..34695bf 100644
> >> >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> >> @@ -2518,6 +2518,48 @@ brw_send_indirect_message(struct brw_compile *p,
> >> >> return setup;
> >> >>  }
> >> >>  
> >> >> +static struct brw_inst *
> >> >> +brw_send_indirect_surface_message(struct brw_compile *p,
> >> >> +  unsigned sfid,
> >> >> +  struct brw_reg dst,
> >> >> +  struct brw_reg payload,
> >> >> +  struct brw_reg surface,
> >> >> +  unsigned message_len,
> >> >> +  unsigned response_len,
> >> >> +  bool header_present)
> >> >> +{
> >> >> +   const struct brw_context *brw = p->brw;
> >> >> +   struct brw_inst *insn;
> >> >> +
> >> >> +   if (surface.file != BRW_IMMEDIATE_VALUE) {
> >> >> +  struct brw_reg addr = retype(brw_address_reg(0), 
> >> >> BRW_REGISTER_TYPE_UD);
> >> >> +
> >> >> +  brw_push_insn_state(p);
> >> >> +  brw_set_default_access_mode(p, BRW_ALIGN_1);
> >> >> +  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> >> >> +  brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
> &g

Re: [Mesa-dev] [PATCH 01/13] i965: Factor out logic to build a send message instruction with indirect descriptor.

2015-03-06 Thread Pohjolainen, Topi

On Fri, Feb 27, 2015 at 05:34:44PM +0200, Francisco Jerez wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_eu.h   | 19 ++--
>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 58 
> ++--
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 55 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 37 ---
>  4 files changed, 77 insertions(+), 92 deletions(-)

After discussing this further in the context of patch number five, I'm now
convinced, and this patch is:

Reviewed-by: Topi Pohjolainen 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 08/13] i965/vec4: Add support for untyped surface message sends from GRF.

2015-03-07 Thread Pohjolainen, Topi

On Fri, Mar 06, 2015 at 03:11:27PM +0200, Francisco Jerez wrote:
> "Pohjolainen, Topi"  writes:
> 
> > On Fri, Feb 27, 2015 at 05:34:51PM +0200, Francisco Jerez wrote:
> >> This doesn't actually enable untyped surface message sends from GRF
> >> yet, the upcoming atomic counter and image intrinsic lowering code
> >> will.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  7 ---
> >>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 16 +++-
> >>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   |  5 +++--
> >>  3 files changed, 14 insertions(+), 14 deletions(-)
> >> 
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> index e19..0004b10 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> @@ -256,6 +256,8 @@ vec4_instruction::is_send_from_grf()
> >> switch (opcode) {
> >> case SHADER_OPCODE_SHADER_TIME_ADD:
> >> case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
> >> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> >> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> >>return true;
> >> default:
> >>return false;
> >> @@ -270,6 +272,8 @@ vec4_instruction::regs_read(unsigned arg) const
> >>  
> >> switch (opcode) {
> >> case SHADER_OPCODE_SHADER_TIME_ADD:
> >> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> >> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> >>return arg == 0 ? mlen : 1;
> >
> > Before, the logic always fell back to returning one. Now we may return
> > one, two or three, I think. I may be mistaken though; I'm just reading
> > vec4_visitor::emit_untyped_atomic() and it can produce message lengths up
> > to three.
> > Does this affect the instruction scheduling logic and, if not, can you
> > explain why not?
> >
> 
> Before my change that wouldn't ever happen because we were using fake
> MRFs to assemble the message payload and the MRF register index would be
> specified as inst->base_mrf, so the payload wouldn't be an actual source
> of the untyped surface instruction.  This change adds an additional
> source for the payload, but a fake MRF is still passed in as explicit
> source temporarily.  A future commit will change the vec4 visitor to
> build untyped and typed surface message payloads directly in normal GRFs
> instead of fake MRFs.

I checked the scheduler and confirmed this shouldn't change the current
behavior. If you like, you could add your explanation to the commit message
also. Either way:

Reviewed-by: Topi Pohjolainen 
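
For readers following the regs_read() discussion above, here is a minimal
sketch of the rule in isolation. The enum, struct and function names are
hypothetical stand-ins and not the Mesa definitions; only the "source 0 reads
mlen registers, every other source reads one" rule is taken from the quoted
hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vec4 IR opcodes, not the Mesa enums. */
enum sketch_opcode {
   SKETCH_OPCODE_MOV,
   SKETCH_OPCODE_SHADER_TIME_ADD,
   SKETCH_OPCODE_UNTYPED_ATOMIC,
   SKETCH_OPCODE_UNTYPED_SURFACE_READ,
};

/* Hypothetical, reduced instruction: only the fields the rule needs. */
struct sketch_instruction {
   enum sketch_opcode opcode;
   unsigned mlen; /* message length in registers */
};

/* Number of registers read through source 'arg'.  For send-like opcodes
 * source 0 is the message payload and spans 'mlen' registers; all other
 * sources, and all other opcodes, read a single register.
 */
static unsigned
sketch_regs_read(const struct sketch_instruction *inst, unsigned arg)
{
   assert(inst != NULL);

   switch (inst->opcode) {
   case SKETCH_OPCODE_SHADER_TIME_ADD:
   case SKETCH_OPCODE_UNTYPED_ATOMIC:
   case SKETCH_OPCODE_UNTYPED_SURFACE_READ:
      return arg == 0 ? inst->mlen : 1;
   default:
      return 1;
   }
}
```

With an untyped atomic of message length three, source 0 would report three
registers and any other source one, which is the case asked about in the
review.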

Re: [Mesa-dev] [RFC] i965: Factor out descriptor building for indirect send messages

2015-03-08 Thread Pohjolainen, Topi

On Sat, Mar 07, 2015 at 04:15:08PM +0200, Francisco Jerez wrote:
> Topi Pohjolainen  writes:
> 
> > The original patch from Curro was based on something that is not
> > present in master yet. This patch tries to mimic the logic on
> > top of master.
> > I wanted to see if I could separate the building of the descriptor
> > instruction from building of the send instruction. This logic now
> > allows the caller to construct any kind of sequence of instructions
> > filling in the descriptor before giving it to the send instruction
> > builder.
> >
> > This is only compile tested. Curro, how would you feel about this
> > sort of approach? I apologise for muddying the waters again but I
> > wasn't entirely comfortable with the logic and wanted to try
> > something else.
> >
> > I believe patch number five should go nicely on top of this as
> > the descriptor instruction could be followed by (or preceded by)
> > any additional instructions modifying the descriptor register
> > before the actual send instruction.
> >
> 
> Topi, the goal I had in mind with PATCH 01 was to refactor a commonly
> recurring pattern.  In terms of the functions defined in this patch my
> example from yesterday [1] would now look like:
> 
> |   if (index.file == BRW_IMMEDIATE_VALUE) {
> |
> |  uint32_t surf_index = index.dw1.ud;
> |
> |  brw_inst *send = brw_next_insn(p, BRW_OPCODE_SEND);
> |  brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
> |  brw_set_src0(p, send, offset);
> |  brw_set_sampler_message(p, send,
> |  surf_index,
> |  0, /* LD message ignores sampler unit */
> |  GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> |  rlen,
> |  mlen,
> |  false, /* no header */
> |  simd_mode,
> |  0);
> |
> |  brw_mark_surface_used(prog_data, surf_index);
> |
> |   } else {
> |
> |  struct brw_reg addr = vec1(retype(brw_address_reg(0), 
> BRW_REGISTER_TYPE_UD));
> |
> |  brw_push_insn_state(p);
> |  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> |  brw_set_default_access_mode(p, BRW_ALIGN_1);
> |
> |  /* a0.0 = surf_index & 0xff */
> |  brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
> |  brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
> |  brw_set_dest(p, insn_and, addr);
> |  brw_set_src0(p, insn_and, vec1(retype(index, BRW_REGISTER_TYPE_UD)));
> |  brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));
> |
> |
> |  /* a0.0 |=  */
> |  brw_inst *descr_inst = brw_build_indirect_message_descr(p, addr, addr);
> |  brw_set_sampler_message(p, descr_inst,
> |  0 /* surface */,
> |  0 /* sampler */,
> |  GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> |  rlen /* rlen */,
> |  mlen /* mlen */,
> |  false /* header */,
> |  simd_mode,
> |  0);
> |
> |  /* dst = send(offset, a0.0) */
> |  brw_send_indirect_message(p, BRW_SFID_SAMPLER, dst, offset, addr);
> |
> |  brw_pop_insn_state(p);
> |
> |  /* visitor knows more than we do about the surface limit required,
> |   * so has already done marking.
> |   */
> |   }

Which I think could also be written as follows. Or am I missing something
again?

static brw_inst *
brw_build_surface_index_descr(struct brw_compile *p,
                              struct brw_reg addr, struct brw_reg index)
{
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
   brw_set_default_access_mode(p, BRW_ALIGN_1);

   /* a0.0 = surf_index & 0xff */
   brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
   brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
   brw_set_dest(p, insn_and, addr);
   brw_set_src0(p, insn_and, vec1(retype(index, BRW_REGISTER_TYPE_UD)));
   brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));

   /* a0.0 |=  */
   return brw_build_indirect_message_descr(p, addr, addr);
}

...
   brw_inst *descr_inst;
   if (index.file == BRW_IMMEDIATE_VALUE) {
      descr_inst = brw_next_insn(p, BRW_OPCODE_SEND);
      brw_set_dest(p, descr_inst, retype(dst, BRW_REGISTER_TYPE_UW));
      brw_set_src0(p, descr_inst, offset);

      brw_mark_surface_used(prog_data, index.dw1.ud);
   } else {
      struct brw_reg addr = vec1(retype(brw_address_reg(0),
                                        BRW_REGISTER_TYPE_UD));
      brw_push_insn_state(p);

      brw_build_surface_index_descr(p, addr, index);
      /* dst = send(offset, a0.0) */
      descr_inst = brw_send_indirect_message(p, BRW_SFID_SAMPLER,
                                             dst, offset, addr);
      brw_pop_insn_state(p);
   }

   uint32_t surf_index = index.file == BRW_IMMEDIATE_VALUE ? index.dw1.ud : 0;
   brw_set_sampler_message(p, des

Re: [Mesa-dev] [RFC] i965: Factor out descriptor building for indirect send messages

2015-03-10 Thread Pohjolainen, Topi

On Mon, Mar 09, 2015 at 12:43:08PM +0200, Francisco Jerez wrote:
> "Pohjolainen, Topi"  writes:
> 
> > On Sat, Mar 07, 2015 at 04:15:08PM +0200, Francisco Jerez wrote:
> >> Topi Pohjolainen  writes:
> >> 
> >> > The original patch from Curro was based on something that is not
> >> > present in master yet. This patch tries to mimic the logic on
> >> > top of master.
> >> > I wanted to see if I could separate the building of the descriptor
> >> > instruction from building of the send instruction. This logic now
> >> > allows the caller to construct any kind of sequence of instructions
> >> > filling in the descriptor before giving it to the send instruction
> >> > builder.
> >> >
> >> > This is only compile tested. Curro, how would you feel about this
> >> > sort of approach? I apologise for muddying the waters again but I
> >> > wasn't entirely comfortable with the logic and wanted to try
> >> > something else.
> >> >
> >> > I believe patch number five should go nicely on top of this as
> >> > the descriptor instruction could be followed by (or preceded by)
> >> > any additional instructions modifying the descriptor register
> >> > before the actual send instruction.
> >> >
> >> 
> >> Topi, the goal I had in mind with PATCH 01 was to refactor a commonly
> >> recurring pattern.  In terms of the functions defined in this patch my
> >> example from yesterday [1] would now look like:
> >> 
> >> |   if (index.file == BRW_IMMEDIATE_VALUE) {
> >> |
> >> |  uint32_t surf_index = index.dw1.ud;
> >> |
> >> |  brw_inst *send = brw_next_insn(p, BRW_OPCODE_SEND);
> >> |  brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
> >> |  brw_set_src0(p, send, offset);
> >> |  brw_set_sampler_message(p, send,
> >> |  surf_index,
> >> |  0, /* LD message ignores sampler unit */
> >> |  GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> >> |  rlen,
> >> |  mlen,
> >> |  false, /* no header */
> >> |  simd_mode,
> >> |  0);
> >> |
> >> |  brw_mark_surface_used(prog_data, surf_index);
> >> |
> >> |   } else {
> >> |
> >> |  struct brw_reg addr = vec1(retype(brw_address_reg(0), 
> >> BRW_REGISTER_TYPE_UD));
> >> |
> >> |  brw_push_insn_state(p);
> >> |  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> >> |  brw_set_default_access_mode(p, BRW_ALIGN_1);
> >> |
> >> |  /* a0.0 = surf_index & 0xff */
> >> |  brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
> >> |  brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
> >> |  brw_set_dest(p, insn_and, addr);
> >> |  brw_set_src0(p, insn_and, vec1(retype(index, 
> >> BRW_REGISTER_TYPE_UD)));
> >> |  brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));
> >> |
> >> |
> >> |  /* a0.0 |=  */
> >> |  brw_inst *descr_inst = brw_build_indirect_message_descr(p, addr, 
> >> addr);
> >> |  brw_set_sampler_message(p, descr_inst,
> >> |  0 /* surface */,
> >> |  0 /* sampler */,
> >> |  GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> >> |  rlen /* rlen */,
> >> |  mlen /* mlen */,
> >> |  false /* header */,
> >> |  simd_mode,
> >> |  0);
> >> |
> >> |  /* dst = send(offset, a0.0) */
> >> |  brw_send_indirect_message(p, BRW_SFID_SAMPLER, dst, offset, addr);
> >> |
> >> |  brw_pop_insn_state(p);
> >> |
> >> |  /* visitor knows more than we do about the surface limit required,
> >> |   * so has already done marking.
> >> |   */
> >> |   }
> >
> > Which I think could also be written as follows. Or am I missing something
> > again?
> >
> > static brw_inst *
> > brw_build_surface_index_descr(struct brw_compile *p,
> >                               struct brw_reg addr, struct brw_reg index)
> > {
> >

Re: [Mesa-dev] [PATCH] i965: Implement another VF cache invalidate workaround on Gen8+.

2017-01-30 Thread Pohjolainen, Topi

On Sun, Jan 29, 2017 at 08:24:16PM -0800, Kenneth Graunke wrote:
> ...and provide a better citation for the existing one.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 32 
> +---
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index b8f740640f2..3e08841e0a9 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -118,14 +118,30 @@ brw_emit_pipe_control_flush(struct brw_context *brw, 
> uint32_t flags)
>if (brw->gen == 8)
>   gen8_add_cs_stall_workaround_bits(&flags);
>  
> -  if (brw->gen == 9 &&
> -  (flags & PIPE_CONTROL_VF_CACHE_INVALIDATE)) {
> - /* Hardware workaround: SKL
> -  *
> -  * Emit Pipe Control with all bits set to zero before emitting
> -  * a Pipe Control with VF Cache Invalidate set.
> -  */
> - brw_emit_pipe_control_flush(brw, 0);
> +  if (flags & PIPE_CONTROL_VF_CACHE_INVALIDATE) {
> + if (brw->gen >= 9) {
> +/* The PIPE_CONTROL "VF Cache Invalidation Enable" bit 
> description
> + * lists several workarounds:
> + *
> + * "Projects: SKL, KBL, BXT
> + *  If the VF Cache Invalidation Enable is set to a 1 in a
> + *  PIPE_CONTROL, a separate Null PIPE_CONTROL, all bitfields 
> sets
> + *  to 0, with the VF Cache Invalidation Enable set to 0 needs to
> + *  be sent prior to the PIPE_CONTROL with VF Cache Invalidation
> + *  Enable set to a 1."
> + */
> +brw_emit_pipe_control_flush(brw, 0);
> +
> +/* "Projects: BDW+
> + *  When VF Cache Invalidate is set “Post Sync Operation” must
> + *  be enabled to “Write Immediate Data” or “Write PS Depth Count”
> + *  or “Write Timestamp”.
> + */
> +brw_emit_pipe_control_write(brw,
> +flags | PIPE_CONTROL_WRITE_IMMEDIATE,
> +brw->workaround_bo, 0, 0, 0);

Your title says gen8+, but this is still within the gen9+ block. Did you mean
to apply it to gen8 as well?

> +return;
> + }
>}
>  
>BEGIN_BATCH(6);
> -- 
> 2.11.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
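
To make the gen8 versus gen9 question above concrete, here is a small sketch
of one reading of the two quoted workaround texts. It is an assumption, not
the committed code: the struct and function names are hypothetical, the first
workaround follows the patch's own "gen >= 9" check, and the second is keyed
on "Projects: BDW+", i.e. gen8 and later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what must accompany a PIPE_CONTROL that has
 * VF Cache Invalidate set.
 */
struct sketch_vf_workarounds {
   bool null_pipe_control_first; /* "Projects: SKL, KBL, BXT" */
   bool needs_post_sync_write;   /* "Projects: BDW+" */
};

static struct sketch_vf_workarounds
sketch_vf_workarounds_for_gen(int gen, bool vf_cache_invalidate)
{
   struct sketch_vf_workarounds wa = { false, false };

   if (!vf_cache_invalidate)
      return wa;

   /* Same condition as the patch uses for the Null PIPE_CONTROL. */
   if (gen >= 9)
      wa.null_pipe_control_first = true;

   /* Assumption: BDW+ means the post-sync write also applies to gen8. */
   if (gen >= 8)
      wa.needs_post_sync_write = true;

   return wa;
}
```

Under this reading a gen8 part would get the post-sync write but not the Null
PIPE_CONTROL, which the hunk as posted would not do.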

Re: [Mesa-dev] [v2 6/9] i965: Add support for tex upload using gpu

2017-01-31 Thread Pohjolainen, Topi

On Tue, Jan 31, 2017 at 06:13:45PM +0200, Topi Pohjolainen wrote:
> v2:
>- Fix return value (s/MESA_FORMAT_NONE/false/) (Anuj)
>- Move _mesa_tex_format_from_format_and_type() just
>  in the end avoiding additional if-block (Anuj)
>- Explain better the array alignment restriction (Anuj)
>- Do not bail out in case of gl_pixelstore_attrib::ImageHeight,
>  it is handled by _mesa_image_offset() automatically (Ken).

This is actually wrong. I failed to take the adjusted height into account
when iterating over individual layers. It looks like we are missing tests for
this, as I didn't regress anything.

>- Support 1D_ARRAY by flipping depth, width and y, z (Ken).
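
The remark above about the adjusted height can be illustrated with a small
sketch of how the distance between layers depends on the unpack state. This
is a simplification under stated assumptions: the struct and helpers are
hypothetical, SKIP_PIXELS/SKIP_ROWS/SKIP_IMAGES are ignored, and the pixel
size is passed in as a plain byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of the GL_UNPACK_* pixel store state. */
struct sketch_unpack {
   unsigned alignment;    /* GL_UNPACK_ALIGNMENT: 1, 2, 4 or 8 */
   unsigned row_length;   /* GL_UNPACK_ROW_LENGTH, 0 means "use width" */
   unsigned image_height; /* GL_UNPACK_IMAGE_HEIGHT, 0 means "use height" */
};

/* Bytes from the start of one row to the start of the next. */
static size_t
sketch_row_stride(const struct sketch_unpack *s, unsigned width,
                  unsigned bytes_per_pixel)
{
   assert(s->alignment == 1 || s->alignment == 2 ||
          s->alignment == 4 || s->alignment == 8);

   const unsigned pixels = s->row_length ? s->row_length : width;
   const size_t bytes = (size_t)pixels * bytes_per_pixel;

   /* Round up to the unpack alignment. */
   return (bytes + s->alignment - 1) / s->alignment * s->alignment;
}

/* Bytes from the start of one layer to the start of the next.  A non-zero
 * image_height replaces the uploaded height here, which is the adjustment
 * that has to be honoured when stepping through layers.
 */
static size_t
sketch_layer_stride(const struct sketch_unpack *s, unsigned width,
                    unsigned height, unsigned bytes_per_pixel)
{
   const unsigned rows = s->image_height ? s->image_height : height;
   return rows * sketch_row_stride(s, width, bytes_per_pixel);
}

/* Byte offset of the first row of 'layer'. */
static size_t
sketch_layer_offset(const struct sketch_unpack *s, unsigned width,
                    unsigned height, unsigned bytes_per_pixel,
                    unsigned layer)
{
   return layer * sketch_layer_stride(s, width, height, bytes_per_pixel);
}
```

Stepping by the uploaded height times the row stride would only be right when
image_height is zero; otherwise every layer after the first starts at the
wrong offset.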
> 
> CC: Kenneth Graunke 
> CC: Anuj Phogat 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/intel_tex.h  |   8 ++
>  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 186 
> +
>  2 files changed, 194 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.h 
> b/src/mesa/drivers/dri/i965/intel_tex.h
> index 376f075..c7d0937 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.h
> +++ b/src/mesa/drivers/dri/i965/intel_tex.h
> @@ -65,6 +65,14 @@ intel_texsubimage_tiled_memcpy(struct gl_context *ctx,
> bool for_glTexImage);
>  
>  bool
> +intel_texsubimage_gpu_copy(struct brw_context *brw, GLuint dims,
> +   struct gl_texture_image *tex_image,
> +   unsigned x, unsigned y, unsigned z,
> +   unsigned w, unsigned h, unsigned d,
> +   GLenum format, GLenum type, const void *pixels,
> +   const struct gl_pixelstore_attrib *packing);
> +
> +bool
>  intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
>struct gl_texture_image *texImage,
>GLint xoffset, GLint yofset,
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
> b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> index 57c4c38..43e816b 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> @@ -24,6 +24,7 @@
>   */
>  
>  #include "main/bufferobj.h"
> +#include "main/glformats.h"
>  #include "main/image.h"
>  #include "main/macros.h"
>  #include "main/mtypes.h"
> @@ -34,8 +35,10 @@
>  #include "main/enums.h"
>  #include "drivers/common/meta.h"
>  
> +#include "brw_blorp.h"
>  #include "brw_context.h"
>  #include "intel_batchbuffer.h"
> +#include "intel_buffer_objects.h"
>  #include "intel_fbo.h"
>  #include "intel_tex.h"
>  #include "intel_mipmap_tree.h"
> @@ -44,6 +47,189 @@
>  
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>  
> +static drm_intel_bo *
> +intel_texsubimage_get_src_as_bo(struct brw_context *brw, unsigned dims,
> +unsigned w, unsigned h, unsigned d,
> +GLenum format, GLenum type, const void 
> *pixels,
> +const struct gl_pixelstore_attrib *packing)
> +{
> +   /* Account for SKIP_PIXELS, SKIP_ROWS, ALIGNMENT, and SKIP_IMAGES */
> +   const uint32_t first_pixel = _mesa_image_offset(dims, packing, w, h,
> +   format, type, 0, 0, 0);
> +   const uint32_t last_pixel =  _mesa_image_offset(dims, packing, w, h,
> +   format, type,
> +   d - 1, h - 1, w);
> +   const uint32_t size = last_pixel - first_pixel;
> +
> +   drm_intel_bo * const bo =
> +  drm_intel_bo_alloc(brw->bufmgr, "tmp_tex_subimage_src", size, 64);
> +
> +   if (bo == NULL) {
> +  perf_debug("intel_texsubimage: temp bo creation failed: size = %u\n",
> + size);
> +  return false;
> +   }
> +
> +   if (drm_intel_bo_subdata(bo, 0, size, pixels + first_pixel)) {
> +  perf_debug("intel_texsubimage: temp bo upload failed\n");
> +  drm_intel_bo_unreference(bo);
> +  return NULL;
> +   }
> +
> +   return bo;
> +}
> +
> +static uint32_t
> +intel_texsubimage_get_src_offset(unsigned dims, unsigned w, unsigned h,
> + GLenum format, GLenum type,
> + const void *pixels,
> + const struct gl_pixelstore_attrib *packing)
> +{
> +   /* Account for SKIP_PIXELS, SKIP_ROWS, ALIGNMENT, and SKIP_IMAGES */
> +   const uint32_t first_pixel = _mesa_image_offset(dims, packing, w, h,
> +   format, type, 0, 0, 0);
> +
> +   /* In case of buffer object source 'pixels' represents offset in bytes. */
> +   return first_pixel + (intptr_t)pixels;
> +}
> +
> +/* Consider all the restrictions and determine the format of the source. */
> +static mesa_format
> +intel_texsubimage_check_upload(struct brw_context *brw,
> +   const struct gl_texture_im

Re: [Mesa-dev] [v2 4/9] i965: Estimate batch space per shader stage

2017-01-31 Thread Pohjolainen, Topi

On Tue, Jan 31, 2017 at 11:12:31AM -0800, Jason Ekstrand wrote:
>On Tue, Jan 31, 2017 at 10:38 AM, Jason Ekstrand
><[1]ja...@jlekstrand.net> wrote:
> 
>On Tue, Jan 31, 2017 at 8:15 AM, Topi Pohjolainen
><[2]topi.pohjolai...@gmail.com> wrote:
> 
>  Current estimate doesn't consider space needed for surface states
>  and it only calculates for one shader stage. Each stage can have
>  its own sampler and surface state configuration.
>  While this is only matter of runtime dynamics we don't seem to hit
>  it currently. However, this becomes visible with blorp tex uploads
>  (HSW with piglit test max-samplers). One runs out of space while
>  batch wrapping isn't allowed.
> 
>Also, what happens when this case is triggered?  Do we assert fail?

It asserts on a debug build; I didn't try a release build.

>GPU Hang?  Just nicely handle it by flushing the batch?
>--Jason
> 
>  v2: Rebase on top of current upstream
>  Signed-off-by: Topi Pohjolainen <[3]topi.pohjolai...@intel.com>
>  CC: Kenneth Graunke <[4]kenn...@whitecape.org>
>  CC: Jason Ekstrand <[5]ja...@jlekstrand.net>
>  ---
>   src/mesa/drivers/dri/i965/brw_draw.c | 49
>  +---
>   1 file changed, 46 insertions(+), 3 deletions(-)
>  diff --git a/src/mesa/drivers/dri/i965/brw_draw.c
>  b/src/mesa/drivers/dri/i965/brw_draw.c
>  index 0db7311..83a9f33 100644
>  --- a/src/mesa/drivers/dri/i965/brw_draw.c
>  +++ b/src/mesa/drivers/dri/i965/brw_draw.c
>  @@ -395,6 +395,51 @@ brw_postdraw_set_buffers_need_resolve(struct
>  brw_context *brw)
> 
>}
> }
>+static unsigned
>+brw_get_num_active_samplers(const struct gl_context *ctx,
>+const struct gl_program *prog)
>+{
>+   const unsigned last = util_last_bit(prog->SamplersUsed);
>+   unsigned count = 0;
>+
>+   for (unsigned s = 0; s < last; s++) {
>+  if (prog->SamplersUsed & (1 << s)) {
>+ const unsigned unit = prog->SamplerUnits[s];
>+ if (ctx->Texture.Unit[unit]._Current)
>+++count;
>+  }
>+   }
>+
>+   return count;
>+}
>+
>+static unsigned
>+brw_estimate_batch_space_for_textures(const struct brw_context *brw)
>+{
>+   const struct gl_context *ctx = &brw->ctx;
>+   unsigned total = 0;
>+
>+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> 
>  +  const struct gl_program *prog = ctx->_Shader->CurrentProgram[i];
>  +
>  +  if (prog == NULL)
>  + continue;
>  +
>  +  const unsigned num_samplers = brw_get_num_active_samplers(ctx, prog);
>  +  const unsigned sampler_needs_per_tex_unit =
>  + 16 /* sampler_state_size */ +
>  + sizeof(struct gen5_sampler_default_color);
>  +  const unsigned surface_state_needs_per_tex_unit =
>  + ALIGN(brw->isl_dev.ss.size, brw->isl_dev.ss.align) +
>  + 4 /* binding table pointer */;
>  +  const unsigned total_per_tex_unit = sampler_needs_per_tex_unit +
>  +                                      surface_state_needs_per_tex_unit;
>  +  total += (num_samplers * total_per_tex_unit);
> 
>This isn't exactly correct.  While it's true that a binding table entry
>only consumes 4 bytes, binding table sizes have to be rounded up to 64B
>so, if you have a number of samplers that is not a multiple of 16, this
>will underestimate.

Good catch, thanks!

> 
>  +   }
>  +
>  +   return total;
>  +}
>  +
>   /* May fail if out of video memory for texture or vbo upload, or on
>* fallback conditions.
>*/
>  @@ -477,11 +522,9 @@ brw_try_draw_prims(struct gl_context *ctx,
> 
>for (i = 0; i < nr_prims; i++) {
>   int estimated_max_prim_size;
>-  const int sampler_state_size = 16;
>   estimated_max_prim_size = 512; /* batchbuffer commands */
>-  estimated_max_prim_size += BRW_MAX_TEX_UNIT *
>- (sampler_state_size + sizeof(struct
>gen5_sampler_default_color));
>+  estimated_max_prim_size += brw_estimate_batch_space_for_textures(brw);
>   estimated_max_prim_size += 1024; /* gen6 VS push constants */
>   estimated_max_prim_size += 1024; /* gen6 WM push constants */
>   estimated_max_prim_size += 512; /* misc. pad */
>--
>2.5.5
> 
> References
> 
>1. mailto:ja...@jlekstrand.net
>2. mailto:topi.pohjolai...@gmail.com
>3. mailto:topi.pohjolai...@intel.com
>4. mailto:kenn...@whitecape.org
>5. mailto:ja...@jlekstrand.net
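
Jason's point about binding table rounding can be sketched as follows. This
is only an illustration under assumptions: the function and parameter names
are hypothetical, the state sizes are passed in rather than taken from the
driver, and only sampler-related binding table entries are counted. The
4 bytes per entry and the 64B rounding come from the review comment above.

```c
#include <assert.h>

/* Round 'value' up to the next multiple of 'alignment' (alignment > 0). */
static unsigned
sketch_align_up(unsigned value, unsigned alignment)
{
   assert(alignment > 0);
   return (value + alignment - 1) / alignment * alignment;
}

/* Estimated batch space for the texture state of one shader stage. */
static unsigned
sketch_stage_texture_space(unsigned num_samplers,
                           unsigned sampler_state_size,
                           unsigned default_color_size,
                           unsigned surface_state_size,
                           unsigned surface_state_align)
{
   const unsigned sampler_bytes =
      num_samplers * (sampler_state_size + default_color_size);
   const unsigned surface_bytes =
      num_samplers * sketch_align_up(surface_state_size, surface_state_align);

   /* Each binding table entry is 4 bytes, but the table as a whole is
    * rounded up to 64 bytes, so the rounding is done once per stage and
    * not once per sampler.
    */
   const unsigned binding_table_bytes = sketch_align_up(num_samplers * 4, 64);

   return sampler_bytes + surface_bytes + binding_table_bytes;
}
```

With three samplers the table would take 64 bytes instead of 12, and with 17
samplers 128 instead of 68, which is the underestimate being pointed out.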

Re: [Mesa-dev] [PATCH 31/34] i965: Use partial resolves for CCS buffers being scanned out

2017-01-31 Thread Pohjolainen, Topi

On Tue, Jan 31, 2017 at 01:37:25PM -0800, Jason Ekstrand wrote:
>On Wed, Jan 25, 2017 at 10:39 AM, Pohjolainen, Topi
><[1]topi.pohjolai...@gmail.com> wrote:
> 
>  On Mon, Jan 23, 2017 at 10:21:54PM -0800, Ben Widawsky wrote:
>  > On Gen9 hardware, the display engine is able to scanout a
>  compressed
>  > framebuffer by providing an offset to auxiliary compression
>  information.
>  > Unfortunately, the hardware is incapable of doing the same thing
>  for the
>  > fast clear color.
>  >
>  > To mitigate this, the hardware introduced a new resolve type
>  called a
>  > partial resolve. The partial resolve will only do a resolve of the
>  fast
>  > clear color and leave the rest of the compressed data alone.
>  >
>  > This patch enables using this resolve type for cases where the
>  > framebuffer will be passed along to the kernel for display.
>  >
>  > v2: Add early exit from intel_miptree_make_shareable() when it's
>  > scanout.
>  >
>  > Signed-off-by: Ben Widawsky <[2]b...@bwidawsk.net>
>  > Acked-by: Daniel Stone <[3]dani...@collabora.com>
>  > Reviewed-by: Topi Pohjolainen <[4]topi.pohjolai...@intel.com> (v1)
>  v2 is also
>  Reviewed-by: Topi Pohjolainen <[5]topi.pohjolai...@intel.com>
> 
>> ---
>>  src/mesa/drivers/dri/i965/brw_context.c   | 3 ++-
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 -
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c
>b/src/mesa/drivers/dri/i965/brw_context.c
>> index 64b55a8cb7..adfd4c449e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.c
>> +++ b/src/mesa/drivers/dri/i965/brw_context.c
>> @@ -1360,7 +1360,8 @@ intel_resolve_for_dri2_flush(struct brw_context
>*brw,
>>if (rb->mt->num_samples <= 1) {
>>   assert(rb->mt_layer == 0 && rb->mt_level == 0 &&
>>  rb->layer_count == 1);
>> - intel_miptree_resolve_color(brw, rb->mt, 0, 0, 1, 0);
>> + intel_miptree_resolve_color(brw, rb->mt, 0, 0, 1,
>> + INTEL_RESOLVE_HINT_CLEAR_COLOR);
>>} else {
>>   intel_renderbuffer_downsample(brw, rb);
>>}
>> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> index 217e80ae31..7edce7d92e 100644
>> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> @@ -2415,7 +2415,14 @@ intel_miptree_make_shareable(struct
>brw_context *brw,
>> assert(mt->msaa_layout == INTEL_MSAA_LAYOUT_NONE ||
>mt->num_samples <= 1);
>>
>> if (mt->mcs_buf) {
>> -  intel_miptree_all_slices_resolve_color(brw, mt, 0);
>> +  intel_miptree_all_slices_resolve_color(brw, mt, mt->is_scanout ?
>> +                                         INTEL_RESOLVE_HINT_CLEAR_COLOR :
>> +                                         INTEL_RESOLVE_HINT_FULL);
> 
>Do we need to be checking modifiers here?  Just because it's marked as
>scanout doesn't mean that the client using it actually knows about CCS
>and will pass it on to the kernel.

I think I had similar concerns earlier. My current understanding (after
discussing with Ben) is that the choice of support is made when the surfaces
are created. In other words, if an MCS buffer exists for a scanout miptree,
it means that the client has indicated that it understands auxiliary buffers.

Here the flag tells the resolver that the client doesn't understand fast
clear but that lossless compression is supported. Ben, does this explanation
agree with what you told me?
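
As a way of restating the behaviour being discussed, here is a minimal sketch
of the decision the quoted hunk makes for a miptree that has an MCS buffer.
The enum, struct and function names are hypothetical; only the mapping
(scanout gets the clear-color-only resolve and keeps its auxiliary buffer,
everything else gets a full resolve and loses it) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two resolve hints used in the patch. */
enum sketch_resolve_hint {
   SKETCH_RESOLVE_HINT_FULL,
   SKETCH_RESOLVE_HINT_CLEAR_COLOR,
};

struct sketch_share_plan {
   enum sketch_resolve_hint hint;
   bool keep_aux_buffer;
};

/* Plan for making a miptree with an MCS buffer shareable. */
static struct sketch_share_plan
sketch_plan_make_shareable(bool is_scanout)
{
   struct sketch_share_plan plan;

   if (is_scanout) {
      /* Partial resolve: only the fast clear color is resolved, and the
       * compressed data stays usable through the auxiliary buffer.
       */
      plan.hint = SKETCH_RESOLVE_HINT_CLEAR_COLOR;
      plan.keep_aux_buffer = true;
   } else {
      /* Full resolve, after which the auxiliary buffer is dropped. */
      plan.hint = SKETCH_RESOLVE_HINT_FULL;
      plan.keep_aux_buffer = false;
   }

   return plan;
}
```

Whether is_scanout alone is a sufficient signal is exactly the open question
in this sub-thread; the sketch does not answer it.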

> 
>> +  if (mt->is_scanout) {
>> + assert(!mt->hiz_buf);
>> + return;
>> +  }
>> +
>>mt->aux_disable |= (INTEL_AUX_DISABLE_CCS |
>INTEL_AUX_DISABLE_MCS);
>>drm_intel_bo_unreference(mt->mcs_buf->bo);
>>free(mt->mcs_buf);
>> --
>> 2.11.0
>>
>> ___
>> mesa-dev mailing list
>> [6]mesa-dev@lists.freedesktop.org
>> [7]https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>___
>mesa-dev mailing list
>[8]mesa-dev@lists.freedesktop.org
&g

Re: [Mesa-dev] [PATCH 2/2] blorp: Embed a wm_prog_data in blorp_prog_data

2017-02-01 Thread Pohjolainen, Topi

On Tue, Jan 31, 2017 at 11:05:28AM -0800, Jason Ekstrand wrote:
> While we're at it, we rename it to remove the brw_ prefix

Nice! Both patches:

Reviewed-by: Topi Pohjolainen 

> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/blorp/blorp.c   | 26 +-
>  src/intel/blorp/blorp_blit.c  |  2 +-
>  src/intel/blorp/blorp_clear.c |  2 +-
>  src/intel/blorp/blorp_genX_exec.h | 72 
> +++
>  src/intel/blorp/blorp_priv.h  | 30 
>  5 files changed, 52 insertions(+), 80 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp.c b/src/intel/blorp/blorp.c
> index 2f27274..8a90c3b 100644
> --- a/src/intel/blorp/blorp.c
> +++ b/src/intel/blorp/blorp.c
> @@ -145,7 +145,7 @@ const unsigned *
>  brw_blorp_compile_nir_shader(struct blorp_context *blorp, struct nir_shader 
> *nir,
>   const struct brw_wm_prog_key *wm_key,
>   bool use_repclear,
> - struct brw_blorp_prog_data *prog_data,
> + struct blorp_prog_data *prog_data,
>   unsigned *program_size)
>  {
> const struct brw_compiler *compiler = blorp->compiler;
> @@ -160,15 +160,14 @@ brw_blorp_compile_nir_shader(struct blorp_context 
> *blorp, struct nir_shader *nir
> nir->options =
>compiler->glsl_compiler_options[MESA_SHADER_FRAGMENT].NirOptions;
>  
> -   struct brw_wm_prog_data wm_prog_data;
> -   memset(&wm_prog_data, 0, sizeof(wm_prog_data));
> +   memset(prog_data, 0, sizeof(*prog_data));
>  
> -   wm_prog_data.base.nr_params = 0;
> -   wm_prog_data.base.param = NULL;
> +   prog_data->wm.base.nr_params = 0;
> +   prog_data->wm.base.param = NULL;
>  
> /* BLORP always just uses the first two binding table entries */
> -   wm_prog_data.binding_table.render_target_start = 
> BLORP_RENDERBUFFER_BT_INDEX;
> -   wm_prog_data.base.binding_table.texture_start = BLORP_TEXTURE_BT_INDEX;
> +   prog_data->wm.binding_table.render_target_start = 
> BLORP_RENDERBUFFER_BT_INDEX;
> +   prog_data->wm.base.binding_table.texture_start = BLORP_TEXTURE_BT_INDEX;
>  
> nir = brw_preprocess_nir(compiler, nir);
> nir_remove_dead_variables(nir, nir_var_shader_in);
> @@ -176,21 +175,12 @@ brw_blorp_compile_nir_shader(struct blorp_context 
> *blorp, struct nir_shader *nir
>  
> const unsigned *program =
>brw_compile_fs(compiler, blorp->driver_ctx, mem_ctx,
> - wm_key, &wm_prog_data, nir,
> + wm_key, &prog_data->wm, nir,
>   NULL, -1, -1, false, use_repclear, program_size, NULL);
>  
> -   /* Copy the relavent bits of wm_prog_data over into the blorp prog data */
> -   prog_data->dispatch_8 = wm_prog_data.dispatch_8;
> -   prog_data->dispatch_16 = wm_prog_data.dispatch_16;
> -   prog_data->first_curbe_grf_0 = wm_prog_data.base.dispatch_grf_start_reg;
> -   prog_data->first_curbe_grf_2 = wm_prog_data.dispatch_grf_start_reg_2;
> -   prog_data->ksp_offset_2 = wm_prog_data.prog_offset_2;
> -   prog_data->persample_msaa_dispatch = wm_prog_data.persample_dispatch;
> -   prog_data->flat_inputs = wm_prog_data.flat_inputs;
> -   prog_data->num_varying_inputs = wm_prog_data.num_varying_inputs;
> prog_data->inputs_read = nir->info.inputs_read;
>  
> -   assert(wm_prog_data.base.nr_params == 0);
> +   assert(prog_data->wm.base.nr_params == 0);
>  
> return program;
>  }
> diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
> index af46389..fc9e737 100644
> --- a/src/intel/blorp/blorp_blit.c
> +++ b/src/intel/blorp/blorp_blit.c
> @@ -1235,7 +1235,7 @@ brw_blorp_get_blit_kernel(struct blorp_context *blorp,
>  
> const unsigned *program;
> unsigned program_size;
> -   struct brw_blorp_prog_data prog_data;
> +   struct blorp_prog_data prog_data;
>  
> /* Try and compile with NIR first.  If that fails, fall back to the old
>  * method of building shaders manually.
> diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
> index ce9b292..01889b8 100644
> --- a/src/intel/blorp/blorp_clear.c
> +++ b/src/intel/blorp/blorp_clear.c
> @@ -73,7 +73,7 @@ blorp_params_get_clear_kernel(struct blorp_context *blorp,
> struct brw_wm_prog_key wm_key;
> brw_blorp_init_wm_prog_key(&wm_key);
>  
> -   struct brw_blorp_prog_data prog_data;
> +   struct blorp_prog_data prog_data;
> unsigned program_size;
> const unsigned *program =
>brw_blorp_compile_nir_shader(blorp, b.shader, &wm_key, 
> use_replicated_data,
> diff --git a/src/intel/blorp/blorp_genX_exec.h 
> b/src/intel/blorp/blorp_genX_exec.h
> index 71beadc..bb665e2 100644
> --- a/src/intel/blorp/blorp_genX_exec.h
> +++ b/src/intel/blorp/blorp_genX_exec.h
> @@ -156,7 +156,7 @@ emit_urb_config(struct blorp_batch *batch,
>  * where 'n' stands for number of varying inputs expressed as vec4s.
>  */
>  const unsigned num_varyings =
> -   params->wm_prog_data ?

Re: [Mesa-dev] [PATCH 20/34] i965: Restructure CCS disable

2017-02-06 Thread Pohjolainen, Topi

On Sun, Feb 05, 2017 at 10:48:11PM -0800, Ben Widawsky wrote:
> On 17-01-25 20:53:44, Topi Pohjolainen wrote:
> > On Mon, Jan 23, 2017 at 10:21:43PM -0800, Ben Widawsky wrote:
> > > Make the code only disable CCS when it has to, unlike before where it
> > > disabled CCS and enabled it when it could. This is much more inline with
> > > how it should work in a few patches, where we have fewer restrictions as
> > > to when we disable CCS.
> > > 
> > > Signed-off-by: Ben Widawsky 
> > > ---
> > >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > > index db1732159b..8a30d72d4c 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > > @@ -329,7 +329,6 @@ intel_miptree_create_layout(struct brw_context *brw,
> > > mt->logical_depth0 = depth0;
> > > mt->aux_disable = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 0 ?
> > >INTEL_AUX_DISABLE_ALL : INTEL_AUX_DISABLE_NONE;
> > > -   mt->aux_disable |= INTEL_AUX_DISABLE_CCS;
> > > mt->is_scanout = (layout_flags & MIPTREE_LAYOUT_FOR_SCANOUT) != 0;
> > > exec_list_make_empty(&mt->hiz_map);
> > > exec_list_make_empty(&mt->color_resolve_map);
> > > @@ -522,6 +521,8 @@ intel_miptree_create_layout(struct brw_context *brw,
> > > } else if (brw->gen >= 9 && num_samples > 1) {
> > >layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
> > > } else {
> > > +  mt->aux_disable |= INTEL_AUX_DISABLE_CCS;
> > > +
> > >const UNUSED bool is_lossless_compressed_aux =
> > >   brw->gen >= 9 && num_samples == 1 &&
> > >   mt->format == MESA_FORMAT_R_UINT32;
> > > @@ -741,7 +742,6 @@ intel_miptree_create(struct brw_context *brw,
> > >  */
> > > if (intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) &&
> > > intel_miptree_supports_non_msrt_fast_clear(brw, mt)) {
> > > -  mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS;
> > >assert(brw->gen < 8 || mt->halign == 16 || num_samples <= 1);
> > > 
> > >/* On Gen9+ clients are not currently capable of consuming 
> > > compressed
> > > @@ -755,8 +755,11 @@ intel_miptree_create(struct brw_context *brw,
> > >   intel_miptree_supports_lossless_compressed(brw, mt);
> > > 
> > >if (is_lossless_compressed) {
> > > + mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS;
> > 
> > Leftover?
> > 
> 
> I don't think so. Can you explain?

This is trying to remove the INTEL_AUX_DISABLE_CCS which isn't set anymore
by default (just as you say in the commit message). In other words, if I'm
not missing something we shouldn't end up here anymore with the flag set?
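
To spell out what I mean, a minimal sketch with made-up flag values (the real
definitions are in intel_mipmap_tree.h and may differ), assuming the flag
really is clear when we get here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
enum {
   AUX_DISABLE_NONE = 0,
   AUX_DISABLE_CCS  = 1 << 1,
};

/* Same operation as "mt->aux_disable &= ~INTEL_AUX_DISABLE_CCS". */
static uint32_t
clear_ccs_disable(uint32_t aux_disable)
{
   const uint32_t result = aux_disable & ~(uint32_t)AUX_DISABLE_CCS;

   /* Clearing a bit can never set one. */
   assert((result & AUX_DISABLE_CCS) == 0);

   return result;
}
```

If the input is AUX_DISABLE_NONE the result is AUX_DISABLE_NONE again, so the
statement only has an effect if some earlier path set the bit.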

> 
> > >   intel_miptree_alloc_non_msrt_mcs(brw, mt, 
> > > is_lossless_compressed);
> > >}
> > > +   } else {
> > > +  mt->aux_disable |= INTEL_AUX_DISABLE_CCS;
> > > }
> > > 
> > > return mt;
> > > --
> > > 2.11.0
> > > 

Re: [Mesa-dev] [PATCH 2/3] Added support for disassembling SENDS and SENDSC.

2017-02-13 Thread Pohjolainen, Topi

On Mon, Feb 13, 2017 at 01:25:57PM +0200, Lonnberg, Toni wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_disasm.c | 109 
> +++--
>  src/mesa/drivers/dri/i965/brw_inst.h   |  31 +-
>  2 files changed, 135 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
> b/src/mesa/drivers/dri/i965/brw_disasm.c
> index 01c649c..2026312 100644
> --- a/src/mesa/drivers/dri/i965/brw_disasm.c
> +++ b/src/mesa/drivers/dri/i965/brw_disasm.c
> @@ -723,6 +723,38 @@ dest(FILE *file, const struct gen_device_info *devinfo, 
> brw_inst *inst)
> unsigned elem_size = brw_element_size(devinfo, inst, dst);
> int err = 0;
>  
> +   if (brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SENDS || 
> +   brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SENDSC) {
> +  assert(devinfo->gen >= 9);
> +
> +  if (brw_inst_sends_dst_address_mode(devinfo, inst) == 
> BRW_ADDRESS_DIRECT) {
> + err |= reg(file, brw_inst_sends_dst_reg_file(devinfo, inst),
> +brw_inst_sends_dst_da_reg_nr(devinfo, inst));
> +
> + if (err == -1)
> +return 0;
> +
> + if (brw_inst_sends_dst_da_subreg_nr(devinfo, inst))
> +format(file, ".%"PRIu64, 
> brw_inst_sends_dst_da_subreg_nr(devinfo, inst) /

Just a style nitpick for any future revisions: put
brw_inst_sends_dst_da_subreg_nr() on the next line to avoid overflowing. We
try to keep lines within 78 columns. Some people (including me) use 80-column
terminals, and with git adding two formatting characters in front you still
get a nice layout.
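
Roughly like this. It is a standalone sketch and not the disassembler code:
format_subreg() and its arguments are hypothetical, only the ".%"PRIu64
format and the division by the element size are taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper printing a subregister number scaled by the element
 * size. The long argument goes on its own line so that the statement stays
 * within 78 columns.
 */
static int
format_subreg(char *buf, size_t size, uint64_t subreg_nr, unsigned elem_size)
{
   assert(elem_size != 0);

   return snprintf(buf, size, ".%" PRIu64,
                   subreg_nr / elem_size);
}
```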

> +   elem_size);
> + string(file, "<1>");
> + err |= control(file, "dest reg encoding", reg_encoding,
> +brw_inst_sends_dst_reg_type(devinfo, inst), NULL);
> +  } else {
> + string(file, "g[a0");
> + if (brw_inst_sends_dst_ia_subreg_nr(devinfo, inst))
> +format(file, ".%"PRIu64, 
> brw_inst_sends_dst_ia_subreg_nr(devinfo, inst) /
> +   elem_size);
> + if (brw_inst_sends_dst_ia16_addr_imm(devinfo, inst))
> +format(file, " %d", brw_inst_sends_dst_ia16_addr_imm(devinfo, 
> inst));
> + string(file, "]<1>");
> + err |= control(file, "dest reg encoding", reg_encoding,
> +brw_inst_sends_dst_reg_type(devinfo, inst), NULL);
> +  }
> +
> +  return 0;
> +   }
> +
> if (brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_1) {
>if (brw_inst_dst_address_mode(devinfo, inst) == BRW_ADDRESS_DIRECT) {
>   err |= reg(file, brw_inst_dst_reg_file(devinfo, inst),
> @@ -1067,6 +1099,41 @@ imm(FILE *file, const struct gen_device_info *devinfo, 
> unsigned type, brw_inst *
>  static int
>  src0(FILE *file, const struct gen_device_info *devinfo, brw_inst *inst)
>  {
> +   unsigned elem_size = brw_element_size(devinfo, inst, src0);
> +   int err = 0;
> +
> +   if (brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SENDS || 
> +   brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SENDSC) {
> +  assert(devinfo->gen >= 9);
> +
> +  if (brw_inst_sends_src0_address_mode(devinfo, inst) == 
> BRW_ADDRESS_DIRECT) {
> + err |= reg(file, BRW_GENERAL_REGISTER_FILE,
> +brw_inst_sends_src0_da_reg_nr(devinfo, inst));
> +
> + if (err == -1)
> +return 0;
> +
> + if (brw_inst_sends_src0_da_subreg_nr(devinfo, inst))
> +format(file, ".%"PRIu64, 
> brw_inst_sends_src0_da_subreg_nr(devinfo, inst) /
> +   elem_size);
> + string(file, "<1>");
> + err |= control(file, "dest reg encoding", reg_encoding,
> +brw_inst_sends_dst_reg_type(devinfo, inst), NULL);
> +  } else {
> + string(file, "g[a0");
> + if (brw_inst_sends_src0_ia_subreg_nr(devinfo, inst))
> +format(file, ".%"PRIu64, 
> brw_inst_sends_src0_ia_subreg_nr(devinfo, inst) /
> +   elem_size);
> + if (brw_inst_sends_src0_ia16_addr_imm(devinfo, inst))
> +format(file, " %d", brw_inst_sends_src0_ia16_addr_imm(devinfo, 
> inst));
> + string(file, "]<1>");
> + err |= control(file, "dest reg encoding", reg_encoding,
> +brw_inst_sends_dst_reg_type(devinfo, inst), NULL);
> +  }
> +
> +  return 0;
> +   }
> +
> if (brw_inst_src0_reg_file(devinfo, inst) == BRW_IMMEDIATE_VALUE) {
>return imm(file, devinfo, brw_inst_src0_reg_type(devinfo, inst), inst);
> } else if (brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_1) {
> @@ -1123,6 +1190,22 @@ src0(FILE *file, const struct gen_device_info 
> *devinfo, brw_inst *inst)
>  static int
>  src1(FILE *file, const struct gen_device_info *devinfo, brw_inst *inst)
>  {
> +   int err = 0;
> +
> +   if (brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SENDS || 
> +   brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SENDSC) {
> +  assert(devinfo->gen >= 9);
> +
> +  err |=

Re: [Mesa-dev] [PATCH 1/2] isl: Return surface creation success from aux helpers

2017-02-19 Thread Pohjolainen, Topi

On Fri, Feb 17, 2017 at 04:03:48PM -0800, Jason Ekstrand wrote:
> The isl_surf_init call that each of these helpers make can, in theory,
> fail.  We should propagate that up to the caller rather than just
> silently ignoring it.

Reviewed-by: Topi Pohjolainen 

> ---
>  src/intel/isl/isl.c  | 72 
> +---
>  src/intel/isl/isl.h  |  4 +--
>  src/intel/vulkan/anv_image.c |  5 +--
>  3 files changed, 40 insertions(+), 41 deletions(-)
> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index 82ab68d..1a47da5 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -1323,7 +1323,7 @@ isl_surf_get_tile_info(const struct isl_device *dev,
> isl_tiling_get_info(dev, surf->tiling, fmtl->bpb, tile_info);
>  }
>  
> -void
> +bool
>  isl_surf_get_hiz_surf(const struct isl_device *dev,
>const struct isl_surf *surf,
>struct isl_surf *hiz_surf)
> @@ -1391,20 +1391,20 @@ isl_surf_get_hiz_surf(const struct isl_device *dev,
>  */
> const unsigned samples = ISL_DEV_GEN(dev) >= 9 ? 1 : surf->samples;
>  
> -   isl_surf_init(dev, hiz_surf,
> - .dim = surf->dim,
> - .format = ISL_FORMAT_HIZ,
> - .width = surf->logical_level0_px.width,
> - .height = surf->logical_level0_px.height,
> - .depth = surf->logical_level0_px.depth,
> - .levels = surf->levels,
> - .array_len = surf->logical_level0_px.array_len,
> - .samples = samples,
> - .usage = ISL_SURF_USAGE_HIZ_BIT,
> - .tiling_flags = ISL_TILING_HIZ_BIT);
> +   return isl_surf_init(dev, hiz_surf,
> +.dim = surf->dim,
> +.format = ISL_FORMAT_HIZ,
> +.width = surf->logical_level0_px.width,
> +.height = surf->logical_level0_px.height,
> +.depth = surf->logical_level0_px.depth,
> +.levels = surf->levels,
> +.array_len = surf->logical_level0_px.array_len,
> +.samples = samples,
> +.usage = ISL_SURF_USAGE_HIZ_BIT,
> +.tiling_flags = ISL_TILING_HIZ_BIT);
>  }
>  
> -void
> +bool
>  isl_surf_get_mcs_surf(const struct isl_device *dev,
>const struct isl_surf *surf,
>struct isl_surf *mcs_surf)
> @@ -1427,17 +1427,17 @@ isl_surf_get_mcs_surf(const struct isl_device *dev,
>unreachable("Invalid sample count");
> }
>  
> -   isl_surf_init(dev, mcs_surf,
> - .dim = ISL_SURF_DIM_2D,
> - .format = mcs_format,
> - .width = surf->logical_level0_px.width,
> - .height = surf->logical_level0_px.height,
> - .depth = 1,
> - .levels = 1,
> - .array_len = surf->logical_level0_px.array_len,
> - .samples = 1, /* MCS surfaces are really single-sampled */
> - .usage = ISL_SURF_USAGE_MCS_BIT,
> - .tiling_flags = ISL_TILING_Y0_BIT);
> +   return isl_surf_init(dev, mcs_surf,
> +.dim = ISL_SURF_DIM_2D,
> +.format = mcs_format,
> +.width = surf->logical_level0_px.width,
> +.height = surf->logical_level0_px.height,
> +.depth = 1,
> +.levels = 1,
> +.array_len = surf->logical_level0_px.array_len,
> +.samples = 1, /* MCS surfaces are really 
> single-sampled */
> +.usage = ISL_SURF_USAGE_MCS_BIT,
> +.tiling_flags = ISL_TILING_Y0_BIT);
>  }
>  
>  bool
> @@ -1491,19 +1491,17 @@ isl_surf_get_ccs_surf(const struct isl_device *dev,
>return false;
> }
>  
> -   isl_surf_init(dev, ccs_surf,
> - .dim = surf->dim,
> - .format = ccs_format,
> - .width = surf->logical_level0_px.width,
> - .height = surf->logical_level0_px.height,
> - .depth = surf->logical_level0_px.depth,
> - .levels = surf->levels,
> - .array_len = surf->logical_level0_px.array_len,
> - .samples = 1,
> - .usage = ISL_SURF_USAGE_CCS_BIT,
> - .tiling_flags = ISL_TILING_CCS_BIT);
> -
> -   return true;
> +   return isl_surf_init(dev, ccs_surf,
> +.dim = surf->dim,
> +.format = ccs_format,
> +.width = surf->logical_level0_px.width,
> +.height = surf->logical_level0_px.height,
> +.depth = surf->logical_level0_px.depth,
> +.levels = surf->levels,
> +.array_le

Re: [Mesa-dev] i965: On-demand render target flushing

2017-02-27 Thread Pohjolainen, Topi

On Fri, Feb 17, 2017 at 09:32:03PM +0200, Topi Pohjolainen wrote:
> Currently:
> 
> 1) Blorp color clears and resolves emit unconditional render target
>flush + command stream after every clear/resolve (including
>regular non-fast clears).
> 
> 2) Blorp color clears, resolves and blits emit texture and constant
>cache resolves even in case only destination is dirty. This is
>because brw_render_cache_set_check_flush() does both render target
>flush as well as the top-of-pipe read cache flushes.
> 
> 3) Similarly to item 2, 3D and compute paths also flush texture and
>constant caches even if none of the texture surfaces are dirty.
> 
> 4) In case of multiple surfaces needing resolves, all render paths
>(blorp, 3D and compute) emit render target, texture and constant
>cache flushes after each resolve instead of just once after all
>resolves.
> 
> This series addresses all four cases. Good news are that even though
> the current setup isn't optimal, it doesn't actually get any better in
> most cases performance wise. There is modest gain in OglDrvRes which
> does heavy blorp blitting. I'm expecting this series also to make
> blorp tex uploads and blorp mipmap generation more competitive.
> 
> Bad news are in the final patch - it looks that current unconditional
> flushing/stalling has been hiding bugs elsewhere. There are cases
> which rely on the flushes after non-fast clears. Hunting the real
> cause is, however, difficult. I only saw them in CI system within
> full runs and was not able to reproduce them myself.

With the added patch disabling deferred allocation of mcs for CCS_E,
the last patch gives:

  OglDrvRes:   4.30768% +/- 0.183354%
  OglDrvState: 3.16878% +/- 0.769695%

This is mostly due to blorp blits flushing less than before.

This series also improves blorp tex upload. I'll send some updates
to that series.

Re: [Mesa-dev] [v2 4/9] i965: Estimate batch space per shader stage

2017-02-28 Thread Pohjolainen, Topi


Sorry for the typo in the subject, this is version three.

On Tue, Feb 28, 2017 at 10:02:42AM +0200, Topi Pohjolainen wrote:
> Current estimate doesn't consider space needed for surface states
> and it only calculates for one shader stage. Each stage can have
> its own sampler and surface state configuration.
> 
> While this is only matter of runtime dynamics we don't seem to hit
> it currently. However, this becomes visible with blorp tex uploads
> (HSW with piglit test max-samplers). One runs out of space while
> batch wrapping isn't allowed.
> 
> v2: Rebase on top of current upstream
> v3: Take binding table alignment into account (Jason)
> 
> Signed-off-by: Topi Pohjolainen 
> CC: Kenneth Graunke 
> CC: Jason Ekstrand 
> ---
>  src/mesa/drivers/dri/i965/brw_draw.c | 53 
> ++--
>  1 file changed, 50 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
> b/src/mesa/drivers/dri/i965/brw_draw.c
> index 6ca6a7a..dba8603 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> @@ -395,6 +395,55 @@ brw_postdraw_set_buffers_need_resolve(struct brw_context 
> *brw)
> }
>  }
>  
> +static unsigned
> +brw_get_num_active_samplers(const struct gl_context *ctx,
> +const struct gl_program *prog)
> +{
> +   const unsigned last = util_last_bit(prog->SamplersUsed);
> +   unsigned count = 0;
> +
> +   for (unsigned s = 0; s < last; s++) {
> +  if (prog->SamplersUsed & (1 << s)) {
> + const unsigned unit = prog->SamplerUnits[s];
> + if (ctx->Texture.Unit[unit]._Current)
> +++count;
> +  }
> +   }
> +
> +   return count;
> +}
> +
> +static unsigned
> +brw_estimate_batch_space_for_textures(const struct brw_context *brw)
> +{
> +   const struct gl_context *ctx = &brw->ctx;
> +   unsigned total = 0;
> +
> +   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> +  const struct gl_program *prog = ctx->_Shader->CurrentProgram[i];
> +
> +  if (prog == NULL)
> + continue;
> +
> +  const unsigned num_samplers = brw_get_num_active_samplers(ctx, prog);
> +  const unsigned sampler_needs_per_tex_unit =
> + 16 /* sampler_state_size */ +
> + sizeof(struct gen5_sampler_default_color);
> +  const unsigned surface_state_needs_per_tex_unit =
> + ALIGN(brw->isl_dev.ss.size, brw->isl_dev.ss.align);
> +  const unsigned total_per_tex_unit = sampler_needs_per_tex_unit +
> +  surface_state_needs_per_tex_unit;
> +  total += (num_samplers * total_per_tex_unit);
> +
> +  /* Also consider space and alignment needed for binding table pointers,
> +   * each taking four bytes.
> +   */
> +  total += ALIGN(num_samplers * 4, 64);
> +   }
> +
> +   return total;
> +}
> +
>  /* May fail if out of video memory for texture or vbo upload, or on
>   * fallback conditions.
>   */
> @@ -466,11 +515,9 @@ brw_try_draw_prims(struct gl_context *ctx,
>  
> for (i = 0; i < nr_prims; i++) {
>int estimated_max_prim_size;
> -  const int sampler_state_size = 16;
>  
>estimated_max_prim_size = 512; /* batchbuffer commands */
> -  estimated_max_prim_size += BRW_MAX_TEX_UNIT *
> - (sampler_state_size + sizeof(struct gen5_sampler_default_color));
> +  estimated_max_prim_size += brw_estimate_batch_space_for_textures(brw);
>estimated_max_prim_size += 1024; /* gen6 VS push constants */
>estimated_max_prim_size += 1024; /* gen6 WM push constants */
>estimated_max_prim_size += 512; /* misc. pad */
> -- 
> 2.9.3
> 

Re: [Mesa-dev] [PATCH 14/20] i965: Implement logic to set up and upload an image uniform.

2015-08-05 Thread Pohjolainen, Topi

On Tue, Jul 21, 2015 at 07:38:49PM +0300, Francisco Jerez wrote:
> v2: Move the image_params array back to brw_stage_prog_data.
> ---
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 31 +++
>  src/mesa/drivers/dri/i965/brw_shader.h   |  1 +
>  2 files changed, 32 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 24bf42d..f7186a4 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -1402,3 +1402,34 @@ 
> backend_shader::assign_common_binding_table_offsets(uint32_t 
> next_binding_table_
>  
> /* prog_data->base.binding_table.size will be set by 
> brw_mark_surface_used. */
>  }
> +
> +void
> +backend_shader::setup_image_uniform_values(const gl_uniform_storage *storage)
> +{
> +   const unsigned stage = _mesa_program_enum_to_shader_stage(prog->Target);
> +
> +   for (unsigned i = 0; i < MAX2(storage->array_elements, 1); i++) {
> +  const unsigned image_idx = storage->image[stage].index + i;
> +  const brw_image_param *param = 
> &stage_prog_data->image_param[image_idx];
> +
> +  /* Upload the brw_image_param structure.  The order is expected to 
> match
> +   * the BRW_IMAGE_PARAM_*_OFFSET defines.
> +   */
> +  setup_vector_uniform_values(
> + (const gl_constant_value *)&param->surface_idx, 1);
> +  setup_vector_uniform_values(
> + (const gl_constant_value *)param->offset, 2);
> +  setup_vector_uniform_values(
> + (const gl_constant_value *)param->size, 3);
> +  setup_vector_uniform_values(
> + (const gl_constant_value *)param->stride, 4);
> +  setup_vector_uniform_values(
> + (const gl_constant_value *)param->tiling, 3);
> +  setup_vector_uniform_values(
> + (const gl_constant_value *)param->swizzling, 2);
> +

I need to understand the concept of image index before I can tell how this
works.

But I checked that the order and dimensions of the individual fields match the
BRW_IMAGE_PARAM_* defines and the members of "struct brw_image_param". I
noticed that in the structure the member "size" comes second while we give
it as the third to the hw. It might be worth keeping the order consistent
between the two just for clarity. (Of course it works as is, since the
compiled program and the ubo-layout match.)
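
For reference, this is how I cross-checked the offsets. It is a sketch under
the assumption that every upload is padded to a full vec4 and that the
BRW_IMAGE_PARAM_*_OFFSET values (0, 4, 8, 12, 16, 20, total size 24) count
components, not bytes; field_offset() is my own helper and not part of the
series.

```c
#include <assert.h>

/* Number of components passed to setup_vector_uniform_values() for each
 * field, in upload order: surface_idx, offset, size, stride, tiling,
 * swizzling.
 */
static const unsigned field_components[] = { 1, 2, 3, 4, 3, 2 };

/* Offset of the given field assuming each preceding field occupies a whole
 * vec4. Passing 6 (one past the last field) yields the total size.
 */
static unsigned
field_offset(unsigned field)
{
   assert(field <= 6);

   unsigned offset = 0;
   for (unsigned i = 0; i < field; i++)
      offset += (field_components[i] + 3) / 4 * 4; /* round up to vec4 */

   return offset;
}
```

With that assumption "size" lands at offset 8, i.e. third, which is what the
defines say.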

> +  brw_mark_surface_used(
> + stage_prog_data,
> + stage_prog_data->binding_table.image_start + image_idx);
> +   }
> +}
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
> b/src/mesa/drivers/dri/i965/brw_shader.h
> index 925072f..2cc97f2 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.h
> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> @@ -272,6 +272,7 @@ public:
>  
> virtual void setup_vector_uniform_values(const gl_constant_value *values,
>  unsigned n) = 0;
> +   void setup_image_uniform_values(const gl_uniform_storage *storage);
>  };
>  
>  uint32_t brw_texture_offset(int *offsets, unsigned num_components);
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCH 14/20] i965: Implement logic to set up and upload an image uniform.

2015-08-05 Thread Pohjolainen, Topi

On Wed, Aug 05, 2015 at 10:36:09AM +0300, Pohjolainen, Topi wrote:
> On Tue, Jul 21, 2015 at 07:38:49PM +0300, Francisco Jerez wrote:
> > v2: Move the image_params array back to brw_stage_prog_data.
> > ---
> >  src/mesa/drivers/dri/i965/brw_shader.cpp | 31 
> > +++
> >  src/mesa/drivers/dri/i965/brw_shader.h   |  1 +
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> > b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > index 24bf42d..f7186a4 100644
> > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > @@ -1402,3 +1402,34 @@ 
> > backend_shader::assign_common_binding_table_offsets(uint32_t 
> > next_binding_table_
> >  
> > /* prog_data->base.binding_table.size will be set by 
> > brw_mark_surface_used. */
> >  }
> > +
> > +void
> > +backend_shader::setup_image_uniform_values(const gl_uniform_storage 
> > *storage)
> > +{
> > +   const unsigned stage = _mesa_program_enum_to_shader_stage(prog->Target);
> > +
> > +   for (unsigned i = 0; i < MAX2(storage->array_elements, 1); i++) {
> > +  const unsigned image_idx = storage->image[stage].index + i;
> > +  const brw_image_param *param = 
> > &stage_prog_data->image_param[image_idx];
> > +
> > +  /* Upload the brw_image_param structure.  The order is expected to 
> > match
> > +   * the BRW_IMAGE_PARAM_*_OFFSET defines.
> > +   */
> > +  setup_vector_uniform_values(
> > + (const gl_constant_value *)&param->surface_idx, 1);
> > +  setup_vector_uniform_values(
> > + (const gl_constant_value *)param->offset, 2);
> > +  setup_vector_uniform_values(
> > + (const gl_constant_value *)param->size, 3);
> > +  setup_vector_uniform_values(
> > + (const gl_constant_value *)param->stride, 4);
> > +  setup_vector_uniform_values(
> > + (const gl_constant_value *)param->tiling, 3);
> > +  setup_vector_uniform_values(
> > + (const gl_constant_value *)param->swizzling, 2);
> > +
> 
> I need to understand the concept of image index before I can tell how this
> works.

The mechanism looks fine to me:

Reviewed-by: Topi Pohjolainen 

> 
> But I checked that the order and dimensions of the individual fields match the
> BRW_IMAGE_PARAM_* defines and the members of "struct brw_image_param". I
> noticed that in the structure the member "size" is the second while we give
> it as the third for the hw. Might be worth keep the order consistent between
> the two just for clarity. (Of course it works as is as the compiled program
> and the ubo-layout match).
> 
> > +  brw_mark_surface_used(
> > + stage_prog_data,
> > + stage_prog_data->binding_table.image_start + image_idx);
> > +   }
> > +}
> > diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
> > b/src/mesa/drivers/dri/i965/brw_shader.h
> > index 925072f..2cc97f2 100644
> > --- a/src/mesa/drivers/dri/i965/brw_shader.h
> > +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> > @@ -272,6 +272,7 @@ public:
> >  
> > virtual void setup_vector_uniform_values(const gl_constant_value 
> > *values,
> >  unsigned n) = 0;
> > +   void setup_image_uniform_values(const gl_uniform_storage *storage);
> >  };
> >  
> >  uint32_t brw_texture_offset(int *offsets, unsigned num_components);
> > -- 
> > 2.4.3
> > 

Re: [Mesa-dev] [PATCHv2 08/14] i965: Define and initialize image parameter structure.

2015-08-05 Thread Pohjolainen, Topi

On Mon, Jul 20, 2015 at 07:17:48PM +0300, Francisco Jerez wrote:
> This will be used to pass image meta-data to the shader when we cannot
> use typed surface reads and writes.  All entries except surface_idx
> and size are otherwise unused and will get eliminated by the uniform
> packing pass.  size will be used for bounds checking with some image
> formats and will be useful for ARB_shader_image_size too.  surface_idx
> is always used.
> 
> v2: Add CS support.  Move the image_params array back to
> brw_stage_prog_data.
> ---
> I'm resending this (and also patches 9 and 10) because I had to make
> some rather intrusive changes during one of my last rebases -- The
> image_param array is now part of brw_stage_prog_data again instead of
> brw_stage_state (ironically as it was in my very first submission of
> these patches) because the compiler no longer has access to
> brw_stage_state since the brw_context pointer was removed from the
> visitors.
> 
>  src/mesa/drivers/dri/i965/brw_context.h  | 54 
>  src/mesa/drivers/dri/i965/brw_cs.cpp |  3 +
>  src/mesa/drivers/dri/i965/brw_gs.c   |  3 +
>  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +-
>  src/mesa/drivers/dri/i965/brw_wm.c   |  4 ++
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 82 
> 
>  6 files changed, 150 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index e16ad10..9ebad5b 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -361,6 +361,7 @@ struct brw_stage_prog_data {
>  
> GLuint nr_params;   /**< number of float params/constants */
> GLuint nr_pull_params;
> +   unsigned nr_image_params;
>  
> unsigned curb_read_length;
> unsigned total_scratch;
> @@ -381,6 +382,59 @@ struct brw_stage_prog_data {
>  */
> const gl_constant_value **param;
> const gl_constant_value **pull_param;
> +
> +   /**
> +* Image metadata passed to the shader as uniforms.  This is deliberately
> +* ignored by brw_stage_prog_data_compare() because its contents don't 
> have
> +* any influence on program compilation.
> +*/
> +   struct brw_image_param *image_param;
> +};
> +
> +/*
> + * Image metadata structure as laid out in the shader parameter
> + * buffer.  Entries have to be 16B-aligned for the vec4 back-end to be
> + * able to use them.  That's okay because the padding and any unused
> + * entries [most of them except when we're doing untyped surface
> + * access] will be removed by the uniform packing pass.
> + */
> +#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET  0
> +#define BRW_IMAGE_PARAM_OFFSET_OFFSET   4
> +#define BRW_IMAGE_PARAM_SIZE_OFFSET 8
> +#define BRW_IMAGE_PARAM_STRIDE_OFFSET   12
> +#define BRW_IMAGE_PARAM_TILING_OFFSET   16
> +#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET20
> +#define BRW_IMAGE_PARAM_SIZE24
> +
> +struct brw_image_param {
> +   /** Surface binding table index. */
> +   uint32_t surface_idx;
> +
> +   /** Surface X, Y and Z dimensions. */
> +   uint32_t size[3];

Like I mentioned in one of the subsequent patches, it would be clearer if
"size" followed "offset". That way it matches the order of the
BRW_IMAGE_PARAM_*_OFFSET defines and later on the layout in the uniform
buffer object.
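
Something along these lines. The struct name is made up to make clear that
this is only a sketch of the suggested order; the members and their
dimensions are the ones in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same members as struct brw_image_param in the patch, with "offset" and
 * "size" swapped so that the declaration order matches the upload order.
 */
struct image_param_reordered {
   uint32_t surface_idx;
   uint32_t offset[2];
   uint32_t size[3];
   uint32_t stride[4];
   uint32_t tiling[3];
   uint32_t swizzling[2];
};

/* Checks that the members are laid out in the same order in which the
 * BRW_IMAGE_PARAM_*_OFFSET defines list them.
 */
static void
check_member_order(void)
{
   assert(offsetof(struct image_param_reordered, surface_idx) <
          offsetof(struct image_param_reordered, offset));
   assert(offsetof(struct image_param_reordered, offset) <
          offsetof(struct image_param_reordered, size));
   assert(offsetof(struct image_param_reordered, size) <
          offsetof(struct image_param_reordered, stride));
   assert(offsetof(struct image_param_reordered, stride) <
          offsetof(struct image_param_reordered, tiling));
   assert(offsetof(struct image_param_reordered, tiling) <
          offsetof(struct image_param_reordered, swizzling));
}
```

This is purely cosmetic: the struct itself is not padded to vec4, so the byte
offsets here don't equal the defines. Only the order does.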

> +
> +   /** Offset applied to the X and Y surface coordinates. */
> +   uint32_t offset[2];
> +
> +   /** X-stride in bytes, Y-stride in bytes, horizontal slice stride in
> +* pixels, vertical slice stride in pixels.
> +*/
> +   uint32_t stride[4];
> +
> +   /** Log2 of the tiling modulus in the X, Y and Z dimension. */
> +   uint32_t tiling[3];
> +
> +   /**
> +* Right shift to apply for bit 6 address swizzling.  Two different
> +* swizzles can be specified and will be applied one after the other.  The
> +* resulting address will be:
> +*
> +*  addr' = addr ^ ((1 << 6) & ((addr >> swizzling[0]) ^
> +*  (addr >> swizzling[1])))
> +*
> +* Use \c 0xff if any of the swizzles is not required.
> +*/
> +   uint32_t swizzling[2];
>  };
>  
>  /* Data about a particular attempt to compile a program.  Note that
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
> b/src/mesa/drivers/dri/i965/brw_cs.cpp
> index d61bba0..144aa27 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
> @@ -190,7 +190,10 @@ brw_codegen_cs_prog(struct brw_context *brw,
>rzalloc_array(NULL, const gl_constant_value *, param_count);
> prog_data.base.pull_param =
>rzalloc_array(NULL, const gl_constant_value *, param_count);
> +   prog_data.base.image_param =
> +  rzalloc_array(NULL, struct brw_image_param, cs->NumImages);
> prog_data.base.nr_params = param_count;
> +   prog_data.base.nr_image_params = cs->N

Re: [Mesa-dev] [PATCHv2 08/14] i965: Define and initialize image parameter structure.

2015-08-05 Thread Pohjolainen, Topi

On Wed, Aug 05, 2015 at 12:11:02PM +0300, Pohjolainen, Topi wrote:
> On Mon, Jul 20, 2015 at 07:17:48PM +0300, Francisco Jerez wrote:
> > This will be used to pass image meta-data to the shader when we cannot
> > use typed surface reads and writes.  All entries except surface_idx
> > and size are otherwise unused and will get eliminated by the uniform
> > packing pass.  size will be used for bounds checking with some image
> > formats and will be useful for ARB_shader_image_size too.  surface_idx
> > is always used.
> > 
> > v2: Add CS support.  Move the image_params array back to
> > brw_stage_prog_data.
> > ---
> > I'm resending this (and also patches 9 and 10) because I had to make
> > some rather intrusive changes during one of my last rebases -- The
> > image_param array is now part of brw_stage_prog_data again instead of
> > brw_stage_state (ironically as it was in my very first submission of
> > these patches) because the compiler no longer has access to
> > brw_stage_state since the brw_context pointer was removed from the
> > visitors.
> > 
> >  src/mesa/drivers/dri/i965/brw_context.h  | 54 
> >  src/mesa/drivers/dri/i965/brw_cs.cpp |  3 +
> >  src/mesa/drivers/dri/i965/brw_gs.c   |  3 +
> >  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +-
> >  src/mesa/drivers/dri/i965/brw_wm.c   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 82 
> > 
> >  6 files changed, 150 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index e16ad10..9ebad5b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -361,6 +361,7 @@ struct brw_stage_prog_data {
> >  
> > GLuint nr_params;   /**< number of float params/constants */
> > GLuint nr_pull_params;
> > +   unsigned nr_image_params;
> >  
> > unsigned curb_read_length;
> > unsigned total_scratch;
> > @@ -381,6 +382,59 @@ struct brw_stage_prog_data {
> >  */
> > const gl_constant_value **param;
> > const gl_constant_value **pull_param;
> > +
> > +   /**
> > +* Image metadata passed to the shader as uniforms.  This is 
> > deliberately
> > +* ignored by brw_stage_prog_data_compare() because its contents don't 
> > have
> > +* any influence on program compilation.
> > +*/
> > +   struct brw_image_param *image_param;
> > +};
> > +
> > +/*
> > + * Image metadata structure as laid out in the shader parameter
> > + * buffer.  Entries have to be 16B-aligned for the vec4 back-end to be
> > + * able to use them.  That's okay because the padding and any unused
> > + * entries [most of them except when we're doing untyped surface
> > + * access] will be removed by the uniform packing pass.
> > + */
> > +#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET  0
> > +#define BRW_IMAGE_PARAM_OFFSET_OFFSET   4
> > +#define BRW_IMAGE_PARAM_SIZE_OFFSET 8
> > +#define BRW_IMAGE_PARAM_STRIDE_OFFSET   12
> > +#define BRW_IMAGE_PARAM_TILING_OFFSET   16
> > +#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET 20
> > +#define BRW_IMAGE_PARAM_SIZE 24
> > +
> > +struct brw_image_param {
> > +   /** Surface binding table index. */
> > +   uint32_t surface_idx;
> > +
> > +   /** Surface X, Y and Z dimensions. */
> > +   uint32_t size[3];
> 
> Like I mentioned in one of the subsequent patches, it would be clearer if
> "size" would follow "offset". That way it matches the order of
> BRW_IMAGE_PARAM_*_OFFSET defines and later on the layout in the uniform
> buffer object.
> 
> > +
> > +   /** Offset applied to the X and Y surface coordinates. */
> > +   uint32_t offset[2];
> > +
> > +   /** X-stride in bytes, Y-stride in bytes, horizontal slice stride in
> > +* pixels, vertical slice stride in pixels.
> > +*/
> > +   uint32_t stride[4];
> > +
> > +   /** Log2 of the tiling modulus in the X, Y and Z dimension. */
> > +   uint32_t tiling[3];
> > +
> > +   /**
> > +* Right shift to apply for bit 6 address swizzling.  Two different
> > +* swizzles can be specified and will be applied one after the other.  
> > The
> > +* resulting address will be:
> > +*
> > +*  addr' = addr ^ ((1 << 6) &a

Re: [Mesa-dev] [PATCHv2 08/14] i965: Define and initialize image parameter structure.

2015-08-05 Thread Pohjolainen, Topi

On Mon, Jul 20, 2015 at 07:17:48PM +0300, Francisco Jerez wrote:
> This will be used to pass image meta-data to the shader when we cannot
> use typed surface reads and writes.  All entries except surface_idx
> and size are otherwise unused and will get eliminated by the uniform
> packing pass.  size will be used for bounds checking with some image
> formats and will be useful for ARB_shader_image_size too.  surface_idx
> is always used.
> 
> v2: Add CS support.  Move the image_params array back to
> brw_stage_prog_data.
> ---
> I'm resending this (and also patches 9 and 10) because I had to make
> some rather intrusive changes during one of my last rebases -- The
> image_param array is now part of brw_stage_prog_data again instead of
> brw_stage_state (ironically as it was in my very first submission of
> these patches) because the compiler no longer has access to
> brw_stage_state since the brw_context pointer was removed from the
> visitors.
> 
>  src/mesa/drivers/dri/i965/brw_context.h  | 54 
>  src/mesa/drivers/dri/i965/brw_cs.cpp |  3 +
>  src/mesa/drivers/dri/i965/brw_gs.c   |  3 +
>  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +-
>  src/mesa/drivers/dri/i965/brw_wm.c   |  4 ++
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 82 
> 
>  6 files changed, 150 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index e16ad10..9ebad5b 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -361,6 +361,7 @@ struct brw_stage_prog_data {
>  
> GLuint nr_params;   /**< number of float params/constants */
> GLuint nr_pull_params;
> +   unsigned nr_image_params;
>  
> unsigned curb_read_length;
> unsigned total_scratch;
> @@ -381,6 +382,59 @@ struct brw_stage_prog_data {
>  */
> const gl_constant_value **param;
> const gl_constant_value **pull_param;
> +
> +   /**
> +* Image metadata passed to the shader as uniforms.  This is deliberately
> +* ignored by brw_stage_prog_data_compare() because its contents don't 
> have
> +* any influence on program compilation.
> +*/
> +   struct brw_image_param *image_param;
> +};
> +
> +/*
> + * Image metadata structure as laid out in the shader parameter
> + * buffer.  Entries have to be 16B-aligned for the vec4 back-end to be
> + * able to use them.  That's okay because the padding and any unused
> + * entries [most of them except when we're doing untyped surface
> + * access] will be removed by the uniform packing pass.
> + */
> +#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET  0
> +#define BRW_IMAGE_PARAM_OFFSET_OFFSET   4
> +#define BRW_IMAGE_PARAM_SIZE_OFFSET 8
> +#define BRW_IMAGE_PARAM_STRIDE_OFFSET   12
> +#define BRW_IMAGE_PARAM_TILING_OFFSET   16
> +#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET 20
> +#define BRW_IMAGE_PARAM_SIZE 24
> +
> +struct brw_image_param {
> +   /** Surface binding table index. */
> +   uint32_t surface_idx;
> +
> +   /** Surface X, Y and Z dimensions. */
> +   uint32_t size[3];
> +
> +   /** Offset applied to the X and Y surface coordinates. */
> +   uint32_t offset[2];
> +
> +   /** X-stride in bytes, Y-stride in bytes, horizontal slice stride in
> +* pixels, vertical slice stride in pixels.
> +*/
> +   uint32_t stride[4];
> +
> +   /** Log2 of the tiling modulus in the X, Y and Z dimension. */
> +   uint32_t tiling[3];
> +
> +   /**
> +* Right shift to apply for bit 6 address swizzling.  Two different
> +* swizzles can be specified and will be applied one after the other.  The
> +* resulting address will be:
> +*
> +*  addr' = addr ^ ((1 << 6) & ((addr >> swizzling[0]) ^
> +*  (addr >> swizzling[1])))
> +*
> +* Use \c 0xff if any of the swizzles is not required.
> +*/
> +   uint32_t swizzling[2];

I couldn't find any patches actually reading the swizzle values (only found
writes to it). Hence I checked your branch "image-load-store-lower" and still
couldn't find any users for it. Could you help me?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 14/20] i965: Implement logic to set up and upload an image uniform.

2015-08-06 Thread Pohjolainen, Topi

On Wed, Aug 05, 2015 at 01:47:26PM +0300, Francisco Jerez wrote:
> "Pohjolainen, Topi"  writes:
> 
> > On Wed, Aug 05, 2015 at 10:36:09AM +0300, Pohjolainen, Topi wrote:
> >> On Tue, Jul 21, 2015 at 07:38:49PM +0300, Francisco Jerez wrote:
> >> > v2: Move the image_params array back to brw_stage_prog_data.
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_shader.cpp | 31 
> >> > +++
> >> >  src/mesa/drivers/dri/i965/brw_shader.h   |  1 +
> >> >  2 files changed, 32 insertions(+)
> >> > 
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> >> > b/src/mesa/drivers/dri/i965/brw_shader.cpp
> >> > index 24bf42d..f7186a4 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> >> > @@ -1402,3 +1402,34 @@ 
> >> > backend_shader::assign_common_binding_table_offsets(uint32_t 
> >> > next_binding_table_
> >> >  
> >> > /* prog_data->base.binding_table.size will be set by 
> >> > brw_mark_surface_used. */
> >> >  }
> >> > +
> >> > +void
> >> > +backend_shader::setup_image_uniform_values(const gl_uniform_storage 
> >> > *storage)
> >> > +{
> >> > +   const unsigned stage = 
> >> > _mesa_program_enum_to_shader_stage(prog->Target);
> >> > +
> >> > +   for (unsigned i = 0; i < MAX2(storage->array_elements, 1); i++) {
> >> > +  const unsigned image_idx = storage->image[stage].index + i;
> >> > +  const brw_image_param *param = 
> >> > &stage_prog_data->image_param[image_idx];
> >> > +
> >> > +  /* Upload the brw_image_param structure.  The order is expected 
> >> > to match
> >> > +   * the BRW_IMAGE_PARAM_*_OFFSET defines.
> >> > +   */
> >> > +  setup_vector_uniform_values(
> >> > + (const gl_constant_value *)&param->surface_idx, 1);
> >> > +  setup_vector_uniform_values(
> >> > + (const gl_constant_value *)param->offset, 2);
> >> > +  setup_vector_uniform_values(
> >> > + (const gl_constant_value *)param->size, 3);
> >> > +  setup_vector_uniform_values(
> >> > + (const gl_constant_value *)param->stride, 4);
> >> > +  setup_vector_uniform_values(
> >> > + (const gl_constant_value *)param->tiling, 3);
> >> > +  setup_vector_uniform_values(
> >> > + (const gl_constant_value *)param->swizzling, 2);
> >> > +
> >> 
> >> I need to understand the concept of image index before I can tell how this
> >> works.
> >
> > The mechanism looks fine to me:
> >
> > Reviewed-by: Topi Pohjolainen 
> >
> >> 
> >> But I checked that the order and dimensions of the individual fields match 
> >> the
> >> BRW_IMAGE_PARAM_* defines and the members of "struct brw_image_param". I
> >> noticed that in the structure the member "size" is the second while we give
> >> it as the third for the hw. Might be worth keeping the order consistent 
> >> between
> >> the two just for clarity. (Of course it works as is as the compiled program
> >> and the ubo-layout match).
> >> 
> 
> Hah, sure, I've reordered them locally so that the brw_image_param
> structure matches the order in which they are written to the push param
> array.  Do you want me to resend?

No, that's fine.
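
For reference, the layout agreed on above can be sketched as follows. This is a minimal illustration, not the driver code: the struct name `image_param_sketch` and the helper `flatten_image_param()` are hypothetical, the field order is the reordered one (offset before size), and zero-filling the padding slots is an assumption. Only the BRW_IMAGE_PARAM_* values are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets in 32-bit components, as defined in the patch under review. */
#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET 0
#define BRW_IMAGE_PARAM_OFFSET_OFFSET 4
#define BRW_IMAGE_PARAM_SIZE_OFFSET 8
#define BRW_IMAGE_PARAM_STRIDE_OFFSET 12
#define BRW_IMAGE_PARAM_TILING_OFFSET 16
#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET 20
#define BRW_IMAGE_PARAM_SIZE 24

/* Hypothetical stand-in for struct brw_image_param, with the members in
 * the same order as the BRW_IMAGE_PARAM_*_OFFSET defines.
 */
struct image_param_sketch {
   uint32_t surface_idx;
   uint32_t offset[2];
   uint32_t size[3];
   uint32_t stride[4];
   uint32_t tiling[3];
   uint32_t swizzling[2];
};

/* Hypothetical helper: write each member at its 16B-aligned slot.  Unused
 * components are zero-filled here, which is an assumption of this sketch.
 */
static void
flatten_image_param(const struct image_param_sketch *p,
                    uint32_t out[BRW_IMAGE_PARAM_SIZE])
{
   assert(p != NULL && out != NULL);

   memset(out, 0, BRW_IMAGE_PARAM_SIZE * sizeof(uint32_t));
   out[BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET] = p->surface_idx;
   memcpy(&out[BRW_IMAGE_PARAM_OFFSET_OFFSET], p->offset, sizeof(p->offset));
   memcpy(&out[BRW_IMAGE_PARAM_SIZE_OFFSET], p->size, sizeof(p->size));
   memcpy(&out[BRW_IMAGE_PARAM_STRIDE_OFFSET], p->stride, sizeof(p->stride));
   memcpy(&out[BRW_IMAGE_PARAM_TILING_OFFSET], p->tiling, sizeof(p->tiling));
   memcpy(&out[BRW_IMAGE_PARAM_SWIZZLING_OFFSET], p->swizzling,
          sizeof(p->swizzling));
}
```

With the members declared in upload order, a reader can check the struct against the defines line by line, which is the point of the reordering.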

Re: [Mesa-dev] [PATCHv2 08/14] i965: Define and initialize image parameter structure.

2015-08-06 Thread Pohjolainen, Topi

On Wed, Aug 05, 2015 at 12:51:19PM +0300, Pohjolainen, Topi wrote:
> On Mon, Jul 20, 2015 at 07:17:48PM +0300, Francisco Jerez wrote:
> > This will be used to pass image meta-data to the shader when we cannot
> > use typed surface reads and writes.  All entries except surface_idx
> > and size are otherwise unused and will get eliminated by the uniform
> > packing pass.  size will be used for bounds checking with some image
> > formats and will be useful for ARB_shader_image_size too.  surface_idx
> > is always used.
> > 
> > v2: Add CS support.  Move the image_params array back to
> > brw_stage_prog_data.
> > ---
> > I'm resending this (and also patches 9 and 10) because I had to make
> > some rather intrusive changes during one of my last rebases -- The
> > image_param array is now part of brw_stage_prog_data again instead of
> > brw_stage_state (ironically as it was in my very first submission of
> > these patches) because the compiler no longer has access to
> > brw_stage_state since the brw_context pointer was removed from the
> > visitors.
> > 
> >  src/mesa/drivers/dri/i965/brw_context.h  | 54 
> >  src/mesa/drivers/dri/i965/brw_cs.cpp |  3 +
> >  src/mesa/drivers/dri/i965/brw_gs.c   |  3 +
> >  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +-
> >  src/mesa/drivers/dri/i965/brw_wm.c   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 82 
> > 
> >  6 files changed, 150 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index e16ad10..9ebad5b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -361,6 +361,7 @@ struct brw_stage_prog_data {
> >  
> > GLuint nr_params;   /**< number of float params/constants */
> > GLuint nr_pull_params;
> > +   unsigned nr_image_params;
> >  
> > unsigned curb_read_length;
> > unsigned total_scratch;
> > @@ -381,6 +382,59 @@ struct brw_stage_prog_data {
> >  */
> > const gl_constant_value **param;
> > const gl_constant_value **pull_param;
> > +
> > +   /**
> > +* Image metadata passed to the shader as uniforms.  This is 
> > deliberately
> > +* ignored by brw_stage_prog_data_compare() because its contents don't 
> > have
> > +* any influence on program compilation.
> > +*/
> > +   struct brw_image_param *image_param;
> > +};
> > +
> > +/*
> > + * Image metadata structure as laid out in the shader parameter
> > + * buffer.  Entries have to be 16B-aligned for the vec4 back-end to be
> > + * able to use them.  That's okay because the padding and any unused
> > + * entries [most of them except when we're doing untyped surface
> > + * access] will be removed by the uniform packing pass.
> > + */
> > +#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET  0
> > +#define BRW_IMAGE_PARAM_OFFSET_OFFSET   4
> > +#define BRW_IMAGE_PARAM_SIZE_OFFSET 8
> > +#define BRW_IMAGE_PARAM_STRIDE_OFFSET   12
> > +#define BRW_IMAGE_PARAM_TILING_OFFSET   16
> > +#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET 20
> > +#define BRW_IMAGE_PARAM_SIZE 24
> > +
> > +struct brw_image_param {
> > +   /** Surface binding table index. */
> > +   uint32_t surface_idx;
> > +
> > +   /** Surface X, Y and Z dimensions. */
> > +   uint32_t size[3];
> > +
> > +   /** Offset applied to the X and Y surface coordinates. */
> > +   uint32_t offset[2];
> > +
> > +   /** X-stride in bytes, Y-stride in bytes, horizontal slice stride in
> > +* pixels, vertical slice stride in pixels.
> > +*/
> > +   uint32_t stride[4];
> > +
> > +   /** Log2 of the tiling modulus in the X, Y and Z dimension. */
> > +   uint32_t tiling[3];
> > +
> > +   /**
> > +* Right shift to apply for bit 6 address swizzling.  Two different
> > +* swizzles can be specified and will be applied one after the other.  
> > The
> > +* resulting address will be:
> > +*
> > +*  addr' = addr ^ ((1 << 6) & ((addr >> swizzling[0]) ^
> > +*  (addr >> swizzling[1])))
> > +*
> > +* Use \c 0xff if any of the swizzles is not required.
> > +*/
> > +   uint32_t swizzling[2];
> 
> I couldn't find any patches actually reading the swizzle values (only found
> writes to it). Hence I checked your branch "image-load-store-lower" and still
> couldn't find any users for it. Could you help me?

Okay, we discussed this offline and it was my mistake. There is a consumer
for this in brw_fs_surface_builder.cpp::emit_address_calculation().

Re: [Mesa-dev] [PATCHv2 08/14] i965: Define and initialize image parameter structure.

2015-08-06 Thread Pohjolainen, Topi

On Wed, Aug 05, 2015 at 12:11:02PM +0300, Pohjolainen, Topi wrote:
> On Mon, Jul 20, 2015 at 07:17:48PM +0300, Francisco Jerez wrote:
> > This will be used to pass image meta-data to the shader when we cannot
> > use typed surface reads and writes.  All entries except surface_idx
> > and size are otherwise unused and will get eliminated by the uniform
> > packing pass.  size will be used for bounds checking with some image
> > formats and will be useful for ARB_shader_image_size too.  surface_idx
> > is always used.
> > 
> > v2: Add CS support.  Move the image_params array back to
> > brw_stage_prog_data.
> > ---
> > I'm resending this (and also patches 9 and 10) because I had to make
> > some rather intrusive changes during one of my last rebases -- The
> > image_param array is now part of brw_stage_prog_data again instead of
> > brw_stage_state (ironically as it was in my very first submission of
> > these patches) because the compiler no longer has access to
> > brw_stage_state since the brw_context pointer was removed from the
> > visitors.
> > 
> >  src/mesa/drivers/dri/i965/brw_context.h  | 54 
> >  src/mesa/drivers/dri/i965/brw_cs.cpp |  3 +
> >  src/mesa/drivers/dri/i965/brw_gs.c   |  3 +
> >  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +-
> >  src/mesa/drivers/dri/i965/brw_wm.c   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 82 
> > 
> >  6 files changed, 150 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index e16ad10..9ebad5b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -361,6 +361,7 @@ struct brw_stage_prog_data {
> >  
> > GLuint nr_params;   /**< number of float params/constants */
> > GLuint nr_pull_params;
> > +   unsigned nr_image_params;
> >  
> > unsigned curb_read_length;
> > unsigned total_scratch;
> > @@ -381,6 +382,59 @@ struct brw_stage_prog_data {
> >  */
> > const gl_constant_value **param;
> > const gl_constant_value **pull_param;
> > +
> > +   /**
> > +* Image metadata passed to the shader as uniforms.  This is 
> > deliberately
> > +* ignored by brw_stage_prog_data_compare() because its contents don't 
> > have
> > +* any influence on program compilation.
> > +*/
> > +   struct brw_image_param *image_param;
> > +};
> > +
> > +/*
> > + * Image metadata structure as laid out in the shader parameter
> > + * buffer.  Entries have to be 16B-aligned for the vec4 back-end to be
> > + * able to use them.  That's okay because the padding and any unused
> > + * entries [most of them except when we're doing untyped surface
> > + * access] will be removed by the uniform packing pass.
> > + */
> > +#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET  0
> > +#define BRW_IMAGE_PARAM_OFFSET_OFFSET   4
> > +#define BRW_IMAGE_PARAM_SIZE_OFFSET 8
> > +#define BRW_IMAGE_PARAM_STRIDE_OFFSET   12
> > +#define BRW_IMAGE_PARAM_TILING_OFFSET   16
> > +#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET 20
> > +#define BRW_IMAGE_PARAM_SIZE 24
> > +
> > +struct brw_image_param {
> > +   /** Surface binding table index. */
> > +   uint32_t surface_idx;
> > +
> > +   /** Surface X, Y and Z dimensions. */
> > +   uint32_t size[3];
> 
> Like I mentioned in one of the subsequent patches, it would be clearer if
> "size" would follow "offset". That way it matches the order of
> BRW_IMAGE_PARAM_*_OFFSET defines and later on the layout in the uniform
> buffer object.
> 
> > +
> > +   /** Offset applied to the X and Y surface coordinates. */
> > +   uint32_t offset[2];
> > +
> > +   /** X-stride in bytes, Y-stride in bytes, horizontal slice stride in
> > +* pixels, vertical slice stride in pixels.
> > +*/
> > +   uint32_t stride[4];
> > +
> > +   /** Log2 of the tiling modulus in the X, Y and Z dimension. */
> > +   uint32_t tiling[3];
> > +
> > +   /**
> > +* Right shift to apply for bit 6 address swizzling.  Two different
> > +* swizzles can be specified and will be applied one after the other.  
> > The
> > +* resulting address will be:
> > +*
> > +*  addr' = addr ^ ((1 << 6) &a

Re: [Mesa-dev] [PATCHv2 09/14] i965: Reserve enough parameter entries for all image uniforms used in the program.

2015-08-06 Thread Pohjolainen, Topi

On Mon, Jul 20, 2015 at 07:23:00PM +0300, Francisco Jerez wrote:
> v2: Add CS support.
> ---
>  src/mesa/drivers/dri/i965/brw_cs.cpp | 3 ++-
>  src/mesa/drivers/dri/i965/brw_gs.c   | 1 +
>  src/mesa/drivers/dri/i965/brw_vs.c   | 3 ++-
>  src/mesa/drivers/dri/i965/brw_wm.c   | 3 ++-
>  4 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
> b/src/mesa/drivers/dri/i965/brw_cs.cpp
> index 144aa27..232ea18 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
> @@ -182,7 +182,8 @@ brw_codegen_cs_prog(struct brw_context *brw,
>  * prog_data associated with the compiled program, and which will be freed
>  * by the state cache.
>  */
> -   int param_count = cs->num_uniform_components;
> +   int param_count = cs->num_uniform_components +
> + cs->NumImages * BRW_IMAGE_PARAM_SIZE;
>  
> /* The backend also sometimes adds params for texture size. */
> param_count += 2 * 
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits;
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index d1a955a..5c0d923 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -64,6 +64,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
>  
> /* We also upload clip plane data as uniforms */
> param_count += MAX_CLIP_PLANES * 4;
> +   param_count += gs->NumImages * BRW_IMAGE_PARAM_SIZE;
>  
> c.prog_data.base.base.param =
>rzalloc_array(NULL, const gl_constant_value *, param_count);
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
> b/src/mesa/drivers/dri/i965/brw_vs.c
> index 20bc7a9..96aa56d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vs.c
> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> @@ -121,7 +121,8 @@ brw_codegen_vs_prog(struct brw_context *brw,
> * case being a float value that gets blown up to a vec4, so be
> * conservative here.
> */
> -  param_count = vs->num_uniform_components * 4;
> +  param_count = (vs->num_uniform_components * 4  +

Extra space before +

> + vs->NumImages * BRW_IMAGE_PARAM_SIZE);

Above, in the compute case, you don't have the surrounding (). I think you can
drop them here and further down in the scalar case as well.

Not really a big thing though:

Reviewed-by: Topi Pohjolainen 
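
To make the reservation arithmetic concrete, here is a minimal sketch without the extra parentheses. The function names are hypothetical and the inputs stand in for num_uniform_components and NumImages. Only BRW_IMAGE_PARAM_SIZE comes from the series, and the stage-specific additions (clip planes, texture size params) are left out.

```c
#include <assert.h>
#include <limits.h>

#define BRW_IMAGE_PARAM_SIZE 24

/* Hypothetical helper: parameter slots reserved for image uniforms. */
static unsigned
image_param_slots(unsigned num_images)
{
   /* Guard against unsigned wrap-around in the multiplication. */
   assert(num_images <= UINT_MAX / BRW_IMAGE_PARAM_SIZE);
   return num_images * BRW_IMAGE_PARAM_SIZE;
}

/* Scalar stages (CS, FS in the patch): one slot per uniform component. */
static unsigned
scalar_param_count(unsigned num_uniform_components, unsigned num_images)
{
   return num_uniform_components + image_param_slots(num_images);
}

/* vec4 VS path: each component is conservatively blown up to a vec4. */
static unsigned
vec4_param_count(unsigned num_uniform_components, unsigned num_images)
{
   return num_uniform_components * 4 + image_param_slots(num_images);
}
```

Since `*` binds tighter than `+`, the sums read the same with or without the surrounding parentheses.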

>stage_prog_data->nr_image_params = vs->NumImages;
> } else {
>param_count = vp->program.Base.Parameters->NumParameters * 4;
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
> b/src/mesa/drivers/dri/i965/brw_wm.c
> index e0e0bb7..9d3da49 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> @@ -195,7 +195,8 @@ brw_codegen_wm_prog(struct brw_context *brw,
>  */
> int param_count;
> if (fs) {
> -  param_count = fs->num_uniform_components;
> +  param_count = (fs->num_uniform_components +
> + fs->NumImages * BRW_IMAGE_PARAM_SIZE);
>prog_data.base.nr_image_params = fs->NumImages;
> } else {
>param_count = fp->program.Base.Parameters->NumParameters * 4;
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCHv2 10/14] i965: Hook up image state upload.

2015-08-06 Thread Pohjolainen, Topi

On Mon, Jul 20, 2015 at 07:23:47PM +0300, Francisco Jerez wrote:
> v2: Add CS support.  Move the image_params array back to
> brw_stage_prog_data.

Reviewed-by: Topi Pohjolainen 

> ---
>  src/mesa/drivers/dri/i965/brw_context.h  | 10 +++-
>  src/mesa/drivers/dri/i965/brw_gs_surface_state.c | 25 
>  src/mesa/drivers/dri/i965/brw_state.h|  4 ++
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 12 
>  src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 25 
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 72 
> 
>  6 files changed, 146 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 9ebad5b..b6f993e 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -201,6 +201,7 @@ enum brw_state_id {
> BRW_STATE_STATS_WM,
> BRW_STATE_UNIFORM_BUFFER,
> BRW_STATE_ATOMIC_BUFFER,
> +   BRW_STATE_IMAGE_UNITS,
> BRW_STATE_META_IN_PROGRESS,
> BRW_STATE_INTERPOLATION_MAP,
> BRW_STATE_PUSH_CONSTANT_ALLOCATION,
> @@ -282,6 +283,7 @@ enum brw_state_id {
>  #define BRW_NEW_STATS_WM (1ull << BRW_STATE_STATS_WM)
>  #define BRW_NEW_UNIFORM_BUFFER  (1ull << BRW_STATE_UNIFORM_BUFFER)
>  #define BRW_NEW_ATOMIC_BUFFER   (1ull << BRW_STATE_ATOMIC_BUFFER)
> +#define BRW_NEW_IMAGE_UNITS (1ull << BRW_STATE_IMAGE_UNITS)
>  #define BRW_NEW_META_IN_PROGRESS (1ull << BRW_STATE_META_IN_PROGRESS)
>  #define BRW_NEW_INTERPOLATION_MAP   (1ull << BRW_STATE_INTERPOLATION_MAP)
>  #define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1ull << 
> BRW_STATE_PUSH_CONSTANT_ALLOCATION)
> @@ -1513,8 +1515,8 @@ struct brw_context
> } perfmon;
>  
> int num_atoms[BRW_NUM_PIPELINES];
> -   const struct brw_tracked_state render_atoms[57];
> -   const struct brw_tracked_state compute_atoms[3];
> +   const struct brw_tracked_state render_atoms[60];
> +   const struct brw_tracked_state compute_atoms[4];
>  
> /* If (INTEL_DEBUG & DEBUG_BATCH) */
> struct {
> @@ -1792,6 +1794,10 @@ void brw_upload_abo_surfaces(struct brw_context *brw,
>   struct gl_shader_program *prog,
>   struct brw_stage_state *stage_state,
>   struct brw_stage_prog_data *prog_data);
> +void brw_upload_image_surfaces(struct brw_context *brw,
> +   struct gl_shader *shader,
> +   struct brw_stage_state *stage_state,
> +   struct brw_stage_prog_data *prog_data);
>  
>  /* brw_surface_formats.c */
>  bool brw_render_target_supported(struct brw_context *brw,
> diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> index 0b8bfc3..0bb3074 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> @@ -119,3 +119,28 @@ const struct brw_tracked_state brw_gs_abo_surfaces = {
> },
> .emit = brw_upload_gs_abo_surfaces,
>  };
> +
> +static void
> +brw_upload_gs_image_surfaces(struct brw_context *brw)
> +{
> +   struct gl_context *ctx = &brw->ctx;
> +   /* BRW_NEW_GEOMETRY_PROGRAM */
> +   struct gl_shader_program *prog =
> +  ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY];
> +
> +   if (prog) {
> +  /* BRW_NEW_GS_PROG_DATA, BRW_NEW_IMAGE_UNITS */
> +  brw_upload_image_surfaces(brw, 
> prog->_LinkedShaders[MESA_SHADER_GEOMETRY],
> +&brw->gs.base, 
> &brw->gs.prog_data->base.base);
> +   }
> +}
> +
> +const struct brw_tracked_state brw_gs_image_surfaces = {
> +   .dirty = {
> +  .brw = BRW_NEW_BATCH |
> + BRW_NEW_GEOMETRY_PROGRAM |
> + BRW_NEW_GS_PROG_DATA |
> + BRW_NEW_IMAGE_UNITS,
> +   },
> +   .emit = brw_upload_gs_image_surfaces,
> +};
> diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
> b/src/mesa/drivers/dri/i965/brw_state.h
> index 2eff1b5..0a09b44 100644
> --- a/src/mesa/drivers/dri/i965/brw_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_state.h
> @@ -72,8 +72,10 @@ extern const struct brw_tracked_state brw_vs_samplers;
>  extern const struct brw_tracked_state brw_gs_samplers;
>  extern const struct brw_tracked_state brw_vs_ubo_surfaces;
>  extern const struct brw_tracked_state brw_vs_abo_surfaces;
> +extern const struct brw_tracked_state brw_vs_image_surfaces;
>  extern const struct brw_tracked_state brw_gs_ubo_surfaces;
>  extern const struct brw_tracked_state brw_gs_abo_surfaces;
> +extern const struct brw_tracked_state brw_gs_image_surfaces;
>  extern const struct brw_tracked_state brw_vs_unit;
>  extern const struct brw_tracked_state brw_gs_prog;
>  extern const struct brw_tracked_state brw_wm_prog;
> @@ -84,7 +86,9 @@ extern const struct brw_tracked_state brw_gs_binding_table;
>  extern const struct brw_tracked_stat

Re: [Mesa-dev] [PATCH v3 4/5] i965: handle nir_intrinsic_image_size

2015-08-13 Thread Pohjolainen, Topi

On Thu, Aug 13, 2015 at 08:00:43PM +0300, Martin Peres wrote:
> v2, Review from Francisco Jerez:
> - avoid the camelCase for the booleans
> - init the booleans using the sampler type
> - force the initialization of all the components of the output register
> 
> Signed-off-by: Martin Peres 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 48 
> 
>  1 file changed, 48 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index ce4153d..cc0a5a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1406,6 +1406,54 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
>break;
> }
>  
> +   case nir_intrinsic_image_size: {
> +  /* Get the referenced image variable and type. */
> +  const nir_variable *var = instr->variables[0]->var;
> +  const glsl_type *type = var->type->without_array();
> +  const brw_reg_type base_type = get_image_base_type(type);
> +
> +  /* Get the size of the image. */
> +  const fs_reg image = get_nir_image_deref(instr->variables[0]);
> +  const fs_reg size = offset(image, bld, BRW_IMAGE_PARAM_SIZE_OFFSET);
> +
> +  /*
> +   * For 1DArray image types, the array index is stored in the Z 
> component.

Just a few style nits from my part.

Usually (and in the rest of the file being modified) multi-line comments do
not have a separate first line, instead:

 /* For 1DArray image types, the array index is stored in the Z
  * component.

> +   * Fix this by swizzling the Z component to the Y component.
> +   */
> +  const bool is_1d_array_image =
> +  (type->sampler_dimensionality == GLSL_SAMPLER_DIM_1D &&
> +   type->sampler_array);

Indentation here looks a little odd and you can drop the extra (). I would
write this:

 const bool is_1d_array_image =
type->sampler_dimensionality == GLSL_SAMPLER_DIM_1D &&
type->sampler_array;

Same comments just below.

> +
> +  /*
> +   * For CubeMapArray images, we should count the number of cubes instead
> +   * of the number of faces. Fix it by dividing the (Z component) by 6.
> +   */
> +  const bool is_cube_map_array_image =
> +  (type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE &&
> +   type->sampler_array);
> +
> +  /* Copy all the components. */
> +  const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
> +  for (int c = 0; c < info->dest_components; ++c) {
> + if (c > type->coordinate_components()) {
> + bld.MOV(offset(retype(dest, base_type), bld, c),
> + fs_reg(1));
> + } else if (c == 1 && is_1d_array_image) {
> +bld.MOV(offset(retype(dest, base_type), bld, c),
> +offset(size, bld, 2));
> + } else if (c == 2 && is_cube_map_array_image) {
> +bld.emit(SHADER_OPCODE_INT_QUOTIENT,
> + offset(retype(dest, base_type), bld, c),
> + offset(size, bld, c), fs_reg(6));
> + } else {
> +bld.MOV(offset(retype(dest, base_type), bld, c),
> +offset(size, bld, c));
> + }
> +   }
> +
> +  break;
> +   }
> +
> case nir_intrinsic_load_front_face:
>bld.MOV(retype(dest, BRW_REGISTER_TYPE_D),
>*emit_frontfacing_interpolation());
> -- 
> 2.5.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
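
For reference, the two size fix-ups described in the patch comments (moving the
array length from Z to Y for 1D arrays, and dividing the face count by 6 for
cube map arrays) can be sketched outside the driver as below. This is only an
illustration under stated assumptions: Dim and fixup_image_size() are
hypothetical names, the raw size is taken to be a plain (X, Y, Z) triple, and
the padding of unused components that the patch also does is left out.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the sampler dimensionality; not Mesa's enum.
enum class Dim { D1, D2, D3, Cube };

// Returns the first `components` values of the image size after the two
// fix-ups discussed in the patch. `raw` is the size as stored, (X, Y, Z).
std::vector<int> fixup_image_size(const std::array<int, 3> &raw, Dim dim,
                                  bool is_array, int components)
{
   assert(components >= 0 && components <= 3);

   const bool is_1d_array = dim == Dim::D1 && is_array;
   const bool is_cube_array = dim == Dim::Cube && is_array;

   std::vector<int> out;
   for (int c = 0; c < components; ++c) {
      if (c == 1 && is_1d_array) {
         /* The array length lives in Z; report it as Y. */
         out.push_back(raw[2]);
      } else if (c == 2 && is_cube_array) {
         /* Z counts faces; report whole cubes instead. */
         out.push_back(raw[2] / 6);
      } else {
         out.push_back(raw[c]);
      }
   }
   return out;
}
```

With this sketch, a 1D array stored as (64, 1, 5) is reported as (64, 5), and a
cube map array stored as (32, 32, 12) is reported as (32, 32, 2).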

Re: [Mesa-dev] [PATCH v3 2/5] glsl: add support for the imageSize builtin

2015-08-13 Thread Pohjolainen, Topi

On Thu, Aug 13, 2015 at 07:58:53PM +0300, Martin Peres wrote:
> The code is heavily inspired from Francisco Jerez's code supporting the
> image_load_store extension.
> 
> Backends willing to support this builtin should handle
> __intrinsic_image_size.
> 
> v2: Based on the review of Ilia Mirkin
> - Enable the extension for GLES 3.1
> - Fix indentation
> - Fix the return type (float to int, number of components for CubeImages)
> - Add a warning related to GLES 3.1
> 
> v3: Based on the review of Francisco Jerez
> - Refactor the code to share both add_image_function and _image with the other
>   image-related functions
> 
> Signed-off-by: Martin Peres 
> ---
>  src/glsl/builtin_functions.cpp | 109 
> +++--
>  1 file changed, 93 insertions(+), 16 deletions(-)
> 
> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> index 2175c66..5d0a825 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -399,6 +399,13 @@ shader_image_load_store(const _mesa_glsl_parse_state 
> *state)
>  }
>  
>  static bool
> +shader_image_size(const _mesa_glsl_parse_state *state)
> +{
> +   return (state->is_version(430, 310) ||
> +   state->ARB_shader_image_size_enable);

You can drop the extra ().
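
To make the behaviour of that predicate concrete, here is a minimal sketch
without the extra parentheses. ParseState is a simplified, hypothetical
stand-in for _mesa_glsl_parse_state: only the is_version(glsl, glsl_es) call
shape and the ARB_shader_image_size_enable flag are taken from the patch, while
the es_shader and language_version fields and the body of is_version() are
assumptions made for illustration.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for _mesa_glsl_parse_state.
struct ParseState {
   bool es_shader;                     // assumed: true for GLSL ES shaders
   unsigned language_version;          // assumed: e.g. 430 or 310
   bool ARB_shader_image_size_enable;  // flag name taken from the patch

   // True when the shader's version is at least the one required for its
   // API flavour. A required version of 0 means "never available".
   bool is_version(unsigned required_glsl, unsigned required_glsl_es) const
   {
      const unsigned required = es_shader ? required_glsl_es : required_glsl;
      return required != 0 && language_version >= required;
   }
};

// Availability predicate in the style suggested above.
static bool
shader_image_size(const ParseState *state)
{
   assert(state != nullptr);
   return state->is_version(430, 310) ||
          state->ARB_shader_image_size_enable;
}
```

Under these assumptions a desktop GLSL 4.20 shader only gets the builtin when
the extension is enabled, while GLSL 4.30 and GLSL ES 3.10 shaders always do.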

Re: [Mesa-dev] [PATCH 3/7] i965/gen9: Handle the GL_TEXTURE_{1D, 1D_ARRAY} targets inside switch

2015-08-13 Thread Pohjolainen, Topi

On Thu, Aug 13, 2015 at 02:51:58PM -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index edd7518..6ac4024 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -194,9 +194,7 @@ tr_mode_vertical_texture_alignment(const struct 
> brw_context *brw,
> const unsigned align_3d_ys[] = {32, 32, 32, 16, 16};
> int i = 0;
>  
> -   assert(brw->gen >= 9 &&
> -  mt->target != GL_TEXTURE_1D &&
> -  mt->target != GL_TEXTURE_1D_ARRAY);
> +   assert(brw->gen >= 9);
>  
> /* Alignment computations below assume bpp >= 8 and a power of 2. */
> assert (bpp >= 8 && bpp <= 128 && _mesa_is_pow_two(bpp)) ;
> @@ -216,8 +214,10 @@ tr_mode_vertical_texture_alignment(const struct 
> brw_context *brw,
>align_yf = align_3d_yf;
>align_ys = align_3d_ys;
>break;
> +   case GL_TEXTURE_1D:
> +   case GL_TEXTURE_1D_ARRAY:

These two cases are actually unnecessary: without them, these targets would
drop to the default anyway. I checked the rest of the series and didn't find
anything that would take advantage of them either.
Anyway, I think it is cleaner to deal with them in the switch-case and
Reviewed-by: Topi Pohjolainen 

> default:
> -  unreachable("not reached");
> +  unreachable("Unexpected miptree target");
> }
>  
> /* Compute array index. */
> -- 
> 2.4.3
> 

Re: [Mesa-dev] [PATCH 1/8] i965: Change the parameters passed to intel_miptree_get_tile_masks()

2015-08-16 Thread Pohjolainen, Topi

On Fri, Aug 14, 2015 at 04:51:52PM -0700, Anuj Phogat wrote:
> This change is required by the later patches.
> 
> Cc: Ben Widawsky 
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.cpp   | 3 ++-
>  src/mesa/drivers/dri/i965/brw_misc_state.c| 8 +---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 ++-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 2 +-
>  4 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> index eac1f00..cb5ef58 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> @@ -144,7 +144,8 @@ brw_blorp_surface_info::compute_tile_offsets(uint32_t 
> *tile_x,
>  {
> uint32_t mask_x, mask_y;
>  
> -   intel_miptree_get_tile_masks(mt, &mask_x, &mask_y, 
> map_stencil_as_y_tiled);
> +   intel_miptree_get_tile_masks(mt->tiling, mt->cpp, &mask_x, &mask_y,
> +map_stencil_as_y_tiled);
>  
> *tile_x = x_offset & mask_x;
> *tile_y = y_offset & mask_y;
> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> index e9d9467..246aefb 100644
> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> @@ -174,11 +174,13 @@ brw_get_depthstencil_tile_masks(struct 
> intel_mipmap_tree *depth_mt,
> uint32_t tile_mask_x = 0, tile_mask_y = 0;
>  
> if (depth_mt) {
> -  intel_miptree_get_tile_masks(depth_mt, &tile_mask_x, &tile_mask_y, 
> false);
> +  intel_miptree_get_tile_masks(depth_mt->tiling, depth_mt->cpp,
> +   &tile_mask_x, &tile_mask_y, false);
>  
>if (intel_miptree_level_has_hiz(depth_mt, depth_level)) {
>   uint32_t hiz_tile_mask_x, hiz_tile_mask_y;
> - intel_miptree_get_tile_masks(depth_mt->hiz_buf->mt,
> + intel_miptree_get_tile_masks(depth_mt->hiz_buf->mt->tiling,
> +  depth_mt->hiz_buf->mt->cpp,
>&hiz_tile_mask_x, &hiz_tile_mask_y,
>false);
>  
> @@ -200,7 +202,7 @@ brw_get_depthstencil_tile_masks(struct intel_mipmap_tree 
> *depth_mt,
>   tile_mask_y |= 63;
>} else {
>   uint32_t stencil_tile_mask_x, stencil_tile_mask_y;
> - intel_miptree_get_tile_masks(stencil_mt,
> + intel_miptree_get_tile_masks(stencil_mt->tiling, stencil_mt->cpp,
>&stencil_tile_mask_x,
>&stencil_tile_mask_y, false);
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index e85c3f0..b4f2bd8 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -1087,13 +1087,10 @@ intel_miptree_get_image_offset(const struct 
> intel_mipmap_tree *mt,
>   * untiled, the masks are set to 0.
>   */
>  void
> -intel_miptree_get_tile_masks(const struct intel_mipmap_tree *mt,
> +intel_miptree_get_tile_masks(uint32_t tiling, uint32_t cpp,
>   uint32_t *mask_x, uint32_t *mask_y,
>   bool map_stencil_as_y_tiled)
>  {
> -   int cpp = mt->cpp;
> -   uint32_t tiling = mt->tiling;
> -
> if (map_stencil_as_y_tiled)
>tiling = I915_TILING_Y;
>  
> @@ -1176,7 +1173,7 @@ intel_miptree_get_tile_offsets(const struct 
> intel_mipmap_tree *mt,
> uint32_t x, y;
> uint32_t mask_x, mask_y;
>  
> -   intel_miptree_get_tile_masks(mt, &mask_x, &mask_y, false);
> +   intel_miptree_get_tile_masks(mt->tiling, mt->cpp, &mask_x, &mask_y, 
> false);
> intel_miptree_get_image_offset(mt, level, slice, &x, &y);
>  
> *tile_x = x & mask_x;
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> index 790d312..b1617a2 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> @@ -622,7 +622,7 @@ intel_miptree_get_dimensions_for_image(struct 
> gl_texture_image *image,
> int *width, int *height, int *depth);
>  
>  void
> -intel_miptree_get_tile_masks(const struct intel_mipmap_tree *mt,
> +intel_miptree_get_tile_masks(uint32_t tiling, uint32_t cpp,
>   uint32_t *mask_x, uint32_t *mask_y,
>   bool map_stencil_as_y_tiled);

I'm always for limiting the input to the bare minimum needed.

Should we also drop "_miptree_" in the middle and use, for example, just
intel_get_tile_masks() instead? The function doesn't have a direct relation to
miptrees anymore after the change.
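
To illustrate why the function no longer needs a miptree, here is a rough,
self-contained sketch of a mask helper that depends only on the tiling mode and
the bytes per pixel. It is not the driver code: the Tiling enum and the name
get_tile_masks() are hypothetical, and the tile dimensions (X tile 512 bytes by
8 rows, Y tile 128 bytes by 32 rows) are stated here as assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tiling enum; the driver uses the I915_TILING_* values.
enum class Tiling { None, X, Y };

// Computes masks selecting the intra-tile part of a pixel offset.
// Assumes X tiles are 512 bytes x 8 rows and Y tiles are 128 bytes x 32 rows,
// and that cpp (bytes per pixel) is a power of two. Untiled gives 0 masks.
void get_tile_masks(Tiling tiling, uint32_t cpp,
                    uint32_t *mask_x, uint32_t *mask_y,
                    bool map_stencil_as_y_tiled)
{
   assert(cpp != 0 && (cpp & (cpp - 1)) == 0);

   if (map_stencil_as_y_tiled)
      tiling = Tiling::Y;

   switch (tiling) {
   case Tiling::None:
      *mask_x = 0;
      *mask_y = 0;
      break;
   case Tiling::X:
      *mask_x = 512 / cpp - 1;
      *mask_y = 7;
      break;
   case Tiling::Y:
      *mask_x = 128 / cpp - 1;
      *mask_y = 31;
      break;
   }
}
```

A caller then derives the intra-tile offsets exactly as in the patch, for
example tile_x = x & mask_x, without ever passing the miptree itself.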

Re: [Mesa-dev] [PATCH 09/12] i965/fs: make SIMD-splitting respect the original stride/offset

2015-08-17 Thread Pohjolainen, Topi

On Fri, Aug 14, 2015 at 03:30:18PM -0700, Connor Abbott wrote:
> In some cases, we need to emit ALU instructions with a certain stride
> due to a HW limitation. When splitting that instruction, we need to
> respect the original stride when creating the temporaries we load from
> and store into. Otherwise, we'll reintroduce the problem we were trying
> to work around.
> 
> Signed-off-by: Connor Abbott 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 812648f..386e9a2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2370,12 +2370,14 @@ fs_visitor::opt_register_renaming()
>  
>if (depth == 0 &&
>inst->dst.file == GRF &&
> -  alloc.sizes[inst->dst.reg] == inst->exec_size / 8 &&
> +  alloc.sizes[inst->dst.reg] ==
> +inst->dst.component_size(inst->exec_size) &&
>!inst->is_partial_write()) {
>   if (remap[dst] == -1) {
>  remap[dst] = dst;
>   } else {
> -remap[dst] = alloc.allocate(inst->exec_size / 8);
> +remap[dst] =
> +   alloc.allocate(inst->dst.component_size(inst->exec_size));
>  inst->dst.reg = remap[dst];
>  progress = true;
>   }
> @@ -4334,6 +4336,8 @@ fs_visitor::lower_simd_width()
> * temporary passed as source to the lowered instruction.
> */
>split_inst.src[j] = lbld.vgrf(inst->src[j].type, src_size);
> +  split_inst.src[j].subreg_offset = inst->src[j].subreg_offset;

The fixes for the wider component size (64 bits) and the stride are clearly
needed for doubles. I'm wondering though why the sub-register offset hasn't
caused us any problems before. That change is not needed just for doubles,
is it?
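
As a side note on the component size part, a small sketch shows why a fixed
"exec_size / 8" stops being right once the type is wider than 32 bits or the
stride is larger than one. This is not the driver's component_size(); the
helper names are hypothetical and the only hardware fact assumed is a 32-byte
register.

```cpp
#include <algorithm>
#include <cassert>

// Bytes in one general register; assumed to be 32 (256 bits).
constexpr unsigned REG_SIZE = 32;

// Bytes spanned by `width` channels of `type_size`-byte elements laid out
// with the given element stride (hypothetical helper).
unsigned region_bytes(unsigned width, unsigned stride, unsigned type_size)
{
   return std::max(width * stride, 1u) * type_size;
}

// Whole registers needed to hold that region, rounded up.
unsigned regs_needed(unsigned width, unsigned stride, unsigned type_size)
{
   assert(type_size > 0);
   return (region_bytes(width, stride, type_size) + REG_SIZE - 1) / REG_SIZE;
}
```

For SIMD8 with 4-byte elements and stride 1 this gives one register, matching
exec_size / 8, but SIMD8 doubles or a stride of 2 already need two.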

> +  split_inst.src[j].stride = inst->src[j].stride;
>emit_transpose(lbld.group(copy_width, 0),
>   split_inst.src[j], &src, 1, src_size, n);
> }
> @@ -4343,8 +4347,10 @@ fs_visitor::lower_simd_width()
> /* Allocate enough space to hold the result of the lowered
>  * instruction and fix up the number of registers written.
>  */
> -   split_inst.dst = dsts[i] =
> -  lbld.vgrf(inst->dst.type, dst_size);
> +   fs_reg dst = lbld.vgrf(inst->dst.type, dst_size);
> +   dst.stride = inst->dst.stride;
> +   dst.subreg_offset = inst->dst.subreg_offset;
> +   split_inst.dst = dsts[i] = dst;
> split_inst.regs_written =
>DIV_ROUND_UP(inst->regs_written * lower_width,
> inst->exec_size);
> -- 
> 2.4.3
> 
