date:20131021

Re: [Mesa-dev] [PATCH] i965: Implement ARB_texture_mirror_clamp.

2013-10-21 Thread Rico Schüller

I have one minor nitpick (see below). But either way, with the subject
fixed (as mentioned by Matt), this is:
Reviewed-by: Rico Schüller 

On 21.10.2013 07:24, Kenneth Graunke wrote:
> This passes Piglit's texwrap tests (after applying Rico's patch to
> make them use this extension).
> 
> Cc: Rico Schüller 
> Cc: Ian Romanick 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 2 ++
>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
> index b716d61..db7ab60 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
> @@ -71,6 +71,8 @@ translate_wrap_mode(GLenum wrap, bool using_nearest)
>        return BRW_TEXCOORDMODE_CLAMP_BORDER;
>     case GL_MIRRORED_REPEAT:
>        return BRW_TEXCOORDMODE_MIRROR;
> +   case GL_MIRROR_CLAMP_TO_EDGE_EXT:
> +      return BRW_TEXCOORDMODE_MIRROR_ONCE;
I'd prefer GL_MIRROR_CLAMP_TO_EDGE instead of
GL_MIRROR_CLAMP_TO_EDGE_EXT but as it is the same value it really
shouldn't matter.
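
For readers unfamiliar with the mode being added: the following is a minimal sketch of what mirror-once addressing computes for nearest filtering, assuming the wrap rule clamp(mirror(i), 0, size - 1) with mirror(a) = a for a >= 0 and -(1 + a) otherwise. The SKETCH_ names and helper functions are hypothetical and are not part of Mesa or the i965 driver; the two token values are written out as 0x8743 only to illustrate that the core and EXT names are the same number.

```c
#include <assert.h>

/* Assumption: both tokens are 0x8743 in the GL headers. The SKETCH_ names
 * are local stand-ins so this block does not need any GL header. */
#define SKETCH_MIRROR_CLAMP_TO_EDGE     0x8743
#define SKETCH_MIRROR_CLAMP_TO_EDGE_EXT 0x8743

/* mirror(a): reflect negative texel indices about zero. */
static int sketch_mirror(int a)
{
   return a >= 0 ? a : -(1 + a);
}

/* floor(s * size) without pulling in libm. */
static int sketch_floor_texel(float s, int size)
{
   float u = s * (float)size;
   int i = (int)u;                    /* truncates toward zero */
   return ((float)i > u) ? i - 1 : i; /* step down for negative fractions */
}

/* Nearest-filter texel index for normalized coordinate s under mirror-once:
 * mirror first, then clamp to the last texel. The mirrored index is never
 * negative, so only the upper bound needs clamping. */
static int sketch_mirror_once_texel(float s, int size)
{
   int i;

   assert(size > 0);
   i = sketch_mirror(sketch_floor_texel(s, size));
   return i > size - 1 ? size - 1 : i;
}
```

With an 8-texel texture, s = -0.25 lands on texel 1 (the mirror image of s = 0.25 lands one texel to the left of texel 2 because of the -(1 + a) reflection), and anything beyond one repetition in either direction sticks to texel 7.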

>     default:
>        return BRW_TEXCOORDMODE_WRAP;
>     }
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
> b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 803d090..87cc87d 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -75,6 +75,7 @@ intelInitExtensions(struct gl_context *ctx)
>     ctx->Extensions.ARB_texture_env_crossbar = true;
>     ctx->Extensions.ARB_texture_env_dot3 = true;
>     ctx->Extensions.ARB_texture_float = true;
> +   ctx->Extensions.ARB_texture_mirror_clamp_to_edge = true;
>     ctx->Extensions.ARB_texture_non_power_of_two = true;
>     ctx->Extensions.ARB_texture_rg = true;
>     ctx->Extensions.ARB_texture_rgb10_a2ui = true;
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallium: new, unified pipe_context::set_sampler_views() function

2013-10-21 Thread Emil Velikov

On 16/10/13 03:23, Emil Velikov wrote:
> On 08/10/13 01:23, Brian Paul wrote:
[...]
>> This change touches quite a few files.  I've probably missed
>> something in drivers or state trackers that I can't test.
>> Please test if you're able.  Thanks.
>> ---
> Will run a quick piglit with and w/o on my nv50 this weekend.
> 
Running a quick piglit test has proven a bit trickier than last time.
With that said I see no regressions on my nv50, apart from the 2-3 tests
with somewhat random result :)

Tested-by: Emil Velikov 

Cheers
Emil


[Mesa-dev] [PATCH] docs: Update docs for ARB_texture_mirror_clamp_to_edge.

2013-10-21 Thread Rico Schüller

Signed-off-by: Rico Schüller 
---
 docs/GL3.txt            | 2 +-
 docs/relnotes/10.0.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index a56e7fe..e8e797a 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -173,7 +173,7 @@ ARB_clear_texture                    not started
 ARB_enhanced_layouts                 not started
 ARB_multi_bind                       not started
 ARB_query_buffer_object              not started
-ARB_texture_mirror_clamp_to_edge     not started
+ARB_texture_mirror_clamp_to_edge     DONE (i965, nv30, nv50, nvc0, r300, r600, radeonsi, swrast)
 ARB_texture_stencil8                 not started
 ARB_vertex_type_10f_11f_11f_rev      not started
 
diff --git a/docs/relnotes/10.0.html b/docs/relnotes/10.0.html
index 0b25f49..ef550d1 100644
--- a/docs/relnotes/10.0.html
+++ b/docs/relnotes/10.0.html
@@ -48,6 +48,7 @@ Note: some of the new features are only available with certain drivers.
 GL_ARB_conservative_depth on i965.
 GL_ARB_texture_gather on i965.
 GL_ARB_texture_query_levels on i965.
+GL_ARB_texture_mirror_clamp_to_edge.
 GL_KHR_debug
 
 
-- 
1.7.11.7


Re: [Mesa-dev] [PATCH 1/2] gallivm: implement seamless cube filtering

2013-10-21 Thread Jose Fonseca

Looks great AFAICT.

Jose

- Original Message -
> From: Roland Scheidegger 
> 
> For seamless cube filtering it is necessary to determine new faces and new
> coords per sample. The logic for this is _seriously_ complex (what needs
> to happen is very "asymmetric" wrt face, x/y under/overflow), further
> complicated by the fact that if the 4 samples are in a corner (meaning we
> only have actually 3 samples, and all 3 are on different faces) then
> falling off the edge is happening _both_ on x and y axis simultaneously.
> There was a noticeable performance hit in mesa's cubemap demo when seamless
> filtering was forced on (just below 10 percent or so in a debug build, when
> disabling all filtering hacks, otherwise it would probably be a bit more) and
> when always doing the logic, hence use a branch so the logic only runs if any
> of the pixels in a quad (or in two quads) actually hit this. With that there
> was no measurable performance hit in the cubemap demo (neither in a debug nor
> release build), but this will vary (cubemap demo very rarely hits edges).
> Might also be different on other cpus, as this forces SoA sampling path which
> potentially can be quite a bit slower.
> Note that as for corners, this code gets all the 3 samples which actually
> exist right, and the 4th texel will simply be the same as one of the others,
> meaning that filter weights will be a bit wrong. This however should be
> enough for full OpenGL (but not d3d10) compliance.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_sample.c |  138 +++
>  src/gallium/auxiliary/gallivm/lp_bld_sample.h |   13 ++
>  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  257
>  +
>  3 files changed, 368 insertions(+), 40 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> index 1c35200..a032d9d 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> @@ -1402,6 +1402,144 @@ lp_build_unnormalized_coords(struct
> lp_build_sample_context *bld,
> }
>  }
>  
> +/**
> + * Generate new coords and faces for cubemap texels falling off the face.
> + *
> + * @param face   face (center) of the pixel
> + * @param x0 lower x coord
> + * @param x1 higher x coord (must be x0 + 1)
> + * @param y0 lower y coord
> + * @param y1 higher y coord (must be y0 + 1)
> + * @param max_coord texture cube (level) size - 1
> + * @param next_facesnew face values when falling off
> + * @param next_xcoords  new x coord values when falling off
> + * @param next_ycoords  new y coord values when falling off
> + *
> + * The arrays hold the new values when under/overflow of
> + * lower x, higher x, lower y, higher y coord would occur (in this order).
> + * next_xcoords/next_ycoords have two entries each (for both new lower and
> + * higher coord).
> + */
> +void
> +lp_build_cube_new_coords(struct lp_build_context *ivec_bld,
> +LLVMValueRef face,
> +LLVMValueRef x0,
> +LLVMValueRef x1,
> +LLVMValueRef y0,
> +LLVMValueRef y1,
> +LLVMValueRef max_coord,
> +LLVMValueRef next_faces[4],
> +LLVMValueRef next_xcoords[4][2],
> +LLVMValueRef next_ycoords[4][2])
> +{
> +   /*
> +* Lookup tables aren't nice for simd code hence try some logic here.
> +* (Note that while it would not be necessary to do per-sample (4)
> lookups
> +* when using a LUT as it's impossible that texels fall off of positive
> +* and negative edges simultaneously, it would however be necessary to
> +* do 2 lookups for corner handling as in this case texels both fall off
> +* of x and y axes.)
> +*/
> +   /*
> +* Next faces (for face 012345):
> +* x < 0.0  : 451110
> +* x >= 1.0 : 540001
> +* y < 0.0  : 225422
> +* y >= 1.0 : 334533
> +* Hence nfx+ (and nfy+) == nfx- (nfy-) xor 1
> +* nfx-: face > 1 ? (face == 5 ? 0 : 1) : (4 + face & 1)
> +* nfy+: face & ~4 > 1 ? face + 2 : 3;
> +* This could also use pshufb instead, but would need (manually coded)
> +* ssse3 intrinsic (llvm won't do non-constant shuffles).
> +*/
> +   struct gallivm_state *gallivm = ivec_bld->gallivm;
> +   LLVMValueRef sel, sel_f2345, sel_f23, sel_f2, tmpsel, tmp;
> +   LLVMValueRef faceand1, sel_fand1, maxmx0, maxmx1, maxmy0, maxmy1;
> +   LLVMValueRef c2 = lp_build_const_int_vec(gallivm, ivec_bld->type, 2);
> +   LLVMValueRef c3 = lp_build_const_int_vec(gallivm, ivec_bld->type, 3);
> +   LLVMValueRef c4 = lp_build_const_int_vec(gallivm, ivec_bld->type, 4);
> +   LLVMValueRef c5 = lp_build_const_int_vec(gallivm, ivec_bld->type, 5);
> +
> +   sel = lp_build_cmp(ivec_bld, PIPE_FUNC_EQUAL, face, c5);
> +   tmpsel = lp_build_select(ivec_bld, sel, ivec_bld->zero, ivec_bld
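
The select logic described in the comment of the quoted patch can be checked against its own lookup table with a small scalar program. This is only a sketch: the function names are hypothetical, the real code builds the same selections as LLVM vector IR, and explicit parentheses have been added because the comment's shorthand ("4 + face & 1", "face & ~4 > 1") would parse differently if pasted into C as written.

```c
#include <assert.h>

/* Next-face table copied from the patch comment, indexed by face 0..5. */
static const int sketch_next_face[4][6] = {
   {4, 5, 1, 1, 1, 0},   /* x <  0.0 */
   {5, 4, 0, 0, 0, 1},   /* x >= 1.0 */
   {2, 2, 5, 4, 2, 2},   /* y <  0.0 */
   {3, 3, 4, 5, 3, 3},   /* y >= 1.0 */
};

/* nfx-: face > 1 ? (face == 5 ? 0 : 1) : (4 + (face & 1)) */
static int sketch_nfx_neg(int face)
{
   assert(face >= 0 && face <= 5);
   return face > 1 ? (face == 5 ? 0 : 1) : (4 + (face & 1));
}

/* nfy+: (face & ~4) > 1 ? face + 2 : 3 */
static int sketch_nfy_pos(int face)
{
   assert(face >= 0 && face <= 5);
   return (face & ~4) > 1 ? face + 2 : 3;
}

/* The opposite directions are obtained by flipping the lowest bit. */
static int sketch_nfx_pos(int face) { return sketch_nfx_neg(face) ^ 1; }
static int sketch_nfy_neg(int face) { return sketch_nfy_pos(face) ^ 1; }

/* Returns 1 when the branch-free formulas reproduce the whole table. */
static int sketch_formulas_match_table(void)
{
   int face;

   for (face = 0; face < 6; face++) {
      if (sketch_nfx_neg(face) != sketch_next_face[0][face] ||
          sketch_nfx_pos(face) != sketch_next_face[1][face] ||
          sketch_nfy_neg(face) != sketch_next_face[2][face] ||
          sketch_nfy_pos(face) != sketch_next_face[3][face])
         return 0;
   }
   return 1;
}
```

Only comparisons, selects and a bitwise xor are needed, which is the point made in the comment about avoiding per-lane table lookups in SIMD code.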

Re: [Mesa-dev] [PATCH] gallium: new, unified pipe_context::set_sampler_views() function

2013-10-21 Thread Brian Paul


On 10/21/2013 02:09 AM, Emil Velikov wrote:
> On 16/10/13 03:23, Emil Velikov wrote:
>> On 08/10/13 01:23, Brian Paul wrote:
> [...]
>>> This change touches quite a few files.  I've probably missed
>>> something in drivers or state trackers that I can't test.
>>> Please test if you're able.  Thanks.
>>> ---
>> Will run a quick piglit with and w/o on my nv50 this weekend.
>>
> Running a quick piglit test has proven a bit trickier than last time.
> With that said I see no regressions on my nv50, apart from the 2-3 tests
> with somewhat random result :)
>
> Tested-by: Emil Velikov


OK, thanks.  I'll probably push the patch later today.

-Brian



[Mesa-dev] libGL without X

2013-10-21 Thread Chris Healy

I have a headless platform I need OpenGL to work on that does not have X.
It is x86 with Intel HD 4000 graphics.

Ultimately, I'm just wanting to use OpenGL to render to memory for encoding
to H.264 and streaming.

I'm trying to build Mesa for this platform without X and cannot get it to
build libGL.so.

What am I missing here?  Is it not possible to use OpenGL without X?  I was
hoping I could use OpenGL with EGL for testing purposes.

Re: [Mesa-dev] [PATCH 1/2] mesa: remove remnants of GL_MESA_shader_debug

2013-10-21 Thread Ian Romanick

With the one comment below taken care of, the series is

Reviewed-by: Ian Romanick 

On 10/19/2013 07:34 AM, Brian Paul wrote:
> This extension never saw any real use so remove it.
> ---
>  include/GL/gl.h   |   20 
>  src/mapi/glapi/gen/gl_API.xml |   32 
>  2 files changed, 52 deletions(-)
> 
> diff --git a/include/GL/gl.h b/include/GL/gl.h
> index babb746..968032c 100644
> --- a/include/GL/gl.h
> +++ b/include/GL/gl.h
> @@ -2086,26 +2086,6 @@ typedef void (APIENTRYP PFNGLMULTITEXCOORD4SVARBPROC) 
> (GLenum target, const GLsh
>  
>  
>  
> -#if GL_ARB_shader_objects
> -
> -#ifndef GL_MESA_shader_debug
> -#define GL_MESA_shader_debug 1
> -
> -#define GL_DEBUG_OBJECT_MESA  0x8759
> -#define GL_DEBUG_PRINT_MESA   0x875A
> -#define GL_DEBUG_ASSERT_MESA  0x875B
> -
> -GLAPI GLhandleARB GLAPIENTRY glCreateDebugObjectMESA (void);
> -GLAPI void GLAPIENTRY glClearDebugLogMESA (GLhandleARB obj, GLenum logType, 
> GLenum shaderType);
> -GLAPI void GLAPIENTRY glGetDebugLogMESA (GLhandleARB obj, GLenum logType, 
> GLenum shaderType, GLsizei maxLength,
> - GLsizei *length, GLcharARB 
> *debugLog);
> -GLAPI GLsizei GLAPIENTRY glGetDebugLogLengthMESA (GLhandleARB obj, GLenum 
> logType, GLenum shaderType);
> -
> -#endif /* GL_MESA_shader_debug */
> -
> -#endif /* GL_ARB_shader_objects */
> -
> -
>  /*
>   * ???. GL_MESA_packed_depth_stencil
>   * XXX obsolete
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index 48fce36..30ab9c9 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -13027,38 +13027,6 @@
>  
>  
>  
> -
> -
> -
> -

You also need to remove the enums from
src/mesa/main/tests/enum_strings.cpp.  I suspect 'make check' will fail
otherwise.

> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
> -
>  
>  
>  
> 


Re: [Mesa-dev] [PATCH] docs: Mark GLSL 1.50, 3.30, and geometry shaders done for i965.

2013-10-21 Thread Ian Romanick

With the issue that Kaelyn pointed out resolved,

Reviewed-by: Ian Romanick 

On 10/18/2013 03:12 PM, Matt Turner wrote:
> ---
>  docs/GL3.txt | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index c269f19..a7c7ae6 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -63,9 +63,9 @@ Signed normalized textures (GL_EXT_texture_snorm)     DONE (i965, r300, r600)
>  
>  GL 3.2:
>  
> -Core/compatibility profiles                           DONE
> -GLSL 1.50                                             in progress
> -Geometry shaders (GL_ARB_geometry_shader4)            partially done
> +Core/compatibility profiles                           DONE (i965)
> +GLSL 1.50                                             DONE (i965)
> +Geometry shaders                                      DONE (i965)
>  BGRA vertex order (GL_ARB_vertex_array_bgra)          DONE (i965, r300, r600, swrast)
>  Base vertex offset(GL_ARB_draw_elements_base_vertex)  DONE (i965, r300, r600, swrast)
>  Frag shader coord (GL_ARB_fragment_coord_conventions) DONE (i965, r300, r600, swrast)
> @@ -79,7 +79,7 @@ GLX_ARB_create_context_profile                        DONE
>  
>  GL 3.3:
>  
> -GLSL 3.30                                             new features in this version pretty much done
> +GLSL 3.30                                             DONE (i965)
>  GL_ARB_blend_func_extended                            DONE (i965, r600, softpipe)
>  GL_ARB_explicit_attrib_location                       DONE (i915, i965, r300, r600, swrast)
>  GL_ARB_occlusion_query2                               DONE (i965, r300, r600, swrast)
> 


Re: [Mesa-dev] Ivybridge support for ARB_transform_feedback2

2013-10-21 Thread Ian Romanick

On 10/17/2013 11:09 PM, Kenneth Graunke wrote:
> Here's my implementation of ARB_transform_feedback2.  I believe it's
> complete; it passes all of our Piglit tests and a lot of Intel's
> oglconform tests.
> 
> This should work out of the box on Ivybridge and Baytrail.  It won't
> work on Haswell at the moment, due to restrictions on register writes
> (to be solved in a future kernel version).  Patch 9 will need to be
> replaced with something that detects whether or not we can write
> registers from userspace batchbuffers.
> 
> In the meantime, I figured I'd send out the rest for review.
> 
> Porting this back to Sandybridge is probably doable, but annoying.
> Sandybridge doesn't have the MI_LOAD_REGISTER_MEM command, so we'd have
> to map the buffers and use MI_LOAD_REGISTER_IMM.  Seems pretty gross.
> Plus, transform feedback is done very differently pre-Ivybridge.  I'm
> not sure it's worth it, seeing as it's a GL 4.0 feature.

I assume this is just to support glDrawTransformFeedback?

Can you add that information to http://dri.freedesktop.org/wiki/I965Todo/ ?

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] mesa: Pass number of samples as a program state variable

2013-10-21 Thread Anuj Phogat

On Fri, Oct 18, 2013 at 2:44 PM, Paul Berry  wrote:
> On 14 October 2013 10:12, Anuj Phogat  wrote:
>>
>> Number of samples will be required in fragment shader program by new
>> GLSL builtin uniform "gl_NumSamples".
>>
>> Signed-off-by: Anuj Phogat 
>> ---
>>  src/mesa/program/prog_statevars.c | 11 +++
>>  src/mesa/program/prog_statevars.h |  2 ++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/src/mesa/program/prog_statevars.c
>> b/src/mesa/program/prog_statevars.c
>> index 145c07c..8f798da 100644
>> --- a/src/mesa/program/prog_statevars.c
>> +++ b/src/mesa/program/prog_statevars.c
>> @@ -349,6 +349,9 @@ _mesa_fetch_state(struct gl_context *ctx, const
>> gl_state_index state[],
>>   }
>>}
>>return;
>> +   case STATE_NUM_SAMPLES:
>> +  ((int *)value)[0] = ctx->DrawBuffer->Visual.samples;
>> +  return;
>> case STATE_DEPTH_RANGE:
>>value[0] = ctx->Viewport.Near; /* near   */
>>value[1] = ctx->Viewport.Far;  /* far*/
>> @@ -665,6 +668,9 @@ _mesa_program_state_flags(const gl_state_index
>> state[STATE_LENGTH])
>> case STATE_PROGRAM_MATRIX:
>>return _NEW_TRACK_MATRIX;
>>
>> +   case STATE_NUM_SAMPLES:
>> +  return _NEW_MULTISAMPLE;
>
>
> I think this should be _NEW_BUFFERS.  _NEW_MULTISAMPLE is only flagged when
> something in gl_multisample_attrib changes, and nothing in that category
> affects ctx->DrawBuffer->Visual.samples.
Right. Thanks for noticing this. I'll fix it.
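
To make the review point concrete, here is a toy model of dirty-flag tracking, assuming a scheme in which a cached program parameter is refreshed only when one of its declared flags is set. All names (sketch_ctx, SKETCH_NEW_BUFFERS, and so on) are hypothetical stand-ins and not Mesa's actual structures; the sketch only shows why tagging the sample count with the multisample flag would leave a stale value after the draw buffer changes.

```c
#include <assert.h>

/* Hypothetical dirty bits, loosely modelled on the _NEW_* flags above. */
enum {
   SKETCH_NEW_BUFFERS     = 1 << 0,  /* draw buffer changed */
   SKETCH_NEW_MULTISAMPLE = 1 << 1   /* multisample attributes changed */
};

struct sketch_ctx {
   int draw_buffer_samples;   /* stands in for DrawBuffer->Visual.samples */
   unsigned new_state;        /* accumulated dirty bits */
};

struct sketch_param {
   unsigned flags;            /* dirty bits this parameter listens to */
   int cached;                /* value last uploaded to the program */
};

/* Binding a different draw buffer changes the sample count and raises
 * only the buffers flag. */
static void sketch_bind_draw_buffer(struct sketch_ctx *ctx, int samples)
{
   assert(samples >= 0);
   ctx->draw_buffer_samples = samples;
   ctx->new_state |= SKETCH_NEW_BUFFERS;
}

/* Refresh the cached value only if a flag the parameter listens to is set. */
static void sketch_update_param(const struct sketch_ctx *ctx,
                                struct sketch_param *p)
{
   if (ctx->new_state & p->flags)
      p->cached = ctx->draw_buffer_samples;
}

/* Start with one sample, rebind to a 4-sample buffer, and report what the
 * program would see for a parameter registered with the given flags. */
static int sketch_cached_after_rebind(unsigned flags)
{
   struct sketch_ctx ctx = { 1, 0 };
   struct sketch_param p = { 0, 1 };

   p.flags = flags;
   sketch_bind_draw_buffer(&ctx, 4);
   sketch_update_param(&ctx, &p);
   return p.cached;
}
```

In this model a parameter registered with the buffers flag sees the new count of 4, while one registered with the multisample flag keeps the old value of 1.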

> With that fixed, this patch is:
>
> Reviewed-by: Paul Berry 
>
>>
>> +
>> case STATE_DEPTH_RANGE:
>>return _NEW_VIEWPORT;
>>
>> @@ -852,6 +858,9 @@ append_token(char *dst, gl_state_index k)
>> case STATE_TEXENV_COLOR:
>>append(dst, "texenv");
>>break;
>> +   case STATE_NUM_SAMPLES:
>> +  append(dst, "num.samples");
>> +  break;
>> case STATE_DEPTH_RANGE:
>>append(dst, "depth.range");
>>break;
>> @@ -1027,6 +1036,8 @@ _mesa_program_state_string(const gl_state_index
>> state[STATE_LENGTH])
>>break;
>> case STATE_FOG_COLOR:
>>break;
>> +   case STATE_NUM_SAMPLES:
>> +  break;
>> case STATE_DEPTH_RANGE:
>>break;
>> case STATE_FRAGMENT_PROGRAM:
>> diff --git a/src/mesa/program/prog_statevars.h
>> b/src/mesa/program/prog_statevars.h
>> index ec22b73..c3081c4 100644
>> --- a/src/mesa/program/prog_statevars.h
>> +++ b/src/mesa/program/prog_statevars.h
>> @@ -103,6 +103,8 @@ typedef enum gl_state_index_ {
>>
>> STATE_TEXENV_COLOR,
>>
>> +   STATE_NUM_SAMPLES,
>> +
>> STATE_DEPTH_RANGE,
>>
>> STATE_VERTEX_PROGRAM,
>> --
>> 1.8.1.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>

[Mesa-dev] [PATCH] i965/vec4: Reduce working set size of live variables computation.

2013-10-21 Thread Eric Anholt

Orbital Explorer was generating a 4000 instruction geometry shader, which
was taking 275 trips through dead code elimination and register
coalescing, each of which updated live variables to get its work done, and
invalidated those live variables afterwards.

By using bitfields instead of bools (reducing the working set size by a
factor of 8) in live variables analysis, it drops from 88% of the profile
to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul
says 3+ minutes) to 10.5 seconds.

Compare to f179f419d1d0a03fad36c2b0a58e8b853bae6118 on the FS side.
---
 .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 41 --
 .../drivers/dri/i965/brw_vec4_live_variables.h | 10 +++---
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
index db3787b..f6675c8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
@@ -83,8 +83,8 @@ vec4_live_variables::setup_def_use()
 
for (int j = 0; j < 4; j++) {
   int c = BRW_GET_SWZ(inst->src[i].swizzle, j);
-  if (!bd[b].def[reg * 4 + c])
- bd[b].use[reg * 4 + c] = true;
+  if (!BITSET_TEST(bd[b].def, reg * 4 + c))
+ BITSET_SET(bd[b].use, reg * 4 + c);
}
}
 }
@@ -99,8 +99,8 @@ vec4_live_variables::setup_def_use()
 for (int c = 0; c < 4; c++) {
if (inst->dst.writemask & (1 << c)) {
   int reg = inst->dst.reg;
-  if (!bd[b].use[reg * 4 + c])
- bd[b].def[reg * 4 + c] = true;
+  if (!BITSET_TEST(bd[b].use, reg * 4 + c))
+ BITSET_SET(bd[b].def, reg * 4 + c);
}
 }
  }
@@ -126,12 +126,12 @@ vec4_live_variables::compute_live_variables()
 
   for (int b = 0; b < cfg->num_blocks; b++) {
 /* Update livein */
-for (int i = 0; i < num_vars; i++) {
-   if (bd[b].use[i] || (bd[b].liveout[i] && !bd[b].def[i])) {
-  if (!bd[b].livein[i]) {
- bd[b].livein[i] = true;
- cont = true;
-  }
+for (int i = 0; i < bitset_words; i++) {
+BITSET_WORD new_livein = (bd[b].use[i] |
+  (bd[b].liveout[i] & ~bd[b].def[i]));
+if (new_livein & ~bd[b].livein[i]) {
+   bd[b].livein[i] |= new_livein;
+   cont = true;
}
 }
 
@@ -140,9 +140,11 @@ vec4_live_variables::compute_live_variables()
bblock_link *link = (bblock_link *)block_node;
bblock_t *block = link->block;
 
-   for (int i = 0; i < num_vars; i++) {
-  if (bd[block->block_num].livein[i] && !bd[b].liveout[i]) {
- bd[b].liveout[i] = true;
+   for (int i = 0; i < bitset_words; i++) {
+   BITSET_WORD new_liveout = (bd[block->block_num].livein[i] &
+  ~bd[b].liveout[i]);
+   if (new_liveout) {
+  bd[b].liveout[i] |= new_liveout;
  cont = true;
   }
}
@@ -159,11 +161,12 @@ vec4_live_variables::vec4_live_variables(vec4_visitor *v, 
cfg_t *cfg)
num_vars = v->virtual_grf_count * 4;
bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks);
 
+   bitset_words = BITSET_WORDS(num_vars);
for (int i = 0; i < cfg->num_blocks; i++) {
-  bd[i].def = rzalloc_array(mem_ctx, bool, num_vars);
-  bd[i].use = rzalloc_array(mem_ctx, bool, num_vars);
-  bd[i].livein = rzalloc_array(mem_ctx, bool, num_vars);
-  bd[i].liveout = rzalloc_array(mem_ctx, bool, num_vars);
+  bd[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  bd[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  bd[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  bd[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
}
 
setup_def_use();
@@ -248,12 +251,12 @@ vec4_visitor::calculate_live_intervals()
 
for (int b = 0; b < cfg.num_blocks; b++) {
   for (int i = 0; i < livevars.num_vars; i++) {
-if (livevars.bd[b].livein[i]) {
+if (BITSET_TEST(livevars.bd[b].livein, i)) {
start[i / 4] = MIN2(start[i / 4], cfg.blocks[b]->start_ip);
end[i / 4] = MAX2(end[i / 4], cfg.blocks[b]->start_ip);
 }
 
-if (livevars.bd[b].liveout[i]) {
+if (BITSET_TEST(livevars.bd[b].liveout, i)) {
start[i / 4] = MIN2(start[i / 4], cfg.blocks[b]->end_ip);
end[i / 4] = MAX2(end[i / 4], cfg.blocks[b]->end_ip);
 }
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
index 296468
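
The word-at-a-time dataflow update in the patch above can be tried in isolation. The sketch below is not the Mesa code: the SKETCH_ macros are local stand-ins for the BITSET_* helpers, each block is assumed to have at most one successor, and the variable count is fixed at 64 to keep the example short. It does use the same livein/liveout equations as the new loops.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t sketch_word;
#define SKETCH_WORD_BITS 32
#define SKETCH_WORDS(bits) (((bits) + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)
#define SKETCH_SET(set, b) \
   ((set)[(b) / SKETCH_WORD_BITS] |= 1u << ((b) % SKETCH_WORD_BITS))
#define SKETCH_TEST(set, b) \
   ((((set)[(b) / SKETCH_WORD_BITS]) >> ((b) % SKETCH_WORD_BITS)) & 1u)

#define SKETCH_NUM_VARS  64
#define SKETCH_NUM_WORDS SKETCH_WORDS(SKETCH_NUM_VARS)

struct sketch_block {
   sketch_word def[SKETCH_NUM_WORDS];
   sketch_word use[SKETCH_NUM_WORDS];
   sketch_word livein[SKETCH_NUM_WORDS];
   sketch_word liveout[SKETCH_NUM_WORDS];
   int succ;                  /* index of the single successor, or -1 */
};

/* Iterate to a fixed point, handling 32 variables per loop iteration. */
static void sketch_compute_live(struct sketch_block *bd, int num_blocks)
{
   int cont = 1;

   assert(num_blocks > 0);
   while (cont) {
      cont = 0;
      for (int b = 0; b < num_blocks; b++) {
         /* livein = use | (liveout & ~def) */
         for (int i = 0; i < SKETCH_NUM_WORDS; i++) {
            sketch_word new_livein =
               bd[b].use[i] | (bd[b].liveout[i] & ~bd[b].def[i]);
            if (new_livein & ~bd[b].livein[i]) {
               bd[b].livein[i] |= new_livein;
               cont = 1;
            }
         }
         if (bd[b].succ < 0)
            continue;
         /* liveout |= livein of the successor */
         for (int i = 0; i < SKETCH_NUM_WORDS; i++) {
            sketch_word new_liveout =
               bd[bd[b].succ].livein[i] & ~bd[b].liveout[i];
            if (new_liveout) {
               bd[b].liveout[i] |= new_liveout;
               cont = 1;
            }
         }
      }
   }
}

/* Block 0 defines variable 3 and falls through to block 1, which reads
 * variables 3 and 40. */
static void sketch_two_block_example(struct sketch_block bd[2])
{
   memset(bd, 0, 2 * sizeof(bd[0]));
   bd[0].succ = 1;
   bd[1].succ = -1;
   SKETCH_SET(bd[0].def, 3);
   SKETCH_SET(bd[1].use, 3);
   SKETCH_SET(bd[1].use, 40);
   sketch_compute_live(bd, 2);
}
```

In the two-block example, variable 40 ends up live into block 0 because nothing defines it, while variable 3 is live out of block 0 but not live into it.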

Re: [Mesa-dev] [PATCH] i965: Implement ARB_texture_mirror_clamp.

2013-10-21 Thread Kenneth Graunke

On 10/21/2013 12:58 AM, Rico Schüller wrote:
> I have one minor nitpick (see below). But either way, with the subject
> fixed (as mentioned by Matt), this is:
> Reviewed-by: Rico Schüller 
> 
> On 21.10.2013 07:24, Kenneth Graunke wrote:
>> This passes Piglit's texwrap tests (after applying Rico's patch to
>> make them use this extension).
>>
>> Cc: Rico Schüller 
>> Cc: Ian Romanick 
>> Signed-off-by: Kenneth Graunke 
>> ---
>>  src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 2 ++
>>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>>  2 files changed, 3 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c 
>> b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
>> index b716d61..db7ab60 100644
>> --- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
>> +++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
>> @@ -71,6 +71,8 @@ translate_wrap_mode(GLenum wrap, bool using_nearest)
>>        return BRW_TEXCOORDMODE_CLAMP_BORDER;
>>     case GL_MIRRORED_REPEAT:
>>        return BRW_TEXCOORDMODE_MIRROR;
>> +   case GL_MIRROR_CLAMP_TO_EDGE_EXT:
>> +      return BRW_TEXCOORDMODE_MIRROR_ONCE;
> I'd prefer GL_MIRROR_CLAMP_TO_EDGE instead of
> GL_MIRROR_CLAMP_TO_EDGE_EXT but as it is the same value it really
> shouldn't matter.

Me too.  My system GL headers didn't have "GL_MIRROR_CLAMP_TO_EDGE", so
I didn't realize it existed.  But it does!  Thanks!

--Ken

Re: [Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.

2013-10-21 Thread Eric Anholt

Chia-I Wu  writes:

> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner  wrote:
>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt  wrote:
>>> Previously, the best thing we had was to schedule the things unblocked by
>>> the current instruction, on the hope that it would be consuming two values
>>> at the end of their live intervals while only producing one new value.
>>> Sometimes that wasn't the case.
>>>
>>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>>> will probably be the virtual_grf_def[] instruction after computing live
>>> intervals again), penalize it by how many regs it would take up.  When an
>>> instruction is the last user of a GRF we have to schedule (when it will
>>> probably be the virtual_grf_end[] instruction), give it a boost by how
>>> many regs it would free.
>>>
>>> The new functions are made virtual (only 1 of 2 really needs to be
>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>> heuristic over to the vec4 backend.
>>>
>>> shader-db:
>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>> instructions in affected programs: 10292 -> 9140 (-11.19%)
>>> GAINED:121
>>> LOST:  38
>>>
>>> Improves tropics performance at my current settings by 4.50602% +/-
>>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>>> GLB2.7 (n=11).
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>> ---
>>
>> I think we're on the right track by considering register pressure when
>> scheduling, but one aspect we're not considering is simply how many
>> registers we think we're using.
>>
>> If I understand correctly, the pre-register allocation wants to
>> shorten live intervals as much as possible which reduces register
>> pressure but at the cost of larger stalls and less instruction level
>> parallelism. We end up scheduling things like
>>
>> produce result 4
>> produce result 3
>> produce result 2
>> produce result 1
>> use result 1
>> use result 2
>> use result 3
>> use result 4
>>
>> (this is why the MRF writes for the FB write are always done in the
>> reverse order)
> In this example, it will actually be
>
>  produce result 4
>  use result 4
>  produce result 3
>  use result 3
>  produce result 2
>  use result 2
>  produce result 1
>  use result 1
>
> and post-regalloc will schedule again to something like
>
>  produce result 4
>  produce result 3
>  produce result 2
>  produce result 1
>  use result 4
>  use result 3
>  use result 2
>  use result 1
>
> The pre-regalloc scheduling attempts to consume the results as soon as
> they are available.
>
> FB write is done in reverse order because, when a result is available,
> its consumers are scheduled in reverse order.  The epilog of fragment
> shaders is usually like this:
>
>  placeholder_halt
>  mov m1, g1
>  mov m2, g2
>  mov m3, g3
>  mov m4, g4
>  send
>
> MOVs depend on placeholder_halt, and send depends on MOVs.  The
> scheduler will schedule it as follows:
>
>  placeholder_halt
>  mov m4, g4
>  mov m3, g3
>  mov m2, g2
>  mov m1, g1
>  send
>
> The order can be corrected with the change proposed here
>
>   http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>
> But there is no point in making the change if the current heuristic for
> pre-regalloc is to be reworked.

Flipping the order in which we prefer ties (on betterthanlifo-2):

commit 11a511576e465f02875f39c452561775a97416a1
Author: Eric Anholt 
Date:   Mon Oct 21 11:45:53 2013 -0700

otherway

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/
index 9a480b4..b123015 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
* it's the first use of a GRF, reduce its score since it means it
* should be increasing register pressure.
*/
-  for (schedule_node *node = (schedule_node *)instructions.get_tail();
-   node != instructions.get_head()->prev;
-   node = (schedule_node *)node->prev) {
+  for (schedule_node *node = (schedule_node *)instructions.get_head();
+   node != instructions.get_head()->next;
+   node = (schedule_node *)node->next) {
  schedule_node *n = (schedule_node *)node;
  fs_inst *inst = (fs_inst *)n->inst;

gives:

total instructions in shared programs: 1544638 -> 1546794 (0.14%)
instructions in affected programs: 7163 -> 9319 (30.10%)
GAINED:16
LOST:  289

with massive spilling on tropics, and a bit on lightsmark and csgo.
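
The register-pressure difference between the two orderings quoted earlier in this message can be counted directly. The following is a toy model only, assuming every produced value occupies one register and is consumed exactly once; the type and function names are hypothetical and unrelated to the scheduler's real data structures.

```c
#include <assert.h>

enum sketch_op { SKETCH_PRODUCE, SKETCH_USE };

struct sketch_inst {
   enum sketch_op op;
   int value;                 /* which result is produced or consumed */
};

/* Walk a schedule and return the largest number of simultaneously live
 * values, i.e. produced but not yet used. */
static int sketch_peak_live(const struct sketch_inst *sched, int n)
{
   int live = 0, peak = 0;

   for (int i = 0; i < n; i++) {
      if (sched[i].op == SKETCH_PRODUCE) {
         live++;
         if (live > peak)
            peak = live;
      } else {
         live--;
         assert(live >= 0);   /* a use must follow its producer */
      }
   }
   return peak;
}

/* Each result is consumed right after it is produced. */
static const struct sketch_inst sketch_interleaved[8] = {
   { SKETCH_PRODUCE, 4 }, { SKETCH_USE, 4 },
   { SKETCH_PRODUCE, 3 }, { SKETCH_USE, 3 },
   { SKETCH_PRODUCE, 2 }, { SKETCH_USE, 2 },
   { SKETCH_PRODUCE, 1 }, { SKETCH_USE, 1 },
};

/* All results are produced first, then consumed. */
static const struct sketch_inst sketch_batched[8] = {
   { SKETCH_PRODUCE, 4 }, { SKETCH_PRODUCE, 3 },
   { SKETCH_PRODUCE, 2 }, { SKETCH_PRODUCE, 1 },
   { SKETCH_USE, 4 }, { SKETCH_USE, 3 },
   { SKETCH_USE, 2 }, { SKETCH_USE, 1 },
};
```

The interleaved order peaks at one live value and the batched order at four, which is the trade-off under discussion: the first keeps pressure low at the cost of back-to-back dependencies, the second hides latency but needs more registers.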



Re: [Mesa-dev] libGL without X

2013-10-21 Thread Chris Healy

Ken,

I assume the new ABI for libOpenGL.so is not far enough along to be usable
in production, correct?

Our application is quite big and already written against OpenGL, so moving
to GLESv2 or 3.0 would be a considerable effort; that is not an option.

Do you know the minimal amount of X libs necessary to support building
libGL?


On Mon, Oct 21, 2013 at 11:56 AM, Kenneth Graunke wrote:

> On 10/21/2013 07:05 AM, Chris Healy wrote:
> > I have a headless platform I need OpenGL to work on that does not have
> > X.  It is x86 with Intel HD 4000 graphics.
> >
> > Ultimately, I'm just wanting to use OpenGL to render to memory for
> > encoding to H.264 and streaming.
> >
> > I'm trying to build Mesa for this platform without X and cannot get it
> > to build libGL.so.
> >
> > What am I missing here?  Is it not possible to use OpenGL without X?  I
> > was hoping I could use OpenGL with EGL for testing purposes.
>
> Unfortunately, libGL.so contains both the OpenGL and GLX interfaces, so
> I don't think it's possible today.  People are working on a new ABI,
> libOpenGL.so, which doesn't include GLX.  So eventually, it should be
> possible.
>
> You can definitely use EGL + OpenGL ES 3.0 (libGLESv2.so) today.
>
> --Ken
>

Re: [Mesa-dev] Ivybridge support for ARB_transform_feedback2

2013-10-21 Thread Kenneth Graunke

On 10/21/2013 08:40 AM, Ian Romanick wrote:
> On 10/17/2013 11:09 PM, Kenneth Graunke wrote:
>> Here's my implementation of ARB_transform_feedback2.  I believe it's
>> complete; it passes all of our Piglit tests and a lot of Intel's
>> oglconform tests.
>>
>> This should work out of the box on Ivybridge and Baytrail.  It won't
>> work on Haswell at the moment, due to restrictions on register writes
>> (to be solved in a future kernel version).  Patch 9 will need to be
>> replaced with something that detects whether or not we can write
>> registers from userspace batchbuffers.
>>
>> In the meantime, I figured I'd send out the rest for review.
>>
>> Porting this back to Sandybridge is probably doable, but annoying.
>> Sandybridge doesn't have the MI_LOAD_REGISTER_MEM command, so we'd have
>> to map the buffers and use MI_LOAD_REGISTER_IMM.  Seems pretty gross.
>> Plus, transform feedback is done very differently pre-Ivybridge.  I'm
>> not sure it's worth it, seeing as it's a GL 4.0 feature.
> 
> I assume this is just to support glDrawTransformFeedback?

No, it's to support glResumeTransformFeedback.

glDrawTransformFeedback actually just reads pipeline statistics counters
and leaves them free-running.

> Can you add that information to http://dri.freedesktop.org/wiki/I965Todo/ ?

Actually, I'm probably wrong...on Gen7 we use MI_LOAD_REGISTER_MEM to
copy offsets into the SO_WRITE_OFFSET(n) registers.  But on Sandybridge,
XFB is done using the geometry shader, so it works entirely differently.
 I don't think there is a register to load.

I'll just let whoever looks into it figure it out.  Not much insight anyway.

--Ken

Re: [Mesa-dev] libGL without X

2013-10-21 Thread Kenneth Graunke

On 10/21/2013 07:05 AM, Chris Healy wrote:
> I have a headless platform I need OpenGL to work on that does not have
> X.  It is x86 with Intel HD 4000 graphics.
> 
> Ultimately, I'm just wanting to use OpenGL to render to memory for
> encoding to H.264 and streaming.
> 
> I'm trying to build Mesa for this platform without X and cannot get it
> to build libGL.so.
> 
> What am I missing here?  Is it not possible to use OpenGL without X?  I
> was hoping I could use OpenGL with EGL for testing purposes.

Unfortunately, libGL.so contains both the OpenGL and GLX interfaces, so
I don't think it's possible today.  People are working on a new ABI,
libOpenGL.so, which doesn't include GLX.  So eventually, it should be
possible.

You can definitely use EGL + OpenGL ES 3.0 (libGLESv2.so) today.

--Ken

Re: [Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause

2013-10-21 Thread Vincent Lejeune





- Original message -
> From: Tom Stellard 
> To: llvm-comm...@cs.uiuc.edu
> Cc: mesa-dev@lists.freedesktop.org; Tom Stellard 
> Sent: Friday, 11 October 2013, 20:10
> Subject: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
> 
> From: Tom Stellard 
> 
> Reading the special OQAP register pops the top value off the LDS
> input queue and returns it to the instruction.  This queue is
> invalidated at the end of an ALU clause and leaving values in the queue
> can lead to GPU hangs.  This means that if we load a value into the queue,
> we must use it before the end of the clause.
> 
> This fixes some hangs in the OpenCV test suite.
> ---
> lib/Target/R600/R600MachineScheduler.cpp | 25 +
> lib/Target/R600/R600MachineScheduler.h   |  4 ++--
> test/CodeGen/R600/lds-input-queue.ll     | 26 ++
> 3 files changed, 41 insertions(+), 14 deletions(-)
> create mode 100644 test/CodeGen/R600/lds-input-queue.ll
> 
> diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
> b/lib/Target/R600/R600MachineScheduler.cpp
> index 6c26d9e..611b7f4 100644
> --- a/lib/Target/R600/R600MachineScheduler.cpp
> +++ b/lib/Target/R600/R600MachineScheduler.cpp
> @@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) 
> {
>    }
> 
> 
> -  // We want to scheduled AR defs as soon as possible to make sure they 
> aren't
> -  // put in a different ALU clause from their uses.
> -  if (!SU && !UnscheduledARDefs.empty()) {
> -      SU = UnscheduledARDefs[0];
> -      UnscheduledARDefs.erase(UnscheduledARDefs.begin());
> +  // We want to scheduled defs that cannot be live outside of this clause 
> +  // as soon as possible to make sure they aren't put in a different
> +  // ALU clause from their uses.
> +  if (!SU && !UnscheduledNoLiveOutDefs.empty()) {
> +      SU = UnscheduledNoLiveOutDefs[0];
> +      UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin());
>        NextInstKind = IDAlu;
>    }
> 
> @@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) 
> {
> 
>    // We want to schedule the AR uses as late as possible to make sure that
>    // the AR defs have been released.
> -  if (!SU && !UnscheduledARUses.empty()) {
> -      SU = UnscheduledARUses[0];
> -      UnscheduledARUses.erase(UnscheduledARUses.begin());
> +  if (!SU && !UnscheduledNoLiveOutUses.empty()) {
> +      SU = UnscheduledNoLiveOutUses[0];
> +      UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin());

Can we use std::queue instead of std::vector for
UnscheduledNoLiveOutUses?
I had to use a vector because in some cases I needed to pop an SUnit other
than the topmost one (to fit the Instruction Group const read limitation),
but I would rather avoid the erase(iterator) call when possible.
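
As a rough illustration of the trade-off (the names below are hypothetical stand-ins, not the scheduler code): erase(begin()) on a vector shifts every remaining element, while a std::deque pops its front in constant time and, unlike std::queue, still allows removing a non-front element when that is occasionally needed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for llvm::SUnit; only the node number matters here.
struct Node { int num; };

// Pop the front of a vector: erase(begin()) moves every remaining element
// down by one slot, so each pop costs O(n).
inline Node *popFrontVector(std::vector<Node *> &q) {
  assert(!q.empty());
  Node *n = q.front();
  q.erase(q.begin());
  return n;
}

// Pop the front of a deque: O(1), nothing is shifted.
inline Node *popFrontDeque(std::deque<Node *> &q) {
  assert(!q.empty());
  Node *n = q.front();
  q.pop_front();
  return n;
}

// A deque can still drop an element from the middle, which std::queue
// cannot do at all.
inline Node *popAtDeque(std::deque<Node *> &q, std::size_t idx) {
  assert(idx < q.size());
  Node *n = q[idx];
  q.erase(q.begin() + static_cast<std::ptrdiff_t>(idx));
  return n;
}
```

Whether this matters in practice depends on how long these lists get; for a handful of entries the vector is probably fine.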


>        NextInstKind = IDAlu;
>    }
> 
> @@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit *SU) 
> {
> 
>    int IK = getInstKind(SU);
> 
> -  // Check for AR register defines
> +  // Check for registers that do not live across ALU clauses.
>    for (MachineInstr::const_mop_iterator I = 
> SU->getInstr()->operands_begin(),
>                                          E = 
> SU->getInstr()->operands_end();
>                                          I != E; ++I) {
> -    if (I->isReg() && I->getReg() == AMDGPU::AR_X) {
> +    if (I->isReg() && (I->getReg() == AMDGPU::AR_X || 
> I->getReg() == AMDGPU::OQAP)) {
>        if (I->isDef()) {
> -        UnscheduledARDefs.push_back(SU);
> +        UnscheduledNoLiveOutDefs.push_back(SU);
>        } else {
> -        UnscheduledARUses.push_back(SU);
> +        UnscheduledNoLiveOutUses.push_back(SU);
>        }
>        return;
>      }
> diff --git a/lib/Target/R600/R600MachineScheduler.h 
> b/lib/Target/R600/R600MachineScheduler.h
> index 0a6f120..db2e188 100644
> --- a/lib/Target/R600/R600MachineScheduler.h
> +++ b/lib/Target/R600/R600MachineScheduler.h
> @@ -53,8 +53,8 @@ class R600SchedStrategy : public MachineSchedStrategy {
> 
>    std::vector<SUnit *> Available[IDLast], Pending[IDLast];
>    std::vector<SUnit *> AvailableAlus[AluLast];
> -  std::vector<SUnit *> UnscheduledARDefs;
> -  std::vector<SUnit *> UnscheduledARUses;
> +  std::vector<SUnit *> UnscheduledNoLiveOutDefs;
> +  std::vector<SUnit *> UnscheduledNoLiveOutUses;
>    std::vector<SUnit *> PhysicalRegCopy;
> 
>    InstKind CurInstKind;
> diff --git a/test/CodeGen/R600/lds-input-queue.ll 
> b/test/CodeGen/R600/lds-input-queue.ll
> new file mode 100644
> index 000..548b41c
> --- /dev/null
> +++ b/test/CodeGen/R600/lds-input-queue.ll
> @@ -0,0 +1,26 @@
> +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s

Does the test work with the -verify-machineinstrs flag set?

> +;
> +; This test checks that the lds input queue will is empty at the end of
> +; the ALU clause.
> +
> +; CHECK-LABEL: @lds_input_queue
> +; CHECK: LDS_READ_RET * OQAP
> +; CHECK-NOT: ALU clause
> +; CHECK: MOV T{{[0-9]\.[XYZW]}}, OQAP
> +
> +@local_mem = internal addrspace(3) unnamed_addr global [2 x i32] [i32 1, i32 
> 2], align 4
>

Re: [Mesa-dev] libGL without X

2013-10-21 Thread Chris Healy

Actually, I think I found the answer to the minimal amount of X libs
necessary.

I got rid of --disable-glx from my build config and ran into the following
when running configure:

checking for XF86VIDMODE... no
checking for DRIGL... no
configure: error: Package requirements (x11 xext xdamage xfixes x11-xcb
xcb-glx >= 1.8.1 xcb-dri2 >= 1.8) were not met:

No package 'x11' found
No package 'xext' found
No package 'xdamage' found
No package 'xfixes' found
No package 'x11-xcb' found
No package 'xcb-glx' found
No package 'xcb-dri2' found

Is there a way around needing all of these just to build libGL if I just
want to run OpenGL with EGL and write to memory?


On Mon, Oct 21, 2013 at 12:03 PM, Chris Healy  wrote:

> Ken,
>
> I assume the new ABI for libOpenGL.so is not far enough along to be usable
> in production, correct?
>
> Our application is quite big and already written against OpenGL so moving
> to GLESv2 or 3.0 would be a considerable effort so this is not an option.
>
> Do you know the minimal amount of X libs necessary to support building
> libGL?
>
>
> On Mon, Oct 21, 2013 at 11:56 AM, Kenneth Graunke 
> wrote:
>
>> On 10/21/2013 07:05 AM, Chris Healy wrote:
>> > I have a headless platform I need OpenGL to work on that does not have
>> > X.  It is x86 with Intel HD 4000 graphics.
>> >
>> > Ultimately, I'm just wanting to use OpenGL to render to memory for
>> > encoding to H.264 and streaming.
>> >
>> > I'm trying to build Mesa for this platform without X and cannot get it
>> > to build libGL.so.
>> >
>> > What am I missing here?  Is it not possible to use OpenGL without X?  I
>> > was hoping I could use OpenGL with EGL for testing purposes.
>>
>> Unfortunately, libGL.so contains both the OpenGL and GLX interfaces, so
>> I don't think it's possible today.  People are working on a new ABI,
>> libOpenGL.so, which doesn't include GLX.  So eventually, it should be
>> possible.
>>
>> You can definitely use EGL + OpenGL ES 3.0 (libGLESv2.so) today.
>>
>> --Ken
>>
>
>

[Mesa-dev] [PATCH 1/2] r600/llvm: Fix texbuf for pre EG gen

2013-10-21 Thread Vincent Lejeune

---
 src/gallium/drivers/r600/r600_llvm.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_llvm.c 
b/src/gallium/drivers/r600/r600_llvm.c
index 34dd3ad..d7fa5f8 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -427,6 +427,35 @@ static void llvm_emit_tex(
emit_data->output[0] = build_intrinsic(gallivm->builder,
"llvm.R600.load.texbuf",
emit_data->dst_type, 
args, 2, LLVMReadNoneAttribute);
+   if (ctx->chip_class >= EVERGREEN)
+   return;
+   ctx->uses_tex_buffers = true;
+   LLVMDumpValue(emit_data->output[0]);
+   emit_data->output[0] = 
LLVMBuildBitCast(gallivm->builder,
+   emit_data->output[0], 
LLVMVectorType(bld_base->base.int_elem_type, 4),
+   "");
+   LLVMValueRef Mask = llvm_load_const_buffer(bld_base,
+   lp_build_const_int32(gallivm, 0),
+   LLVM_R600_BUFFER_INFO_CONST_BUFFER);
+   Mask = LLVMBuildBitCast(gallivm->builder, Mask,
+   LLVMVectorType(bld_base->base.int_elem_type, 
4), "");
+   emit_data->output[0] = 
lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_AND,
+   emit_data->output[0],
+   Mask);
+   LLVMValueRef WComponent = 
LLVMBuildExtractElement(gallivm->builder,
+   emit_data->output[0], 
lp_build_const_int32(gallivm, 3), "");
+   Mask = llvm_load_const_buffer(bld_base, 
lp_build_const_int32(gallivm, 1),
+   LLVM_R600_BUFFER_INFO_CONST_BUFFER);
+   Mask = LLVMBuildExtractElement(gallivm->builder, Mask,
+   lp_build_const_int32(gallivm, 0), "");
+   Mask = LLVMBuildBitCast(gallivm->builder, Mask,
+   bld_base->base.int_elem_type, "");
+   WComponent = lp_build_emit_llvm_binary(bld_base, 
TGSI_OPCODE_OR,
+   WComponent, Mask);
+   emit_data->output[0] = 
LLVMBuildInsertElement(gallivm->builder,
+   emit_data->output[0], WComponent, 
lp_build_const_int32(gallivm, 3), "");
+   emit_data->output[0] = 
LLVMBuildBitCast(gallivm->builder,
+   emit_data->output[0], 
LLVMVectorType(bld_base->base.elem_type, 4), "");
}
return;
default:
-- 
1.8.3.1


[Mesa-dev] [PATCH 2/2] r600/llvm: Fix isampleBuffer on preEG

2013-10-21 Thread Vincent Lejeune

---
 src/gallium/drivers/r600/r600_llvm.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c 
b/src/gallium/drivers/r600/r600_llvm.c
index d7fa5f8..5afe3cb 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -415,9 +415,22 @@ static void llvm_emit_tex(
case TGSI_OPCODE_TXQ: {
struct radeon_llvm_context * ctx = 
radeon_llvm_context(bld_base);
ctx->uses_tex_buffers = true;
-   LLVMValueRef offset = 
lp_build_const_int32(bld_base->base.gallivm, 0);
+   bool isEgPlus = (ctx->chip_class >= EVERGREEN);
+   LLVMValueRef offset = 
lp_build_const_int32(bld_base->base.gallivm,
+   isEgPlus ? 0 : 1);
LLVMValueRef cvecval = llvm_load_const_buffer(bld_base, 
offset,
LLVM_R600_BUFFER_INFO_CONST_BUFFER);
+   if (!isEgPlus) {
+   LLVMValueRef maskval[4] = {
+   lp_build_const_int32(gallivm, 1),
+   lp_build_const_int32(gallivm, 2),
+   lp_build_const_int32(gallivm, 3),
+   lp_build_const_int32(gallivm, 0),
+   };
+   LLVMValueRef mask = LLVMConstVector(maskval, 4);
+   cvecval = 
LLVMBuildShuffleVector(gallivm->builder, cvecval, cvecval,
+   mask, "");
+   }
emit_data->output[0] = cvecval;
return;
}
-- 
1.8.3.1


Re: [Mesa-dev] libGL without X

2013-10-21 Thread Erik Faye-Lund

On Mon, Oct 21, 2013 at 4:05 PM, Chris Healy  wrote:
> I have a headless platform I need OpenGL to work on that does not have X.
> It is x86 with Intel HD 4000 graphics.
>
> Ultimately, I'm just wanting to use OpenGL to render to memory for encoding
> to H.264 and streaming.
>
> I'm trying to build Mesa for this platform without X and cannot get it to
> build libGL.so.
>
> What am I missing here?  Is it not possible to use OpenGL without X?  I was
> hoping I could use OpenGL with EGL for testing purposes.

If you build mesa with GBM support, you should be able to render without X.

Re: [Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.

2013-10-21 Thread Eric Anholt

Matt Turner  writes:

> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt  wrote:
>> Previously, the best thing we had was to schedule the things unblocked by
>> the current instruction, on the hope that it would be consuming two values
>> at the end of their live intervals while only producing one new value.
>> Sometimes that wasn't the case.
>>
>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>> will probably be the virtual_grf_def[] instruction after computing live
>> intervals again), penalize it by how many regs it would take up.  When an
>> instruction is the last user of a GRF we have to schedule (when it will
>> probably be the virtual_grf_end[] instruction), give it a boost by how
>> many regs it would free.
>>
>> The new functions are made virtual (only 1 of 2 really needs to be
>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>> heuristic over to the vec4 backend.
>>
>> shader-db:
>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>> instructions in affected programs: 10292 -> 9140 (-11.19%)
>> GAINED:121
>> LOST:  38
>>
>> Improves tropics performance at my current settings by 4.50602% +/-
>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>> GLB2.7 (n=11).
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>> ---
>
> I think we're on the right track by considering register pressure when
> scheduling, but one aspect we're not considering is simply how many
> registers we think we're using.
>
> If I understand correctly, the pre-register allocation wants to
> shorten live intervals as much as possible which reduces register
> pressure but at the cost of larger stalls and less instruction level
> parallelism. We end up scheduling things like
>
> produce result 4
> produce result 3
> produce result 2
> produce result 1
> use result 1
> use result 2
> use result 3
> use result 4
>
> (this is why the MRF writes for the FB write are always done in the
> reverse order)
>
> Take the main shader from FillTestC24Z16 in GLB2.5 or 2.7 as an
> example. Before texture-grf we serialized the eight texture sends.
> After that branch landed, we scheduled the code much better, leading
> to a performance improvement.
>
> This patch causes us again to serialize the 8 texture ops in
> GLB25_FillTestC24Z16, like we did before texture-from-grf. It reduces
> performance from 7.0 billion texels/sec to ~6.5 on IVB.

This is mostly a problem, as far as I can see, of unfortunate GRF
choices between the send sources and dests.  I haven't seen an easy way
out of that beyond what we're doing with the round_robin flag in the
register allocator already, so let's play with scheduling some more for
the moment...

> Can we accurately track the number of registers in use and decide what
> to do based on that?

An attempt to do this is on betterthanlifo-3 of my tree.  The quick
results:

total instructions in shared programs: 1599565 -> 1599757 (0.01%)
instructions in affected programs: 2014 -> 2206 (9.53%)
GAINED:22
LOST:  110

That's not at all what I hoped for.  But maybe the problem is that we
end up faced with a ton of multiplies of components of texture results
and we don't know which one we should pick next once we've picked one of
them?  Maybe if we give a higher weight to things that will help finish
off a VGRF's use?  I present betterthanlifo-6:

anholt@eliezer:anholt/src/shader-db% ./report.py sched-lifo3 sched-lifo6 
total instructions in shared programs: 1606060 -> 1606060 (0.00%)
instructions in affected programs: 0 -> 0
GAINED:0
LOST:  0

Well that wasn't the result I was expecting.  But it kinda makes sense:
Once we've scheduled processing of .x, the next thing we'll probably
choose even in the absence of weighting is .y, not some *other* texture
which had been inserted into the list at a totally separate time.

Looking at performance going from betterthanlifo-2 to betterthanlifo-3:

GLB2.7: 1.39845% +/- 0.797931% (n=15/16)
lm: No difference (n=3)
minecraft: No difference (n=10)
tropics: -4.12118% +/- 2.48834% (n=4)
nexuiz: No difference (n=8)
openarena: -1.46747% +/- 1.08201% (n=110)

At this point I think I want to go forward with -2 (this patch) as
opposed to -3.

(Note: Results presented in this thread, after the original patch
posting, are on top of glsl-cse, trying to reduce the significance of
that one crazy Tropics shader that spawned all this flailing about in
register allocation).
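
For anyone following along, the weighting described in the quoted patch description can be sketched roughly as below. This is only a model under assumptions: the Candidate struct, its fields, and the tie-breaking rule are hypothetical and not taken from the i965 scheduler.

```cpp
#include <cassert>

// Hypothetical summary of one schedulable instruction.
struct Candidate {
  int regs_first_used; // registers in GRFs this would be the first scheduled user of
  int regs_last_used;  // registers in GRFs this would be the last unscheduled user of
};

// Penalize starting new live ranges, reward ending existing ones.
inline int pressureBenefit(const Candidate &c) {
  return c.regs_last_used - c.regs_first_used;
}

// Return the index of the candidate with the highest benefit; ties go to
// the earliest entry (an arbitrary choice for this sketch).
inline int pickBest(const Candidate *cands, int n) {
  assert(n > 0);
  int best = 0;
  for (int i = 1; i < n; i++) {
    if (pressureBenefit(cands[i]) > pressureBenefit(cands[best]))
      best = i;
  }
  return best;
}
```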



Re: [Mesa-dev] glBufferSubData() GPU stall reduction (DOTA2 optimization).

2013-10-21 Thread Jordan Justen

The only feedback would be that it would be nice if patch 8
were broken down somewhat. But, the only suggestions I could think
of for possible items to split out were:
* intel_bufferobj_buffer: remove flag parameter
* set gpu_active_start/end in a lead-up patch

Another question for patch 8. Would gpu_active_start/end be
able to also handle the job of gpu_active? (Set them out of
range when !gpu_active?) Doesn't seem all that important
though.
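
To make the question concrete, here is a minimal sketch of what I have in mind. The struct and method names are hypothetical and the layout is an assumption, not the code from the series: an empty range (start greater than end) stands in for !gpu_active.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for the byte range of a buffer object that the
// GPU may still be using. An empty range (start > end) means "not active".
struct BufferBusyRange {
  uint32_t gpu_active_start = UINT32_MAX;
  uint32_t gpu_active_end = 0;

  bool gpuActive() const { return gpu_active_start < gpu_active_end; }

  // Grow the range to cover [offset, offset + size).
  void markGpuUsed(uint32_t offset, uint32_t size) {
    gpu_active_start = std::min(gpu_active_start, offset);
    gpu_active_end = std::max(gpu_active_end, offset + size);
  }

  // Reset to the empty range once the GPU is known to be done.
  void markInactive() {
    gpu_active_start = UINT32_MAX;
    gpu_active_end = 0;
  }

  // A CPU write only has to wait if it overlaps the active range.
  bool writeWouldStall(uint32_t offset, uint32_t size) const {
    return offset < gpu_active_end && offset + size > gpu_active_start;
  }
};
```

With the empty-range convention the overlap test is false for every write, so no separate boolean is needed for the idle case.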

Series (with or without changes mentioned above):
Reviewed-by: Jordan Justen 

On Tue, 2013-10-08 at 14:00 -0700, Eric Anholt wrote:
> Since it sounds like valve won't be able to fix dota2's rendering to use
> ARB_mbr soon, here's a series to add just a little bit of tracking that
> works around most of the overhead of not using ARB_mbr with their
> rendering pattern.  7.69854% +/- 0.909163% (n=3) fps improvement with
> default settings.  We could also leverage this for some apps that misuse
> ARB_mbr in the future.
> 
> This doesn't look like it affects GLB2.7, which has a very special-looking
> access pattern to its BO.
> 
> Code is also on the "subdata" branch of my tree.
> 


Re: [Mesa-dev] [PATCH] i965: Only emit interpolation setup if there are actual FS inputs.

2013-10-21 Thread Eric Anholt

Kenneth Graunke  writes:

> Dead code elimination would get rid of the extra instructions, but
> skipping this saves iterations through the optimization loop.
>
> From shader-db:
>
>   N      Min  Max  Median  Avg        Stddev
> x 14672  3    16   3       3.1334515  0.59904168
> + 14672  1    16   3       2.8955153  0.77732963
> Difference at 95.0% confidence
> -0.237936 +/- 0.0158798
> -7.59342% +/- 0.506783%
> (Student's t, pooled s = 0.693935)
>
> Embarrassingly, the classic shadow mapping shader:
>
>void main() { }
>
> used to require three iterations through the optimization loop.
> With this patch, it only requires one (which makes no progress).

Reviewed-by: Eric Anholt 



[Mesa-dev] [PATCH 1/2] i965: Fix glTexImage when packing alignment != cpp

2013-10-21 Thread Chad Versace

Fixes texture corruption of Weston clients on cairo-glesv2 backend.
Commit 49ed599 introduced the bug.

Corruption occurred when glTexSubImage called
intel_texsubimage_tiled_memcpy() with:
  x,y=10,9
  w,h=7,7
  format=GL_ALPHA(0x1906)
  type=GL_UNSIGNED_BYTE(0x1401)
  gl_format=MESA_FORMAT_A8(0x18)
  packing.alignment=4

The function miscalculated the source image's stride as w*cpp=7 without
taking into account the packing alignment. The actual stride was 8.

CC: Frank Henigman 
CC: Kristian Høgsberg 
Reported-by: U. Artie Eoff 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70435
Signed-off-by: Chad Versace 
---


This series lives on my branch bug-70435. Kristian verified that it
fixed weston-terminal.


 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index 5cfdbd9..157108f 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -27,6 +27,7 @@
  **/
 
 #include "main/bufferobj.h"
+#include "main/image.h"
 #include "main/macros.h"
 #include "main/mtypes.h"
 #include "main/pbo.h"
@@ -532,6 +533,7 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
 {
struct brw_context *brw = brw_context(ctx);
struct intel_texture_image *image = intel_texture_image(texImage);
+   int src_pitch;
 
/* The miptree's buffer. */
drm_intel_bo *bo;
@@ -544,6 +546,11 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
/* This fastpath is restricted to specific texture types: level 0 of
 * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support
 * more types.
+*
+* FINISHME: The restrictions below on packing alignment and packing row
+* length are likely unneeded now because we calculate the source stride
+* with _mesa_image_row_stride. However, before removing the restrictions
+* we need tests.
 */
if (!brw->has_llc ||
type != GL_UNSIGNED_BYTE ||
@@ -609,6 +616,8 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
   return false;
}
 
+   src_pitch = _mesa_image_row_stride(packing, width, format, type);
+
/* We postponed printing this message until having committed to executing
 * the function.
 */
@@ -618,8 +627,8 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
linear_to_tiled(
   xoffset * cpp, (xoffset + width) * cpp,
   yoffset, yoffset + height,
-  bo->virtual, pixels - (xoffset + yoffset * width) * cpp,
-  image->mt->region->pitch, width * cpp,
+  bo->virtual, pixels - yoffset * src_pitch - xoffset * cpp,
+  image->mt->region->pitch, src_pitch,
   brw->has_swizzling,
   image->mt->region->tiling,
   mem_copy
-- 
1.8.3.1


[Mesa-dev] [PATCH 2/2] i965: Print more debuginfo in intel_texsubimage_memcpy()

2013-10-21 Thread Chad Versace

Print info about packing, format, type, and tiling. This will help debug
future issues with this fastpath.

CC: Frank Henigman 
Signed-off-by: Chad Versace 
---
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index 157108f..0384bcc 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -621,8 +621,14 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
/* We postponed printing this message until having committed to executing
 * the function.
 */
-   DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d)\n",
-   __FUNCTION__, texImage->Level, xoffset, yoffset, width, height);
+   DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d) format=0x%x type=0x%x "
+   "gl_format=0x%x tiling=%d "
+   "packing=(alignment=%d row_length=%d skip_pixels=%d skip_rows=%d) "
+   "for_glTexImage=%d\n",
+   __FUNCTION__, texImage->Level, xoffset, yoffset, width, height,
+   format, type, texImage->TexFormat, image->mt->region->tiling,
+   packing->Alignment, packing->RowLength, packing->SkipPixels,
+   packing->SkipRows, for_glTexImage);
 
linear_to_tiled(
   xoffset * cpp, (xoffset + width) * cpp,
-- 
1.8.3.1


Re: [Mesa-dev] [PATCH 1/3] glsl: Use saved values instead of recomputing them.

2013-10-21 Thread Paul Berry

On 16 October 2013 16:56, Matt Turner  wrote:

> ---
>  src/glsl/opt_algebraic.cpp | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
>

Series is:

Reviewed-by: Paul Berry 


>
> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
> index 3e5802e..b915f3c 100644
> --- a/src/glsl/opt_algebraic.cpp
> +++ b/src/glsl/opt_algebraic.cpp
> @@ -257,11 +257,9 @@ ir_algebraic_visitor::handle_expression(ir_expression
> *ir)
> * folding.
> */
>if (op_const[0] && !op_const[1])
> -reassociate_constant(ir, 0, op_const[0],
> - ir->operands[1]->as_expression());
> +reassociate_constant(ir, 0, op_const[0], op_expr[1]);
>if (op_const[1] && !op_const[0])
> -reassociate_constant(ir, 1, op_const[1],
> - ir->operands[0]->as_expression());
> +reassociate_constant(ir, 1, op_const[1], op_expr[0]);
>break;
>
> case ir_binop_sub:
> @@ -315,11 +313,9 @@ ir_algebraic_visitor::handle_expression(ir_expression
> *ir)
> * constant folding.
> */
>if (op_const[0] && !op_const[1])
> -reassociate_constant(ir, 0, op_const[0],
> - ir->operands[1]->as_expression());
> +reassociate_constant(ir, 0, op_const[0], op_expr[1]);
>if (op_const[1] && !op_const[0])
> -reassociate_constant(ir, 1, op_const[1],
> - ir->operands[0]->as_expression());
> +reassociate_constant(ir, 1, op_const[1], op_expr[0]);
>
>break;
>
> --
> 1.8.3.2
>
>

[Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-21 Thread Bryan Cain

This is a port of 4da54c91d24da ("nvc0: implement multisample textures") to
nv50.

When coupled with the patch to only report 16 texture samplers (to fix
crashes), all of the Piglit tests in spec/arb_texture_multisample pass.
---
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp  |5 ++-
 src/gallium/drivers/nouveau/nv50/nv50_context.c|   46 
 src/gallium/drivers/nouveau/nv50/nv50_miptree.c|2 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |3 +-
 src/gallium/drivers/nouveau/nv50/nv50_tex.c|   20 +++--
 5 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index caaf09f..d5d1f1e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -569,6 +569,7 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i)
const int arg = i->tex.target.getArgCount();
const int dref = arg;
const int lod = i->tex.target.isShadow() ? (arg + 1) : arg;
+   const int lyr = arg - (i->tex.target.isMS() ? 2 : 1);
 
// dref comes before bias/lod
if (i->tex.target.isShadow())
@@ -577,11 +578,11 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i)
 
// array index must be converted to u32
if (i->tex.target.isArray()) {
-  Value *layer = i->getSrc(arg - 1);
+  Value *layer = i->getSrc(lyr);
   LValue *src = new_LValue(func, FILE_GPR);
   bld.mkCvt(OP_CVT, TYPE_U32, src, TYPE_F32, layer);
   bld.mkOp2(OP_MIN, TYPE_U32, src, src, bld.loadImm(NULL, 511));
-  i->setSrc(arg - 1, src);
+  i->setSrc(lyr, src);
 
   if (i->tex.target.isCube()) {
  std::vector acube, a2d;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.c 
b/src/gallium/drivers/nouveau/nv50/nv50_context.c
index b6bdf79..45e3f5d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.c
@@ -194,6 +194,10 @@ nv50_invalidate_resource_storage(struct nouveau_context 
*ctx,
return ref;
 }
 
+static void
+nv50_context_get_sample_position(struct pipe_context *, unsigned, unsigned,
+ float *);
+
 struct pipe_context *
 nv50_create(struct pipe_screen *pscreen, void *priv)
 {
@@ -237,6 +241,7 @@ nv50_create(struct pipe_screen *pscreen, void *priv)
 
pipe->flush = nv50_flush;
pipe->texture_barrier = nv50_texture_barrier;
+   pipe->get_sample_position = nv50_context_get_sample_position;
 
if (!screen->cur_ctx) {
   screen->cur_ctx = nv50;
@@ -315,3 +320,44 @@ nv50_bufctx_fence(struct nouveau_bufctx *bufctx, boolean 
on_flush)
  nv50_resource_validate(res, (unsigned)ref->priv_data);
}
 }
+
+static void
+nv50_context_get_sample_position(struct pipe_context *pipe,
+ unsigned sample_count, unsigned sample_index,
+ float *xy)
+{
+   static const uint8_t ms1[1][2] = { { 0x8, 0x8 } };
+   static const uint8_t ms2[2][2] = {
+  { 0x4, 0x4 }, { 0xc, 0xc } }; /* surface coords (0,0), (1,0) */
+   static const uint8_t ms4[4][2] = {
+  { 0x6, 0x2 }, { 0xe, 0x6 },   /* (0,0), (1,0) */
+  { 0x2, 0xa }, { 0xa, 0xe } }; /* (0,1), (1,1) */
+   static const uint8_t ms8[8][2] = {
+  { 0x1, 0x7 }, { 0x5, 0x3 },   /* (0,0), (1,0) */
+  { 0x3, 0xd }, { 0x7, 0xb },   /* (0,1), (1,1) */
+  { 0x9, 0x5 }, { 0xf, 0x1 },   /* (2,0), (3,0) */
+  { 0xb, 0xf }, { 0xd, 0x9 } }; /* (2,1), (3,1) */
+#if 0
+   /* NOTE: NVA3+ has alternative modes for MS2 and MS8, currently not used */
+   static const uint8_t ms8_alt[8][2] = {
+  { 0x9, 0x5 }, { 0x7, 0xb },   /* (2,0), (1,1) */
+  { 0xd, 0x9 }, { 0x5, 0x3 },   /* (3,1), (1,0) */
+  { 0x3, 0xd }, { 0x1, 0x7 },   /* (0,1), (0,0) */
+  { 0xb, 0xf }, { 0xf, 0x1 } }; /* (2,1), (3,0) */
+#endif
+
+   const uint8_t (*ptr)[2];
+
+   switch (sample_count) {
+   case 0:
+   case 1: ptr = ms1; break;
+   case 2: ptr = ms2; break;
+   case 4: ptr = ms4; break;
+   case 8: ptr = ms8; break;
+   default:
+  assert(0);
+  return; /* bad sample count -> undefined locations */
+   }
+   xy[0] = ptr[sample_index][0] * 0.0625f;
+   xy[1] = ptr[sample_index][1] * 0.0625f;
+}
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c 
b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
index 513d8f9..1963a4a 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
@@ -277,6 +277,8 @@ nv50_miptree_init_layout_tiled(struct nv50_miptree *mt)
 */
d = mt->layout_3d ? pt->depth0 : 1;
 
+   assert(!mt->ms_mode || !pt->last_level);
+
for (l = 0; l <= pt->last_level; ++l) {
   struct nv50_miptree_level *lvl = &mt->level[l];
   unsigned tsx, tsy, tsz;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_s

[Mesa-dev] Bug with glBlitFramebufferEXT(), and a piglit test for it

2013-10-21 Thread Federico Mena Quintero

Hello, everyone,

Attached is a new test for piglit which exposes a bug in Mesa's
software rendering (and another bug in hardware rendering, but that's
not its main purpose).

The bug is as follows.  With software rendering, after doing a buffer
swap, glBlitFramebufferEXT() appears to copy the whole framebuffer
instead of just the specified region.  This breaks clutter and cogl,
since they keep track of a dirty region themselves, and they use blits
instead of full buffer swaps to avoid updating the whole display on
every frame unnecessarily.

What is happening is actually a bit more complicated.
glBlitFramebufferEXT()'s basic machinery works correctly, but if there
has been a buffer swap before it, the following happens:

1. Draw some stuff (say, to GL_BACK)

2. Swap buffers.  As far as I can tell, this just causes an
XPutImage() from the GL_BACK buffer to the X window.

3. Draw some stuff to GL_BACK.

4. Do glBlitFramebufferEXT() from GL_BACK to GL_FRONT with the area you
are interested in.

5. Internally, Mesa sees that the buffer for GL_FRONT has not been
created yet, so it creates it and does the blit.

6. Do glFlush() so that GL_FRONT actually gets sent to the screen.  This
causes an XPutImage() of the *whole* of GL_FRONT, thus giving
incorrect results - the area that should have been updated is the one
from (4), i.e. just the blit.

If you run the test program with hardware acceleration, it will work
correctly.  But if you run it with LIBGL_ALWAYS_SOFTWARE=1, it will
fail.

For a related bug, do the following:  in the test program change the
line that says

  #define SWAP_BUFFERS_BEFORE_BLIT 1

from 1 to 0.  Run the program again; this time it will work correctly
with software rendering, but at least on my box it fails with hardware
rendering (Intel).

I don't know enough about Mesa's internals to fix this quickly.  Any
help is appreciated.

Thanks,

  Federico

From 5c565b6cb053b3917be826276b8e0d2254699a8f Mon Sep 17 00:00:00 2001
From: Federico Mena Quintero 
Date: Thu, 17 Oct 2013 14:52:31 -0500
Subject: [PATCH] fbo-blit-after-swap: New test for partial blits after a
 buffer swap

The clutter/cogl libraries try to minimize the area that gets updated on every frame.
They do this by doing glBlitFramebufferEXT() from the back buffer to the front buffer.

However, this is buggy with software rendering if there has been a buffer swap
*before* the first blit from the back buffer to the front buffer.  In this case,
Mesa copies the whole back buffer into the front buffer, instead of just the
requested region.
---
 tests/all.tests |   1 +
 tests/fbo/CMakeLists.gl.txt |   1 +
 tests/fbo/fbo-blit-after-swap.c | 136 
 3 files changed, 138 insertions(+)
 create mode 100644 tests/fbo/fbo-blit-after-swap.c

diff --git a/tests/all.tests b/tests/all.tests
index 7ab841e..6c92ebf 100644
--- a/tests/all.tests
+++ b/tests/all.tests
@@ -1163,6 +1163,7 @@ for format in ('rgba', 'depth', 'stencil'):
 test_name = ' '.join(['framebuffer-blit-levels', test_mode, format])
 arb_framebuffer_object[test_name] = PlainExecTest(test_name + ' -auto')
 add_plain_test(arb_framebuffer_object, 'fbo-alpha')
+add_plain_test(arb_framebuffer_object, 'fbo-blit-after-swap')
 add_plain_test(arb_framebuffer_object, 'fbo-blit-stretch')
 add_plain_test(arb_framebuffer_object, 'fbo-blit-scaled-linear')
 add_plain_test(arb_framebuffer_object, 'fbo-attachments-blit-scaled-linear')
diff --git a/tests/fbo/CMakeLists.gl.txt b/tests/fbo/CMakeLists.gl.txt
index 588fe26..3ad9ec0 100644
--- a/tests/fbo/CMakeLists.gl.txt
+++ b/tests/fbo/CMakeLists.gl.txt
@@ -31,6 +31,7 @@ piglit_add_executable (fbo-alpha fbo-alpha.c)
 piglit_add_executable (fbo-luminance-alpha fbo-luminance-alpha.c)
 piglit_add_executable (fbo-bind-renderbuffer fbo-bind-renderbuffer.c)
 piglit_add_executable (fbo-blit fbo-blit.c)
+piglit_add_executable (fbo-blit-after-swap fbo-blit-after-swap.c)
 piglit_add_executable (fbo-blit-d24s8 fbo-blit-d24s8.c)
 piglit_add_executable (fbo-blit-stretch fbo-blit-stretch.cpp)
 piglit_add_executable (fbo-blending-formats fbo-blending-formats.c)
diff --git a/tests/fbo/fbo-blit-after-swap.c b/tests/fbo/fbo-blit-after-swap.c
new file mode 100644
index 000..38fc870
--- /dev/null
+++ b/tests/fbo/fbo-blit-after-swap.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright 2013 Suse, Inc. 
+ * Copyright © 2011 Henri Verbeet 
+ * Copyright 2011 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * pa

Re: [Mesa-dev] [PATCH 1/2] r600/llvm: Fix texbuf for pre EG gen

2013-10-21 Thread Tom Stellard

On Mon, Oct 21, 2013 at 10:02:12PM +0200, Vincent Lejeune wrote:

Can you add an explanation to the commit messages for both patches about
what was wrong with the old code?

Thanks,
Tom

> ---
>  src/gallium/drivers/r600/r600_llvm.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/src/gallium/drivers/r600/r600_llvm.c 
> b/src/gallium/drivers/r600/r600_llvm.c
> index 34dd3ad..d7fa5f8 100644
> --- a/src/gallium/drivers/r600/r600_llvm.c
> +++ b/src/gallium/drivers/r600/r600_llvm.c
> @@ -427,6 +427,35 @@ static void llvm_emit_tex(
>   emit_data->output[0] = build_intrinsic(gallivm->builder,
>   "llvm.R600.load.texbuf",
>   emit_data->dst_type, 
> args, 2, LLVMReadNoneAttribute);
> + if (ctx->chip_class >= EVERGREEN)
> + return;
> + ctx->uses_tex_buffers = true;
> + LLVMDumpValue(emit_data->output[0]);
> + emit_data->output[0] = 
> LLVMBuildBitCast(gallivm->builder,
> + emit_data->output[0], 
> LLVMVectorType(bld_base->base.int_elem_type, 4),
> + "");
> + LLVMValueRef Mask = llvm_load_const_buffer(bld_base,
> + lp_build_const_int32(gallivm, 0),
> + LLVM_R600_BUFFER_INFO_CONST_BUFFER);
> + Mask = LLVMBuildBitCast(gallivm->builder, Mask,
> + LLVMVectorType(bld_base->base.int_elem_type, 
> 4), "");
> + emit_data->output[0] = 
> lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_AND,
> + emit_data->output[0],
> + Mask);
> + LLVMValueRef WComponent = 
> LLVMBuildExtractElement(gallivm->builder,
> + emit_data->output[0], 
> lp_build_const_int32(gallivm, 3), "");
> + Mask = llvm_load_const_buffer(bld_base, 
> lp_build_const_int32(gallivm, 1),
> + LLVM_R600_BUFFER_INFO_CONST_BUFFER);
> + Mask = LLVMBuildExtractElement(gallivm->builder, Mask,
> + lp_build_const_int32(gallivm, 0), "");
> + Mask = LLVMBuildBitCast(gallivm->builder, Mask,
> + bld_base->base.int_elem_type, "");
> + WComponent = lp_build_emit_llvm_binary(bld_base, 
> TGSI_OPCODE_OR,
> + WComponent, Mask);
> + emit_data->output[0] = 
> LLVMBuildInsertElement(gallivm->builder,
> + emit_data->output[0], WComponent, 
> lp_build_const_int32(gallivm, 3), "");
> + emit_data->output[0] = 
> LLVMBuildBitCast(gallivm->builder,
> + emit_data->output[0], 
> LLVMVectorType(bld_base->base.elem_type, 4), "");
>   }
>   return;
>   default:
> -- 
> 1.8.3.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] egl: Enable EGL_EXT_client_extensions

2013-10-21 Thread Chad Versace

Insert two fields into _egl_global to hold the client extensions and
statically initialize them:
ClientExtensions // a struct of bools
ClientExtensionString

Post-patch, Mesa supports exactly one client extension,
EGL_EXT_client_extensions.

Signed-off-by: Chad Versace 
---
 src/egl/main/eglapi.c | 8 +++-
 src/egl/main/eglglobals.c | 8 
 src/egl/main/eglglobals.h | 7 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 2d8653f..66f96de 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -87,6 +87,7 @@
 #include 
 #include 
 
+#include "eglglobals.h"
 #include "eglcontext.h"
 #include "egldisplay.h"
 #include "egltypedefs.h"
@@ -354,10 +355,15 @@ eglTerminate(EGLDisplay dpy)
 const char * EGLAPIENTRY
 eglQueryString(EGLDisplay dpy, EGLint name)
 {
-   _EGLDisplay *disp = _eglLockDisplay(dpy);
+   _EGLDisplay *disp;
_EGLDriver *drv;
const char *ret;
 
+   if (dpy == EGL_NO_DISPLAY && name == EGL_EXTENSIONS) {
+  RETURN_EGL_SUCCESS(NULL, _eglGlobal.ClientExtensionString);
+   }
+
+   disp = _eglLockDisplay(dpy);
_EGL_CHECK_DISPLAY(disp, NULL, drv);
ret = drv->API.QueryString(drv, disp, name);
 
diff --git a/src/egl/main/eglglobals.c b/src/egl/main/eglglobals.c
index f53f078..5c2fddf 100644
--- a/src/egl/main/eglglobals.c
+++ b/src/egl/main/eglglobals.c
@@ -47,6 +47,14 @@ struct _egl_global _eglGlobal =
   _eglUnloadDrivers, /* always called last */
   _eglFiniDisplay
},
+
+   /* ClientExtensions */
+   {
+  true /* EGL_EXT_client_extensions */
+   },
+
+   /* ClientExtensionString */
+   "EGL_EXT_client_extensions"
 };
 
 
diff --git a/src/egl/main/eglglobals.h b/src/egl/main/eglglobals.h
index b40e30e..63428f7 100644
--- a/src/egl/main/eglglobals.h
+++ b/src/egl/main/eglglobals.h
@@ -31,6 +31,7 @@
 #ifndef EGLGLOBALS_INCLUDED
 #define EGLGLOBALS_INCLUDED
 
+#include 
 
 #include "egltypedefs.h"
 #include "eglmutex.h"
@@ -48,6 +49,12 @@ struct _egl_global
 
EGLint NumAtExitCalls;
void (*AtExitCalls[10])(void);
+
+   struct _egl_client_extensions {
+  bool EXT_client_extensions;
+   } ClientExtensions;
+
+   const char *ClientExtensionString;
 };
 
 
-- 
1.8.3.1


[Mesa-dev] [Bug 70743] New: Compilation on VS2013

2013-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=70743

  Priority: medium
Bug ID: 70743
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: Compilation on VS2013
  Severity: normal
Classification: Unclassified
OS: All
  Reporter: scott.freedesk...@h4ck3r.net
  Hardware: Other
Status: NEW
   Version: unspecified
 Component: Mesa core
   Product: Mesa

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Mesa-dev] [Bug 70743] Compilation on VS2013

2013-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=70743

--- Comment #1 from Scott Graham  ---
Doesn't compile due to changes in VS2013's standard library.


[Mesa-dev] [Bug 70743] Compilation on VS2013

2013-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=70743

--- Comment #2 from Scott Graham  ---
Created attachment 87961
  --> https://bugs.freedesktop.org/attachment.cgi?id=87961&action=edit
compile fix for vs2013


[Mesa-dev] [Bug 70743] Compilation on VS2013

2013-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=70743

--- Comment #3 from Scott Graham  ---
Comment on attachment 87961
  --> https://bugs.freedesktop.org/attachment.cgi?id=87961
compile fix for vs2013

Index: include/c99/stdbool.h
===
--- include/c99/stdbool.h(revision 229946)
+++ include/c99/stdbool.h(working copy)
@@ -35,7 +35,8 @@
 #define bool _Bool

 /* For compilers that don't have the builtin _Bool type. */
-#if defined(_MSC_VER) || (__STDC_VERSION__ < 199901L && __GNUC__ < 3)
+#if (defined(_MSC_VER) && _MSC_VER < 1800) || \
+(defined __GNUC__&& __STDC_VERSION__ < 199901L && __GNUC__ < 3)
 typedef unsigned char _Bool;
 #endif

Index: src/mesa/main/querymatrix.c
===
--- src/mesa/main/querymatrix.c(revision 229946)
+++ src/mesa/main/querymatrix.c(working copy)
@@ -37,6 +37,7 @@
 #define FLOAT_TO_FIXED(x) ((GLfixed) ((x) * 65536.0))

 #if defined(_MSC_VER)
+#if _MSC_VER < 1800  // Not required on VS2013 and above.
 /* Oddly, the fpclassify() function doesn't exist in such a form
  * on MSVC.  This is an implementation using slightly different
  * lower-level Windows functions.
@@ -69,6 +70,7 @@
 return FP_NAN;
 }
 }
+#endif  // _MSC_VER < 1800

 #elif defined(__APPLE__) || defined(__CYGWIN__) || defined(__FreeBSD__) || \
  defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || \


[Mesa-dev] [Bug 70743] Compilation on VS2013

2013-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=70743

Scott Graham  changed:

   What|Removed |Added

  Attachment #87961|0   |1
is obsolete||

--- Comment #4 from Scott Graham  ---
Created attachment 87963
  --> https://bugs.freedesktop.org/attachment.cgi?id=87963&action=edit
fix compilation on 2013


[Mesa-dev] [PATCH] R600/SI: fix MIMG writemask adjustment

2013-10-21 Thread Marek Olšák

From: Marek Olšák 

This fixes piglit:
- shaders/glsl-fs-texture2d-masked
- shaders/glsl-fs-texture2d-masked-4

Signed-off-by: Marek Olšák 
Reviewed-by: Tom Stellard 
---
 lib/Target/R600/SIISelLowering.cpp | 27 +++--
 test/CodeGen/R600/llvm.SI.sample-masked.ll | 93 ++
 2 files changed, 114 insertions(+), 6 deletions(-)
 create mode 100644 test/CodeGen/R600/llvm.SI.sample-masked.ll

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 2c9270e..bfc9e8d 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -1065,7 +1065,9 @@ static unsigned SubIdx2Lane(unsigned Idx) {
 void SITargetLowering::adjustWritemask(MachineSDNode *&Node,
SelectionDAG &DAG) const {
   SDNode *Users[4] = { };
-  unsigned Writemask = 0, Lane = 0;
+  unsigned Lane = 0;
+  unsigned OldDmask = Node->getConstantOperandVal(0);
+  unsigned NewDmask = 0;
 
   // Try to figure out the used register components
   for (SDNode::use_iterator I = Node->use_begin(), E = Node->use_end();
@@ -1076,29 +1078,42 @@ void SITargetLowering::adjustWritemask(MachineSDNode *&Node,
 I->getMachineOpcode() != TargetOpcode::EXTRACT_SUBREG)
   return;
 
+// Lane means which subreg of %VGPRa_VGPRb_VGPRc_VGPRd is used.
+// Note that subregs are packed, i.e. Lane==0 is the first bit set
+// in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second bit
+// set, etc.
 Lane = SubIdx2Lane(I->getConstantOperandVal(1));
 
+// Set which texture component corresponds to the lane.
+unsigned Comp;
+for (unsigned i = 0, Dmask = OldDmask; i <= Lane; i++) {
+  assert(Dmask);
+  Comp = ffs(Dmask)-1;
+  Dmask &= ~(1 << Comp);
+}
+
 // Abort if we have more than one user per component
 if (Users[Lane])
   return;
 
 Users[Lane] = *I;
-Writemask |= 1 << Lane;
+NewDmask |= 1 << Comp;
   }
 
-  // Abort if all components are used
-  if (Writemask == 0xf)
+  // Abort if there's no change
+  if (NewDmask == OldDmask)
 return;
 
   // Adjust the writemask in the node
   std::vector Ops;
-  Ops.push_back(DAG.getTargetConstant(Writemask, MVT::i32));
+  Ops.push_back(DAG.getTargetConstant(NewDmask, MVT::i32));
   for (unsigned i = 1, e = Node->getNumOperands(); i != e; ++i)
 Ops.push_back(Node->getOperand(i));
   Node = (MachineSDNode*)DAG.UpdateNodeOperands(Node, Ops.data(), Ops.size());
 
   // If we only got one lane, replace it with a copy
-  if (Writemask == (1U << Lane)) {
+  // (if NewDmask has only one bit set...)
+  if (NewDmask && (NewDmask & (NewDmask-1)) == 0) {
 SDValue RC = DAG.getTargetConstant(AMDGPU::VReg_32RegClassID, MVT::i32);
 SDNode *Copy = DAG.getMachineNode(TargetOpcode::COPY_TO_REGCLASS,
   SDLoc(), Users[Lane]->getValueType(0),
diff --git a/test/CodeGen/R600/llvm.SI.sample-masked.ll b/test/CodeGen/R600/llvm.SI.sample-masked.ll
new file mode 100644
index 000..454e48b
--- /dev/null
+++ b/test/CodeGen/R600/llvm.SI.sample-masked.ll
@@ -0,0 +1,93 @@
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck %s
+
+; CHECK-LABEL: @v1
+; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 13
+define void @v1(i32 %a1) {
+entry:
+  %0 = insertelement <1 x i32> undef, i32 %a1, i32 0
+  %1 = call <4 x float> @llvm.SI.sample.v1i32(<1 x i32> %0, <32 x i8> undef, 
<16 x i8> undef, i32 0)
+  %2 = extractelement <4 x float> %1, i32 0
+  %3 = extractelement <4 x float> %1, i32 2
+  %4 = extractelement <4 x float> %1, i32 3
+  call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, 
float %3, float %4, float %4)
+  ret void
+}
+
+; CHECK-LABEL: @v2
+; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 11
+define void @v2(i32 %a1) {
+entry:
+  %0 = insertelement <1 x i32> undef, i32 %a1, i32 0
+  %1 = call <4 x float> @llvm.SI.sample.v1i32(<1 x i32> %0, <32 x i8> undef, 
<16 x i8> undef, i32 0)
+  %2 = extractelement <4 x float> %1, i32 0
+  %3 = extractelement <4 x float> %1, i32 1
+  %4 = extractelement <4 x float> %1, i32 3
+  call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, 
float %3, float %4, float %4)
+  ret void
+}
+
+; CHECK-LABEL: @v3
+; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 14
+define void @v3(i32 %a1) {
+entry:
+  %0 = insertelement <1 x i32> undef, i32 %a1, i32 0
+  %1 = call <4 x float> @llvm.SI.sample.v1i32(<1 x i32> %0, <32 x i8> undef, 
<16 x i8> undef, i32 0)
+  %2 = extractelement <4 x float> %1, i32 1
+  %3 = extractelement <4 x float> %1, i32 2
+  %4 = extractelement <4 x float> %1, i32 3
+  call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, 
float %3, float %4, float %4)
+  ret void
+}
+
+; CHECK-LABEL: @v4
+; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 7
+define void @v4(i32 %a1) {
+entry:
+  %0 = insertelement <1 x i32> undef, i32 %a1, i32

Re: [Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.

2013-10-21 Thread Chia-I Wu

On Tue, Oct 22, 2013 at 3:05 AM, Eric Anholt  wrote:
> Chia-I Wu  writes:
>
>> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner  wrote:
>>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt  wrote:
>>>> Previously, the best thing we had was to schedule the things unblocked by
>>>> the current instruction, on the hope that it would be consuming two values
>>>> at the end of their live intervals while only producing one new value.
>>>> Sometimes that wasn't the case.
>>>>
>>>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>>>> will probably be the virtual_grf_def[] instruction after computing live
>>>> intervals again), penalize it by how many regs it would take up.  When an
>>>> instruction is the last user of a GRF we have to schedule (when it will
>>>> probably be the virtual_grf_end[] instruction), give it a boost by how
>>>> many regs it would free.
>>>>
>>>> The new functions are made virtual (only 1 of 2 really needs to be
>>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>>> heuristic over to the vec4 backend.
>>>>
>>>> shader-db:
>>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>>> instructions in affected programs: 10292 -> 9140 (-11.19%)
>>>> GAINED: 121
>>>> LOST:   38
>>>>
>>>> Improves tropics performance at my current settings by 4.50602% +/-
>>>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>>>> GLB2.7 (n=11).
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>>> ---
>>>
>>> I think we're on the right track by considering register pressure when
>>> scheduling, but one aspect we're not considering is simply how many
>>> registers we think we're using.
>>>
>>> If I understand correctly, the pre-register allocation wants to
>>> shorten live intervals as much as possible which reduces register
>>> pressure but at the cost of larger stalls and less instruction level
>>> parallelism. We end up scheduling things like
>>>
>>> produce result 4
>>> produce result 3
>>> produce result 2
>>> produce result 1
>>> use result 1
>>> use result 2
>>> use result 3
>>> use result 4
>>>
>>> (this is why the MRF writes for the FB write are always done in the
>>> reverse order)
>> In this example, it will actually be
>>
>>  produce result 4
>>  use result 4
>>  produce result 3
>>  use result 3
>>  produce result 2
>>  use result 2
>>  produce result 1
>>  use result 1
>>
>> and post-regalloc will schedule again to something like
>>
>>  produce result 4
>>  produce result 3
>>  produce result 2
>>  produce result 1
>>  use result 4
>>  use result 3
>>  use result 2
>>  use result 1
>>
>> The pre-regalloc scheduling attempts to consume the results as soon as
>> they are available.
>>
>> FB write is done in reverse order because, when a result is available,
>> its consumers are scheduled in reverse order.  The epilog of fragment
>> shaders is usually like this:
>>
>>  placeholder_halt
>>  mov m1, g1
>>  mov m2, g2
>>  mov m3, g3
>>  mov m4, g4
>>  send
>>
>> MOVs depend on placeholder_halt, and send depends on MOVs.  The
>> scheduler will schedule it as follows:
>>
>>  placeholder_halt
>>  mov m4, g4
>>  mov m3, g3
>>  mov m2, g2
>>  mov m1, g1
>>  send
>>
>> The order can be corrected with the change proposed here
>>
>>   http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>>
>> But there is no point in making the change if the current heuristic for
>> pre-regalloc is to be reworked.
>
> Flipping the order in which we prefer ties (on betterthanlifo-2):
>
> commit 11a511576e465f02875f39c452561775a97416a1
> Author: Eric Anholt 
> Date:   Mon Oct 21 11:45:53 2013 -0700
>
> otherway
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
> b/src/mesa/
> index 9a480b4..b123015 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -1049,9 +1049,9 @@ 
> fs_instruction_scheduler::choose_instruction_to_schedule()
> * it's the first use of a GRF, reduce its score since it means it
> * should be increasing register pressure.
> */
> -  for (schedule_node *node = (schedule_node *)instructions.get_tail();
> -   node != instructions.get_head()->prev;
> -   node = (schedule_node *)node->prev) {
> +  for (schedule_node *node = (schedule_node *)instructions.get_head();
> +   node != instructions.get_head()->next;
> +   node = (schedule_node *)node->next) {
>   schedule_node *n = (schedule_node *)node;
>   fs_inst *inst = (fs_inst *)n->inst;
>
> gives:
>
> total instructions in shared programs: 1544638 -> 1546794 (0.14%)
> instructions in affected programs: 7163 -> 9319 (30.10%)
> GAINED:16
> LOST:  289
>
> with massive spilling on tropics, and a bit on lightsmark and csgo.
Children of a

[Mesa-dev] [Bug 70743] Compilation on VS2013

2013-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=70743

Stephane Marchesin  changed:

   What|Removed |Added

 CC||marche...@icps.u-strasbg.fr


[Mesa-dev] i965: Improving FS register spilling performance.

2013-10-21 Thread Eric Anholt

In the process of trying to work around the spilling in the huge unigine
soft shadowing shaders, I got to wondering if we couldn't just reduce the
cost of spilling to the point of "I don't care".  Notably, there is this
nice message for doing unspills on gen7 where you don't need to set up the
message beyond passing in g0.

It turns out to be a slight win.  Unfortunately, the complementary spill
message was a loss.  You can find the code in this submission on
gen7-scratch-read of my tree, and the code I'm not trying to push is on
gen7-scratch-write.


[Mesa-dev] [PATCH 1/5] i965/fs: Fix broken register spilling debug code.

2013-10-21 Thread Eric Anholt

Now that reg spilling generates new vgrfs, we were looping forever if you
ever turned it on.

Instead, move the debug code into the register allocator right near where
we'd be doing spilling anyway, which should more accurately reflect how
register spilling occurs in the wild.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  |  7 ---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 11 +++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 65a4b66..5a8a45e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3091,13 +3091,6 @@ fs_visitor::run()
   assign_curb_setup();
   assign_urb_setup();
 
-  if (0) {
-/* Debug of register spilling: Go spill everything. */
-for (int i = 0; i < virtual_grf_count; i++) {
-   spill_reg(i);
-}
-  }
-
   if (0)
 assign_regs_trivial();
   else {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 157c9ae..7826cd4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -461,6 +461,17 @@ fs_visitor::assign_regs()
if (brw->gen >= 7)
   setup_mrf_hack_interference(g, first_mrf_hack_node);
 
+   /* Debug of register spilling: Go spill everything. */
+   if (0) {
+  int reg = choose_spill_reg(g);
+
+  if (reg != -1) {
+ spill_reg(reg);
+ ralloc_free(g);
+ return false;
+  }
+   }
+
if (!ra_allocate_no_spills(g)) {
   /* Failed to allocate registers.  Spill a reg, and the caller will
* loop back into here to try again.
-- 
1.8.4.rc3


[Mesa-dev] [PATCH 3/5] i965/fs: Fix register unspills from a reg_offset.

2013-10-21 Thread Eric Anholt

We were clearing the reg_offset before trying to use it.  Oops.  Fixes
glsl-fs-texture2drect with the reg spilling debug enabled.
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index ed0ce0d..a7ca319 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -642,13 +642,13 @@ fs_visitor::spill_reg(int spill_reg)
 if (inst->src[i].file == GRF &&
 inst->src[i].reg == spill_reg) {
 int regs_read = inst->regs_read(this, i);
+int subset_spill_offset = (spill_offset +
+   reg_size * inst->src[i].reg_offset);
 
 inst->src[i].reg = virtual_grf_alloc(regs_read);
 inst->src[i].reg_offset = 0;
 
-emit_unspill(inst, inst->src[i],
- spill_offset + reg_size * inst->src[i].reg_offset,
- regs_read);
+emit_unspill(inst, inst->src[i], subset_spill_offset, regs_read);
 }
   }
 
-- 
1.8.4.rc3


[Mesa-dev] [PATCH 2/5] i965/fs: Fix register spilling for 16-wide.

2013-10-21 Thread Eric Anholt

Things blew up when I enabled the debug register spill code without
disabling 16-wide, so I decided to just fix 16-wide spilling.

We still don't generate 16-wide when register spilling happens as part of
allocation (since we expect it to be slower), but now we can experiment
with allowing it in some cases in the future.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp|  8 
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 15 ---
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index fa15f7b..6c8fb76 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -718,8 +718,8 @@ fs_generator::generate_spill(fs_inst *inst, struct brw_reg src)
brw_MOV(p,
   retype(brw_message_reg(inst->base_mrf + 1), BRW_REGISTER_TYPE_UD),
   retype(src, BRW_REGISTER_TYPE_UD));
-   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf), 1,
-inst->offset);
+   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
+ inst->mlen, inst->offset);
 }
 
 void
@@ -727,8 +727,8 @@ fs_generator::generate_unspill(fs_inst *inst, struct brw_reg dst)
 {
assert(inst->mlen != 0);
 
-   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf), 1,
-   inst->offset);
+   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
+dispatch_width / 8, inst->offset);
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 7826cd4..ed0ce0d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -540,7 +540,7 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst, uint32_t spill_offset,
   inst->insert_before(unspill_inst);
 
   dst.reg_offset++;
-  spill_offset += REG_SIZE;
+  spill_offset += dispatch_width * sizeof(float);
}
 }
 
@@ -624,10 +624,11 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
 void
 fs_visitor::spill_reg(int spill_reg)
 {
+   int reg_size = dispatch_width * sizeof(float);
int size = virtual_grf_sizes[spill_reg];
unsigned int spill_offset = c->last_scratch;
assert(ALIGN(spill_offset, 16) == spill_offset); /* oword read/write req. */
-   c->last_scratch += size * REG_SIZE;
+   c->last_scratch += size * reg_size;
 
/* Generate spill/unspill instructions for the objects being
 * spilled.  Right now, we spill or unspill the whole thing to a
@@ -646,7 +647,7 @@ fs_visitor::spill_reg(int spill_reg)
 inst->src[i].reg_offset = 0;
 
 emit_unspill(inst, inst->src[i],
- spill_offset + REG_SIZE * inst->src[i].reg_offset,
+ spill_offset + reg_size * inst->src[i].reg_offset,
  regs_read);
 }
   }
@@ -654,7 +655,7 @@ fs_visitor::spill_reg(int spill_reg)
   if (inst->dst.file == GRF &&
  inst->dst.reg == spill_reg) {
  int subset_spill_offset = (spill_offset +
-REG_SIZE * inst->dst.reg_offset);
+reg_size * inst->dst.reg_offset);
  inst->dst.reg = virtual_grf_alloc(inst->regs_written);
  inst->dst.reg_offset = 0;
 
@@ -677,11 +678,11 @@ fs_visitor::spill_reg(int spill_reg)
fs_inst *spill_inst = new(mem_ctx) fs_inst(FS_OPCODE_SPILL,
   reg_null_f, spill_src);
spill_src.reg_offset++;
-   spill_inst->offset = subset_spill_offset + chan * REG_SIZE;
+   spill_inst->offset = subset_spill_offset + chan * reg_size;
spill_inst->ir = inst->ir;
spill_inst->annotation = inst->annotation;
-   spill_inst->base_mrf = 14;
-   spill_inst->mlen = 2; /* header, value */
+   spill_inst->mlen = 1 + dispatch_width / 8; /* header, value */
+   spill_inst->base_mrf = 16 - spill_inst->mlen;
inst->insert_after(spill_inst);
 }
   }
-- 
1.8.4.rc3


[Mesa-dev] [PATCH 4/5] i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE

2013-10-21 Thread Eric Anholt

I'm going to be introducing gen7 variants, and the previous naming was
going to get confusing.
---
 src/mesa/drivers/dri/i965/brw_defines.h |  7 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp|  4 ++--
 src/mesa/drivers/dri/i965/brw_fs.h  |  4 ++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 12 ++--
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp   | 12 +++-
 src/mesa/drivers/dri/i965/brw_shader.cpp| 14 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp  |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp|  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp  |  4 ++--
 10 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 5ba9d45..72a0891 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -774,14 +774,15 @@ enum opcode {
 
SHADER_OPCODE_SHADER_TIME_ADD,
 
+   SHADER_OPCODE_GEN4_SCRATCH_READ,
+   SHADER_OPCODE_GEN4_SCRATCH_WRITE,
+
FS_OPCODE_DDX,
FS_OPCODE_DDY,
FS_OPCODE_PIXEL_X,
FS_OPCODE_PIXEL_Y,
FS_OPCODE_CINTERP,
FS_OPCODE_LINTERP,
-   FS_OPCODE_SPILL,
-   FS_OPCODE_UNSPILL,
FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7,
FS_OPCODE_VARYING_PULL_CONSTANT_LOAD,
@@ -795,8 +796,6 @@ enum opcode {
FS_OPCODE_PLACEHOLDER_HALT,
 
VS_OPCODE_URB_WRITE,
-   VS_OPCODE_SCRATCH_READ,
-   VS_OPCODE_SCRATCH_WRITE,
VS_OPCODE_PULL_CONSTANT_LOAD,
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7,
VS_OPCODE_UNPACK_FLAGS_SIMD4X2,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5a8a45e..c9ea731 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -763,11 +763,11 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
case FS_OPCODE_FB_WRITE:
   return 2;
case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
-   case FS_OPCODE_UNSPILL:
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
   return 1;
case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
   return inst->mlen;
-   case FS_OPCODE_SPILL:
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
   return 2;
default:
   assert(!"not reached");
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index b5aed23..f9c87c7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -519,8 +519,8 @@ private:
void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
  bool negate_value);
-   void generate_spill(fs_inst *inst, struct brw_reg src);
-   void generate_unspill(fs_inst *inst, struct brw_reg dst);
+   void generate_scratch_write(fs_inst *inst, struct brw_reg src);
+   void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
void generate_uniform_pull_constant_load(fs_inst *inst, struct brw_reg dst,
 struct brw_reg index,
 struct brw_reg offset);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 6c8fb76..6aebc41 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -711,7 +711,7 @@ fs_generator::generate_discard_jump(fs_inst *inst)
 }
 
 void
-fs_generator::generate_spill(fs_inst *inst, struct brw_reg src)
+fs_generator::generate_scratch_write(fs_inst *inst, struct brw_reg src)
 {
assert(inst->mlen != 0);
 
@@ -723,7 +723,7 @@ fs_generator::generate_spill(fs_inst *inst, struct brw_reg src)
 }
 
 void
-fs_generator::generate_unspill(fs_inst *inst, struct brw_reg dst)
+fs_generator::generate_scratch_read(fs_inst *inst, struct brw_reg dst)
 {
assert(inst->mlen != 0);
 
@@ -1509,12 +1509,12 @@ fs_generator::generate_code(exec_list *instructions)
 generate_ddy(inst, dst, src[0], c->key.render_to_fbo);
 break;
 
-  case FS_OPCODE_SPILL:
-generate_spill(inst, src[0]);
+  case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+generate_scratch_write(inst, src[0]);
 break;
 
-  case FS_OPCODE_UNSPILL:
-generate_unspill(inst, dst);
+  case SHADER_OPCODE_GEN4_SCRATCH_READ:
+generate_scratch_read(inst, dst);
 break;
 
   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index a7ca319..75090a6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -527,7 +527,8 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst, 
uint32_t spill_offset,

[Mesa-dev] [PATCH 5/5] i965/fs: Use the gen7 scratch read opcode when possible.

2013-10-21 Thread Eric Anholt

This avoids a lot of message setup we had to do otherwise.  Improves
GLB2.7 performance with register spilling force enabled by 1.6442% +/-
0.553218% (n=4).
---
 src/mesa/drivers/dri/i965/brw_defines.h|  7 
 src/mesa/drivers/dri/i965/brw_eu.h |  5 +++
 src/mesa/drivers/dri/i965/brw_eu_emit.c| 41 ++
 src/mesa/drivers/dri/i965/brw_fs.h |  1 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 21 +++
 .../drivers/dri/i965/brw_schedule_instructions.cpp | 12 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 ++
 8 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 72a0891..276ab44 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -776,6 +776,7 @@ enum opcode {
 
SHADER_OPCODE_GEN4_SCRATCH_READ,
SHADER_OPCODE_GEN4_SCRATCH_WRITE,
+   SHADER_OPCODE_GEN7_SCRATCH_READ,
 
FS_OPCODE_DDX,
FS_OPCODE_DDY,
@@ -1135,6 +1136,12 @@ enum brw_message_target {
 #define GEN7_DATAPORT_DC_BYTE_SCATTERED_WRITE   12
 #define GEN7_DATAPORT_DC_UNTYPED_SURFACE_WRITE  13
 
+#define GEN7_DATAPORT_SCRATCH_READ((1 << 18) | \
+   (0 << 17))
+#define GEN7_DATAPORT_SCRATCH_WRITE   ((1 << 18) | \
+   (1 << 17))
+#define GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT12
+
 /* HSW */
 #define HSW_DATAPORT_DC_PORT0_OWORD_BLOCK_READ  0
 #define HSW_DATAPORT_DC_PORT0_UNALIGNED_OWORD_BLOCK_READ1
diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 072310d..a307948 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -379,6 +379,11 @@ void brw_oword_block_write_scratch(struct brw_compile *p,
   int num_regs,
   GLuint offset);
 
+void gen7_block_read_scratch(struct brw_compile *p,
+ struct brw_reg dest,
+ int num_regs,
+ GLuint offset);
+
 void brw_shader_time_add(struct brw_compile *p,
  struct brw_reg payload,
  uint32_t surf_index);
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 8efd679..accf324 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2055,6 +2055,47 @@ brw_oword_block_read_scratch(struct brw_compile *p,
}
 }
 
+void
+gen7_block_read_scratch(struct brw_compile *p,
+struct brw_reg dest,
+int num_regs,
+GLuint offset)
+{
+   dest = retype(dest, BRW_REGISTER_TYPE_UW);
+
+   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);
+
+   assert(insn->header.predicate_control == 0);
+   insn->header.compression_control = BRW_COMPRESSION_NONE;
+
+   brw_set_dest(p, insn, dest);
+
+   /* The HW requires that the header is present; this is to get the g0.5
+* scratch offset.
+*/
+   bool header_present = true;
+   brw_set_src0(p, insn, brw_vec8_grf(0, 0));
+
+   brw_set_message_descriptor(p, insn,
+  GEN7_SFID_DATAPORT_DATA_CACHE,
+  1, /* mlen: just g0 */
+  num_regs,
+  header_present,
+  false);
+
+   insn->bits3.ud |= GEN7_DATAPORT_SCRATCH_READ;
+
+   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);
+   insn->bits3.ud |= (num_regs - 1) << GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT;
+
+   /* The "HWORD" unit in the docs just happens to mean "the size of a
+* register"
+*/
+   offset /= REG_SIZE;
+   assert(offset < (1 << 12));
+   insn->bits3.ud |= offset;
+}
+
 /**
  * Read a float[4] vector from the data port Data Cache (const buffer).
  * Location (in buffer) should be a multiple of 16.
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index f9c87c7..432f3df 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -521,6 +521,7 @@ private:
  bool negate_value);
void generate_scratch_write(fs_inst *inst, struct brw_reg src);
void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
+   void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst);
void generate_uniform_pull_constant_load(fs_inst *inst, struct brw_reg dst,
 struct brw_reg index,
 struct b
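
The bit packing that gen7_block_read_scratch() ORs into bits3.ud can be looked at on its own. The following is a sketch, not the driver code: the `SKETCH_*` names and `scratch_read_desc_bits` are hypothetical, the register size of 32 bytes is an assumption, and it models only the three fields added after brw_set_message_descriptor(), not the rest of the descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one register is 32 bytes, the "HWORD" unit in the comment. */
#define SKETCH_REG_SIZE 32

/* Same bit positions as the defines added to brw_defines.h in the patch. */
#define SKETCH_SCRATCH_READ           ((1u << 18) | (0u << 17))
#define SKETCH_SCRATCH_NUM_REGS_SHIFT 12

/* Hypothetical helper: returns only the bits the patch ORs into bits3.ud. */
static uint32_t
scratch_read_desc_bits(int num_regs, uint32_t byte_offset)
{
   uint32_t bits = SKETCH_SCRATCH_READ;
   uint32_t hword_offset = byte_offset / SKETCH_REG_SIZE;

   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);
   assert(hword_offset < (1u << 12));

   /* Register count is stored as (count - 1), as in the patch. */
   bits |= (uint32_t)(num_regs - 1) << SKETCH_SCRATCH_NUM_REGS_SHIFT;
   /* The offset, in register-sized units, occupies the low 12 bits. */
   bits |= hword_offset;
   return bits;
}
```

Because the offset is expressed in whole registers and must fit in 12 bits, a byte offset of 64 is encoded as 2 under the 32-byte assumption.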

Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-21 Thread Eric Anholt

Kenneth Graunke  writes:

> DrawTransformFeedback() needs to obtain the number of vertices written
> to a particular stream during the last Begin/EndTransformFeedback block.
> The new driver hook returns exactly that information.
>
> Gallium drivers already implement this functionality by passing the
> transform feedback object to the drawing function.  I prefer to avoid
> this for two reasons:
>
> 1. Complexity:
>
> Normally, the drawing function takes an array of _mesa_prim objects,
> each of which specifies a vertex count.  If tfb_vertcount != NULL,
> however, there will only be one _mesa_prim object with an invalid
> vertex count (of 1), so it needs to be ignored.
>
> Since the _mesa_prim pointers are const, you can't even override it to
> the proper value; you need to pass around extra "ignore that, here's
> the real count" parameters.
>
> The drawing function is already terribly complicated, so I don't want to
> make it even more complicated.
>
> 2. Primitive restart:
>
> vbo_draw_arrays() performs software primitive restart, splitting a draw
> call in two when necessary.  vbo_draw_transform_feedback() currently
> doesn't because it has no idea how many vertices need to be drawn.  The
> new driver hook gives it that information, allowing us to reuse the
> existing vbo_draw_arrays() code to do everything right.

This interface means synchronizing with the GPU, which sucks when we
have the ability to actually do DTFB in the hardware pipeline (Indirect
Parameter Enable of 3DPRIMITIVE).  We could mostly use the hw pipelined
version only, as long as we had core contexts (meaning that we don't
need vertex start/count to figure out how much user vertex array data to
upload).

But, given that we have sw primitive restart on some lame hardware that
we want to support this on, we've got to have this path anyway.



Re: [Mesa-dev] libGL without X

2013-10-21 Thread Chris Healy

I would still need to build Mesa with X so that libGL is built though,
correct?


On Mon, Oct 21, 2013 at 2:03 PM, Erik Faye-Lund  wrote:

> On Mon, Oct 21, 2013 at 4:05 PM, Chris Healy  wrote:
> > I have a headless platform I need OpenGL to work on that does not have X.
> > It is x86 with Intel HD 4000 graphics.
> >
> > Ultimately, I'm just wanting to use OpenGL to render to memory for
> encoding
> > to H.264 and streaming.
> >
> > I'm trying to build Mesa for this platform without X and cannot get it to
> > build libGL.so.
> >
> > What am I missing here?  Is it not possible to use OpenGL without X?  I
> was
> > hoping I could use OpenGL with EGL for testing purposes.
>
> If you build mesa with GBM support, you should be able to render without X.
>

Re: [Mesa-dev] [PATCH 9/9] i965: Enable the ARB_transform_feedback2 extension on Gen7+.

2013-10-21 Thread Eric Anholt

Kenneth Graunke  writes:

> All the necessary pieces are now in place.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
> b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 334be05..c09ee39 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -133,6 +133,10 @@ intelInitExtensions(struct gl_context *ctx)
>ctx->Const.GLSLVersion = 120;
> _mesa_override_glsl_version(ctx);
>  
> +   if (brw->gen >= 7) {
> +  ctx->Extensions.ARB_transform_feedback2 = true;
> +   }

If HSW doesn't actually work because the kernel won't let us run the
commands, I think we shouldn't turn it on on hsw.



Re: [Mesa-dev] [PATCH 8/9] i965: Implement glDrawTransformFeedback().

2013-10-21 Thread Eric Anholt

Kenneth Graunke  writes:

> Implementing the GetTransformFeedbackVertexCount() driver hook allows
> the VBO module to call us with the right number of vertices.
>
> The hardware doesn't directly count the number of vertices written by
> SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply
> by the number of vertices per primitive.
>
> Unfortunately, counting the number of primitives generated is tricky:
> a program might pause a transform feedback operation, start a second one
> with a different object, then switch back and resume.  Both transform
> feedback operations share the SO_NUM_PRIMS_WRITTEN counters.
>
> To work around this, we save the counter values at Begin, Pause, Resume,
> and End.  This "bookends" each section where transform feedback is
> active for the current object.  Adding up differences of pairs gives
> us the number of primitives generated.  (This is similar to what we
> do for occlusion queries on platforms without hardware contexts.)
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_context.c|   2 +
>  src/mesa/drivers/dri/i965/brw_context.h|  26 
>  src/mesa/drivers/dri/i965/gen6_sol.c   |   1 +
>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 190 
> -
>  4 files changed, 218 insertions(+), 1 deletion(-)

> +/**
> + * Tally the number of primitives generated so far.
> + *
> + * The buffer contains a series of pairs:
> + * (, ) ;
> + * (, ) ;
> + *
> + * For each stream, we subtract the pair of values (end - start) to get the
> + * number of primitives generated during one section.  We accumulate these
> + * values, adding them up to get the total number of primitives generated.
> + */
> +static void
> +gen7_tally_prims_generated(struct brw_context *brw,
> +   struct brw_transform_feedback_object *obj)
> +{
> +   /* If the current batch is still contributing to the number of primitives
> +* generated, flush it now so the results will be present when mapped.
> +*/
> +   if (drm_intel_bo_references(brw->batch.bo, obj->prim_count_bo))
> +  intel_batchbuffer_flush(brw);
> +
> +   if (unlikely(brw->perf_debug && drm_intel_bo_busy(obj->prim_count_bo)))
> +  perf_debug("Stalling for # of transform feedback primitives written.\n");
> +
> +   drm_intel_bo_map(obj->prim_count_bo, false);
> +   uint64_t *prim_counts = obj->prim_count_bo->virtual;
> +
> +   assert(obj->prim_count_buffer_index % 2 * BRW_MAX_XFB_STREAMS == 0);

I think you want parens around "2 * BRW_MAX_XFB_STREAMS" here.
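
To spell out why the parentheses matter: in C, `%` and `*` have the same precedence and associate left to right, so the quoted assert tests `(index % 2) * streams == 0`. A small sketch of the two readings, with hypothetical function names:

```c
#include <assert.h>

/* What the quoted assert evaluates: % and * share precedence and associate
 * left to right, so this is (index % 2) * streams == 0.
 */
static int
aligned_as_written(unsigned index, unsigned streams)
{
   return index % 2 * streams == 0;
}

/* What the review asks for: index is a multiple of 2 * streams. */
static int
aligned_as_intended(unsigned index, unsigned streams)
{
   return index % (2 * streams) == 0;
}
```

With a stream count of 4 (an illustrative value), an index of 2 passes the check as written but is not a multiple of 8, so the unparenthesized assert accepts any even index.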

I was really impressed with how legible I found this patch.  Thanks!

All but patches 4, 9 are:

Reviewed-by: Eric Anholt 

and 9 gets r-b with the obvious change.
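
The "bookend" tally described in the quoted commit message amounts to summing (end - start) per stream over every saved section. A minimal sketch follows. The buffer layout is an assumption (each section holds the start values for all streams followed by the end values), and `SKETCH_MAX_STREAMS` and `tally_prims` are hypothetical names rather than the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_STREAMS 4  /* assumed stream count, for illustration only */

/* Assumed layout: each section is SKETCH_MAX_STREAMS start values followed
 * by SKETCH_MAX_STREAMS end values.  totals[] is accumulated into, so the
 * caller is expected to initialize it.
 */
static void
tally_prims(const uint64_t *counts, unsigned num_values,
            uint64_t totals[SKETCH_MAX_STREAMS])
{
   const unsigned section = 2 * SKETCH_MAX_STREAMS;

   assert(num_values % section == 0);

   for (unsigned base = 0; base < num_values; base += section) {
      for (unsigned s = 0; s < SKETCH_MAX_STREAMS; s++) {
         uint64_t start = counts[base + s];
         uint64_t end = counts[base + SKETCH_MAX_STREAMS + s];

         totals[s] += end - start;
      }
   }
}
```

Counter values recorded while another object was active fall between one section's end and the next section's start, so they never contribute to the sum.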



Re: [Mesa-dev] [PATCH 1/3] glsl: Use saved values instead of recomputing them.

2013-10-21 Thread Eric Anholt

Matt Turner  writes:

> ---
>  src/glsl/opt_algebraic.cpp | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
> index 3e5802e..b915f3c 100644
> --- a/src/glsl/opt_algebraic.cpp
> +++ b/src/glsl/opt_algebraic.cpp
> @@ -257,11 +257,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
> * folding.
> */
>if (op_const[0] && !op_const[1])
> -  reassociate_constant(ir, 0, op_const[0],
> -   ir->operands[1]->as_expression());
> +  reassociate_constant(ir, 0, op_const[0], op_expr[1]);
>if (op_const[1] && !op_const[0])
> -  reassociate_constant(ir, 1, op_const[1],
> -   ir->operands[0]->as_expression());
> +  reassociate_constant(ir, 1, op_const[1], op_expr[0]);
>break;
>  
> case ir_binop_sub:
> @@ -315,11 +313,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
> * constant folding.
> */
>if (op_const[0] && !op_const[1])
> -  reassociate_constant(ir, 0, op_const[0],
> -   ir->operands[1]->as_expression());
> +  reassociate_constant(ir, 0, op_const[0], op_expr[1]);
>if (op_const[1] && !op_const[0])
> -  reassociate_constant(ir, 1, op_const[1],
> -   ir->operands[0]->as_expression());
> +  reassociate_constant(ir, 1, op_const[1], op_expr[0]);
>  
>break;

Series is:

Reviewed-by: Eric Anholt 



Re: [Mesa-dev] [PATCH 5/8] i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.

2013-10-21 Thread Eric Anholt

Paul Berry  writes:

> On 18 October 2013 17:04, Eric Anholt  wrote:
>> Putting these fixups for a couple of weird cases in just MOV and ADD
>> feels wrong to me, but maybe when I understand better what's going on
>> it'll seem more natural.
>>
>
> Another possibility I'd be equally happy with would be to put the fixup at
> the top of vec4_generator::generate_vec4_instruction(), before the switch
> statement.  It would look something like this:
>
>    if (dst.width == BRW_WIDTH_4) {
>       /* This happens in attribute fixups for "dual instanced" geometry
>        * shaders, since they use attributes that are vec4's.  Since the exec
>        * width is only 4, it's essential that the caller set
>        * force_writemask_all in order to make sure the instruction is executed
>        * regardless of which channels are enabled.
>        */
>       assert(inst->force_writemask_all);
>
>       /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to satisfy
>        * the following register region restrictions (from Graphics BSpec:
>        * 3D-Media-GPGPU Engine > EU Overview > Registers and Register Regions
>        * > Register Region Restrictions)
>        *
>        * 1. ExecSize must be greater than or equal to Width.
>        *
>        * 2. If ExecSize = Width and HorzStride != 0, VertStride must be set
>        *    to Width * HorzStride."
>        */
>       for (int i = 0; i < 3; i++) {
>          if (src[i].file == BRW_GENERAL_REGISTER_FILE)
>             src[i] = stride(src[i], 4, 4, 1);
>       }
>    }
>
> Does that seem better to you?  I actually think I like it slightly better
> because by making the assertion more general, I caught another case where I
> think I should be setting force_writemask_all to be on the safe side (the
> "clear r0.2" instruction in the gs prolog).

I like this better -- it makes more sense to me for the fixup to be
non-opcode-specific.  And I think I get the problem now -- our registers
would make a ton of sense as <4;4,1> in general (that's how I think of
our GRFs in align16, at least!), except that we can't because then we'd
guess an execsize of 4.  I'm in favor of the kill-guess-execsize plan,
even if we leave it in place for gen4/5 SF/CLIP threads (which bounce
execsize all over the place iirc and would be a pain to convert) and
only convert the new backends.

Another option: How about instead of that assert in brw_eu_emit.c, we
just smash the vstride to be width * hstride?  We know the vstride
doesn't matter, because you're only using execsize components, so let's
just not bother our brw_eu.c callers with this little problem.
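
A rough sketch of that last option, assuming regions are held as plain element counts rather than the hardware's encoded field values; `struct region` and `fixup_vstride` are hypothetical names, not existing brw_eu code:

```c
#include <assert.h>

/* Register region <vstride; width, hstride>, stored here as plain element
 * counts rather than the hardware's encoded field values.
 */
struct region {
   int vstride;
   int width;
   int hstride;
};

static struct region
fixup_vstride(struct region r, int exec_size)
{
   /* Restriction 1 from the quoted list. */
   assert(exec_size >= r.width);

   /* Restriction 2: when a single row covers the whole execution size,
    * vstride is never used to step to a second row, so overwrite it with
    * the value the hardware requires.
    */
   if (exec_size == r.width && r.hstride != 0)
      r.vstride = r.width * r.hstride;

   return r;
}
```

Under this scheme a <0;4,1> source used with an execution size of 4 becomes <4;4,1>, while regions whose width is smaller than the execution size, or whose hstride is 0, are left alone.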


Re: [Mesa-dev] libGL without X

2013-10-21 Thread Erik Faye-Lund

On Tue, Oct 22, 2013 at 2:58 AM, Chris Healy  wrote:
> On Mon, Oct 21, 2013 at 2:03 PM, Erik Faye-Lund  wrote:
>> On Mon, Oct 21, 2013 at 4:05 PM, Chris Healy  wrote:
>> > I have a headless platform I need OpenGL to work on that does not have
>> > X.
>> > It is x86 with Intel HD 4000 graphics.
>> >
>> > Ultimately, I'm just wanting to use OpenGL to render to memory for
>> > encoding
>> > to H.264 and streaming.
>> >
>> > I'm trying to build Mesa for this platform without X and cannot get it
>> > to
>> > build libGL.so.
>> >
>> > What am I missing here?  Is it not possible to use OpenGL without X?  I
>> > was
>> > hoping I could use OpenGL with EGL for testing purposes.
>>
>> If you build mesa with GBM support, you should be able to render without
>> X.
>
> I would still need to build Mesa with X so that libGL is built though,
> correct?
>

Probably, yeah. But you can run the resulting binary without an
x-server running. Dunno if that's sufficient for you or not.

[Mesa-dev] [PATCH] gallivm: implement fully accurate corner filtering for seamless cube maps

2013-10-21 Thread sroland

From: Roland Scheidegger 

d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering code, since we need per-texel
weights which our 2d lerp doesn't (and can't) do. And of course the (now
per element) weights need to be adjusted too for it to work.
Invoke the new filtering code whenever there's an edge to keep things simpler,
as it will work for edges too not just corners but of course it's only needed
with corners.
More ugly code for not much gain but at least a hacked up cubemap demo
shows very nice corners now... Not sure yet if and how this should be
configurable...
---
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  138 +++--
 1 file changed, 130 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 8e2d0d9..5d3511d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -840,7 +840,11 @@ lp_build_sample_image_linear(struct 
lp_build_sample_context *bld,
  const LLVMValueRef *offsets,
  LLVMValueRef colors_out[4])
 {
+   LLVMBuilderRef builder = bld->gallivm->builder;
+   struct lp_build_context *ivec_bld = &bld->int_coord_bld;
+   struct lp_build_context *coord_bld = &bld->coord_bld;
const unsigned dims = bld->dims;
+   struct lp_build_if_state edge_if;
LLVMValueRef width_vec;
LLVMValueRef height_vec;
LLVMValueRef depth_vec;
@@ -848,6 +852,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context 
*bld,
LLVMValueRef flt_width_vec;
LLVMValueRef flt_height_vec;
LLVMValueRef flt_depth_vec;
+   LLVMValueRef fall_off[4], have_edge;
LLVMValueRef z1 = NULL;
LLVMValueRef z00 = NULL, z01 = NULL, z10 = NULL, z11 = NULL;
LLVMValueRef x00 = NULL, x01 = NULL, x10 = NULL, x11 = NULL;
@@ -856,6 +861,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context 
*bld,
LLVMValueRef xs[4], ys[4], zs[4];
LLVMValueRef neighbors[2][2][4];
int chan, texel_index;
+   boolean silly_but_accurate_cube_corner_filtering = TRUE;
 
lp_build_extract_image_sizes(bld,
 &bld->int_size_bld,
@@ -918,12 +924,7 @@ lp_build_sample_image_linear(struct 
lp_build_sample_context *bld,
   }
}
else {
-  LLVMBuilderRef builder = bld->gallivm->builder;
-  struct lp_build_context *ivec_bld = &bld->int_coord_bld;
-  struct lp_build_context *coord_bld = &bld->coord_bld;
-  struct lp_build_if_state edge_if;
-  LLVMValueRef new_faces[4], new_xcoords[4][2], new_ycoords[4][2];
-  LLVMValueRef fall_off[4], coord, have_edge;
+  LLVMValueRef new_faces[4], new_xcoords[4][2], new_ycoords[4][2], coord;
   LLVMValueRef fall_off_ym_notxm, fall_off_ym_notxp;
   LLVMValueRef fall_off_yp_notxm, fall_off_yp_notxp;
   LLVMValueRef x0, x1, y0, y1, y0_clamped, y1_clamped;
@@ -1074,7 +1075,7 @@ lp_build_sample_image_linear(struct 
lp_build_sample_context *bld,
 
if (linear_mask) {
   /*
-   * Whack filter weights into place. Whatever pixel had more weight is
+   * Whack filter weights into place. Whatever texel had more weight is
* the one which should have been selected by nearest filtering hence
* just use 100% weight for it.
*/
@@ -1135,7 +1136,7 @@ lp_build_sample_image_linear(struct 
lp_build_sample_context *bld,
}
else {
   /* 2D/3D texture */
-  LLVMValueRef colors0[4];
+  LLVMValueRef colors0[4], colorss[4];
 
   /* get x0/x1 texels at y1 */
   lp_build_sample_texel_soa(bld,
@@ -1149,6 +1150,111 @@ lp_build_sample_image_linear(struct 
lp_build_sample_context *bld,
 row_stride_vec, img_stride_vec,
 data_ptr, mipoffsets, neighbors[1][1]);
 
+  /*
+   * To avoid having to duplicate linear_mask / fetch code use
+   * another branch (with same edge condition) here (note that
+   * since we're using another branch anyway we COULD restrict this
+   * rather easily to just corners).
+   */
+  if (silly_but_accurate_cube_corner_filtering &&
+  bld->static_texture_state->target == PIPE_TEXTURE_CUBE &&
+  bld->static_sampler_state->seamless_cube_map) {
+ LLVMValueRef w00, w01, w10, w11, wx0, wy0;
+ LLVMValueRef c_weight, c00, c01, c10, c11;
+ LLVMValueRef one_third, tmp;
+
+ colorss[0] = lp_build_alloca(bld->gallivm, coord_bld->vec_type, "cs");
+ colorss[1] = lp_build_alloca(bld->gallivm, coord_bld->vec_type, "cs");
+ colorss[2] = lp_build_alloca(bld->gallivm, coord_bld->vec_type, "cs");
+ colorss[3] = lp_build_alloca(bld->gallivm, coord_bld->vec_type,

[Mesa-dev] [PATCH] clover: Refuse to create context with invalid properties

2013-10-21 Thread Jan Vesely

The spec says that clCreateContext returns an error
"if platform value specified in properties is not a valid platform".

The original approach fails if an invalid value other than a NULL pointer is provided.

Fixes piglit cl-api-create-context.

Signed-off-by: Jan Vesely 
---
 src/gallium/state_trackers/clover/api/context.cpp | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/context.cpp 
b/src/gallium/state_trackers/clover/api/context.cpp
index 7b020a6..67adf8f 100644
--- a/src/gallium/state_trackers/clover/api/context.cpp
+++ b/src/gallium/state_trackers/clover/api/context.cpp
@@ -34,14 +34,19 @@ clCreateContext(const cl_context_properties *d_props, 
cl_uint num_devs,
 void *user_data, cl_int *r_errcode) try {
auto props = obj(d_props);
auto devs = objs(d_devs, num_devs);
+   cl_platform_id platform;
+   cl_uint num_platforms;
 
if (!pfn_notify && user_data)
   throw error(CL_INVALID_VALUE);
+   
+   int ret = clGetPlatformIDs(1, &platform, &num_platforms);
+   if (ret || !num_platforms)
+  throw error(CL_INVALID_PLATFORM);
 
for (auto &prop : props) {
-  if (prop.first == CL_CONTEXT_PLATFORM)
- obj(prop.second.as());
-  else
+  if (prop.first != CL_CONTEXT_PLATFORM ||
+ prop.second.as() != platform)
  throw error(CL_INVALID_PROPERTY);
}
 
-- 
1.8.3.1


[Mesa-dev] [PATCH 1/2] glsl: mark variables produced by lower_named_interface_blocks.

2013-10-21 Thread Paul Berry

These variables will need to be treated specially by
program_resource_visitor, so that they can be addressed through the
API using their interface block name (and array index, for interface
block arrays).
---
 src/glsl/ir.h | 12 
 src/glsl/lower_named_interface_blocks.cpp |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index aac8cbb..91eb4c6 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -579,6 +579,18 @@ public:
unsigned location_frac:2;
 
/**
+* Non-zero if this variable was created by lowering a named interface
+* block which was not an array.
+*/
+   unsigned from_named_ifc_block_nonarray:1;
+
+   /**
+* Non-zero if this variable was created by lowering a named interface
+* block which was an array.
+*/
+   unsigned from_named_ifc_block_array:1;
+
+   /**
 * \brief Layout qualifier for gl_FragDepth.
 *
 * This is not equal to \c ir_depth_layout_none if and only if this
diff --git a/src/glsl/lower_named_interface_blocks.cpp 
b/src/glsl/lower_named_interface_blocks.cpp
index f415252..6329d5a 100644
--- a/src/glsl/lower_named_interface_blocks.cpp
+++ b/src/glsl/lower_named_interface_blocks.cpp
@@ -140,6 +140,7 @@ flatten_named_interface_blocks_declarations::run(exec_list 
*instructions)
   new(mem_ctx) ir_variable(iface_t->fields.structure[i].type,
var_name,
(ir_variable_mode) var->mode);
+   new_var->from_named_ifc_block_nonarray = 1;
 } else {
const glsl_type *new_array_type =
   glsl_type::get_array_instance(
@@ -149,6 +150,7 @@ flatten_named_interface_blocks_declarations::run(exec_list 
*instructions)
   new(mem_ctx) ir_variable(new_array_type,
var_name,
(ir_variable_mode) var->mode);
+   new_var->from_named_ifc_block_array = 1;
 }
 new_var->location = iface_t->fields.structure[i].location;
 
-- 
1.8.4.1


[Mesa-dev] [PATCH 2/2] glsl: Account for interface block lowering in program_resource_visitor.

2013-10-21 Thread Paul Berry

When program_resource_visitor visits variables that were created by
lower_named_interface_blocks, it needs to do extra work to un-do the
effects of lower_named_interface_blocks and construct the proper API
names.

Fixes piglit test
spec/glsl-1.50/execution/interface-blocks-api-access-members.
---
 src/glsl/link_uniforms.cpp | 58 +-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index 4bd4034..ea71b30 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -75,7 +75,63 @@ program_resource_visitor::process(ir_variable *var)
 */
 
/* Only strdup the name if we actually will need to modify it. */
-   if (t->is_record() || (t->is_array() && t->fields.array->is_record())) {
+   if (var->from_named_ifc_block_array) {
+  /* lower_named_interface_blocks created this variable by lowering an
+   * interface block array to an array variable.  For example if the
+   * original source code was:
+   *
+   * out Blk { vec4 bar } foo[3];
+   *
+   * Then the variable is now:
+   *
+   * out vec4 bar[3];
+   *
+   * We need to visit each array element using the names constructed like
+   * so:
+   *
+   * Blk[0].bar
+   * Blk[1].bar
+   * Blk[2].bar
+   */
+  assert(t->is_array());
+  const glsl_type *ifc_type = var->get_interface_type();
+  char *name = ralloc_strdup(NULL, ifc_type->name);
+  size_t name_length = strlen(name);
+  for (unsigned i = 0; i < t->length; i++) {
+ size_t new_length = name_length;
+ ralloc_asprintf_rewrite_tail(&name, &new_length, "[%u].%s", i,
+  var->name);
+ /* Note: row_major is only meaningful for uniform blocks, and
+  * lowering is only applied to non-uniform interface blocks, so we
+  * can safely pass false for row_major.
+  */
+ recursion(var->type, &name, new_length, false, NULL);
+  }
+  ralloc_free(name);
+   } else if (var->from_named_ifc_block_nonarray) {
+  /* lower_named_interface_blocks created this variable by lowering a
+   * named interface block (non-array) to an ordinary variable.  For
+   * example if the original source code was:
+   *
+   * out Blk { vec4 bar } foo;
+   *
+   * Then the variable is now:
+   *
+   * out vec4 bar;
+   *
+   * We need to visit this variable using the name:
+   *
+   * Blk.bar
+   */
+  const glsl_type *ifc_type = var->get_interface_type();
+  char *name = ralloc_asprintf(NULL, "%s.%s", ifc_type->name, var->name);
+  /* Note: row_major is only meaningful for uniform blocks, and lowering
+   * is only applied to non-uniform interface blocks, so we can safely
+   * pass false for row_major.
+   */
+  recursion(var->type, &name, strlen(name), false, NULL);
+  ralloc_free(name);
+   } else if (t->is_record() || (t->is_array() && t->fields.array->is_record())) {
   char *name = ralloc_strdup(NULL, var->name);
   recursion(var->type, &name, strlen(name), false, NULL);
   ralloc_free(name);
-- 
1.8.4.1


Re: [Mesa-dev] [PATCH] clover: Refuse to create context with invalid properties

2013-10-21 Thread Francisco Jerez

Jan Vesely  writes:

> The spec says that clCreateContext returns an error
> "if platform value specified in properties is not a valid platform".
>
> The original approach fails if an invalid value other than a NULL pointer is
> provided.
>
> Fixes piglit cl-api-create-context.
>
Honestly, I don't think this test makes much sense.  It's unreasonable
to expect that the CL will be able to catch any bad pointer you give it
as argument and fail gracefully.  The only reliable solution that comes
to my mind would be to build a global hash table for each CL object type
that keeps track of the valid objects that have been allocated.  That
seems like a lot of effort with the only purpose of finding out if the
user is doing something *very* stupid and very unlikely.

That said, we're already doing three forms of object validation: first,
the pointers provided by the user are compared against NULL; second, we
make sure that the dispatch table pointer is at the correct location in
memory; third, if the object is part of a non-trivial class hierarchy,
as is the case for events and memory objects, we use RTTI to make sure
that the object is of the expected type.  I don't think we want or need
more validation; it would probably be more useful to drop that test from
piglit.
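
As a rough sketch of those three checks (not clover's actual code: the
types dispatch_table, descriptor, mem_object, buffer and image and the
function validate() are stand-ins invented for this example):

```cpp
#include <cassert>

// Stand-in for the ICD dispatch table; only its address matters here.
struct dispatch_table { int unused; };
static const dispatch_table the_dispatch = { 0 };
static const dispatch_table other_dispatch = { 0 };

// What the user-visible handle points at: the dispatch pointer comes first.
struct descriptor {
   explicit descriptor(const dispatch_table *d = &the_dispatch) : dispatch(d) {}
   const dispatch_table *dispatch;
};

// A small polymorphic hierarchy, standing in for memory objects.
struct mem_object : descriptor {
   virtual ~mem_object() {}
};
struct buffer : mem_object {};
struct image : mem_object {};

// Returns the object as a T, or nullptr if any of the three checks fails.
template<typename T>
T *validate(descriptor *d) {
   if (!d)                               // 1. compare against NULL
      return nullptr;
   if (d->dispatch != &the_dispatch)     // 2. dispatch pointer where expected
      return nullptr;
   // 3. RTTI: make sure the object really is of the requested type.
   return dynamic_cast<T *>(static_cast<mem_object *>(d));
}
```

Note that the second check still reads through the pointer, so a truly
wild pointer can crash before any error is returned; that is the
limitation discussed above.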

Apparently nVidia's libOpenCL fails the test as well.

Thanks.

> Signed-off-by: Jan Vesely 
> ---
>  src/gallium/state_trackers/clover/api/context.cpp | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/context.cpp b/src/gallium/state_trackers/clover/api/context.cpp
> index 7b020a6..67adf8f 100644
> --- a/src/gallium/state_trackers/clover/api/context.cpp
> +++ b/src/gallium/state_trackers/clover/api/context.cpp
> @@ -34,14 +34,19 @@ clCreateContext(const cl_context_properties *d_props, cl_uint num_devs,
>  void *user_data, cl_int *r_errcode) try {
> auto props = obj(d_props);
> auto devs = objs(d_devs, num_devs);
> +   cl_platform_id platform;
> +   cl_uint num_platforms;
>  
> if (!pfn_notify && user_data)
>throw error(CL_INVALID_VALUE);
> +   
> +   int ret = clGetPlatformIDs(1, &platform, &num_platforms);
> +   if (ret || !num_platforms)
> +  throw error(CL_INVALID_PLATFORM);
>  
> for (auto &prop : props) {
> -  if (prop.first == CL_CONTEXT_PLATFORM)
> - obj(prop.second.as<cl_platform_id>());
> -  else
> +  if (prop.first != CL_CONTEXT_PLATFORM ||
> +   prop.second.as<cl_platform_id>() != platform)
>   throw error(CL_INVALID_PROPERTY);
> }
>  
> -- 
> 1.8.3.1
>



Re: [Mesa-dev] [PATCH] i965: Only emit interpolation setup if there are actual FS inputs.

2013-10-21 Thread Matt Turner

Reviewed-by: Matt Turner 
