date:20130219

Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-19 Thread Vadim Girlin


On 02/18/2013 02:20 PM, Andy Furniss wrote:

Stefan Seifert wrote:

Hi!

Amazing work! I see some 50% speedups in FlightGear and even more. While
normally 3D clouds tear performance down to an unflyable stutter, with your
branch I can fly in densely clouded conditions at usable framerates. I can now
turn all shaders to maximum and enjoy the view. This makes a huge difference.

Unfortunately there's a downside as well:



Testing with rv790 with drm-fixes kernel not much works -

etqw runs but in a level 50% of screen is junk.

nexuiz menus total junk, didn't test further.

xonotic menus OK but gpu lock on starting timedemo.

vdpau mpeg2 decode - renders 90% junk.

heaven 3.0 (on a different pure 64 bit setup) gpu lock.


I've pushed the patch to improve support for the r6xx, r7xx and cayman.
I believe the chances that it will work on these chips are higher now, 
so you might want to give it another try.


Vadim



Unrelated question wrt heaven 3.0 - does it work properly anyway?

For me running 64bit on rv790 with vanilla mesa with or without llvm I
have to set shaders to medium, on high it works but I get no
lighting/effects. There are also a couple of scenes that render as
flared out black and white.




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: Initialize parcel_out_uniform_storage member variables.

2013-02-19 Thread Adrian M Negreanu

Hi,

I have tested your changes:

[Mesa-dev] [PATCH] glsl: Initialize parcel_out_uniform_storage member
variables.

Project: mesa (Mesa build tests)
Configurations: android linux

 

 Tested the patch(es) on top of the following commits:
 07cdfdb st/mesa: remove what is left from u_blit
 40ee93c st/mesa: simplify and improve CopyTexSubImage
 6520a86 st/mesa: don't do sRGB conversion in CopyTexSubImage
 0a1479c st/mesa: implement blit-based TexImage and TexSubImage
 a6e0ac9 st/mesa: fix blit-based GetTexImage for 1D array textures
 91acf62 st/mesa: fix blit-based GetTexImage for depth/stencil formats
 0181e18 st/mesa: factor out code for determining blit.mask from CopyTexSubImage

 

 Failed to build for "android"

 07cdfdb st/mesa: remove what is left from u_blit
 40ee93c st/mesa: simplify and improve CopyTexSubImage
 6520a86 st/mesa: don't do sRGB conversion in CopyTexSubImage
 0a1479c st/mesa: implement blit-based TexImage and TexSubImage
 a6e0ac9 st/mesa: fix blit-based GetTexImage for 1D array textures
 91acf62 st/mesa: fix blit-based GetTexImage for depth/stencil formats
 0181e18 st/mesa: factor out code for determining blit.mask from CopyTexSubImage

 src/glsl/glsl_parser.yy: conflicts: 1 shift/reduce
 src/glsl/./link_uniforms.cpp: In constructor
'parcel_out_uniform_storage::parcel_out_uniform_storage(string_to_uint_map*,
gl_uniform_storage*, gl_constant_value*)':
 src/glsl/./link_uniforms.cpp:267:18: error: expected '{' before 'uniforms'
 make: *** 
[out/host/linux-x86/obj/EXECUTABLES/mesa_builtin_compiler_intermediates/./link_uniforms.o]
Error 1
 FAILURE

 


 Successfully built configuration "linux", no issues

 




-- 
Regards!
http://groleo.wordpress.com

[Mesa-dev] [PATCH v2] glsl: Initialize parcel_out_uniform_storage member variables.

2013-02-19 Thread Vinson Lee

Fixes uninitialized scalar field defect reported by Coverity.

Signed-off-by: Vinson Lee 
---
 src/glsl/link_uniforms.cpp | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index d457e4d..b5bfe13 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -263,9 +263,11 @@ public:
parcel_out_uniform_storage(struct string_to_uint_map *map,
  struct gl_uniform_storage *uniforms,
  union gl_constant_value *values)
-  : map(map), uniforms(uniforms), next_sampler(0), values(values)
+  : ubo_block_index(0), ubo_byte_offset(0), ubo_row_major(false),
+map(map), uniforms(uniforms), next_sampler(0), values(values),
+targets(), shader_samplers_used(0), shader_shadow_samplers(0)
{
-  memset(this->targets, 0, sizeof(this->targets));
+  /* empty */
}
 
void start_shader()
-- 
1.8.1.2


[Mesa-dev] [PATCH v2] configure.ac: Do not check for clock_gettime on MinGW.

2013-02-19 Thread Vinson Lee

MinGW does not have clock_gettime.

Signed-off-by: Vinson Lee 
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 16c2f8c..1e11b4e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -500,7 +500,7 @@ AC_CHECK_FUNC([dlopen], [DEFINES="$DEFINES -DHAVE_DLOPEN"],
 AC_SUBST([DLOPEN_LIBS])
 
 case "$host_os" in
-darwin*)
+darwin*|mingw*)
 ;;
 *)
 AC_CHECK_FUNCS([clock_gettime], [CLOCK_LIB=],
-- 
1.8.1.2


Re: [Mesa-dev] [PATCH] llvmpipe: fix lp_resource_copy using more than one 3d slice

2013-02-19 Thread Jose Fonseca

Thanks for fixing this Roland.

This is definitely an improvement. I'd recommend a few tweaks (they could even be
a follow-on change):

- Calling llvmpipe_flush_resource() in a loop is overkill (it will cause
llvmpipe_flush() to be called many times needlessly). Please refactor
llvmpipe_flush_resource() and llvmpipe_is_resource_referenced() to receive a
start_layer, end_layer pair.

- call util_copy_box instead of util_copy_rect
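
To illustrate the first point, here is a minimal sketch in plain C. It is not the llvmpipe code: `model_flush_layer`, `model_flush_layers` and `struct flush_stats` are hypothetical stand-ins that only count how often an expensive flush would run, assuming one flush can cover a whole start_layer/end_layer range.

```c
/* Hypothetical model: counts how often the expensive flush would run. */
struct flush_stats {
   unsigned flushes;
};

/* Stand-in for a per-layer flush, as called inside the loop over z. */
static void model_flush_layer(struct flush_stats *s, unsigned layer)
{
   (void) layer;
   s->flushes++;
}

/* Stand-in for the suggested variant taking a start_layer/end_layer pair. */
static void model_flush_layers(struct flush_stats *s,
                               unsigned start_layer, unsigned end_layer)
{
   if (start_layer > end_layer)
      return;
   s->flushes++;   /* one flush covers the whole layer range */
}

/* Number of flushes when flushing each layer of the box separately. */
static unsigned flushes_per_layer(unsigned first, unsigned depth)
{
   struct flush_stats s = { 0 };
   unsigned z;

   for (z = 0; z < depth; z++)
      model_flush_layer(&s, first + z);
   return s.flushes;
}

/* Number of flushes when the whole layer range is passed at once. */
static unsigned flushes_for_range(unsigned first, unsigned depth)
{
   struct flush_stats s = { 0 };

   if (depth > 0)
      model_flush_layers(&s, first, first + depth - 1);
   return s.flushes;
}
```

For a box with depth 8, the per-layer model flushes eight times and the range model once, which is the saving the refactoring is after.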

Jose


- Original Message -
> From: Roland Scheidegger 
> 
> These used to be illegal a very long time ago, then for some more time
> nothing really emitted these so this code path wasn't hit.
> Just trivially iterate over box->depth.
> (Might be worth refactoring at some point since nowadays all the code
> doesn't really do much except for depth textures.)
> 
> This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61093
> ---
>  src/gallium/drivers/llvmpipe/lp_surface.c |  170
>  +++--
>  1 file changed, 86 insertions(+), 84 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c
> b/src/gallium/drivers/llvmpipe/lp_surface.c
> index 11475fd..dbaed95 100644
> --- a/src/gallium/drivers/llvmpipe/lp_surface.c
> +++ b/src/gallium/drivers/llvmpipe/lp_surface.c
> @@ -65,7 +65,7 @@ lp_resource_copy(struct pipe_context *pipe,
> const enum pipe_format format = src_tex->base.format;
> unsigned width = src_box->width;
> unsigned height = src_box->height;
> -   assert(src_box->depth == 1);
> +   unsigned z;
>  
> /* Fallback for buffers. */
> if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) {
> @@ -74,99 +74,101 @@ lp_resource_copy(struct pipe_context *pipe,
>return;
> }
>  
> -   llvmpipe_flush_resource(pipe,
> -   dst, dst_level, dstz,
> -   FALSE, /* read_only */
> -   TRUE, /* cpu_access */
> -   FALSE, /* do_not_block */
> -   "blit dest");
> -
> -   llvmpipe_flush_resource(pipe,
> -   src, src_level, src_box->z,
> -   TRUE, /* read_only */
> -   TRUE, /* cpu_access */
> -   FALSE, /* do_not_block */
> -   "blit src");
> -
> -   /*
> -   printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u
> x %u x %u\n",
> -  src_tex->id, src_level, dst_tex->id, dst_level,
> -  src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
> -  src_box->width, src_box->height, src_box->depth);
> -   */
> -
> -   /* set src tiles to linear layout */
> -   {
> -  unsigned tx, ty, tw, th;
> -  unsigned x, y;
> -
> -  adjust_to_tile_bounds(src_box->x, src_box->y, width, height,
> -&tx, &ty, &tw, &th);
> -
> -  for (y = 0; y < th; y += TILE_SIZE) {
> - for (x = 0; x < tw; x += TILE_SIZE) {
> -(void) llvmpipe_get_texture_tile_linear(src_tex,
> -src_box->z, src_level,
> -LP_TEX_USAGE_READ,
> -tx + x, ty + y);
> +   for (z = 0; z < src_box->depth; z++){
> +  llvmpipe_flush_resource(pipe,
> +  dst, dst_level, dstz + z,
> +  FALSE, /* read_only */
> +  TRUE, /* cpu_access */
> +  FALSE, /* do_not_block */
> +  "blit dest");
> +
> +  llvmpipe_flush_resource(pipe,
> +  src, src_level, src_box->z + z,
> +  TRUE, /* read_only */
> +  TRUE, /* cpu_access */
> +  FALSE, /* do_not_block */
> +  "blit src");
> +
> +  /*
> +  printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u
> %u x %u x %u\n",
> + src_tex->id, src_level, dst_tex->id, dst_level,
> + src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
> + src_box->width, src_box->height, src_box->depth);
> +  */
> +
> +  /* set src tiles to linear layout */
> +  {
> + unsigned tx, ty, tw, th;
> + unsigned x, y;
> +
> + adjust_to_tile_bounds(src_box->x, src_box->y, width, height,
> +   &tx, &ty, &tw, &th);
> +
> + for (y = 0; y < th; y += TILE_SIZE) {
> +for (x = 0; x < tw; x += TILE_SIZE) {
> +   (void) llvmpipe_get_texture_tile_linear(src_tex,
> +   src_box->z + z,
> src_level,
> +   LP_TEX_USAGE_READ,
> +   tx + x, ty + y);
> +}
>   }
>}
> -   }
> -
> -   /* set dst

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Christian König


On 18.02.2013 20:11, Roland Scheidegger wrote:

On 18.02.2013 19:14, Michel Dänzer wrote:

From: Michel Dänzer 

11 more little piglits.

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Michel Dänzer 
---

Any ideas why this seems necessary with radeonsi but not with r600g?

Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
not sure if it can always know with bgrx formats and the like).
I'm wondering if there should be a helper for those fixups. Looks to me
like quite some drivers need it (though well so far I think just
non-gallium i965 does this plus llvmpipe, but for some of the others I'm
skeptical if not doing it is really correct...).


I agree alpha blending with a buffer format that doesn't have alpha is a
bit strange; that should be caught by the upper layers.





  src/gallium/drivers/radeonsi/si_state.c | 116 +---
  src/gallium/drivers/radeonsi/si_state.h |   3 +-
  2 files changed, 61 insertions(+), 58 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index d20e3ff..144a29d 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -36,33 +36,6 @@
  #include "si_state.h"
  #include "sid.h"
  
-/*
- * inferred framebuffer and blender state
- */
-static void si_update_fb_blend_state(struct r600_context *rctx)
-{
-   struct si_pm4_state *pm4;
-   struct si_state_blend *blend = rctx->queued.named.blend;
-   uint32_t mask;
-
-   if (blend == NULL)
-   return;
-
-   pm4 = CALLOC_STRUCT(si_pm4_state);
-   if (pm4 == NULL)
-   return;
-
-   mask = (1ULL << ((unsigned)rctx->framebuffer.nr_cbufs * 4)) - 1;
-   mask &= blend->cb_target_mask;
-   si_pm4_set_reg(pm4, R_028238_CB_TARGET_MASK, mask);
-
-   si_pm4_set_state(rctx, fb_blend, pm4);
-}
-
-/*
- * Blender functions
- */
-
  static uint32_t si_translate_blend_function(int blend_func)
  {
switch (blend_func) {
@@ -84,7 +57,7 @@ static uint32_t si_translate_blend_function(int blend_func)
return 0;
  }
  
-static uint32_t si_translate_blend_factor(int blend_fact)
+static uint32_t si_translate_blend_factor(int blend_fact, bool dst_alpha)
  {
switch (blend_fact) {
case PIPE_BLENDFACTOR_ONE:
@@ -94,7 +67,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
case PIPE_BLENDFACTOR_SRC_ALPHA:
return V_028780_BLEND_SRC_ALPHA;
case PIPE_BLENDFACTOR_DST_ALPHA:
-   return V_028780_BLEND_DST_ALPHA;
+   return dst_alpha ? V_028780_BLEND_DST_ALPHA : 
V_028780_BLEND_ONE;
case PIPE_BLENDFACTOR_DST_COLOR:
return V_028780_BLEND_DST_COLOR;
case PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE:
@@ -110,7 +83,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
case PIPE_BLENDFACTOR_INV_SRC_ALPHA:
return V_028780_BLEND_ONE_MINUS_SRC_ALPHA;
case PIPE_BLENDFACTOR_INV_DST_ALPHA:
-   return V_028780_BLEND_ONE_MINUS_DST_ALPHA;
+   return dst_alpha ? V_028780_BLEND_ONE_MINUS_DST_ALPHA : 
V_028780_BLEND_ZERO;
case PIPE_BLENDFACTOR_INV_DST_COLOR:
return V_028780_BLEND_ONE_MINUS_DST_COLOR;
case PIPE_BLENDFACTOR_INV_CONST_COLOR:
@@ -133,30 +106,25 @@ static uint32_t si_translate_blend_factor(int blend_fact)
return 0;
  }

I think you might also need to patch up SRC_ALPHA_SATURATE (to zero).

Can't comment on the hw stuff but at least llvmpipe does the same
otherwise :-).


Why should we do so? SRC_ALPHA_SATURATE should still work fine, even 
when the destination buffer doesn't have an alpha component.


Christian.



Roland

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Michel Dänzer

On Tue, 2013-02-19 at 10:33 +0100, Christian König wrote:
> On 18.02.2013 20:11, Roland Scheidegger wrote:
> > On 18.02.2013 19:14, Michel Dänzer wrote:
> >> From: Michel Dänzer 
> >>
> >> 11 more little piglits.
> >>
> >> NOTE: This is a candidate for the 9.1 branch.
> >>
> >> Signed-off-by: Michel Dänzer 
> >> ---
> >>
> >> Any ideas why this seems necessary with radeonsi but not with r600g?
> > Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
> > not sure if it can always know with bgrx formats and the like).
> > I'm wondering if there should be a helper for those fixups. Looks to me
> > like quite some drivers need it (though well so far I think just
> > non-gallium i965 does this plus llvmpipe, but for some of the others I'm
> > skeptical if not doing it is really correct...).
> 
> I agree alpha blending with a buffer format that doesn't have alpha is a
> bit strange; that should be caught by the upper layers.

If only it were that simple. :\

The problem is that AFAICT for formats such as R8G8B8X8, there's no
other way to tell the hardware to always use 1 for the destination
alpha. And I'm not sure we can just not support any such formats, I
certainly don't think that would be a good idea.


> >> @@ -84,7 +57,7 @@ static uint32_t si_translate_blend_function(int 
> >> blend_func)
> >>return 0;
> >>   }
> >>   
> >> -static uint32_t si_translate_blend_factor(int blend_fact)
> >> +static uint32_t si_translate_blend_factor(int blend_fact, bool dst_alpha)
> >>   {
> >>switch (blend_fact) {
> >>case PIPE_BLENDFACTOR_ONE:
> >> @@ -94,7 +67,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
> >>case PIPE_BLENDFACTOR_SRC_ALPHA:
> >>return V_028780_BLEND_SRC_ALPHA;
> >>case PIPE_BLENDFACTOR_DST_ALPHA:
> >> -  return V_028780_BLEND_DST_ALPHA;
> >> +  return dst_alpha ? V_028780_BLEND_DST_ALPHA : 
> >> V_028780_BLEND_ONE;
> >>case PIPE_BLENDFACTOR_DST_COLOR:
> >>return V_028780_BLEND_DST_COLOR;
> >>case PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE:
> >> @@ -110,7 +83,7 @@ static uint32_t si_translate_blend_factor(int 
> >> blend_fact)
> >>case PIPE_BLENDFACTOR_INV_SRC_ALPHA:
> >>return V_028780_BLEND_ONE_MINUS_SRC_ALPHA;
> >>case PIPE_BLENDFACTOR_INV_DST_ALPHA:
> >> -  return V_028780_BLEND_ONE_MINUS_DST_ALPHA;
> >> +  return dst_alpha ? V_028780_BLEND_ONE_MINUS_DST_ALPHA : 
> >> V_028780_BLEND_ZERO;
> >>case PIPE_BLENDFACTOR_INV_DST_COLOR:
> >>return V_028780_BLEND_ONE_MINUS_DST_COLOR;
> >>case PIPE_BLENDFACTOR_INV_CONST_COLOR:
> >> @@ -133,30 +106,25 @@ static uint32_t si_translate_blend_factor(int 
> >> blend_fact)
> >>return 0;
> >>   }
> > I think you might also need to patch up SRC_ALPHA_SATURATE (to zero).
> >
> > Can't comment on the hw stuff but at least llvmpipe does the same
> > otherwise :-).
> 
> Why should we do so? SRC_ALPHA_SATURATE should still work fine, even 
> when the destination buffer doesn't have an alpha component.

I think Roland is right. When the destination has no alpha, the
destination alpha value is supposed to be always 1, so
SRC_ALPHA_SATURATE is always 0. But with a format as described above,
the destination X8 channel may contain any value.
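
A small worked sketch of that argument, in plain C rather than driver code. OpenGL defines the RGB factor for SRC_ALPHA_SATURATE as min(As, 1 - Ad); the function names and the `has_alpha` flag below are made up for illustration.

```c
/* RGB factor for SRC_ALPHA_SATURATE: f = min(As, 1 - Ad). */
static float src_alpha_saturate_rgb(float src_alpha, float dst_alpha)
{
   float inv_dst = 1.0f - dst_alpha;

   return src_alpha < inv_dst ? src_alpha : inv_dst;
}

/*
 * Factor the API expects: a destination without alpha behaves as if
 * Ad == 1, whatever value happens to be stored in the X channel.
 */
static float expected_saturate_rgb(float src_alpha, float stored_x,
                                   int has_alpha)
{
   float dst_alpha = has_alpha ? stored_x : 1.0f;

   return src_alpha_saturate_rgb(src_alpha, dst_alpha);
}
```

With As = 0.5 and a stored X value of 0.25, blending against the raw channel gives a factor of 0.5, while the expected factor for an alpha-less destination is 0.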


Really, what I don't understand is why r600g doesn't seem affected by
this... at least on my RS880 it's passing the piglit tests this change
fixes with radeonsi. So maybe I'm just missing some magic bit for
radeonsi.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Michel Dänzer

On Mon, 2013-02-18 at 20:11 +0100, Roland Scheidegger wrote: 
> On 18.02.2013 19:14, Michel Dänzer wrote:
> > From: Michel Dänzer 
> > 
> > 11 more little piglits.
> > 
> > NOTE: This is a candidate for the 9.1 branch.
> > 
> > Signed-off-by: Michel Dänzer 
> > ---
> > 
> > Any ideas why this seems necessary with radeonsi but not with r600g?
> Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
> not sure if it can always know with bgrx formats and the like).

Yeah, I can't seem to find anything like that.


> I'm wondering if there should be a helper for those fixups. Looks to me
> like quite some drivers need it (though well so far I think just
> non-gallium i965 does this plus llvmpipe, but for some of the others I'm
> skeptical if not doing it is really correct...).

Some kind of helper might be nice, maybe that could also simplify the
other blending parameters accordingly when possible.
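
One possible shape for such a helper, as a sketch only. The enum and the function name `fixup_factor_for_dst` are hypothetical and not an existing Mesa or Gallium API; a real helper would work on the PIPE_BLENDFACTOR_* values. The SRC_ALPHA_SATURATE case assumes the factor is used for the RGB channels, where min(As, 1 - Ad) is 0 once Ad is 1.

```c
#include <stdbool.h>

/* Hypothetical factor names; a real driver would use its own enums. */
enum blend_factor {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_SRC_ALPHA,
   FACTOR_DST_ALPHA,
   FACTOR_INV_DST_ALPHA,
   FACTOR_SRC_ALPHA_SATURATE
};

/*
 * Rewrite factors that read destination alpha when the destination
 * format has no alpha channel, treating destination alpha as 1.
 */
static enum blend_factor
fixup_factor_for_dst(enum blend_factor f, bool dst_has_alpha)
{
   if (dst_has_alpha)
      return f;

   switch (f) {
   case FACTOR_DST_ALPHA:
      return FACTOR_ONE;          /* Ad == 1 */
   case FACTOR_INV_DST_ALPHA:
      return FACTOR_ZERO;         /* 1 - Ad == 0 */
   case FACTOR_SRC_ALPHA_SATURATE:
      return FACTOR_ZERO;         /* min(As, 1 - Ad) == 0 for RGB */
   default:
      return f;                   /* does not depend on Ad */
   }
}
```

Keeping the mapping in one place would let several drivers share it instead of each patching its own factor translation.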


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

[Mesa-dev] [Bug 38086] Mesa 7.11-devel implementation error: Unexpected program target in destroy_program_variants_cb()

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=38086

--- Comment #5 from Laurent carlier  ---
I can also reproduce this bug with mesa-9.2-devel (git) and 9.0.x, using
Counter-Strike: Source (steam-linux).

gdb backtrace:
Breakpoint 2, destroy_program_variants (st=0x8171910, program=0xf52863c0
<_mesa_DummyProgram>) at ../../src/mesa/state_tracker/st_program.c:1212
1212  _mesa_problem(NULL, "Unexpected program target 0x%x in "
(gdb) bt full
#0  destroy_program_variants (st=0x8171910, program=0xf52863c0
<_mesa_DummyProgram>) at ../../src/mesa/state_tracker/st_program.c:1212
No locals.
#1  0xf405cb16 in destroy_program_variants_cb (key=2, data=0xf52863c0
<_mesa_DummyProgram>, userData=0x8171910)
at ../../src/mesa/state_tracker/st_program.c:1266
st = 0x8171910
program = 0xf52863c0 <_mesa_DummyProgram>
#2  0xf3f3b343 in _mesa_HashWalk (table=0x80b5240, callback=0xf405caf2
, userData=0x8171910) at
../../src/mesa/main/hash.c:329
table2 = 0x80b5240
entry = 0xbf3fe30
__PRETTY_FUNCTION__ = "_mesa_HashWalk"
#3  0xf405cb51 in st_destroy_program_variants (st=0x8171910) at
../../src/mesa/state_tracker/st_program.c:1279
No locals.
#4  0xf403761c in st_destroy_context (st=0x8171910) at
../../src/mesa/state_tracker/st_context.c:301
pipe = 0x80a6f90
cso = 0x8176e48
ctx = 0x818f788
i = 4
#5  0xf4054e92 in st_context_destroy (stctxi=0x8171910) at
../../src/mesa/state_tracker/st_manager.c:598
st = 0x8171910
#6  0xf42a08e3 in dri_destroy_context (cPriv=0x80b6110) at dri_context.c:187
ctx = 0x80a6f00
#7  0xf3e9ef3e in driDestroyContext (pcp=0x80b6110) at
../../../../src/mesa/drivers/dri/common/dri_util.c:329
No locals.
#8  0xf7c520c6 in ?? () from /usr/lib32/libGL.so.1
No symbol table info available.
#9  0xf7c26b98 in glXDestroyContext () from /usr/lib32/libGL.so.1
No symbol table info available.
#10 0xf73ab759 in X11_GL_DeleteContext (_this=0x805e380, context=0x80b6028) at
src/video/x11/SDL_x11opengl.c:747
display = 0x805edb0
#11 0xf7387d00 in SDL_GL_DeleteContext (context=0x80b6028) at
src/video/SDL_video.c:2785
No locals.
#12 0xf7a0b38f in ?? () from bin/launcher.so
No symbol table info available.
#13 0xf741418b in ?? () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/libtogl.so
No symbol table info available.
#14 0xf74143a6 in ?? () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/libtogl.so
No symbol table info available.
#15 0xf7403596 in IDirect3DDevice9::~IDirect3DDevice9() () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/libtogl.so
No symbol table info available.
#16 0xf74036f2 in IDirect3DDevice9::~IDirect3DDevice9() () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/libtogl.so
No symbol table info available.
#17 0xec66482f in ?? () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/shaderapidx9.so
No symbol table info available.
#18 0xec6631fe in ?? () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/shaderapidx9.so
No symbol table info available.
#19 0xf1cf7731 in ?? () from
/home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike
Source/bin/materialsystem.so
No symbol table info available.
#20 0xf7a07f8c in ?? () from bin/launcher.so
No symbol table info available.
#21 0xf7a0801b in ?? () from bin/launcher.so
No symbol table info available.
#22 0xf7a08010 in ?? () from bin/launcher.so
No symbol table info available.
#23 0xf79f04cd in LauncherMain () from bin/launcher.so
No symbol table info available.
#24 0x08048474 in main ()

(gdb) print *program
$1 = {Id = 0, String = 0x0, RefCount = 0, Target = 0, Format = 0, Instructions
= 0x0, InputsRead = 0, OutputsWritten = 0, SystemValuesRead = 0, 
  InputFlags = {0 }, OutputFlags = {0 },
TexturesUsed = {0 }, SamplersUsed = 0, ShadowSamplers = 0, 
  Parameters = 0x0, LocalParams = {{0, 0, 0, 0} },
SamplerUnits = '\000' , IndirectRegisterFiles = 0, 
  NumInstructions = 0, NumTemporaries = 0, NumParameters = 0, NumAttributes =
0, NumAddressRegs = 0, NumAluInstructions = 0, NumTexInstructions = 0, 
  NumTexIndirections = 0, NumNativeInstructions = 0, NumNativeTemporaries = 0,
NumNativeParameters = 0, NumNativeAttributes = 0, NumNativeAddressRegs = 0, 
  NumNativeAluInstructions = 0, NumNativeTexInstructions = 0,
NumNativeTexIndirections = 0}

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Mesa-dev] [Bug 61093] [llvmpipe] lp_surface.c:68:lp_resource_copy: Assertion `src_box->depth == 1' failed.

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61093

--- Comment #1 from Marek Olšák  ---
The assertion in lp_resource_copy can be fixed easily, but I can't reproduce
it. llvmpipe is failing a different assertion here:

texsubimage: /home/marek/dev/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:539:
const llvm::SDValue &llvm::SDNode::getOperand(unsigned int) const: Assertion
`Num < NumOperands && "Invalid child # of SDNode!"' failed.

The way I see it, my work only uncovered this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH mesa] freedreno: gallium driver for adreno

2013-02-19 Thread Marek Olšák

On Mon, Feb 18, 2013 at 7:58 PM, Rob Clark  wrote:
> On Mon, Feb 18, 2013 at 12:47 PM, Matt Turner  wrote:
>> On Sun, Feb 17, 2013 at 11:33 AM, Rob Clark  wrote:
>>>
>>> diff --git a/src/gallium/drivers/freedreno/Makefile.am 
>>> b/src/gallium/drivers/freedreno/Makefile.am
>>> new file mode 100644
>>> index 000..10abdfb
>>> --- /dev/null
>>> +++ b/src/gallium/drivers/freedreno/Makefile.am
>>> @@ -0,0 +1,35 @@
>>> +include $(top_srcdir)/src/gallium/Automake.inc
>>> +
>>> +noinst_LTLIBRARIES = libfreedreno.la
>>> +
>>> +AM_CFLAGS = \
>>> +   -Werror -Wno-packed-bitfield-compat \
>>> +   -I$(top_srcdir)/src/gallium/include \ <--
>>> +   -I$(top_srcdir)/src/gallium/auxiliary \ <--
>>> +   -I$(top_srcdir)/src/gallium/drivers \
>>> +   -I$(top_srcdir)/include \ <--
>>> +   $(FREEDRENO_CFLAGS) \
>>> +   $(DEFINES) \ <--
>>> +   $(PIC_FLAGS) \
>>> +   $(VISIBILITY_CFLAGS)
>>
>> The <-- mark things that are provided by the GALLIUM_CFLAGS variable
>> in Automake.inc that you've already included. PIC_FLAGS is now dead.
>> Distributions don't like -Werror being hardcoded into upstream's
>> CFLAGS.
>
> Hmm, is there a better way to get -Werror for just freedreno when I am
> building myself?  I do find that it is pretty useful to let the
> compiler help me catch problems rather than debugging them the hard
> way ;-)

You can set the CFLAGS and CXXFLAGS environment variables before
configuring Mesa.

Marek

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Marek Olšák

On Tue, Feb 19, 2013 at 10:33 AM, Christian König
 wrote:
> On 18.02.2013 20:11, Roland Scheidegger wrote:
>
>> On 18.02.2013 19:14, Michel Dänzer wrote:
>>>
>>> From: Michel Dänzer 
>>>
>>> 11 more little piglits.
>>>
>>> NOTE: This is a candidate for the 9.1 branch.
>>>
>>> Signed-off-by: Michel Dänzer 
>>> ---
>>>
>>> Any ideas why this seems necessary with radeonsi but not with r600g?
>>
>> Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
>> not sure if it can always know with bgrx formats and the like).
>> I'm wondering if there should be a helper for those fixups. Looks to me
>> like quite some drivers need it (though well so far I think just
>> non-gallium i965 does this plus llvmpipe, but for some of the others I'm
>> skeptical if not doing it is really correct...).
>
>
> I agree alpha blending with a buffer format that doesn't have alpha is a bit
> strange; that should be caught by the upper layers.

I think it's better to do that in drivers instead.

r300g also uses a different blend state for RGBX and RGBA. The R300
blend state CSO actually contains 11 command buffers and the driver
switches between them when needed. Two of those command buffers
contain blend state for RGBX and RGBA. This approach of having
multiple command buffers per CSO has a much lower overhead than any
other solution I've seen (including rebuilding states on the fly and
having the state tracker figure it out).
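
Roughly, the idea looks like the sketch below. This is not the r300g code: the struct layout, the names and the two-variant limit are assumptions made for illustration (the real CSO holds 11 command buffers), and the register payloads are placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variants; only the RGBA/RGBX pair is modeled here. */
enum blend_variant {
   VARIANT_RGBA,
   VARIANT_RGBX,
   VARIANT_COUNT
};

/* A tiny pre-built command buffer holding ready-to-emit dwords. */
struct cmd_buf {
   uint32_t dwords[4];
   unsigned count;
};

/* Blend CSO carrying one pre-built command buffer per variant. */
struct blend_cso {
   struct cmd_buf variants[VARIANT_COUNT];
};

/* Build every variant once, when the CSO is created. */
static void blend_cso_init(struct blend_cso *cso,
                           uint32_t rgba_bits, uint32_t rgbx_bits)
{
   cso->variants[VARIANT_RGBA].dwords[0] = rgba_bits;
   cso->variants[VARIANT_RGBA].count = 1;
   cso->variants[VARIANT_RGBX].dwords[0] = rgbx_bits;
   cso->variants[VARIANT_RGBX].count = 1;
}

/* At draw time, pick a variant; nothing is rebuilt. */
static const struct cmd_buf *
blend_cso_select(const struct blend_cso *cso, bool fb_has_alpha)
{
   return &cso->variants[fb_has_alpha ? VARIANT_RGBA : VARIANT_RGBX];
}
```

The cost of handling the RGBX case is paid once at CSO creation; switching framebuffers later is just an index into the array.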

Marek

Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v3

2013-02-19 Thread Andy Furniss


Vadim Girlin wrote:

v3: handle hw-specific cases

Signed-off-by: Vadim Girlin 
---

cc: Andy Furniss 
Hopefully this should work better on the non-evergreen chips


This one seems to work OK for me.



Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-19 Thread Andy Furniss


Vadim Girlin wrote:


Testing with rv790 with drm-fixes kernel not much works -

etqw runs but in a level 50% of screen is junk.

nexuiz menus total junk, didn't test further.

xonotic menus OK but gpu lock on starting timedemo.

vdpau mpeg2 decode - renders 90% junk.

heaven 3.0 (on a different pure 64 bit setup) gpu lock.


I've pushed the patch to improve support for the r6xx, r7xx and cayman.
I believe the chances that it will work on these chips are higher now,
so you might want to give it another try.


It's still the same for me.

I tested with and without llvm this time -

nexuiz renders OK with llvm but is still corrupt without.

etqw as above with/without

xonotic locks with/without

vdpau: junk without, gpu lock with.



Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Marek Olšák

On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer  wrote:
> On Tue, 2013-02-19 at 10:33 +0100, Christian König wrote:
>> On 18.02.2013 20:11, Roland Scheidegger wrote:
>> > On 18.02.2013 19:14, Michel Dänzer wrote:
>> >> From: Michel Dänzer 
>> >>
>> >> 11 more little piglits.
>> >>
>> >> NOTE: This is a candidate for the 9.1 branch.
>> >>
>> >> Signed-off-by: Michel Dänzer 
>> >> ---
>> >>
>> >> Any ideas why this seems necessary with radeonsi but not with r600g?
>> > Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
>> > not sure if it can always know with bgrx formats and the like).
>> > I'm wondering if there should be a helper for those fixups. Looks to me
>> > like quite some drivers need it (though well so far I think just
>> > non-gallium i965 does this plus llvmpipe, but for some of the others I'm
>> > skeptical if not doing it is really correct...).
>>
>> I agree alpha blending with a buffer format that doesn't have alpha is a
>> bit strange; that should be caught by the upper layers.
>
> If it was that simple. :\
>
> The problem is that AFAICT for formats such as R8G8B8X8, there's no
> other way to tell the hardware to always use 1 for the destination
> alpha. And I'm not sure we can just not support any such formats, I
> certainly don't think that would be a good idea.
>
>
>> >> @@ -84,7 +57,7 @@ static uint32_t si_translate_blend_function(int 
>> >> blend_func)
>> >>return 0;
>> >>   }
>> >>
>> >> -static uint32_t si_translate_blend_factor(int blend_fact)
>> >> +static uint32_t si_translate_blend_factor(int blend_fact, bool dst_alpha)
>> >>   {
>> >>switch (blend_fact) {
>> >>case PIPE_BLENDFACTOR_ONE:
>> >> @@ -94,7 +67,7 @@ static uint32_t si_translate_blend_factor(int 
>> >> blend_fact)
>> >>case PIPE_BLENDFACTOR_SRC_ALPHA:
>> >>return V_028780_BLEND_SRC_ALPHA;
>> >>case PIPE_BLENDFACTOR_DST_ALPHA:
>> >> -  return V_028780_BLEND_DST_ALPHA;
>> >> +  return dst_alpha ? V_028780_BLEND_DST_ALPHA : 
>> >> V_028780_BLEND_ONE;
>> >>case PIPE_BLENDFACTOR_DST_COLOR:
>> >>return V_028780_BLEND_DST_COLOR;
>> >>case PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE:
>> >> @@ -110,7 +83,7 @@ static uint32_t si_translate_blend_factor(int 
>> >> blend_fact)
>> >>case PIPE_BLENDFACTOR_INV_SRC_ALPHA:
>> >>return V_028780_BLEND_ONE_MINUS_SRC_ALPHA;
>> >>case PIPE_BLENDFACTOR_INV_DST_ALPHA:
>> >> -  return V_028780_BLEND_ONE_MINUS_DST_ALPHA;
>> >> +  return dst_alpha ? V_028780_BLEND_ONE_MINUS_DST_ALPHA : 
>> >> V_028780_BLEND_ZERO;
>> >>case PIPE_BLENDFACTOR_INV_DST_COLOR:
>> >>return V_028780_BLEND_ONE_MINUS_DST_COLOR;
>> >>case PIPE_BLENDFACTOR_INV_CONST_COLOR:
>> >> @@ -133,30 +106,25 @@ static uint32_t si_translate_blend_factor(int 
>> >> blend_fact)
>> >>return 0;
>> >>   }
>> > I think you might also need to patch up SRC_ALPHA_SATURATE (to zero).
>> >
>> > Can't comment on the hw stuff but at least llvmpipe does the same
>> > otherwise :-).
>>
>> Why should we do so? SRC_ALPHA_SATURATE should still work fine, even
>> when the destination buffer doesn't have an alpha component.
>
> I think Roland is right. When the destination has no alpha, the
> destination alpha value is supposed to be always 1, so
> SRC_ALPHA_SATURATE is always 0. But with a format as described above,
> the destination X8 channel may contain any value.
>
>
> Really, what I don't understand is why r600g doesn't seem affected by
> this... at least on my RS880 it's passing the piglit tests this change
> fixes with radeonsi. So maybe I'm just missing some magic bit for
> radeonsi.

RGB formats do fail fbo-blending-formats with r600g/redwood here.

However the alpha channel can sometimes contain 1 in memory even if
the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
glCopyTex[Sub]Image always set alpha to 1. Blits do too if they use
RGBX as a source. One way to set alpha != 1 is to draw some geometry.

Marek
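
The SRC_ALPHA_SATURATE point can be checked with a few lines of arithmetic. In OpenGL the RGB factor for GL_SRC_ALPHA_SATURATE is min(As, 1 - Ad), so once destination alpha is treated as 1 the factor collapses to 0. The following is only a sketch of that arithmetic; the function names are made up for illustration and are not Mesa or driver code.

```c
#include <stdbool.h>

/* Destination alpha as the blender should see it: a format without an
 * alpha channel (e.g. RGBX) must behave as if alpha were 1.0, whatever
 * bits happen to sit in the X channel. Hypothetical helper. */
static float effective_dst_alpha(float stored_alpha, bool format_has_alpha)
{
    return format_has_alpha ? stored_alpha : 1.0f;
}

/* RGB blend factor for SRC_ALPHA_SATURATE: min(As, 1 - Ad). */
static float src_alpha_saturate_factor(float src_alpha, float dst_alpha)
{
    float inv_dst = 1.0f - dst_alpha;
    return src_alpha < inv_dst ? src_alpha : inv_dst;
}
```

With a stale X8 value of 0.25 and no alpha channel, the effective destination alpha is 1 and the factor is 0, which is why mapping SRC_ALPHA_SATURATE to zero in that case looks consistent with the rest of the patch.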

Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-19 Thread Vadim Girlin


On 02/19/2013 04:54 PM, Andy Furniss wrote:

Vadim Girlin wrote:


Testing with rv790 with drm-fixes kernel not much works -

etqw runs but in a level 50% of screen is junk.

nexuiz menus total junk, didn't test further.

xonotic menus OK but gpu lock on starting timedemo.

vdpau mpeg2 decode - renders 90% junk.

heaven 3.0 (on a different pure 64 bit setup) gpu lock.


I've pushed the patch to improve support for the r6xx, r7xx and cayman.
I believe the chances that it will work on these chips are higher now,
so you might want to give it another try.


It's still the same for me.

I tested with and without llvm this time -

nexuiz renders OK with llvm but is still corrupt without.

etqw as above with/without

xonotic locks with/without

vdpau junk without, gpu lock with.


Could you please test glxgears and other simple mesa demos? It's easier 
to spot the problems with small apps that don't use a lot of complex 
shaders. If some of them don't work correctly, please send me the dumps 
with "R600_DUMP_SHADERS=2 R600_SB_DUMP=3". Also it might help if you can 
look for piglit regressions against the piglit results with R600_SB=0 
and send me the dumps for a few regressed tests.


Thanks for testing.

Vadim




[Mesa-dev] [PATCH 3/8] R600/SI: rework VOP2_* pattern

2013-02-19 Thread Christian König

From: Christian König 

Fixing asm operation names.

Signed-off-by: Christian König 
---
 lib/Target/R600/SIISelLowering.cpp |3 ---
 lib/Target/R600/SIInstrInfo.td |   37 ++--
 2 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 4085890..5a468ae 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -75,7 +75,6 @@ MachineBasicBlock * 
SITargetLowering::EmitInstrWithCustomInserter(
.addOperand(MI->getOperand(0))
.addOperand(MI->getOperand(1))
.addImm(0x80) // SRC1
-   .addImm(0x80) // SRC2
.addImm(0) // ABS
.addImm(1) // CLAMP
.addImm(0) // OMOD
@@ -88,7 +87,6 @@ MachineBasicBlock * 
SITargetLowering::EmitInstrWithCustomInserter(
  .addOperand(MI->getOperand(0))
  .addOperand(MI->getOperand(1))
  .addImm(0x80) // SRC1
- .addImm(0x80) // SRC2
  .addImm(1) // ABS
  .addImm(0) // CLAMP
  .addImm(0) // OMOD
@@ -101,7 +99,6 @@ MachineBasicBlock * 
SITargetLowering::EmitInstrWithCustomInserter(
  .addOperand(MI->getOperand(0))
  .addOperand(MI->getOperand(1))
  .addImm(0x80) // SRC1
- .addImm(0x80) // SRC2
  .addImm(0) // ABS
  .addImm(0) // CLAMP
  .addImm(0) // OMOD
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index dbe616d..be791e2 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -123,29 +123,28 @@ multiclass VOP1_32  op, string opName, list 
pattern>
 multiclass VOP1_64  op, string opName, list pattern>
   : VOP1_Helper ;
 
-class VOP2_Helper  op, RegisterClass vrc, RegisterClass arc,
-   string opName, list pattern> :
-  VOP2 <
-op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName, pattern
-  >;
-
-multiclass VOP2_32  op, string opName, list pattern> {
-
-  def _e32 : VOP2_Helper ;
-
-  def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-  opName, []
+multiclass VOP2_Helper  op, RegisterClass vrc, RegisterClass arc,
+string opName, list pattern> {
+  def _e32 : VOP2 <
+op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName#"_e32", pattern
   >;
+  def _e64 : VOP3 <
+{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
+(outs vrc:$dst),
+(ins arc:$src0, vrc:$src1,
+ i32imm:$abs, i32imm:$clamp,
+ i32imm:$omod, i32imm:$neg),
+opName#"_e64", []
+  > {
+let SRC2 = 0x80;
+  }
 }
 
-multiclass VOP2_64  op, string opName, list pattern> {
-  def _e32: VOP2_Helper ;
+multiclass VOP2_32  op, string opName, list pattern>
+  : VOP2_Helper ;
 
-  def _e64 : VOP3_64 <
-{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-opName, []
-  >;
-}
+multiclass VOP2_64  op, string opName, list pattern>
+  : VOP2_Helper ;
 
 class SOPK_32  op, string opName, list pattern>
   : SOPK ;
-- 
1.7.10.4


[Mesa-dev] [PATCH 2/8] R600/SI: rework VOP1_* patterns

2013-02-19 Thread Christian König

From: Christian König 

Fixing asm operation names.

Signed-off-by: Christian König 
---
 lib/Target/R600/SIInstrInfo.td |   36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 77c57b7..dbe616d 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -100,28 +100,28 @@ class SOP2_32  op, string opName, list 
pattern>
 class SOP2_64  op, string opName, list pattern>
   : SOP2 ;
 
-class VOP1_Helper  op, RegisterClass vrc, RegisterClass arc,
-   string opName, list pattern> : 
-  VOP1 <
-op, (outs vrc:$dst), (ins arc:$src0), opName, pattern
-  >;
+multiclass VOP1_Helper  op, RegisterClass drc, RegisterClass src,
+string opName, list pattern> {
 
-multiclass VOP1_32  op, string opName, list pattern> {
-  def _e32: VOP1_Helper ;
-  def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-  opName, []
-  >;
+  def _e32: VOP1 ;
+  def _e64 : VOP3 <
+{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
+(outs drc:$dst),
+(ins src:$src0,
+ i32imm:$abs, i32imm:$clamp,
+ i32imm:$omod, i32imm:$neg),
+opName#"_e64", []
+  > {
+let SRC1 = 0x80;
+let SRC2 = 0x80;
+  }
 }
 
-multiclass VOP1_64  op, string opName, list pattern> {
-
-  def _e32 : VOP1_Helper ;
+multiclass VOP1_32  op, string opName, list pattern>
+  : VOP1_Helper ;
 
-  def _e64 : VOP3_64 <
-{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-opName, []
-  >;
-}
+multiclass VOP1_64  op, string opName, list pattern>
+  : VOP1_Helper ;
 
 class VOP2_Helper  op, RegisterClass vrc, RegisterClass arc,
string opName, list pattern> :
-- 
1.7.10.4
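
For readers unfamiliar with the bit lists used for the _e64 variants: as far as I can tell, TableGen writes such lists most-significant bit first, so {1, 1, op{6}, ..., op{0}} is the 9-bit value 0x180 | op and the {1, 0, 0, op{5}, ..., op{0}} list used for VOP2 in patch 3/8 is 0x100 | op. A rough C sketch of that mapping, under that assumption (the function names are invented for illustration and do not exist in LLVM):

```c
#include <stdint.h>

/* VOP1 -> VOP3: {1, 1, op{6..0}}, i.e. the low seven opcode bits
 * placed under a fixed 0b11 prefix. Hypothetical helper. */
static uint16_t vop1_to_vop3_opcode(uint8_t op)
{
    return (uint16_t)(0x180u | (op & 0x7fu));
}

/* VOP2 -> VOP3: {1, 0, 0, op{5..0}}, i.e. the low six opcode bits
 * placed under a fixed 0b100 prefix. Hypothetical helper. */
static uint16_t vop2_to_vop3_opcode(uint8_t op)
{
    return (uint16_t)(0x100u | (op & 0x3fu));
}
```

So a VOP1 opcode of 0x01 would become VOP3 opcode 0x181, and a VOP2 opcode of 0x03 would become 0x103.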


[Mesa-dev] [PATCH 4/8] R600/SI: simplify VOPC_* patterns

2013-02-19 Thread Christian König

From: Christian König 

Fixing asm operation names.

Signed-off-by: Christian König 
---
 lib/Target/R600/AMDGPUInstructions.td |5 +
 lib/Target/R600/SIInstrInfo.td|   19 +-
 lib/Target/R600/SIInstructions.td |  444 +++--
 3 files changed, 213 insertions(+), 255 deletions(-)

diff --git a/lib/Target/R600/AMDGPUInstructions.td 
b/lib/Target/R600/AMDGPUInstructions.td
index 0559a5a..960f108 100644
--- a/lib/Target/R600/AMDGPUInstructions.td
+++ b/lib/Target/R600/AMDGPUInstructions.td
@@ -77,6 +77,11 @@ def COND_LE : PatLeaf <
  case ISD::SETLE: return true;}}}]
 >;
 
+def COND_NULL : PatLeaf <
+  (cond),
+  [{return false;}]
+>;
+
 
//===--===//
 // Load/Store Pattern Fragments
 
//===--===//
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index be791e2..69357ce 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -153,26 +153,31 @@ class SOPK_64  op, string opName, list 
pattern>
   : SOPK ;
 
 multiclass VOPC_Helper  op, RegisterClass vrc, RegisterClass arc,
-string opName, list pattern> {
+string opName, ValueType vt, PatLeaf cond> {
 
-  def _e32 : VOPC ;
+  def _e32 : VOPC ;
   def _e64 : VOP3 <
 {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
 (outs SReg_64:$dst),
 (ins arc:$src0, vrc:$src1,
  InstFlag:$abs, InstFlag:$clamp,
  InstFlag:$omod, InstFlag:$neg),
-opName, pattern
+opName#"_e32",
+!if(!eq(!cast(cond), "COND_NULL"), [],
+  [(set SReg_64:$dst, (i1 (setcc (vt arc:$src0), vrc:$src1, cond)))]
+)
   > {
 let SRC2 = 0x80;
   }
 }
 
-multiclass VOPC_32  op, string opName, list pattern>
-  : VOPC_Helper ;
+multiclass VOPC_32  op, string opName,
+  ValueType vt = untyped, PatLeaf cond = COND_NULL>
+  : VOPC_Helper ;
 
-multiclass VOPC_64  op, string opName, list pattern>
-  : VOPC_Helper ;
+multiclass VOPC_64  op, string opName,
+  ValueType vt = untyped, PatLeaf cond = COND_NULL>
+  : VOPC_Helper ;
 
 class SOPC_32  op, string opName, list pattern>
   : SOPC ;
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index b4a263d..700b8f8 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -127,286 +127,234 @@ def S_GETREG_REGRD_B32 : SOPK_32 <0x0014, 
"S_GETREG_REGRD_B32", []>;
 //def S_SETREG_IMM32_B32 : SOPK_32 <0x0015, "S_SETREG_IMM32_B32", []>;
 //def EXP : EXP_ <0x, "EXP", []>;
 
-defm V_CMP_F_F32 : VOPC_32 <0x, "V_CMP_F_F32", []>;
-defm V_CMP_LT_F32 : VOPC_32 <0x0001, "V_CMP_LT_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
-  (V_CMP_LT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_EQ_F32 : VOPC_32 <0x0002, "V_CMP_EQ_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
-  (V_CMP_EQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_LE_F32 : VOPC_32 <0x0003, "V_CMP_LE_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
-  (V_CMP_LE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_GT_F32 : VOPC_32 <0x0004, "V_CMP_GT_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
-  (V_CMP_GT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_LG_F32 : VOPC_32 <0x0005, "V_CMP_LG_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
-  (V_CMP_LG_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_GE_F32 : VOPC_32 <0x0006, "V_CMP_GE_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
-  (V_CMP_GE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_O_F32 : VOPC_32 <0x0007, "V_CMP_O_F32", []>;
-defm V_CMP_U_F32 : VOPC_32 <0x0008, "V_CMP_U_F32", []>;
-defm V_CMP_NGE_F32 : VOPC_32 <0x0009, "V_CMP_NGE_F32", []>;
-defm V_CMP_NLG_F32 : VOPC_32 <0x000a, "V_CMP_NLG_F32", []>;
-defm V_CMP_NGT_F32 : VOPC_32 <0x000b, "V_CMP_NGT_F32", []>;
-defm V_CMP_NLE_F32 : VOPC_32 <0x000c, "V_CMP_NLE_F32", []>;
-defm V_CMP_NEQ_F32 : VOPC_32 <0x000d, "V_CMP_NEQ_F32", []>;
-def : Pat <
-  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
-  (V_CMP_NEQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
->;
-defm V_CMP_NLT_F32 : VOPC_32 <0x000e, "V_CMP_NLT_F32", []>;
-defm V_CMP_TRU_F32 : VOPC_32 <0x000f, "V_CMP_TRU_F32", []>;
+defm V_CMP_F_F32 : VOPC_32 <0x, "V_CMP_F_F32">;
+defm V_CMP_LT_F32 : VOPC_32 <0x0001, "V_CMP_LT_F32", f32, COND_LT>;
+defm V_CMP_EQ_F32 : VOPC_32 <0x0002, "V_CMP_EQ_F32", f32, COND_EQ>;
+defm V_CMP_LE_F32 : VOPC_32 <0x0003, "V_CMP_LE_F32", f32, COND_LE>;
+defm V_CMP_GT_F32 : VOPC_32 <0x0004, "V_CMP_GT_F32", f32, COND_GT>;
+defm V_CMP_LG_F32 : VOPC_32 <0x0005, "V_

[Mesa-dev] [PATCH 1/8] R600/SI: cleanup SIInstrInfo.td and SIInstrFormat.td

2013-02-19 Thread Christian König

From: Christian König 

Those two files got mixed up.

Signed-off-by: Christian König 
---
 lib/Target/R600/SIInstrFormats.td |  500 +++--
 lib/Target/R600/SIInstrInfo.td|  495 +++-
 2 files changed, 509 insertions(+), 486 deletions(-)

diff --git a/lib/Target/R600/SIInstrFormats.td 
b/lib/Target/R600/SIInstrFormats.td
index 40e37aa..fe417d6 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -1,4 +1,4 @@
-//===-- SIInstrFormats.td - SI Instruction Formats 
===//
+//===-- SIInstrFormats.td - SI Instruction Encodings 
--===//
 //
 // The LLVM Compiler Infrastructure
 //
@@ -9,180 +9,418 @@
 //
 // SI Instruction format definitions.
 //
-// Instructions with _32 take 32-bit operands.
-// Instructions with _64 take 64-bit operands.
-//
-// VOP_* instructions can use either a 32-bit or 64-bit encoding.  The 32-bit
-// encoding is the standard encoding, but instruction that make use of
-// any of the instruction modifiers must use the 64-bit encoding.
-//
-// Instructions with _e32 use the 32-bit encoding.
-// Instructions with _e64 use the 64-bit encoding.
-//
 
//===--===//
 
-class VOP3_32  op, string opName, list pattern>
-  : VOP3 ;
+class InstSI  pattern> :
+AMDGPUInst {
+
+  field bits<1> VM_CNT = 0;
+  field bits<1> EXP_CNT = 0;
+  field bits<1> LGKM_CNT = 0;
+
+  let TSFlags{0} = VM_CNT;
+  let TSFlags{1} = EXP_CNT;
+  let TSFlags{2} = LGKM_CNT;
+}
+
+class Enc32  pattern> :
+InstSI  {
+
+  field bits<32> Inst;
+  let Size = 4;
+}
 
-class VOP3_64  op, string opName, list pattern>
-  : VOP3 ;
+class Enc64  pattern> :
+InstSI  {
 
-class SOP1_32  op, string opName, list pattern>
-  : SOP1 ;
+  field bits<64> Inst;
+  let Size = 8;
+}
 
-class SOP1_64  op, string opName, list pattern>
-  : SOP1 ;
+//===--===//
+// Scalar operations
+//===--===//
 
-class SOP2_32  op, string opName, list pattern>
-  : SOP2 ;
+class SOP1  op, dag outs, dag ins, string asm, list pattern> :
+Enc32 {
 
-class SOP2_64  op, string opName, list pattern>
-  : SOP2 ;
+  bits<7> SDST;
+  bits<8> SSRC0;
 
-class VOP1_Helper  op, RegisterClass vrc, RegisterClass arc,
-   string opName, list pattern> : 
-  VOP1 <
-op, (outs vrc:$dst), (ins arc:$src0), opName, pattern
-  >;
+  let Inst{7-0} = SSRC0;
+  let Inst{15-8} = op;
+  let Inst{22-16} = SDST;
+  let Inst{31-23} = 0x17d; //encoding;
 
-multiclass VOP1_32  op, string opName, list pattern> {
-  def _e32: VOP1_Helper ;
-  def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-  opName, []
-  >;
+  let mayLoad = 0;
+  let mayStore = 0;
+  let hasSideEffects = 0;
 }
 
-multiclass VOP1_64  op, string opName, list pattern> {
+class SOP2  op, dag outs, dag ins, string asm, list pattern> :
+Enc32  {
+  
+  bits<7> SDST;
+  bits<8> SSRC0;
+  bits<8> SSRC1;
 
-  def _e32 : VOP1_Helper ;
+  let Inst{7-0} = SSRC0;
+  let Inst{15-8} = SSRC1;
+  let Inst{22-16} = SDST;
+  let Inst{29-23} = op;
+  let Inst{31-30} = 0x2; // encoding
 
-  def _e64 : VOP3_64 <
-{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-opName, []
-  >;
+  let mayLoad = 0;
+  let mayStore = 0;
+  let hasSideEffects = 0;
 }
 
-class VOP2_Helper  op, RegisterClass vrc, RegisterClass arc,
-   string opName, list pattern> :
-  VOP2 <
-op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName, pattern
-  >;
+class SOPC  op, dag outs, dag ins, string asm, list pattern> :
+  Enc32 {
 
-multiclass VOP2_32  op, string opName, list pattern> {
+  bits<8> SSRC0;
+  bits<8> SSRC1;
 
-  def _e32 : VOP2_Helper ;
+  let Inst{7-0} = SSRC0;
+  let Inst{15-8} = SSRC1;
+  let Inst{22-16} = op;
+  let Inst{31-23} = 0x17e;
 
-  def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-  opName, []
-  >;
+  let DisableEncoding = "$dst";
+  let mayLoad = 0;
+  let mayStore = 0;
+  let hasSideEffects = 0;
 }
 
-multiclass VOP2_64  op, string opName, list pattern> {
-  def _e32: VOP2_Helper ;
+class SOPK  op, dag outs, dag ins, string asm, list pattern> :
+   Enc32  {
 
-  def _e64 : VOP3_64 <
-{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-opName, []
-  >;
+  bits <7> SDST;
+  bits <16> SIMM16;
+  
+  let Inst{15-0} = SIMM16;
+  let Inst{22-16} = SDST;
+  let Inst{27-23} = op;
+  let Inst{31-28} = 0xb; //encoding
+
+  let mayLoad = 0;
+  let mayStore = 0;
+  let hasSideEffects = 0;
 }
 
-class SOPK_32  op, string opName, list pattern>
-  : SOPK ;
+class SOPP  op, dag ins, string asm, list pattern> : Enc32 <
+  (outs),
+  ins,
+  asm,
+  pattern > {
 
-class SOPK_64  op, string opName, list pattern>
-  : SOPK ;
+  bits

[Mesa-dev] [PATCH 5/8] R600/SI: sort and cleanup SIInstrInfo.td

2013-02-19 Thread Christian König

From: Christian König 

Fix code formatting and sort/group the classes.

Signed-off-by: Christian König 
---
 lib/Target/R600/SIInstrInfo.td |  100 +++-
 1 file changed, 58 insertions(+), 42 deletions(-)

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 69357ce..9bdab10 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -82,11 +82,9 @@ include "SIInstrFormats.td"
 //
 
//===--===//
 
-class VOP3_32  op, string opName, list pattern>
-  : VOP3 ;
-
-class VOP3_64  op, string opName, list pattern>
-  : VOP3 ;
+//===--===//
+// Scalar classes
+//===--===//
 
 class SOP1_32  op, string opName, list pattern>
   : SOP1 ;
@@ -100,6 +98,36 @@ class SOP2_32  op, string opName, list pattern>
 class SOP2_64  op, string opName, list pattern>
   : SOP2 ;
 
+class SOPC_32  op, string opName, list pattern>
+  : SOPC ;
+
+class SOPC_64  op, string opName, list pattern>
+  : SOPC ;
+
+class SOPK_32  op, string opName, list pattern>
+  : SOPK ;
+
+class SOPK_64  op, string opName, list pattern>
+  : SOPK ;
+
+multiclass SMRD_Helper  op, string asm, RegisterClass dstClass> {
+  def _IMM : SMRD <
+op, 1, (outs dstClass:$dst),
+(ins GPR2Align:$sbase, i32imm:$offset),
+asm, []
+  >;
+
+  def _SGPR : SMRD <
+op, 0, (outs dstClass:$dst),
+(ins GPR2Align:$sbase, SReg_32:$soff),
+asm, []
+  >;
+}
+
+//===--===//
+// Vector ALU classes
+//===--===//
+
 multiclass VOP1_Helper  op, RegisterClass drc, RegisterClass src,
 string opName, list pattern> {
 
@@ -146,11 +174,19 @@ multiclass VOP2_32  op, string opName, list 
pattern>
 multiclass VOP2_64  op, string opName, list pattern>
   : VOP2_Helper ;
 
-class SOPK_32  op, string opName, list pattern>
-  : SOPK ;
+class VOP3_32  op, string opName, list pattern> : VOP3 <
+  op, (outs VReg_32:$dst),
+  (ins VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3,
+   i32imm:$src4, i32imm:$src5, i32imm:$src6),
+  opName, pattern
+>;
 
-class SOPK_64  op, string opName, list pattern>
-  : SOPK ;
+class VOP3_64  op, string opName, list pattern> : VOP3 <
+  op, (outs VReg_64:$dst),
+  (ins VSrc_64:$src0, VReg_64:$src1, VReg_64:$src2,
+   i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6),
+  opName, pattern
+>;
 
 multiclass VOPC_Helper  op, RegisterClass vrc, RegisterClass arc,
 string opName, ValueType vt, PatLeaf cond> {
@@ -179,23 +215,9 @@ multiclass VOPC_64  op, string opName,
   ValueType vt = untyped, PatLeaf cond = COND_NULL>
   : VOPC_Helper ;
 
-class SOPC_32  op, string opName, list pattern>
-  : SOPC ;
-
-class SOPC_64  op, string opName, list pattern>
-  : SOPC ;
-
-class MIMG_Load_Helper  op, string asm> : MIMG <
-  op,
-  (outs VReg_128:$vdata),
-  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
-   i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
-   GPR4Align:$srsrc, GPR4Align:$ssamp),
-  asm,
-  []> {
-  let mayLoad = 1;
-  let mayStore = 0;
-}
+//===--===//
+// Vector ALU classes
+//===--===//
 
 class MTBUF_Store_Helper  op, string asm, RegisterClass regClass> : 
MTBUF <
   op,
@@ -233,22 +255,16 @@ class MTBUF_Load_Helper  op, string asm, 
RegisterClass regClass> : MTBUF
   let mayStore = 0;
 }
 
-multiclass SMRD_Helper  op, string asm, RegisterClass dstClass> {
-  def _IMM : SMRD <
- op, 1,
- (outs dstClass:$dst),
- (ins GPR2Align:$sbase, i32imm:$offset),
- asm,
- []
-  >;
-
-  def _SGPR : SMRD <
-  op, 0,
-  (outs dstClass:$dst),
-  (ins GPR2Align:$sbase, SReg_32:$soff),
-  asm,
-  []
-  >;
+class MIMG_Load_Helper  op, string asm> : MIMG <
+  op,
+  (outs VReg_128:$vdata),
+  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
+   i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
+   GPR4Align:$srsrc, GPR4Align:$ssamp),
+  asm,
+  []> {
+  let mayLoad = 1;
+  let mayStore = 0;
 }
 
 include "SIInstructions.td"
-- 
1.7.10.4


[Mesa-dev] [PATCH 6/8] R600/SI: use patterns for clamp, fabs, fneg

2013-02-19 Thread Christian König

From: Christian König 

Instead of using custom inserters, it's simpler and
should make DAG folding easier.

Signed-off-by: Christian König 
---
 lib/Target/R600/SIISelLowering.cpp |   36 
 lib/Target/R600/SIInstructions.td  |   26 ++
 2 files changed, 22 insertions(+), 40 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 5a468ae..2f304eb 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -62,7 +62,6 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
 MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter(
 MachineInstr * MI, MachineBasicBlock * BB) const {
-  const TargetInstrInfo * TII = getTargetMachine().getInstrInfo();
   MachineRegisterInfo & MRI = BB->getParent()->getRegInfo();
   MachineBasicBlock::iterator I = MI;
 
@@ -70,41 +69,6 @@ MachineBasicBlock * 
SITargetLowering::EmitInstrWithCustomInserter(
   default:
 return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
   case AMDGPU::BRANCH: return BB;
-  case AMDGPU::CLAMP_SI:
-BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
-   .addOperand(MI->getOperand(0))
-   .addOperand(MI->getOperand(1))
-   .addImm(0x80) // SRC1
-   .addImm(0) // ABS
-   .addImm(1) // CLAMP
-   .addImm(0) // OMOD
-   .addImm(0); // NEG
-MI->eraseFromParent();
-break;
-
-  case AMDGPU::FABS_SI:
-BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
- .addOperand(MI->getOperand(0))
- .addOperand(MI->getOperand(1))
- .addImm(0x80) // SRC1
- .addImm(1) // ABS
- .addImm(0) // CLAMP
- .addImm(0) // OMOD
- .addImm(0); // NEG
-MI->eraseFromParent();
-break;
-
-  case AMDGPU::FNEG_SI:
-BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
- .addOperand(MI->getOperand(0))
- .addOperand(MI->getOperand(1))
- .addImm(0x80) // SRC1
- .addImm(0) // ABS
- .addImm(0) // CLAMP
- .addImm(0) // OMOD
- .addImm(1); // NEG
-MI->eraseFromParent();
-break;
   case AMDGPU::SHADER_TYPE:
 BB->getParent()->getInfo()->ShaderType =
 MI->getOperand(0).getImm();
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 700b8f8..71de032 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1184,10 +1184,6 @@ defm : SamplePatterns;
 defm : SamplePatterns;
 defm : SamplePatterns;
 
-def CLAMP_SI : CLAMP;
-def FABS_SI : FABS;
-def FNEG_SI : FNEG;
-
 def : Extract_Element ;
 def : Extract_Element ;
 def : Extract_Element ;
@@ -1211,6 +1207,28 @@ def : BitConvert ;
 def : BitConvert ;
 def : BitConvert ;
 
+/** === **/
+/** Src & Dst modifiers **/
+/** === **/
+
+def : Pat <
+  (int_AMDIL_clamp VReg_32:$src, (f32 FP_ZERO), (f32 FP_ONE)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 1 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */)
+>;
+
+def : Pat <
+  (fabs VReg_32:$src),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   1 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */)
+>;
+
+def : Pat <
+  (fneg VReg_32:$src),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 1 /* NEG */)
+>;
+
 /** == **/
 /** Immediate Patterns **/
 /** == **/
-- 
1.7.10.4
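
The three patterns above all select the same instruction, V_ADD_F32_e64 with SRC1 = 0x80 (which, to my understanding, is the inline constant 0 in the SI operand encoding), and differ only in the modifier bits. A simplified scalar model in C of what that is meant to compute follows; it is a sketch under that assumption, ignores NaN and signed-zero details, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Model of "V_ADD_F32_e64 src0, 0" with the ABS, NEG and CLAMP
 * modifiers from the patterns. ABS and NEG act on the source operand,
 * CLAMP acts on the result. Hypothetical helper, not LLVM code. */
static float v_add_f32_zero(float src0, bool abs_mod, bool clamp, bool neg)
{
    float a = src0;

    if (abs_mod && a < 0.0f)   /* source modifier: |src0| */
        a = -a;
    if (neg)                   /* source modifier: -src0 */
        a = -a;

    float r = a + 0.0f;        /* the add itself, with SRC1 = 0 */

    if (clamp) {               /* destination modifier: clamp to [0, 1] */
        if (r < 0.0f) r = 0.0f;
        if (r > 1.0f) r = 1.0f;
    }
    return r;
}
```

In this model fabs is the call with only abs_mod set, fneg the call with only neg set, and the AMDIL clamp intrinsic the call with only clamp set.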


[Mesa-dev] [PATCH 7/8] R600/SI: add OMOD patterns

2013-02-19 Thread Christian König

From: Christian König 

Signed-off-by: Christian König 
---
 lib/Target/R600/AMDGPUInstructions.td |   15 +++
 lib/Target/R600/SIInstructions.td |   18 ++
 2 files changed, 33 insertions(+)

diff --git a/lib/Target/R600/AMDGPUInstructions.td 
b/lib/Target/R600/AMDGPUInstructions.td
index 960f108..da3d7b7 100644
--- a/lib/Target/R600/AMDGPUInstructions.td
+++ b/lib/Target/R600/AMDGPUInstructions.td
@@ -102,11 +102,26 @@ def FP_ZERO : PatLeaf <
   [{return N->getValueAPF().isZero();}]
 >;
 
+def FP_0_5 : PatLeaf <
+  (fpimm),
+  [{return N->isExactlyValue(0.5);}]
+>;
+
 def FP_ONE : PatLeaf <
   (fpimm),
   [{return N->isExactlyValue(1.0);}]
 >;
 
+def FP_TWO : PatLeaf <
+  (fpimm),
+  [{return N->isExactlyValue(2.0);}]
+>;
+
+def FP_FOUR : PatLeaf <
+  (fpimm),
+  [{return N->isExactlyValue(4.0);}]
+>;
+
 let isCodeGenOnly = 1, isPseudo = 1 in {
 
 let usesCustomInserter = 1  in {
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 71de032..3b7cc6f 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1229,6 +1229,24 @@ def : Pat <
0 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 1 /* NEG */)
 >;
 
+def : Pat <
+  (fmul VReg_32:$src, (f32 FP_0_5)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 3 /* OMOD */, 0 /* NEG */)
+>;
+
+def : Pat <
+  (fmul VReg_32:$src, (f32 FP_TWO)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 1 /* OMOD */, 0 /* NEG */)
+>;
+
+def : Pat <
+  (fmul VReg_32:$src, (f32 FP_FOUR)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 2 /* OMOD */, 0 /* NEG */)
+>;
+
 /** == **/
 /** Immediate Patterns **/
 /** == **/
-- 
1.7.10.4
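
Read off the patterns in this patch, the OMOD field values map to output multipliers as 1 => *2, 2 => *4, 3 => *0.5, with 0 meaning no scaling. A small C sketch of that mapping (the function name is made up for illustration):

```c
/* Output modifier as used by the patterns in this patch:
 *   OMOD = 0: result unchanged
 *   OMOD = 1: result * 2   (fmul x, 2.0)
 *   OMOD = 2: result * 4   (fmul x, 4.0)
 *   OMOD = 3: result * 0.5 (fmul x, 0.5)
 * Hypothetical helper, not LLVM code. */
static float apply_omod(float result, unsigned omod)
{
    switch (omod) {
    case 1:  return result * 2.0f;
    case 2:  return result * 4.0f;
    case 3:  return result * 0.5f;
    default: return result;
    }
}
```

Each fmul pattern therefore turns into an add of zero whose result is scaled by the output modifier, so no separate multiply instruction is needed.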


[Mesa-dev] [PATCH 8/8] R600/SI: replace SI_V_CNDLT with a pattern

2013-02-19 Thread Christian König

From: Christian König 

It actually fixes quite a bunch of piglit tests.

Signed-off-by: Christian König 
---
 lib/Target/R600/SIISelLowering.cpp |   22 --
 lib/Target/R600/SIISelLowering.h   |2 --
 lib/Target/R600/SIInstructions.td  |   12 +---
 3 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 2f304eb..212e3f2 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -81,9 +81,6 @@ MachineBasicBlock * 
SITargetLowering::EmitInstrWithCustomInserter(
   case AMDGPU::SI_WQM:
 LowerSI_WQM(MI, *BB, I, MRI);
 break;
-  case AMDGPU::SI_V_CNDLT:
-LowerSI_V_CNDLT(MI, *BB, I, MRI);
-break;
   }
   return BB;
 }
@@ -127,25 +124,6 @@ void SITargetLowering::LowerSI_INTERP(MachineInstr *MI, 
MachineBasicBlock &BB,
   MI->eraseFromParent();
 }
 
-void SITargetLowering::LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
-  unsigned VCC = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
-
-  BuildMI(BB, I, BB.findDebugLoc(I),
-  TII->get(AMDGPU::V_CMP_GT_F32_e32),
-  VCC)
-  .addImm(0)
-  .addOperand(MI->getOperand(1));
-
-  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CNDMASK_B32_e32))
-  .addOperand(MI->getOperand(0))
-  .addOperand(MI->getOperand(3))
-  .addOperand(MI->getOperand(2))
-  .addReg(VCC);
-
-  MI->eraseFromParent();
-}
-
 EVT SITargetLowering::getSetCCResultType(EVT VT) const {
   return MVT::i1;
 }
diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
index a8429b7..5d048f8 100644
--- a/lib/Target/R600/SIISelLowering.h
+++ b/lib/Target/R600/SIISelLowering.h
@@ -29,8 +29,6 @@ class SITargetLowering : public AMDGPUTargetLowering {
   MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
   void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
   MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
-  void LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-  MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
 
   SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 3b7cc6f..01e7933 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -987,13 +987,6 @@ def LOAD_CONST : AMDGPUShaderInst <
 
 let usesCustomInserter = 1 in {
 
-def SI_V_CNDLT : InstSI <
-  (outs VReg_32:$dst),
-  (ins VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
-  "SI_V_CNDLT $dst, $src0, $src1, $src2",
-  [(set VReg_32:$dst, (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, 
VReg_32:$src2))]
->;
-
 def SI_INTERP : InstSI <
   (outs VReg_32:$dst),
   (ins VReg_32:$i, VReg_32:$j, i32imm:$attr_chan, i32imm:$attr, 
SReg_32:$params),
@@ -1083,6 +1076,11 @@ def SI_KILL : InstSI <
 
 } // end IsCodeGenOnly, isPseudo
 
+def : Pat<
+  (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
+  (V_CNDMASK_B32_e64 VReg_32:$src2, VReg_32:$src1, (V_CMP_GT_F32_e64 0, 
VReg_32:$src0))
+>;
+
 def : Pat <
   (int_AMDGPU_kilp),
   (SI_KILL (V_MOV_B32_e32 0xbf80))
-- 
1.7.10.4
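
As I read the new pattern, int_AMDGPU_cndlt(src0, src1, src2) becomes a compare "0 > src0" feeding V_CNDMASK_B32, which picks its second source where the mask is set and its first source otherwise. A per-lane scalar model in C, assuming those semantics for the two instructions (the real ones operate on a 64-lane mask; the function names here are illustrative only):

```c
#include <stdbool.h>

/* Per-lane model of V_CMP_GT_F32: a > b. */
static bool v_cmp_gt_f32(float a, float b)
{
    return a > b;
}

/* Per-lane model of V_CNDMASK_B32: mask ? src1 : src0. */
static float v_cndmask_b32(float src0, float src1, bool mask)
{
    return mask ? src1 : src0;
}

/* The pattern from the patch:
 *   V_CNDMASK_B32_e64 src2, src1, (V_CMP_GT_F32_e64 0, src0)
 * which works out to: src0 < 0 ? src1 : src2. */
static float cndlt(float src0, float src1, float src2)
{
    return v_cndmask_b32(src2, src1, v_cmp_gt_f32(0.0f, src0));
}
```

Under this model the pattern computes the same result as the removed LowerSI_V_CNDLT helper, just without the custom inserter.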


[Mesa-dev] [Bug 61093] [llvmpipe] lp_surface.c:68:lp_resource_copy: Assertion `src_box->depth == 1' failed.

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61093

José Fonseca  changed:

   What|Removed |Added

   Assignee|mesa-dev@lists.freedesktop. |srol...@vmware.com
   |org |
 CC||jfons...@vmware.com

--- Comment #2 from José Fonseca  ---

(In reply to comment #1)
> The assertion in lp_resource_copy can be fixed easily, but I can't reproduce
> it. 

Roland already has a fix up for review on mesa3d-dev.

> llvmpipe is failing a different assertion here:
> 
> texsubimage:
> /home/marek/dev/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:539: const
> llvm::SDValue &llvm::SDNode::getOperand(unsigned int) const: Assertion `Num
> < NumOperands && "Invalid child # of SDNode!"' failed.

This must be something different.

> The way I see it, my work only uncovered this bug.

Yes, I agree.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Michel Dänzer

On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote: 
> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer  wrote:
> >
> > Really, what I don't understand is why r600g doesn't seem affected by
> > this... at least on my RS880 it's passing the piglit tests this change
> > fixes with radeonsi. So maybe I'm just missing some magic bit for
> > radeonsi.
> 
> RGB formats do fail fbo-blending-formats with r600g/redwood here.

Okay.


> However the alpha channel can sometimes contain 1 in memory even if
> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
> glCopyTex[Sub]Image always set alpha to 1.

Well, but they shouldn't for these formats. :) The memory corresponding
to X* channels should remain unchanged. I'm working on a separate patch
for that for radeonsi.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

[Mesa-dev] [Bug 38086] Mesa 7.11-devel implementation error: Unexpected program target in destroy_program_variants_cb()

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=38086

--- Comment #6 from Brian Paul  ---
Can you make a trace of this issue with apitrace? 
https://github.com/apitrace/apitrace


[Mesa-dev] [Bug 60938] [softpipe] piglit interpolation-noperspective-gl_BackColor-flat-fixed regression

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=60938

Brian Paul  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Brian Paul  ---
Fixed with commit 5da967aff5adb3e27954488206fb885ea1ede0fd


[Mesa-dev] [Bug 61026] Segfault in glBitmap when called with PBO source

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61026

Brian Paul  changed:

What       | Removed | Added
-----------+---------+---------
Status     | NEW     | RESOLVED
Resolution | ---     | FIXED

--- Comment #5 from Brian Paul  ---
Fixed with commit 63c30d7e4fd9676c72d5d94640e1e136bd9dd09f


[Mesa-dev] [Bug 59876] glGetTexLevelParameteriv broken for indirect rendering

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=59876

Brian Paul  changed:

What       | Removed | Added
-----------+---------+---------
Status     | NEW     | RESOLVED
Resolution | ---     | FIXED

--- Comment #6 from Brian Paul  ---
Patch committed as 5876a5dbc0a6ec9ae7f44b5e483d38ae0d24a259


[Mesa-dev] [Bug 61012] alloc_layout_array tx * ty assertion failure when making pbuffer current

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61012

Brian Paul  changed:

What       | Removed | Added
-----------+---------+---------
Status     | NEW     | RESOLVED
Resolution | ---     | FIXED

--- Comment #9 from Brian Paul  ---
Fixed by commit e2091f64cb9ea79f3b51c353ed9facc03ec5690a


Re: [Mesa-dev] [PATCH 4/8] R600/SI: simplify VOPC_* patterns

2013-02-19 Thread Michel Dänzer

On Tue, 2013-02-19 at 14:54 +0100, Christian König wrote:
> From: Christian König 
> 
> Fixing asm operation names.

[...]

> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index be791e2..69357ce 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -153,26 +153,31 @@ class SOPK_64  op, string opName, list 
> pattern>
>: SOPK ;
>  
>  multiclass VOPC_Helper  op, RegisterClass vrc, RegisterClass arc,
> -string opName, list pattern> {
> +string opName, ValueType vt, PatLeaf cond> {
>  
> -  def _e32 : VOPC ;
> +  def _e32 : VOPC ;
>def _e64 : VOP3 <
>  {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
>  (outs SReg_64:$dst),
>  (ins arc:$src0, vrc:$src1,
>   InstFlag:$abs, InstFlag:$clamp,
>   InstFlag:$omod, InstFlag:$neg),
> -opName, pattern
> +opName#"_e32",

I think this should be _e64, shouldn't it?

Also, while you're changing the asm strings, could you add the operands
to them?


>  let SRC2 = 0x80;

Hmm, we're scattering quite a few of these magic 0x80 values around; it would be
nice to make those more self-documenting somehow...


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.

2013-02-19 Thread Brian Paul


On 02/18/2013 05:27 PM, srol...@vmware.com wrote:

From: Roland Scheidegger

Some parts calculated key size by using shader information, others by using
the pipe_vertex_element information. Since it is perfectly valid to have more
vertex_elements set than the vertex shader is using those may not be the same,
so we weren't copying over all vertex_element state - this caused the tgsi dump
to assert (iterates over all vertex elements). With some luck it didn't
crash otherwise even though the llvm generate_fetch code also iterates over
all vertex elements (probably because llvm threw away the unused inputs anyway),
but if in this situation vertex texturing would be used things would definitely
go wrong (as the sampler information wouldn't be copied).
So drop the key size calculation using shader information.
---
  src/gallium/auxiliary/draw/draw_llvm.c |   13 -
  src/gallium/auxiliary/draw/draw_llvm.h |1 -
  .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |7 ++-
  src/gallium/auxiliary/draw/draw_vs_llvm.c  |6 --
  4 files changed, 14 insertions(+), 13 deletions(-)



Looks OK to me.

Reviewed-by: Brian Paul 

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Marek Olšák

On Tue, Feb 19, 2013 at 3:28 PM, Michel Dänzer  wrote:
> On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
>> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer  wrote:
>> >
>> > Really, what I don't understand is why r600g doesn't seem affected by
>> > this... at least on my RS880 it's passing the piglit tests this change
>> > fixes with radeonsi. So maybe I'm just missing some magic bit for
>> > radeonsi.
>>
>> RGB formats do fail fbo-blending-formats with r600g/redwood here.
>
> Okay.
>
>
>> However the alpha channel can sometimes contain 1 in memory even if
>> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
>> glCopyTex[Sub]Image always set alpha to 1.
>
> Well, but they shouldn't for these formats. :) The memory corresponding
> to X* channels should remain unchanged. I'm working on a separate patch
> for that for radeonsi.

I think the only way you could do that is to set the colormask to RGB.
Doesn't it have a negative effect on performance if some channels are
masked out?

Marek

[Mesa-dev] [Bug 61091] piglit glsl-fs-texture2drect regression

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61091

--- Comment #1 from Marek Olšák  ---
glBlitFramebuffer with rectangle textures is also broken with both softpipe and
llvmpipe and it has been so for quite a while. I have a piglit test for that.


Re: [Mesa-dev] VTK Tests fails with Mesa Swrast passes with OSMesa

2013-02-19 Thread Brian Paul


On 02/15/2013 09:00 AM, Kevin H. Hobbs wrote:

I have two machines {bubbles, murron} doing nightly dashboard builds of
VTK using nightly Mesa.

Each machine does a build of VTK using swrast and one with OSMesa.

Many tests pass on both machines when using OSMesa and fail on both
machines using swrast.

This is an example :

Test failing on bubbles and murron with swrast
http://open.cdash.org/testDetails.php?test=177420341&build=2813128
http://open.cdash.org/testDetails.php?test=177431601&build=2813212

Test passing on bubbles and murron with OSMesa
http://open.cdash.org/testDetails.php?test=177404326&build=2812997
http://open.cdash.org/testDetails.php?test=177411381&build=2813049

Many of the tests fail in a similar way: some of the elements of the
image are just missing.


Looks like lines, in particular, are missing.  I don't see any recent 
changes to swrast/osmesa that would seem to cause this.


I think there are two approaches to narrowing this down:

1. You do a git-bisect of mesa to find the regression
2. Make an apitrace of the failing test so I can investigate.

Thanks.

-Brian

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Michel Dänzer

On Tue, 2013-02-19 at 15:48 +0100, Marek Olšák wrote:
> On Tue, Feb 19, 2013 at 3:28 PM, Michel Dänzer  wrote:
> > On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
> >> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer  wrote:
> >> >
> >> > Really, what I don't understand is why r600g doesn't seem affected by
> >> > this... at least on my RS880 it's passing the piglit tests this change
> >> > fixes with radeonsi. So maybe I'm just missing some magic bit for
> >> > radeonsi.
> >>
> >> RGB formats do fail fbo-blending-formats with r600g/redwood here.
> >
> > Okay.
> >
> >
> >> However the alpha channel can sometimes contain 1 in memory even if
> >> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
> >> glCopyTex[Sub]Image always set alpha to 1.
> >
> > Well, but they shouldn't for these formats. :) The memory corresponding
> > to X* channels should remain unchanged. I'm working on a separate patch
> > for that for radeonsi.
> 
> I think the only way you could do that is to set the colormask to RGB.

Exactly.

> Doesn't it have a negative effect on performance if some channels are
> masked out?

It might, but I don't see that we really have a choice. If the app /
state tracker doesn't want to preserve those bits, it should use a
non-X* format.
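
To illustrate the colormask approach discussed here, the following is a
minimal sketch in plain C, not radeonsi code. The channel bits, the byte
layout and the helper names effective_colormask and masked_write are
assumptions made up for the example. It only shows that dropping the fourth
channel from the write mask leaves the X bits in memory untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-channel write-enable bits (not Mesa's actual enums). */
enum {
   CHAN_R = 1u << 0,
   CHAN_G = 1u << 1,
   CHAN_B = 1u << 2,
   CHAN_A = 1u << 3
};

/* Restrict the requested colormask for a format whose fourth channel is X
 * (padding): the padding channel is never written. */
static unsigned
effective_colormask(unsigned requested, int format_has_alpha)
{
   return format_has_alpha ? requested : (requested & ~(unsigned)CHAN_A);
}

/* Write an 8-bit-per-channel pixel, touching only the enabled channels.
 * In this illustrative layout byte 0 is R and byte 3 is A/X. */
static uint32_t
masked_write(uint32_t dst, uint32_t src, unsigned colormask)
{
   uint32_t bits = 0;
   int chan;

   assert((colormask & ~0xfu) == 0);
   for (chan = 0; chan < 4; chan++) {
      if (colormask & (1u << chan))
         bits |= 0xffu << (8 * chan);
   }
   return (dst & ~bits) | (src & bits);
}
```

With an RGBX destination, a full write would go through
effective_colormask(0xf, 0), so the top byte of the destination survives
while R, G and B are replaced.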


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.

2013-02-19 Thread Jose Fonseca

There may be more vertex elements than are used in the shader. But why should the 
key contain those elements? Won't this cause needless recompilations (e.g., in 
situations where the state tracker leaves unneeded elements from a previous 
draw)?

That is, it seems to me that the key should have the number of elements from 
pipe_vertex_element information, but only copy those that the vertex shader uses.
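
For readers following the thread, here is a minimal sketch of the idea behind
the patch under discussion: derive the key size from counts stored in the key
itself, so that allocation, copy and comparison all agree. The struct layout
and the names variant_key, variant_key_size and variant_key_equal are
hypothetical simplifications, not the draw module's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ELEMENTS 8

/* Hypothetical, simplified stand-in for a variant key: a header holding the
 * element count, followed by per-element state. */
struct variant_key {
   unsigned nr_elements;
   unsigned element[MAX_ELEMENTS];
};

/* Size in bytes of the used part of the key, derived from the key alone. */
static size_t
variant_key_size(unsigned nr_elements)
{
   assert(nr_elements <= MAX_ELEMENTS);
   return offsetof(struct variant_key, element) +
          nr_elements * sizeof(unsigned);
}

/* Two keys match if the used bytes are identical. The count is the first
 * field, so keys with different counts never compare equal. */
static int
variant_key_equal(const struct variant_key *a, const struct variant_key *b)
{
   size_t size = variant_key_size(a->nr_elements);
   return memcmp(a, b, size) == 0;
}
```

Because the size depends only on the key, stale data past the used elements
does not affect the comparison. Whether unused elements should be part of the
key at all is the separate question raised above.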

Jose


- Original Message -
> From: Roland Scheidegger 
> 
> Some parts calculated key size by using shader information, others by using
> the pipe_vertex_element information. Since it is perfectly valid to have more
> vertex_elements set than the vertex shader is using those may not be the
> same,
> so we weren't copying over all vertex_element state - this caused the tgsi
> dump
> to assert (iterates over all vertex elements). With some luck it didn't
> crash otherwise even though the llvm generate_fetch code also iterates over
> all vertex elements (probably because llvm threw away the unused inputs
> anyway),
> but if in this situation vertex texturing would be used things would
> definitely
> go wrong (as the sampler information wouldn't be copied).
> So drop the key size calculation using shader information.
> ---
>  src/gallium/auxiliary/draw/draw_llvm.c |   13 -
>  src/gallium/auxiliary/draw/draw_llvm.h |1 -
>  .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |7 ++-
>  src/gallium/auxiliary/draw/draw_vs_llvm.c  |6 --
>  4 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c
> b/src/gallium/auxiliary/draw/draw_llvm.c
> index f3b..df57358 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.c
> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
> @@ -420,17 +420,20 @@ draw_llvm_destroy(struct draw_llvm *llvm)
>   */
>  struct draw_llvm_variant *
>  draw_llvm_create_variant(struct draw_llvm *llvm,
> -  unsigned num_inputs,
> -  const struct draw_llvm_variant_key *key)
> + unsigned num_inputs,
> + const struct draw_llvm_variant_key *key)
>  {
> struct draw_llvm_variant *variant;
> struct llvm_vertex_shader *shader =
>llvm_vertex_shader(llvm->draw->vs.vertex_shader);
> LLVMTypeRef vertex_header;
> +   unsigned key_size = draw_llvm_variant_key_size(key->nr_vertex_elements,
> +  MAX2(key->nr_samplers,
> +
> key->nr_sampler_views));
>  
> variant = MALLOC(sizeof *variant +
> - shader->variant_key_size -
> - sizeof variant->key);
> +key_size -
> +sizeof variant->key);
> if (variant == NULL)
>return NULL;
>  
> @@ -440,7 +443,7 @@ draw_llvm_create_variant(struct draw_llvm *llvm,
>  
> create_jit_types(variant);
>  
> -   memcpy(&variant->key, key, shader->variant_key_size);
> +   memcpy(&variant->key, key, key_size);
>  
> vertex_header = create_jit_vertex_header(variant->gallivm, num_inputs);
>  
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.h
> b/src/gallium/auxiliary/draw/draw_llvm.h
> index 17ca304..b20cee5 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.h
> +++ b/src/gallium/auxiliary/draw/draw_llvm.h
> @@ -281,7 +281,6 @@ struct draw_llvm_variant
>  struct llvm_vertex_shader {
> struct draw_vertex_shader base;
>  
> -   unsigned variant_key_size;
> struct draw_llvm_variant_list_item variants;
> unsigned variants_created;
> unsigned variants_cached;
> diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
> b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
> index b0c18ed..d7f855f 100644
> --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
> +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
> @@ -127,13 +127,18 @@ llvm_middle_end_prepare( struct draw_pt_middle_end
> *middle,
>struct llvm_vertex_shader *shader = llvm_vertex_shader(vs);
>char store[DRAW_LLVM_MAX_VARIANT_KEY_SIZE];
>unsigned i;
> +  unsigned key_size;
>  
>key = draw_llvm_make_variant_key(fpme->llvm, store);
>  
> +  key_size = draw_llvm_variant_key_size(key->nr_vertex_elements,
> +MAX2(key->nr_samplers,
> + key->nr_sampler_views));
> +
>/* Search shader's list of variants for the key */
>li = first_elem(&shader->variants);
>while (!at_end(&shader->variants, li)) {
> - if (memcmp(&li->base->key, key, shader->variant_key_size) == 0) {
> + if (memcmp(&li->base->key, key, key_size) == 0) {
>  variant = li->base;
>  break;
>   }
> diff --git a/src/gallium/auxiliary/draw/draw_vs_llvm.c
> b/src/gallium/auxiliary/draw/draw_vs_llvm.c
> index ac3999e..50cef79 100644
> --- a/src/gallium/auxi

Re: [Mesa-dev] [PATCH 5/6] R600: Remove LowerConstCopyPass and lower CONST_COPY right after ISel.

2013-02-19 Thread Tom Stellard

On Mon, Feb 18, 2013 at 05:27:29PM +0100, Vincent Lejeune wrote:
> Maintaining CONST_COPY instructions until Pre Emit may prevent some ifcvt cases,
> and taking them into account for scheduling is difficult for no real benefit.
> ---
>  lib/Target/R600/AMDGPU.h|   1 -
>  lib/Target/R600/AMDGPUTargetMachine.cpp |   1 -
>  lib/Target/R600/R600ISelLowering.cpp|   8 +-
>  lib/Target/R600/R600Instructions.td |   7 +-
>  lib/Target/R600/R600LowerConstCopy.cpp  | 222 
> 

Don't forget to remove this file from CMakeLists.txt

>  5 files changed, 11 insertions(+), 228 deletions(-)
>  delete mode 100644 lib/Target/R600/R600LowerConstCopy.cpp
> 
> diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
> index ba87918..67073ab 100644
> --- a/lib/Target/R600/AMDGPU.h
> +++ b/lib/Target/R600/AMDGPU.h
> @@ -23,7 +23,6 @@ class AMDGPUTargetMachine;
>  // R600 Passes
>  FunctionPass* createR600KernelParametersPass(const DataLayout *TD);
>  FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm);
> -FunctionPass *createR600LowerConstCopy(TargetMachine &tm);
>  
>  // SI Passes
>  FunctionPass *createSIAnnotateControlFlowPass();
> diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp 
> b/lib/Target/R600/AMDGPUTargetMachine.cpp
> index e2f00be..70b34b0 100644
> --- a/lib/Target/R600/AMDGPUTargetMachine.cpp
> +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
> @@ -143,7 +143,6 @@ bool AMDGPUPassConfig::addPreEmitPass() {
>  addPass(createAMDGPUCFGStructurizerPass(*TM));
>  addPass(createR600ExpandSpecialInstrsPass(*TM));
>  addPass(&FinalizeMachineBundlesID);
> -addPass(createR600LowerConstCopy(*TM));
>} else {
>  addPass(createSILowerControlFlowPass(*TM));
>}
> diff --git a/lib/Target/R600/R600ISelLowering.cpp 
> b/lib/Target/R600/R600ISelLowering.cpp
> index ece0b9a..f25ced1 100644
> --- a/lib/Target/R600/R600ISelLowering.cpp
> +++ b/lib/Target/R600/R600ISelLowering.cpp
> @@ -150,7 +150,13 @@ MachineBasicBlock * 
> R600TargetLowering::EmitInstrWithCustomInserter(
>  TII->buildMovImm(*BB, I, MI->getOperand(0).getReg(),
>   MI->getOperand(1).getImm());
>  break;
> -
> +  case AMDGPU::CONST_COPY: {
> +MachineInstr *NewMI = TII->buildDefaultInstruction(*BB, MI, AMDGPU::MOV,
> +MI->getOperand(0).getReg(), AMDGPU::ALU_CONST);
> +TII->setImmOperand(NewMI, R600Operands::SRC0_SEL,
> +MI->getOperand(1).getImm());
> +break;
> +  }
>  
>case AMDGPU::RAT_WRITE_CACHELESS_32_eg:
>case AMDGPU::RAT_WRITE_CACHELESS_128_eg: {
> diff --git a/lib/Target/R600/R600Instructions.td 
> b/lib/Target/R600/R600Instructions.td
> index 74106c9..10bcdcf 100644
> --- a/lib/Target/R600/R600Instructions.td
> +++ b/lib/Target/R600/R600Instructions.td
> @@ -1650,17 +1650,18 @@ let isTerminator = 1, isReturn = 1, isBarrier = 1, 
> hasCtrlDep = 1,
>  // Constant Buffer Addressing Support
>  
> //===--===//
>  
> -let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"  in {
> +let usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace = 
> "AMDGPU"  in {
>  def CONST_COPY : Instruction {
>let OutOperandList = (outs R600_Reg32:$dst);
>let InOperandList = (ins i32imm:$src);
> -  let Pattern = [(set R600_Reg32:$dst, (CONST_ADDRESS 
> ADDRGA_CONST_OFFSET:$src))];
> +  let Pattern =
> +  [(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];
>let AsmString = "CONST_COPY";
>let neverHasSideEffects = 1;
>let isAsCheapAsAMove = 1;
>let Itinerary = NullALU;
>  }
> -} // end isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"
> +} // end usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace 
> = "AMDGPU"
>  
>  def TEX_VTX_CONSTBUF :
>InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr, i32imm:$BUFFER_ID), 
> "VTX_READ_eg $dst, $ptr",
> diff --git a/lib/Target/R600/R600LowerConstCopy.cpp 
> b/lib/Target/R600/R600LowerConstCopy.cpp
> deleted file mode 100644
> index 3ebe653..000
> --- a/lib/Target/R600/R600LowerConstCopy.cpp
> +++ /dev/null
> @@ -1,222 +0,0 @@
> -//===-- R600LowerConstCopy.cpp - Propagate ConstCopy / lower them to 
> MOV---===//
> -//
> -// The LLVM Compiler Infrastructure
> -//
> -// This file is distributed under the University of Illinois Open Source
> -// License. See LICENSE.TXT for details.
> -//
> -//===--===//
> -//
> -/// \file
> -/// This pass is intended to handle remaining ConstCopy pseudo MachineInstr.
> -/// ISel will fold each Const Buffer read inside scalar ALU. However it 
> cannot
> -/// fold them inside vector instruction, like DOT4 or Cube ; ISel emits
> -/// ConstCopy instead. This pass (executed after ExpandingSpecialInstr) will 
> try
> -/// to fold them if possible or replace them by MOV otherwise.
> -//
> -//===-

Re: [Mesa-dev] [PATCH 6/6] R600: initial scheduler code

2013-02-19 Thread Tom Stellard


Hi Vincent,

From now on, please cc llvm-comm...@cs.uiuc.edu when you submit a patch.
I'm cc'ing that list now.

This looks OK to me at first glance, but I would like to test it with
compute shaders before you merge it.

On Mon, Feb 18, 2013 at 05:27:30PM +0100, Vincent Lejeune wrote:
> From: Vadim Girlin 
> 
> This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently
> it only tries to expose more parallelism for ALU instructions (this also
> makes the distribution of GPR channels more uniform and increases the
> chances of ALU instructions being packed together in a single VLIW group).
> Also it tries to reduce clause switching by grouping instructions of the
> same kind (ALU/FETCH/CF) together.
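
The clause-grouping part of this description can be sketched as follows. This
is an illustrative plain-C model, not the R600SchedStrategy code from the
patch: the enum, the ready-count array and pick_next_kind are all
hypothetical names, and the real heuristic also weighs clause size limits and
SUnit depth.

```c
#include <assert.h>

enum inst_kind { KIND_ALU, KIND_FETCH, KIND_OTHER, KIND_COUNT };

/* Pick the kind of the next instruction to schedule. Stay in the current
 * clause while it still has ready instructions; otherwise switch to the
 * first kind that has any. Returns KIND_COUNT when nothing is ready. */
static enum inst_kind
pick_next_kind(enum inst_kind current, const unsigned ready[KIND_COUNT])
{
   int k;

   assert(current < KIND_COUNT);
   if (ready[current] > 0)
      return current;
   for (k = 0; k < KIND_COUNT; k++) {
      if (ready[k] > 0)
         return (enum inst_kind)k;
   }
   return KIND_COUNT;
}
```

Staying with the current kind as long as possible is what keeps the number of
clause switches down in this model.
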
> 
> Vincent Lejeune:
>  - Support for VLIW4 Slot assignment
>  - Recomputation of ScheduleDAG to get more parallelism opportunities
> 
> Tom Stellard:
>  - Fix assertion failure when trying to determine an instruction's slot
>based on its destination register's class
>  - Fix some compiler warnings
> 
> Vincent Lejeune: [v2]
>  - Remove recomputation of ScheduleDAG (will be provided in a later patch)
>  - Improve estimation of an ALU clause size so that heuristic does not emit cf
>  instructions at the wrong position.
>  - Make schedule heuristic smarter using SUnit Depth
>  - Take constant read limitations into account
> ---
>  lib/Target/R600/AMDGPUTargetMachine.cpp  |  17 +-
>  lib/Target/R600/R600MachineScheduler.cpp | 483 
> +++
>  lib/Target/R600/R600MachineScheduler.h   | 121 
>  test/CodeGen/R600/fdiv.v4f32.ll  |   6 +-
>  4 files changed, 623 insertions(+), 4 deletions(-)
>  create mode 100644 lib/Target/R600/R600MachineScheduler.cpp
>  create mode 100644 lib/Target/R600/R600MachineScheduler.h
> 
> diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp 
> b/lib/Target/R600/AMDGPUTargetMachine.cpp
> index 70b34b0..eb58853 100644
> --- a/lib/Target/R600/AMDGPUTargetMachine.cpp
> +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
> @@ -17,6 +17,7 @@
>  #include "AMDGPU.h"
>  #include "R600ISelLowering.h"
>  #include "R600InstrInfo.h"
> +#include "R600MachineScheduler.h"
>  #include "SIISelLowering.h"
>  #include "SIInstrInfo.h"
>  #include "llvm/Analysis/Passes.h"
> @@ -39,6 +40,14 @@ extern "C" void LLVMInitializeR600Target() {
>RegisterTargetMachine X(TheAMDGPUTarget);
>  }
>  
> +static ScheduleDAGInstrs *createR600MachineScheduler(MachineSchedContext *C) 
> {
> +  return new ScheduleDAGMI(C, new R600SchedStrategy());
> +}
> +
> +static MachineSchedRegistry
> +SchedCustomRegistry("r600", "Run R600's custom scheduler",
> +createR600MachineScheduler);
> +
>  AMDGPUTargetMachine::AMDGPUTargetMachine(const Target &T, StringRef TT,
>  StringRef CPU, StringRef FS,
>TargetOptions Options,
> @@ -70,7 +79,13 @@ namespace {
>  class AMDGPUPassConfig : public TargetPassConfig {
>  public:
>AMDGPUPassConfig(AMDGPUTargetMachine *TM, PassManagerBase &PM)
> -: TargetPassConfig(TM, PM) {}
> +: TargetPassConfig(TM, PM) {
> +const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
> +if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
> +  enablePass(&MachineSchedulerID);
> +  MachineSchedRegistry::setDefault(createR600MachineScheduler);
> +}
> +  }
>  
>AMDGPUTargetMachine &getAMDGPUTargetMachine() const {
>  return getTM();
> diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
> b/lib/Target/R600/R600MachineScheduler.cpp
> new file mode 100644
> index 000..efd9490
> --- /dev/null
> +++ b/lib/Target/R600/R600MachineScheduler.cpp
> @@ -0,0 +1,483 @@
> +//===-- R600MachineScheduler.cpp - R600 Scheduler Interface -*- C++ 
> -*-===//
> +//
> +// The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===--===//
> +//
> +/// \file
> +/// \brief R600 Machine Scheduler interface
> +// TODO: Scheduling is optimised for VLIW4 arch, modify it to support TRANS 
> slot
> +//
> +//===--===//
> +
> +#define DEBUG_TYPE "misched"
> +
> +#include "R600MachineScheduler.h"
> +#include "llvm/CodeGen/MachineRegisterInfo.h"
> +#include "llvm/CodeGen/LiveIntervalAnalysis.h"
> +#include "llvm/Pass.h"
> +#include "llvm/PassManager.h"
> +#include 
> +#include 
> +using namespace llvm;
> +
> +void R600SchedStrategy::initialize(ScheduleDAGMI *dag) {
> +
> +  DAG = dag;
> +  TII = static_cast(DAG->TII);
> +  TRI = static_cast(DAG->TRI);
> +  MRI = &DAG->MRI;
> +  Available[IDAlu]->clear();
> +  Available[IDFetch]->clear();
> +  Available[IDOther]->clear();
> +  CurInstKind = IDOther;
> +  CurEmitted = 0;
> +  memset(InstructionsGroupCandidate, 0, sizeof(InstructionsGroupCandidate));
> +  InstKindLimit[IDAlu] = 120; // 120 minus 8 for securi

Re: [Mesa-dev] [PATCH 1/6] R600: Use MUL_IEEE for trig/fdiv intrinsic

2013-02-19 Thread Tom Stellard

On Mon, Feb 18, 2013 at 05:27:25PM +0100, Vincent Lejeune wrote:

Reviewed-by: Tom Stellard 
> ---
>  lib/Target/R600/R600Instructions.td | 8 
>  test/CodeGen/R600/fdiv.v4f32.ll | 8 
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/Target/R600/R600Instructions.td 
> b/lib/Target/R600/R600Instructions.td
> index 0a01400..e4cc06e 100644
> --- a/lib/Target/R600/R600Instructions.td
> +++ b/lib/Target/R600/R600Instructions.td
> @@ -1090,12 +1090,12 @@ class COS_Common  inst> : R600_1OP <
>  multiclass DIV_Common  {
>  def : Pat<
>(int_AMDGPU_div R600_Reg32:$src0, R600_Reg32:$src1),
> -  (MUL R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1))
> +  (MUL_IEEE R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1))
>  >;
>  
>  def : Pat<
>(fdiv R600_Reg32:$src0, R600_Reg32:$src1),
> -  (MUL R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1))
> +  (MUL_IEEE R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1))
>  >;
>  }
>  
> @@ -1169,12 +1169,12 @@ let Predicates = [isR600] in {
>  // cards.
>  class COS_PAT  : Pat<
>(fcos R600_Reg32:$src),
> -  (trig (MUL (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src))
> +  (trig (MUL_IEEE (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src))
>  >;
>  
>  class SIN_PAT  : Pat<
>(fsin R600_Reg32:$src),
> -  (trig (MUL (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src))
> +  (trig (MUL_IEEE (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src))
>  >;
>  
>  
> //===--===//
> diff --git a/test/CodeGen/R600/fdiv.v4f32.ll b/test/CodeGen/R600/fdiv.v4f32.ll
> index b013fd6..459fd11 100644
> --- a/test/CodeGen/R600/fdiv.v4f32.ll
> +++ b/test/CodeGen/R600/fdiv.v4f32.ll
> @@ -1,13 +1,13 @@
>  ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>  
>  ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  
>  define void @test(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* 
> %in) {
>%b_ptr = getelementptr <4 x float> addrspace(1)* %in, i32 1
> -- 
> 1.8.1.2
> 

Re: [Mesa-dev] [PATCH 2/6] R600: CONST_ADDRESS node is not marked as mayLoad anymore

2013-02-19 Thread Tom Stellard

On Mon, Feb 18, 2013 at 05:27:26PM +0100, Vincent Lejeune wrote:
> mayLoad complicates scheduling and does not bring any useful info,
> as the location is not writeable at all.

Reviewed-by: Tom Stellard 
> ---
>  lib/Target/R600/R600Instructions.td | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/Target/R600/R600Instructions.td 
> b/lib/Target/R600/R600Instructions.td
> index e4cc06e..0a777f1 100644
> --- a/lib/Target/R600/R600Instructions.td
> +++ b/lib/Target/R600/R600Instructions.td
> @@ -513,7 +513,7 @@ def INTERP_PAIR_ZW :  AMDGPUShaderInst <
>  
>  def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
>SDTypeProfile<1, -1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
> -  [SDNPMayLoad, SDNPVariadic]
> +  [SDNPVariadic]
>  >;
>  
>  
> //===--===//
> -- 
> 1.8.1.2
> 

Re: [Mesa-dev] [PATCH 3/6] R600: Turn BUILD_VECTOR into Reg_Sequence

2013-02-19 Thread Tom Stellard

On Mon, Feb 18, 2013 at 05:27:27PM +0100, Vincent Lejeune wrote:

Reviewed-by: Tom Stellard 
> ---
>  lib/Target/R600/AMDILISelDAGToDAG.cpp | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp 
> b/lib/Target/R600/AMDILISelDAGToDAG.cpp
> index 2e726e9..6b24117 100644
> --- a/lib/Target/R600/AMDILISelDAGToDAG.cpp
> +++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
> @@ -160,6 +160,35 @@ SDNode *AMDGPUDAGToDAGISel::Select(SDNode *N) {
>}
>switch (Opc) {
>default: break;
> +  case ISD::BUILD_VECTOR: {
> +const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
> +if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
> +  break;
> +}
> +// BUILD_VECTOR is usually lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG
> +// that adds a 128 bits reg copy when going through 
> TwoAddressInstructions
> +// pass. We want to avoid 128 bits copies as much as possible because 
> they
> +// can't be bundled by our scheduler.
> +SDValue RegSeqArgs[9] = {
> +  CurDAG->getTargetConstant(AMDGPU::R600_Reg128RegClassID, MVT::i32),
> +  SDValue(), CurDAG->getTargetConstant(AMDGPU::sub0, MVT::i32),
> +  SDValue(), CurDAG->getTargetConstant(AMDGPU::sub1, MVT::i32),
> +  SDValue(), CurDAG->getTargetConstant(AMDGPU::sub2, MVT::i32),
> +  SDValue(), CurDAG->getTargetConstant(AMDGPU::sub3, MVT::i32)
> +};
> +bool IsRegSeq = true;
> +for (unsigned i = 0; i < N->getNumOperands(); i++) {
> +  if (dyn_cast(N->getOperand(i))) {
> +IsRegSeq = false;
> +break;
> +  }
> +  RegSeqArgs[2 * i + 1] = N->getOperand(i);
> +}
> +if (!IsRegSeq)
> +  break;
> +return CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(),
> +RegSeqArgs, 2 * N->getNumOperands() + 1);
> +  }
>case ISD::ConstantFP:
>case ISD::Constant: {
>  const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
> -- 
> 1.8.1.2
> 

Re: [Mesa-dev] [PATCH 4/6] R600: Fix for Unigine when MachineSched is enabled

2013-02-19 Thread Tom Stellard

On Mon, Feb 18, 2013 at 05:27:28PM +0100, Vincent Lejeune wrote:

Reviewed-by: Tom Stellard 
> ---
>  lib/Target/R600/R600Instructions.td | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/Target/R600/R600Instructions.td 
> b/lib/Target/R600/R600Instructions.td
> index 0a777f1..74106c9 100644
> --- a/lib/Target/R600/R600Instructions.td
> +++ b/lib/Target/R600/R600Instructions.td
> @@ -1587,6 +1587,7 @@ def PRED_X : InstR600 <
>(ins R600_Reg32:$src0, i32imm:$src1, i32imm:$flags),
>"", [], NullALU> {
>let FlagOperandIdx = 3;
> +  let isTerminator = 1;
>  }
>  
>  let isTerminator = 1, isBranch = 1, isBarrier = 1 in {
> -- 
> 1.8.1.2
> 

Re: [Mesa-dev] [PATCH 1/8] R600/SI: cleanup SIInstrInfo.td and SIInstrFormat.td

2013-02-19 Thread Tom Stellard

Hi Christian,

From now on, can you cc llvm-comm...@cs.uiuc.edu when you submit a patch?

Thanks,
Tom


On Tue, Feb 19, 2013 at 02:54:23PM +0100, Christian König wrote:
> From: Christian König 
> 
> Those two files got mixed up.
> 
> Signed-off-by: Christian König 
> ---
>  lib/Target/R600/SIInstrFormats.td |  500 
> +++--
>  lib/Target/R600/SIInstrInfo.td|  495 +++-
>  2 files changed, 509 insertions(+), 486 deletions(-)
> 
> diff --git a/lib/Target/R600/SIInstrFormats.td 
> b/lib/Target/R600/SIInstrFormats.td
> index 40e37aa..fe417d6 100644
> --- a/lib/Target/R600/SIInstrFormats.td
> +++ b/lib/Target/R600/SIInstrFormats.td
> @@ -1,4 +1,4 @@
> -//===-- SIInstrFormats.td - SI Instruction Formats 
> ===//
> +//===-- SIInstrFormats.td - SI Instruction Encodings 
> --===//
>  //
>  // The LLVM Compiler Infrastructure
>  //
> @@ -9,180 +9,418 @@
>  //
>  // SI Instruction format definitions.
>  //
> -// Instructions with _32 take 32-bit operands.
> -// Instructions with _64 take 64-bit operands.
> -//
> -// VOP_* instructions can use either a 32-bit or 64-bit encoding.  The 32-bit
> -// encoding is the standard encoding, but instruction that make use of
> -// any of the instruction modifiers must use the 64-bit encoding.
> -//
> -// Instructions with _e32 use the 32-bit encoding.
> -// Instructions with _e64 use the 64-bit encoding.
> -//
>  
> //===--===//
>  
> -class VOP3_32  op, string opName, list pattern>
> -  : VOP3  VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), 
> opName, pattern>;
> +class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
> +AMDGPUInst<outs, ins, asm, pattern> {
> +
> +  field bits<1> VM_CNT = 0;
> +  field bits<1> EXP_CNT = 0;
> +  field bits<1> LGKM_CNT = 0;
> +
> +  let TSFlags{0} = VM_CNT;
> +  let TSFlags{1} = EXP_CNT;
> +  let TSFlags{2} = LGKM_CNT;
> +}
> +
> +class Enc32 <dag outs, dag ins, string asm, list<dag> pattern> :
> +InstSI <outs, ins, asm, pattern> {
> +
> +  field bits<32> Inst;
> +  let Size = 4;
> +}
>  
> -class VOP3_64  op, string opName, list pattern>
> -  : VOP3  VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), 
> opName, pattern>;
> +class Enc64 <dag outs, dag ins, string asm, list<dag> pattern> :
> +InstSI <outs, ins, asm, pattern> {
>  
> -class SOP1_32  op, string opName, list pattern>
> -  : SOP1 ;
> +  field bits<64> Inst;
> +  let Size = 8;
> +}
>  
> -class SOP1_64  op, string opName, list pattern>
> -  : SOP1 ;
> +//===--===//
> +// Scalar operations
> +//===--===//
>  
> -class SOP2_32  op, string opName, list pattern>
> -  : SOP2  opName, pattern>;
> +class SOP1 <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
> +Enc32<outs, ins, asm, pattern> {
>  
> -class SOP2_64  op, string opName, list pattern>
> -  : SOP2  opName, pattern>;
> +  bits<7> SDST;
> +  bits<8> SSRC0;
>  
> -class VOP1_Helper  op, RegisterClass vrc, RegisterClass arc,
> -   string opName, list pattern> : 
> -  VOP1 <
> -op, (outs vrc:$dst), (ins arc:$src0), opName, pattern
> -  >;
> +  let Inst{7-0} = SSRC0;
> +  let Inst{15-8} = op;
> +  let Inst{22-16} = SDST;
> +  let Inst{31-23} = 0x17d; //encoding;
>  
> -multiclass VOP1_32  op, string opName, list pattern> {
> -  def _e32: VOP1_Helper ;
> -  def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, 
> op{0}},
> -  opName, []
> -  >;
> +  let mayLoad = 0;
> +  let mayStore = 0;
> +  let hasSideEffects = 0;
>  }
>  
> -multiclass VOP1_64  op, string opName, list pattern> {
> +class SOP2 <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
> +Enc32 <outs, ins, asm, pattern> {
> +  
> +  bits<7> SDST;
> +  bits<8> SSRC0;
> +  bits<8> SSRC1;
>  
> -  def _e32 : VOP1_Helper ;
> +  let Inst{7-0} = SSRC0;
> +  let Inst{15-8} = SSRC1;
> +  let Inst{22-16} = SDST;
> +  let Inst{29-23} = op;
> +  let Inst{31-30} = 0x2; // encoding
>  
> -  def _e64 : VOP3_64 <
> -{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
> -opName, []
> -  >;
> +  let mayLoad = 0;
> +  let mayStore = 0;
> +  let hasSideEffects = 0;
>  }
>  
> -class VOP2_Helper  op, RegisterClass vrc, RegisterClass arc,
> -   string opName, list pattern> :
> -  VOP2 <
> -op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName, pattern
> -  >;
> +class SOPC <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
> +  Enc32<outs, ins, asm, pattern> {
>  
> -multiclass VOP2_32  op, string opName, list pattern> {
> +  bits<8> SSRC0;
> +  bits<8> SSRC1;
>  
> -  def _e32 : VOP2_Helper ;
> +  let Inst{7-0} = SSRC0;
> +  let Inst{15-8} = SSRC1;
> +  let Inst{22-16} = op;
> +  let Inst{31-23} = 0x17e;
>  
> -  def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
> -  opName, []
> -  >;
> +  let DisableEncoding = "$dst";
> +  let mayLoad = 0;
> +  let mayStore = 0;
> +  let hasSideEffects = 0;
>  }
>  
> -multiclass VOP2_64  op
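
For readers following the encoding classes in the quoted patch, the SOP2 bit layout (SSRC0 in bits 7-0, SSRC1 in bits 15-8, SDST in bits 22-16, op in bits 29-23 and the fixed value 0x2 in bits 31-30) can be illustrated with a small C packing helper. This is only a sketch: `sop2_encode` is a hypothetical name and not part of LLVM, where the encoder is generated from the TableGen description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the SOP2 fields as laid out in the patch. */
static uint32_t sop2_encode(uint32_t op, uint32_t sdst,
                            uint32_t ssrc0, uint32_t ssrc1)
{
   assert(op <= 0x7f && sdst <= 0x7f);      /* 7-bit fields */
   assert(ssrc0 <= 0xff && ssrc1 <= 0xff);  /* 8-bit fields */

   return  ssrc0           /* Inst{7-0}   = SSRC0 */
        | (ssrc1 << 8)     /* Inst{15-8}  = SSRC1 */
        | (sdst  << 16)    /* Inst{22-16} = SDST  */
        | (op    << 23)    /* Inst{29-23} = op    */
        | (0x2u  << 30);   /* Inst{31-30} = 0x2   */
}
```

The asserts only guard the field widths; which opcode and operand numbers are valid is outside the scope of this sketch.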

Re: [Mesa-dev] [PATCH v2] configure.ac: Do not check for clock_gettime on MinGW.

2013-02-19 Thread Matt Turner

On Tue, Feb 19, 2013 at 12:55 AM, Vinson Lee  wrote:
> MinGW does not have clock_gettime.
>
> Signed-off-by: Vinson Lee 
> ---
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 16c2f8c..1e11b4e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -500,7 +500,7 @@ AC_CHECK_FUNC([dlopen], [DEFINES="$DEFINES 
> -DHAVE_DLOPEN"],
>  AC_SUBST([DLOPEN_LIBS])
>
>  case "$host_os" in
> -darwin*)
> +darwin*|mingw*)
>  ;;
>  *)
>  AC_CHECK_FUNCS([clock_gettime], [CLOCK_LIB=],
> --
> 1.8.1.2
>

Reviewed-by: Matt Turner 

Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination

2013-02-19 Thread Marek Olšák

On Tue, Feb 19, 2013 at 3:55 PM, Michel Dänzer  wrote:
> On Tue, 2013-02-19 at 15:48 +0100, Marek Olšák wrote:
>> On Tue, Feb 19, 2013 at 3:28 PM, Michel Dänzer  wrote:
>> > On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
>> >> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer  
>> >> wrote:
>> >> >
>> >> > Really, what I don't understand is why r600g doesn't seem affected by
>> >> > this... at least on my RS880 it's passing the piglit tests this change
>> >> > fixes with radeonsi. So maybe I'm just missing some magic bit for
>> >> > radeonsi.
>> >>
>> >> RGB formats do fail fbo-blending-formats with r600g/redwood here.
>> >
>> > Okay.
>> >
>> >
>> >> However the alpha channel can sometimes contain 1 in memory even if
>> >> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
>> >> glCopyTex[Sub]Image always set alpha to 1.
>> >
>> > Well, but they shouldn't for these formats. :) The memory corresponding
>> > to X* channels should remain unchanged. I'm working on a separate patch
>> > for that for radeonsi.
>>
>> I think the only way you could do that is to set the colormask to RGB.
>
> Exactly.
>
>> Doesn't it have a negative effect on performance if some channels are
>> masked out?
>
> It might, but I don't see that we really have a choice. If the app /
> state tracker doesn't want to preserve those bits, it should use a
> non-X* format.

We do have a choice: let's do nothing. ReadPixels and GetTexImage
always set the alpha to one, and we can patch the blend state manually
to get correct RGB blending. What could possibly be broken if the
alpha is modified by the hardware?

Marek
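
The idea of patching the blend state for a destination without alpha can be sketched as follows. This is an illustration only, not the radeonsi patch under discussion: the enum and `fixup_factor_for_rgbx` are hypothetical names, and the sketch assumes that destination alpha should always read as 1 for an RGBX-style format.

```c
/* Hypothetical blend factors; a real driver would use its own enum. */
enum blend_factor {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_SRC_ALPHA,
   FACTOR_DST_ALPHA,
   FACTOR_INV_DST_ALPHA,
   FACTOR_SRC_ALPHA_SATURATE
};

/* Rewrite factors that read destination alpha, assuming it is always 1. */
static enum blend_factor
fixup_factor_for_rgbx(enum blend_factor f)
{
   switch (f) {
   case FACTOR_DST_ALPHA:
      return FACTOR_ONE;            /* Ad == 1 */
   case FACTOR_INV_DST_ALPHA:
      return FACTOR_ZERO;           /* 1 - Ad == 0 */
   case FACTOR_SRC_ALPHA_SATURATE:
      return FACTOR_ZERO;           /* min(As, 1 - Ad) == 0 */
   default:
      return f;                     /* does not depend on Ad */
   }
}
```

With the factors rewritten this way, the RGB result no longer depends on whatever the hardware happens to store in the unused X channel.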

[Mesa-dev] [RFC] New EGL extension: EGL_EXT_platform_display

2013-02-19 Thread Chad Versace


I'm seeking feedback on an EGL extension that I'm drafting. The ideas have
already been discussed at Khronos meetings to a good reception, but I want
feedback from Mesa developers too.

Summary
-------
The extension, tentatively named EGL_EXT_platform_display, enables EGL clients
to specify to which platform (X11, Wayland, gbm, etc) an EGL resource
(EGLDisplay, EGLSurface, etc) belongs when the resource is derived from
a platform-native type. As a corollary, the extension enables the creation of
EGL resources from different platforms within a single process.


Feedback
--------
I'd like to hear feedback about the details below. Do you see any potential
problems? Is it lacking a feature that you believe should be present?


Details
-------
The draft extension defines the following new functions:

// This is the extension's key function.
//
EGLDisplay
eglGetPlatformDisplayEXT(EGLenum platform, void *native_display);

// The two eglCreate functions below differ from their core counterparts
// only in their signature. The EGLNative types are replaced with void*.
// This makes the signature agnostic to which platform the native resource
// belongs.

EGLSurface
eglCreatePlatformWindowSurfaceEXT(EGLDisplay dpy,
                                  EGLConfig config,
                                  void *native_window,
                                  const EGLint *attrib_list);

EGLSurface
eglCreatePlatformPixmapSurfaceEXT(EGLDisplay dpy,
                                  EGLConfig config,
                                  void *native_pixmap,
                                  const EGLint *attrib_list);

Valid values for `platform` are defined by layered extensions.  For
example, EGL_EXT_platform_x11 defines EGL_PLATFORM_X11, and
EGL_EXT_platform_wayland defines EGL_PLATFORM_WAYLAND.

Also, the layered extensions specify which native types should be passed as
the native parameters. For example, EGL_EXT_platform_wayland specifies that,
when calling eglCreatePlatformWindowSurfaceEXT with a display that was derived
from a Wayland display, then the native_window parameter must be `struct
wl_egl_window*`. Analogously, EGL_EXT_platform_x11 specifies that
native_window must be `Window*`.


Example Code for X11
--------------------
// The internal representation of the egl_dpy, created below, remembers that
// it was derived from an Xlib display.

Display *xlib_dpy = XOpenDisplay(NULL);
EGLDisplay egl_dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_X11, xlib_dpy);

EGLConfig config;
eglChooseConfig(egl_dpy, &config, ...);

// Since egl_dpy remembers that it was derived from an Xlib display, when
// creating the EGLSurface below libEGL internally casts the
// `(void*) &xlib_win` to `Window*`.

Window xlib_win = XCreateWindow(xlib_dpy, ...);
EGLSurface egl_surface = eglCreatePlatformWindowSurfaceEXT(egl_dpy, config,
   (void*) &xlib_win,
   NULL);

Example Code for Wayland
------------------------
// The internal representation of the egl_dpy, created below, remembers that
// it was derived from a Wayland display.

struct wl_display *wl_dpy = wl_display_connect(NULL);
EGLDisplay egl_dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_WAYLAND, wl_dpy);


EGLConfig config;
eglChooseConfig(egl_dpy, &config, ...);

// Since egl_dpy remembers that it was derived from a Wayland display, when
// creating the EGLSurface below libEGL internally casts the
// `(void*) wl_win` to `struct wl_egl_window*`.

struct wl_egl_window *wl_win = wl_egl_window_create(...);
EGLSurface egl_surface = eglCreatePlatformWindowSurfaceEXT(egl_dpy, config,
                                                           (void*) wl_win,
                                                           NULL);
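
The "display remembers its platform" behaviour described in both examples could look roughly like the sketch below. Everything in it is a hypothetical stand-in (the enum, the structs, `get_platform_display`, `create_platform_window_surface`); it uses no real EGL, Xlib or Wayland headers and only illustrates how a `void *` native window might be interpreted according to the platform recorded at display creation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; real code would use the EGL, Xlib and Wayland headers. */
enum platform { PLATFORM_X11, PLATFORM_WAYLAND };
typedef unsigned long XWindow;                       /* stand-in for Xlib's Window */
struct wl_egl_window_stub { int width, height; };   /* stand-in for struct wl_egl_window */

struct display {
   enum platform platform;   /* recorded when the display is created */
   void *native_display;
};

struct surface {
   enum platform platform;
   XWindow x11_window;                      /* valid for PLATFORM_X11 */
   struct wl_egl_window_stub *wl_window;    /* valid for PLATFORM_WAYLAND */
};

static struct display
get_platform_display(enum platform platform, void *native_display)
{
   struct display dpy = { platform, native_display };
   return dpy;
}

static struct surface
create_platform_window_surface(const struct display *dpy, void *native_window)
{
   struct surface surf = { dpy->platform, 0, NULL };

   assert(native_window != NULL);
   switch (dpy->platform) {
   case PLATFORM_X11:
      /* X11: the caller passes a pointer to the Window, so dereference it. */
      surf.x11_window = *(XWindow *) native_window;
      break;
   case PLATFORM_WAYLAND:
      /* Wayland: the caller passes the window struct pointer itself. */
      surf.wl_window = (struct wl_egl_window_stub *) native_window;
      break;
   }
   return surf;
}
```

The point of the design is that the caller never names the platform again after creating the display; the surface-creation path picks the cast from the display's stored platform.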

Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-19 Thread Andy Furniss


Vadim Girlin wrote:


Could you please test glxgears and other simple mesa demos? It's easier
to spot the problems with small apps that don't use a lot of complex
shaders. If some of them don't work correctly, please send me the dumps
with "R600_DUMP_SHADERS=2 R600_SB_DUMP=3".


All of the mesa demos work with and without llvm.


Also it might help if you can
look for piglit regressions against the piglit results with R600_SB=0
and send me the dumps for a few regressed tests.


I don't actually have piglit - it was always a pain with cmake to get it 
to build on my old 32bit lfs with xorg/mesa installed under home.


I do now have a new 64bit clfs build with everything in normal places - 
so maybe I'll give it a go on that - but I don't know how to use it as such.


Even though it's "new" clfs uses gcc 4.6.3 so on there g++ is actually 
too old to build your tree - without changing some friends  to 
friends class ...


I don't know when I'll get time to learn piglit, but for now here are dumps
from nexuiz, one working and one corrupt.


R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-working-dump

http://www.andyqos.ukfsn.org/nexuiz-working-dump

R600_LLVM=0 R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-corrupt-dump

http://www.andyqos.ukfsn.org/nexuiz-corrupt-dump



Re: [Mesa-dev] [PATCH] llvmpipe: fix lp_resource_copy using more than one 3d slice

2013-02-19 Thread Roland Scheidegger

On 19.02.2013 10:13, Jose Fonseca wrote:
> Thanks for fixing this Roland.
> 
> This is definitely an improvement. I'd recommend a few tweaks (they could even
> be done as a follow-on change):
> 
> - Calling llvmpipe_flush_resource() in a loop is overkill (it will cause
> llvmpipe_flush() to be called many times needlessly). Please refactor
> llvmpipe_flush_resource() and llvmpipe_is_resource_referenced() to receive
> a start_layer, end_layer pair.
Actually I guess I'll just drop the layer parameter completely. It is
passed through another function however in the end it is just unused and
thrown away anyway, so it doesn't matter if we check for whole resource
or just parts (of course at some point we might want to change this but
that's how it looks for now).


> 
> - call util_copy_box instead of util_copy_rect
Ah you're right I thought it wouldn't work outside the loop but it
should (not that it makes much difference since util_copy_box will just
call util_copy_rect repeatedly but it is definitely nicer style).

Roland
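
The relationship mentioned above, a box copy that simply calls the rectangle copy once per slice, can be sketched like this. The helpers are hypothetical and simplified (byte strides, no pixel formats); they are not the actual Mesa `util_copy_rect` / `util_copy_box` signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified helpers; not the actual Mesa util_copy_* signatures. */

/* Copy `height` rows of `row_bytes` bytes between two 2D slices. */
static void copy_rect(unsigned char *dst, size_t dst_stride,
                      const unsigned char *src, size_t src_stride,
                      size_t row_bytes, unsigned height)
{
   unsigned y;
   for (y = 0; y < height; y++)
      memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/* Copy `depth` slices by calling copy_rect() once per slice. */
static void copy_box(unsigned char *dst, size_t dst_stride, size_t dst_slice_stride,
                     const unsigned char *src, size_t src_stride, size_t src_slice_stride,
                     size_t row_bytes, unsigned height, unsigned depth)
{
   unsigned z;
   assert(row_bytes <= dst_stride && row_bytes <= src_stride);
   for (z = 0; z < depth; z++)
      copy_rect(dst + z * dst_slice_stride, dst_stride,
                src + z * src_slice_stride, src_stride,
                row_bytes, height);
}
```

Calling the box variant once keeps the per-slice loop in one place instead of repeating it in every caller.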


> 
> Jose
> 
> 
> - Original Message -
>> From: Roland Scheidegger 
>>
>> These used to be illegal a very long time ago, then for some more time
>> nothing really emitted these so this code path wasn't hit.
>> Just trivially iterate over box->depth.
>> (Might be worth refactoring at some point since nowadays all the code
>> doesn't really do much except for depth textures.)
>>
>> This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61093
>> ---
>>  src/gallium/drivers/llvmpipe/lp_surface.c |  170
>>  +++--
>>  1 file changed, 86 insertions(+), 84 deletions(-)
>>
>> diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c
>> b/src/gallium/drivers/llvmpipe/lp_surface.c
>> index 11475fd..dbaed95 100644
>> --- a/src/gallium/drivers/llvmpipe/lp_surface.c
>> +++ b/src/gallium/drivers/llvmpipe/lp_surface.c
>> @@ -65,7 +65,7 @@ lp_resource_copy(struct pipe_context *pipe,
>> const enum pipe_format format = src_tex->base.format;
>> unsigned width = src_box->width;
>> unsigned height = src_box->height;
>> -   assert(src_box->depth == 1);
>> +   unsigned z;
>>  
>> /* Fallback for buffers. */
>> if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) {
>> @@ -74,99 +74,101 @@ lp_resource_copy(struct pipe_context *pipe,
>>return;
>> }
>>  
>> -   llvmpipe_flush_resource(pipe,
>> -   dst, dst_level, dstz,
>> -   FALSE, /* read_only */
>> -   TRUE, /* cpu_access */
>> -   FALSE, /* do_not_block */
>> -   "blit dest");
>> -
>> -   llvmpipe_flush_resource(pipe,
>> -   src, src_level, src_box->z,
>> -   TRUE, /* read_only */
>> -   TRUE, /* cpu_access */
>> -   FALSE, /* do_not_block */
>> -   "blit src");
>> -
>> -   /*
>> -   printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u
>> x %u x %u\n",
>> -  src_tex->id, src_level, dst_tex->id, dst_level,
>> -  src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
>> -  src_box->width, src_box->height, src_box->depth);
>> -   */
>> -
>> -   /* set src tiles to linear layout */
>> -   {
>> -  unsigned tx, ty, tw, th;
>> -  unsigned x, y;
>> -
>> -  adjust_to_tile_bounds(src_box->x, src_box->y, width, height,
>> -&tx, &ty, &tw, &th);
>> -
>> -  for (y = 0; y < th; y += TILE_SIZE) {
>> - for (x = 0; x < tw; x += TILE_SIZE) {
>> -(void) llvmpipe_get_texture_tile_linear(src_tex,
>> -src_box->z, src_level,
>> -LP_TEX_USAGE_READ,
>> -tx + x, ty + y);
>> +   for (z = 0; z < src_box->depth; z++){
>> +  llvmpipe_flush_resource(pipe,
>> +  dst, dst_level, dstz + z,
>> +  FALSE, /* read_only */
>> +  TRUE, /* cpu_access */
>> +  FALSE, /* do_not_block */
>> +  "blit dest");
>> +
>> +  llvmpipe_flush_resource(pipe,
>> +  src, src_level, src_box->z + z,
>> +  TRUE, /* read_only */
>> +  TRUE, /* cpu_access */
>> +  FALSE, /* do_not_block */
>> +  "blit src");
>> +
>> +  /*
>> +  printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u
>> %u x %u x %u\n",
>> + src_tex->id, src_level, dst_tex->id, dst_level,
>> + src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
>> + src_box->width, src_box->height, src_box->depth);
>> +  */
>> +
>> +  /* set src tiles to linear lay

[Mesa-dev] [Bug 38086] Mesa 7.11-devel implementation error: Unexpected program target in destroy_program_variants_cb()

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=38086

--- Comment #7 from Laurent carlier  ---
(In reply to comment #6)
> Can you make a trace of this issue with apitrace? 
> https://github.com/apitrace/apitrace

You can find it here:
http://pkgbuild.com/~lcarlier/traces/hl2_linux.trace.tar.gz


[Mesa-dev] [PATCH 1/3] radeonsi: use u_box_origin_2d helper function

2013-02-19 Thread Michel Dänzer

From: Marek Olšák 

[ Cherry-picked from r600g commit b278aba42310e8fa30f2408b9dcd58dbb4901724 ]

Signed-off-by: Michel Dänzer 
---
 src/gallium/drivers/radeonsi/r600_texture.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_texture.c 
b/src/gallium/drivers/radeonsi/r600_texture.c
index e8d9932..d546554 100644
--- a/src/gallium/drivers/radeonsi/r600_texture.c
+++ b/src/gallium/drivers/radeonsi/r600_texture.c
@@ -55,11 +55,8 @@ static void r600_copy_from_staging_texture(struct 
pipe_context *ctx, struct r600
struct pipe_resource *texture = transfer->resource;
struct pipe_box sbox;
 
-   sbox.x = sbox.y = sbox.z = 0;
-   sbox.width = transfer->box.width;
-   sbox.height = transfer->box.height;
-   /* XXX that might be wrong */
-   sbox.depth = 1;
+   u_box_origin_2d(transfer->box.width, transfer->box.height, &sbox);
+
ctx->resource_copy_region(ctx, texture, transfer->level,
  transfer->box.x, transfer->box.y, 
transfer->box.z,
  rtransfer->staging,
-- 
1.8.1.3


[Mesa-dev] [PATCH 0/3] radeonsi: Cherry-pick transfer fixes from r600g

2013-02-19 Thread Michel Dänzer

These together get us 11 more little piglits with Marek's
glTex(Sub)Image improvements in st/mesa.

[PATCH 1/3] radeonsi: use u_box_origin_2d helper function
[PATCH 2/3] radeonsi: add assertions to prevent creation of invalid
[PATCH 3/3] radeonsi: implement 3D transfers

[Mesa-dev] [PATCH 2/3] radeonsi: add assertions to prevent creation of invalid surfaces

2013-02-19 Thread Michel Dänzer

From: Marek Olšák 

[ Cherry-picked from r600g commit ef11ed61a0414d0405c3faf7f48fa3f1d083f82e ]

Signed-off-by: Michel Dänzer 
---
 src/gallium/drivers/radeonsi/r600_blit.c | 15 ---
 src/gallium/drivers/radeonsi/r600_texture.c  |  2 ++
 src/gallium/drivers/radeonsi/radeonsi_pipe.h | 16 
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_blit.c 
b/src/gallium/drivers/radeonsi/r600_blit.c
index 35c8f95..0b0eba3 100644
--- a/src/gallium/drivers/radeonsi/r600_blit.c
+++ b/src/gallium/drivers/radeonsi/r600_blit.c
@@ -98,21 +98,6 @@ static void r600_blitter_end(struct pipe_context *ctx)
r600_context_queries_resume(rctx);
 }
 
-static unsigned u_max_layer(struct pipe_resource *r, unsigned level)
-{
-   switch (r->target) {
-   case PIPE_TEXTURE_CUBE:
-   return 6 - 1;
-   case PIPE_TEXTURE_3D:
-   return u_minify(r->depth0, level) - 1;
-   case PIPE_TEXTURE_1D_ARRAY:
-   case PIPE_TEXTURE_2D_ARRAY:
-   return r->array_size - 1;
-   default:
-   return 0;
-   }
-}
-
 void si_blit_uncompress_depth(struct pipe_context *ctx,
struct r600_resource_texture *texture,
struct r600_resource_texture *staging,
diff --git a/src/gallium/drivers/radeonsi/r600_texture.c 
b/src/gallium/drivers/radeonsi/r600_texture.c
index d546554..5790974 100644
--- a/src/gallium/drivers/radeonsi/r600_texture.c
+++ b/src/gallium/drivers/radeonsi/r600_texture.c
@@ -545,6 +545,8 @@ static struct pipe_surface *r600_create_surface(struct 
pipe_context *pipe,
struct r600_surface *surface = CALLOC_STRUCT(r600_surface);
unsigned level = surf_tmpl->u.tex.level;
 
+   assert(surf_tmpl->u.tex.first_layer <= u_max_layer(texture, 
surf_tmpl->u.tex.level));
+   assert(surf_tmpl->u.tex.last_layer <= u_max_layer(texture, 
surf_tmpl->u.tex.level));
assert(surf_tmpl->u.tex.first_layer == surf_tmpl->u.tex.last_layer);
if (surface == NULL)
return NULL;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index d0f04f4..8c6d908 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -277,4 +277,20 @@ static INLINE uint64_t r600_resource_va(struct pipe_screen 
*screen, struct pipe_
return rscreen->ws->buffer_get_virtual_address(rresource->cs_buf);
 }
 
+static INLINE unsigned u_max_layer(struct pipe_resource *r, unsigned level)
+{
+   switch (r->target) {
+   case PIPE_TEXTURE_CUBE:
+   return 6 - 1;
+   case PIPE_TEXTURE_3D:
+   return u_minify(r->depth0, level) - 1;
+   case PIPE_TEXTURE_1D_ARRAY:
+   case PIPE_TEXTURE_2D_ARRAY:
+   case PIPE_TEXTURE_CUBE_ARRAY:
+   return r->array_size - 1;
+   default:
+   return 0;
+   }
+}
+
 #endif
-- 
1.8.1.3


[Mesa-dev] [PATCH 3/3] radeonsi: implement 3D transfers

2013-02-19 Thread Michel Dänzer

From: Marek Olšák 

That means we can map and read multiple slices with one transfer_map call.

[ Cherry-picked from r600g commit 1aebb6911e9aa1bd8900868b58d1750ca83a20c7 ]
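
As an illustration of what mapping several slices at once means for a caller: once a row stride and a layer (slice) stride are both known, any texel in the mapped box can be addressed directly. The struct and `texel_address` below are hypothetical and much reduced; they are not the Gallium `pipe_transfer` definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of a mapped transfer. */
struct mapped_transfer {
   unsigned char *data;     /* pointer returned by the map call */
   size_t stride;           /* bytes between rows */
   size_t layer_stride;     /* bytes between slices */
   unsigned width, height, depth;
};

/* Address of texel (x, y, z), assuming `bpp` bytes per texel. */
static unsigned char *
texel_address(const struct mapped_transfer *t,
              unsigned x, unsigned y, unsigned z, size_t bpp)
{
   assert(x < t->width && y < t->height && z < t->depth);
   return t->data + z * t->layer_stride + y * t->stride + x * bpp;
}
```

Without a layer stride, a caller would have to map each slice separately to reach texels with z > 0.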

Signed-off-by: Michel Dänzer 
---
 src/gallium/drivers/radeonsi/r600_texture.c | 49 +
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_texture.c 
b/src/gallium/drivers/radeonsi/r600_texture.c
index 5790974..153df00 100644
--- a/src/gallium/drivers/radeonsi/r600_texture.c
+++ b/src/gallium/drivers/radeonsi/r600_texture.c
@@ -55,7 +55,7 @@ static void r600_copy_from_staging_texture(struct 
pipe_context *ctx, struct r600
struct pipe_resource *texture = transfer->resource;
struct pipe_box sbox;
 
-   u_box_origin_2d(transfer->box.width, transfer->box.height, &sbox);
+   u_box_3d(0, 0, 0, transfer->box.width, transfer->box.height, 
transfer->box.depth, &sbox);
 
ctx->resource_copy_region(ctx, texture, transfer->level,
  transfer->box.x, transfer->box.y, 
transfer->box.z,
@@ -235,7 +235,6 @@ static void *si_texture_transfer_map(struct pipe_context 
*ctx,
 {
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_resource_texture *rtex = (struct 
r600_resource_texture*)texture;
-   struct pipe_resource resource;
struct r600_transfer *trans;
boolean use_staging_texture = FALSE;
struct radeon_winsys_cs_handle *buf;
@@ -295,42 +294,52 @@ static void *si_texture_transfer_map(struct pipe_context 
*ctx,
 level, level,
 box->z, box->z + box->depth - 1);
trans->transfer.stride = 
staging_depth->surface.level[level].pitch_bytes;
+   trans->transfer.layer_stride = 
staging_depth->surface.level[level].slice_size;
trans->offset = r600_texture_get_offset(staging_depth, level, 
box->z);
 
trans->staging = &staging_depth->resource.b.b;
} else if (use_staging_texture) {
-   resource.target = PIPE_TEXTURE_2D;
+   struct pipe_resource resource;
+   struct r600_resource_texture *staging;
+
+   memset(&resource, 0, sizeof(resource));
resource.format = texture->format;
resource.width0 = box->width;
resource.height0 = box->height;
resource.depth0 = 1;
resource.array_size = 1;
-   resource.last_level = 0;
-   resource.nr_samples = 0;
resource.usage = PIPE_USAGE_STAGING;
-   resource.bind = 0;
resource.flags = R600_RESOURCE_FLAG_TRANSFER;
-   /* For texture reading, the temporary (detiled) texture is used 
as
-* a render target when blitting from a tiled texture. */
-   if (usage & PIPE_TRANSFER_READ) {
-   resource.bind |= PIPE_BIND_RENDER_TARGET;
-   }
-   /* For texture writing, the temporary texture is used as a 
sampler
-* when blitting into a tiled texture. */
-   if (usage & PIPE_TRANSFER_WRITE) {
-   resource.bind |= PIPE_BIND_SAMPLER_VIEW;
+
+   /* We must set the correct texture target and dimensions if 
needed for a 3D transfer. */
+   if (box->depth > 1 && u_max_layer(texture, level) > 0)
+   resource.target = texture->target;
+   else
+   resource.target = PIPE_TEXTURE_2D;
+
+   switch (resource.target) {
+   case PIPE_TEXTURE_1D_ARRAY:
+   case PIPE_TEXTURE_2D_ARRAY:
+   case PIPE_TEXTURE_CUBE_ARRAY:
+   resource.array_size = box->depth;
+   break;
+   case PIPE_TEXTURE_3D:
+   resource.depth0 = box->depth;
+   break;
+   default:;
}
/* Create the temporary texture. */
-   trans->staging = ctx->screen->resource_create(ctx->screen, 
&resource);
-   if (trans->staging == NULL) {
+   staging = (struct 
r600_resource_texture*)ctx->screen->resource_create(ctx->screen, &resource);
+   if (staging == NULL) {
R600_ERR("failed to create temporary texture to hold 
untiled copy\n");
pipe_resource_reference(&trans->transfer.resource, 
NULL);
FREE(trans);
return NULL;
}
 
-   trans->transfer.stride = ((struct r600_resource_texture 
*)trans->staging)
-   ->surface.level[0].pitch_bytes;
+   trans->staging = &staging->resource.b.b;
+   trans->transfer.stride = staging->surface.level[0].pitch_bytes;
+   trans->tra

Re: [Mesa-dev] VTK Tests fails with Mesa Swrast passes with OSMesa

2013-02-19 Thread Kevin H. Hobbs

On 02/19/2013 09:51 AM, Brian Paul wrote:
> 
> Looks like lines, in particular, are missing.  I don't see any recent 
> changes to swrast/osmesa that would seem to cause this.
> 

There probably were none. I'm trying to track down long standing issues.

> 
> 1. You do a git-bisect of mesa to find the regression

Since I have no idea when this failure started..

> 2. Make an apitrace of the failing test so I can investigate.
> 


http://crab-lab.zool.ohiou.edu/kevin/vtk_apitraces.tar.bz2

vtk_apitraces/vtkTraceSwrast.trace fails
vtk_apitraces/vtkTraceOSMesa.trace passes




Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.

2013-02-19 Thread Roland Scheidegger

On 19.02.2013 15:57, Jose Fonseca wrote:
> There may be more vertex elements than are used in the shader. But why should the
> key contain those elements? Won't this cause needless recompilations (e.g.,
> in situations where the state tracker leaves unneeded elements from a previous
> draw)?
I don't think the state tracker would leave unneeded elements like that
(that is I think the nr_elements would be adjusted if the state tracker
has to figure it out on its own, causing recompiles in any case).
But yes if you set different pipe_vertex_element which only differ in
the unused elements then it will cause unnecessary recompile (I don't
think that's really something which matters here).

> 
> That is, it seems to me that the key should have the number of elements from
> pipe_vertex_element information, but only copy those that the vertex shader uses.
That doesn't sound very good. If we want to dump the
pipe_vertex_elements like we do now either we need to fix up the
nr_elements or copy all of them. Also vs_generate function seems to
create code for all pipe_vertex_elements, not just those used by the shader.
I guess that instead of using nr_elements we could just use the
information from the shader instead consistently, though I'm actually
unsure this works always - is it somehow possible to only use
vertex_element nr 2 and 4 for instance?

So I think you're suggesting instead of this fix that key->nr_elements
wouldn't be used for anything except the key comparison itself, and
everything else (calculating sampler offset in the key, tgsi dump, code
generation) would use the shader information?

Roland
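
The consistency problem discussed here can be illustrated with a variable-size key whose size is computed in exactly one place and reused for allocation, copying and locating the sampler data. This is a sketch with hypothetical, simplified types and names (`variant_key`, `variant_key_size`, `key_samplers`, `variant_key_dup`); the real draw module structures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; the real draw module structs differ. */
struct vertex_element { unsigned src_offset, format; };
struct sampler_state  { unsigned wrap, filter; };

struct variant_key {
   unsigned nr_vertex_elements;
   unsigned nr_samplers;
   /* Followed in memory by nr_vertex_elements vertex_element entries,
    * then nr_samplers sampler_state entries. */
};

/* The single place where the key size is computed. */
static size_t variant_key_size(unsigned nr_vertex_elements, unsigned nr_samplers)
{
   return sizeof(struct variant_key)
        + nr_vertex_elements * sizeof(struct vertex_element)
        + nr_samplers * sizeof(struct sampler_state);
}

/* Samplers start right after the vertex elements, so this offset must
 * use the same element count as variant_key_size(). */
static struct sampler_state *key_samplers(struct variant_key *key)
{
   unsigned char *base = (unsigned char *) (key + 1);
   return (struct sampler_state *)
      (base + key->nr_vertex_elements * sizeof(struct vertex_element));
}

/* Duplicate a key, using the counts stored in the key itself. */
static struct variant_key *variant_key_dup(const struct variant_key *key)
{
   size_t size = variant_key_size(key->nr_vertex_elements, key->nr_samplers);
   struct variant_key *copy = malloc(size);
   if (copy)
      memcpy(copy, key, size);
   return copy;
}
```

If the copy used a size derived from a different element count than the one stored in the key, the trailing sampler entries would be truncated or read from the wrong offset, which is the failure mode described in the patch.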

> 
> Jose
> 
> 
> - Original Message -
>> From: Roland Scheidegger 
>>
>> Some parts calculated key size by using shader information, others by using
>> the pipe_vertex_element information. Since it is perfectly valid to have more
>> vertex_elements set than the vertex shader is using those may not be the
>> same,
>> so we weren't copying over all vertex_element state - this caused the tgsi
>> dump
>> to assert (iterates over all vertex elements). With some luck it didn't
>> crash otherwise even though the llvm generate_fetch code also iterates over
>> all vertex elements (probably because llvm threw away the unused inputs
>> anyway),
>> but if in this situation vertex texturing would be used things would
>> definitely
>> go wrong (as the sampler information wouldn't be copied).
>> So drop the key size calculation using shader information.
>> ---
>>  src/gallium/auxiliary/draw/draw_llvm.c |   13 -
>>  src/gallium/auxiliary/draw/draw_llvm.h |1 -
>>  .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |7 ++-
>>  src/gallium/auxiliary/draw/draw_vs_llvm.c  |6 --
>>  4 files changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c
>> b/src/gallium/auxiliary/draw/draw_llvm.c
>> index f3b..df57358 100644
>> --- a/src/gallium/auxiliary/draw/draw_llvm.c
>> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
>> @@ -420,17 +420,20 @@ draw_llvm_destroy(struct draw_llvm *llvm)
>>   */
>>  struct draw_llvm_variant *
>>  draw_llvm_create_variant(struct draw_llvm *llvm,
>> - unsigned num_inputs,
>> - const struct draw_llvm_variant_key *key)
>> + unsigned num_inputs,
>> + const struct draw_llvm_variant_key *key)
>>  {
>> struct draw_llvm_variant *variant;
>> struct llvm_vertex_shader *shader =
>>llvm_vertex_shader(llvm->draw->vs.vertex_shader);
>> LLVMTypeRef vertex_header;
>> +   unsigned key_size = draw_llvm_variant_key_size(key->nr_vertex_elements,
>> +  MAX2(key->nr_samplers,
>> +
>> key->nr_sampler_views));
>>  
>> variant = MALLOC(sizeof *variant +
>> -shader->variant_key_size -
>> -sizeof variant->key);
>> +key_size -
>> +sizeof variant->key);
>> if (variant == NULL)
>>return NULL;
>>  
>> @@ -440,7 +443,7 @@ draw_llvm_create_variant(struct draw_llvm *llvm,
>>  
>> create_jit_types(variant);
>>  
>> -   memcpy(&variant->key, key, shader->variant_key_size);
>> +   memcpy(&variant->key, key, key_size);
>>  
>> vertex_header = create_jit_vertex_header(variant->gallivm, num_inputs);
>>  
>> diff --git a/src/gallium/auxiliary/draw/draw_llvm.h
>> b/src/gallium/auxiliary/draw/draw_llvm.h
>> index 17ca304..b20cee5 100644
>> --- a/src/gallium/auxiliary/draw/draw_llvm.h
>> +++ b/src/gallium/auxiliary/draw/draw_llvm.h
>> @@ -281,7 +281,6 @@ struct draw_llvm_variant
>>  struct llvm_vertex_shader {
>> struct draw_vertex_shader base;
>>  
>> -   unsigned variant_key_size;
>> struct draw_llvm_variant_list_item variants;
>> unsigned variants_created;
>> unsigned variants_cached;
>> diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
>> b/src/g

Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.

2013-02-19 Thread Jose Fonseca



- Original Message -
> On 19.02.2013 15:57, Jose Fonseca wrote:
> > There may be more vertex elements than are used in the shader. But why should
> > the key contain those elements? Won't this cause needless recompilations
> > (e.g., in situations where the state tracker leaves unneeded elements from
> > a previous draw)?
> I don't think the state tracker would leave unneeded elements like that
> (that is I think the nr_elements would be adjusted if the state tracker
> has to figure it out on its own, causing recompiles in any case).
> But yes if you set different pipe_vertex_element which only differ in
> the unused elements then it will cause unnecessary recompile (I don't
> think that's really something which matters here).
> 
> > 
> > That is, it seems to me that the key should have the number of elements
> > from pipe_vertex_element information, but only copy those that vertex
> > shader uses.
> That doesn't sound very good. If we want to dump the
> pipe_vertex_elements like we do now either we need to fix up the
> nr_elements or copy all of them. Also vs_generate function seems to
> create code for all pipe_vertex_elements, not just those used by the shader.
> I guess that instead of using nr_elements we could just use the
> information from the shader instead consistently, though I'm actually
> unsure this works always - is it somehow possible to only use
> vertex_element nr 2 and 4 for instance?

Fair enough. Let's get this in as is for now, and keep our eyes open for any
performance regression.

Jose

> So I think you're suggesting instead of this fix that key->nr_elements
> wouldn't be used for anything except the key comparison itself, and
> everything else (calculating sampler offset in the key, tgsi dump, code
> generation) would use the shader information?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/3] radeonsi: Cherry-pick transfer fixes from r600g

2013-02-19 Thread Alex Deucher

On Tue, Feb 19, 2013 at 12:15 PM, Michel Dänzer  wrote:
> These together get us 11 more little piglits with Marek's
> glTex(Sub)Image improvements in st/mesa.
>
> [PATCH 1/3] radeonsi: use u_box_origin_2d helper function
> [PATCH 2/3] radeonsi: add assertions to prevent creation of invalid
> [PATCH 3/3] radeonsi: implement 3D transfers

For the series:

Reviewed-by: Alex Deucher 

Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.

2013-02-19 Thread Roland Scheidegger

Am 19.02.2013 18:54, schrieb Jose Fonseca:
> 
> 
> - Original Message -
>> Am 19.02.2013 15:57, schrieb Jose Fonseca:
>>> There may be more vertex elements than are used in the shader. But why should
>>> the key contain those elements? Won't this cause needless recompilations
>>> (e.g., in situations where the state tracker leaves unneeded elements from
>>> previous draw?)?
>> I don't think the state tracker would leave unneeded elements like that
>> (that is I think the nr_elements would be adjusted if the state tracker
>> has to figure it out on its own, causing recompiles in any case).
>> But yes if you set different pipe_vertex_element which only differ in
>> the unused elements then it will cause unnecessary recompile (I don't
>> think that's really something which matters here).
>>
>>>
>>> That is, it seems to me that the key should have the number of elements
>>> from pipe_vertex_element information, but only copy those that vertex
>>> shader uses.
>> That doesn't sound very good. If we want to dump the
>> pipe_vertex_elements like we do now either we need to fix up the
>> nr_elements or copy all of them. Also vs_generate function seems to
>> create code for all pipe_vertex_elements, not just those used by the shader.
>> I guess that instead of using nr_elements we could just use the
>> information from the shader instead consistently, though I'm actually
>> unsure this works always - is it somehow possible to only use
>> vertex_element nr 2 and 4 for instance?
> 
> Fair enough. Let's get this in as is for now, and keep our eyes open for any
> performance regression.

No, I realised you are actually right. The correct thing to do is indeed
to just use the shader information for nr_vertex_elements. This is a
simpler change, it automatically gets rid of the unnecessary dumping of
unused elements, and it should avoid unnecessary recompiles (even if
that's probably more of a theoretical case). I noticed the shader
generation didn't actually use these values in any case (although it
could, or perhaps should), which is why this worked (so not just by luck).
(Storing nr_vertex_elements in the key is really unneeded now, but it's
used in quite a few places, and for at least some of them it looks like a
hassle to get the shader information instead.)
I'll send out a new patch...

Roland

[Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.

2013-02-19 Thread sroland

From: Roland Scheidegger 

Some parts calculated key size by using shader information, others by using
the pipe_vertex_element information. Since it is perfectly valid to have more
vertex_elements set than the vertex shader is using, those may not be the same,
so we weren't copying over all vertex_element state - this caused the tgsi dump
to assert (it iterates over all vertex elements). More importantly, in this
situation it would also break vertex texturing completely (since the sampler
state derived from the key is at a different position than expected).
Fix this by deriving key->nr_vertex_elements from the shader information
instead of the pipe_vertex_element state (unlike dx10, we can't have "holes"
in pipe_vertex_element state, so this should be safe).
(Note that actual llvm shader generation does not use the pipe_vertex_element
state from the key itself in any case (although I guess it could) but uses
the one from draw.pt (which should be the same, though it contains all
elements) instead.)
---
 src/gallium/auxiliary/draw/draw_llvm.c |   14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index f3b..2467e5a 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -420,8 +420,8 @@ draw_llvm_destroy(struct draw_llvm *llvm)
  */
 struct draw_llvm_variant *
 draw_llvm_create_variant(struct draw_llvm *llvm,
-unsigned num_inputs,
-const struct draw_llvm_variant_key *key)
+ unsigned num_inputs,
+ const struct draw_llvm_variant_key *key)
 {
struct draw_llvm_variant *variant;
struct llvm_vertex_shader *shader =
@@ -429,8 +429,8 @@ draw_llvm_create_variant(struct draw_llvm *llvm,
LLVMTypeRef vertex_header;
 
variant = MALLOC(sizeof *variant +
-   shader->variant_key_size -
-   sizeof variant->key);
+shader->variant_key_size -
+sizeof variant->key);
if (variant == NULL)
   return NULL;
 
@@ -1415,8 +1415,12 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, char *store)
 
/* Presumably all variants of the shader should have the same
 * number of vertex elements - ie the number of shader inputs.
+* NOTE: we NEED to store the needed number of needed inputs
+* here, not the number of provided elements to match keysize
+* (and the offset of sampler state in the key).
 */
-   key->nr_vertex_elements = llvm->draw->pt.nr_vertex_elements;
+   key->nr_vertex_elements = llvm->draw->vs.vertex_shader->info.file_max[TGSI_FILE_INPUT] + 1;
+   assert(key->nr_vertex_elements <= llvm->draw->pt.nr_vertex_elements);
 
/* will have to rig this up properly later */
key->clip_xy = llvm->draw->clip_xy;
-- 
1.7.9.5
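
To make the layout problem in the commit message concrete, here is a minimal sketch, not the actual draw_llvm code. It assumes a variable-size key in which the vertex elements are followed directly by the sampler states. All struct and function names below are hypothetical stand-ins; only the idea (one element count must drive both the key size and the sampler offset) comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Mesa structs; field sets are made up. */
struct vertex_element { unsigned src_offset, buffer_index, format; };
struct sampler_state  { unsigned wrap_s, wrap_t, min_filter; };

/* Variable-size key: nr_vertex_elements elements, then the sampler states. */
struct variant_key {
   unsigned nr_vertex_elements;
   unsigned nr_samplers;
   struct vertex_element elements[1];
};

/* Element count to record in the key: highest shader input index + 1,
 * which must not exceed the number of elements actually bound. */
unsigned
key_element_count(int shader_max_input, unsigned bound_elements)
{
   unsigned needed = (unsigned)(shader_max_input + 1);
   assert(needed <= bound_elements);
   return needed;
}

/* Byte offset of the first sampler state for a key with nr_elements. */
size_t
key_sampler_offset(unsigned nr_elements)
{
   return offsetof(struct variant_key, elements) +
          nr_elements * sizeof(struct vertex_element);
}

/* Total key size; has to use the same count as key_sampler_offset(). */
size_t
key_size(unsigned nr_elements, unsigned nr_samplers)
{
   return key_sampler_offset(nr_elements) +
          nr_samplers * sizeof(struct sampler_state);
}
```

With a shader that reads two inputs while four elements are bound, code that computes the sampler offset from the bound count ends up looking past the end of a key that was sized from the shader count, which matches the breakage described above.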


[Mesa-dev] [PATCH] st/mesa: fix trimming of GL_QUAD_STRIP

2013-02-19 Thread Brian Paul

We sometimes convert GL_QUAD_STRIP prims into GL_TRIANGLE_STRIP, but
that changes the results of the u_trim_pipe_prim() call.  We need to
pass the original primitive type to the trim function.

Note that OpenGL's GL_x prim type values match Gallium's PIPE_PRIM_x values.

Fixes a failure in the new piglit degenerate-prims test.

Note: This is a candidate for the stable branches.
---
 src/mesa/state_tracker/st_draw.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index de62264..bff8d9b 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -283,7 +283,7 @@ st_draw_vbo(struct gl_context *ctx,
  /* don't trim, restarts might be inside index list */
  cso_draw_vbo(st->cso_context, &info);
   }
-  else if (u_trim_pipe_prim(info.mode, &info.count))
+  else if (u_trim_pipe_prim(prims[i].mode, &info.count))
  cso_draw_vbo(st->cso_context, &info);
}
 
-- 
1.7.3.4
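
As an illustration of why the primitive type passed to the trim function matters, here is a small sketch. It is not the real u_trim_pipe_prim(); the enum, the function name and the min/increment table are assumptions made for this example. A quad strip needs 4 vertices for the first quad and 2 for each further one, while a triangle strip needs 3 and then 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical primitive tags; the real code uses PIPE_PRIM_x values. */
enum prim { PRIM_TRIANGLE_STRIP, PRIM_QUAD_STRIP };

/* Trim *count down to the largest vertex count that forms whole primitives.
 * Returns false (and zeroes *count) when not even one primitive fits. */
bool
trim_prim(enum prim mode, unsigned *count)
{
   /* min: vertices for the first primitive; incr: vertices per extra one */
   unsigned min  = (mode == PRIM_QUAD_STRIP) ? 4 : 3;
   unsigned incr = (mode == PRIM_QUAD_STRIP) ? 2 : 1;

   assert(count != NULL);
   if (*count < min) {
      *count = 0;
      return false;
   }
   *count -= (*count - min) % incr;
   return true;
}
```

Under these assumptions, 5 vertices trim to 4 as a quad strip but stay at 5 as a triangle strip, and 3 vertices are rejected as a quad strip but accepted as a triangle strip. Trimming with the converted type therefore lets degenerate quad strips through.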


Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-19 Thread Vadim Girlin


On 02/19/2013 08:39 PM, Andy Furniss wrote:
> Vadim Girlin wrote:
>
>> Could you please test glxgears and other simple mesa demos? It's easier
>> to spot the problems with small apps that don't use a lot of complex
>> shaders. If some of them don't work correctly, please send me the dumps
>> with "R600_DUMP_SHADERS=2 R600_SB_DUMP=3".
>
> All of the mesa demos work with and without llvm.
>
>> Also it might help if you can
>> look for piglit regressions against the piglit results with R600_SB=0
>> and send me the dumps for a few regressed tests.
>
> I don't actually have piglit - it was always a pain with cmake to get it
> to build on my old 32bit lfs with xorg/mesa installed under home.
>
> I do now have a new 64bit clfs build with everything in normal places -
> so maybe I'll give it a go on that - but I don't know how to use it as
> such.
>
> Even though it's "new" clfs uses gcc 4.6.3 so on there g++ is actually
> too old to build your tree - without changing some friends  to
> friends class ...

This should be fixed already.

> I don't know when I'll get time to learn piglit but for now here's
> working and not nexuiz.
>
> R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-working-dump
>
> http://www.andyqos.ukfsn.org/nexuiz-working-dump
>
> R600_LLVM=0 R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &>
> nexuiz-corrupt-dump
>
> http://www.andyqos.ukfsn.org/nexuiz-corrupt-dump

OK, I already got the dumps with piglit regressions on r700, the dump
with nexuiz may also help. Thanks.

Vadim

[Mesa-dev] [PATCH] llvmpipe: lp_resource_copy cleanup

2013-02-19 Thread sroland

From: Roland Scheidegger 

We don't need to flush resources for each layer, and since we don't actually
care about layer at all in the flush function just drop the parameter.
Also we can use util_copy_box instead of repeated util_copy_rect.
---
 src/gallium/drivers/llvmpipe/lp_flush.c   |3 +-
 src/gallium/drivers/llvmpipe/lp_flush.h   |1 -
 src/gallium/drivers/llvmpipe/lp_surface.c |   87 +++--
 src/gallium/drivers/llvmpipe/lp_texture.c |3 +-
 src/gallium/drivers/llvmpipe/lp_texture.h |2 +-
 5 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_flush.c b/src/gallium/drivers/llvmpipe/lp_flush.c
index 964b792..cbfe564 100644
--- a/src/gallium/drivers/llvmpipe/lp_flush.c
+++ b/src/gallium/drivers/llvmpipe/lp_flush.c
@@ -98,7 +98,6 @@ boolean
 llvmpipe_flush_resource(struct pipe_context *pipe,
 struct pipe_resource *resource,
 unsigned level,
-int layer,
 boolean read_only,
 boolean cpu_access,
 boolean do_not_block,
@@ -106,7 +105,7 @@ llvmpipe_flush_resource(struct pipe_context *pipe,
 {
unsigned referenced;
 
-   referenced = llvmpipe_is_resource_referenced(pipe, resource, level, layer);
+   referenced = llvmpipe_is_resource_referenced(pipe, resource, level);
 
if ((referenced & LP_REFERENCED_FOR_WRITE) ||
((referenced & LP_REFERENCED_FOR_READ) && !read_only)) {
diff --git a/src/gallium/drivers/llvmpipe/lp_flush.h b/src/gallium/drivers/llvmpipe/lp_flush.h
index efff94c..bc1e2a8 100644
--- a/src/gallium/drivers/llvmpipe/lp_flush.h
+++ b/src/gallium/drivers/llvmpipe/lp_flush.h
@@ -47,7 +47,6 @@ boolean
 llvmpipe_flush_resource(struct pipe_context *pipe,
 struct pipe_resource *resource,
 unsigned level,
-int layer,
 boolean read_only,
 boolean cpu_access,
 boolean do_not_block,
diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c b/src/gallium/drivers/llvmpipe/lp_surface.c
index dbaed95..a83a903 100644
--- a/src/gallium/drivers/llvmpipe/lp_surface.c
+++ b/src/gallium/drivers/llvmpipe/lp_surface.c
@@ -57,14 +57,12 @@ lp_resource_copy(struct pipe_context *pipe,
  struct pipe_resource *src, unsigned src_level,
  const struct pipe_box *src_box)
 {
-   /* XXX this used to ignore srcz/dstz
-* assume it works the same for cube and 3d
-*/
struct llvmpipe_resource *src_tex = llvmpipe_resource(src);
struct llvmpipe_resource *dst_tex = llvmpipe_resource(dst);
const enum pipe_format format = src_tex->base.format;
unsigned width = src_box->width;
unsigned height = src_box->height;
+   unsigned depth = src_box->depth;
unsigned z;
 
/* Fallback for buffers. */
@@ -74,27 +72,28 @@ lp_resource_copy(struct pipe_context *pipe,
   return;
}
 
+   llvmpipe_flush_resource(pipe,
+   dst, dst_level,
+   FALSE, /* read_only */
+   TRUE, /* cpu_access */
+   FALSE, /* do_not_block */
+   "blit dest");
+
+   llvmpipe_flush_resource(pipe,
+   src, src_level,
+   TRUE, /* read_only */
+   TRUE, /* cpu_access */
+   FALSE, /* do_not_block */
+   "blit src");
+
+   /*
+   printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u x %u x %u\n",
+  src_tex->id, src_level, dst_tex->id, dst_level,
+  src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
+  src_box->width, src_box->height, src_box->depth);
+   */
+
for (z = 0; z < src_box->depth; z++){
-  llvmpipe_flush_resource(pipe,
-  dst, dst_level, dstz + z,
-  FALSE, /* read_only */
-  TRUE, /* cpu_access */
-  FALSE, /* do_not_block */
-  "blit dest");
-
-  llvmpipe_flush_resource(pipe,
-  src, src_level, src_box->z + z,
-  TRUE, /* read_only */
-  TRUE, /* cpu_access */
-  FALSE, /* do_not_block */
-  "blit src");
-
-  /*
-  printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u x %u x %u\n",
- src_tex->id, src_level, dst_tex->id, dst_level,
- src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
- src_box->width, src_box->height, src_box->depth);
-  */
 
   /* set src tiles to linear layout */
   {
@@ -148,27 +147,29 @@ lp_resource_copy(struct pipe_context *pipe,
 }
  }

[Mesa-dev] [PATCH] gallivm: fix indirect src register fetches requiring bitcast

2013-02-19 Thread sroland

From: Roland Scheidegger 

For constant and temporary register fetches, the bitcasts weren't done
correctly for the indirect case, leading to crashes due to type mismatches.
Simply do the bitcasts after fetching (much simpler than fixing up the load
pointer for the various cases).

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61036
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |   37 ++-
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index ae4a577..69957fe 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -603,10 +603,10 @@ emit_fetch_constant(
LLVMBuilderRef builder = gallivm->builder;
struct lp_build_context *uint_bld = &bld_base->uint_bld;
LLVMValueRef indirect_index = NULL;
-   struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype);
unsigned dimension = 0;
LLVMValueRef dimension_index;
LLVMValueRef consts_ptr;
+   LLVMValueRef res;
 
/* XXX: Handle fetching xyzw components as a vector */
assert(swizzle != ~0);
@@ -637,7 +637,7 @@ emit_fetch_constant(
   index_vec = lp_build_add(uint_bld, index_vec, swizzle_vec);
 
   /* Gather values from the constant buffer */
-  return build_gather(bld_fetch, consts_ptr, index_vec);
+  res = build_gather(&bld_base->base, consts_ptr, index_vec);
}
else {
   LLVMValueRef index;  /* index into the const buffer */
@@ -646,18 +646,16 @@ emit_fetch_constant(
   index = lp_build_const_int32(gallivm, reg->Register.Index*4 + swizzle);
 
   scalar_ptr = LLVMBuildGEP(builder, consts_ptr,
-   &index, 1, "");
-
-  if (stype != TGSI_TYPE_FLOAT && stype != TGSI_TYPE_UNTYPED) {
- LLVMTypeRef ivtype = LLVMPointerType(LLVMInt32TypeInContext(gallivm->context), 0);
- LLVMValueRef temp_ptr;
- temp_ptr = LLVMBuildBitCast(builder, scalar_ptr, ivtype, "");
- scalar = LLVMBuildLoad(builder, temp_ptr, "");
-  } else
- scalar = LLVMBuildLoad(builder, scalar_ptr, "");
+&index, 1, "");
+  scalar = LLVMBuildLoad(builder, scalar_ptr, "");
+  res = lp_build_broadcast_scalar(&bld_base->base, scalar);
+   }
 
-  return lp_build_broadcast_scalar(bld_fetch, scalar);
+   if (stype == TGSI_TYPE_SIGNED || stype == TGSI_TYPE_UNSIGNED) {
+  struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype);
+  res = LLVMBuildBitCast(builder, res, bld_fetch->vec_type, "");
}
+   return res;
 }
 
 static LLVMValueRef
@@ -791,16 +789,13 @@ emit_fetch_temporary(
}
else {
   LLVMValueRef temp_ptr;
-  if (stype != TGSI_TYPE_FLOAT && stype != TGSI_TYPE_UNTYPED) {
- LLVMTypeRef itype = LLVMPointerType(bld->bld_base.int_bld.vec_type, 0);
- LLVMValueRef tint_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index,
- swizzle);
- temp_ptr = LLVMBuildBitCast(builder, tint_ptr, itype, "");
-  } else
- temp_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle);
+  temp_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle);
   res = LLVMBuildLoad(builder, temp_ptr, "");
-  if (!res)
- return bld->bld_base.base.undef;
+   }
+
+   if (stype == TGSI_TYPE_SIGNED || stype == TGSI_TYPE_UNSIGNED) {
+  struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype);
+  res = LLVMBuildBitCast(builder, res, bld_fetch->vec_type, "");
}
 
return res;
-- 
1.7.9.5
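
The approach in this patch (always load through the float-typed pointer, then reinterpret the result when an integer type was requested) has a close analogue in plain C. The following is only a sketch of that idea, assuming IEEE-754 single precision; the function names are hypothetical and none of this is gallivm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer, with no numeric
 * conversion (the C analogue of an LLVM bitcast). */
uint32_t
bitcast_f32_to_u32(float f)
{
   uint32_t u;
   assert(sizeof f == sizeof u);
   memcpy(&u, &f, sizeof u);
   return u;
}

/* Storage is always typed as float; integer consumers cast after the load,
 * so the addressing code is the same for direct and indirect indices. */
uint32_t
fetch_u32(const float *regs, size_t index)
{
   float loaded = regs[index];          /* one load path for every type */
   return bitcast_f32_to_u32(loaded);   /* reinterpret afterwards */
}
```

Casting after the fetch keeps a single load path, which is the simplification the commit message argues for, instead of patching up the pointer type separately in each direct and indirect case.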


Re: [Mesa-dev] [PATCH] llvmpipe: lp_resource_copy cleanup

2013-02-19 Thread Jose Fonseca



- Original Message -
> From: Roland Scheidegger 
> 
> We don't need to flush resources for each layer, and since we don't actually
> care about layer at all in the flush function just drop the parameter.
> Also we can use util_copy_box instead of repeated util_copy_rect.
> ---
>  src/gallium/drivers/llvmpipe/lp_flush.c   |3 +-
>  src/gallium/drivers/llvmpipe/lp_flush.h   |1 -
>  src/gallium/drivers/llvmpipe/lp_surface.c |   87 +++--
>  src/gallium/drivers/llvmpipe/lp_texture.c |3 +-
>  src/gallium/drivers/llvmpipe/lp_texture.h |2 +-
>  5 files changed, 47 insertions(+), 49 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_flush.c b/src/gallium/drivers/llvmpipe/lp_flush.c
> index 964b792..cbfe564 100644
> --- a/src/gallium/drivers/llvmpipe/lp_flush.c
> +++ b/src/gallium/drivers/llvmpipe/lp_flush.c
> @@ -98,7 +98,6 @@ boolean
>  llvmpipe_flush_resource(struct pipe_context *pipe,
>  struct pipe_resource *resource,
>  unsigned level,
> -int layer,
>  boolean read_only,
>  boolean cpu_access,
>  boolean do_not_block,
> @@ -106,7 +105,7 @@ llvmpipe_flush_resource(struct pipe_context *pipe,
>  {
> unsigned referenced;
>  
> -   referenced = llvmpipe_is_resource_referenced(pipe, resource, level, layer);
> +   referenced = llvmpipe_is_resource_referenced(pipe, resource, level);
>  
> if ((referenced & LP_REFERENCED_FOR_WRITE) ||
> ((referenced & LP_REFERENCED_FOR_READ) && !read_only)) {
> diff --git a/src/gallium/drivers/llvmpipe/lp_flush.h b/src/gallium/drivers/llvmpipe/lp_flush.h
> index efff94c..bc1e2a8 100644
> --- a/src/gallium/drivers/llvmpipe/lp_flush.h
> +++ b/src/gallium/drivers/llvmpipe/lp_flush.h
> @@ -47,7 +47,6 @@ boolean
>  llvmpipe_flush_resource(struct pipe_context *pipe,
>  struct pipe_resource *resource,
>  unsigned level,
> -int layer,
>  boolean read_only,
>  boolean cpu_access,
>  boolean do_not_block,
> diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c b/src/gallium/drivers/llvmpipe/lp_surface.c
> index dbaed95..a83a903 100644
> --- a/src/gallium/drivers/llvmpipe/lp_surface.c
> +++ b/src/gallium/drivers/llvmpipe/lp_surface.c
> @@ -57,14 +57,12 @@ lp_resource_copy(struct pipe_context *pipe,
>   struct pipe_resource *src, unsigned src_level,
>   const struct pipe_box *src_box)
>  {
> -   /* XXX this used to ignore srcz/dstz
> -* assume it works the same for cube and 3d
> -*/
> struct llvmpipe_resource *src_tex = llvmpipe_resource(src);
> struct llvmpipe_resource *dst_tex = llvmpipe_resource(dst);
> const enum pipe_format format = src_tex->base.format;
> unsigned width = src_box->width;
> unsigned height = src_box->height;
> +   unsigned depth = src_box->depth;
> unsigned z;
>  
> /* Fallback for buffers. */
> @@ -74,27 +72,28 @@ lp_resource_copy(struct pipe_context *pipe,
>return;
> }
>  
> +   llvmpipe_flush_resource(pipe,
> +   dst, dst_level,
> +   FALSE, /* read_only */
> +   TRUE, /* cpu_access */
> +   FALSE, /* do_not_block */
> +   "blit dest");
> +
> +   llvmpipe_flush_resource(pipe,
> +   src, src_level,
> +   TRUE, /* read_only */
> +   TRUE, /* cpu_access */
> +   FALSE, /* do_not_block */
> +   "blit src");
> +
> +   /*
> +   printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u x %u x %u\n",
> +  src_tex->id, src_level, dst_tex->id, dst_level,
> +  src_box->x, src_box->y, src_box->z, dstx, dsty, dstz,
> +  src_box->width, src_box->height, src_box->depth);
> +   */
> +
> for (z = 0; z < src_box->depth; z++){
> -  llvmpipe_flush_resource(pipe,
> -  dst, dst_level, dstz + z,
> -  FALSE, /* read_only */
> -  TRUE, /* cpu_access */
> -  FALSE, /* do_not_block */
> -  "blit dest");
> -
> -  llvmpipe_flush_resource(pipe,
> -  src, src_level, src_box->z + z,
> -  TRUE, /* read_only */
> -  TRUE, /* cpu_access */
> -  FALSE, /* do_not_block */
> -  "blit src");
> -
> -  /*
> -  printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u x %u x %u\n",
> - src_tex->id, src_level, dst_tex->id, dst_level,
> - src_box->x, src_b

Re: [Mesa-dev] [PATCH libdrm] freedreno: add freedreno DRM

2013-02-19 Thread Eric Anholt

Rob Clark  writes:

> From: Rob Clark 
>
> The libdrm_freedreno helper layer for use by xf86-video-freedreno,
> fdre (freedreno r/e library and tests for driving gpu), and eventual
> gallium driver for the Adreno GPU.  This uses the msm gpu driver
> from QCOM's android kernel tree.
>
> Note that current msm kernel driver is a bit strange.  It provides a
> DRM interface for GEM, which is basically sufficient to have DRI2
> working.  But it does not provide KMS.  And interface to 2d and 3d
> cores is via different other devices (/dev/kgsl-*).  This is not
> quite how I'd write a DRM driver, but at this stage it is useful for
> xf86-video-freedreno and fdre (and eventual gallium driver) to be
> able to work on existing kernel driver from QCOM, to allow to
> capture cmdstream dumps from the binary blob drivers without having
> to reboot.  So libdrm_freedreno attempts to hide most of the crazy.
> The intention is that when there is a proper kernel driver, it will
> be mostly just changes in libdrm_freedreno to adapt the gallium
> driver and xf86-video-freedreno (ignoring the fbdev->KMS changes).
>
> So don't look at freedreno as an example of how to write a libdrm
> module or a DRM driver.. it is just an attempt to paper over a non-
> standard kernel driver architecture.

Yeah, at this stage I expect things to be kinda held together with duct
tape and baling wire, and it's still worth having the code in git.

Acked-by: Eric Anholt 



[Mesa-dev] [Bug 59331] piglit arb_uniform_buffer_object-getintegeri_v regression

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=59331

Ian Romanick  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Ian Romanick  ---
Fixed by piglit commit:

commit 7651a69e6c58d4d28373225a67ccac10468f2afe
Author: Ian Romanick 
Date:   Mon Jan 28 17:04:41 2013 -0800

GL_ARB_ubo/getintegeri_v: Respect implementation value of
GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT

Signed-off-by: Ian Romanick 
Cc: Vinson Lee 
Cc: Fredrik Höglund 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59331
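
The piglit change itself is not quoted in this notification. As a rough illustration of what respecting the alignment involves: offsets passed to glBindBufferRange for uniform buffers must be multiples of the value the implementation reports for GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, so a test has to round its offsets up to that value instead of hard-coding one. The helper name below is hypothetical.

```c
#include <assert.h>

/* Round `offset` up to the next multiple of `alignment`.
 * `alignment` is whatever the implementation reports; it is not assumed
 * to be a power of two here. */
unsigned
align_offset(unsigned offset, unsigned alignment)
{
   assert(alignment > 0);
   return ((offset + alignment - 1) / alignment) * alignment;
}
```

A test would query the alignment once and pass every sub-range offset through a helper like this before binding.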


[Mesa-dev] [Bug 59331] piglit arb_uniform_buffer_object-getintegeri_v regression

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=59331

Ian Romanick  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xunx.f...@intel.com

--- Comment #4 from Ian Romanick  ---
*** Bug 61043 has been marked as a duplicate of this bug. ***


Re: [Mesa-dev] Need bench mark application for Opengles2 on mesa-8.0.4 with Linux

2013-02-19 Thread Ian Romanick


On 02/18/2013 02:31 AM, Ramesh Reddy Emmadi wrote:
> Hi,
>
> Can you please let us know is there any benchmark tool for opengles2 API's
> in Linux and Windows.
>
> Thanks and Regards,
> Ramesh
>
>  CAUTION - Disclaimer *
> This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
> for the use of the addressee(s). If you are not the intended recipient, please

This is a public mailing list, so all of this is bogus.  Remove it
before posting again.  If your mail server is so broken that it cannot
do this, use a different e-mail provider.  There are many fine, free
options available.

> notify the sender by e-mail and delete the original message. Further, you are
> not to copy, disclose, or distribute this e-mail or its contents to any other
> person and any such actions are unlawful. This e-mail may contain viruses.
> Infosys has taken every reasonable precaution to minimize this risk, but is
> not liable for any damage you may sustain as a result of any virus in this
> e-mail. You should carry out your own virus checks before opening the e-mail
> or attachment. Infosys reserves the right to monitor and review the content
> of all messages sent to or from this e-mail address. Messages sent to or from
> this e-mail address may be stored on the Infosys e-mail system.
> ***INFOSYS End of Disclaimer INFOSYS***

[Mesa-dev] [PATCH] mesa: Don't install glEvalMesh in the beginend dispatch table

2013-02-19 Thread Ian Romanick

From: Ian Romanick 

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59740
Cc: Eric Anholt 
---
 src/mesa/main/eval.c   | 11 ---
 src/mesa/main/eval.h   |  4 +++-
 src/mesa/main/vtxfmt.c | 10 +-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/eval.c b/src/mesa/main/eval.c
index 44b5792..b3c2841 100644
--- a/src/mesa/main/eval.c
+++ b/src/mesa/main/eval.c
@@ -824,7 +824,8 @@ _mesa_MapGrid2d( GLint un, GLdouble u1, GLdouble u2,
 
 void
 _mesa_install_eval_vtxfmt(struct _glapi_table *disp,
-  const GLvertexformat *vfmt)
+  const GLvertexformat *vfmt,
+  bool beginend)
 {
SET_EvalCoord1f(disp, vfmt->EvalCoord1f);
SET_EvalCoord1fv(disp, vfmt->EvalCoord1fv);
@@ -833,8 +834,12 @@ _mesa_install_eval_vtxfmt(struct _glapi_table *disp,
SET_EvalPoint1(disp, vfmt->EvalPoint1);
SET_EvalPoint2(disp, vfmt->EvalPoint2);
 
-   SET_EvalMesh1(disp, vfmt->EvalMesh1);
-   SET_EvalMesh2(disp, vfmt->EvalMesh2);
+   /* glEvalMesh1 and glEvalMesh2 are not allowed between glBegin and glEnd.
+*/
+   if (!beginend) {
+  SET_EvalMesh1(disp, vfmt->EvalMesh1);
+  SET_EvalMesh2(disp, vfmt->EvalMesh2);
+   }
 }
 
 
diff --git a/src/mesa/main/eval.h b/src/mesa/main/eval.h
index 1b6c704..cfde53f 100644
--- a/src/mesa/main/eval.h
+++ b/src/mesa/main/eval.h
@@ -39,6 +39,7 @@
 
 #include "main/mfeatures.h"
 #include "main/mtypes.h"
+#include 
 
 
 #define _MESA_INIT_EVAL_VTXFMT(vfmt, impl) \
@@ -76,7 +77,8 @@ extern GLfloat *_mesa_copy_map_points2d(GLenum target,
 
 extern void
 _mesa_install_eval_vtxfmt(struct _glapi_table *disp,
-  const GLvertexformat *vfmt);
+  const GLvertexformat *vfmt,
+  bool beginend);
 
 extern void _mesa_init_eval( struct gl_context *ctx );
 extern void _mesa_free_eval_data( struct gl_context *ctx );
diff --git a/src/mesa/main/vtxfmt.c b/src/mesa/main/vtxfmt.c
index 347d07d..8669c40 100644
--- a/src/mesa/main/vtxfmt.c
+++ b/src/mesa/main/vtxfmt.c
@@ -45,7 +45,7 @@
  */
 static void
 install_vtxfmt(struct gl_context *ctx, struct _glapi_table *tab,
-   const GLvertexformat *vfmt)
+   const GLvertexformat *vfmt, bool beginend)
 {
assert(ctx->Version > 0);
 
@@ -62,7 +62,7 @@ install_vtxfmt(struct gl_context *ctx, struct _glapi_table *tab,
}
 
if (ctx->API == API_OPENGL_COMPAT) {
-  _mesa_install_eval_vtxfmt(tab, vfmt);
+  _mesa_install_eval_vtxfmt(tab, vfmt, beginend);
}
 
if (ctx->API != API_OPENGL_CORE && ctx->API != API_OPENGLES2) {
@@ -251,9 +251,9 @@ install_vtxfmt(struct gl_context *ctx, struct _glapi_table *tab,
 void
 _mesa_install_exec_vtxfmt(struct gl_context *ctx, const GLvertexformat *vfmt)
 {
-   install_vtxfmt( ctx, ctx->Exec, vfmt );
+   install_vtxfmt(ctx, ctx->Exec, vfmt, false);
if (ctx->BeginEnd)
-  install_vtxfmt( ctx, ctx->BeginEnd, vfmt );
+  install_vtxfmt(ctx, ctx->BeginEnd, vfmt, true);
 }
 
 
@@ -265,7 +265,7 @@ void
 _mesa_install_save_vtxfmt(struct gl_context *ctx, const GLvertexformat *vfmt)
 {
if (_mesa_is_desktop_gl(ctx))
-  install_vtxfmt( ctx, ctx->Save, vfmt );
+  install_vtxfmt(ctx, ctx->Save, vfmt, false);
 }
 
 
-- 
1.7.11.7
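
The effect of the new beginend flag can be sketched with a reduced dispatch table. This is not Mesa's real _glapi_table; the structs, the function names and the idea that the table starts out filled with an error stub are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*gl_func)(void);

/* Hypothetical, heavily reduced dispatch table and vertex format. */
struct dispatch_table { gl_func eval_coord1f; gl_func eval_mesh1; };
struct vertex_format  { gl_func eval_coord1f; gl_func eval_mesh1; };

/* Call counters keep the three functions below observably distinct. */
int coord_calls, mesh_calls, error_calls;

void impl_eval_coord1f(void)      { coord_calls++; }
void impl_eval_mesh1(void)        { mesh_calls++; }
void stub_invalid_operation(void) { error_calls++; }

/* Install evaluator entry points. In a begin/end table the mesh entry is
 * left alone, so whatever the table was created with stays in place. */
void
install_eval(struct dispatch_table *disp, const struct vertex_format *vfmt,
             bool beginend)
{
   assert(disp != NULL && vfmt != NULL);
   disp->eval_coord1f = vfmt->eval_coord1f;
   if (!beginend)
      disp->eval_mesh1 = vfmt->eval_mesh1;
}
```

Installing the same vertex format into an exec table (beginend false) and a begin/end table (beginend true) leaves the mesh entry of the latter pointing at the stub, while the per-vertex EvalCoord entry is installed in both.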


Re: [Mesa-dev] [PATCH 2/2] i965: Consign COORD_REPLACE VS hacks to Pre-Gen6.

2013-02-19 Thread Kenneth Graunke


On 02/16/2013 07:29 AM, Paul Berry wrote:

Pre-Gen6, the SF thread requires exact matching between VS output
slots (aka VUE slots) and FS input slots, even when the corresponding
VS output slot is unused due to being overwritten by point coordinate
replacement (glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)).
As a result, we have a special hack in the VS to ensure when any
texture coordinate is subject to point coordinate replacement, it is
always allocated space in the VUE, even if it isn't written to by the
VS.

This hack isn't needed from Gen6 onwards, since SF (Gen7: SBE)
swizzling has the ability to insert the point coordinate into
gl_TexCoord[] without needing a corresponding unused VUE slot.

Note that no modification of SF setup code is required for this
patch--get_attr_override() already does the right thing.  However, we
make a slight comment change to clarify why this works.

In addition to eliminating unnecessary VS recompiles and saving
precious URB space on Gen6+, this will save us the trouble of having
to adjust this hack when we implement geometry shaders.
---
  src/mesa/drivers/dri/i965/brw_vs.c| 22 --
  src/mesa/drivers/dri/i965/brw_vs.h| 10 ++
  src/mesa/drivers/dri/i965/gen6_sf_state.c | 13 -
  3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index 0810471..64659c0 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -258,15 +258,17 @@ do_vs_prog(struct brw_context *brw,
c.prog_data.inputs_read |= VERT_BIT_EDGEFLAG;
 }

-   /* Put dummy slots into the VUE for the SF to put the replaced
-* point sprite coords in.  We shouldn't need these dummy slots,
-* which take up precious URB space, but it would mean that the SF
-* doesn't get nice aligned pairs of input coords into output
-* coords, which would be a pain to handle.
-*/
-   for (i = 0; i < 8; i++) {
-  if (c.key.point_coord_replace & (1 << i))
-c.prog_data.outputs_written |= BITFIELD64_BIT(VERT_RESULT_TEX0 + i);
+   if (intel->gen < 6) {
+  /* Put dummy slots into the VUE for the SF to put the replaced
+   * point sprite coords in.  We shouldn't need these dummy slots,
+   * which take up precious URB space, but it would mean that the SF
+   * doesn't get nice aligned pairs of input coords into output
+   * coords, which would be a pain to handle.
+   */
+  for (i = 0; i < 8; i++) {
+ if (c.key.point_coord_replace & (1 << i))
+c.prog_data.outputs_written |= BITFIELD64_BIT(VERT_RESULT_TEX0 + i);
+  }


This looks good to me.

I wonder, though, whether we could just move this into
compute_vue_map()...just assign_vue_slot() some dummy slots in the gen
4/5 cases.  Perhaps as a follow-on (if it's possible at all)?


As is,
Reviewed-by: Kenneth Graunke 


 }

 brw_compute_vue_map(brw, &c);
@@ -429,7 +431,7 @@ static void brw_upload_vs_prog(struct brw_context *brw)
 key.clamp_vertex_color = ctx->Light._ClampVertexColor;

 /* _NEW_POINT */
-   if (ctx->Point.PointSprite) {
+   if (intel->gen < 6 && ctx->Point.PointSprite) {
for (i = 0; i < 8; i++) {
 if (ctx->Point.CoordReplace[i])
key.point_coord_replace |= (1 << i);
diff --git a/src/mesa/drivers/dri/i965/brw_vs.h b/src/mesa/drivers/dri/i965/brw_vs.h
index 75c8a5f..caa8f7c 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.h
+++ b/src/mesa/drivers/dri/i965/brw_vs.h
@@ -86,7 +86,17 @@ struct brw_vs_prog_key {
 GLuint userclip_planes_enabled_gen_4_5:MAX_CLIP_PLANES;

 GLuint copy_edgeflag:1;
+
+   /**
+* For pre-Gen6 hardware, a bitfield indicating which texture coordinates
+* are going to be replaced with point coordinates (as a consequence of a
+* call to glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)).  Because
+* our SF thread requires exact matching between VS outputs and FS inputs,
+* these texture coordinates will need to be unconditionally included in
+* the VUE, even if they aren't written by the vertex shader.
+*/
 GLuint point_coord_replace:8;
+
 GLuint clamp_vertex_color:1;

 struct brw_sampler_prog_key_data tex;
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index d88c49a..11c929c 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -78,7 +78,18 @@ get_attr_override(struct brw_vue_map *vue_map, int urb_entry_read_offset,

 if (slot == -1) {
/* This attribute does not exist in the VUE--that means that the vertex
-   * shader did not write to it.  Behavior is undefined in this case, so
+   * shader did not write to it.  This means that either:
+   *
+   * (a) This attribute is a texture coordinate, and it is going to be
+   * replaced with point coordinates (as a co

Re: [Mesa-dev] [PATCH] meta: Allocate texture before initializing texture coordinates

2013-02-19 Thread Anuj Phogat

On Fri, Feb 15, 2013 at 11:20 AM, Anuj Phogat  wrote:
> tex->Sright and tex->Ttop are initialized during texture allocation.
> This fixes depth buffer blitting failures in khronos conformance tests
> when run on desktop GL 3.0.
>
> Note: This is a candidate for stable branches.
>
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/drivers/common/meta.c |   17 -
>  1 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
> index 4e32b50..29a209e 100644
> --- a/src/mesa/drivers/common/meta.c
> +++ b/src/mesa/drivers/common/meta.c
> @@ -1910,6 +1910,14 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
>GLuint *tmp = malloc(srcW * srcH * sizeof(GLuint));
>
>if (tmp) {
> +
> + newTex = alloc_texture(depthTex, srcW, srcH, GL_DEPTH_COMPONENT);
> + _mesa_ReadPixels(srcX, srcY, srcW, srcH, GL_DEPTH_COMPONENT,
> +  GL_UNSIGNED_INT, tmp);
> + setup_drawpix_texture(ctx, depthTex, newTex, GL_DEPTH_COMPONENT,
> +   srcW, srcH, GL_DEPTH_COMPONENT,
> +   GL_UNSIGNED_INT, tmp);
> +
>   /* texcoords (after texture allocation!) */
>   {
>  verts[0].s = 0.0F;
> @@ -1928,15 +1936,6 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
>   if (!blit->DepthFP)
>  init_blit_depth_pixels(ctx);
>
> - /* maybe change tex format here */
> - newTex = alloc_texture(depthTex, srcW, srcH, GL_DEPTH_COMPONENT);
> -
> - _mesa_ReadPixels(srcX, srcY, srcW, srcH,
> -  GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, tmp);
> -
> - setup_drawpix_texture(ctx, depthTex, newTex, GL_DEPTH_COMPONENT, srcW, srcH,
> -   GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, tmp);
> -
>   _mesa_BindProgramARB(GL_FRAGMENT_PROGRAM_ARB, blit->DepthFP);
>   _mesa_set_enable(ctx, GL_FRAGMENT_PROGRAM_ARB, GL_TRUE);
>   _mesa_ColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
> --
> 1.7.7.6
>

This also fixes https://bugs.freedesktop.org/show_bug.cgi?id=59495
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Avoid segfault in gen6_upload_state

2013-02-19 Thread Carl Worth

This fixes a bug introduced in commit 258453716f001eab1288d99765213 and
triggered whenever "rb" is NULL.

Fixes bug #59445:

[SNB/IVB/HSW Bisected]Oglc draw-buffers2(advanced.blending.none) 
segfault
https://bugs.freedesktop.org/show_bug.cgi?id=59445
---

I don't know under what conditions "rb" might be NULL, but it's clearly
possible and expected, since earlier code in this function checks for it
(and sets rb_type specifically in that case). If someone could help me
write a more descriptive commit message, that would be great.

Also, I notice that similar code in brw_cc.c uses a different condition here:

   if (ctx->DrawBuffer->Visual.alphaBits == 0) {

So an alternative fix would be to switch to a condition like that one.
Please let me know which version is cleaner (the two could then be made to
match).

 src/mesa/drivers/dri/i965/gen6_cc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c
index d32f636..7ac5d5f 100644
--- a/src/mesa/drivers/dri/i965/gen6_cc.c
+++ b/src/mesa/drivers/dri/i965/gen6_cc.c
@@ -126,7 +126,7 @@ gen6_upload_blend_state(struct brw_context *brw)
   * not read the alpha channel, but will instead use the correct
   * implicit value for alpha.
   */
- if (!_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE))
+ if (rb && !_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE))
  {
 srcRGB = brw_fix_xRGB_alpha(srcRGB);
 srcA = brw_fix_xRGB_alpha(srcA);
-- 
1.7.10.4
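
The guard added by this patch relies on short-circuit evaluation: the
base format is only read once "rb" is known to be non-NULL, and a NULL
"rb" simply skips the alpha fix-up. A minimal C++ sketch of that pattern
follows. Renderbuffer, has_alpha and needs_alpha_fixup are hypothetical
stand-ins chosen for illustration, not the driver's real types or helpers.

```cpp
#include <cstddef>

// Hypothetical stand-in for a renderbuffer; the real structure carries a
// base format rather than a single flag.
struct Renderbuffer {
   bool has_alpha;
};

// Returns true only when a renderbuffer exists and lacks an alpha channel.
// The left operand of && is evaluated first, so a NULL pointer is never
// dereferenced.
inline bool needs_alpha_fixup(const Renderbuffer *rb)
{
   return rb != NULL && !rb->has_alpha;
}
```

With no renderbuffer bound the helper reports that no fix-up is needed,
which mirrors the behaviour of the patched condition.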

[Mesa-dev] [PATCH 1/9] glsl: Consolidate ir_expression constructors that use explicit types.

2013-02-19 Thread Matt Turner

From: Kenneth Graunke 

Previously, we had separate constructors for one, two, and four operand
expressions.  This patch consolidates them into a single constructor
which uses NULL default parameters.

The unary and binary operator constructors had assertions to verify that
the caller supplied the correct number of operands for the expression,
but the four-operand version did not.  Since get_num_operands for
ir_quadop_vector returns the number of vector_elements, we can safely
add that without breaking the semantics of ir_quadop_vector.

This also paves the way for expressions with three operands.  Currently,
none can be constructed since get_num_operands() never returns 3.

Reviewed-by: Matt Turner 
---
 src/glsl/ir.cpp |   34 ++
 src/glsl/ir.h   |   13 -
 2 files changed, 10 insertions(+), 37 deletions(-)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 954995d..4ccdc42 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -195,34 +195,6 @@ ir_assignment::ir_assignment(ir_rvalue *lhs, ir_rvalue *rhs,
this->set_lhs(lhs);
 }
 
-
-ir_expression::ir_expression(int op, const struct glsl_type *type,
-ir_rvalue *op0)
-{
-   assert(get_num_operands(ir_expression_operation(op)) == 1);
-   this->ir_type = ir_type_expression;
-   this->type = type;
-   this->operation = ir_expression_operation(op);
-   this->operands[0] = op0;
-   this->operands[1] = NULL;
-   this->operands[2] = NULL;
-   this->operands[3] = NULL;
-}
-
-ir_expression::ir_expression(int op, const struct glsl_type *type,
-ir_rvalue *op0, ir_rvalue *op1)
-{
-   assert(((op1 == NULL) && (get_num_operands(ir_expression_operation(op)) == 1))
- || (get_num_operands(ir_expression_operation(op)) == 2));
-   this->ir_type = ir_type_expression;
-   this->type = type;
-   this->operation = ir_expression_operation(op);
-   this->operands[0] = op0;
-   this->operands[1] = op1;
-   this->operands[2] = NULL;
-   this->operands[3] = NULL;
-}
-
 ir_expression::ir_expression(int op, const struct glsl_type *type,
 ir_rvalue *op0, ir_rvalue *op1,
 ir_rvalue *op2, ir_rvalue *op3)
@@ -234,6 +206,12 @@ ir_expression::ir_expression(int op, const struct glsl_type *type,
this->operands[1] = op1;
this->operands[2] = op2;
this->operands[3] = op3;
+#ifndef NDEBUG
+   int num_operands = get_num_operands(this->operation);
+   for (int i = num_operands; i < 4; i++) {
+  assert(this->operands[i] == NULL);
+   }
+#endif
 }
 
 ir_expression::ir_expression(int op, ir_rvalue *op0)
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 1e09988..d878bd8 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -1128,25 +1128,20 @@ enum ir_expression_operation {
 
 class ir_expression : public ir_rvalue {
 public:
+   ir_expression(int op, const struct glsl_type *type,
+ ir_rvalue *op0, ir_rvalue *op1 = NULL,
+ ir_rvalue *op2 = NULL, ir_rvalue *op3 = NULL);
+
/**
 * Constructor for unary operation expressions
 */
-   ir_expression(int op, const struct glsl_type *type, ir_rvalue *);
ir_expression(int op, ir_rvalue *);
 
/**
 * Constructor for binary operation expressions
 */
-   ir_expression(int op, const struct glsl_type *type,
-ir_rvalue *, ir_rvalue *);
ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1);
 
-   /**
-* Constructor for quad operator expressions
-*/
-   ir_expression(int op, const struct glsl_type *type,
-ir_rvalue *, ir_rvalue *, ir_rvalue *, ir_rvalue *);
-
virtual ir_expression *as_expression()
{
   return this;
-- 
1.7.8.6
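
The consolidation described in the commit message above can be sketched
in isolation: one constructor with NULL defaults, plus a debug check that
every operand slot past the expected count is unused. The names below
(Value, Op, Expr, expected_operands, count_operands) are hypothetical and
greatly simplified; in particular the real vector case follows
vector_elements, as the commit message notes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an IR value; not Mesa's ir_rvalue.
struct Value {
   int id;
};

// Hypothetical opcodes, one per operand count.
enum Op { OP_NEG, OP_ADD, OP_LRP, OP_VECTOR };

// Number of operands each opcode expects (simplified for illustration).
inline int expected_operands(Op op)
{
   switch (op) {
   case OP_NEG:    return 1;
   case OP_ADD:    return 2;
   case OP_LRP:    return 3;
   case OP_VECTOR: return 4;
   }
   return 0;
}

struct Expr {
   Op op;
   Value *operands[4];

   // A single constructor covers one to four operands via NULL defaults.
   Expr(Op op_, Value *op0, Value *op1 = NULL,
        Value *op2 = NULL, Value *op3 = NULL)
      : op(op_)
   {
      operands[0] = op0;
      operands[1] = op1;
      operands[2] = op2;
      operands[3] = op3;

      // Slots beyond the expected operand count must stay empty.
      for (int i = expected_operands(op); i < 4; i++)
         assert(operands[i] == NULL);
   }
};

// Counts the operand slots that are actually filled.
inline int count_operands(const Expr &e)
{
   int n = 0;
   for (int i = 0; i < 4; i++) {
      if (e.operands[i] != NULL)
         n++;
   }
   return n;
}
```

Callers that used to pick between three overloads now pass only as many
operands as the opcode needs; supplying too many trips the assertion in
debug builds.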

[Mesa-dev] [PATCH 2/9] glsl: Rework ir_reader to handle expressions with three operands.

2013-02-19 Thread Matt Turner

From: Kenneth Graunke 

Reviewed-by: Matt Turner 
---
 src/glsl/ir_reader.cpp |   45 +++--
 1 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/src/glsl/ir_reader.cpp b/src/glsl/ir_reader.cpp
index 405e75b..4dec4e8 100644
--- a/src/glsl/ir_reader.cpp
+++ b/src/glsl/ir_reader.cpp
@@ -676,15 +676,16 @@ ir_reader::read_expression(s_expression *expr)
 {
s_expression *s_type;
s_symbol *s_op;
-   s_expression *s_arg1;
+   s_expression *s_arg[3];
 
-   s_pattern pat[] = { "expression", s_type, s_op, s_arg1 };
+   s_pattern pat[] = { "expression", s_type, s_op, s_arg[0] };
if (!PARTIAL_MATCH(expr, pat)) {
   ir_read_error(expr, "expected (expression   "
  " [])");
   return NULL;
}
-   s_expression *s_arg2 = (s_expression *) s_arg1->next; // may be tail sentinel
+   s_arg[1] = (s_expression *) s_arg[0]->next; // may be tail sentinel
+   s_arg[2] = (s_expression *) s_arg[1]->next; // may be tail sentinel or NULL
 
const glsl_type *type = read_type(s_type);
if (type == NULL)
@@ -697,35 +698,27 @@ ir_reader::read_expression(s_expression *expr)
   return NULL;
}
 
-   unsigned num_operands = ir_expression::get_num_operands(op);
-   if (num_operands == 1 && !s_arg1->next->is_tail_sentinel()) {
-  ir_read_error(expr, "expected (expression  %s )",
-   s_op->value());
+   int num_operands = -3; /* skip "expression"   */
+   foreach_list(n, &((s_list *) expr)->subexpressions)
+  ++num_operands;
+
+   int expected_operands = ir_expression::get_num_operands(op);
+   if (num_operands != expected_operands) {
+  ir_read_error(expr, "found %d expression operands, expected %d",
+num_operands, expected_operands);
   return NULL;
}
 
-   ir_rvalue *arg1 = read_rvalue(s_arg1);
-   ir_rvalue *arg2 = NULL;
-   if (arg1 == NULL) {
-  ir_read_error(NULL, "when reading first operand of %s", s_op->value());
-  return NULL;
-   }
-
-   if (num_operands == 2) {
-  if (s_arg2->is_tail_sentinel() || !s_arg2->next->is_tail_sentinel()) {
-ir_read_error(expr, "expected (expression  %s  "
-")", s_op->value());
-return NULL;
-  }
-  arg2 = read_rvalue(s_arg2);
-  if (arg2 == NULL) {
-ir_read_error(NULL, "when reading second operand of %s",
-  s_op->value());
-return NULL;
+   ir_rvalue *arg[3] = {NULL, NULL, NULL};
+   for (int i = 0; i < num_operands; i++) {
+  arg[i] = read_rvalue(s_arg[i]);
+  if (arg[i] == NULL) {
+ ir_read_error(NULL, "when reading operand #%d of %s", i, s_op->value());
+ return NULL;
   }
}
 
-   return new(mem_ctx) ir_expression(op, type, arg1, arg2);
+   return new(mem_ctx) ir_expression(op, type, arg[0], arg[1], arg[2]);
 }
 
 ir_swizzle *
-- 
1.7.8.6

[Mesa-dev] [PATCH 3/9] glsl: Convert mix() to use a new ir_triop_lrp opcode.

2013-02-19 Thread Matt Turner

From: Kenneth Graunke 

Many GPUs have an instruction to do linear interpolation which is more
efficient than simply performing the algebra necessary (two multiplies,
an add, and a subtract).

Pattern matching or peepholing this is more desirable, but can be
tricky.  By using an opcode, we can at least make shaders which use the
mix() built-in get the more efficient behavior.

Currently, all consumers lower ir_triop_lrp.  Subsequent patches will
actually generate different code.

v2 [mattst88]:
   - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
 subsequent patch and ir_triop_lrp translated directly.

Reviewed-by: Matt Turner 
---
 src/glsl/builtins/ir/mix.ir|   14 +-
 src/glsl/ir.cpp|4 +++
 src/glsl/ir.h  |7 +
 src/glsl/ir_constant_expression.cpp|   13 ++
 src/glsl/ir_optimization.h |1 +
 src/glsl/ir_validate.cpp   |6 
 src/glsl/lower_instructions.cpp|   35 
 src/mesa/drivers/dri/i965/brw_shader.cpp   |3 +-
 src/mesa/program/ir_to_mesa.cpp|6 -
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |1 +
 10 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/src/glsl/builtins/ir/mix.ir b/src/glsl/builtins/ir/mix.ir
index 70ae13c..e666532 100644
--- a/src/glsl/builtins/ir/mix.ir
+++ b/src/glsl/builtins/ir/mix.ir
@@ -4,49 +4,49 @@
(declare (in) float arg0)
(declare (in) float arg1)
(declare (in) float arg2))
- ((return (expression float + (expression float * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression float * (var_ref arg1) (var_ref arg2))
+ ((return (expression float lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature vec2
  (parameters
(declare (in) vec2 arg0)
(declare (in) vec2 arg1)
(declare (in) vec2 arg2))
- ((return (expression vec2 + (expression vec2 * (var_ref arg0) (expression vec2 - (constant float (1.00)) (var_ref arg2))) (expression vec2 * (var_ref arg1) (var_ref arg2))
+ ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature vec3
  (parameters
(declare (in) vec3 arg0)
(declare (in) vec3 arg1)
(declare (in) vec3 arg2))
- ((return (expression vec3 + (expression vec3 * (var_ref arg0) (expression vec3 - (constant float (1.00)) (var_ref arg2))) (expression vec3 * (var_ref arg1) (var_ref arg2))
+ ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature vec4
  (parameters
(declare (in) vec4 arg0)
(declare (in) vec4 arg1)
(declare (in) vec4 arg2))
- ((return (expression vec4 + (expression vec4 * (var_ref arg0) (expression vec4 - (constant float (1.00)) (var_ref arg2))) (expression vec4 * (var_ref arg1) (var_ref arg2))
+ ((return (expression vec4 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature vec2
  (parameters
(declare (in) vec2 arg0)
(declare (in) vec2 arg1)
(declare (in) float arg2))
- ((return (expression vec2 + (expression vec2 * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression vec2 * (var_ref arg1) (var_ref arg2))
+ ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature vec3
  (parameters
(declare (in) vec3 arg0)
(declare (in) vec3 arg1)
(declare (in) float arg2))
- ((return (expression vec3 + (expression vec3 * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression vec3 * (var_ref arg1) (var_ref arg2))
+ ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature vec4
  (parameters
(declare (in) vec4 arg0)
(declare (in) vec4 arg1)
(declare (in) float arg2))
- ((return (expression vec4 + (expression vec4 * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression vec4 * (var_ref arg1) (var_ref arg2))
+ ((return (expression vec4 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2)
 
(signature float
  (parameters
diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 4ccdc42..717d6f6 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -416,6 +416,9 @@ ir_expression::get_num_operands(ir_expression_operation op)
if (op <= ir_last_binop)
   return 2;
 
+   if (op <= ir_last_triop)
+  return 3;
+
if (op == ir_quadop_vector)
   return 4;
 
@@ -502,6 +505,7 @@ static const char *const operator_strs[] = {
"pow",
"packHalf2x16_split",
"ubo_load",
+   "lrp",
"vector",
 };
 
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index d878bd8..d63dac1 100644
--- a/src/glsl/ir.h
+++ b/src

[Mesa-dev] [PATCH 4/9] glsl: Optimize ir_triop_lrp(x, y, a) with a = 0.0f or 1.0f

2013-02-19 Thread Matt Turner

---
 src/glsl/opt_algebraic.cpp |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 75948db..952941e 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -186,12 +186,12 @@ ir_algebraic_visitor::swizzle_if_required(ir_expression *expr,
 ir_rvalue *
 ir_algebraic_visitor::handle_expression(ir_expression *ir)
 {
-   ir_constant *op_const[2] = {NULL, NULL};
-   ir_expression *op_expr[2] = {NULL, NULL};
+   ir_constant *op_const[3] = {NULL, NULL, NULL};
+   ir_expression *op_expr[3] = {NULL, NULL, NULL};
ir_expression *temp;
unsigned int i;
 
-   assert(ir->get_num_operands() <= 2);
+   assert(ir->get_num_operands() <= 3);
for (i = 0; i < ir->get_num_operands(); i++) {
   if (ir->operands[i]->type->is_matrix())
 return ir;
@@ -415,6 +415,16 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 
   break;
 
+   case ir_triop_lrp:
+ if (is_vec_zero(op_const[2])) {
+this->progress = true;
+return swizzle_if_required(ir, ir->operands[0]);
+ } else if (is_vec_one(op_const[2])) {
+this->progress = true;
+return swizzle_if_required(ir, ir->operands[1]);
+ }
+ break;
+
default:
   break;
}
-- 
1.7.8.6

[Mesa-dev] [PATCH 5/9] i965: Add support for emitting the LRP instruction.

2013-02-19 Thread Matt Turner

From: Kenneth Graunke 

Like MAD, this is another three-source instruction.

Reviewed-by: Matt Turner 
---
 src/mesa/drivers/dri/i965/brw_defines.h |1 +
 src/mesa/drivers/dri/i965/brw_disasm.c  |1 +
 src/mesa/drivers/dri/i965/brw_eu.h  |1 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c |1 +
 4 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 79cc12f..d0794c8 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -685,6 +685,7 @@ enum opcode {
BRW_OPCODE_LINE =   89,
BRW_OPCODE_PLN =90,
BRW_OPCODE_MAD =91,
+   BRW_OPCODE_LRP =92,
BRW_OPCODE_NOP =126,
 
/* These are compiler backend opcodes that get translated into other
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index 50551f4..8736764 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -50,6 +50,7 @@ const struct opcode_desc opcode_descs[128] = {
 [BRW_OPCODE_LINE] = { .name = "line", .nsrc = 2, .ndst = 1 },
 [BRW_OPCODE_PLN] = { .name = "pln", .nsrc = 2, .ndst = 1 },
 [BRW_OPCODE_MAD] = { .name = "mad", .nsrc = 3, .ndst = 1 },
+[BRW_OPCODE_LRP] = { .name = "lrp", .nsrc = 3, .ndst = 1 },
 [BRW_OPCODE_SAD2] = { .name = "sad2", .nsrc = 2, .ndst = 1 },
 [BRW_OPCODE_SADA2] = { .name = "sada2", .nsrc = 2, .ndst = 1 },
 [BRW_OPCODE_DP4] = { .name = "dp4", .nsrc = 2, .ndst = 1 },
diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index adb3c4d..b6e2bee 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -174,6 +174,7 @@ ALU2(DP3)
 ALU2(DP2)
 ALU2(LINE)
 ALU2(PLN)
+ALU3(LRP)
 ALU3(MAD)
 
 ROUND(RNDZ)
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index f2dcbeb..8cdbb21 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -924,6 +924,7 @@ ALU2(DP2)
 ALU2(LINE)
 ALU2(PLN)
 ALU3(MAD)
+ALU3(LRP)
 
 ROUND(RNDZ)
 ROUND(RNDE)
-- 
1.7.8.6

[Mesa-dev] [PATCH 6/9] i965/fs: Use the LRP instruction for ir_triop_lrp when possible.

2013-02-19 Thread Matt Turner

From: Kenneth Graunke 

v2 [mattst88]:
   - Add BRW_OPCODE_LRP to list of CSE-able expressions.
   - Fix op_var[] array size.
   - Rename arguments to emit_lrp to (x, y, a) to clear confusion.
   - Add LRP function to brw_fs.cpp/.h.
   - Corrected comment about LRP instruction arguments in emit_lrp.

Reviewed-by: Matt Turner 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   |8 
 src/mesa/drivers/dri/i965/brw_fs.h |2 +
 .../dri/i965/brw_fs_channel_expressions.cpp|   16 -
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |1 +
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp  |   15 +++--
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |   35 ++--
 src/mesa/drivers/dri/i965/brw_shader.cpp   |2 +-
 7 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c1ccd92..bdb6616 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -146,6 +146,13 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst,
   return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);\
}
 
+#define ALU3(op)\
+   fs_inst *\
+   fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2)\
+   {\
+  return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1, src2);\
+   }
+
 ALU1(NOT)
 ALU1(MOV)
 ALU1(FRC)
@@ -161,6 +168,7 @@ ALU2(XOR)
 ALU2(SHL)
 ALU2(SHR)
 ALU2(ASR)
+ALU3(LRP)
 
 /** Gen4 predicated IF. */
 fs_inst *
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index d5ebd51..9c1b359 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -285,6 +285,7 @@ public:
fs_inst *IF(fs_reg src0, fs_reg src1, uint32_t condition);
fs_inst *CMP(fs_reg dst, fs_reg src0, fs_reg src1,
 uint32_t condition);
+   fs_inst *LRP(fs_reg dst, fs_reg a, fs_reg y, fs_reg x);
fs_inst *DEP_RESOLVE_MOV(int grf);
 
int type_size(const struct glsl_type *type);
@@ -360,6 +361,7 @@ public:
fs_reg fix_math_operand(fs_reg src);
fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0);
fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0, fs_reg src1);
+   void emit_lrp(fs_reg dst, fs_reg x, fs_reg y, fs_reg a);
void emit_minmax(uint32_t conditionalmod, fs_reg dst,
 fs_reg src0, fs_reg src1);
bool try_emit_saturate(ir_expression *ir);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
index ea06225..30d8d9b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
@@ -135,7 +135,7 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir)
ir_expression *expr = ir->rhs->as_expression();
bool found_vector = false;
unsigned int i, vector_elements = 1;
-   ir_variable *op_var[2];
+   ir_variable *op_var[3];
 
if (!expr)
   return visit_continue;
@@ -342,6 +342,20 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir)
   assert(!"not yet supported");
   break;
 
+   case ir_triop_lrp:
+  for (i = 0; i < vector_elements; i++) {
+ir_rvalue *op0 = get_element(op_var[0], i);
+ir_rvalue *op1 = get_element(op_var[1], i);
+ir_rvalue *op2 = get_element(op_var[2], i);
+
+assign(ir, i, new(mem_ctx) ir_expression(expr->operation,
+ element_type,
+ op0,
+ op1,
+ op2));
+  }
+  break;
+
case ir_unop_pack_snorm_2x16:
case ir_unop_pack_snorm_4x8:
case ir_unop_pack_unorm_2x16:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 70c143a..0b74d2e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -66,6 +66,7 @@ is_expression(const fs_inst *const inst)
case BRW_OPCODE_LINE:
case BRW_OPCODE_PLN:
case BRW_OPCODE_MAD:
+   case BRW_OPCODE_LRP:
case FS_OPCODE_CINTERP:
case FS_OPCODE_LINTERP:
   return true;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index 3d1f3b3..38d6332 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -1082,18 +1082,27 @@ fs_generator::generate_code(exec_list *instructions)
 break;
 
   case BRW_OPCODE_MAD:
+  case BRW_OPCODE_LRP: {
+ struct brw_instruction *(*brw_inst)(struct brw_compile *p,
+

[Mesa-dev] [PATCH 7/9] i965/fp: Use the LRP instruction for OPCODE_LRP.

2013-02-19 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_fs_fp.cpp |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_fp.cpp b/src/mesa/drivers/dri/i965/brw_fs_fp.cpp
index 5f5f6a9..50e63da 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_fp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_fp.cpp
@@ -316,14 +316,10 @@ fs_visitor::emit_fragment_program_code()
   case OPCODE_LRP:
  for (int i = 0; i < 4; i++) {
 if (fpi->DstReg.WriteMask & (1 << i)) {
-   fs_reg neg_src0 = regoffset(src[0], i);
-   neg_src0.negate = !neg_src0.negate;
-   fs_reg temp = fs_reg(this, glsl_type::float_type);
-   fs_reg temp2 = fs_reg(this, glsl_type::float_type);
-   emit(ADD(temp, neg_src0, fs_reg(1.0f)));
-   emit(MUL(temp, temp, regoffset(src[2], i)));
-   emit(MUL(temp2, regoffset(src[0], i), regoffset(src[1], i)));
-   emit(ADD(regoffset(dst, i), temp, temp2));
+   fs_reg a = regoffset(src[0], i);
+   fs_reg y = regoffset(src[1], i);
+   fs_reg x = regoffset(src[2], i);
+   emit_lrp(regoffset(dst, i), x, y, a);
 }
  }
  break;
-- 
1.7.8.6

[Mesa-dev] [PATCH 8/9] ir_to_mesa: Translate ir_triop_lrp to OPCODE_LRP.

2013-02-19 Thread Matt Turner

---
 src/mesa/program/ir_to_mesa.cpp |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 30305d2..5432323 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -1479,7 +1479,10 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
   break;
 
case ir_triop_lrp:
-  assert(!"ir_triop_lrp should have been lowered.");
+  /* ir_triop_lrp operands are (x, y, a) while
+   * OPCODE_LRP operands are (a, y, x) to match ARB_fragment_program.
+   */
+  emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]);
   break;
 
case ir_quadop_vector:
@@ -2997,7 +3000,7 @@ _mesa_ir_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
 /* Lowering */
 do_mat_op_to_vec(ir);
 lower_instructions(ir, (MOD_TO_FRACT | DIV_TO_MUL_RCP | EXP_TO_EXP2
-| LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP | LRP_TO_ARITH
+| LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP
 | ((options->EmitNoPow) ? POW_TO_EXP2 : 0)));
 
 progress = do_lower_jumps(ir, true, true, options->EmitNoMainReturn, options->EmitNoCont, options->EmitNoLoops) || progress;
-- 
1.7.8.6
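
The operand swap in this patch can be checked with a small scalar model.
The sketch below assumes ir_triop_lrp(x, y, a) computes x * (1 - a) + y * a
(the expression the old mix.ir spelled out) and that the program-level LRP
takes (a, y, x) and computes a * y + (1 - a) * x, as in
ARB_fragment_program. The function names are hypothetical and exist only
for this illustration.

```cpp
// Scalar model of ir_triop_lrp, operand order (x, y, a).
inline float ir_lrp(float x, float y, float a)
{
   return x * (1.0f - a) + y * a;
}

// Scalar model of an ARB_fragment_program-style LRP, operand order
// (a, y, x).
inline float arb_lrp(float a, float y, float x)
{
   return a * y + (1.0f - a) * x;
}

// Translation step: the first and third IR operands trade places,
// matching emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]).
inline float translate_ir_lrp(float op0, float op1, float op2)
{
   return arb_lrp(op2, op1, op0);
}
```

Passing the operands through unswapped would interpolate with x as the
weight, which is why the reversal is needed.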

[Mesa-dev] [PATCH 0/9] LRP

2013-02-19 Thread Matt Turner

This series adds ir_triop_lrp to the IR. A few patches clear the way
since it is the first 3-operand operator.

The next patches
 - emit lrp from GLSL's mix() function;
 - optimize away the a = 0.0 and 1.0 cases;
 - add i965 support for emitting the LRP instruction in fragment
shaders and fragment programs;
 - and directly translate ir_triop_lrp to OPCODE_LRP for IR-to-Mesa.

From Eric's shader-db:

total instructions in shared programs: 1458134 -> 1450661 (-0.51%)
instructions in affected programs: 224094 -> 216621 (-3.33%)

There are some small increases, typically +2 or +4 instructions, in
shader-db. I'll investigate further.
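
The a = 0.0 and a = 1.0 folds mentioned in the list above follow directly
from the interpolation formula x * (1 - a) + y * a. The sketch below
models that on scalars; LrpFold, fold_lrp_weight and lrp_folded are
hypothetical names for illustration, and the real pass works on IR
constants rather than on floats.

```cpp
#include <cassert>

// Reference semantics of mix(x, y, a) on scalars.
inline float lrp(float x, float y, float a)
{
   return x * (1.0f - a) + y * a;
}

// Outcome of inspecting a constant weight.
enum LrpFold { FOLD_NONE, FOLD_TO_X, FOLD_TO_Y };

// A weight of 0 selects x, a weight of 1 selects y, anything else
// keeps the interpolation.
inline LrpFold fold_lrp_weight(float a)
{
   if (a == 0.0f)
      return FOLD_TO_X;
   if (a == 1.0f)
      return FOLD_TO_Y;
   return FOLD_NONE;
}

// Applies the fold when possible and falls back to the full formula.
inline float lrp_folded(float x, float y, float a)
{
   switch (fold_lrp_weight(a)) {
   case FOLD_TO_X: return x;
   case FOLD_TO_Y: return y;
   case FOLD_NONE: break;
   }
   return lrp(x, y, a);
}

// Sanity check for finite inputs: folding must not change the result.
inline void check_fold_matches_reference(float x, float y, float a)
{
   assert(lrp_folded(x, y, a) == lrp(x, y, a));
}
```

The check only holds for finite x and y; with infinities or NaNs the
unfolded formula can produce NaN where the fold returns an operand.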
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 9/9] i965/vs: Assert that ir_triop_lrp was lowered.

2013-02-19 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index ae4cf7d..a2bc9f5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1585,6 +1585,10 @@ vec4_visitor::visit(ir_expression *ir)
   break;
}
 
+   case ir_triop_lrp:
+  assert(!"not reached: should be handled by lrp_to_arith");
+  break;
+
case ir_quadop_vector:
   assert(!"not reached: should be handled by lower_quadop_vector");
   break;
-- 
1.7.8.6

[Mesa-dev] [Bug 61149] New: Crash on Intel Sandybridge Mobile with Vertex Buffer Objects and select mode OpenGL rendering

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61149

  Priority: medium
Bug ID: 61149
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: Crash on Intel Sandybridge Mobile with Vertex Buffer
Objects and select mode OpenGL rendering
  Severity: normal
Classification: Unclassified
OS: All
  Reporter: kal...@gmail.com
  Hardware: Other
Status: NEW
   Version: 9.0
 Component: Mesa core
   Product: Mesa

Hi, this is an old bug we are encountering in blender.

My system is Ubuntu 12.10, Intel® Core™ i5-2410M CPU @ 2.30GHz × 4, 4GB ram,
Hybrid NVIDIA GT 540M - Intel(R) Sandybridge Mobile

To reproduce, 

* start blender
* enable VBO in user preferences (Ctrl-Alt-U) under the system tab.
* try selecting any object on 3D viewport (right clicking on the default Cube
for instance)

My guess is that it has to do with select mode rendering and vertex buffer
object combination. I would do a trunk compile but I am using optimus on my
system and I fear I may destabilize something. If it is safe and would help, I
could attempt it.

The backtrace is:

#0  0x7fffe26e4a3f in run_vp (ctx=, stage=)
at ../../../../../src/mesa/tnl/t_vb_program.c:389
#1  0x7fffe26e192d in _tnl_run_pipeline (ctx=0x57078e0)
at ../../../../../src/mesa/tnl/t_pipeline.c:163
#2  0x7fffe26e1f26 in _tnl_draw_prims (ctx=ctx@entry=0x57078e0, 
arrays=arrays@entry=0x7ffd5760, prim=prim@entry=0x7ffd63e0, 
nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd5740, 
min_index=, max_index=0)
at ../../../../../src/mesa/tnl/t_draw.c:524
#3  0x7fffe26e2a8f in _tnl_vbo_draw_prims (ctx=ctx@entry=0x57078e0, 
prim=prim@entry=0x7ffd63e0, nr_prims=nr_prims@entry=1, 
ib=ib@entry=0x7ffd5740, 
index_bounds_valid=index_bounds_valid@entry=1 '\001', 
min_index=min_index@entry=0, max_index=0, 
tfb_vertcount=tfb_vertcount@entry=0x0)
at ../../../../../src/mesa/tnl/t_draw.c:424
#4  0x7fffe26d3301 in vbo_rebase_prims (ctx=ctx@entry=0x57078e0, 
arrays=arrays@entry=0x58572f8, prim=prim@entry=0x7ffd63e0, 
nr_prims=nr_prims@entry=1, ib=0x7ffd5740, ib@entry=0x7ffd63c0, 
min_index=min_index@entry=4294967295, 
max_index=max_index@entry=4294967295, 
draw=draw@entry=0x7fffe26e2a20 <_tnl_vbo_draw_prims>)
at ../../../../../src/mesa/vbo/vbo_rebase.c:233
#5  0x7fffe26e1b09 in _tnl_draw_prims (ctx=ctx@entry=0x57078e0, 
arrays=arrays@entry=0x58572f8, prim=prim@entry=0x7ffd63e0, 
nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd63c0, 
min_index=4294967295, max_index=4294967295)
at ../../../../../src/mesa/tnl/t_draw.c:467
#6  0x7fffe2b70ac3 in brw_draw_prims (ctx=0x57078e0, prim=0x7ffd63e0, 
nr_prims=1, ib=0x7ffd63c0, index_bounds_valid=, 
min_index=4294967295, max_index=4294967295, tfb_vertcount=0x0)
at brw_draw.c:581
#7  0x7fffe26cf3aa in vbo_handle_primitive_restart (
ctx=ctx@entry=0x57078e0, prim=prim@entry=0x7ffd63e0, 
nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd63c0, 
index_bounds_valid=index_bounds_valid@entry=0 '\000', 
min_index=min_index@entry=4294967295, max_index=max_index@entry=4294967295)
at ../../../../../src/mesa/vbo/vbo_exec_array.c:570
#8  0x7fffe26d0384 in vbo_validated_drawrangeelements (
ctx=ctx@entry=0x57078e0, mode=mode@entry=1, 
index_bounds_valid=index_bounds_valid@entry=0 '\000', 
start=start@entry=4294967295, end=end@entry=4294967295, 
count=count@entry=24, type=type@entry=5125, indices=indices@entry=0x0, 
basevertex=basevertex@entry=0, numInstances=numInstances@entry=1, 
baseInstance=baseInstance@entry=0)
at ../../../../../src/mesa/vbo/vbo_exec_array.c:867
#9  0x7fffe26d06f4 in vbo_exec_DrawElements (mode=1, count=24, type=5125, 
indices=0x0) at ../../../../../src/mesa/vbo/vbo_exec_array.c:997
#10 0x010ed899 in cdDM_drawEdges ()
#11 0x00d24cf1 in draw_mesh_object_outline.isra.5 ()
#12 0x00d2e18e in draw_mesh_object ()
#13 0x00d31d4e in draw_object ()
#14 0x00d21e80 in view3d_opengl_select ()
#15 0x00d1720c in mixed_bones_object_selectbuffer ()
#16 0x00d1a180 in mouse_select ()
#17 0x00d1aa48 in view3d_select_invoke ()
#18 0x00c65c52 in wm_operator_invoke ()
#19 0x00c66cae in wm_handler_operator_call ()
#20 0x00c66f6d in wm_handlers_do_intern ()
#21 0x00c676b6 in wm_handlers_do ()
#22 0x00c67b4b in wm_event_do_handlers ()
#23 0x00c60c88 in WM_main ()
#24 0x00c23d65 in main ()


[Mesa-dev] [Bug 61149] Crash on Intel Sandybridge Mobile with Vertex Buffer Objects and select mode OpenGL rendering

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61149

--- Comment #1 from Antony Riakiotakis  ---
Attaching the full backtrace. One detail that might prove useful: we are
using glDrawElements as the draw call.


#0  0x7fffe2900a3f in run_vp (ctx=, stage=)
at ../../../../../src/mesa/tnl/t_vb_program.c:389
ptr = 0x8000f7fb0ff4 
size = 3
stride = 12
data = 0x8000f7fb0ff4
attr = 
tnl = 0x37e0450
store = 0x3922b80
VB = 0x37e0bc8
program = 
machine = 0x3923250
outputs = {0, 0, 1, 0 , 3797438540, 32767, 0, 0, 
  3801491299, 32767, 43, 6, 0, 0, 43, 0, 57137600, 0, 15, 0, 57129920, 
  0, 58425864, 0, 0}
numOutputs = 1
i = 
j = 
__PRETTY_FUNCTION__ = "run_vp"
#1  0x7fffe28fd92d in _tnl_run_pipeline (ctx=0x367bbc0)
at ../../../../../src/mesa/tnl/t_pipeline.c:163
s = 0x37e0710
tnl = 0x37e0450
i = 
#2  0x7fffe28fdf26 in _tnl_draw_prims (ctx=ctx@entry=0x367bbc0, 
arrays=arrays@entry=0x7ffd57f0, prim=prim@entry=0x7ffd6470, 
nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd57d0, 
min_index=, max_index=0)
at ../../../../../src/mesa/tnl/t_draw.c:524
this_nr_prims = 
bo = {0x4a27050, 0x0 , 0x7271f328 , 
  0x0, 0x4c26f20, 0x3739f40, 0x373a028, 0x0, 0x7fffe258504c, 0x0, 0x0, 
  0x1001e, 0x7fff}
nr_bo = 1
inst = 
tnl = 0x37e0450
max = 
max_basevertex = 
i = 
#3  0x7fffe28fea8f in _tnl_vbo_draw_prims (ctx=ctx@entry=0x367bbc0, 
prim=prim@entry=0x7ffd6470, nr_prims=nr_prims@entry=1, 
ib=ib@entry=0x7ffd57d0, 
index_bounds_valid=index_bounds_valid@entry=1 '\001', 
min_index=min_index@entry=0, max_index=0, 
tfb_vertcount=tfb_vertcount@entry=0x0)
at ../../../../../src/mesa/tnl/t_draw.c:424
arrays = 0x7ffd57f0
#4  0x7fffe28ef301 in vbo_rebase_prims (ctx=ctx@entry=0x367bbc0, 
arrays=arrays@entry=0x37cd728, prim=prim@entry=0x7ffd6470, 
nr_prims=nr_prims@entry=1, ib=0x7ffd57d0, ib@entry=0x7ffd6450, 
min_index=min_index@entry=4294967295, 
max_index=max_index@entry=4294967295, 
draw=draw@entry=0x7fffe28fea20 <_tnl_vbo_draw_prims>)
at ../../../../../src/mesa/vbo/vbo_rebase.c:233
tmp_arrays = {{Size = 3, Type = 5126, Format = 6408, Stride = 0, 
StrideB = 12, Ptr = 0xfff4 , 
Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', 
InstanceDivisor = 0, _ElementSize = 12, BufferObj = 0x4a27050, 
_MaxElement = 36}, {Size = 4, Type = 5126, Format = 6408, 
Stride = 0, StrideB = 0, Ptr = 0x367d1e0 "", Enabled = 1 '\001', 
Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, 
_ElementSize = 16, BufferObj = 0x374ac40, _MaxElement = 0}, {
Size = 3, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, 
Ptr = 0x367d1f0 "", Enabled = 1 '\001', Normalized = 0 '\000', 
Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 12, 
BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 4, Type = 5126, 
Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d200 "", 
Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', 
InstanceDivisor = 0, _ElementSize = 16, BufferObj = 0x374ac40, 
_MaxElement = 0}, {Size = 1, Type = 5126, Format = 6408, 
Stride = 0, StrideB = 0, Ptr = 0x367d210 "", Enabled = 1 '\001', 
Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, 
_ElementSize = 4, BufferObj = 0x374ac40, _MaxElement = 0}, {
Size = 1, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, 
Ptr = 0x367d220 "", Enabled = 1 '\001', Normalized = 0 '\000', 
Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 4, 
BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 1, Type = 5126, 
Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d230 "", 
Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', 
InstanceDivisor = 0, _ElementSize = 4, BufferObj = 0x374ac40, 
_MaxElement = 0}, {Size = 1, Type = 5126, Format = 6408, 
Stride = 0, StrideB = 0, Ptr = 0x367d240 "", Enabled = 1 '\001', 
Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, 
_ElementSize = 4, BufferObj = 0x374ac40, _MaxElement = 0}, {
Size = 2, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, 
Ptr = 0x367d250 "", Enabled = 1 '\001', Normalized = 0 '\000', 
Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 8, 
BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 1, Type = 5126, 
Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d260 "", 
Enabled = 1 '\001', Normal

Re: [Mesa-dev] [PATCH] i965: Avoid segfault in gen6_upload_state

2013-02-19 Thread Ian Romanick


On 02/19/2013 04:27 PM, Carl Worth wrote:

This fixes a bug introduced in commit 258453716f001eab1288d99765213 and
triggered whenever "rb" is NULL.

Fixes bug #59445:

[SNB/IVB/HSW Bisected]Oglc draw-buffers2(advanced.blending.none) 
segfault
https://bugs.freedesktop.org/show_bug.cgi?id=59445
---

I don't know under what conditions "rb" might be NULL, but it's clear that
it's possible and expected as there is earlier code in this function that
checks it, (and sets rb_type specifically in that case). So if someone could
help me write a more descriptive commit message, that would be great.

Also, I notice that similar code in brw_cc.c uses a different condition here:

if (ctx->DrawBuffer->Visual.alphaBits == 0) {


I don't know what cases could cause rb to be NULL either.  There is code 
earlier that checks this case (near the top of the function), so it 
doesn't seem to be an error condition.  Could this be for the window? 
Ken or Eric should know for sure.


Either way, ctx->DrawBuffer->Visual contains either the window 
configuration or a mirror of the state for the current FBO.  It should 
always be safe to use that.  Using ctx->DrawBuffer->Visual.alphaBits 
will ensure that you get the correct answer even when rb is NULL.



So an alternate fix could be to switch to something like that. Please let me
know if one version or the other is cleaner, (and both could be made to
match).

  src/mesa/drivers/dri/i965/gen6_cc.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c
index d32f636..7ac5d5f 100644
--- a/src/mesa/drivers/dri/i965/gen6_cc.c
+++ b/src/mesa/drivers/dri/i965/gen6_cc.c
@@ -126,7 +126,7 @@ gen6_upload_blend_state(struct brw_context *brw)
* not read the alpha channel, but will instead use the correct
* implicit value for alpha.
*/
- if (!_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE))
+ if (rb && !_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE))
   {
  srcRGB = brw_fix_xRGB_alpha(srcRGB);
  srcA = brw_fix_xRGB_alpha(srcA);
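
For readers comparing the two conditions discussed in this thread, here is a minimal C sketch. It is not the i965 code: `struct renderbuffer`, `struct framebuffer`, `has_alpha`, `visual_alpha_bits` and both function names are hypothetical stand-ins, assumed only for illustration. It shows that guarding the `rb` dereference and consulting the visual can give different answers when `rb` is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the Mesa structures. */
struct renderbuffer {
   int has_alpha;            /* nonzero if the base format has an alpha channel */
};

struct framebuffer {
   int visual_alpha_bits;    /* plays the role of Visual.alphaBits */
   struct renderbuffer *rb;  /* may be NULL */
};

/* Variant 1: guard the dereference, in the spirit of the posted patch. */
int
needs_alpha_fix_guarded(const struct framebuffer *fb)
{
   return fb->rb != NULL && !fb->rb->has_alpha;
}

/* Variant 2: consult the visual, which never needs rb at all. */
int
needs_alpha_fix_visual(const struct framebuffer *fb)
{
   return fb->visual_alpha_bits == 0;
}
```

With a NULL `rb` and zero alpha bits, the guarded variant skips the fix while the visual-based variant applies it. Which behaviour the driver actually wants is the open question in the thread.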




[Mesa-dev] [PATCH 2/4] i965/fs: Improve CSE performance by expiring some available expressions.

2013-02-19 Thread Eric Anholt

We're already walking the list, and we can easily know when something
has no reason to be in the list any longer, so take a brief extra step
to reduce our worst-case runtime (an oglconform test that emits the
maximum instructions in a fragment program).  I don't actually know what
the worst-case runtime was, because it was too long and I got bored.
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp |   20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 44479d8..09c7fa6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -88,6 +88,7 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb)
 
void *mem_ctx = ralloc_context(this->mem_ctx);
 
+   int ip = block->start_ip;
for (fs_inst *inst = (fs_inst *)block->start;
inst != block->end->next;
inst = (fs_inst *) inst->next) {
@@ -153,18 +154,33 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb)
 }
   }
 
-  /* Kill all AEB entries that use the destination. */
   foreach_list_safe(entry_node, aeb) {
 aeb_entry *entry = (aeb_entry *)entry_node;
 
 for (int i = 0; i < 3; i++) {
+fs_reg *src_reg = &entry->generator->src[i];
+
+/* Kill all AEB entries that use the destination we just
+ * overwrote.
+ */
 if (inst->overwrites_reg(entry->generator->src[i])) {
   entry->remove();
   ralloc_free(entry);
   break;
}
+
+/* Kill any AEB entries using registers that don't get reused any
+ * more -- a sure sign they'll fail operands_match().
+ */
+if (src_reg->file == GRF && virtual_grf_use[src_reg->reg] < ip) {
+   entry->remove();
+   ralloc_free(entry);
+  break;
+}
 }
   }
+
+  ip++;
}
 
ralloc_free(mem_ctx);
@@ -180,6 +196,8 @@ fs_visitor::opt_cse()
 {
bool progress = false;
 
+   calculate_live_intervals();
+
cfg_t cfg(this);
 
for (int b = 0; b < cfg.num_blocks; b++) {
-- 
1.7.10.4
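
The expiry rule in this patch can be sketched in isolation. The following is only an illustration in plain C, not the driver code: `struct avail_expr`, `last_use` and `expire_dead_entries` are hypothetical names, and a flat array stands in for the exec_list of AEB entries. An entry is dropped as soon as one of its source registers has its last use before the current instruction, because no later instruction can match it.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SRCS 3

/* Hypothetical stand-in for an available-expression (AEB) entry:
 * only the source register numbers matter for this sketch. */
struct avail_expr {
   int src[NUM_SRCS];   /* virtual register numbers, or -1 if unused */
};

/* Compact the array in place, keeping only entries whose sources are all
 * still used at or after instruction `ip`.  `last_use[r]` is assumed to
 * hold the ip of the final use of register r.  Returns the new count. */
size_t
expire_dead_entries(struct avail_expr *entries, size_t count,
                    const int *last_use, int ip)
{
   size_t kept = 0;

   for (size_t i = 0; i < count; i++) {
      int dead = 0;

      for (int s = 0; s < NUM_SRCS; s++) {
         int reg = entries[i].src[s];
         if (reg >= 0 && last_use[reg] < ip) {
            dead = 1;
            break;
         }
      }

      if (!dead)
         entries[kept++] = entries[i];
   }

   return kept;
}
```

The point of the design is that the list stays short: the per-instruction walk then costs time proportional to the number of live expressions rather than to everything seen so far in the block.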


[Mesa-dev] [PATCH 3/4] mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)

2013-02-19 Thread Eric Anholt

We were allocating an adjacency_list entry for every possible
interference that could get created, but that usually doesn't happen.
We can save a lot of memory by resizing the array on demand.
---
 src/mesa/program/register_allocate.c |   14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/program/register_allocate.c b/src/mesa/program/register_allocate.c
index 88793db..5862c78 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -120,6 +120,7 @@ struct ra_node {
 */
GLboolean *adjacency;
unsigned int *adjacency_list;
+   unsigned int adjacency_list_size;
unsigned int adjacency_count;
/** @} */
 
@@ -307,6 +308,15 @@ static void
 ra_add_node_adjacency(struct ra_graph *g, unsigned int n1, unsigned int n2)
 {
g->nodes[n1].adjacency[n2] = GL_TRUE;
+
+   if (g->nodes[n1].adjacency_count >=
+   g->nodes[n1].adjacency_list_size) {
+  g->nodes[n1].adjacency_list_size *= 2;
+  g->nodes[n1].adjacency_list = reralloc(g, g->nodes[n1].adjacency_list,
+ unsigned int,
+ g->nodes[n1].adjacency_list_size);
+   }
+
g->nodes[n1].adjacency_list[g->nodes[n1].adjacency_count] = n2;
g->nodes[n1].adjacency_count++;
 }
@@ -326,7 +336,9 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count)
 
for (i = 0; i < count; i++) {
   g->nodes[i].adjacency = rzalloc_array(g, GLboolean, count);
-  g->nodes[i].adjacency_list = ralloc_array(g, unsigned int, count);
+  g->nodes[i].adjacency_list_size = 4;
+  g->nodes[i].adjacency_list =
+ ralloc_array(g, unsigned int, g->nodes[i].adjacency_list_size);
   g->nodes[i].adjacency_count = 0;
   ra_add_node_adjacency(g, i, i);
   g->nodes[i].reg = NO_REG;
-- 
1.7.10.4
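
The resize-on-demand idea in this patch can be sketched standalone. This is a minimal illustration, assuming the C library's `malloc`/`realloc` in place of Mesa's ralloc helpers; `struct node`, `node_init`, `node_add_adjacency` and `node_free` are hypothetical names, not the register allocator's API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: a growable list of neighbour indices. */
struct node {
   unsigned int *adjacency_list;
   unsigned int adjacency_list_size;   /* allocated slots */
   unsigned int adjacency_count;       /* used slots */
};

/* Start with a small fixed capacity instead of one slot per possible
 * neighbour.  Returns 0 on success, -1 on allocation failure. */
int
node_init(struct node *n)
{
   n->adjacency_list_size = 4;
   n->adjacency_count = 0;
   n->adjacency_list = malloc(n->adjacency_list_size * sizeof(unsigned int));
   return n->adjacency_list ? 0 : -1;
}

/* Append a neighbour, doubling the capacity when the list is full. */
int
node_add_adjacency(struct node *n, unsigned int other)
{
   if (n->adjacency_count >= n->adjacency_list_size) {
      unsigned int new_size = n->adjacency_list_size * 2;
      unsigned int *grown =
         realloc(n->adjacency_list, new_size * sizeof(unsigned int));
      if (!grown)
         return -1;
      n->adjacency_list = grown;
      n->adjacency_list_size = new_size;
   }

   n->adjacency_list[n->adjacency_count++] = other;
   return 0;
}

/* Release the list and reset the bookkeeping. */
void
node_free(struct node *n)
{
   free(n->adjacency_list);
   n->adjacency_list = NULL;
   n->adjacency_list_size = 0;
   n->adjacency_count = 0;
}
```

Doubling keeps appends amortized constant time, and a node with few interferences never pays for more than a handful of slots.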


[Mesa-dev] [PATCH 1/4] i965/fs: Improve live variables calculation performance.

2013-02-19 Thread Eric Anholt

We can execute way fewer instructions by doing our boolean manipulation
on an "int" of bits at a time, while also reducing our working set size.

Reduces compile time of L4D2's slowest shader from 4s to 1.1s
(-72.4% +/- 0.2%, n=10)
---
 .../drivers/dri/i965/brw_fs_live_variables.cpp |   44 +++-
 src/mesa/drivers/dri/i965/brw_fs_live_variables.h  |   10 +++--
 2 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
index db8f397..e7de43e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
@@ -40,7 +40,7 @@ using namespace brw;
  */
 
 /**
- * Sets up the use[] and def[] arrays.
+ * Sets up the use[] and def[] bitsets.
  *
  * The basic-block-level live variable analysis needs to know which
  * variables get used before they're completely defined, and which
@@ -67,8 +67,8 @@ fs_live_variables::setup_def_use()
if (inst->src[i].file == GRF) {
   int reg = inst->src[i].reg;
 
-  if (!bd[b].def[reg])
- bd[b].use[reg] = true;
+  if (!BITSET_TEST(bd[b].def, reg))
+ BITSET_SET(bd[b].use, reg);
}
 }
 
@@ -82,8 +82,8 @@ fs_live_variables::setup_def_use()
 !inst->force_uncompressed &&
 !inst->force_sechalf) {
int reg = inst->dst.reg;
-   if (!bd[b].use[reg])
-  bd[b].def[reg] = true;
+if (!BITSET_TEST(bd[b].use, reg))
+   BITSET_SET(bd[b].def, reg);
 }
 
 ip++;
@@ -107,12 +107,12 @@ fs_live_variables::compute_live_variables()
 
   for (int b = 0; b < cfg->num_blocks; b++) {
 /* Update livein */
-for (int i = 0; i < num_vars; i++) {
-   if (bd[b].use[i] || (bd[b].liveout[i] && !bd[b].def[i])) {
-  if (!bd[b].livein[i]) {
- bd[b].livein[i] = true;
- cont = true;
-  }
+for (int i = 0; i < bitset_words; i++) {
+BITSET_WORD new_livein = (bd[b].use[i] |
+  (bd[b].liveout[i] & ~bd[b].def[i]));
+   if (new_livein & ~bd[b].livein[i]) {
+   bd[b].livein[i] |= new_livein;
+   cont = true;
}
 }
 
@@ -121,9 +121,11 @@ fs_live_variables::compute_live_variables()
bblock_link *link = (bblock_link *)block_node;
bblock_t *block = link->block;
 
-   for (int i = 0; i < num_vars; i++) {
-  if (bd[block->block_num].livein[i] && !bd[b].liveout[i]) {
- bd[b].liveout[i] = true;
+   for (int i = 0; i < bitset_words; i++) {
+   BITSET_WORD new_liveout = (bd[block->block_num].livein[i] &
+  ~bd[b].liveout[i]);
+  if (new_liveout & ~bd[b].liveout[i]) {
+ bd[b].liveout[i] |= new_liveout;
  cont = true;
   }
}
@@ -140,11 +142,13 @@ fs_live_variables::fs_live_variables(fs_visitor *v, cfg_t *cfg)
num_vars = v->virtual_grf_count;
bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks);
 
+   bitset_words = (ALIGN(v->virtual_grf_count, BITSET_WORDBITS) /
+   BITSET_WORDBITS);
for (int i = 0; i < cfg->num_blocks; i++) {
-  bd[i].def = rzalloc_array(mem_ctx, bool, num_vars);
-  bd[i].use = rzalloc_array(mem_ctx, bool, num_vars);
-  bd[i].livein = rzalloc_array(mem_ctx, bool, num_vars);
-  bd[i].liveout = rzalloc_array(mem_ctx, bool, num_vars);
+  bd[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  bd[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  bd[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  bd[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
}
 
setup_def_use();
@@ -208,12 +212,12 @@ fs_visitor::calculate_live_intervals()
 
for (int b = 0; b < cfg.num_blocks; b++) {
   for (int i = 0; i < num_vars; i++) {
-if (livevars.bd[b].livein[i]) {
+if (BITSET_TEST(livevars.bd[b].livein, i)) {
def[i] = MIN2(def[i], cfg.blocks[b]->start_ip);
use[i] = MAX2(use[i], cfg.blocks[b]->start_ip);
 }
 
-if (livevars.bd[b].liveout[i]) {
+if (BITSET_TEST(livevars.bd[b].liveout, i)) {
def[i] = MIN2(def[i], cfg.blocks[b]->end_ip);
use[i] = MAX2(use[i], cfg.blocks[b]->end_ip);
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
index 5f7e67e..1cde5f4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
@@ -26,6 +26,7 @@
  */
 
 #include "brw_fs.h"
+#include "main/bitset.h"
 
 namespace brw {
 
@@ -36,18 +37,18 @@ struct block_data {
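
The word-at-a-time update described in this patch can be sketched in plain C. This is an illustration under assumptions, not the i965 code: `bitset_word`, `WORD_BITS`, `bitset_words_for` and `update_livein` are hypothetical stand-ins for the `BITSET_WORD` type and helpers the patch uses.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t bitset_word;   /* hypothetical stand-in for BITSET_WORD */
#define WORD_BITS 32

/* Number of words needed to hold num_bits one-bit flags. */
int
bitset_words_for(int num_bits)
{
   return (num_bits + WORD_BITS - 1) / WORD_BITS;
}

/* livein |= use | (liveout & ~def), processing 32 variables per step.
 * Returns nonzero if any bit was newly set, so the caller knows the
 * fixed-point iteration has to run again. */
int
update_livein(bitset_word *livein, const bitset_word *use,
              const bitset_word *def, const bitset_word *liveout, int words)
{
   int changed = 0;

   for (int i = 0; i < words; i++) {
      bitset_word new_livein = use[i] | (liveout[i] & ~def[i]);

      if (new_livein & ~livein[i]) {
         livein[i] |= new_livein;
         changed = 1;
      }
   }

   return changed;
}
```

Each loop iteration replaces 32 separate boolean tests, and the four sets shrink from one byte per variable to one bit, which is where the smaller working set comes from.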

[Mesa-dev] [PATCH 4/4] mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).

2013-02-19 Thread Eric Anholt

After the previous fix that almost removes an allocation of 4*n^2
bytes, we can use a bitset to reduce another allocation from n^2 bytes
to n^2/8 bytes.

Between the previous commit and this one, the peak heap size for an
oglconform ARB_fragment_program max instructions test on i965 goes from
4GB to 255MB.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
---
 src/mesa/program/register_allocate.c |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/mesa/program/register_allocate.c b/src/mesa/program/register_allocate.c
index 5862c78..a9064c3 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -75,6 +75,7 @@
 #include "main/imports.h"
 #include "main/macros.h"
 #include "main/mtypes.h"
+#include "main/bitset.h"
 #include "register_allocate.h"
 
 #define NO_REG ~0
@@ -118,7 +119,7 @@ struct ra_node {
 * List of which nodes this node interferes with.  This should be
 * symmetric with the other node.
 */
-   GLboolean *adjacency;
+   BITSET_WORD *adjacency;
unsigned int *adjacency_list;
unsigned int adjacency_list_size;
unsigned int adjacency_count;
@@ -307,7 +308,7 @@ ra_set_finalize(struct ra_regs *regs, unsigned int **q_values)
 static void
 ra_add_node_adjacency(struct ra_graph *g, unsigned int n1, unsigned int n2)
 {
-   g->nodes[n1].adjacency[n2] = GL_TRUE;
+   BITSET_SET(g->nodes[n1].adjacency, n2);
 
if (g->nodes[n1].adjacency_count >=
g->nodes[n1].adjacency_list_size) {
@@ -335,11 +336,14 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count)
g->stack = rzalloc_array(g, unsigned int, count);
 
for (i = 0; i < count; i++) {
-  g->nodes[i].adjacency = rzalloc_array(g, GLboolean, count);
+  int bitset_count = ALIGN(count, BITSET_WORDBITS) / BITSET_WORDBITS;
+  g->nodes[i].adjacency = rzalloc_array(g, BITSET_WORD, bitset_count);
+
   g->nodes[i].adjacency_list_size = 4;
   g->nodes[i].adjacency_list =
  ralloc_array(g, unsigned int, g->nodes[i].adjacency_list_size);
   g->nodes[i].adjacency_count = 0;
+
   ra_add_node_adjacency(g, i, i);
   g->nodes[i].reg = NO_REG;
}
@@ -358,7 +362,7 @@ void
 ra_add_node_interference(struct ra_graph *g,
 unsigned int n1, unsigned int n2)
 {
-   if (!g->nodes[n1].adjacency[n2]) {
+   if (!BITSET_TEST(g->nodes[n1].adjacency, n2)) {
   ra_add_node_adjacency(g, n1, n2);
   ra_add_node_adjacency(g, n2, n1);
}
-- 
1.7.10.4
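
The n^2 to n^2/8 saving follows from storing one bit per adjacency flag instead of one byte. A small C sketch of the arithmetic and of per-bit access follows. The names `bitset_word`, `bit_set`, `bit_test`, `row_bytes_bool` and `row_bytes_bitset` are hypothetical, chosen for illustration rather than taken from `main/bitset.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t bitset_word;   /* hypothetical stand-in for BITSET_WORD */
#define WORD_BITS 32u

/* Set flag b in a packed row. */
void
bit_set(bitset_word *set, unsigned int b)
{
   set[b / WORD_BITS] |= 1u << (b % WORD_BITS);
}

/* Return 1 if flag b is set in a packed row, else 0. */
int
bit_test(const bitset_word *set, unsigned int b)
{
   return (int)((set[b / WORD_BITS] >> (b % WORD_BITS)) & 1u);
}

/* Bytes for one adjacency row of n nodes with a one-byte flag per node. */
size_t
row_bytes_bool(size_t n)
{
   return n;
}

/* Bytes for the same row packed into whole 32-bit words. */
size_t
row_bytes_bitset(size_t n)
{
   return ((n + WORD_BITS - 1) / WORD_BITS) * sizeof(bitset_word);
}
```

For 1024 nodes a row shrinks from 1024 bytes to 128, and since there is one row per node the whole matrix shrinks by the same factor of eight.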


Re: [Mesa-dev] [PATCH 3/9] glsl: Convert mix() to use a new ir_triop_lrp opcode.

2013-02-19 Thread Roland Scheidegger

Not much to say about the code (the theory sounds sane) but I was
wondering about the comment.
Why did glsl implement this really as x * (1 - a) + y * a?
The usual way for lerp would be (y - x) * a + x, i.e. two ops for most
gpus (sub+mad, or sub+mul+add). But I'm wondering if that sacrifices
precision or gets Infs wrong or something (this is the way the gallivm
code implements TGSI_OPCODE_LRP). I guess strict IEEE conformance would
really forbid that optimization though...

Roland
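
To make the precision question above concrete, here is a small C sketch of both forms in single precision. The function names `lerp_weighted` and `lerp_delta` are made up for this illustration, and this is not a statement about what any particular GPU or driver does. It assumes IEEE 754 single-precision arithmetic.

```c
#include <assert.h>

/* x * (1 - a) + y * a: the form the mix() built-in was written in. */
float
lerp_weighted(float x, float y, float a)
{
   float wx = x * (1.0f - a);
   float wy = y * a;
   return wx + wy;
}

/* (y - x) * a + x: one subtract followed by a multiply-add. */
float
lerp_delta(float x, float y, float a)
{
   float d = y - x;
   return d * a + x;
}
```

One case where the two differ is a = 1 with operands of very different magnitude. With x = 1.0e8f and y = 1.0f, the difference y - x rounds to -1.0e8f, so the delta form yields 0 instead of y, while the weighted form returns y exactly. For well-scaled inputs such as x = 2, y = 4, a = 0.5 both give 3.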


Am 20.02.2013 02:03, schrieb Matt Turner:
> From: Kenneth Graunke 
> 
> Many GPUs have an instruction to do linear interpolation which is more
> efficient than simply performing the algebra necessary (two multiplies,
> an add, and a subtract).
> 
> Pattern matching or peepholing this is more desirable, but can be
> tricky.  By using an opcode, we can at least make shaders which use the
> mix() built-in get the more efficient behavior.
> 
> Currently, all consumers lower ir_triop_lrp.  Subsequent patches will
> actually generate different code.
> 
> v2 [mattst88]:
>- Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
>  subsequent patch and ir_triop_lrp translated directly.
> 
> Reviewed-by: Matt Turner 
> ---
>  src/glsl/builtins/ir/mix.ir|   14 +-
>  src/glsl/ir.cpp|4 +++
>  src/glsl/ir.h  |7 +
>  src/glsl/ir_constant_expression.cpp|   13 ++
>  src/glsl/ir_optimization.h |1 +
>  src/glsl/ir_validate.cpp   |6 
>  src/glsl/lower_instructions.cpp|   35 
> 
>  src/mesa/drivers/dri/i965/brw_shader.cpp   |3 +-
>  src/mesa/program/ir_to_mesa.cpp|6 -
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |1 +
>  10 files changed, 81 insertions(+), 9 deletions(-)
> 
> diff --git a/src/glsl/builtins/ir/mix.ir b/src/glsl/builtins/ir/mix.ir
> index 70ae13c..e666532 100644
> --- a/src/glsl/builtins/ir/mix.ir
> +++ b/src/glsl/builtins/ir/mix.ir
> @@ -4,49 +4,49 @@
> (declare (in) float arg0)
> (declare (in) float arg1)
> (declare (in) float arg2))
> - ((return (expression float + (expression float * (var_ref arg0) 
> (expression float - (constant float (1.00)) (var_ref arg2))) (expression 
> float * (var_ref arg1) (var_ref arg2))
> + ((return (expression float lrp (var_ref arg0) (var_ref arg1) (var_ref 
> arg2)
>  
> (signature vec2
>   (parameters
> (declare (in) vec2 arg0)
> (declare (in) vec2 arg1)
> (declare (in) vec2 arg2))
> - ((return (expression vec2 + (expression vec2 * (var_ref arg0) 
> (expression vec2 - (constant float (1.00)) (var_ref arg2))) (expression 
> vec2 * (var_ref arg1) (var_ref arg2))
> + ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref 
> arg2)
>  
> (signature vec3
>   (parameters
> (declare (in) vec3 arg0)
> (declare (in) vec3 arg1)
> (declare (in) vec3 arg2))
> - ((return (expression vec3 + (expression vec3 * (var_ref arg0) 
> (expression vec3 - (constant float (1.00)) (var_ref arg2))) (expression 
> vec3 * (var_ref arg1) (var_ref arg2))
> + ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref 
> arg2)
>  
> (signature vec4
>   (parameters
> (declare (in) vec4 arg0)
> (declare (in) vec4 arg1)
> (declare (in) vec4 arg2))
> - ((return (expression vec4 + (expression vec4 * (var_ref arg0) 
> (expression vec4 - (constant float (1.00)) (var_ref arg2))) (expression 
> vec4 * (var_ref arg1) (var_ref arg2))
> + ((return (expression vec4 lrp (var_ref arg0) (var_ref arg1) (var_ref 
> arg2)
>  
> (signature vec2
>   (parameters
> (declare (in) vec2 arg0)
> (declare (in) vec2 arg1)
> (declare (in) float arg2))
> - ((return (expression vec2 + (expression vec2 * (var_ref arg0) 
> (expression float - (constant float (1.00)) (var_ref arg2))) (expression 
> vec2 * (var_ref arg1) (var_ref arg2))
> + ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref 
> arg2)
>  
> (signature vec3
>   (parameters
> (declare (in) vec3 arg0)
> (declare (in) vec3 arg1)
> (declare (in) float arg2))
> - ((return (expression vec3 + (expression vec3 * (var_ref arg0) 
> (expression float - (constant float (1.00)) (var_ref arg2))) (expression 
> vec3 * (var_ref arg1) (var_ref arg2))
> + ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref 
> arg2)
>  
> (signature vec4
>   (parameters
> (declare (in) vec4 arg0)
> (declare (in) vec4 arg1)
> (declare (in) float arg2))
> - ((return (expression vec4 + (expression vec4 * (var_ref arg0) 
> (expression float - (constant float (1.00)) (var_ref arg2))) (expression 
> vec4 * (var_ref ar

[Mesa-dev] [PATCH 1/4] i965/fs: Remove duplicate scan_inst->mlen check.

2013-02-19 Thread Matt Turner

It is already checked 20 lines below.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bdb6616..56358f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2070,11 +2070,6 @@ fs_visitor::compute_to_mrf()
 * into a compute-to-MRF.
 */
 
-/* SENDs can only write to GRFs, so no compute-to-MRF. */
-   if (scan_inst->mlen) {
-  break;
-   }
-
/* If it's predicated, it (probably) didn't populate all
 * the channels.  We might be able to rewrite everything
 * that writes that reg, but it would require smarter
-- 
1.7.8.6


[Mesa-dev] [PATCH 2/4] i965/gen7: Relax restrictions on fake MRFs.

2013-02-19 Thread Matt Turner

Gen6 has write-only MRF registers, and for ease of implementation we
partition off 16 general-purpose registers to act as MRFs on Gen7.

Knowing that our Gen7 MRFs are actually GRFs, we can potentially do
things we can't do with real MRFs:
   - read from them;
   - return values directly to them from a send instruction; and
   - compute directly to them with math instructions.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 8cdbb21..8ed8c4a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1709,7 +1709,8 @@ void brw_math( struct brw_compile *p,
if (intel->gen >= 6) {
   struct brw_instruction *insn = next_insn(p, BRW_OPCODE_MATH);
 
-  assert(dest.file == BRW_GENERAL_REGISTER_FILE);
+  assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+ (intel->gen >= 7 && dest.file == BRW_MESSAGE_REGISTER_FILE));
   assert(src.file == BRW_GENERAL_REGISTER_FILE);
 
   assert(dest.hstride == BRW_HORIZONTAL_STRIDE_1);
@@ -1773,7 +1774,8 @@ void brw_math2(struct brw_compile *p,
(void) intel;
 
 
-   assert(dest.file == BRW_GENERAL_REGISTER_FILE);
+   assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+  (intel->gen >= 7 && dest.file == BRW_MESSAGE_REGISTER_FILE));
assert(src0.file == BRW_GENERAL_REGISTER_FILE);
assert(src1.file == BRW_GENERAL_REGISTER_FILE);
 
-- 
1.7.8.6
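
The relaxed assertion in this patch amounts to a small predicate, sketched here in standalone C. The enum `reg_file` and the function `math_dest_allowed` are hypothetical names for illustration and do not exist in the driver.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two register files involved. */
enum reg_file {
   GENERAL_REGISTER_FILE,
   MESSAGE_REGISTER_FILE
};

/* A math destination must be a GRF, except on gen7 and later, where the
 * "MRFs" are really GRFs and are therefore acceptable as well. */
int
math_dest_allowed(int gen, enum reg_file dest_file)
{
   return dest_file == GENERAL_REGISTER_FILE ||
          (gen >= 7 && dest_file == MESSAGE_REGISTER_FILE);
}
```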


[Mesa-dev] [PATCH 3/4] i965/fs/gen7: Allow MATH instructions to have MRF as a destination.

2013-02-19 Thread Matt Turner

total instructions in shared programs: 1376297 -> 1375626 (-0.05%)
instructions in affected programs: 35977 -> 35306 (-1.87%)
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 56358f7..999be86 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2090,7 +2090,7 @@ fs_visitor::compute_to_mrf()
if (scan_inst->mlen)
   break;
 
-   if (intel->gen >= 6) {
+   if (intel->gen < 7) {
   /* gen6 math instructions must have the destination be
* GRF, so no compute-to-MRF for them.
*/
-- 
1.7.8.6


[Mesa-dev] [PATCH 4/4] i965/vs/gen7: Allow MATH instructions to have MRF as a destination.

2013-02-19 Thread Matt Turner

total instructions in shared programs: 346873 -> 346847 (-0.01%)
instructions in affected programs: 364 -> 338 (-7.14%)

(All affected shaders are from Lightsmark)
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index d5b7cb7..6454f82 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -753,7 +753,7 @@ vec4_visitor::opt_register_coalesce()
if (scan_inst->mlen)
   break;
 
-   if (intel->gen >= 6) {
+   if (intel->gen < 7) {
   /* gen6 math instructions must have the destination be
* GRF, so no compute-to-MRF for them.
*/
-- 
1.7.8.6


[Mesa-dev] [Bug 61153] New: [softpipe] piglit interpolation-noperspective-gl_BackColor-flat-vertex regression

2013-02-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=61153

  Priority: medium
Bug ID: 61153
  Keywords: regression
CC: bri...@vmware.com
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: [softpipe] piglit
interpolation-noperspective-gl_BackColor-flat-vertex
regression
  Severity: normal
Classification: Unclassified
OS: Linux (All)
  Reporter: v...@freedesktop.org
  Hardware: x86-64 (AMD64)
Status: NEW
   Version: git
 Component: Other
   Product: Mesa

mesa: 076403c30d9f5cc79374e30d9f6007b08a63bf2d (master)

$ ./bin/shader_runner
generated_tests/spec/glsl-1.30/execution/interpolation/interpolation-noperspective-gl_BackColor-flat-vertex.shader_test
-auto
Mesa warning: failed to remap glClampColorARB
Mesa warning: failed to remap glTexBufferARB
Mesa warning: failed to remap glFramebufferTextureARB
Mesa warning: failed to remap glVertexAttribDivisorARB
Mesa warning: failed to remap glProgramParameteriARB
Probe at (159,45)
  Expected: 0.272727 0.181818 0.545455 1.00
  Observed: 0.474510 0.160784 0.364706 1.00
Probe at (192,38)
  Expected: 0.153846 0.153846 0.692308 1.00
  Observed: 0.27 0.13 0.596078 1.00
Probe at (216,33)
  Expected: 0.07 0.13 0.80 1.00
  Observed: 0.117647 0.117647 0.764706 1.00
Probe at (166,83)
  Expected: 0.17 0.33 0.50 1.00
  Observed: 0.294118 0.294118 0.415686 1.00
Probe at (196,71)
  Expected: 0.071429 0.285714 0.642857 1.00
  Observed: 0.125490 0.250980 0.627451 1.00
Probe at (136,136)
  Expected: 0.181818 0.545455 0.272727 1.00
  Observed: 0.317647 0.478431 0.203922 1.00
Probe at (173,115)
  Expected: 0.076923 0.461538 0.461538 1.00
  Observed: 0.129412 0.403922 0.462745 1.00
Probe at (145,166)
  Expected: 0.08 0.67 0.25 1.00
  Observed: 0.149020 0.603922 0.247059 1.00
PIGLIT: {'result': 'fail' }

4a938ef7136a89c828ebb16effe1bc5bea08b7d7 is the first bad commit
commit 4a938ef7136a89c828ebb16effe1bc5bea08b7d7
Author: Brian Paul 
Date:   Mon Jan 21 11:32:49 2013 -0700

draw: add new debug code and comments in clip code template

In debug builds, set clipped vertex window coordinates to NaN values
to help debugging.  Otherwise, we're just leaving the coordinate in clip
space and it's invalid to use it later expecting it to be a window coord.

Reviewed-by: José Fonseca 

:04 04 1c9a933b785f0a56ebf2f13874aacffc4183e976
b87acb6dca3e64abd30f9da1d7483ac61364fb9e M  src
bisect run success


Re: [Mesa-dev] [PATCH] mesa: Don't install glEvalMesh in the beginend dispatch table

2013-02-19 Thread Eric Anholt

Ian Romanick  writes:

> From: Ian Romanick 
>
> NOTE: This is a candidate for the 9.1 branch.
>
> Signed-off-by: Ian Romanick 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59740
> Cc: Eric Anholt 

I had "make a GL 1.1 testcase like gl-1.0/beginend" on the back burner
because of this, but I'll take a patch even before then.

Reviewed-by: Eric Anholt 



[Mesa-dev] [PATCH] radeonsi: Fix memory leak in si_shader_select.

2013-02-19 Thread Vinson Lee

Fixes resource leak defect reported by Coverity.

Signed-off-by: Vinson Lee 
---
 src/gallium/drivers/radeonsi/si_state.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index d20e3ff..7f76cb5 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1940,6 +1940,7 @@ int si_shader_select(struct pipe_context *ctx,
R600_ERR("Failed to build shader variant (type=%u) %d\n",
 sel->type, r);
sel->current = NULL;
+   FREE(shader);
return r;
}
 
-- 
1.8.1.2
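
The pattern behind this fix, releasing a freshly allocated object on the error path before returning, can be sketched in standalone C. Everything here is hypothetical and only mimics the shape of the change: `struct shader`, `build_shader`, `select_shader` and the `live_allocations` counter are not radeonsi code, and `calloc`/`free` stand in for the driver's allocation macros.

```c
#include <assert.h>
#include <stdlib.h>

int live_allocations;   /* illustration only: makes a leak observable */

struct shader {
   int type;
};

/* Hypothetical build step: returns 0 on success, negative on failure. */
int
build_shader(struct shader *s, int should_fail)
{
   (void)s;
   return should_fail ? -1 : 0;
}

/* Allocate and build a variant.  On failure the allocation is released
 * before returning, so the caller never has to clean up after an error. */
int
select_shader(struct shader **current, int type, int should_fail)
{
   struct shader *s = calloc(1, sizeof(*s));
   int r;

   if (!s)
      return -1;
   live_allocations++;
   s->type = type;

   r = build_shader(s, should_fail);
   if (r) {
      *current = NULL;
      free(s);              /* the step the patch adds */
      live_allocations--;
      return r;
   }

   *current = s;
   return 0;
}
```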
