Re: [Mesa-dev] r600g: status of my work on the shader optimization
On 02/18/2013 02:20 PM, Andy Furniss wrote:
Stefan Seifert wrote:

Hi! Amazing work! I see some 50 % speed ups in FlightGear and even more.
While normally 3D clouds tear performance down to an unflyable stutter,
with your branch I can fly in densely clouded conditions at usable
framerates. I can now turn all shaders to maximum and enjoy the view.
This makes a huge difference.

Unfortunately there's a downside as well:

Testing with rv790 with drm-fixes kernel not much works -
etqw runs but in a level 50% of screen is junk.
nexuiz menus total junk, didn't test further.
xonotic menus OK but gpu lock on starting timedemo.
vdpau mpeg2 decode - renders 90% junk.
heaven 3.0 (on a different pure 64 bit setup) gpu lock.

I've pushed the patch to improve support for the r6xx, r7xx and cayman.
I believe the chances that it will work on these chips are higher now,
so you might want to give it another try.

Vadim

Unrelated question wrt heaven 3.0 - does it work properly anyway? For me
running 64bit on rv790 with vanilla mesa with or without llvm I have to
set shaders to medium, on high it works but I get no lighting/effects.
There are also a couple of scenes that render as flared out black and
white.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Initialize parcel_out_uniform_storage member variables.
Hi,

I have tested your changes:

[Mesa-dev] [PATCH] glsl: Initialize parcel_out_uniform_storage member variables.

Project: mesa (Mesa build tests)
Configurations: android linux

Tested the patch(es) on top of the following commits:
07cdfdb st/mesa: remove what is left from u_blit
40ee93c st/mesa: simplify and improve CopyTexSubImage
6520a86 st/mesa: don't do sRGB conversion in CopyTexSubImage
0a1479c st/mesa: implement blit-based TexImage and TexSubImage
a6e0ac9 st/mesa: fix blit-based GetTexImage for 1D array textures
91acf62 st/mesa: fix blit-based GetTexImage for depth/stencil formats
0181e18 st/mesa: factor out code for determining blit.mask from CopyTexSubImage

Failed to build for "android"
src/glsl/glsl_parser.yy: conflicts: 1 shift/reduce
src/glsl/./link_uniforms.cpp: In constructor 'parcel_out_uniform_storage::parcel_out_uniform_storage(string_to_uint_map*, gl_uniform_storage*, gl_constant_value*)':
src/glsl/./link_uniforms.cpp:267:18: error: expected '{' before 'uniforms'
make: *** [out/host/linux-x86/obj/EXECUTABLES/mesa_builtin_compiler_intermediates/./link_uniforms.o] Error 1
FAILURE

Successfully built configuration "linux", no issues

-- 
Regards!
http://groleo.wordpress.com
[Mesa-dev] [PATCH v2] glsl: Initialize parcel_out_uniform_storage member variables.
Fixes uninitialized scalar field defect reported by Coverity.

Signed-off-by: Vinson Lee
---
 src/glsl/link_uniforms.cpp | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index d457e4d..b5bfe13 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -263,9 +263,11 @@ public:
    parcel_out_uniform_storage(struct string_to_uint_map *map,
                               struct gl_uniform_storage *uniforms,
                               union gl_constant_value *values)
-      : map(map), uniforms(uniforms), next_sampler(0), values(values)
+      : ubo_block_index(0), ubo_byte_offset(0), ubo_row_major(false),
+        map(map), uniforms(uniforms), next_sampler(0), values(values),
+        targets(), shader_samplers_used(0), shader_shadow_samplers(0)
    {
-      memset(this->targets, 0, sizeof(this->targets));
+      /* empty */
    }
 
    void start_shader()
-- 
1.8.1.2
[Mesa-dev] [PATCH v2] configure.ac: Do not check for clock_gettime on MinGW.
MinGW does not have clock_gettime.

Signed-off-by: Vinson Lee
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 16c2f8c..1e11b4e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -500,7 +500,7 @@ AC_CHECK_FUNC([dlopen], [DEFINES="$DEFINES -DHAVE_DLOPEN"],
 AC_SUBST([DLOPEN_LIBS])
 
 case "$host_os" in
-darwin*)
+darwin*|mingw*)
     ;;
 *)
     AC_CHECK_FUNCS([clock_gettime], [CLOCK_LIB=],
-- 
1.8.1.2
Re: [Mesa-dev] [PATCH] llvmpipe: fix lp_resource_copy using more than one 3d slice
Thanks for fixing this Roland. This is definitely an improvement.

I'd recommend a few tweaks (it could even be as a follow on change):

- Calling llvmpipe_flush_resource() in a loop is overkill (it will cause
  llvmpipe_flush() to be called many times needlessly). Please refactor
  llvmpipe_flush_resource() and llvmpipe_is_resource_referenced() to
  receive a start_layer, end_layer pair.

- call util_copy_box instead of util_copy_rect

Jose

----- Original Message -----
> From: Roland Scheidegger
>
> These used to be illegal a very long time ago, then for some more time
> nothing really emitted these so this code path wasn't hit.
> Just trivially iterate over box->depth.
> (Might be worth refactoring at some point since nowadays all the code
> doesn't really do much except for depth textures.)
>
> This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61093
> ---
>  src/gallium/drivers/llvmpipe/lp_surface.c | 170 +++--
>  1 file changed, 86 insertions(+), 84 deletions(-)
>
> diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c
> b/src/gallium/drivers/llvmpipe/lp_surface.c
> index 11475fd..dbaed95 100644
> --- a/src/gallium/drivers/llvmpipe/lp_surface.c
> +++ b/src/gallium/drivers/llvmpipe/lp_surface.c
> @@ -65,7 +65,7 @@ lp_resource_copy(struct pipe_context *pipe,
>     const enum pipe_format format = src_tex->base.format;
>     unsigned width = src_box->width;
>     unsigned height = src_box->height;
> -   assert(src_box->depth == 1);
> +   unsigned z;
>
>     /* Fallback for buffers.
*/ > if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) { > @@ -74,99 +74,101 @@ lp_resource_copy(struct pipe_context *pipe, >return; > } > > - llvmpipe_flush_resource(pipe, > - dst, dst_level, dstz, > - FALSE, /* read_only */ > - TRUE, /* cpu_access */ > - FALSE, /* do_not_block */ > - "blit dest"); > - > - llvmpipe_flush_resource(pipe, > - src, src_level, src_box->z, > - TRUE, /* read_only */ > - TRUE, /* cpu_access */ > - FALSE, /* do_not_block */ > - "blit src"); > - > - /* > - printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u > x %u x %u\n", > - src_tex->id, src_level, dst_tex->id, dst_level, > - src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, > - src_box->width, src_box->height, src_box->depth); > - */ > - > - /* set src tiles to linear layout */ > - { > - unsigned tx, ty, tw, th; > - unsigned x, y; > - > - adjust_to_tile_bounds(src_box->x, src_box->y, width, height, > -&tx, &ty, &tw, &th); > - > - for (y = 0; y < th; y += TILE_SIZE) { > - for (x = 0; x < tw; x += TILE_SIZE) { > -(void) llvmpipe_get_texture_tile_linear(src_tex, > -src_box->z, src_level, > -LP_TEX_USAGE_READ, > -tx + x, ty + y); > + for (z = 0; z < src_box->depth; z++){ > + llvmpipe_flush_resource(pipe, > + dst, dst_level, dstz + z, > + FALSE, /* read_only */ > + TRUE, /* cpu_access */ > + FALSE, /* do_not_block */ > + "blit dest"); > + > + llvmpipe_flush_resource(pipe, > + src, src_level, src_box->z + z, > + TRUE, /* read_only */ > + TRUE, /* cpu_access */ > + FALSE, /* do_not_block */ > + "blit src"); > + > + /* > + printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u > %u x %u x %u\n", > + src_tex->id, src_level, dst_tex->id, dst_level, > + src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, > + src_box->width, src_box->height, src_box->depth); > + */ > + > + /* set src tiles to linear layout */ > + { > + unsigned tx, ty, tw, th; > + unsigned x, y; > + > + adjust_to_tile_bounds(src_box->x, src_box->y, width, height, > + &tx, &ty, 
&tw, &th); > + > + for (y = 0; y < th; y += TILE_SIZE) { > +for (x = 0; x < tw; x += TILE_SIZE) { > + (void) llvmpipe_get_texture_tile_linear(src_tex, > + src_box->z + z, > src_level, > + LP_TEX_USAGE_READ, > + tx + x, ty + y); > +} > } >} > - } > - > - /* set dst
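
To illustrate the shape of the fix discussed in this message (walk every depth slice of the box instead of asserting depth == 1), here is a minimal sketch in plain C. It is not llvmpipe code and it is not Mesa's util_copy_box(); the function name copy_box_slices and its byte-stride parameters are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT Mesa's util_copy_box(): copies a box of
 * width_bytes x height x depth from src to dst. All strides are in
 * bytes; both pointers address the first byte of the box. */
static void
copy_box_slices(unsigned char *dst,
                size_t dst_row_stride, size_t dst_slice_stride,
                const unsigned char *src,
                size_t src_row_stride, size_t src_slice_stride,
                size_t width_bytes, size_t height, size_t depth)
{
   size_t z, y;

   assert(dst != NULL && src != NULL);
   assert(width_bytes <= dst_row_stride && width_bytes <= src_row_stride);

   /* Outer loop over slices, mirroring "iterate over box->depth". */
   for (z = 0; z < depth; z++) {
      unsigned char *dst_slice = dst + z * dst_slice_stride;
      const unsigned char *src_slice = src + z * src_slice_stride;

      /* Inner loop copies one row at a time, like a 2D rect copy. */
      for (y = 0; y < height; y++) {
         memcpy(dst_slice + y * dst_row_stride,
                src_slice + y * src_row_stride,
                width_bytes);
      }
   }
}
```

Any flushing or layout conversion would ideally happen once for the whole layer range before a loop like this, which seems to be the point of the start_layer, end_layer suggestion above.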
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On 18.02.2013 20:11, Roland Scheidegger wrote:
> On 18.02.2013 19:14, Michel Dänzer wrote:
>> From: Michel Dänzer
>>
>> 11 more little piglits.
>>
>> NOTE: This is a candidate for the 9.1 branch.
>>
>> Signed-off-by: Michel Dänzer
>> ---
>>
>> Any ideas why this seems necessary with radeonsi but not with r600g?
> Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
> not sure if it can always know with bgrx formats and the like).
> I'm wondering if there should be a helper for those fixups. Looks to me
> like quite some drivers need it (though well so far I think just
> non-gallium i965 does this plus llvmpipe, but for some of the others I'm
> skeptical if not doing it is really correct...).

I agree alpha blending with a buffer format that doesn't have alpha is a
bit strange, that should be caught by the upper layers.

>>  src/gallium/drivers/radeonsi/si_state.c | 116 +---
>>  src/gallium/drivers/radeonsi/si_state.h |   3 +-
>>  2 files changed, 61 insertions(+), 58 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
>> index d20e3ff..144a29d 100644
>> --- a/src/gallium/drivers/radeonsi/si_state.c
>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>> @@ -36,33 +36,6 @@
>>  #include "si_state.h"
>>  #include "sid.h"
>>
>> -/*
>> - * inferred framebuffer and blender state
>> - */
>> -static void si_update_fb_blend_state(struct r600_context *rctx)
>> -{
>> -    struct si_pm4_state *pm4;
>> -    struct si_state_blend *blend = rctx->queued.named.blend;
>> -    uint32_t mask;
>> -
>> -    if (blend == NULL)
>> -        return;
>> -
>> -    pm4 = CALLOC_STRUCT(si_pm4_state);
>> -    if (pm4 == NULL)
>> -        return;
>> -
>> -    mask = (1ULL << ((unsigned)rctx->framebuffer.nr_cbufs * 4)) - 1;
>> -    mask &= blend->cb_target_mask;
>> -    si_pm4_set_reg(pm4, R_028238_CB_TARGET_MASK, mask);
>> -
>> -    si_pm4_set_state(rctx, fb_blend, pm4);
>> -}
>> -
>> -/*
>> - * Blender functions
>> - */
>> -
>>  static uint32_t si_translate_blend_function(int blend_func)
>>  {
>>      switch (blend_func) {
>> @@ -84,7 +57,7 @@ static uint32_t si_translate_blend_function(int blend_func)
>>      return 0;
>>  }
>>
>> -static uint32_t si_translate_blend_factor(int blend_fact)
>> +static uint32_t si_translate_blend_factor(int blend_fact, bool dst_alpha)
>>  {
>>      switch (blend_fact) {
>>      case PIPE_BLENDFACTOR_ONE:
>> @@ -94,7 +67,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
>>      case PIPE_BLENDFACTOR_SRC_ALPHA:
>>          return V_028780_BLEND_SRC_ALPHA;
>>      case PIPE_BLENDFACTOR_DST_ALPHA:
>> -        return V_028780_BLEND_DST_ALPHA;
>> +        return dst_alpha ? V_028780_BLEND_DST_ALPHA : V_028780_BLEND_ONE;
>>      case PIPE_BLENDFACTOR_DST_COLOR:
>>          return V_028780_BLEND_DST_COLOR;
>>      case PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE:
>> @@ -110,7 +83,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
>>      case PIPE_BLENDFACTOR_INV_SRC_ALPHA:
>>          return V_028780_BLEND_ONE_MINUS_SRC_ALPHA;
>>      case PIPE_BLENDFACTOR_INV_DST_ALPHA:
>> -        return V_028780_BLEND_ONE_MINUS_DST_ALPHA;
>> +        return dst_alpha ? V_028780_BLEND_ONE_MINUS_DST_ALPHA : V_028780_BLEND_ZERO;
>>      case PIPE_BLENDFACTOR_INV_DST_COLOR:
>>          return V_028780_BLEND_ONE_MINUS_DST_COLOR;
>>      case PIPE_BLENDFACTOR_INV_CONST_COLOR:
>> @@ -133,30 +106,25 @@ static uint32_t si_translate_blend_factor(int blend_fact)
>>      return 0;
>>  }
> I think you might also need to patch up SRC_ALPHA_SATURATE (to zero).
>
> Can't comment on the hw stuff but at least llvmpipe does the same
> otherwise :-).

Why should we do so? SRC_ALPHA_SATURATE should still work fine, even
when the destination buffer doesn't have an alpha component.

Christian.

> Roland
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, 2013-02-19 at 10:33 +0100, Christian König wrote:
> On 18.02.2013 20:11, Roland Scheidegger wrote:
> > On 18.02.2013 19:14, Michel Dänzer wrote:
> >> From: Michel Dänzer
> >>
> >> 11 more little piglits.
> >>
> >> NOTE: This is a candidate for the 9.1 branch.
> >>
> >> Signed-off-by: Michel Dänzer
> >> ---
> >>
> >> Any ideas why this seems necessary with radeonsi but not with r600g?
> > Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
> > not sure if it can always know with bgrx formats and the like).
> > I'm wondering if there should be a helper for those fixups. Looks to me
> > like quite some drivers need it (though well so far I think just
> > non-gallium i965 does this plus llvmpipe, but for some of the others I'm
> > skeptical if not doing it is really correct...).
>
> I agree alpha blending with a buffer format that doesn't have alpha is a
> bit strange, that should be caught by the upper layers.

If it was that simple. :\

The problem is that AFAICT for formats such as R8G8B8X8, there's no
other way to tell the hardware to always use 1 for the destination
alpha. And I'm not sure we can just not support any such formats, I
certainly don't think that would be a good idea.

> >> @@ -84,7 +57,7 @@ static uint32_t si_translate_blend_function(int blend_func)
> >>      return 0;
> >>  }
> >>
> >> -static uint32_t si_translate_blend_factor(int blend_fact)
> >> +static uint32_t si_translate_blend_factor(int blend_fact, bool dst_alpha)
> >>  {
> >>      switch (blend_fact) {
> >>      case PIPE_BLENDFACTOR_ONE:
> >> @@ -94,7 +67,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
> >>      case PIPE_BLENDFACTOR_SRC_ALPHA:
> >>          return V_028780_BLEND_SRC_ALPHA;
> >>      case PIPE_BLENDFACTOR_DST_ALPHA:
> >> -        return V_028780_BLEND_DST_ALPHA;
> >> +        return dst_alpha ? V_028780_BLEND_DST_ALPHA : V_028780_BLEND_ONE;
> >>      case PIPE_BLENDFACTOR_DST_COLOR:
> >>          return V_028780_BLEND_DST_COLOR;
> >>      case PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE:
> >> @@ -110,7 +83,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
> >>      case PIPE_BLENDFACTOR_INV_SRC_ALPHA:
> >>          return V_028780_BLEND_ONE_MINUS_SRC_ALPHA;
> >>      case PIPE_BLENDFACTOR_INV_DST_ALPHA:
> >> -        return V_028780_BLEND_ONE_MINUS_DST_ALPHA;
> >> +        return dst_alpha ? V_028780_BLEND_ONE_MINUS_DST_ALPHA : V_028780_BLEND_ZERO;
> >>      case PIPE_BLENDFACTOR_INV_DST_COLOR:
> >>          return V_028780_BLEND_ONE_MINUS_DST_COLOR;
> >>      case PIPE_BLENDFACTOR_INV_CONST_COLOR:
> >> @@ -133,30 +106,25 @@ static uint32_t si_translate_blend_factor(int blend_fact)
> >>      return 0;
> >>  }
> > I think you might also need to patch up SRC_ALPHA_SATURATE (to zero).
> >
> > Can't comment on the hw stuff but at least llvmpipe does the same
> > otherwise :-).
>
> Why should we do so? SRC_ALPHA_SATURATE should still work fine, even
> when the destination buffer doesn't have an alpha component.

I think Roland is right. When the destination has no alpha, the
destination alpha value is supposed to be always 1, so
SRC_ALPHA_SATURATE is always 0. But with a format as described above,
the destination X8 channel may contain any value.

Really, what I don't understand is why r600g doesn't seem affected by
this... at least on my RS880 it's passing the piglit tests this change
fixes with radeonsi. So maybe I'm just missing some magic bit for
radeonsi.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Mon, 2013-02-18 at 20:11 +0100, Roland Scheidegger wrote:
> On 18.02.2013 19:14, Michel Dänzer wrote:
> > From: Michel Dänzer
> >
> > 11 more little piglits.
> >
> > NOTE: This is a candidate for the 9.1 branch.
> >
> > Signed-off-by: Michel Dänzer
> > ---
> >
> > Any ideas why this seems necessary with radeonsi but not with r600g?
> Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
> not sure if it can always know with bgrx formats and the like).

Yeah, I can't seem to find anything like that.

> I'm wondering if there should be a helper for those fixups. Looks to me
> like quite some drivers need it (though well so far I think just
> non-gallium i965 does this plus llvmpipe, but for some of the others I'm
> skeptical if not doing it is really correct...).

Some kind of helper might be nice, maybe that could also simplify the
other blending parameters accordingly when possible.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
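
For illustration only, a shared helper of the kind discussed in this thread might look roughly like the sketch below. Nothing here is existing Mesa API: the enum is a cut-down, hypothetical stand-in for the PIPE_BLENDFACTOR_* values and the name fixup_blend_factor is made up. It assumes the rule stated in the thread, that a destination without an alpha channel reads alpha as 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few PIPE_BLENDFACTOR_* values; the real
 * Gallium enum has more entries and different numbering. */
enum blend_factor {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_SRC_ALPHA,
   FACTOR_DST_ALPHA,
   FACTOR_INV_DST_ALPHA,
   FACTOR_SRC_ALPHA_SATURATE,
   FACTOR_COUNT
};

/* Rewrite factors that read destination alpha, for a destination format
 * without an alpha channel (assumption: its alpha must read as 1). */
static enum blend_factor
fixup_blend_factor(enum blend_factor factor, bool dst_has_alpha)
{
   assert(factor < FACTOR_COUNT);

   if (dst_has_alpha)
      return factor;

   switch (factor) {
   case FACTOR_DST_ALPHA:
      return FACTOR_ONE;    /* A_dst = 1 */
   case FACTOR_INV_DST_ALPHA:
      return FACTOR_ZERO;   /* 1 - A_dst = 0 */
   case FACTOR_SRC_ALPHA_SATURATE:
      return FACTOR_ZERO;   /* colour factor min(A_src, 1 - A_dst) = 0 */
   default:
      return factor;        /* does not depend on destination alpha */
   }
}
```

A driver would then translate the fixed-up factor to its hardware encoding as before, so a translation function such as si_translate_blend_factor() would not need the extra dst_alpha argument. Whether SRC_ALPHA_SATURATE belongs in the list is the question raised elsewhere in the thread; it is included here on the assumption that the rule above holds.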
[Mesa-dev] [Bug 38086] Mesa 7.11-devel implementation error: Unexpected program target in destroy_program_variants_cb()
https://bugs.freedesktop.org/show_bug.cgi?id=38086

--- Comment #5 from Laurent carlier ---
Can reproduce this bug also with mesa-9.2-devel (git) but also 9.0.x with
counter strike: Source (steam-linux)

gdb backtrace:

Breakpoint 2, destroy_program_variants (st=0x8171910, program=0xf52863c0 <_mesa_DummyProgram>) at ../../src/mesa/state_tracker/st_program.c:1212
1212 _mesa_problem(NULL, "Unexpected program target 0x%x in "
(gdb) bt full
#0  destroy_program_variants (st=0x8171910, program=0xf52863c0 <_mesa_DummyProgram>) at ../../src/mesa/state_tracker/st_program.c:1212
No locals.
#1  0xf405cb16 in destroy_program_variants_cb (key=2, data=0xf52863c0 <_mesa_DummyProgram>, userData=0x8171910) at ../../src/mesa/state_tracker/st_program.c:1266
        st = 0x8171910
        program = 0xf52863c0 <_mesa_DummyProgram>
#2  0xf3f3b343 in _mesa_HashWalk (table=0x80b5240, callback=0xf405caf2, userData=0x8171910) at ../../src/mesa/main/hash.c:329
        table2 = 0x80b5240
        entry = 0xbf3fe30
        __PRETTY_FUNCTION__ = "_mesa_HashWalk"
#3  0xf405cb51 in st_destroy_program_variants (st=0x8171910) at ../../src/mesa/state_tracker/st_program.c:1279
No locals.
#4  0xf403761c in st_destroy_context (st=0x8171910) at ../../src/mesa/state_tracker/st_context.c:301
        pipe = 0x80a6f90
        cso = 0x8176e48
        ctx = 0x818f788
        i = 4
#5  0xf4054e92 in st_context_destroy (stctxi=0x8171910) at ../../src/mesa/state_tracker/st_manager.c:598
        st = 0x8171910
#6  0xf42a08e3 in dri_destroy_context (cPriv=0x80b6110) at dri_context.c:187
        ctx = 0x80a6f00
#7  0xf3e9ef3e in driDestroyContext (pcp=0x80b6110) at ../../../../src/mesa/drivers/dri/common/dri_util.c:329
No locals.
#8  0xf7c520c6 in ?? () from /usr/lib32/libGL.so.1
No symbol table info available.
#9  0xf7c26b98 in glXDestroyContext () from /usr/lib32/libGL.so.1
No symbol table info available.
#10 0xf73ab759 in X11_GL_DeleteContext (_this=0x805e380, context=0x80b6028) at src/video/x11/SDL_x11opengl.c:747
        display = 0x805edb0
#11 0xf7387d00 in SDL_GL_DeleteContext (context=0x80b6028) at src/video/SDL_video.c:2785
No locals.
#12 0xf7a0b38f in ?? () from bin/launcher.so
No symbol table info available.
#13 0xf741418b in ?? () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/libtogl.so
No symbol table info available.
#14 0xf74143a6 in ?? () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/libtogl.so
No symbol table info available.
#15 0xf7403596 in IDirect3DDevice9::~IDirect3DDevice9() () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/libtogl.so
No symbol table info available.
#16 0xf74036f2 in IDirect3DDevice9::~IDirect3DDevice9() () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/libtogl.so
No symbol table info available.
#17 0xec66482f in ?? () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/shaderapidx9.so
No symbol table info available.
#18 0xec6631fe in ?? () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/shaderapidx9.so
No symbol table info available.
#19 0xf1cf7731 in ?? () from /home/lordh/.local/share/Steam/SteamApps/lordheavy/Counter-Strike Source/bin/materialsystem.so
No symbol table info available.
#20 0xf7a07f8c in ?? () from bin/launcher.so
No symbol table info available.
#21 0xf7a0801b in ?? () from bin/launcher.so
No symbol table info available.
#22 0xf7a08010 in ?? () from bin/launcher.so
No symbol table info available.
#23 0xf79f04cd in LauncherMain () from bin/launcher.so
No symbol table info available.
#24 0x08048474 in main ()
(gdb) print *program
$1 = {Id = 0, String = 0x0, RefCount = 0, Target = 0, Format = 0,
  Instructions = 0x0, InputsRead = 0, OutputsWritten = 0,
  SystemValuesRead = 0, InputFlags = {0 }, OutputFlags = {0 },
  TexturesUsed = {0 }, SamplersUsed = 0, ShadowSamplers = 0,
  Parameters = 0x0, LocalParams = {{0, 0, 0, 0} },
  SamplerUnits = '\000' , IndirectRegisterFiles = 0,
  NumInstructions = 0, NumTemporaries = 0, NumParameters = 0,
  NumAttributes = 0, NumAddressRegs = 0, NumAluInstructions = 0,
  NumTexInstructions = 0, NumTexIndirections = 0,
  NumNativeInstructions = 0, NumNativeTemporaries = 0,
  NumNativeParameters = 0, NumNativeAttributes = 0,
  NumNativeAddressRegs = 0, NumNativeAluInstructions = 0,
  NumNativeTexInstructions = 0, NumNativeTexIndirections = 0}

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 61093] [llvmpipe] lp_surface.c:68:lp_resource_copy: Assertion `src_box->depth == 1' failed.
https://bugs.freedesktop.org/show_bug.cgi?id=61093

--- Comment #1 from Marek Olšák ---
The assertion in lp_resource_copy can be fixed easily, but I can't
reproduce it. llvmpipe is failing a different assertion here:

texsubimage: /home/marek/dev/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:539: const llvm::SDValue &llvm::SDNode::getOperand(unsigned int) const: Assertion `Num < NumOperands && "Invalid child # of SDNode!"' failed.

The way I see it, my work only uncovered this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH mesa] freedreno: gallium driver for adreno
On Mon, Feb 18, 2013 at 7:58 PM, Rob Clark wrote:
> On Mon, Feb 18, 2013 at 12:47 PM, Matt Turner wrote:
>> On Sun, Feb 17, 2013 at 11:33 AM, Rob Clark wrote:
>>>
>>> diff --git a/src/gallium/drivers/freedreno/Makefile.am
>>> b/src/gallium/drivers/freedreno/Makefile.am
>>> new file mode 100644
>>> index 000..10abdfb
>>> --- /dev/null
>>> +++ b/src/gallium/drivers/freedreno/Makefile.am
>>> @@ -0,0 +1,35 @@
>>> +include $(top_srcdir)/src/gallium/Automake.inc
>>> +
>>> +noinst_LTLIBRARIES = libfreedreno.la
>>> +
>>> +AM_CFLAGS = \
>>> +       -Werror -Wno-packed-bitfield-compat \
>>> +       -I$(top_srcdir)/src/gallium/include \     <--
>>> +       -I$(top_srcdir)/src/gallium/auxiliary \   <--
>>> +       -I$(top_srcdir)/src/gallium/drivers \
>>> +       -I$(top_srcdir)/include \                 <--
>>> +       $(FREEDRENO_CFLAGS) \
>>> +       $(DEFINES) \                              <--
>>> +       $(PIC_FLAGS) \
>>> +       $(VISIBILITY_CFLAGS)
>>
>> The <-- mark things that are provided by the GALLIUM_CFLAGS variable
>> in Automake.inc that you've already included. PIC_FLAGS is now dead.
>> Distributions don't like -Werror being hardcoded into upstream's
>> CFLAGS.
>
> Hmm, is there a better way to get -Werror for just freedreno when I am
> building myself? I do find that it is pretty useful to let the
> compiler help me catch problems rather than debugging them the hard
> way ;-)

You can set the CFLAGS and CXXFLAGS environment variables before
configuring Mesa.

Marek
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, Feb 19, 2013 at 10:33 AM, Christian König wrote:
> On 18.02.2013 20:11, Roland Scheidegger wrote:
>
>> On 18.02.2013 19:14, Michel Dänzer wrote:
>>>
>>> From: Michel Dänzer
>>>
>>> 11 more little piglits.
>>>
>>> NOTE: This is a candidate for the 9.1 branch.
>>>
>>> Signed-off-by: Michel Dänzer
>>> ---
>>>
>>> Any ideas why this seems necessary with radeonsi but not with r600g?
>>
>> Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
>> not sure if it can always know with bgrx formats and the like).
>> I'm wondering if there should be a helper for those fixups. Looks to me
>> like quite some drivers need it (though well so far I think just
>> non-gallium i965 does this plus llvmpipe, but for some of the others I'm
>> skeptical if not doing it is really correct...).
>
>
> I agree alpha blending with a buffer format that doesn't have alpha is a bit
> strange, that should be caught by the upper layers.

I think it's better to do that in drivers instead. r300g also uses a
different blend state for RGBX and RGBA. The R300 blend state CSO
actually contains 11 command buffers and the driver switches between
them when needed. Two of those command buffers contain blend state for
RGBX and RGBA. This approach of having multiple command buffers per CSO
has a much lower overhead than any other solution I've seen (including
rebuilding states on the fly and having the state tracker figure it
out).

Marek
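
A rough sketch of the multiple-command-buffers-per-CSO idea described above, reduced to two variants. This is not r300g code: the struct layout, the two-word "command buffer" and every name in it (blend_cso, blend_cso_init, blend_cso_select, the FACTOR_* values) are assumptions chosen only to show the pattern of precomputing both variants at creation time and switching pointers when drawing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical factor encodings; a real driver stores register values. */
enum {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_DST_ALPHA,
   FACTOR_INV_DST_ALPHA
};

/* Hypothetical blend CSO with one precomputed command stream per
 * destination kind, so nothing is rebuilt when the format changes. */
struct blend_cso {
   uint32_t cmd_rgba[2];   /* {src factor, dst factor}, alpha present */
   uint32_t cmd_rgbx[2];   /* same state with destination alpha forced to 1 */
};

/* Replace factors that read destination alpha by their value at alpha = 1. */
static uint32_t
factor_for_rgbx(uint32_t factor)
{
   switch (factor) {
   case FACTOR_DST_ALPHA:
      return FACTOR_ONE;
   case FACTOR_INV_DST_ALPHA:
      return FACTOR_ZERO;
   default:
      return factor;
   }
}

/* Runs once, when the CSO is created. */
static void
blend_cso_init(struct blend_cso *cso, uint32_t src_factor, uint32_t dst_factor)
{
   cso->cmd_rgba[0] = src_factor;
   cso->cmd_rgba[1] = dst_factor;
   cso->cmd_rgbx[0] = factor_for_rgbx(src_factor);
   cso->cmd_rgbx[1] = factor_for_rgbx(dst_factor);
}

/* Runs at draw time: only selects a pointer, no state is rebuilt. */
static const uint32_t *
blend_cso_select(const struct blend_cso *cso, bool dst_has_alpha)
{
   return dst_has_alpha ? cso->cmd_rgba : cso->cmd_rgbx;
}
```

The trade-off in a design like this is a little more memory and creation-time work per CSO in exchange for a constant-time choice whenever the bound colour buffer format changes.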
Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v3
Vadim Girlin wrote:

v3: handle hw-specific cases

Signed-off-by: Vadim Girlin
---

cc: Andy Furniss

Hopefully this should work better on the non-evergreen chips

This one seems to work OK for me.
Re: [Mesa-dev] r600g: status of my work on the shader optimization
Vadim Girlin wrote:

Testing with rv790 with drm-fixes kernel not much works -
etqw runs but in a level 50% of screen is junk.
nexuiz menus total junk, didn't test further.
xonotic menus OK but gpu lock on starting timedemo.
vdpau mpeg2 decode - renders 90% junk.
heaven 3.0 (on a different pure 64 bit setup) gpu lock.

I've pushed the patch to improve support for the r6xx, r7xx and cayman.
I believe the chances that it will work on these chips are higher now,
so you might want to give it another try.

It's still the same for me. I tested with and without llvm this time -
nexuiz renders OK with llvm but is still corrupt without.
etqw as above with/without
xonotic locks with/without
vdpau junk without gpu lock with.
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer wrote:
> On Tue, 2013-02-19 at 10:33 +0100, Christian König wrote:
>> On 18.02.2013 20:11, Roland Scheidegger wrote:
>> > On 18.02.2013 19:14, Michel Dänzer wrote:
>> >> From: Michel Dänzer
>> >>
>> >> 11 more little piglits.
>> >>
>> >> NOTE: This is a candidate for the 9.1 branch.
>> >>
>> >> Signed-off-by: Michel Dänzer
>> >> ---
>> >>
>> >> Any ideas why this seems necessary with radeonsi but not with r600g?
>> > Maybe the hw uses an implicit 1 if the format has no alpha (though I'm
>> > not sure if it can always know with bgrx formats and the like).
>> > I'm wondering if there should be a helper for those fixups. Looks to me
>> > like quite some drivers need it (though well so far I think just
>> > non-gallium i965 does this plus llvmpipe, but for some of the others I'm
>> > skeptical if not doing it is really correct...).
>>
>> I agree alpha blending with a buffer format that doesn't have alpha is a
>> bit strange, that should be caught by the upper layers.
>
> If it was that simple. :\
>
> The problem is that AFAICT for formats such as R8G8B8X8, there's no
> other way to tell the hardware to always use 1 for the destination
> alpha. And I'm not sure we can just not support any such formats, I
> certainly don't think that would be a good idea.
>
>
>> >> @@ -84,7 +57,7 @@ static uint32_t si_translate_blend_function(int blend_func)
>> >>      return 0;
>> >>  }
>> >>
>> >> -static uint32_t si_translate_blend_factor(int blend_fact)
>> >> +static uint32_t si_translate_blend_factor(int blend_fact, bool dst_alpha)
>> >>  {
>> >>      switch (blend_fact) {
>> >>      case PIPE_BLENDFACTOR_ONE:
>> >> @@ -94,7 +67,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
>> >>      case PIPE_BLENDFACTOR_SRC_ALPHA:
>> >>          return V_028780_BLEND_SRC_ALPHA;
>> >>      case PIPE_BLENDFACTOR_DST_ALPHA:
>> >> -        return V_028780_BLEND_DST_ALPHA;
>> >> +        return dst_alpha ? V_028780_BLEND_DST_ALPHA : V_028780_BLEND_ONE;
>> >>      case PIPE_BLENDFACTOR_DST_COLOR:
>> >>          return V_028780_BLEND_DST_COLOR;
>> >>      case PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE:
>> >> @@ -110,7 +83,7 @@ static uint32_t si_translate_blend_factor(int blend_fact)
>> >>      case PIPE_BLENDFACTOR_INV_SRC_ALPHA:
>> >>          return V_028780_BLEND_ONE_MINUS_SRC_ALPHA;
>> >>      case PIPE_BLENDFACTOR_INV_DST_ALPHA:
>> >> -        return V_028780_BLEND_ONE_MINUS_DST_ALPHA;
>> >> +        return dst_alpha ? V_028780_BLEND_ONE_MINUS_DST_ALPHA : V_028780_BLEND_ZERO;
>> >>      case PIPE_BLENDFACTOR_INV_DST_COLOR:
>> >>          return V_028780_BLEND_ONE_MINUS_DST_COLOR;
>> >>      case PIPE_BLENDFACTOR_INV_CONST_COLOR:
>> >> @@ -133,30 +106,25 @@ static uint32_t si_translate_blend_factor(int blend_fact)
>> >>      return 0;
>> >>  }
>> > I think you might also need to patch up SRC_ALPHA_SATURATE (to zero).
>> >
>> > Can't comment on the hw stuff but at least llvmpipe does the same
>> > otherwise :-).
>>
>> Why should we do so? SRC_ALPHA_SATURATE should still work fine, even
>> when the destination buffer doesn't have an alpha component.
>
> I think Roland is right. When the destination has no alpha, the
> destination alpha value is supposed to be always 1, so
> SRC_ALPHA_SATURATE is always 0. But with a format as described above,
> the destination X8 channel may contain any value.
>
>
> Really, what I don't understand is why r600g doesn't seem affected by
> this... at least on my RS880 it's passing the piglit tests this change
> fixes with radeonsi. So maybe I'm just missing some magic bit for
> radeonsi.

RGB formats do fail fbo-blending-formats with r600g/redwood here.
However the alpha channel can sometimes contain 1 in memory even if the
format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
glCopyTex[Sub]Image always set alpha to 1. Blits do too if they use RGBX
as a source. One way to set alpha != 1 is to draw some geometry.

Marek
Re: [Mesa-dev] r600g: status of my work on the shader optimization
On 02/19/2013 04:54 PM, Andy Furniss wrote:
Vadim Girlin wrote:

Testing with rv790 with drm-fixes kernel not much works -
etqw runs but in a level 50% of screen is junk.
nexuiz menus total junk, didn't test further.
xonotic menus OK but gpu lock on starting timedemo.
vdpau mpeg2 decode - renders 90% junk.
heaven 3.0 (on a different pure 64 bit setup) gpu lock.

I've pushed the patch to improve support for the r6xx, r7xx and cayman.
I believe the chances that it will work on these chips are higher now,
so you might want to give it another try.

It's still the same for me. I tested with and without llvm this time -
nexuiz renders OK with llvm but is still corrupt without.
etqw as above with/without
xonotic locks with/without
vdpau junk without gpu lock with.

Could you please test glxgears and other simple mesa demos? It's easier
to spot the problems with small apps that don't use a lot of complex
shaders. If some of them don't work correctly, please send me the dumps
with "R600_DUMP_SHADERS=2 R600_SB_DUMP=3".

Also it might help if you can look for piglit regressions against the
piglit results with R600_SB=0 and send me the dumps for a few regressed
tests.

Thanks for testing.

Vadim
[Mesa-dev] [PATCH 3/8] R600/SI: rework VOP2_* pattern
From: Christian König Fixing asm operation names. Signed-off-by: Christian König --- lib/Target/R600/SIISelLowering.cpp |3 --- lib/Target/R600/SIInstrInfo.td | 37 ++-- 2 files changed, 18 insertions(+), 22 deletions(-) diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp index 4085890..5a468ae 100644 --- a/lib/Target/R600/SIISelLowering.cpp +++ b/lib/Target/R600/SIISelLowering.cpp @@ -75,7 +75,6 @@ MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter( .addOperand(MI->getOperand(0)) .addOperand(MI->getOperand(1)) .addImm(0x80) // SRC1 - .addImm(0x80) // SRC2 .addImm(0) // ABS .addImm(1) // CLAMP .addImm(0) // OMOD @@ -88,7 +87,6 @@ MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter( .addOperand(MI->getOperand(0)) .addOperand(MI->getOperand(1)) .addImm(0x80) // SRC1 - .addImm(0x80) // SRC2 .addImm(1) // ABS .addImm(0) // CLAMP .addImm(0) // OMOD @@ -101,7 +99,6 @@ MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter( .addOperand(MI->getOperand(0)) .addOperand(MI->getOperand(1)) .addImm(0x80) // SRC1 - .addImm(0x80) // SRC2 .addImm(0) // ABS .addImm(0) // CLAMP .addImm(0) // OMOD diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td index dbe616d..be791e2 100644 --- a/lib/Target/R600/SIInstrInfo.td +++ b/lib/Target/R600/SIInstrInfo.td @@ -123,29 +123,28 @@ multiclass VOP1_32 op, string opName, list pattern> multiclass VOP1_64 op, string opName, list pattern> : VOP1_Helper ; -class VOP2_Helper op, RegisterClass vrc, RegisterClass arc, - string opName, list pattern> : - VOP2 < -op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName, pattern - >; - -multiclass VOP2_32 op, string opName, list pattern> { - - def _e32 : VOP2_Helper ; - - def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, - opName, [] +multiclass VOP2_Helper op, RegisterClass vrc, RegisterClass arc, +string opName, list pattern> { + def _e32 : VOP2 < +op, (outs vrc:$dst), (ins arc:$src0, 
vrc:$src1), opName#"_e32", pattern >; + def _e64 : VOP3 < +{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, +(outs vrc:$dst), +(ins arc:$src0, vrc:$src1, + i32imm:$abs, i32imm:$clamp, + i32imm:$omod, i32imm:$neg), +opName#"_e64", [] + > { +let SRC2 = 0x80; + } } -multiclass VOP2_64 op, string opName, list pattern> { - def _e32: VOP2_Helper ; +multiclass VOP2_32 op, string opName, list pattern> + : VOP2_Helper ; - def _e64 : VOP3_64 < -{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, -opName, [] - >; -} +multiclass VOP2_64 op, string opName, list pattern> + : VOP2_Helper ; class SOPK_32 op, string opName, list pattern> : SOPK ; -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/8] R600/SI: rework VOP1_* patterns
From: Christian König Fixing asm operation names. Signed-off-by: Christian König --- lib/Target/R600/SIInstrInfo.td | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td index 77c57b7..dbe616d 100644 --- a/lib/Target/R600/SIInstrInfo.td +++ b/lib/Target/R600/SIInstrInfo.td @@ -100,28 +100,28 @@ class SOP2_32 op, string opName, list pattern> class SOP2_64 op, string opName, list pattern> : SOP2 ; -class VOP1_Helper op, RegisterClass vrc, RegisterClass arc, - string opName, list pattern> : - VOP1 < -op, (outs vrc:$dst), (ins arc:$src0), opName, pattern - >; +multiclass VOP1_Helper op, RegisterClass drc, RegisterClass src, +string opName, list pattern> { -multiclass VOP1_32 op, string opName, list pattern> { - def _e32: VOP1_Helper ; - def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, - opName, [] - >; + def _e32: VOP1 ; + def _e64 : VOP3 < +{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, +(outs drc:$dst), +(ins src:$src0, + i32imm:$abs, i32imm:$clamp, + i32imm:$omod, i32imm:$neg), +opName#"_e64", [] + > { +let SRC1 = 0x80; +let SRC2 = 0x80; + } } -multiclass VOP1_64 op, string opName, list pattern> { - - def _e32 : VOP1_Helper ; +multiclass VOP1_32 op, string opName, list pattern> + : VOP1_Helper ; - def _e64 : VOP3_64 < -{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, -opName, [] - >; -} +multiclass VOP1_64 op, string opName, list pattern> + : VOP1_Helper ; class VOP2_Helper op, RegisterClass vrc, RegisterClass arc, string opName, list pattern> : -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/8] R600/SI: simplify VOPC_* patterns
From: Christian König Fixing asm operation names. Signed-off-by: Christian König --- lib/Target/R600/AMDGPUInstructions.td |5 + lib/Target/R600/SIInstrInfo.td| 19 +- lib/Target/R600/SIInstructions.td | 444 +++-- 3 files changed, 213 insertions(+), 255 deletions(-) diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td index 0559a5a..960f108 100644 --- a/lib/Target/R600/AMDGPUInstructions.td +++ b/lib/Target/R600/AMDGPUInstructions.td @@ -77,6 +77,11 @@ def COND_LE : PatLeaf < case ISD::SETLE: return true;}}}] >; +def COND_NULL : PatLeaf < + (cond), + [{return false;}] +>; + //===--===// // Load/Store Pattern Fragments //===--===// diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td index be791e2..69357ce 100644 --- a/lib/Target/R600/SIInstrInfo.td +++ b/lib/Target/R600/SIInstrInfo.td @@ -153,26 +153,31 @@ class SOPK_64 op, string opName, list pattern> : SOPK ; multiclass VOPC_Helper op, RegisterClass vrc, RegisterClass arc, -string opName, list pattern> { +string opName, ValueType vt, PatLeaf cond> { - def _e32 : VOPC ; + def _e32 : VOPC ; def _e64 : VOP3 < {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, (outs SReg_64:$dst), (ins arc:$src0, vrc:$src1, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg), -opName, pattern +opName#"_e32", +!if(!eq(!cast(cond), "COND_NULL"), [], + [(set SReg_64:$dst, (i1 (setcc (vt arc:$src0), vrc:$src1, cond)))] +) > { let SRC2 = 0x80; } } -multiclass VOPC_32 op, string opName, list pattern> - : VOPC_Helper ; +multiclass VOPC_32 op, string opName, + ValueType vt = untyped, PatLeaf cond = COND_NULL> + : VOPC_Helper ; -multiclass VOPC_64 op, string opName, list pattern> - : VOPC_Helper ; +multiclass VOPC_64 op, string opName, + ValueType vt = untyped, PatLeaf cond = COND_NULL> + : VOPC_Helper ; class SOPC_32 op, string opName, list pattern> : SOPC ; diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td index b4a263d..700b8f8 
100644 --- a/lib/Target/R600/SIInstructions.td +++ b/lib/Target/R600/SIInstructions.td @@ -127,286 +127,234 @@ def S_GETREG_REGRD_B32 : SOPK_32 <0x0014, "S_GETREG_REGRD_B32", []>; //def S_SETREG_IMM32_B32 : SOPK_32 <0x0015, "S_SETREG_IMM32_B32", []>; //def EXP : EXP_ <0x, "EXP", []>; -defm V_CMP_F_F32 : VOPC_32 <0x, "V_CMP_F_F32", []>; -defm V_CMP_LT_F32 : VOPC_32 <0x0001, "V_CMP_LT_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LT)), - (V_CMP_LT_F32_e64 VSrc_32:$src0, VReg_32:$src1) ->; -defm V_CMP_EQ_F32 : VOPC_32 <0x0002, "V_CMP_EQ_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)), - (V_CMP_EQ_F32_e64 VSrc_32:$src0, VReg_32:$src1) ->; -defm V_CMP_LE_F32 : VOPC_32 <0x0003, "V_CMP_LE_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LE)), - (V_CMP_LE_F32_e64 VSrc_32:$src0, VReg_32:$src1) ->; -defm V_CMP_GT_F32 : VOPC_32 <0x0004, "V_CMP_GT_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GT)), - (V_CMP_GT_F32_e64 VSrc_32:$src0, VReg_32:$src1) ->; -defm V_CMP_LG_F32 : VOPC_32 <0x0005, "V_CMP_LG_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)), - (V_CMP_LG_F32_e64 VSrc_32:$src0, VReg_32:$src1) ->; -defm V_CMP_GE_F32 : VOPC_32 <0x0006, "V_CMP_GE_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GE)), - (V_CMP_GE_F32_e64 VSrc_32:$src0, VReg_32:$src1) ->; -defm V_CMP_O_F32 : VOPC_32 <0x0007, "V_CMP_O_F32", []>; -defm V_CMP_U_F32 : VOPC_32 <0x0008, "V_CMP_U_F32", []>; -defm V_CMP_NGE_F32 : VOPC_32 <0x0009, "V_CMP_NGE_F32", []>; -defm V_CMP_NLG_F32 : VOPC_32 <0x000a, "V_CMP_NLG_F32", []>; -defm V_CMP_NGT_F32 : VOPC_32 <0x000b, "V_CMP_NGT_F32", []>; -defm V_CMP_NLE_F32 : VOPC_32 <0x000c, "V_CMP_NLE_F32", []>; -defm V_CMP_NEQ_F32 : VOPC_32 <0x000d, "V_CMP_NEQ_F32", []>; -def : Pat < - (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)), - (V_CMP_NEQ_F32_e64 VSrc_32:$src0, VReg_32:$src1) 
->; -defm V_CMP_NLT_F32 : VOPC_32 <0x000e, "V_CMP_NLT_F32", []>; -defm V_CMP_TRU_F32 : VOPC_32 <0x000f, "V_CMP_TRU_F32", []>; +defm V_CMP_F_F32 : VOPC_32 <0x, "V_CMP_F_F32">; +defm V_CMP_LT_F32 : VOPC_32 <0x0001, "V_CMP_LT_F32", f32, COND_LT>; +defm V_CMP_EQ_F32 : VOPC_32 <0x0002, "V_CMP_EQ_F32", f32, COND_EQ>; +defm V_CMP_LE_F32 : VOPC_32 <0x0003, "V_CMP_LE_F32", f32, COND_LE>; +defm V_CMP_GT_F32 : VOPC_32 <0x0004, "V_CMP_GT_F32", f32, COND_GT>; +defm V_CMP_LG_F32 : VOPC_32 <0x0005, "V_
[Mesa-dev] [PATCH 1/8] R600/SI: cleanup SIInstrInfo.td and SIInstrFormat.td
From: Christian König Those two files got mixed up. Signed-off-by: Christian König --- lib/Target/R600/SIInstrFormats.td | 500 +++-- lib/Target/R600/SIInstrInfo.td| 495 +++- 2 files changed, 509 insertions(+), 486 deletions(-) diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td index 40e37aa..fe417d6 100644 --- a/lib/Target/R600/SIInstrFormats.td +++ b/lib/Target/R600/SIInstrFormats.td @@ -1,4 +1,4 @@ -//===-- SIInstrFormats.td - SI Instruction Formats ===// +//===-- SIInstrFormats.td - SI Instruction Encodings --===// // // The LLVM Compiler Infrastructure // @@ -9,180 +9,418 @@ // // SI Instruction format definitions. // -// Instructions with _32 take 32-bit operands. -// Instructions with _64 take 64-bit operands. -// -// VOP_* instructions can use either a 32-bit or 64-bit encoding. The 32-bit -// encoding is the standard encoding, but instruction that make use of -// any of the instruction modifiers must use the 64-bit encoding. -// -// Instructions with _e32 use the 32-bit encoding. -// Instructions with _e64 use the 64-bit encoding. 
-// //===--===// -class VOP3_32 op, string opName, list pattern> - : VOP3 ; +class InstSI pattern> : +AMDGPUInst { + + field bits<1> VM_CNT = 0; + field bits<1> EXP_CNT = 0; + field bits<1> LGKM_CNT = 0; + + let TSFlags{0} = VM_CNT; + let TSFlags{1} = EXP_CNT; + let TSFlags{2} = LGKM_CNT; +} + +class Enc32 pattern> : +InstSI { + + field bits<32> Inst; + let Size = 4; +} -class VOP3_64 op, string opName, list pattern> - : VOP3 ; +class Enc64 pattern> : +InstSI { -class SOP1_32 op, string opName, list pattern> - : SOP1 ; + field bits<64> Inst; + let Size = 8; +} -class SOP1_64 op, string opName, list pattern> - : SOP1 ; +//===--===// +// Scalar operations +//===--===// -class SOP2_32 op, string opName, list pattern> - : SOP2 ; +class SOP1 op, dag outs, dag ins, string asm, list pattern> : +Enc32 { -class SOP2_64 op, string opName, list pattern> - : SOP2 ; + bits<7> SDST; + bits<8> SSRC0; -class VOP1_Helper op, RegisterClass vrc, RegisterClass arc, - string opName, list pattern> : - VOP1 < -op, (outs vrc:$dst), (ins arc:$src0), opName, pattern - >; + let Inst{7-0} = SSRC0; + let Inst{15-8} = op; + let Inst{22-16} = SDST; + let Inst{31-23} = 0x17d; //encoding; -multiclass VOP1_32 op, string opName, list pattern> { - def _e32: VOP1_Helper ; - def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, - opName, [] - >; + let mayLoad = 0; + let mayStore = 0; + let hasSideEffects = 0; } -multiclass VOP1_64 op, string opName, list pattern> { +class SOP2 op, dag outs, dag ins, string asm, list pattern> : +Enc32 { + + bits<7> SDST; + bits<8> SSRC0; + bits<8> SSRC1; - def _e32 : VOP1_Helper ; + let Inst{7-0} = SSRC0; + let Inst{15-8} = SSRC1; + let Inst{22-16} = SDST; + let Inst{29-23} = op; + let Inst{31-30} = 0x2; // encoding - def _e64 : VOP3_64 < -{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, -opName, [] - >; + let mayLoad = 0; + let mayStore = 0; + let hasSideEffects = 0; } -class VOP2_Helper op, RegisterClass vrc, RegisterClass arc, - string 
opName, list pattern> : - VOP2 < -op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName, pattern - >; +class SOPC op, dag outs, dag ins, string asm, list pattern> : + Enc32 { -multiclass VOP2_32 op, string opName, list pattern> { + bits<8> SSRC0; + bits<8> SSRC1; - def _e32 : VOP2_Helper ; + let Inst{7-0} = SSRC0; + let Inst{15-8} = SSRC1; + let Inst{22-16} = op; + let Inst{31-23} = 0x17e; - def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, - opName, [] - >; + let DisableEncoding = "$dst"; + let mayLoad = 0; + let mayStore = 0; + let hasSideEffects = 0; } -multiclass VOP2_64 op, string opName, list pattern> { - def _e32: VOP2_Helper ; +class SOPK op, dag outs, dag ins, string asm, list pattern> : + Enc32 { - def _e64 : VOP3_64 < -{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, -opName, [] - >; + bits <7> SDST; + bits <16> SIMM16; + + let Inst{15-0} = SIMM16; + let Inst{22-16} = SDST; + let Inst{27-23} = op; + let Inst{31-28} = 0xb; //encoding + + let mayLoad = 0; + let mayStore = 0; + let hasSideEffects = 0; } -class SOPK_32 op, string opName, list pattern> - : SOPK ; +class SOPP op, dag ins, string asm, list pattern> : Enc32 < + (outs), + ins, + asm, + pattern > { -class SOPK_64 op, string opName, list pattern> - : SOPK ; + bits
[Mesa-dev] [PATCH 5/8] R600/SI: sort and cleanup SIInstrInfo.td
From: Christian König Fix code formating and sort/group the classes. Signed-off-by: Christian König --- lib/Target/R600/SIInstrInfo.td | 100 +++- 1 file changed, 58 insertions(+), 42 deletions(-) diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td index 69357ce..9bdab10 100644 --- a/lib/Target/R600/SIInstrInfo.td +++ b/lib/Target/R600/SIInstrInfo.td @@ -82,11 +82,9 @@ include "SIInstrFormats.td" // //===--===// -class VOP3_32 op, string opName, list pattern> - : VOP3 ; - -class VOP3_64 op, string opName, list pattern> - : VOP3 ; +//===--===// +// Scalar classes +//===--===// class SOP1_32 op, string opName, list pattern> : SOP1 ; @@ -100,6 +98,36 @@ class SOP2_32 op, string opName, list pattern> class SOP2_64 op, string opName, list pattern> : SOP2 ; +class SOPC_32 op, string opName, list pattern> + : SOPC ; + +class SOPC_64 op, string opName, list pattern> + : SOPC ; + +class SOPK_32 op, string opName, list pattern> + : SOPK ; + +class SOPK_64 op, string opName, list pattern> + : SOPK ; + +multiclass SMRD_Helper op, string asm, RegisterClass dstClass> { + def _IMM : SMRD < +op, 1, (outs dstClass:$dst), +(ins GPR2Align:$sbase, i32imm:$offset), +asm, [] + >; + + def _SGPR : SMRD < +op, 0, (outs dstClass:$dst), +(ins GPR2Align:$sbase, SReg_32:$soff), +asm, [] + >; +} + +//===--===// +// Vector ALU classes +//===--===// + multiclass VOP1_Helper op, RegisterClass drc, RegisterClass src, string opName, list pattern> { @@ -146,11 +174,19 @@ multiclass VOP2_32 op, string opName, list pattern> multiclass VOP2_64 op, string opName, list pattern> : VOP2_Helper ; -class SOPK_32 op, string opName, list pattern> - : SOPK ; +class VOP3_32 op, string opName, list pattern> : VOP3 < + op, (outs VReg_32:$dst), + (ins VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, + i32imm:$src4, i32imm:$src5, i32imm:$src6), + opName, pattern +>; -class SOPK_64 op, string opName, list pattern> - : SOPK ; +class VOP3_64 op, string opName, list pattern> : VOP3 < + 
op, (outs VReg_64:$dst), + (ins VSrc_64:$src0, VReg_64:$src1, VReg_64:$src2, + i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), + opName, pattern +>; multiclass VOPC_Helper op, RegisterClass vrc, RegisterClass arc, string opName, ValueType vt, PatLeaf cond> { @@ -179,23 +215,9 @@ multiclass VOPC_64 op, string opName, ValueType vt = untyped, PatLeaf cond = COND_NULL> : VOPC_Helper ; -class SOPC_32 op, string opName, list pattern> - : SOPC ; - -class SOPC_64 op, string opName, list pattern> - : SOPC ; - -class MIMG_Load_Helper op, string asm> : MIMG < - op, - (outs VReg_128:$vdata), - (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128, - i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr, - GPR4Align:$srsrc, GPR4Align:$ssamp), - asm, - []> { - let mayLoad = 1; - let mayStore = 0; -} +//===--===// +// Vector ALU classes +//===--===// class MTBUF_Store_Helper op, string asm, RegisterClass regClass> : MTBUF < op, @@ -233,22 +255,16 @@ class MTBUF_Load_Helper op, string asm, RegisterClass regClass> : MTBUF let mayStore = 0; } -multiclass SMRD_Helper op, string asm, RegisterClass dstClass> { - def _IMM : SMRD < - op, 1, - (outs dstClass:$dst), - (ins GPR2Align:$sbase, i32imm:$offset), - asm, - [] - >; - - def _SGPR : SMRD < - op, 0, - (outs dstClass:$dst), - (ins GPR2Align:$sbase, SReg_32:$soff), - asm, - [] - >; +class MIMG_Load_Helper op, string asm> : MIMG < + op, + (outs VReg_128:$vdata), + (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128, + i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr, + GPR4Align:$srsrc, GPR4Align:$ssamp), + asm, + []> { + let mayLoad = 1; + let mayStore = 0; } include "SIInstructions.td" -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/8] R600/SI: use patterns for clamp, fabs, fneg
From: Christian König Instead of using custom inserters, it's simpler and should make DAG folding easier. Signed-off-by: Christian König --- lib/Target/R600/SIISelLowering.cpp | 36 lib/Target/R600/SIInstructions.td | 26 ++ 2 files changed, 22 insertions(+), 40 deletions(-) diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp index 5a468ae..2f304eb 100644 --- a/lib/Target/R600/SIISelLowering.cpp +++ b/lib/Target/R600/SIISelLowering.cpp @@ -62,7 +62,6 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) : MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter( MachineInstr * MI, MachineBasicBlock * BB) const { - const TargetInstrInfo * TII = getTargetMachine().getInstrInfo(); MachineRegisterInfo & MRI = BB->getParent()->getRegInfo(); MachineBasicBlock::iterator I = MI; @@ -70,41 +69,6 @@ MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter( default: return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB); case AMDGPU::BRANCH: return BB; - case AMDGPU::CLAMP_SI: -BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64)) - .addOperand(MI->getOperand(0)) - .addOperand(MI->getOperand(1)) - .addImm(0x80) // SRC1 - .addImm(0) // ABS - .addImm(1) // CLAMP - .addImm(0) // OMOD - .addImm(0); // NEG -MI->eraseFromParent(); -break; - - case AMDGPU::FABS_SI: -BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64)) - .addOperand(MI->getOperand(0)) - .addOperand(MI->getOperand(1)) - .addImm(0x80) // SRC1 - .addImm(1) // ABS - .addImm(0) // CLAMP - .addImm(0) // OMOD - .addImm(0); // NEG -MI->eraseFromParent(); -break; - - case AMDGPU::FNEG_SI: -BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64)) - .addOperand(MI->getOperand(0)) - .addOperand(MI->getOperand(1)) - .addImm(0x80) // SRC1 - .addImm(0) // ABS - .addImm(0) // CLAMP - .addImm(0) // OMOD - .addImm(1); // NEG -MI->eraseFromParent(); -break; case AMDGPU::SHADER_TYPE: BB->getParent()->getInfo()->ShaderType = 
MI->getOperand(0).getImm(); diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td index 700b8f8..71de032 100644 --- a/lib/Target/R600/SIInstructions.td +++ b/lib/Target/R600/SIInstructions.td @@ -1184,10 +1184,6 @@ defm : SamplePatterns; defm : SamplePatterns; defm : SamplePatterns; -def CLAMP_SI : CLAMP; -def FABS_SI : FABS; -def FNEG_SI : FNEG; - def : Extract_Element ; def : Extract_Element ; def : Extract_Element ; @@ -1211,6 +1207,28 @@ def : BitConvert ; def : BitConvert ; def : BitConvert ; +/** === **/ +/** Src & Dst modifiers **/ +/** === **/ + +def : Pat < + (int_AMDIL_clamp VReg_32:$src, (f32 FP_ZERO), (f32 FP_ONE)), + (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */), + 0 /* ABS */, 1 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */) +>; + +def : Pat < + (fabs VReg_32:$src), + (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */), + 1 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */) +>; + +def : Pat < + (fneg VReg_32:$src), + (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */), + 0 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 1 /* NEG */) +>; + /** == **/ /** Immediate Patterns **/ /** == **/ -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
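As a rough mental model of what the three new patterns select, the sketch below emulates "V_ADD_F32_e64 src, 0" with the ABS, CLAMP and NEG modifier bits in plain C. It is an assumption-laden illustration, not the backend's code: the function name is hypothetical, the 0x80 operand is taken to be the inline constant 0, abs is assumed to be applied before neg, and NaN and signed-zero behaviour is ignored.

```c
#include <stdbool.h>

/* Emulates "V_ADD_F32_e64 dst, src0, 0" with VOP3 modifier bits.
 * Hypothetical helper for illustration; OMOD is left out here. */
static float add_zero_with_modifiers(float src0, bool use_abs,
                                     bool use_clamp, bool use_neg)
{
    float x = src0;

    /* Input modifiers act on the source operand. */
    if (use_abs && x < 0.0f)
        x = -x;
    if (use_neg)
        x = -x;

    /* The actual ALU operation: add the inline constant 0. */
    float r = x + 0.0f;

    /* Output modifier: clamp the result to [0, 1]. */
    if (use_clamp) {
        if (r < 0.0f)
            r = 0.0f;
        else if (r > 1.0f)
            r = 1.0f;
    }
    return r;
}
```

With this model, clamp corresponds to (abs=0, clamp=1, neg=0), fabs to (1, 0, 0) and fneg to (0, 0, 1), matching the immediates in the patterns above.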
[Mesa-dev] [PATCH 7/8] R600/SI: add OMOD patterns
From: Christian König

Signed-off-by: Christian König
---
 lib/Target/R600/AMDGPUInstructions.td | 15 +++
 lib/Target/R600/SIInstructions.td | 18 ++
 2 files changed, 33 insertions(+)

diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
index 960f108..da3d7b7 100644
--- a/lib/Target/R600/AMDGPUInstructions.td
+++ b/lib/Target/R600/AMDGPUInstructions.td
@@ -102,11 +102,26 @@ def FP_ZERO : PatLeaf <
   [{return N->getValueAPF().isZero();}]
 >;
 
+def FP_0_5 : PatLeaf <
+  (fpimm),
+  [{return N->isExactlyValue(0.5);}]
+>;
+
 def FP_ONE : PatLeaf <
   (fpimm),
   [{return N->isExactlyValue(1.0);}]
 >;
 
+def FP_TWO : PatLeaf <
+  (fpimm),
+  [{return N->isExactlyValue(2.0);}]
+>;
+
+def FP_FOUR : PatLeaf <
+  (fpimm),
+  [{return N->isExactlyValue(4.0);}]
+>;
+
 let isCodeGenOnly = 1, isPseudo = 1 in {
 
 let usesCustomInserter = 1 in {
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 71de032..3b7cc6f 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1229,6 +1229,24 @@ def : Pat <
    0 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 1 /* NEG */)
 >;
 
+def : Pat <
+  (fmul VReg_32:$src, (f32 FP_0_5)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 3 /* OMOD */, 0 /* NEG */)
+>;
+
+def : Pat <
+  (fmul VReg_32:$src, (f32 FP_TWO)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 1 /* OMOD */, 0 /* NEG */)
+>;
+
+def : Pat <
+  (fmul VReg_32:$src, (f32 FP_FOUR)),
+  (V_ADD_F32_e64 VReg_32:$src, (i32 0x80 /* SRC1 */),
+   0 /* ABS */, 0 /* CLAMP */, 2 /* OMOD */, 0 /* NEG */)
+>;
+
 /** == **/
 /** Immediate Patterns **/
 /** == **/
-- 
1.7.10.4
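The OMOD values used by these patterns (1 for *2, 2 for *4, 3 for *0.5) can be summarised in a small C sketch. The helper names are hypothetical and exist only to make the mapping explicit; the encoding itself is read off the patch above, with 0 assumed to mean "no output modifier".

```c
/* Returns the OMOD field value that implements multiplication by m,
 * or -1 if the multiplier cannot be expressed as an output modifier.
 * Hypothetical helper; the values mirror the patterns in the patch. */
static int omod_for_multiplier(float m)
{
    if (m == 1.0f)
        return 0;   /* no output modifier */
    if (m == 2.0f)
        return 1;   /* result * 2 */
    if (m == 4.0f)
        return 2;   /* result * 4 */
    if (m == 0.5f)
        return 3;   /* result / 2 */
    return -1;
}

/* Applies an OMOD field value to an ALU result. */
static float apply_omod(float result, int omod)
{
    switch (omod) {
    case 1:  return result * 2.0f;
    case 2:  return result * 4.0f;
    case 3:  return result * 0.5f;
    default: return result;
    }
}
```

Any other constant multiplier falls outside the encoding and would still need a real multiply instruction.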
[Mesa-dev] [PATCH 8/8] R600/SI: replace SI_V_CNDLT with a pattern
From: Christian König

It actually fixes quite a bunch of piglit tests.

Signed-off-by: Christian König
---
 lib/Target/R600/SIISelLowering.cpp | 22 --
 lib/Target/R600/SIISelLowering.h |2 --
 lib/Target/R600/SIInstructions.td | 12 +---
 3 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 2f304eb..212e3f2 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -81,9 +81,6 @@ MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter(
   case AMDGPU::SI_WQM:
     LowerSI_WQM(MI, *BB, I, MRI);
     break;
-  case AMDGPU::SI_V_CNDLT:
-    LowerSI_V_CNDLT(MI, *BB, I, MRI);
-    break;
   }
   return BB;
 }
@@ -127,25 +124,6 @@ void SITargetLowering::LowerSI_INTERP(MachineInstr *MI, MachineBasicBlock &BB,
   MI->eraseFromParent();
 }
 
-void SITargetLowering::LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-    MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
-  unsigned VCC = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
-
-  BuildMI(BB, I, BB.findDebugLoc(I),
-          TII->get(AMDGPU::V_CMP_GT_F32_e32),
-          VCC)
-          .addImm(0)
-          .addOperand(MI->getOperand(1));
-
-  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CNDMASK_B32_e32))
-          .addOperand(MI->getOperand(0))
-          .addOperand(MI->getOperand(3))
-          .addOperand(MI->getOperand(2))
-          .addReg(VCC);
-
-  MI->eraseFromParent();
-}
-
 EVT SITargetLowering::getSetCCResultType(EVT VT) const {
   return MVT::i1;
 }
diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
index a8429b7..5d048f8 100644
--- a/lib/Target/R600/SIISelLowering.h
+++ b/lib/Target/R600/SIISelLowering.h
@@ -29,8 +29,6 @@ class SITargetLowering : public AMDGPUTargetLowering {
               MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
   void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
               MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
-  void LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-              MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
 
   SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 3b7cc6f..01e7933 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -987,13 +987,6 @@ def LOAD_CONST : AMDGPUShaderInst <
 
 let usesCustomInserter = 1 in {
 
-def SI_V_CNDLT : InstSI <
-  (outs VReg_32:$dst),
-  (ins VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
-  "SI_V_CNDLT $dst, $src0, $src1, $src2",
-  [(set VReg_32:$dst, (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2))]
->;
-
 def SI_INTERP : InstSI <
   (outs VReg_32:$dst),
   (ins VReg_32:$i, VReg_32:$j, i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
@@ -1083,6 +1076,11 @@ def SI_KILL : InstSI <
 
 } // end IsCodeGenOnly, isPseudo
 
+def : Pat<
+  (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
+  (V_CNDMASK_B32_e64 VReg_32:$src2, VReg_32:$src1, (V_CMP_GT_F32_e64 0, VReg_32:$src0))
+>;
+
 def : Pat <
   (int_AMDGPU_kilp),
   (SI_KILL (V_MOV_B32_e32 0xbf80))
-- 
1.7.10.4
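A scalar C model of what the new pattern computes may help when checking operand order. It assumes, and this is an assumption rather than something stated in the patch, that V_CMP_GT_F32 a, b yields a > b and that V_CNDMASK_B32 returns its second source when the condition bit is set and its first source otherwise. The function name is hypothetical.

```c
/* Scalar model of the pattern
 *   (V_CNDMASK_B32_e64 src2, src1, (V_CMP_GT_F32_e64 0, src0))
 * under the operand-order assumptions stated in the lead-in. */
static float cndlt_model(float src0, float src1, float src2)
{
    /* V_CMP_GT_F32 0, src0: condition is true when 0 > src0. */
    int cond = 0.0f > src0;

    /* V_CNDMASK_B32 src2, src1, cond: pick src1 if cond, else src2. */
    return cond ? src1 : src2;
}
```

Under these assumptions the intrinsic returns src1 for negative src0 and src2 otherwise, including for src0 equal to zero.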
[Mesa-dev] [Bug 61093] [llvmpipe] lp_surface.c:68:lp_resource_copy: Assertion `src_box->depth == 1' failed.
https://bugs.freedesktop.org/show_bug.cgi?id=61093

José Fonseca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|mesa-dev@lists.freedesktop. |srol...@vmware.com
                   |org                         |
                 CC|                            |jfons...@vmware.com

--- Comment #2 from José Fonseca ---
(In reply to comment #1)
> The assertion in lp_resource_copy can be fixed easily, but I can't reproduce
> it.

Roland already has a fix out for review on mesa3d-dev.

> llvmpipe is failing a different assertion here:
>
> texsubimage:
> /home/marek/dev/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:539: const
> llvm::SDValue &llvm::SDNode::getOperand(unsigned int) const: Assertion `Num
> < NumOperands && "Invalid child # of SDNode!"' failed.

This must be something different.

> The way I see it, my work only uncovered this bug.

Yes, I agree.
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer wrote:
> >
> > Really, what I don't understand is why r600g doesn't seem affected by
> > this... at least on my RS880 it's passing the piglit tests this change
> > fixes with radeonsi. So maybe I'm just missing some magic bit for
> > radeonsi.
>
> RGB formats do fail fbo-blending-formats with r600g/redwood here.

Okay.

> However the alpha channel can sometimes contain 1 in memory even if
> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
> glCopyTex[Sub]Image always set alpha to 1.

Well, but they shouldn't for these formats. :) The memory corresponding
to X* channels should remain unchanged. I'm working on a separate patch
for that for radeonsi.

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer
[Mesa-dev] [Bug 38086] Mesa 7.11-devel implementation error: Unexpected program target in destroy_program_variants_cb()
https://bugs.freedesktop.org/show_bug.cgi?id=38086

--- Comment #6 from Brian Paul ---
Can you make a trace of this issue with apitrace?
https://github.com/apitrace/apitrace
[Mesa-dev] [Bug 60938] [softpipe] piglit interpolation-noperspective-gl_BackColor-flat-fixed regression
https://bugs.freedesktop.org/show_bug.cgi?id=60938

Brian Paul changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Brian Paul ---
Fixed with commit 5da967aff5adb3e27954488206fb885ea1ede0fd
[Mesa-dev] [Bug 61026] Segfault in glBitmap when called with PBO source
https://bugs.freedesktop.org/show_bug.cgi?id=61026

Brian Paul changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Brian Paul ---
Fixed with commit 63c30d7e4fd9676c72d5d94640e1e136bd9dd09f
[Mesa-dev] [Bug 59876] glGetTexLevelParameteriv broken for indirect rendering
https://bugs.freedesktop.org/show_bug.cgi?id=59876

Brian Paul changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Brian Paul ---
Patch committed as 5876a5dbc0a6ec9ae7f44b5e483d38ae0d24a259
[Mesa-dev] [Bug 61012] alloc_layout_array tx * ty assertion failure when making pbuffer current
https://bugs.freedesktop.org/show_bug.cgi?id=61012

Brian Paul changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Brian Paul ---
Fixed by commit e2091f64cb9ea79f3b51c353ed9facc03ec5690a
Re: [Mesa-dev] [PATCH 4/8] R600/SI: simplify VOPC_* patterns
On Tue, 2013-02-19 at 14:54 +0100, Christian König wrote:
> From: Christian König
>
> Fixing asm operation names.

[...]

> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index be791e2..69357ce 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -153,26 +153,31 @@ class SOPK_64 op, string opName, list
> pattern>
>    : SOPK ;
>
>  multiclass VOPC_Helper op, RegisterClass vrc, RegisterClass arc,
> -                        string opName, list pattern> {
> +                        string opName, ValueType vt, PatLeaf cond> {
>
> -  def _e32 : VOPC ;
> +  def _e32 : VOPC ;
>    def _e64 : VOP3 <
>      {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
>      (outs SReg_64:$dst),
>      (ins arc:$src0, vrc:$src1,
>           InstFlag:$abs, InstFlag:$clamp,
>           InstFlag:$omod, InstFlag:$neg),
> -    opName, pattern
> +    opName#"_e32",

I think this should be _e64, shouldn't it? Also, while you're changing the
asm strings, could you add the operands to them?

>      let SRC2 = 0x80;

Hmm, we're scattering quite a few of these magic 0x80 around, would be nice
to make those more self-documenting somehow...

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer
Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.
On 02/18/2013 05:27 PM, srol...@vmware.com wrote: From: Roland Scheidegger Some parts calculated key size by using shader information, others by using the pipe_vertex_element information. Since it is perfectly valid to have more vertex_elements set than the vertex shader is using those may not be the same, so we weren't copying over all vertex_element state - this caused the tgsi dump to assert (iterates over all vertex elements). With some luck it didn't crash otherwise even though the llvm generate_fetch code also iterates over all vertex elements (probably because llvm threw away the unused inputs anyway), but if in this situation vertex texturing would be used things would definitely go wrong (as the sampler information wouldn't be copied). So drop the key size calculation using shader information. --- src/gallium/auxiliary/draw/draw_llvm.c | 13 - src/gallium/auxiliary/draw/draw_llvm.h |1 - .../draw/draw_pt_fetch_shade_pipeline_llvm.c |7 ++- src/gallium/auxiliary/draw/draw_vs_llvm.c |6 -- 4 files changed, 14 insertions(+), 13 deletions(-) Looks OK to me. Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
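The consistency problem described in the patch above can be sketched in plain C. This is only an illustration under stated assumptions: the struct and the helpers (variant_key, variant_key_size, variant_key_copy, variant_key_equal) and the MAX_ELEMENTS cap are hypothetical stand-ins, not the draw module's actual definitions. The point is that every site that copies or compares a variable-length key derives the size from the key's own counts through one helper, so two code paths cannot disagree about it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ELEMENTS 8   /* hypothetical cap, for illustration only */

/* Hypothetical stand-in for a variable-length variant key. */
struct variant_key {
   unsigned nr_elements;             /* number of valid entries in element[] */
   unsigned element[MAX_ELEMENTS];   /* per-element state */
};

/* Single source of truth: the size depends only on the key's own counts. */
static size_t
variant_key_size(const struct variant_key *key)
{
   assert(key->nr_elements <= MAX_ELEMENTS);
   return offsetof(struct variant_key, element) +
          key->nr_elements * sizeof key->element[0];
}

/* Copy exactly the valid prefix of the key. */
static void
variant_key_copy(struct variant_key *dst, const struct variant_key *src)
{
   memcpy(dst, src, variant_key_size(src));
}

/* Compare two keys with the same size helper used for the copy. */
static int
variant_key_equal(const struct variant_key *a, const struct variant_key *b)
{
   return a->nr_elements == b->nr_elements &&
          memcmp(a, b, variant_key_size(a)) == 0;
}
```

If one caller instead used a size cached elsewhere (for example, one computed from shader information), a copy could silently drop trailing elements that a later comparison or dump still reads.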
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, Feb 19, 2013 at 3:28 PM, Michel Dänzer wrote:
> On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
>> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer wrote:
>> >
>> > Really, what I don't understand is why r600g doesn't seem affected by
>> > this... at least on my RS880 it's passing the piglit tests this change
>> > fixes with radeonsi. So maybe I'm just missing some magic bit for
>> > radeonsi.
>>
>> RGB formats do fail fbo-blending-formats with r600g/redwood here.
>
> Okay.
>
>
>> However the alpha channel can sometimes contain 1 in memory even if
>> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
>> glCopyTex[Sub]Image always set alpha to 1.
>
> Well, but they shouldn't for these formats. :) The memory corresponding
> to X* channels should remain unchanged. I'm working on a separate patch
> for that for radeonsi.

I think the only way you could do that is to set the colormask to RGB.
Doesn't it have a negative effect on performance if some channels are
masked out?

Marek
[Mesa-dev] [Bug 61091] piglit glsl-fs-texture2drect regression
https://bugs.freedesktop.org/show_bug.cgi?id=61091 --- Comment #1 from Marek Olšák --- glBlitFramebuffer with rectangle textures is also broken with both softpipe and llvmpipe and it has been so for quite a while. I have a piglit test for that.
Re: [Mesa-dev] VTK Tests fails with Mesa Swrast passes with OSMesa
On 02/15/2013 09:00 AM, Kevin H. Hobbs wrote: I have two machines {bubbles, murron} doing nightly dashboard builds of VTK using nightly Mesa. Each machine does a build of VTK using swrast and one with OSMesa. Many tests pass on both machines when using OSMesa and fail on both machines using swrast. This is an example : Test failing on bubbles and murron with swrast http://open.cdash.org/testDetails.php?test=177420341&build=2813128 http://open.cdash.org/testDetails.php?test=177431601&build=2813212 Test passing on bubbles and murron with OSMesa http://open.cdash.org/testDetails.php?test=177404326&build=2812997 http://open.cdash.org/testDetails.php?test=177411381&build=2813049 Many of the tests fail in a similar way: some of the elements of the image are just missing. Looks like lines, in particular, are missing. I don't see any recent changes to swrast/osmesa that would seem to cause this. I think there are two approaches to narrowing this down: 1. You do a git-bisect of mesa to find the regression. 2. Make an apitrace of the failing test so I can investigate. Thanks. -Brian
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, 2013-02-19 at 15:48 +0100, Marek Olšák wrote:
> On Tue, Feb 19, 2013 at 3:28 PM, Michel Dänzer wrote:
> > On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
> >> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer wrote:
> >> >
> >> > Really, what I don't understand is why r600g doesn't seem affected by
> >> > this... at least on my RS880 it's passing the piglit tests this change
> >> > fixes with radeonsi. So maybe I'm just missing some magic bit for
> >> > radeonsi.
> >>
> >> RGB formats do fail fbo-blending-formats with r600g/redwood here.
> >
> > Okay.
> >
> >
> >> However the alpha channel can sometimes contain 1 in memory even if
> >> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
> >> glCopyTex[Sub]Image always set alpha to 1.
> >
> > Well, but they shouldn't for these formats. :) The memory corresponding
> > to X* channels should remain unchanged. I'm working on a separate patch
> > for that for radeonsi.
>
> I think the only way you could do that is to set the colormask to RGB.

Exactly.

> Doesn't it have a negative effect on performance if some channels are
> masked out?

It might, but I don't see that we really have a choice. If the app /
state tracker doesn't want to preserve those bits, it should use a
non-X* format.

--
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer
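What "set the colormask to RGB" means for the X channel can be sketched in a few lines of C. This is a software model for illustration only, not how the hardware or either driver implements it; the 0xXXBBGGRR pixel layout, the MASK_* constants, and masked_write are all assumptions made up for the example. A write mask keeps the destination bits of every channel that is masked out, so the X bits in memory stay untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical write masks for a 32-bit pixel laid out as 0xXXBBGGRR. */
#define MASK_RGB   0x00ffffffu   /* write R, G and B; leave the X byte alone */
#define MASK_RGBA  0xffffffffu   /* write all four channels */

/* Model of a masked color write: only bits set in write_mask come from src. */
static uint32_t
masked_write(uint32_t dst, uint32_t src, uint32_t write_mask)
{
   return (dst & ~write_mask) | (src & write_mask);
}

/* Small self-check that doubles as a usage example. */
static void
colormask_selfcheck(void)
{
   uint32_t dst = 0x7f000000u;   /* X byte currently holds 0x7f */
   uint32_t src = 0xff112233u;   /* incoming color with "alpha" forced to 1.0 */

   /* With an RGB-only mask the X byte is preserved. */
   assert(masked_write(dst, src, MASK_RGB) == 0x7f112233u);

   /* With a full mask the X byte is overwritten by the incoming alpha. */
   assert(masked_write(dst, src, MASK_RGBA) == 0xff112233u);
}
```

Whether masking channels costs performance on real hardware, as asked in the thread, is not something this model can answer.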
Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.
There may be more vertex elements than are used in the shader. But why should the key contain those elements? Won't this cause needless recompilations (e.g., in situations where the state tracker leaves unneeded elements from a previous draw)? That is, it seems to me that the key should have the number of elements from the pipe_vertex_element information, but only copy those that the vertex shader uses. Jose ----- Original Message ----- > From: Roland Scheidegger > > Some parts calculated key size by using shader information, others by using > the pipe_vertex_element information. Since it is perfectly valid to have more > vertex_elements set than the vertex shader is using those may not be the > same, > so we weren't copying over all vertex_element state - this caused the tgsi > dump > to assert (iterates over all vertex elements). With some luck it didn't > crash otherwise even though the llvm generate_fetch code also iterates over > all vertex elements (probably because llvm threw away the unused inputs > anyway), > but if in this situation vertex texturing would be used things would > definitely > go wrong (as the sampler information wouldn't be copied). > So drop the key size calculation using shader information. 
> --- > src/gallium/auxiliary/draw/draw_llvm.c | 13 - > src/gallium/auxiliary/draw/draw_llvm.h |1 - > .../draw/draw_pt_fetch_shade_pipeline_llvm.c |7 ++- > src/gallium/auxiliary/draw/draw_vs_llvm.c |6 -- > 4 files changed, 14 insertions(+), 13 deletions(-) > > diff --git a/src/gallium/auxiliary/draw/draw_llvm.c > b/src/gallium/auxiliary/draw/draw_llvm.c > index f3b..df57358 100644 > --- a/src/gallium/auxiliary/draw/draw_llvm.c > +++ b/src/gallium/auxiliary/draw/draw_llvm.c > @@ -420,17 +420,20 @@ draw_llvm_destroy(struct draw_llvm *llvm) > */ > struct draw_llvm_variant * > draw_llvm_create_variant(struct draw_llvm *llvm, > - unsigned num_inputs, > - const struct draw_llvm_variant_key *key) > + unsigned num_inputs, > + const struct draw_llvm_variant_key *key) > { > struct draw_llvm_variant *variant; > struct llvm_vertex_shader *shader = >llvm_vertex_shader(llvm->draw->vs.vertex_shader); > LLVMTypeRef vertex_header; > + unsigned key_size = draw_llvm_variant_key_size(key->nr_vertex_elements, > + MAX2(key->nr_samplers, > + > key->nr_sampler_views)); > > variant = MALLOC(sizeof *variant + > - shader->variant_key_size - > - sizeof variant->key); > +key_size - > +sizeof variant->key); > if (variant == NULL) >return NULL; > > @@ -440,7 +443,7 @@ draw_llvm_create_variant(struct draw_llvm *llvm, > > create_jit_types(variant); > > - memcpy(&variant->key, key, shader->variant_key_size); > + memcpy(&variant->key, key, key_size); > > vertex_header = create_jit_vertex_header(variant->gallivm, num_inputs); > > diff --git a/src/gallium/auxiliary/draw/draw_llvm.h > b/src/gallium/auxiliary/draw/draw_llvm.h > index 17ca304..b20cee5 100644 > --- a/src/gallium/auxiliary/draw/draw_llvm.h > +++ b/src/gallium/auxiliary/draw/draw_llvm.h > @@ -281,7 +281,6 @@ struct draw_llvm_variant > struct llvm_vertex_shader { > struct draw_vertex_shader base; > > - unsigned variant_key_size; > struct draw_llvm_variant_list_item variants; > unsigned variants_created; > unsigned variants_cached; > diff 
--git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c > b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c > index b0c18ed..d7f855f 100644 > --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c > +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c > @@ -127,13 +127,18 @@ llvm_middle_end_prepare( struct draw_pt_middle_end > *middle, >struct llvm_vertex_shader *shader = llvm_vertex_shader(vs); >char store[DRAW_LLVM_MAX_VARIANT_KEY_SIZE]; >unsigned i; > + unsigned key_size; > >key = draw_llvm_make_variant_key(fpme->llvm, store); > > + key_size = draw_llvm_variant_key_size(key->nr_vertex_elements, > +MAX2(key->nr_samplers, > + key->nr_sampler_views)); > + >/* Search shader's list of variants for the key */ >li = first_elem(&shader->variants); >while (!at_end(&shader->variants, li)) { > - if (memcmp(&li->base->key, key, shader->variant_key_size) == 0) { > + if (memcmp(&li->base->key, key, key_size) == 0) { > variant = li->base; > break; > } > diff --git a/src/gallium/auxiliary/draw/draw_vs_llvm.c > b/src/gallium/auxiliary/draw/draw_vs_llvm.c > index ac3999e..50cef79 100644 > --- a/src/gallium/auxi
Re: [Mesa-dev] [PATCH 5/6] R600: Remove LowerConstCopyPass and lower CONST_COPY right after ISel.
On Mon, Feb 18, 2013 at 05:27:29PM +0100, Vincent Lejeune wrote: > Maintaining CONST_COPY Instructions until Pre Emit may prevent some ifcvt case > and taking them in account for scheduling is difficult for no real benefit. > --- > lib/Target/R600/AMDGPU.h| 1 - > lib/Target/R600/AMDGPUTargetMachine.cpp | 1 - > lib/Target/R600/R600ISelLowering.cpp| 8 +- > lib/Target/R600/R600Instructions.td | 7 +- > lib/Target/R600/R600LowerConstCopy.cpp | 222 > Don't forget to remove this file from CMakeLists.txt > 5 files changed, 11 insertions(+), 228 deletions(-) > delete mode 100644 lib/Target/R600/R600LowerConstCopy.cpp > > diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h > index ba87918..67073ab 100644 > --- a/lib/Target/R600/AMDGPU.h > +++ b/lib/Target/R600/AMDGPU.h > @@ -23,7 +23,6 @@ class AMDGPUTargetMachine; > // R600 Passes > FunctionPass* createR600KernelParametersPass(const DataLayout *TD); > FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm); > -FunctionPass *createR600LowerConstCopy(TargetMachine &tm); > > // SI Passes > FunctionPass *createSIAnnotateControlFlowPass(); > diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp > b/lib/Target/R600/AMDGPUTargetMachine.cpp > index e2f00be..70b34b0 100644 > --- a/lib/Target/R600/AMDGPUTargetMachine.cpp > +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp > @@ -143,7 +143,6 @@ bool AMDGPUPassConfig::addPreEmitPass() { > addPass(createAMDGPUCFGStructurizerPass(*TM)); > addPass(createR600ExpandSpecialInstrsPass(*TM)); > addPass(&FinalizeMachineBundlesID); > -addPass(createR600LowerConstCopy(*TM)); >} else { > addPass(createSILowerControlFlowPass(*TM)); >} > diff --git a/lib/Target/R600/R600ISelLowering.cpp > b/lib/Target/R600/R600ISelLowering.cpp > index ece0b9a..f25ced1 100644 > --- a/lib/Target/R600/R600ISelLowering.cpp > +++ b/lib/Target/R600/R600ISelLowering.cpp > @@ -150,7 +150,13 @@ MachineBasicBlock * > R600TargetLowering::EmitInstrWithCustomInserter( > TII->buildMovImm(*BB, I, 
MI->getOperand(0).getReg(), > MI->getOperand(1).getImm()); > break; > - > + case AMDGPU::CONST_COPY: { > +MachineInstr *NewMI = TII->buildDefaultInstruction(*BB, MI, AMDGPU::MOV, > +MI->getOperand(0).getReg(), AMDGPU::ALU_CONST); > +TII->setImmOperand(NewMI, R600Operands::SRC0_SEL, > +MI->getOperand(1).getImm()); > +break; > + } > >case AMDGPU::RAT_WRITE_CACHELESS_32_eg: >case AMDGPU::RAT_WRITE_CACHELESS_128_eg: { > diff --git a/lib/Target/R600/R600Instructions.td > b/lib/Target/R600/R600Instructions.td > index 74106c9..10bcdcf 100644 > --- a/lib/Target/R600/R600Instructions.td > +++ b/lib/Target/R600/R600Instructions.td > @@ -1650,17 +1650,18 @@ let isTerminator = 1, isReturn = 1, isBarrier = 1, > hasCtrlDep = 1, > // Constant Buffer Addressing Support > > //===--===// > > -let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in { > +let usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace = > "AMDGPU" in { > def CONST_COPY : Instruction { >let OutOperandList = (outs R600_Reg32:$dst); >let InOperandList = (ins i32imm:$src); > - let Pattern = [(set R600_Reg32:$dst, (CONST_ADDRESS > ADDRGA_CONST_OFFSET:$src))]; > + let Pattern = > + [(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))]; >let AsmString = "CONST_COPY"; >let neverHasSideEffects = 1; >let isAsCheapAsAMove = 1; >let Itinerary = NullALU; > } > -} // end isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" > +} // end usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace > = "AMDGPU" > > def TEX_VTX_CONSTBUF : >InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr, i32imm:$BUFFER_ID), > "VTX_READ_eg $dst, $ptr", > diff --git a/lib/Target/R600/R600LowerConstCopy.cpp > b/lib/Target/R600/R600LowerConstCopy.cpp > deleted file mode 100644 > index 3ebe653..000 > --- a/lib/Target/R600/R600LowerConstCopy.cpp > +++ /dev/null > @@ -1,222 +0,0 @@ > -//===-- R600LowerConstCopy.cpp - Propagate ConstCopy / lower them to > MOV---===// > -// > -// The LLVM Compiler 
Infrastructure > -// > -// This file is distributed under the University of Illinois Open Source > -// License. See LICENSE.TXT for details. > -// > -//===--===// > -// > -/// \file > -/// This pass is intended to handle remaining ConstCopy pseudo MachineInstr. > -/// ISel will fold each Const Buffer read inside scalar ALU. However it > cannot > -/// fold them inside vector instruction, like DOT4 or Cube ; ISel emits > -/// ConstCopy instead. This pass (executed after ExpandingSpecialInstr) will > try > -/// to fold them if possible or replace them by MOV otherwise. > -// > -//===-
Re: [Mesa-dev] [PATCH 6/6] R600: initial scheduler code
Hi Vincent,

From now on, please cc llvm-comm...@cs.uiuc.edu when you submit a patch.
I'm cc'ing that list now.

This looks OK to me at first glance, but I would like to test it with
compute shaders before you merge it.

On Mon, Feb 18, 2013 at 05:27:30PM +0100, Vincent Lejeune wrote:
> From: Vadim Girlin
>
> This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently
> it only tries to expose more parallelism for ALU instructions (this also
> makes the distribution of GPR channels more uniform and increases the
> chances of ALU instructions to be packed together in a single VLIW group).
> Also it tries to reduce clause switching by grouping instructions of the
> same kind (ALU/FETCH/CF) together.
>
> Vincent Lejeune:
>  - Support for VLIW4 Slot assignment
>  - Recomputation of ScheduleDAG to get more parallelism opportunities
>
> Tom Stellard:
>  - Fix assertion failure when trying to determine an instruction's slot
>    based on its destination register's class
>  - Fix some compiler warnings
>
> Vincent Lejeune: [v2]
>  - Remove recomputation of ScheduleDAG (will be provided in a later patch)
>  - Improve estimation of an ALU clause size so that heuristic does not emit cf
>  instructions at the wrong position.
> - Make schedule heuristic smarter using SUnit Depth > - Take constant read limitations into account > --- > lib/Target/R600/AMDGPUTargetMachine.cpp | 17 +- > lib/Target/R600/R600MachineScheduler.cpp | 483 > +++ > lib/Target/R600/R600MachineScheduler.h | 121 > test/CodeGen/R600/fdiv.v4f32.ll | 6 +- > 4 files changed, 623 insertions(+), 4 deletions(-) > create mode 100644 lib/Target/R600/R600MachineScheduler.cpp > create mode 100644 lib/Target/R600/R600MachineScheduler.h > > diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp > b/lib/Target/R600/AMDGPUTargetMachine.cpp > index 70b34b0..eb58853 100644 > --- a/lib/Target/R600/AMDGPUTargetMachine.cpp > +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp > @@ -17,6 +17,7 @@ > #include "AMDGPU.h" > #include "R600ISelLowering.h" > #include "R600InstrInfo.h" > +#include "R600MachineScheduler.h" > #include "SIISelLowering.h" > #include "SIInstrInfo.h" > #include "llvm/Analysis/Passes.h" > @@ -39,6 +40,14 @@ extern "C" void LLVMInitializeR600Target() { >RegisterTargetMachine X(TheAMDGPUTarget); > } > > +static ScheduleDAGInstrs *createR600MachineScheduler(MachineSchedContext *C) > { > + return new ScheduleDAGMI(C, new R600SchedStrategy()); > +} > + > +static MachineSchedRegistry > +SchedCustomRegistry("r600", "Run R600's custom scheduler", > +createR600MachineScheduler); > + > AMDGPUTargetMachine::AMDGPUTargetMachine(const Target &T, StringRef TT, > StringRef CPU, StringRef FS, >TargetOptions Options, > @@ -70,7 +79,13 @@ namespace { > class AMDGPUPassConfig : public TargetPassConfig { > public: >AMDGPUPassConfig(AMDGPUTargetMachine *TM, PassManagerBase &PM) > -: TargetPassConfig(TM, PM) {} > +: TargetPassConfig(TM, PM) { > +const AMDGPUSubtarget &ST = TM->getSubtarget(); > +if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) { > + enablePass(&MachineSchedulerID); > + MachineSchedRegistry::setDefault(createR600MachineScheduler); > +} > + } > >AMDGPUTargetMachine &getAMDGPUTargetMachine() const { > return getTM(); > 
diff --git a/lib/Target/R600/R600MachineScheduler.cpp > b/lib/Target/R600/R600MachineScheduler.cpp > new file mode 100644 > index 000..efd9490 > --- /dev/null > +++ b/lib/Target/R600/R600MachineScheduler.cpp > @@ -0,0 +1,483 @@ > +//===-- R600MachineScheduler.cpp - R600 Scheduler Interface -*- C++ > -*-===// > +// > +// The LLVM Compiler Infrastructure > +// > +// This file is distributed under the University of Illinois Open Source > +// License. See LICENSE.TXT for details. > +// > +//===--===// > +// > +/// \file > +/// \brief R600 Machine Scheduler interface > +// TODO: Scheduling is optimised for VLIW4 arch, modify it to support TRANS > slot > +// > +//===--===// > + > +#define DEBUG_TYPE "misched" > + > +#include "R600MachineScheduler.h" > +#include "llvm/CodeGen/MachineRegisterInfo.h" > +#include "llvm/CodeGen/LiveIntervalAnalysis.h" > +#include "llvm/Pass.h" > +#include "llvm/PassManager.h" > +#include > +#include > +using namespace llvm; > + > +void R600SchedStrategy::initialize(ScheduleDAGMI *dag) { > + > + DAG = dag; > + TII = static_cast(DAG->TII); > + TRI = static_cast(DAG->TRI); > + MRI = &DAG->MRI; > + Available[IDAlu]->clear(); > + Available[IDFetch]->clear(); > + Available[IDOther]->clear(); > + CurInstKind = IDOther; > + CurEmitted = 0; > + memset(InstructionsGroupCandidate, 0, sizeof(InstructionsGroupCandidate)); > + InstKindLimit[IDAlu] = 120; // 120 minus 8 for securi
Re: [Mesa-dev] [PATCH 1/6] R600: Use MUL_IEEE for trig/fdiv intrinsic
On Mon, Feb 18, 2013 at 05:27:25PM +0100, Vincent Lejeune wrote: Reviewed-by: Tom Stellard > --- > lib/Target/R600/R600Instructions.td | 8 > test/CodeGen/R600/fdiv.v4f32.ll | 8 > 2 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/lib/Target/R600/R600Instructions.td > b/lib/Target/R600/R600Instructions.td > index 0a01400..e4cc06e 100644 > --- a/lib/Target/R600/R600Instructions.td > +++ b/lib/Target/R600/R600Instructions.td > @@ -1090,12 +1090,12 @@ class COS_Common inst> : R600_1OP < > multiclass DIV_Common { > def : Pat< >(int_AMDGPU_div R600_Reg32:$src0, R600_Reg32:$src1), > - (MUL R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1)) > + (MUL_IEEE R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1)) > >; > > def : Pat< >(fdiv R600_Reg32:$src0, R600_Reg32:$src1), > - (MUL R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1)) > + (MUL_IEEE R600_Reg32:$src0, (recip_ieee R600_Reg32:$src1)) > >; > } > > @@ -1169,12 +1169,12 @@ let Predicates = [isR600] in { > // cards. > class COS_PAT : Pat< >(fcos R600_Reg32:$src), > - (trig (MUL (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src)) > + (trig (MUL_IEEE (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src)) > >; > > class SIN_PAT : Pat< >(fsin R600_Reg32:$src), > - (trig (MUL (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src)) > + (trig (MUL_IEEE (MOV_IMM_I32 CONST.TWO_PI_INV), R600_Reg32:$src)) > >; > > > //===--===// > diff --git a/test/CodeGen/R600/fdiv.v4f32.ll b/test/CodeGen/R600/fdiv.v4f32.ll > index b013fd6..459fd11 100644 > --- a/test/CodeGen/R600/fdiv.v4f32.ll > +++ b/test/CodeGen/R600/fdiv.v4f32.ll > @@ -1,13 +1,13 @@ > ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s > > ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > 
+;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > ;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > -;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > > define void @test(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* > %in) { >%b_ptr = getelementptr <4 x float> addrspace(1)* %in, i32 1 > -- > 1.8.1.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
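The patch itself does not spell out why the IEEE variant matters for fdiv, so the following C model is only a sketch under an assumption: that the plain R600 MUL (printed as "MUL NON-IEEE" in the old CHECK lines) returns zero whenever either operand is zero, while MUL_IEEE follows IEEE rules. The function names are hypothetical and nothing here is taken from the LLVM backend. With x / y lowered to x * recip(y), the two multiplies disagree on inputs such as 0 / 0.

```c
#include <assert.h>
#include <math.h>

/* IEEE multiply: ordinary C float multiplication. */
static float
mul_ieee(float a, float b)
{
   return a * b;
}

/* Assumed model of the legacy multiply: zero times anything is zero. */
static float
mul_legacy(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}

/* Division lowered as a multiply by the reciprocal, as in the patterns above. */
static float
fdiv_ieee(float x, float recip_y)
{
   return mul_ieee(x, recip_y);
}

static float
fdiv_legacy(float x, float recip_y)
{
   return mul_legacy(x, recip_y);
}

/* Self-check: 0 / 0 arrives here as 0 * +inf. */
static void
fdiv_selfcheck(void)
{
   assert(isnan(fdiv_ieee(0.0f, INFINITY)));      /* IEEE: 0 * inf is NaN */
   assert(fdiv_legacy(0.0f, INFINITY) == 0.0f);   /* legacy model: forced to 0 */
   assert(fdiv_ieee(6.0f, 0.5f) == 3.0f);         /* ordinary inputs agree */
   assert(fdiv_legacy(6.0f, 0.5f) == 3.0f);
}
```

For ordinary finite, non-zero inputs the two models give the same result; they differ only in the zero-times-infinity and zero-times-NaN cases.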
Re: [Mesa-dev] [PATCH 2/6] R600: CONST_ADDRESS node is not marked as mayLoad anymore
On Mon, Feb 18, 2013 at 05:27:26PM +0100, Vincent Lejeune wrote:
> mayLoad complicates scheduling and does not bring any useful info
> as the location is not writeable at all.

Reviewed-by: Tom Stellard

> ---
>  lib/Target/R600/R600Instructions.td | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
> index e4cc06e..0a777f1 100644
> --- a/lib/Target/R600/R600Instructions.td
> +++ b/lib/Target/R600/R600Instructions.td
> @@ -513,7 +513,7 @@ def INTERP_PAIR_ZW : AMDGPUShaderInst <
>
>  def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
>    SDTypeProfile<1, -1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
> -  [SDNPMayLoad, SDNPVariadic]
> +  [SDNPVariadic]
>  >;
>
>
> //===--===//
> --
> 1.8.1.2
Re: [Mesa-dev] [PATCH 3/6] R600: Turn BUILD_VECTOR into Reg_Sequence
On Mon, Feb 18, 2013 at 05:27:27PM +0100, Vincent Lejeune wrote: Reviewed-by: Tom Stellard > --- > lib/Target/R600/AMDILISelDAGToDAG.cpp | 29 + > 1 file changed, 29 insertions(+) > > diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp > b/lib/Target/R600/AMDILISelDAGToDAG.cpp > index 2e726e9..6b24117 100644 > --- a/lib/Target/R600/AMDILISelDAGToDAG.cpp > +++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp > @@ -160,6 +160,35 @@ SDNode *AMDGPUDAGToDAGISel::Select(SDNode *N) { >} >switch (Opc) { >default: break; > + case ISD::BUILD_VECTOR: { > +const AMDGPUSubtarget &ST = TM.getSubtarget(); > +if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) { > + break; > +} > +// BUILD_VECTOR is usually lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG > +// that adds a 128 bits reg copy when going through > TwoAddressInstructions > +// pass. We want to avoid 128 bits copies as much as possible because > they > +// can't be bundled by our scheduler. > +SDValue RegSeqArgs[9] = { > + CurDAG->getTargetConstant(AMDGPU::R600_Reg128RegClassID, MVT::i32), > + SDValue(), CurDAG->getTargetConstant(AMDGPU::sub0, MVT::i32), > + SDValue(), CurDAG->getTargetConstant(AMDGPU::sub1, MVT::i32), > + SDValue(), CurDAG->getTargetConstant(AMDGPU::sub2, MVT::i32), > + SDValue(), CurDAG->getTargetConstant(AMDGPU::sub3, MVT::i32) > +}; > +bool IsRegSeq = true; > +for (unsigned i = 0; i < N->getNumOperands(); i++) { > + if (dyn_cast(N->getOperand(i))) { > +IsRegSeq = false; > +break; > + } > + RegSeqArgs[2 * i + 1] = N->getOperand(i); > +} > +if (!IsRegSeq) > + break; > +return CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), > +RegSeqArgs, 2 * N->getNumOperands() + 1); > + } >case ISD::ConstantFP: >case ISD::Constant: { > const AMDGPUSubtarget &ST = TM.getSubtarget(); > -- > 1.8.1.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/6] R600: Fix for Unigine when MachineSched is enabled
On Mon, Feb 18, 2013 at 05:27:28PM +0100, Vincent Lejeune wrote: Reviewed-by: Tom Stellard > --- > lib/Target/R600/R600Instructions.td | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/Target/R600/R600Instructions.td > b/lib/Target/R600/R600Instructions.td > index 0a777f1..74106c9 100644 > --- a/lib/Target/R600/R600Instructions.td > +++ b/lib/Target/R600/R600Instructions.td > @@ -1587,6 +1587,7 @@ def PRED_X : InstR600 < >(ins R600_Reg32:$src0, i32imm:$src1, i32imm:$flags), >"", [], NullALU> { >let FlagOperandIdx = 3; > + let isTerminator = 1; > } > > let isTerminator = 1, isBranch = 1, isBarrier = 1 in { > -- > 1.8.1.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] R600/SI: cleanup SIInstrInfo.td and SIInstrFormat.td
Hi Christian, From now on, can you cc llvm-comm...@cs.uiuc.edu when you submit a patch? Thanks, Tom On Tue, Feb 19, 2013 at 02:54:23PM +0100, Christian König wrote: > From: Christian König > > Those two files got mixed up. > > Signed-off-by: Christian König > --- > lib/Target/R600/SIInstrFormats.td | 500 > +++-- > lib/Target/R600/SIInstrInfo.td| 495 +++- > 2 files changed, 509 insertions(+), 486 deletions(-) > > diff --git a/lib/Target/R600/SIInstrFormats.td > b/lib/Target/R600/SIInstrFormats.td > index 40e37aa..fe417d6 100644 > --- a/lib/Target/R600/SIInstrFormats.td > +++ b/lib/Target/R600/SIInstrFormats.td > @@ -1,4 +1,4 @@ > -//===-- SIInstrFormats.td - SI Instruction Formats > ===// > +//===-- SIInstrFormats.td - SI Instruction Encodings > --===// > // > // The LLVM Compiler Infrastructure > // > @@ -9,180 +9,418 @@ > // > // SI Instruction format definitions. > // > -// Instructions with _32 take 32-bit operands. > -// Instructions with _64 take 64-bit operands. > -// > -// VOP_* instructions can use either a 32-bit or 64-bit encoding. The 32-bit > -// encoding is the standard encoding, but instruction that make use of > -// any of the instruction modifiers must use the 64-bit encoding. > -// > -// Instructions with _e32 use the 32-bit encoding. > -// Instructions with _e64 use the 64-bit encoding. 
> -// > > //===--===// > > -class VOP3_32 op, string opName, list pattern> > - : VOP3 VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), > opName, pattern>; > +class InstSI pattern> : > +AMDGPUInst { > + > + field bits<1> VM_CNT = 0; > + field bits<1> EXP_CNT = 0; > + field bits<1> LGKM_CNT = 0; > + > + let TSFlags{0} = VM_CNT; > + let TSFlags{1} = EXP_CNT; > + let TSFlags{2} = LGKM_CNT; > +} > + > +class Enc32 pattern> : > +InstSI { > + > + field bits<32> Inst; > + let Size = 4; > +} > > -class VOP3_64 op, string opName, list pattern> > - : VOP3 VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), > opName, pattern>; > +class Enc64 pattern> : > +InstSI { > > -class SOP1_32 op, string opName, list pattern> > - : SOP1 ; > + field bits<64> Inst; > + let Size = 8; > +} > > -class SOP1_64 op, string opName, list pattern> > - : SOP1 ; > +//===--===// > +// Scalar operations > +//===--===// > > -class SOP2_32 op, string opName, list pattern> > - : SOP2 opName, pattern>; > +class SOP1 op, dag outs, dag ins, string asm, list pattern> : > +Enc32 { > > -class SOP2_64 op, string opName, list pattern> > - : SOP2 opName, pattern>; > + bits<7> SDST; > + bits<8> SSRC0; > > -class VOP1_Helper op, RegisterClass vrc, RegisterClass arc, > - string opName, list pattern> : > - VOP1 < > -op, (outs vrc:$dst), (ins arc:$src0), opName, pattern > - >; > + let Inst{7-0} = SSRC0; > + let Inst{15-8} = op; > + let Inst{22-16} = SDST; > + let Inst{31-23} = 0x17d; //encoding; > > -multiclass VOP1_32 op, string opName, list pattern> { > - def _e32: VOP1_Helper ; > - def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, > op{0}}, > - opName, [] > - >; > + let mayLoad = 0; > + let mayStore = 0; > + let hasSideEffects = 0; > } > > -multiclass VOP1_64 op, string opName, list pattern> { > +class SOP2 op, dag outs, dag ins, string asm, list pattern> : > +Enc32 { > + > + bits<7> SDST; > + bits<8> SSRC0; > + bits<8> SSRC1; > > - def _e32 : VOP1_Helper 
; > + let Inst{7-0} = SSRC0; > + let Inst{15-8} = SSRC1; > + let Inst{22-16} = SDST; > + let Inst{29-23} = op; > + let Inst{31-30} = 0x2; // encoding > > - def _e64 : VOP3_64 < > -{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, > -opName, [] > - >; > + let mayLoad = 0; > + let mayStore = 0; > + let hasSideEffects = 0; > } > > -class VOP2_Helper op, RegisterClass vrc, RegisterClass arc, > - string opName, list pattern> : > - VOP2 < > -op, (outs vrc:$dst), (ins arc:$src0, vrc:$src1), opName, pattern > - >; > +class SOPC op, dag outs, dag ins, string asm, list pattern> : > + Enc32 { > > -multiclass VOP2_32 op, string opName, list pattern> { > + bits<8> SSRC0; > + bits<8> SSRC1; > > - def _e32 : VOP2_Helper ; > + let Inst{7-0} = SSRC0; > + let Inst{15-8} = SSRC1; > + let Inst{22-16} = op; > + let Inst{31-23} = 0x17e; > > - def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, > - opName, [] > - >; > + let DisableEncoding = "$dst"; > + let mayLoad = 0; > + let mayStore = 0; > + let hasSideEffects = 0; > } > > -multiclass VOP2_64 op
Re: [Mesa-dev] [PATCH v2] configure.ac: Do not check for clock_gettime on MinGW.
On Tue, Feb 19, 2013 at 12:55 AM, Vinson Lee wrote:
> MinGW does not have clock_gettime.
>
> Signed-off-by: Vinson Lee
> ---
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 16c2f8c..1e11b4e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -500,7 +500,7 @@ AC_CHECK_FUNC([dlopen], [DEFINES="$DEFINES -DHAVE_DLOPEN"],
>  AC_SUBST([DLOPEN_LIBS])
>
>  case "$host_os" in
> -darwin*)
> +darwin*|mingw*)
>      ;;
>  *)
>      AC_CHECK_FUNCS([clock_gettime], [CLOCK_LIB=],
> --
> 1.8.1.2
>

Reviewed-by: Matt Turner
Re: [Mesa-dev] [PATCH] radeonsi: Fix blending using destination alpha factor but non-alpha destination
On Tue, Feb 19, 2013 at 3:55 PM, Michel Dänzer wrote:
> On Tue, 2013-02-19 at 15:48 +0100, Marek Olšák wrote:
>> On Tue, Feb 19, 2013 at 3:28 PM, Michel Dänzer wrote:
>> > On Tue, 2013-02-19 at 14:04 +0100, Marek Olšák wrote:
>> >> On Tue, Feb 19, 2013 at 11:02 AM, Michel Dänzer wrote:
>> >> >
>> >> > Really, what I don't understand is why r600g doesn't seem affected by
>> >> > this... at least on my RS880 it's passing the piglit tests this change
>> >> > fixes with radeonsi. So maybe I'm just missing some magic bit for
>> >> > radeonsi.
>> >>
>> >> RGB formats do fail fbo-blending-formats with r600g/redwood here.
>> >
>> > Okay.
>> >
>> >
>> >> However the alpha channel can sometimes contain 1 in memory even if
>> >> the format is RGBX. Off the top of my head, glClear, glTex[Sub]Image,
>> >> glCopyTex[Sub]Image always set alpha to 1.
>> >
>> > Well, but they shouldn't for these formats. :) The memory corresponding
>> > to X* channels should remain unchanged. I'm working on a separate patch
>> > for that for radeonsi.
>>
>> I think the only way you could do that is to set the colormask to RGB.
>
> Exactly.
>
>> Doesn't it have a negative effect on performance if some channels are
>> masked out?
>
> It might, but I don't see that we really have a choice. If the app /
> state tracker doesn't want to preserve those bits, it should use a
> non-X* format.

We do have a choice: let's do nothing. ReadPixels and GetTexImage always
set the alpha to one, and we can patch the blend state manually to get
correct RGB blending. What could possibly be broken if the alpha is
modified by the hardware?

Marek
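One way to read "patch the blend state manually" is sketched below in C. This is not the radeonsi patch itself; it only illustrates the idea under the assumption that a destination without an alpha channel must behave as if its alpha were 1. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical, reduced set of blend factors for illustration only. */
enum blend_factor_sketch {
   BF_ZERO,
   BF_ONE,
   BF_SRC_ALPHA,
   BF_DST_ALPHA,
   BF_INV_DST_ALPHA,
   BF_SRC_ALPHA_SATURATE,
   BF_COUNT
};

/* Rewrites an RGB blend factor so it no longer reads destination alpha.
 * Assumes the missing alpha channel is defined to read as 1. */
static enum blend_factor_sketch
fixup_rgb_factor_no_dst_alpha(enum blend_factor_sketch f)
{
   assert(f < BF_COUNT);

   switch (f) {
   case BF_DST_ALPHA:
      return BF_ONE;   /* Ad = 1 */
   case BF_INV_DST_ALPHA:
      return BF_ZERO;  /* 1 - Ad = 0 */
   case BF_SRC_ALPHA_SATURATE:
      return BF_ZERO;  /* min(As, 1 - Ad) = 0 */
   default:
      return f;        /* factor does not depend on destination alpha */
   }
}
```

With the factors rewritten this way, whatever the hardware leaves in the X channel no longer affects the RGB result, which is the situation the question at the end of the message is asking about.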
[Mesa-dev] [RFC] New EGL extension: EGL_EXT_platform_display
I'm seeking feedback on an EGL extension that I'm drafting. The ideas have
already been discussed at Khronos meetings to a good reception, but I want
feedback from Mesa developers too.

Summary
-------
The extension, tentatively named EGL_EXT_platform_display, enables EGL clients
to specify to which platform (X11, Wayland, gbm, etc) an EGL resource
(EGLDisplay, EGLSurface, etc) belongs when the resource is derived from a
platform-native type. As a corollary, the extension enables the creation of
EGL resources from different platforms within a single process.

Feedback
--------
I'd like to hear feedback about the details below. Do you see any potential
problems? Is it lacking a feature that you believe should be present?

Details
-------
The draft extension defines the following new functions:

    // This is the extension's key function.
    //
    EGLDisplay eglGetPlatformDisplayEXT(EGLenum platform, void *native_display);

    // The two eglCreate functions below differ from their core counterparts
    // only in their signature. The EGLNative types are replaced with void*.
    // This makes the signature agnostic to which platform the native resource
    // belongs.

    EGLSurface eglCreatePlatformWindowSurfaceEXT(EGLDisplay dpy,
                                                 EGLConfig config,
                                                 void *native_window,
                                                 const EGLint *attrib_list);

    EGLSurface eglCreatePlatformPixmapSurfaceEXT(EGLDisplay dpy,
                                                 EGLConfig config,
                                                 void *native_pixmap,
                                                 const EGLint *attrib_list);

Valid values for `platform` are defined by layered extensions. For example,
EGL_EXT_platform_x11 defines EGL_PLATFORM_X11, and EGL_EXT_platform_wayland
defines EGL_PLATFORM_WAYLAND.

Also, the layered extensions specify which native types should be passed as
the native parameters. For example, EGL_EXT_platform_wayland specifies that,
when calling eglCreatePlatformWindowSurfaceEXT with a display that was derived
from a Wayland display, the native_window parameter must be
`struct wl_egl_window*`. Analogously, EGL_EXT_platform_x11 specifies that
native_window must be `Window*`.

Example Code for X11
--------------------
    // The internal representation of the egl_dpy, created below, remembers that
    // it was derived from an Xlib display.

    Display *xlib_dpy = XOpenDisplay(NULL);
    EGLDisplay egl_dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_X11, xlib_dpy);

    EGLConfig config;
    eglChooseConfig(egl_dpy, &config, ...);

    // Since egl_dpy remembers that it was derived from an Xlib display, when
    // creating the EGLSurface below libEGL internally casts the
    // `(void*) &xlib_win` to `Window*`.

    Window xlib_win = XCreateWindow(xlib_dpy, ...);
    EGLSurface egl_surface = eglCreatePlatformWindowSurfaceEXT(egl_dpy, config,
                                                               (void*) &xlib_win,
                                                               NULL);

Example Code for Wayland
------------------------
    // The internal representation of the egl_dpy, created below, remembers that
    // it was derived from a Wayland display.

    struct wl_display *wl_dpy = wl_display_connect(NULL);
    EGLDisplay egl_dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_WAYLAND, wl_dpy);

    EGLConfig config;
    eglChooseConfig(egl_dpy, &config, ...);

    // Since egl_dpy remembers that it was derived from a Wayland display, when
    // creating the EGLSurface below libEGL internally casts the
    // `(void*) wl_win` to `struct wl_egl_window*`.

    struct wl_egl_window *wl_win = wl_egl_window_create(...);
    EGLSurface egl_surface = eglCreatePlatformWindowSurfaceEXT(egl_dpy, config,
                                                               (void*) wl_win,
                                                               NULL);
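To make the "display remembers its platform" idea concrete, here is a self-contained C sketch of how an implementation might dispatch on the stored platform when it receives a void* native window. It deliberately does not use EGL, Xlib or Wayland headers; every type and function name below is a hypothetical stand-in, and real libEGL internals may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the platform enums and native types. */
enum platform_sketch { PLATFORM_X11_SKETCH, PLATFORM_WAYLAND_SKETCH };
typedef unsigned long x11_window_sketch;            /* plays the role of Window */
struct wl_egl_window_sketch { int width, height; }; /* plays wl_egl_window */

struct display_sketch {
   enum platform_sketch platform; /* remembered at display creation */
   void *native_display;
};

struct surface_sketch {
   enum platform_sketch platform;
   x11_window_sketch x11_win;
   struct wl_egl_window_sketch *wl_win;
};

static struct display_sketch
get_platform_display_sketch(enum platform_sketch platform, void *native_display)
{
   struct display_sketch dpy = { platform, native_display };
   return dpy;
}

/* The void* is interpreted according to the platform stored in the display. */
static struct surface_sketch
create_window_surface_sketch(const struct display_sketch *dpy,
                             void *native_window)
{
   struct surface_sketch surf = { PLATFORM_X11_SKETCH, 0, NULL };

   assert(dpy != NULL && native_window != NULL);
   surf.platform = dpy->platform;

   switch (dpy->platform) {
   case PLATFORM_X11_SKETCH:
      /* X11: the caller passed a pointer to the window id */
      surf.x11_win = *(x11_window_sketch *) native_window;
      break;
   case PLATFORM_WAYLAND_SKETCH:
      /* Wayland: the caller passed the window object itself */
      surf.wl_win = (struct wl_egl_window_sketch *) native_window;
      break;
   }
   return surf;
}
```

The point of the sketch is only that the cast lives in one place and is selected by the display, so two displays from different platforms can coexist in one process.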
Re: [Mesa-dev] r600g: status of my work on the shader optimization
Vadim Girlin wrote:

> Could you please test glxgears and other simple mesa demos? It's easier
> to spot the problems with small apps that don't use a lot of complex
> shaders. If some of them don't work correctly, please send me the dumps
> with "R600_DUMP_SHADERS=2 R600_SB_DUMP=3".

All of the mesa demos work with and without llvm.

> Also it might help if you can look for piglit regressions against the
> piglit results with R600_SB=0 and send me the dumps for a few regressed
> tests.

I don't actually have piglit - it was always a pain with cmake to get it
to build on my old 32bit lfs with xorg/mesa installed under home.

I do now have a new 64bit clfs build with everything in normal places - so
maybe I'll give it a go on that - but I don't know how to use it as such.

Even though it's "new" clfs uses gcc 4.6.3 so on there g++ is actually too
old to build your tree - without changing some friends to friends class ...

I don't know when I'll get time to learn piglit but for now here's working
and not nexuiz.

R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-working-dump

http://www.andyqos.ukfsn.org/nexuiz-working-dump

R600_LLVM=0 R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-corrupt-dump

http://www.andyqos.ukfsn.org/nexuiz-corrupt-dump
Re: [Mesa-dev] [PATCH] llvmpipe: fix lp_resource_copy using more than one 3d slice
Am 19.02.2013 10:13, schrieb Jose Fonseca: > Thanks for fixing this Roland. > > This is definitely an improvement. I'd recommend a few tweaks (it could even > be as a follow on change): > > - Calling llvmpipe_flush_resource() in a loop is overkill (it will call > llvmpipe_flush() to be called many times needlessly). Please refactor > llvmpipe_flush_resource() and llvmpipe_is_resource_referenced() to receive > start_layer, end_layer pair. Actually I guess I'll just drop the layer parameter completely. It is passed through another function however in the end it is just unused and thrown away anyway, so it doesn't matter if we check for whole resource or just parts (of course at some point we might want to change this but that's how it looks for now). > > - call util_copy_box instead of util_copy_rect Ah you're right I thought it wouldn't work outside the loop but it should (not that it makes much difference since util_copy_box will just call util_copy_rect repeatedly but it is definitely nicer style). Roland > > Jose > > > - Original Message - >> From: Roland Scheidegger >> >> These used to be illegal a very long time ago, then for some more time >> nothing really emitted these so this code path wasn't hit. >> Just trivially iterate over box->depth. >> (Might be worth refactoring at some point since nowadays all the code >> doesn't really do much except for depth textures.) 
>> >> This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61093 >> --- >> src/gallium/drivers/llvmpipe/lp_surface.c | 170 >> +++-- >> 1 file changed, 86 insertions(+), 84 deletions(-) >> >> diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c >> b/src/gallium/drivers/llvmpipe/lp_surface.c >> index 11475fd..dbaed95 100644 >> --- a/src/gallium/drivers/llvmpipe/lp_surface.c >> +++ b/src/gallium/drivers/llvmpipe/lp_surface.c >> @@ -65,7 +65,7 @@ lp_resource_copy(struct pipe_context *pipe, >> const enum pipe_format format = src_tex->base.format; >> unsigned width = src_box->width; >> unsigned height = src_box->height; >> - assert(src_box->depth == 1); >> + unsigned z; >> >> /* Fallback for buffers. */ >> if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) { >> @@ -74,99 +74,101 @@ lp_resource_copy(struct pipe_context *pipe, >>return; >> } >> >> - llvmpipe_flush_resource(pipe, >> - dst, dst_level, dstz, >> - FALSE, /* read_only */ >> - TRUE, /* cpu_access */ >> - FALSE, /* do_not_block */ >> - "blit dest"); >> - >> - llvmpipe_flush_resource(pipe, >> - src, src_level, src_box->z, >> - TRUE, /* read_only */ >> - TRUE, /* cpu_access */ >> - FALSE, /* do_not_block */ >> - "blit src"); >> - >> - /* >> - printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u >> x %u x %u\n", >> - src_tex->id, src_level, dst_tex->id, dst_level, >> - src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, >> - src_box->width, src_box->height, src_box->depth); >> - */ >> - >> - /* set src tiles to linear layout */ >> - { >> - unsigned tx, ty, tw, th; >> - unsigned x, y; >> - >> - adjust_to_tile_bounds(src_box->x, src_box->y, width, height, >> -&tx, &ty, &tw, &th); >> - >> - for (y = 0; y < th; y += TILE_SIZE) { >> - for (x = 0; x < tw; x += TILE_SIZE) { >> -(void) llvmpipe_get_texture_tile_linear(src_tex, >> -src_box->z, src_level, >> -LP_TEX_USAGE_READ, >> -tx + x, ty + y); >> + for (z = 0; z < src_box->depth; z++){ >> + llvmpipe_flush_resource(pipe, >> + dst, 
dst_level, dstz + z, >> + FALSE, /* read_only */ >> + TRUE, /* cpu_access */ >> + FALSE, /* do_not_block */ >> + "blit dest"); >> + >> + llvmpipe_flush_resource(pipe, >> + src, src_level, src_box->z + z, >> + TRUE, /* read_only */ >> + TRUE, /* cpu_access */ >> + FALSE, /* do_not_block */ >> + "blit src"); >> + >> + /* >> + printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u >> %u x %u x %u\n", >> + src_tex->id, src_level, dst_tex->id, dst_level, >> + src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, >> + src_box->width, src_box->height, src_box->depth); >> + */ >> + >> + /* set src tiles to linear lay
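The remark above that a box copy is just a rect copy repeated per slice can be sketched in plain C as follows. These are hypothetical stand-ins written for this note, not the Gallium util_copy_box/util_copy_rect helpers, and they assume a simple linear layout with byte strides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 2D copy: one memcpy per row of a single slice. */
static void copy_rect_sketch(unsigned char *dst, size_t dst_stride,
                             const unsigned char *src, size_t src_stride,
                             size_t row_bytes, unsigned height)
{
   unsigned y;

   for (y = 0; y < height; y++)
      memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/* Hypothetical 3D copy: iterate over depth and copy each slice as a rect. */
static void copy_box_sketch(unsigned char *dst, size_t dst_stride,
                            size_t dst_slice_stride,
                            const unsigned char *src, size_t src_stride,
                            size_t src_slice_stride,
                            size_t row_bytes, unsigned height, unsigned depth)
{
   unsigned z;

   assert(dst != NULL && src != NULL);

   for (z = 0; z < depth; z++)
      copy_rect_sketch(dst + z * dst_slice_stride, dst_stride,
                       src + z * src_slice_stride, src_stride,
                       row_bytes, height);
}
```

With depth equal to 1 this degenerates to the old single-slice behaviour that the removed assert used to enforce.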
[Mesa-dev] [Bug 38086] Mesa 7.11-devel implementation error: Unexpected program target in destroy_program_variants_cb()
https://bugs.freedesktop.org/show_bug.cgi?id=38086

--- Comment #7 from Laurent carlier ---
(In reply to comment #6)
> Can you make a trace of this issue with apitrace?
> https://github.com/apitrace/apitrace

You can find it here:
http://pkgbuild.com/~lcarlier/traces/hl2_linux.trace.tar.gz

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [PATCH 1/3] radeonsi: use u_box_origin_2d helper function
From: Marek Olšák

[ Cherry-picked from r600g commit b278aba42310e8fa30f2408b9dcd58dbb4901724 ]

Signed-off-by: Michel Dänzer
---
 src/gallium/drivers/radeonsi/r600_texture.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_texture.c b/src/gallium/drivers/radeonsi/r600_texture.c
index e8d9932..d546554 100644
--- a/src/gallium/drivers/radeonsi/r600_texture.c
+++ b/src/gallium/drivers/radeonsi/r600_texture.c
@@ -55,11 +55,8 @@ static void r600_copy_from_staging_texture(struct pipe_context *ctx, struct r600
 	struct pipe_resource *texture = transfer->resource;
 	struct pipe_box sbox;
 
-	sbox.x = sbox.y = sbox.z = 0;
-	sbox.width = transfer->box.width;
-	sbox.height = transfer->box.height;
-	/* XXX that might be wrong */
-	sbox.depth = 1;
+	u_box_origin_2d(transfer->box.width, transfer->box.height, &sbox);
+
 	ctx->resource_copy_region(ctx, texture, transfer->level,
 				  transfer->box.x, transfer->box.y, transfer->box.z,
 				  rtransfer->staging,
-- 
1.8.1.3
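For reference, the five removed assignments and the single helper call are meant to be equivalent. The C sketch below spells that out with a hypothetical box type and helper name; it mirrors what the removed lines did and is not copied from the Gallium headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 3D box: origin plus extent. */
struct box_sketch {
   int x, y, z;
   int width, height, depth;
};

/* Fills a box that starts at the origin and covers a single 2D slice,
 * which is what the removed open-coded assignments did. */
static void box_origin_2d_sketch(int width, int height, struct box_sketch *box)
{
   assert(box != NULL);

   box->x = 0;
   box->y = 0;
   box->z = 0;
   box->width = width;
   box->height = height;
   box->depth = 1;
}
```

Note that depth stays fixed at 1 here, which is the case the removed "XXX that might be wrong" comment was worried about.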
[Mesa-dev] [PATCH 0/3] radeonsi: Cherry-pick transfer fixes from r600g
These together get us 11 more little piglits with Marek's
glTex(Sub)Image improvements in st/mesa.

[PATCH 1/3] radeonsi: use u_box_origin_2d helper function
[PATCH 2/3] radeonsi: add assertions to prevent creation of invalid
[PATCH 3/3] radeonsi: implement 3D transfers
[Mesa-dev] [PATCH 2/3] radeonsi: add assertions to prevent creation of invalid surfaces
From: Marek Olšák [ Cherry-picked from r600g commit ef11ed61a0414d0405c3faf7f48fa3f1d083f82e ] Signed-off-by: Michel Dänzer --- src/gallium/drivers/radeonsi/r600_blit.c | 15 --- src/gallium/drivers/radeonsi/r600_texture.c | 2 ++ src/gallium/drivers/radeonsi/radeonsi_pipe.h | 16 3 files changed, 18 insertions(+), 15 deletions(-) diff --git a/src/gallium/drivers/radeonsi/r600_blit.c b/src/gallium/drivers/radeonsi/r600_blit.c index 35c8f95..0b0eba3 100644 --- a/src/gallium/drivers/radeonsi/r600_blit.c +++ b/src/gallium/drivers/radeonsi/r600_blit.c @@ -98,21 +98,6 @@ static void r600_blitter_end(struct pipe_context *ctx) r600_context_queries_resume(rctx); } -static unsigned u_max_layer(struct pipe_resource *r, unsigned level) -{ - switch (r->target) { - case PIPE_TEXTURE_CUBE: - return 6 - 1; - case PIPE_TEXTURE_3D: - return u_minify(r->depth0, level) - 1; - case PIPE_TEXTURE_1D_ARRAY: - case PIPE_TEXTURE_2D_ARRAY: - return r->array_size - 1; - default: - return 0; - } -} - void si_blit_uncompress_depth(struct pipe_context *ctx, struct r600_resource_texture *texture, struct r600_resource_texture *staging, diff --git a/src/gallium/drivers/radeonsi/r600_texture.c b/src/gallium/drivers/radeonsi/r600_texture.c index d546554..5790974 100644 --- a/src/gallium/drivers/radeonsi/r600_texture.c +++ b/src/gallium/drivers/radeonsi/r600_texture.c @@ -545,6 +545,8 @@ static struct pipe_surface *r600_create_surface(struct pipe_context *pipe, struct r600_surface *surface = CALLOC_STRUCT(r600_surface); unsigned level = surf_tmpl->u.tex.level; + assert(surf_tmpl->u.tex.first_layer <= u_max_layer(texture, surf_tmpl->u.tex.level)); + assert(surf_tmpl->u.tex.last_layer <= u_max_layer(texture, surf_tmpl->u.tex.level)); assert(surf_tmpl->u.tex.first_layer == surf_tmpl->u.tex.last_layer); if (surface == NULL) return NULL; diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h index d0f04f4..8c6d908 100644 --- 
a/src/gallium/drivers/radeonsi/radeonsi_pipe.h +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h @@ -277,4 +277,20 @@ static INLINE uint64_t r600_resource_va(struct pipe_screen *screen, struct pipe_ return rscreen->ws->buffer_get_virtual_address(rresource->cs_buf); } +static INLINE unsigned u_max_layer(struct pipe_resource *r, unsigned level) +{ + switch (r->target) { + case PIPE_TEXTURE_CUBE: + return 6 - 1; + case PIPE_TEXTURE_3D: + return u_minify(r->depth0, level) - 1; + case PIPE_TEXTURE_1D_ARRAY: + case PIPE_TEXTURE_2D_ARRAY: + case PIPE_TEXTURE_CUBE_ARRAY: + return r->array_size - 1; + default: + return 0; + } +} + #endif -- 1.8.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
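The u_max_layer() helper moved by this patch can be tried outside the driver with the following self-contained C sketch. The enum, struct and function names are hypothetical reductions made for this note; the minify helper assumes the usual "shift right by level, but never below 1" behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced texture targets for illustration. */
enum target_sketch {
   TEX_2D_SKETCH,
   TEX_3D_SKETCH,
   TEX_CUBE_SKETCH,
   TEX_1D_ARRAY_SKETCH,
   TEX_2D_ARRAY_SKETCH,
   TEX_CUBE_ARRAY_SKETCH
};

struct resource_sketch {
   enum target_sketch target;
   unsigned depth0;
   unsigned array_size;
};

/* Size of a dimension at a given mip level, clamped to at least 1. */
static unsigned minify_sketch(unsigned value, unsigned levels)
{
   unsigned v = value >> levels;
   return v > 1 ? v : 1;
}

/* Highest valid layer index of a resource at the given mip level. */
static unsigned max_layer_sketch(const struct resource_sketch *r, unsigned level)
{
   assert(r != NULL);

   switch (r->target) {
   case TEX_CUBE_SKETCH:
      return 6 - 1;
   case TEX_3D_SKETCH:
      return minify_sketch(r->depth0, level) - 1;
   case TEX_1D_ARRAY_SKETCH:
   case TEX_2D_ARRAY_SKETCH:
   case TEX_CUBE_ARRAY_SKETCH:
      return r->array_size - 1;
   default:
      return 0;
   }
}
```

The new asserts in r600_create_surface() then simply require first_layer and last_layer to be no larger than this value.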
[Mesa-dev] [PATCH 3/3] radeonsi: implement 3D transfers
From: Marek Olšák That means we can map and read multiple slices with one transfer_map call. [ Cherry-picked from r600g commit 1aebb6911e9aa1bd8900868b58d1750ca83a20c7 ] Signed-off-by: Michel Dänzer --- src/gallium/drivers/radeonsi/r600_texture.c | 49 + 1 file changed, 29 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/radeonsi/r600_texture.c b/src/gallium/drivers/radeonsi/r600_texture.c index 5790974..153df00 100644 --- a/src/gallium/drivers/radeonsi/r600_texture.c +++ b/src/gallium/drivers/radeonsi/r600_texture.c @@ -55,7 +55,7 @@ static void r600_copy_from_staging_texture(struct pipe_context *ctx, struct r600 struct pipe_resource *texture = transfer->resource; struct pipe_box sbox; - u_box_origin_2d(transfer->box.width, transfer->box.height, &sbox); + u_box_3d(0, 0, 0, transfer->box.width, transfer->box.height, transfer->box.depth, &sbox); ctx->resource_copy_region(ctx, texture, transfer->level, transfer->box.x, transfer->box.y, transfer->box.z, @@ -235,7 +235,6 @@ static void *si_texture_transfer_map(struct pipe_context *ctx, { struct r600_context *rctx = (struct r600_context *)ctx; struct r600_resource_texture *rtex = (struct r600_resource_texture*)texture; - struct pipe_resource resource; struct r600_transfer *trans; boolean use_staging_texture = FALSE; struct radeon_winsys_cs_handle *buf; @@ -295,42 +294,52 @@ static void *si_texture_transfer_map(struct pipe_context *ctx, level, level, box->z, box->z + box->depth - 1); trans->transfer.stride = staging_depth->surface.level[level].pitch_bytes; + trans->transfer.layer_stride = staging_depth->surface.level[level].slice_size; trans->offset = r600_texture_get_offset(staging_depth, level, box->z); trans->staging = &staging_depth->resource.b.b; } else if (use_staging_texture) { - resource.target = PIPE_TEXTURE_2D; + struct pipe_resource resource; + struct r600_resource_texture *staging; + + memset(&resource, 0, sizeof(resource)); resource.format = texture->format; resource.width0 = box->width; 
resource.height0 = box->height; resource.depth0 = 1; resource.array_size = 1; - resource.last_level = 0; - resource.nr_samples = 0; resource.usage = PIPE_USAGE_STAGING; - resource.bind = 0; resource.flags = R600_RESOURCE_FLAG_TRANSFER; - /* For texture reading, the temporary (detiled) texture is used as -* a render target when blitting from a tiled texture. */ - if (usage & PIPE_TRANSFER_READ) { - resource.bind |= PIPE_BIND_RENDER_TARGET; - } - /* For texture writing, the temporary texture is used as a sampler -* when blitting into a tiled texture. */ - if (usage & PIPE_TRANSFER_WRITE) { - resource.bind |= PIPE_BIND_SAMPLER_VIEW; + + /* We must set the correct texture target and dimensions if needed for a 3D transfer. */ + if (box->depth > 1 && u_max_layer(texture, level) > 0) + resource.target = texture->target; + else + resource.target = PIPE_TEXTURE_2D; + + switch (resource.target) { + case PIPE_TEXTURE_1D_ARRAY: + case PIPE_TEXTURE_2D_ARRAY: + case PIPE_TEXTURE_CUBE_ARRAY: + resource.array_size = box->depth; + break; + case PIPE_TEXTURE_3D: + resource.depth0 = box->depth; + break; + default:; } /* Create the temporary texture. */ - trans->staging = ctx->screen->resource_create(ctx->screen, &resource); - if (trans->staging == NULL) { + staging = (struct r600_resource_texture*)ctx->screen->resource_create(ctx->screen, &resource); + if (staging == NULL) { R600_ERR("failed to create temporary texture to hold untiled copy\n"); pipe_resource_reference(&trans->transfer.resource, NULL); FREE(trans); return NULL; } - trans->transfer.stride = ((struct r600_resource_texture *)trans->staging) - ->surface.level[0].pitch_bytes; + trans->staging = &staging->resource.b.b; + trans->transfer.stride = staging->surface.level[0].pitch_bytes; + trans->tra
Re: [Mesa-dev] VTK Tests fails with Mesa Swrast passes with OSMesa
On 02/19/2013 09:51 AM, Brian Paul wrote:
>
> Looks like lines, in particular, are missing. I don't see any recent
> changes to swrast/osmesa that would seem to cause this.
>

There probably were none. I'm trying to track down long standing issues.

>
> 1. You do a git-bisect of mesa to find the regression

Since I have no idea when this failure started..

> 2. Make an apitrace of the failing test so I can investigate.
>

http://crab-lab.zool.ohiou.edu/kevin/vtk_apitraces.tar.bz2

vtk_apitraces/vtkTraceSwrast.trace fails
vtk_apitraces/vtkTraceOSMesa.trace passes
Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.
On 19.02.2013 15:57, Jose Fonseca wrote:
> There may be more vertex elements than used in the shader. But why should the
> key contain those elements? Won't this cause needless recompilations (e.g.,
> in situations where the state tracker leaves unneeded elements from previous
> draw?)?

I don't think the state tracker would leave unneeded elements like that
(that is I think the nr_elements would be adjusted if the state tracker
has to figure it out on its own, causing recompiles in any case).
But yes if you set different pipe_vertex_element which only differ in
the unused elements then it will cause unnecessary recompile (I don't
think that's really something which matters here).

>
> That is, it seems to me that the key should have the number of elements from
> pipe_vertex_element information, but only copy those that vertex shader uses.

That doesn't sound very good. If we want to dump the
pipe_vertex_elements like we do now either we need to fix up the
nr_elements or copy all of them. Also vs_generate function seems to
create code for all pipe_vertex_elements, not just those used by the shader.
I guess that instead of using nr_elements we could just use the
information from the shader instead consistently, though I'm actually
unsure this works always - is it somehow possible to only use
vertex_element nr 2 and 4 for instance?
So I think you're suggesting instead of this fix that key->nr_elements
wouldn't be used for anything except the key comparison itself, and
everything else (calculating sampler offset in the key, tgsi dump, code
generation) would use the shader information?

Roland

>
> Jose
>
>
> ----- Original Message -----
>> From: Roland Scheidegger
>>
>> Some parts calculated key size by using shader information, others by using
>> the pipe_vertex_element information.
Since it is perfectly valid to have more >> vertex_elements set than the vertex shader is using those may not be the >> same, >> so we weren't copying over all vertex_element state - this caused the tgsi >> dump >> to assert (iterates over all vertex elements). With some luck it didn't >> crash otherwise even though the llvm generate_fetch code also iterates over >> all vertex elements (probably because llvm threw away the unused inputs >> anyway), >> but if in this situation vertex texturing would be used things would >> definitely >> go wrong (as the sampler information wouldn't be copied). >> So drop the key size calculation using shader information. >> --- >> src/gallium/auxiliary/draw/draw_llvm.c | 13 - >> src/gallium/auxiliary/draw/draw_llvm.h |1 - >> .../draw/draw_pt_fetch_shade_pipeline_llvm.c |7 ++- >> src/gallium/auxiliary/draw/draw_vs_llvm.c |6 -- >> 4 files changed, 14 insertions(+), 13 deletions(-) >> >> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c >> b/src/gallium/auxiliary/draw/draw_llvm.c >> index f3b..df57358 100644 >> --- a/src/gallium/auxiliary/draw/draw_llvm.c >> +++ b/src/gallium/auxiliary/draw/draw_llvm.c >> @@ -420,17 +420,20 @@ draw_llvm_destroy(struct draw_llvm *llvm) >> */ >> struct draw_llvm_variant * >> draw_llvm_create_variant(struct draw_llvm *llvm, >> - unsigned num_inputs, >> - const struct draw_llvm_variant_key *key) >> + unsigned num_inputs, >> + const struct draw_llvm_variant_key *key) >> { >> struct draw_llvm_variant *variant; >> struct llvm_vertex_shader *shader = >>llvm_vertex_shader(llvm->draw->vs.vertex_shader); >> LLVMTypeRef vertex_header; >> + unsigned key_size = draw_llvm_variant_key_size(key->nr_vertex_elements, >> + MAX2(key->nr_samplers, >> + >> key->nr_sampler_views)); >> >> variant = MALLOC(sizeof *variant + >> -shader->variant_key_size - >> -sizeof variant->key); >> +key_size - >> +sizeof variant->key); >> if (variant == NULL) >>return NULL; >> >> @@ -440,7 +443,7 @@ draw_llvm_create_variant(struct draw_llvm 
*llvm, >> >> create_jit_types(variant); >> >> - memcpy(&variant->key, key, shader->variant_key_size); >> + memcpy(&variant->key, key, key_size); >> >> vertex_header = create_jit_vertex_header(variant->gallivm, num_inputs); >> >> diff --git a/src/gallium/auxiliary/draw/draw_llvm.h >> b/src/gallium/auxiliary/draw/draw_llvm.h >> index 17ca304..b20cee5 100644 >> --- a/src/gallium/auxiliary/draw/draw_llvm.h >> +++ b/src/gallium/auxiliary/draw/draw_llvm.h >> @@ -281,7 +281,6 @@ struct draw_llvm_variant >> struct llvm_vertex_shader { >> struct draw_vertex_shader base; >> >> - unsigned variant_key_size; >> struct draw_llvm_variant_list_item variants; >> unsigned variants_created; >> unsigned variants_cached; >> diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c >> b/src/g
Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.
----- Original Message -----
> On 19.02.2013 15:57, Jose Fonseca wrote:
> > There may be more vertex elements than used in the shader. But why should
> > the key contain those elements? Won't this cause needless recompilations
> > (e.g., in situations where the state tracker leaves unneeded elements from
> > previous draw?)?
> I don't think the state tracker would leave unneeded elements like that
> (that is I think the nr_elements would be adjusted if the state tracker
> has to figure it out on its own, causing recompiles in any case).
> But yes if you set different pipe_vertex_element which only differ in
> the unused elements then it will cause unnecessary recompile (I don't
> think that's really something which matters here).
>
> >
> > That is, it seems to me that the key should have the number of elements
> > from pipe_vertex_element information, but only copy those that vertex
> > shader uses.
> That doesn't sound very good. If we want to dump the
> pipe_vertex_elements like we do now either we need to fix up the
> nr_elements or copy all of them. Also vs_generate function seems to
> create code for all pipe_vertex_elements, not just those used by the shader.
> I guess that instead of using nr_elements we could just use the
> information from the shader instead consistently, though I'm actually
> unsure this works always - is it somehow possible to only use
> vertex_element nr 2 and 4 for instance?

Fair enough. Let's get this in as is for now, and keep our eyes open for any
performance regression.

Jose

> So I think you're suggesting instead of this fix that key->nr_elements
> wouldn't be used for anything except the key comparison itself, and
> everything else (calculating sampler offset in the key, tgsi dump, code
> generation) would use the shader information?
Re: [Mesa-dev] [PATCH 0/3] radeonsi: Cherry-pick transfer fixes from r600g
On Tue, Feb 19, 2013 at 12:15 PM, Michel Dänzer wrote:
> These together get us 11 more little piglits with Marek's
> glTex(Sub)Image improvements in st/mesa.
>
> [PATCH 1/3] radeonsi: use u_box_origin_2d helper function
> [PATCH 2/3] radeonsi: add assertions to prevent creation of invalid
> [PATCH 3/3] radeonsi: implement 3D transfers

For the series:

Reviewed-by: Alex Deucher
Re: [Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.
On 19.02.2013 18:54, Jose Fonseca wrote:
>
>
> ----- Original Message -----
>> On 19.02.2013 15:57, Jose Fonseca wrote:
>>> There may be more vertex elements than used in the shader. But why should
>>> the key contain those elements? Won't this cause needless recompilations
>>> (e.g., in situations where the state tracker leaves unneeded elements from
>>> previous draw?)?
>> I don't think the state tracker would leave unneeded elements like that
>> (that is I think the nr_elements would be adjusted if the state tracker
>> has to figure it out on its own, causing recompiles in any case).
>> But yes if you set different pipe_vertex_element which only differ in
>> the unused elements then it will cause unnecessary recompile (I don't
>> think that's really something which matters here).
>>
>>>
>>> That is, it seems to me that the key should have the number of elements
>>> from pipe_vertex_element information, but only copy those that vertex
>>> shader uses.
>> That doesn't sound very good. If we want to dump the
>> pipe_vertex_elements like we do now either we need to fix up the
>> nr_elements or copy all of them. Also vs_generate function seems to
>> create code for all pipe_vertex_elements, not just those used by the shader.
>> I guess that instead of using nr_elements we could just use the
>> information from the shader instead consistently, though I'm actually
>> unsure this works always - is it somehow possible to only use
>> vertex_element nr 2 and 4 for instance?
>
> Fair enough. Let's get this in as is for now, and keep our eyes open for any
> performance regression.

No, I realised you are actually right. The correct thing to do is indeed
just use the shader information for nr_vertex_elements. This is a
simpler change, it gets rid of the unnecessary dumping of unused
elements automatically, and should avoid unnecessary recompiles (even if
that's probably more of a theoretical case). I noticed the shader
generation actually didn't use these values in any case (although it
could (should?)), which is why this worked (so not just by luck).
(Storing nr_vertex_elements in the key is actually unneeded now really
but it's used in quite some places and it looks like a hassle at least
for some of those places to get the shader information instead.)
I'll send out a new patch...

Roland
[Mesa-dev] [PATCH] draw: make sure key size is calculated consistently.
From: Roland Scheidegger Some parts calculated key size by using shader information, others by using the pipe_vertex_element information. Since it is perfectly valid to have more vertex_elements set than the vertex shader is using those may not be the same, so we weren't copying over all vertex_element state - this caused the tgsi dump to assert (iterates over all vertex elements). More importantly in this situation it would also break vertex texturing completely (since the sampler state derived from the key is at a different position than expected). Fix thix by deriving key->nr_vertex_elements from the shader information instead of the pipe_vertex_element state (unlike dx10, we can't have "holes" in pipe_vertex_element state, so this should be safe). (Note that actual llvm shader generation does not use the pipe_vertex_element state from the key itself in any case (althogh I guess it could) but uses the one from draw.pt (which should be the same though contains all elements) instead.) --- src/gallium/auxiliary/draw/draw_llvm.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index f3b..2467e5a 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -420,8 +420,8 @@ draw_llvm_destroy(struct draw_llvm *llvm) */ struct draw_llvm_variant * draw_llvm_create_variant(struct draw_llvm *llvm, -unsigned num_inputs, -const struct draw_llvm_variant_key *key) + unsigned num_inputs, + const struct draw_llvm_variant_key *key) { struct draw_llvm_variant *variant; struct llvm_vertex_shader *shader = @@ -429,8 +429,8 @@ draw_llvm_create_variant(struct draw_llvm *llvm, LLVMTypeRef vertex_header; variant = MALLOC(sizeof *variant + - shader->variant_key_size - - sizeof variant->key); +shader->variant_key_size - +sizeof variant->key); if (variant == NULL) return NULL; @@ -1415,8 +1415,12 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, 
char *store) /* Presumably all variants of the shader should have the same * number of vertex elements - ie the number of shader inputs. +* NOTE: we NEED to store the needed number of needed inputs +* here, not the number of provided elements to match keysize +* (and the offset of sampler state in the key). */ - key->nr_vertex_elements = llvm->draw->pt.nr_vertex_elements; + key->nr_vertex_elements = llvm->draw->vs.vertex_shader->info.file_max[TGSI_FILE_INPUT] + 1; + assert(key->nr_vertex_elements <= llvm->draw->pt.nr_vertex_elements); /* will have to rig this up properly later */ key->clip_xy = llvm->draw->clip_xy; -- 1.7.9.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
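The layout problem described in the commit message can be sketched outside of Mesa. What follows is only a minimal illustration, assuming a key made of a fixed header, then one record per vertex element, then the sampler state. Every `toy_` name is hypothetical and none of this is the real draw_llvm code. It shows why the element count used for sizing and the count stored in the key have to agree: the sampler offset is derived from that count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not Mesa's real structures. */
struct toy_vertex_element { unsigned src_offset; unsigned format; };
struct toy_sampler_state  { unsigned wrap_s; unsigned wrap_t; };

/* Fixed header of a variable-length key. The vertex elements follow the
 * header directly, and the sampler states follow the vertex elements. */
struct toy_key_header {
   unsigned nr_vertex_elements;
   unsigned nr_samplers;
};

/* Byte offset of the sampler block inside the key storage. */
static size_t
toy_sampler_offset(unsigned nr_vertex_elements)
{
   return sizeof(struct toy_key_header) +
          nr_vertex_elements * sizeof(struct toy_vertex_element);
}

/* Total key size for a given element and sampler count. */
static size_t
toy_key_size(unsigned nr_vertex_elements, unsigned nr_samplers)
{
   return toy_sampler_offset(nr_vertex_elements) +
          nr_samplers * sizeof(struct toy_sampler_state);
}

/* Locate the sampler block in raw key storage. A reader that passes a
 * different element count than the writer used looks at the wrong bytes. */
static const struct toy_sampler_state *
toy_key_samplers(const void *store, unsigned nr_vertex_elements)
{
   assert(store != NULL);
   return (const struct toy_sampler_state *)
      ((const char *)store + toy_sampler_offset(nr_vertex_elements));
}
```

In this sketch, a key written with the shader's input count (say 2) but read back with the number of bound elements (say 4) would look for the sampler state two element records too far into the storage.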
[Mesa-dev] [PATCH] st/mesa: fix trimming of GL_QUAD_STRIP
We sometimes convert GL_QUAD_STRIP prims into GL_TRIANGLE_STRIP, but that changes the results of the u_trim_pipe_prim() call. We need to pass the original primitive type to the trim function. Note that OpenGL's GL_x prim type values match Gallium's PIPE_PRIM_x values. Fixes a failure in the new piglit degenerate-prims test. Note: This is a candidate for the stable branches. --- src/mesa/state_tracker/st_draw.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c index de62264..bff8d9b 100644 --- a/src/mesa/state_tracker/st_draw.c +++ b/src/mesa/state_tracker/st_draw.c @@ -283,7 +283,7 @@ st_draw_vbo(struct gl_context *ctx, /* don't trim, restarts might be inside index list */ cso_draw_vbo(st->cso_context, &info); } - else if (u_trim_pipe_prim(info.mode, &info.count)) + else if (u_trim_pipe_prim(prims[i].mode, &info.count)) cso_draw_vbo(st->cso_context, &info); } -- 1.7.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
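Why the primitive type passed to the trim function matters can be shown with a small stand-alone sketch. It assumes the usual trimming rule (a minimum vertex count plus a per-primitive increment) with 4 and 2 for quad strips and 3 and 1 for triangle strips. `toy_trim` and the `TOY_` enum are hypothetical names written for this note and are not the u_trim_pipe_prim() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_prim { TOY_TRIANGLE_STRIP, TOY_QUAD_STRIP };

/* Trim *count down to the largest vertex count that forms whole
 * primitives. Returns false (and zeroes *count) if nothing can be drawn. */
static bool
toy_trim(enum toy_prim prim, unsigned *count)
{
   /* Assumed rule: quad strips need 4 vertices and grow by 2,
    * triangle strips need 3 vertices and grow by 1. */
   unsigned min  = (prim == TOY_QUAD_STRIP) ? 4 : 3;
   unsigned incr = (prim == TOY_QUAD_STRIP) ? 2 : 1;

   assert(count != NULL);

   if (*count < min) {
      *count = 0;
      return false;
   }
   *count -= (*count - min) % incr;
   return true;
}

/* Convenience wrapper returning the trimmed count by value. */
static unsigned
toy_trimmed_count(enum toy_prim prim, unsigned count)
{
   toy_trim(prim, &count);
   return count;
}
```

Under these assumptions a 3-vertex quad strip trims to nothing, while the same 3 vertices trimmed as a triangle strip would still produce a triangle. That is the kind of discrepancy that appears when the converted mode is used for trimming instead of the original one.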
Re: [Mesa-dev] r600g: status of my work on the shader optimization
On 02/19/2013 08:39 PM, Andy Furniss wrote:
> Vadim Girlin wrote:
>> Could you please test glxgears and other simple mesa demos? It's easier to spot the problems with small apps that don't use a lot of complex shaders. If some of them don't work correctly, please send me the dumps with "R600_DUMP_SHADERS=2 R600_SB_DUMP=3".
> All of the mesa demos work with and without llvm.
>> Also it might help if you can look for piglit regressions against the piglit results with R600_SB=0 and send me the dumps for a few regressed tests.
> I don't actually have piglit - it was always a pain with cmake to get it to build on my old 32bit lfs with xorg/mesa installed under home. I do now have a new 64bit clfs build with everything in normal places - so maybe I'll give it a go on that - but I don't know how to use it as such. Even though it's "new" clfs uses gcc 4.6.3 so on there g++ is actually too old to build your tree - without changing some friends to friends class ...

This should be fixed already.

> I don't know when I'll get time to learn piglit but for now here's working and not nexuiz.
> R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-working-dump
> http://www.andyqos.ukfsn.org/nexuiz-working-dump
> R600_LLVM=0 R600_DUMP_SHADERS=2 R600_SB_DUMP=3 nexuiz &> nexuiz-corrupt-dump
> http://www.andyqos.ukfsn.org/nexuiz-corrupt-dump

OK, I already got the dumps with piglit regressions on r700, the dump with nexuiz may also help. Thanks.

Vadim
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] llvmpipe: lp_resource_copy cleanup
From: Roland Scheidegger We don't need to flush resources for each layer, and since we don't actually care about layer at all in the flush function just drop the parameter. Also we can use util_copy_box instead of repeated util_copy_rect. --- src/gallium/drivers/llvmpipe/lp_flush.c |3 +- src/gallium/drivers/llvmpipe/lp_flush.h |1 - src/gallium/drivers/llvmpipe/lp_surface.c | 87 +++-- src/gallium/drivers/llvmpipe/lp_texture.c |3 +- src/gallium/drivers/llvmpipe/lp_texture.h |2 +- 5 files changed, 47 insertions(+), 49 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_flush.c b/src/gallium/drivers/llvmpipe/lp_flush.c index 964b792..cbfe564 100644 --- a/src/gallium/drivers/llvmpipe/lp_flush.c +++ b/src/gallium/drivers/llvmpipe/lp_flush.c @@ -98,7 +98,6 @@ boolean llvmpipe_flush_resource(struct pipe_context *pipe, struct pipe_resource *resource, unsigned level, -int layer, boolean read_only, boolean cpu_access, boolean do_not_block, @@ -106,7 +105,7 @@ llvmpipe_flush_resource(struct pipe_context *pipe, { unsigned referenced; - referenced = llvmpipe_is_resource_referenced(pipe, resource, level, layer); + referenced = llvmpipe_is_resource_referenced(pipe, resource, level); if ((referenced & LP_REFERENCED_FOR_WRITE) || ((referenced & LP_REFERENCED_FOR_READ) && !read_only)) { diff --git a/src/gallium/drivers/llvmpipe/lp_flush.h b/src/gallium/drivers/llvmpipe/lp_flush.h index efff94c..bc1e2a8 100644 --- a/src/gallium/drivers/llvmpipe/lp_flush.h +++ b/src/gallium/drivers/llvmpipe/lp_flush.h @@ -47,7 +47,6 @@ boolean llvmpipe_flush_resource(struct pipe_context *pipe, struct pipe_resource *resource, unsigned level, -int layer, boolean read_only, boolean cpu_access, boolean do_not_block, diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c b/src/gallium/drivers/llvmpipe/lp_surface.c index dbaed95..a83a903 100644 --- a/src/gallium/drivers/llvmpipe/lp_surface.c +++ b/src/gallium/drivers/llvmpipe/lp_surface.c @@ -57,14 +57,12 @@ lp_resource_copy(struct pipe_context *pipe, 
struct pipe_resource *src, unsigned src_level, const struct pipe_box *src_box) { - /* XXX this used to ignore srcz/dstz -* assume it works the same for cube and 3d -*/ struct llvmpipe_resource *src_tex = llvmpipe_resource(src); struct llvmpipe_resource *dst_tex = llvmpipe_resource(dst); const enum pipe_format format = src_tex->base.format; unsigned width = src_box->width; unsigned height = src_box->height; + unsigned depth = src_box->depth; unsigned z; /* Fallback for buffers. */ @@ -74,27 +72,28 @@ lp_resource_copy(struct pipe_context *pipe, return; } + llvmpipe_flush_resource(pipe, + dst, dst_level, + FALSE, /* read_only */ + TRUE, /* cpu_access */ + FALSE, /* do_not_block */ + "blit dest"); + + llvmpipe_flush_resource(pipe, + src, src_level, + TRUE, /* read_only */ + TRUE, /* cpu_access */ + FALSE, /* do_not_block */ + "blit src"); + + /* + printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u x %u x %u\n", + src_tex->id, src_level, dst_tex->id, dst_level, + src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, + src_box->width, src_box->height, src_box->depth); + */ + for (z = 0; z < src_box->depth; z++){ - llvmpipe_flush_resource(pipe, - dst, dst_level, dstz + z, - FALSE, /* read_only */ - TRUE, /* cpu_access */ - FALSE, /* do_not_block */ - "blit dest"); - - llvmpipe_flush_resource(pipe, - src, src_level, src_box->z + z, - TRUE, /* read_only */ - TRUE, /* cpu_access */ - FALSE, /* do_not_block */ - "blit src"); - - /* - printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u x %u x %u\n", - src_tex->id, src_level, dst_tex->id, dst_level, - src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, - src_box->width, src_box->height, src_box->depth); - */ /* set src tiles to linear layout */ { @@ -148,27 +147,29 @@ lp_resource_copy(struct pipe_context *pipe, } }
[Mesa-dev] [PATCH] gallivm: fix indirect src register fetches requiring bitcast
From: Roland Scheidegger For constant and temporary register fetches, the bitcasts weren't done correctly for the indirect case, leading to crashes due to type mismatches. Simply do the bitcasts after fetching (much simpler than fixing up the load pointer for the various cases). This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61036 --- src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 37 ++- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c index ae4a577..69957fe 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c @@ -603,10 +603,10 @@ emit_fetch_constant( LLVMBuilderRef builder = gallivm->builder; struct lp_build_context *uint_bld = &bld_base->uint_bld; LLVMValueRef indirect_index = NULL; - struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype); unsigned dimension = 0; LLVMValueRef dimension_index; LLVMValueRef consts_ptr; + LLVMValueRef res; /* XXX: Handle fetching xyzw components as a vector */ assert(swizzle != ~0); @@ -637,7 +637,7 @@ emit_fetch_constant( index_vec = lp_build_add(uint_bld, index_vec, swizzle_vec); /* Gather values from the constant buffer */ - return build_gather(bld_fetch, consts_ptr, index_vec); + res = build_gather(&bld_base->base, consts_ptr, index_vec); } else { LLVMValueRef index; /* index into the const buffer */ @@ -646,18 +646,16 @@ emit_fetch_constant( index = lp_build_const_int32(gallivm, reg->Register.Index*4 + swizzle); scalar_ptr = LLVMBuildGEP(builder, consts_ptr, - &index, 1, ""); - - if (stype != TGSI_TYPE_FLOAT && stype != TGSI_TYPE_UNTYPED) { - LLVMTypeRef ivtype = LLVMPointerType(LLVMInt32TypeInContext(gallivm->context), 0); - LLVMValueRef temp_ptr; - temp_ptr = LLVMBuildBitCast(builder, scalar_ptr, ivtype, ""); - scalar = LLVMBuildLoad(builder, temp_ptr, ""); - } else - scalar = LLVMBuildLoad(builder, scalar_ptr, ""); +&index, 1, 
""); + scalar = LLVMBuildLoad(builder, scalar_ptr, ""); + res = lp_build_broadcast_scalar(&bld_base->base, scalar); + } - return lp_build_broadcast_scalar(bld_fetch, scalar); + if (stype == TGSI_TYPE_SIGNED || stype == TGSI_TYPE_UNSIGNED) { + struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype); + res = LLVMBuildBitCast(builder, res, bld_fetch->vec_type, ""); } + return res; } static LLVMValueRef @@ -791,16 +789,13 @@ emit_fetch_temporary( } else { LLVMValueRef temp_ptr; - if (stype != TGSI_TYPE_FLOAT && stype != TGSI_TYPE_UNTYPED) { - LLVMTypeRef itype = LLVMPointerType(bld->bld_base.int_bld.vec_type, 0); - LLVMValueRef tint_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, - swizzle); - temp_ptr = LLVMBuildBitCast(builder, tint_ptr, itype, ""); - } else - temp_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle); + temp_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle); res = LLVMBuildLoad(builder, temp_ptr, ""); - if (!res) - return bld->bld_base.base.undef; + } + + if (stype == TGSI_TYPE_SIGNED || stype == TGSI_TYPE_UNSIGNED) { + struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype); + res = LLVMBuildBitCast(builder, res, bld_fetch->vec_type, ""); } return res; -- 1.7.9.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] llvmpipe: lp_resource_copy cleanup
- Original Message - > From: Roland Scheidegger > > We don't need to flush resources for each layer, and since we don't actually > care about layer at all in the flush function just drop the parameter. > Also we can use util_copy_box instead of repeated util_copy_rect. > --- > src/gallium/drivers/llvmpipe/lp_flush.c |3 +- > src/gallium/drivers/llvmpipe/lp_flush.h |1 - > src/gallium/drivers/llvmpipe/lp_surface.c | 87 > +++-- > src/gallium/drivers/llvmpipe/lp_texture.c |3 +- > src/gallium/drivers/llvmpipe/lp_texture.h |2 +- > 5 files changed, 47 insertions(+), 49 deletions(-) > > diff --git a/src/gallium/drivers/llvmpipe/lp_flush.c > b/src/gallium/drivers/llvmpipe/lp_flush.c > index 964b792..cbfe564 100644 > --- a/src/gallium/drivers/llvmpipe/lp_flush.c > +++ b/src/gallium/drivers/llvmpipe/lp_flush.c > @@ -98,7 +98,6 @@ boolean > llvmpipe_flush_resource(struct pipe_context *pipe, > struct pipe_resource *resource, > unsigned level, > -int layer, > boolean read_only, > boolean cpu_access, > boolean do_not_block, > @@ -106,7 +105,7 @@ llvmpipe_flush_resource(struct pipe_context *pipe, > { > unsigned referenced; > > - referenced = llvmpipe_is_resource_referenced(pipe, resource, level, > layer); > + referenced = llvmpipe_is_resource_referenced(pipe, resource, level); > > if ((referenced & LP_REFERENCED_FOR_WRITE) || > ((referenced & LP_REFERENCED_FOR_READ) && !read_only)) { > diff --git a/src/gallium/drivers/llvmpipe/lp_flush.h > b/src/gallium/drivers/llvmpipe/lp_flush.h > index efff94c..bc1e2a8 100644 > --- a/src/gallium/drivers/llvmpipe/lp_flush.h > +++ b/src/gallium/drivers/llvmpipe/lp_flush.h > @@ -47,7 +47,6 @@ boolean > llvmpipe_flush_resource(struct pipe_context *pipe, > struct pipe_resource *resource, > unsigned level, > -int layer, > boolean read_only, > boolean cpu_access, > boolean do_not_block, > diff --git a/src/gallium/drivers/llvmpipe/lp_surface.c > b/src/gallium/drivers/llvmpipe/lp_surface.c > index dbaed95..a83a903 100644 > --- 
a/src/gallium/drivers/llvmpipe/lp_surface.c > +++ b/src/gallium/drivers/llvmpipe/lp_surface.c > @@ -57,14 +57,12 @@ lp_resource_copy(struct pipe_context *pipe, > struct pipe_resource *src, unsigned src_level, > const struct pipe_box *src_box) > { > - /* XXX this used to ignore srcz/dstz > -* assume it works the same for cube and 3d > -*/ > struct llvmpipe_resource *src_tex = llvmpipe_resource(src); > struct llvmpipe_resource *dst_tex = llvmpipe_resource(dst); > const enum pipe_format format = src_tex->base.format; > unsigned width = src_box->width; > unsigned height = src_box->height; > + unsigned depth = src_box->depth; > unsigned z; > > /* Fallback for buffers. */ > @@ -74,27 +72,28 @@ lp_resource_copy(struct pipe_context *pipe, >return; > } > > + llvmpipe_flush_resource(pipe, > + dst, dst_level, > + FALSE, /* read_only */ > + TRUE, /* cpu_access */ > + FALSE, /* do_not_block */ > + "blit dest"); > + > + llvmpipe_flush_resource(pipe, > + src, src_level, > + TRUE, /* read_only */ > + TRUE, /* cpu_access */ > + FALSE, /* do_not_block */ > + "blit src"); > + > + /* > + printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u %u > x %u x %u\n", > + src_tex->id, src_level, dst_tex->id, dst_level, > + src_box->x, src_box->y, src_box->z, dstx, dsty, dstz, > + src_box->width, src_box->height, src_box->depth); > + */ > + > for (z = 0; z < src_box->depth; z++){ > - llvmpipe_flush_resource(pipe, > - dst, dst_level, dstz + z, > - FALSE, /* read_only */ > - TRUE, /* cpu_access */ > - FALSE, /* do_not_block */ > - "blit dest"); > - > - llvmpipe_flush_resource(pipe, > - src, src_level, src_box->z + z, > - TRUE, /* read_only */ > - TRUE, /* cpu_access */ > - FALSE, /* do_not_block */ > - "blit src"); > - > - /* > - printf("surface copy from %u lvl %u to %u lvl %u: %u,%u,%u to %u,%u,%u > %u x %u x %u\n", > - src_tex->id, src_level, dst_tex->id, dst_level, > - src_box->x, src_b
Re: [Mesa-dev] [PATCH libdrm] freedreno: add freedreno DRM
Rob Clark writes: > From: Rob Clark > > The libdrm_freedreno helper layer for use by xf86-video-freedreno, > fdre (freedreno r/e library and tests for driving gpu), and eventual > gallium driver for the Adreno GPU. This uses the msm gpu driver > from QCOM's android kernel tree. > > Note that current msm kernel driver is a bit strange. It provides a > DRM interface for GEM, which is basically sufficient to have DRI2 > working. But it does not provide KMS. And interface to 2d and 3d > cores is via different other devices (/dev/kgsl-*). This is not > quite how I'd write a DRM driver, but at this stage it is useful for > xf86-video-freedreno and fdre (and eventual gallium driver) to be > able to work on existing kernel driver from QCOM, to allow to > capture cmdstream dumps from the binary blob drivers without having > to reboot. So libdrm_freedreno attempts to hide most of the crazy. > The intention is that when there is a proper kernel driver, it will > be mostly just changes in libdrm_freedreno to adapt the gallium > driver and xf86-video-freedreno (ignoring the fbdev->KMS changes). > > So don't look at freedreno as an example of how to write a libdrm > module or a DRM driver.. it is just an attempt to paper over a non- > standard kernel driver architecture. Yeah, at this stage I expect things to be kinda held together with duct tape and baling wire, and it's still worth having the code in git. Acked-by: Eric Anholt ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 59331] piglit arb_uniform_buffer_object-getintegeri_v regression
https://bugs.freedesktop.org/show_bug.cgi?id=59331

Ian Romanick changed:

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED

--- Comment #3 from Ian Romanick ---
Fixed by piglit commit:

commit 7651a69e6c58d4d28373225a67ccac10468f2afe
Author: Ian Romanick
Date: Mon Jan 28 17:04:41 2013 -0800

    GL_ARB_ubo/getintegeri_v: Respect implementation value of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT

    Signed-off-by: Ian Romanick
    Cc: Vinson Lee
    Cc: Fredrik Höglund
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59331

-- You are receiving this mail because: You are the assignee for the bug.
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 59331] piglit arb_uniform_buffer_object-getintegeri_v regression
https://bugs.freedesktop.org/show_bug.cgi?id=59331

Ian Romanick changed:

What |Removed |Added
CC   |        |xunx.f...@intel.com

--- Comment #4 from Ian Romanick ---
*** Bug 61043 has been marked as a duplicate of this bug. ***

-- You are receiving this mail because: You are the assignee for the bug.
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Need bench mark application for Opengles2 on mesa-8.0.4 with Linux
On 02/18/2013 02:31 AM, Ramesh Reddy Emmadi wrote:
> Hi, Can you please let us know is there any benchmark tool for opengles2 API's in Linux and Windows. Thanks and Regards, Ramesh
> CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please

This is a public mailing list, so all of this is bogus. Remove it before posting again. If your mail server is so broken that it cannot do this, use a different e-mail provider. There are many fine, free options available.

> notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Don't install glEvalMesh in the beginend dispatch table
From: Ian Romanick NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Ian Romanick Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59740 Cc: Eric Anholt --- src/mesa/main/eval.c | 11 --- src/mesa/main/eval.h | 4 +++- src/mesa/main/vtxfmt.c | 10 +- 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/src/mesa/main/eval.c b/src/mesa/main/eval.c index 44b5792..b3c2841 100644 --- a/src/mesa/main/eval.c +++ b/src/mesa/main/eval.c @@ -824,7 +824,8 @@ _mesa_MapGrid2d( GLint un, GLdouble u1, GLdouble u2, void _mesa_install_eval_vtxfmt(struct _glapi_table *disp, - const GLvertexformat *vfmt) + const GLvertexformat *vfmt, + bool beginend) { SET_EvalCoord1f(disp, vfmt->EvalCoord1f); SET_EvalCoord1fv(disp, vfmt->EvalCoord1fv); @@ -833,8 +834,12 @@ _mesa_install_eval_vtxfmt(struct _glapi_table *disp, SET_EvalPoint1(disp, vfmt->EvalPoint1); SET_EvalPoint2(disp, vfmt->EvalPoint2); - SET_EvalMesh1(disp, vfmt->EvalMesh1); - SET_EvalMesh2(disp, vfmt->EvalMesh2); + /* glEvalMesh1 and glEvalMesh2 are not allowed between glBegin and glEnd. 
+*/ + if (!beginend) { + SET_EvalMesh1(disp, vfmt->EvalMesh1); + SET_EvalMesh2(disp, vfmt->EvalMesh2); + } } diff --git a/src/mesa/main/eval.h b/src/mesa/main/eval.h index 1b6c704..cfde53f 100644 --- a/src/mesa/main/eval.h +++ b/src/mesa/main/eval.h @@ -39,6 +39,7 @@ #include "main/mfeatures.h" #include "main/mtypes.h" +#include <stdbool.h> #define _MESA_INIT_EVAL_VTXFMT(vfmt, impl) \ @@ -76,7 +77,8 @@ extern GLfloat *_mesa_copy_map_points2d(GLenum target, extern void _mesa_install_eval_vtxfmt(struct _glapi_table *disp, - const GLvertexformat *vfmt); + const GLvertexformat *vfmt, + bool beginend); extern void _mesa_init_eval( struct gl_context *ctx ); extern void _mesa_free_eval_data( struct gl_context *ctx ); diff --git a/src/mesa/main/vtxfmt.c b/src/mesa/main/vtxfmt.c index 347d07d..8669c40 100644 --- a/src/mesa/main/vtxfmt.c +++ b/src/mesa/main/vtxfmt.c @@ -45,7 +45,7 @@ */ static void install_vtxfmt(struct gl_context *ctx, struct _glapi_table *tab, - const GLvertexformat *vfmt) + const GLvertexformat *vfmt, bool beginend) { assert(ctx->Version > 0); @@ -62,7 +62,7 @@ install_vtxfmt(struct gl_context *ctx, struct _glapi_table *tab, } if (ctx->API == API_OPENGL_COMPAT) { - _mesa_install_eval_vtxfmt(tab, vfmt); + _mesa_install_eval_vtxfmt(tab, vfmt, beginend); } if (ctx->API != API_OPENGL_CORE && ctx->API != API_OPENGLES2) { @@ -251,9 +251,9 @@ install_vtxfmt(struct gl_context *ctx, struct _glapi_table *tab, void _mesa_install_exec_vtxfmt(struct gl_context *ctx, const GLvertexformat *vfmt) { - install_vtxfmt( ctx, ctx->Exec, vfmt ); + install_vtxfmt(ctx, ctx->Exec, vfmt, false); if (ctx->BeginEnd) - install_vtxfmt( ctx, ctx->BeginEnd, vfmt ); + install_vtxfmt(ctx, ctx->BeginEnd, vfmt, true); } @@ -265,7 +265,7 @@ void _mesa_install_save_vtxfmt(struct gl_context *ctx, const GLvertexformat *vfmt) { if (_mesa_is_desktop_gl(ctx)) - install_vtxfmt( ctx, ctx->Save, vfmt ); + install_vtxfmt(ctx, ctx->Save, vfmt, false); } -- 1.7.11.7 ___ mesa-dev mailing list 
mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
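The idea behind the patch above, installing an entry point in one dispatch table but not in another, can be sketched with plain function pointers. This is a simplified model and not Mesa's _glapi_table: the `toy_` names are hypothetical, and it assumes that every slot starts out pointing at a stub that reports an invalid operation, so a slot that is never installed keeps reporting the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*toy_entry)(void);

/* A two-slot dispatch table, standing in for a much larger real one. */
struct toy_table {
   toy_entry eval_coord;
   toy_entry eval_mesh;
};

enum { TOY_INVALID_OPERATION = -1, TOY_EVAL_COORD = 1, TOY_EVAL_MESH = 2 };

static int toy_invalid_operation(void) { return TOY_INVALID_OPERATION; }
static int toy_eval_coord(void)        { return TOY_EVAL_COORD; }
static int toy_eval_mesh(void)         { return TOY_EVAL_MESH; }

/* Assumption: every slot defaults to the error stub. */
static void
toy_init_table(struct toy_table *tab)
{
   assert(tab != NULL);
   tab->eval_coord = toy_invalid_operation;
   tab->eval_mesh  = toy_invalid_operation;
}

/* Install the real entry points. The mesh call is skipped for the
 * table used between begin and end, so its error stub stays in place. */
static void
toy_install(struct toy_table *tab, bool beginend)
{
   assert(tab != NULL);
   tab->eval_coord = toy_eval_coord;
   if (!beginend)
      tab->eval_mesh = toy_eval_mesh;
}

/* Build a table for the given mode and call its mesh slot. */
static int
toy_call_eval_mesh(bool beginend)
{
   struct toy_table tab;
   toy_init_table(&tab);
   toy_install(&tab, beginend);
   return tab.eval_mesh();
}

/* Build a table for the given mode and call its coord slot. */
static int
toy_call_eval_coord(bool beginend)
{
   struct toy_table tab;
   toy_init_table(&tab);
   toy_install(&tab, beginend);
   return tab.eval_coord();
}
```

With this arrangement the decision is made once, at install time, rather than being checked on every call.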
Re: [Mesa-dev] [PATCH 2/2] i965: Consign COORD_REPLACE VS hacks to Pre-Gen6.
On 02/16/2013 07:29 AM, Paul Berry wrote: Pre-Gen6, the SF thread requires exact matching between VS output slots (aka VUE slots) and FS input slots, even when the corresponding VS output slot is unused due to being overwritten by point coordinate replacement (glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)). As a result, we have a special hack in the VS to ensure when any texture coordinate is subject to point coordinate replacement, it is always allocated space in the VUE, even if it isn't written to by the VS. This hack isn't needed from Gen6 onwards, since SF (Gen7: SBE) swizzling has the ability to insert the point coordinate into gl_TexCoord[] without needing a corresponding unused VUE slot. Note that no modification of SF setup code is required for this patch--get_attr_override() already does the right thing. However, we make a slight comment change to clarify why this works. In addition to eliminating unnecessary VS recompiles and saving precious URB space on Gen6+, this will save us the trouble of having to adjust this hack when we implement geometry shaders. --- src/mesa/drivers/dri/i965/brw_vs.c| 22 -- src/mesa/drivers/dri/i965/brw_vs.h| 10 ++ src/mesa/drivers/dri/i965/gen6_sf_state.c | 13 - 3 files changed, 34 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c index 0810471..64659c0 100644 --- a/src/mesa/drivers/dri/i965/brw_vs.c +++ b/src/mesa/drivers/dri/i965/brw_vs.c @@ -258,15 +258,17 @@ do_vs_prog(struct brw_context *brw, c.prog_data.inputs_read |= VERT_BIT_EDGEFLAG; } - /* Put dummy slots into the VUE for the SF to put the replaced -* point sprite coords in. We shouldn't need these dummy slots, -* which take up precious URB space, but it would mean that the SF -* doesn't get nice aligned pairs of input coords into output -* coords, which would be a pain to handle. 
-*/ - for (i = 0; i < 8; i++) { - if (c.key.point_coord_replace & (1 << i)) -c.prog_data.outputs_written |= BITFIELD64_BIT(VERT_RESULT_TEX0 + i); + if (intel->gen < 6) { + /* Put dummy slots into the VUE for the SF to put the replaced + * point sprite coords in. We shouldn't need these dummy slots, + * which take up precious URB space, but it would mean that the SF + * doesn't get nice aligned pairs of input coords into output + * coords, which would be a pain to handle. + */ + for (i = 0; i < 8; i++) { + if (c.key.point_coord_replace & (1 << i)) +c.prog_data.outputs_written |= BITFIELD64_BIT(VERT_RESULT_TEX0 + i); + } This looks good to me. I wonder, though, whether we could just move this into compute_vue_map()...just assign_vue_slot() some dummy slots in the gen 4/5 cases. Perhaps as a follow-on (if it's possible at all)? As is, Reviewed-by: Kenneth Graunke } brw_compute_vue_map(brw, &c); @@ -429,7 +431,7 @@ static void brw_upload_vs_prog(struct brw_context *brw) key.clamp_vertex_color = ctx->Light._ClampVertexColor; /* _NEW_POINT */ - if (ctx->Point.PointSprite) { + if (intel->gen < 6 && ctx->Point.PointSprite) { for (i = 0; i < 8; i++) { if (ctx->Point.CoordReplace[i]) key.point_coord_replace |= (1 << i); diff --git a/src/mesa/drivers/dri/i965/brw_vs.h b/src/mesa/drivers/dri/i965/brw_vs.h index 75c8a5f..caa8f7c 100644 --- a/src/mesa/drivers/dri/i965/brw_vs.h +++ b/src/mesa/drivers/dri/i965/brw_vs.h @@ -86,7 +86,17 @@ struct brw_vs_prog_key { GLuint userclip_planes_enabled_gen_4_5:MAX_CLIP_PLANES; GLuint copy_edgeflag:1; + + /** +* For pre-Gen6 hardware, a bitfield indicating which texture coordinates +* are going to be replaced with point coordinates (as a consequence of a +* call to glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)). Because +* our SF thread requires exact matching between VS outputs and FS inputs, +* these texture coordinates will need to be unconditionally included in +* the VUE, even if they aren't written by the vertex shader. 
+*/ GLuint point_coord_replace:8; + GLuint clamp_vertex_color:1; struct brw_sampler_prog_key_data tex; diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c index d88c49a..11c929c 100644 --- a/src/mesa/drivers/dri/i965/gen6_sf_state.c +++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c @@ -78,7 +78,18 @@ get_attr_override(struct brw_vue_map *vue_map, int urb_entry_read_offset, if (slot == -1) { /* This attribute does not exist in the VUE--that means that the vertex - * shader did not write to it. Behavior is undefined in this case, so + * shader did not write to it. This means that either: + * + * (a) This attribute is a texture coordinate, and it is going to be + * replaced with point coordinates (as a co
Re: [Mesa-dev] [PATCH] meta: Allocate texture before initializing texture coordinates
On Fri, Feb 15, 2013 at 11:20 AM, Anuj Phogat wrote: > tex->Sright and tex->Ttop are initialized during texture allocation. > This fixes depth buffer blitting failures in khronos conformance tests > when run on desktop GL 3.0. > > Note: This is a candidate for stable branches. > > Signed-off-by: Anuj Phogat > --- > src/mesa/drivers/common/meta.c | 17 - > 1 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c > index 4e32b50..29a209e 100644 > --- a/src/mesa/drivers/common/meta.c > +++ b/src/mesa/drivers/common/meta.c > @@ -1910,6 +1910,14 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx, >GLuint *tmp = malloc(srcW * srcH * sizeof(GLuint)); > >if (tmp) { > + > + newTex = alloc_texture(depthTex, srcW, srcH, GL_DEPTH_COMPONENT); > + _mesa_ReadPixels(srcX, srcY, srcW, srcH, GL_DEPTH_COMPONENT, > + GL_UNSIGNED_INT, tmp); > + setup_drawpix_texture(ctx, depthTex, newTex, GL_DEPTH_COMPONENT, > + srcW, srcH, GL_DEPTH_COMPONENT, > + GL_UNSIGNED_INT, tmp); > + > /* texcoords (after texture allocation!) */ > { > verts[0].s = 0.0F; > @@ -1928,15 +1936,6 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx, > if (!blit->DepthFP) > init_blit_depth_pixels(ctx); > > - /* maybe change tex format here */ > - newTex = alloc_texture(depthTex, srcW, srcH, GL_DEPTH_COMPONENT); > - > - _mesa_ReadPixels(srcX, srcY, srcW, srcH, > - GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, tmp); > - > - setup_drawpix_texture(ctx, depthTex, newTex, GL_DEPTH_COMPONENT, > srcW, srcH, > - GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, tmp); > - > _mesa_BindProgramARB(GL_FRAGMENT_PROGRAM_ARB, blit->DepthFP); > _mesa_set_enable(ctx, GL_FRAGMENT_PROGRAM_ARB, GL_TRUE); > _mesa_ColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); > -- > 1.7.7.6 > This also fixes https://bugs.freedesktop.org/show_bug.cgi?id=59495 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
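The ordering problem fixed by this patch, reading a value before the call that initializes it, can be shown in miniature. The sketch below is hypothetical (`toy_tex`, `toy_alloc_texture` and the size-ratio formula are assumptions made for illustration, not the meta.c code). It only models the point from the commit message that the right-edge texture coordinate is filled in during allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical texture record: sright is only valid after allocation. */
struct toy_tex {
   float sright;
};

/* Assumed behaviour: allocation computes the right-edge coordinate as
 * the ratio of the requested width to the allocated size. */
static void
toy_alloc_texture(struct toy_tex *tex, int width, int max_size)
{
   assert(tex != NULL && max_size > 0);
   tex->sright = (float)width / (float)max_size;
}

/* Return the s coordinate that would be stored in the vertex data,
 * depending on whether allocation happens before or after the read. */
static float
toy_right_texcoord(bool alloc_first)
{
   struct toy_tex tex = { 0.0f };
   float s;

   if (alloc_first) {
      toy_alloc_texture(&tex, 64, 128);
      s = tex.sright;              /* sees the freshly computed value */
   } else {
      s = tex.sright;              /* stale: still the initial 0.0 */
      toy_alloc_texture(&tex, 64, 128);
   }
   return s;
}
```

In the sketch, only the allocate-first order yields the intended coordinate, which matches the direction in which the patch moves the calls.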
[Mesa-dev] [PATCH] i965: Avoid segfault in gen6_upload_state
This fixes a bug introduced in commit 258453716f001eab1288d99765213 and triggered whenever "rb" is NULL. Fixes bug #59445: [SNB/IVB/HSW Bisected]Oglc draw-buffers2(advanced.blending.none) segfault https://bugs.freedesktop.org/show_bug.cgi?id=59445 --- I don't know under what conditions "rb" might be NULL, but it's clear that it's possible and expected as there is earlier code in this function that checks it, (and sets rb_type specifically in that case). So if someone could help me write a more descriptive commit message, that would be great. Also, I notice that similar code in brw_cc.c uses a different condition here: if (ctx->DrawBuffer->Visual.alphaBits == 0) { So an alternate fix could be to switch to something like that. Please let me know if one version or the other is cleaner, (and both could be made to match). src/mesa/drivers/dri/i965/gen6_cc.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c index d32f636..7ac5d5f 100644 --- a/src/mesa/drivers/dri/i965/gen6_cc.c +++ b/src/mesa/drivers/dri/i965/gen6_cc.c @@ -126,7 +126,7 @@ gen6_upload_blend_state(struct brw_context *brw) * not read the alpha channel, but will instead use the correct * implicit value for alpha. */ - if (!_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE)) + if (rb && !_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE)) { srcRGB = brw_fix_xRGB_alpha(srcRGB); srcA = brw_fix_xRGB_alpha(srcA); -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
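The shape of the fix is a NULL guard placed in front of a dereference. A minimal stand-alone sketch follows, with hypothetical `toy_` names and a boolean in place of the real base-format query. Nothing here is taken from the i965 driver beyond the structure of the condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical renderbuffer: just records whether it has alpha bits. */
struct toy_renderbuffer {
   bool has_alpha;
};

/* Query that dereferences rb, so callers must rule out NULL first. */
static bool
toy_format_has_alpha(const struct toy_renderbuffer *rb)
{
   assert(rb != NULL);
   return rb->has_alpha;
}

/* Guarded condition: the short-circuit && keeps the query from running
 * when there is no renderbuffer at all. */
static bool
toy_needs_alpha_fixup(const struct toy_renderbuffer *rb)
{
   return rb != NULL && !toy_format_has_alpha(rb);
}

/* Helper that builds an optional renderbuffer and evaluates the guard. */
static bool
toy_fixup_for(bool have_rb, bool has_alpha)
{
   struct toy_renderbuffer rb = { has_alpha };
   return toy_needs_alpha_fixup(have_rb ? &rb : NULL);
}
```

Note that with this form a missing renderbuffer means no fixup is applied. Whether that is the right behaviour for the NULL case is exactly the open question raised in the message above.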
[Mesa-dev] [PATCH 1/9] glsl: Consolidate ir_expression constructors that use explicit types.
From: Kenneth Graunke Previously, we had separate constructors for one, two, and four operand expressions. This patch consolidates them into a single constructor which uses NULL default parameters. The unary and binary operator constructors had assertions to verify that the caller supplied the correct number of operands for the expression, but the four-operand version did not. Since get_num_operands for ir_quadop_vector returns the number of vector_elements, we can safely add that without breaking the semantics of ir_quadop_vector. This also paves the way for expressions with three operands. Currently, none can be constructed since get_num_operands() never returns 3. Reviewed-by: Matt Turner --- src/glsl/ir.cpp | 34 ++ src/glsl/ir.h | 13 - 2 files changed, 10 insertions(+), 37 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 954995d..4ccdc42 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -195,34 +195,6 @@ ir_assignment::ir_assignment(ir_rvalue *lhs, ir_rvalue *rhs, this->set_lhs(lhs); } - -ir_expression::ir_expression(int op, const struct glsl_type *type, -ir_rvalue *op0) -{ - assert(get_num_operands(ir_expression_operation(op)) == 1); - this->ir_type = ir_type_expression; - this->type = type; - this->operation = ir_expression_operation(op); - this->operands[0] = op0; - this->operands[1] = NULL; - this->operands[2] = NULL; - this->operands[3] = NULL; -} - -ir_expression::ir_expression(int op, const struct glsl_type *type, -ir_rvalue *op0, ir_rvalue *op1) -{ - assert(((op1 == NULL) && (get_num_operands(ir_expression_operation(op)) == 1)) - || (get_num_operands(ir_expression_operation(op)) == 2)); - this->ir_type = ir_type_expression; - this->type = type; - this->operation = ir_expression_operation(op); - this->operands[0] = op0; - this->operands[1] = op1; - this->operands[2] = NULL; - this->operands[3] = NULL; -} - ir_expression::ir_expression(int op, const struct glsl_type *type, ir_rvalue *op0, ir_rvalue *op1, ir_rvalue *op2, ir_rvalue 
*op3) @@ -234,6 +206,12 @@ ir_expression::ir_expression(int op, const struct glsl_type *type, this->operands[1] = op1; this->operands[2] = op2; this->operands[3] = op3; +#ifndef NDEBUG + int num_operands = get_num_operands(this->operation); + for (int i = num_operands; i < 4; i++) { + assert(this->operands[i] == NULL); + } +#endif } ir_expression::ir_expression(int op, ir_rvalue *op0) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 1e09988..d878bd8 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -1128,25 +1128,20 @@ enum ir_expression_operation { class ir_expression : public ir_rvalue { public: + ir_expression(int op, const struct glsl_type *type, + ir_rvalue *op0, ir_rvalue *op1 = NULL, + ir_rvalue *op2 = NULL, ir_rvalue *op3 = NULL); + /** * Constructor for unary operation expressions */ - ir_expression(int op, const struct glsl_type *type, ir_rvalue *); ir_expression(int op, ir_rvalue *); /** * Constructor for binary operation expressions */ - ir_expression(int op, const struct glsl_type *type, -ir_rvalue *, ir_rvalue *); ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1); - /** -* Constructor for quad operator expressions -*/ - ir_expression(int op, const struct glsl_type *type, -ir_rvalue *, ir_rvalue *, ir_rvalue *, ir_rvalue *); - virtual ir_expression *as_expression() { return this; -- 1.7.8.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
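For anyone skimming the diff, the pattern here can be sketched in isolation: a single constructor with NULL defaults, plus a debug-only check that every operand slot at or past the operation's arity was left empty. This is only an illustration under assumptions; `Node`, `Op`, `Expr` and `num_operands` are hypothetical stand-ins, not the real ir_rvalue / ir_expression classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ir_rvalue and ir_expression_operation.
struct Node {};
enum Op { OP_NEG, OP_ADD, OP_VECTOR4 };

// Assumed arity table for the sketch; Mesa derives this from opcode ranges.
static int num_operands(Op op)
{
   switch (op) {
   case OP_NEG: return 1;
   case OP_ADD: return 2;
   default:     return 4;
   }
}

struct Expr {
   Op operation;
   Node *operands[4];

   // One constructor for every arity; unused trailing operands default to NULL.
   Expr(Op op, Node *op0, Node *op1 = NULL, Node *op2 = NULL, Node *op3 = NULL)
      : operation(op)
   {
      operands[0] = op0;
      operands[1] = op1;
      operands[2] = op2;
      operands[3] = op3;
#ifndef NDEBUG
      // Every slot at or past the arity must have been left empty.
      for (int i = num_operands(op); i < 4; i++)
         assert(operands[i] == NULL);
#endif
   }
};
```

The check only catches callers that pass too many operands; a caller that passes too few non-NULL operands is not detected by this loop.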
[Mesa-dev] [PATCH 2/9] glsl: Rework ir_reader to handle expressions with three operands.
From: Kenneth Graunke Reviewed-by: Matt Turner --- src/glsl/ir_reader.cpp | 45 +++-- 1 files changed, 19 insertions(+), 26 deletions(-) diff --git a/src/glsl/ir_reader.cpp b/src/glsl/ir_reader.cpp index 405e75b..4dec4e8 100644 --- a/src/glsl/ir_reader.cpp +++ b/src/glsl/ir_reader.cpp @@ -676,15 +676,16 @@ ir_reader::read_expression(s_expression *expr) { s_expression *s_type; s_symbol *s_op; - s_expression *s_arg1; + s_expression *s_arg[3]; - s_pattern pat[] = { "expression", s_type, s_op, s_arg1 }; + s_pattern pat[] = { "expression", s_type, s_op, s_arg[0] }; if (!PARTIAL_MATCH(expr, pat)) { ir_read_error(expr, "expected (expression " " [])"); return NULL; } - s_expression *s_arg2 = (s_expression *) s_arg1->next; // may be tail sentinel + s_arg[1] = (s_expression *) s_arg[0]->next; // may be tail sentinel + s_arg[2] = (s_expression *) s_arg[1]->next; // may be tail sentinel or NULL const glsl_type *type = read_type(s_type); if (type == NULL) @@ -697,35 +698,27 @@ ir_reader::read_expression(s_expression *expr) return NULL; } - unsigned num_operands = ir_expression::get_num_operands(op); - if (num_operands == 1 && !s_arg1->next->is_tail_sentinel()) { - ir_read_error(expr, "expected (expression %s )", - s_op->value()); + int num_operands = -3; /* skip "expression" */ + foreach_list(n, &((s_list *) expr)->subexpressions) + ++num_operands; + + int expected_operands = ir_expression::get_num_operands(op); + if (num_operands != expected_operands) { + ir_read_error(expr, "found %d expression operands, expected %d", +num_operands, expected_operands); return NULL; } - ir_rvalue *arg1 = read_rvalue(s_arg1); - ir_rvalue *arg2 = NULL; - if (arg1 == NULL) { - ir_read_error(NULL, "when reading first operand of %s", s_op->value()); - return NULL; - } - - if (num_operands == 2) { - if (s_arg2->is_tail_sentinel() || !s_arg2->next->is_tail_sentinel()) { -ir_read_error(expr, "expected (expression %s " -")", s_op->value()); -return NULL; - } - arg2 = read_rvalue(s_arg2); - if (arg2 == 
NULL) { -ir_read_error(NULL, "when reading second operand of %s", - s_op->value()); -return NULL; + ir_rvalue *arg[3] = {NULL, NULL, NULL}; + for (int i = 0; i < num_operands; i++) { + arg[i] = read_rvalue(s_arg[i]); + if (arg[i] == NULL) { + ir_read_error(NULL, "when reading operand #%d of %s", i, s_op->value()); + return NULL; } } - return new(mem_ctx) ir_expression(op, type, arg1, arg2); + return new(mem_ctx) ir_expression(op, type, arg[0], arg[1], arg[2]); } ir_swizzle * -- 1.7.8.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
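The operand-count check in the new read_expression() can be sketched on its own. The idea, as far as the diff shows, is to count every element of the s-expression list and subtract the three leading elements (the "expression" keyword, the type and the operator), then compare with what the opcode expects. The sketch below models the list as a vector of strings, which is an assumption made purely for illustration; `operand_count_matches` is a hypothetical helper, not part of ir_reader.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// `elems` models the flat child list of an s-expression:
//   "expression", <type>, <operator>, <operand>...
// Returns true when it holds exactly `expected` operands.
static bool operand_count_matches(const std::vector<std::string> &elems,
                                  int expected)
{
   int num_operands = -3; // skip "expression", the type and the operator
   for (std::size_t i = 0; i < elems.size(); i++)
      ++num_operands;
   return num_operands == expected;
}
```

Starting the counter at -3 keeps the loop free of special cases, at the cost of a magic number that needs the comment next to it.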
[Mesa-dev] [PATCH 3/9] glsl: Convert mix() to use a new ir_triop_lrp opcode.
From: Kenneth Graunke Many GPUs have an instruction to do linear interpolation which is more efficient than simply performing the algebra necessary (two multiplies, an add, and a subtract). Pattern matching or peepholing this is more desirable, but can be tricky. By using an opcode, we can at least make shaders which use the mix() built-in get the more efficient behavior. Currently, all consumers lower ir_triop_lrp. Subsequent patches will actually generate different code. v2 [mattst88]: - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a subsequent patch and ir_triop_lrp translated directly. Reviewed-by: Matt Turner --- src/glsl/builtins/ir/mix.ir| 14 +- src/glsl/ir.cpp|4 +++ src/glsl/ir.h |7 + src/glsl/ir_constant_expression.cpp| 13 ++ src/glsl/ir_optimization.h |1 + src/glsl/ir_validate.cpp |6 src/glsl/lower_instructions.cpp| 35 src/mesa/drivers/dri/i965/brw_shader.cpp |3 +- src/mesa/program/ir_to_mesa.cpp|6 - src/mesa/state_tracker/st_glsl_to_tgsi.cpp |1 + 10 files changed, 81 insertions(+), 9 deletions(-) diff --git a/src/glsl/builtins/ir/mix.ir b/src/glsl/builtins/ir/mix.ir index 70ae13c..e666532 100644 --- a/src/glsl/builtins/ir/mix.ir +++ b/src/glsl/builtins/ir/mix.ir @@ -4,49 +4,49 @@ (declare (in) float arg0) (declare (in) float arg1) (declare (in) float arg2)) - ((return (expression float + (expression float * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression float * (var_ref arg1) (var_ref arg2)) + ((return (expression float lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature vec2 (parameters (declare (in) vec2 arg0) (declare (in) vec2 arg1) (declare (in) vec2 arg2)) - ((return (expression vec2 + (expression vec2 * (var_ref arg0) (expression vec2 - (constant float (1.00)) (var_ref arg2))) (expression vec2 * (var_ref arg1) (var_ref arg2)) + ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature vec3 (parameters (declare (in) vec3 arg0) (declare (in) vec3 arg1) 
(declare (in) vec3 arg2)) - ((return (expression vec3 + (expression vec3 * (var_ref arg0) (expression vec3 - (constant float (1.00)) (var_ref arg2))) (expression vec3 * (var_ref arg1) (var_ref arg2)) + ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature vec4 (parameters (declare (in) vec4 arg0) (declare (in) vec4 arg1) (declare (in) vec4 arg2)) - ((return (expression vec4 + (expression vec4 * (var_ref arg0) (expression vec4 - (constant float (1.00)) (var_ref arg2))) (expression vec4 * (var_ref arg1) (var_ref arg2)) + ((return (expression vec4 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature vec2 (parameters (declare (in) vec2 arg0) (declare (in) vec2 arg1) (declare (in) float arg2)) - ((return (expression vec2 + (expression vec2 * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression vec2 * (var_ref arg1) (var_ref arg2)) + ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature vec3 (parameters (declare (in) vec3 arg0) (declare (in) vec3 arg1) (declare (in) float arg2)) - ((return (expression vec3 + (expression vec3 * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression vec3 * (var_ref arg1) (var_ref arg2)) + ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature vec4 (parameters (declare (in) vec4 arg0) (declare (in) vec4 arg1) (declare (in) float arg2)) - ((return (expression vec4 + (expression vec4 * (var_ref arg0) (expression float - (constant float (1.00)) (var_ref arg2))) (expression vec4 * (var_ref arg1) (var_ref arg2)) + ((return (expression vec4 lrp (var_ref arg0) (var_ref arg1) (var_ref arg2) (signature float (parameters diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 4ccdc42..717d6f6 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -416,6 +416,9 @@ ir_expression::get_num_operands(ir_expression_operation op) if (op <= ir_last_binop) return 2; + if (op <= ir_last_triop) + 
return 3; + if (op == ir_quadop_vector) return 4; @@ -502,6 +505,7 @@ static const char *const operator_strs[] = { "pow", "packHalf2x16_split", "ubo_load", + "lrp", "vector", }; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index d878bd8..d63dac1 100644 --- a/src/glsl/ir.h +++ b/src
[Mesa-dev] [PATCH 4/9] glsl: Optimize ir_triop_lrp(x, y, a) with a = 0.0f or 1.0f
---
 src/glsl/opt_algebraic.cpp |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 75948db..952941e 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -186,12 +186,12 @@ ir_algebraic_visitor::swizzle_if_required(ir_expression *expr,
 ir_rvalue *
 ir_algebraic_visitor::handle_expression(ir_expression *ir)
 {
-   ir_constant *op_const[2] = {NULL, NULL};
-   ir_expression *op_expr[2] = {NULL, NULL};
+   ir_constant *op_const[3] = {NULL, NULL, NULL};
+   ir_expression *op_expr[3] = {NULL, NULL, NULL};
    ir_expression *temp;
    unsigned int i;
 
-   assert(ir->get_num_operands() <= 2);
+   assert(ir->get_num_operands() <= 3);
    for (i = 0; i < ir->get_num_operands(); i++) {
       if (ir->operands[i]->type->is_matrix())
          return ir;
@@ -415,6 +415,16 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 
       break;
 
+   case ir_triop_lrp:
+      if (is_vec_zero(op_const[2])) {
+         this->progress = true;
+         return swizzle_if_required(ir, ir->operands[0]);
+      } else if (is_vec_one(op_const[2])) {
+         this->progress = true;
+         return swizzle_if_required(ir, ir->operands[1]);
+      }
+      break;
+
    default:
       break;
    }
-- 
1.7.8.6
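The fold relies on lrp(x, y, a) meaning x * (1 - a) + y * a, so a = 0 yields x and a = 1 yields y. A scalar sketch of that reasoning follows; `lrp` and `lrp_folded` are hypothetical helper names for illustration, and the real pass works on IR constants and vectors rather than floats.

```cpp
#include <cassert>

// Reference semantics assumed for lrp(x, y, a): x * (1 - a) + y * a.
static float lrp(float x, float y, float a)
{
   return x * (1.0f - a) + y * a;
}

// Folded form: skips the arithmetic when a is exactly 0 or 1.
static float lrp_folded(float x, float y, float a)
{
   if (a == 0.0f)
      return x;
   if (a == 1.0f)
      return y;
   return lrp(x, y, a);
}
```

For finite inputs the folded and unfolded forms agree at both endpoints, which is what makes the rewrite safe to apply at compile time when the third operand is a known constant.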
[Mesa-dev] [PATCH 5/9] i965: Add support for emitting the LRP instruction.
From: Kenneth Graunke Like MAD, this is another three-source instruction. Reviewed-by: Matt Turner --- src/mesa/drivers/dri/i965/brw_defines.h |1 + src/mesa/drivers/dri/i965/brw_disasm.c |1 + src/mesa/drivers/dri/i965/brw_eu.h |1 + src/mesa/drivers/dri/i965/brw_eu_emit.c |1 + 4 files changed, 4 insertions(+), 0 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 79cc12f..d0794c8 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -685,6 +685,7 @@ enum opcode { BRW_OPCODE_LINE = 89, BRW_OPCODE_PLN =90, BRW_OPCODE_MAD =91, + BRW_OPCODE_LRP =92, BRW_OPCODE_NOP =126, /* These are compiler backend opcodes that get translated into other diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 50551f4..8736764 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++ b/src/mesa/drivers/dri/i965/brw_disasm.c @@ -50,6 +50,7 @@ const struct opcode_desc opcode_descs[128] = { [BRW_OPCODE_LINE] = { .name = "line", .nsrc = 2, .ndst = 1 }, [BRW_OPCODE_PLN] = { .name = "pln", .nsrc = 2, .ndst = 1 }, [BRW_OPCODE_MAD] = { .name = "mad", .nsrc = 3, .ndst = 1 }, +[BRW_OPCODE_LRP] = { .name = "lrp", .nsrc = 3, .ndst = 1 }, [BRW_OPCODE_SAD2] = { .name = "sad2", .nsrc = 2, .ndst = 1 }, [BRW_OPCODE_SADA2] = { .name = "sada2", .nsrc = 2, .ndst = 1 }, [BRW_OPCODE_DP4] = { .name = "dp4", .nsrc = 2, .ndst = 1 }, diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index adb3c4d..b6e2bee 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -174,6 +174,7 @@ ALU2(DP3) ALU2(DP2) ALU2(LINE) ALU2(PLN) +ALU3(LRP) ALU3(MAD) ROUND(RNDZ) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index f2dcbeb..8cdbb21 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -924,6 +924,7 @@ ALU2(DP2) 
ALU2(LINE) ALU2(PLN) ALU3(MAD) +ALU3(LRP) ROUND(RNDZ) ROUND(RNDE) -- 1.7.8.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/9] i965/fs: Use the LRP instruction for ir_triop_lrp when possible.
From: Kenneth Graunke v2 [mattst88]: - Add BRW_OPCODE_LRP to list of CSE-able expressions. - Fix op_var[] array size. - Rename arguments to emit_lrp to (x, y, a) to clear confusion. - Add LRP function to brw_fs.cpp/.h. - Corrected comment about LRP instruction arguments in emit_lrp. Reviewed-by: Matt Turner --- src/mesa/drivers/dri/i965/brw_fs.cpp |8 src/mesa/drivers/dri/i965/brw_fs.h |2 + .../dri/i965/brw_fs_channel_expressions.cpp| 16 - src/mesa/drivers/dri/i965/brw_fs_cse.cpp |1 + src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 15 +++-- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 35 ++-- src/mesa/drivers/dri/i965/brw_shader.cpp |2 +- 7 files changed, 71 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c1ccd92..bdb6616 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -146,6 +146,13 @@ fs_inst::fs_inst(enum opcode opcode, fs_reg dst, return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1);\ } +#define ALU3(op)\ + fs_inst *\ + fs_visitor::op(fs_reg dst, fs_reg src0, fs_reg src1, fs_reg src2)\ + {\ + return new(mem_ctx) fs_inst(BRW_OPCODE_##op, dst, src0, src1, src2);\ + } + ALU1(NOT) ALU1(MOV) ALU1(FRC) @@ -161,6 +168,7 @@ ALU2(XOR) ALU2(SHL) ALU2(SHR) ALU2(ASR) +ALU3(LRP) /** Gen4 predicated IF. 
*/ fs_inst * diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index d5ebd51..9c1b359 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -285,6 +285,7 @@ public: fs_inst *IF(fs_reg src0, fs_reg src1, uint32_t condition); fs_inst *CMP(fs_reg dst, fs_reg src0, fs_reg src1, uint32_t condition); + fs_inst *LRP(fs_reg dst, fs_reg a, fs_reg y, fs_reg x); fs_inst *DEP_RESOLVE_MOV(int grf); int type_size(const struct glsl_type *type); @@ -360,6 +361,7 @@ public: fs_reg fix_math_operand(fs_reg src); fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0); fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0, fs_reg src1); + void emit_lrp(fs_reg dst, fs_reg x, fs_reg y, fs_reg a); void emit_minmax(uint32_t conditionalmod, fs_reg dst, fs_reg src0, fs_reg src1); bool try_emit_saturate(ir_expression *ir); diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp index ea06225..30d8d9b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp @@ -135,7 +135,7 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir) ir_expression *expr = ir->rhs->as_expression(); bool found_vector = false; unsigned int i, vector_elements = 1; - ir_variable *op_var[2]; + ir_variable *op_var[3]; if (!expr) return visit_continue; @@ -342,6 +342,20 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir) assert(!"not yet supported"); break; + case ir_triop_lrp: + for (i = 0; i < vector_elements; i++) { +ir_rvalue *op0 = get_element(op_var[0], i); +ir_rvalue *op1 = get_element(op_var[1], i); +ir_rvalue *op2 = get_element(op_var[2], i); + +assign(ir, i, new(mem_ctx) ir_expression(expr->operation, + element_type, + op0, + op1, + op2)); + } + break; + case ir_unop_pack_snorm_2x16: case ir_unop_pack_snorm_4x8: case ir_unop_pack_unorm_2x16: diff --git 
a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 70c143a..0b74d2e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -66,6 +66,7 @@ is_expression(const fs_inst *const inst) case BRW_OPCODE_LINE: case BRW_OPCODE_PLN: case BRW_OPCODE_MAD: + case BRW_OPCODE_LRP: case FS_OPCODE_CINTERP: case FS_OPCODE_LINTERP: return true; diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp index 3d1f3b3..38d6332 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp @@ -1082,18 +1082,27 @@ fs_generator::generate_code(exec_list *instructions) break; case BRW_OPCODE_MAD: + case BRW_OPCODE_LRP: { + struct brw_instruction *(*brw_inst)(struct brw_compile *p, +
[Mesa-dev] [PATCH 7/9] i965/fp: Use the LRP instruction for OPCODE_LRP.
---
 src/mesa/drivers/dri/i965/brw_fs_fp.cpp |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_fp.cpp b/src/mesa/drivers/dri/i965/brw_fs_fp.cpp
index 5f5f6a9..50e63da 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_fp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_fp.cpp
@@ -316,14 +316,10 @@ fs_visitor::emit_fragment_program_code()
       case OPCODE_LRP:
          for (int i = 0; i < 4; i++) {
             if (fpi->DstReg.WriteMask & (1 << i)) {
-               fs_reg neg_src0 = regoffset(src[0], i);
-               neg_src0.negate = !neg_src0.negate;
-               fs_reg temp = fs_reg(this, glsl_type::float_type);
-               fs_reg temp2 = fs_reg(this, glsl_type::float_type);
-               emit(ADD(temp, neg_src0, fs_reg(1.0f)));
-               emit(MUL(temp, temp, regoffset(src[2], i)));
-               emit(MUL(temp2, regoffset(src[0], i), regoffset(src[1], i)));
-               emit(ADD(regoffset(dst, i), temp, temp2));
+               fs_reg a = regoffset(src[0], i);
+               fs_reg y = regoffset(src[1], i);
+               fs_reg x = regoffset(src[2], i);
+               emit_lrp(regoffset(dst, i), x, y, a);
             }
          }
          break;
-- 
1.7.8.6
[Mesa-dev] [PATCH 8/9] ir_to_mesa: Translate ir_triop_lrp to OPCODE_LRP.
---
 src/mesa/program/ir_to_mesa.cpp |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 30305d2..5432323 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -1479,7 +1479,10 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
       break;
 
    case ir_triop_lrp:
-      assert(!"ir_triop_lrp should have been lowered.");
+      /* ir_triop_lrp operands are (x, y, a) while
+       * OPCODE_LRP operands are (a, y, x) to match ARB_fragment_program.
+       */
+      emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]);
       break;
 
    case ir_quadop_vector:
@@ -2997,7 +3000,7 @@ _mesa_ir_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
          /* Lowering */
          do_mat_op_to_vec(ir);
          lower_instructions(ir, (MOD_TO_FRACT | DIV_TO_MUL_RCP | EXP_TO_EXP2
-                                 | LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP | LRP_TO_ARITH
+                                 | LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP
                                  | ((options->EmitNoPow) ? POW_TO_EXP2 : 0)));
 
          progress = do_lower_jumps(ir, true, true, options->EmitNoMainReturn, options->EmitNoCont, options->EmitNoLoops) || progress;
-- 
1.7.8.6
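The operand reversal in the new comment is easy to get backwards, so here is a scalar sketch of why op[2], op[1], op[0] is the right order. ARB_fragment_program defines LRP as op0 * op1 + (1 - op0) * op2, so the blend factor comes first there and last in the IR. The function names below are hypothetical and only model the arithmetic, not the emit() machinery.

```cpp
// ARB_fragment_program-style LRP with operands (a, y, x):
//   result = a * y + (1 - a) * x
static float opcode_lrp(float a, float y, float x)
{
   return a * y + (1.0f - a) * x;
}

// GLSL IR-style lrp with operands (x, y, a); forwards them in reverse,
// mirroring emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]).
static float ir_triop_lrp_ref(float x, float y, float a)
{
   return opcode_lrp(a, y, x);
}
```

With a = 0 this returns x and with a = 1 it returns y, matching mix(x, y, a).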
[Mesa-dev] [PATCH 0/9] LRP
This series adds ir_triop_lrp to the IR. A few patches clear the way
since it is the first 3-operand operator. The next patches

 - emit lrp from GLSL's mix() function;
 - optimize away the a = 0.0 and 1.0 cases;
 - add i965 support for emitting the LRP instruction in fragment
   shaders and fragment programs;
 - and directly translate ir_triop_lrp to OPCODE_LRP for IR-to-Mesa.

From Eric's shader-db:

total instructions in shared programs: 1458134 -> 1450661 (-0.51%)
instructions in affected programs: 224094 -> 216621 (-3.33%)

There are some small increases, typically +2 or +4 instructions, in
shader-db. I'll investigate further.
[Mesa-dev] [PATCH 9/9] i965/vs: Assert that ir_triop_lrp was lowered.
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index ae4cf7d..a2bc9f5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1585,6 +1585,10 @@ vec4_visitor::visit(ir_expression *ir)
       break;
    }
 
+   case ir_triop_lrp:
+      assert(!"not reached: should be handled by lrp_to_arith");
+      break;
+
    case ir_quadop_vector:
       assert(!"not reached: should be handled by lower_quadop_vector");
       break;
-- 
1.7.8.6
[Mesa-dev] [Bug 61149] New: Crash on Intel Sandybridge Mobile with Vertex Buffer Objects and select mode OpenGL rendering
https://bugs.freedesktop.org/show_bug.cgi?id=61149 Priority: medium Bug ID: 61149 Assignee: mesa-dev@lists.freedesktop.org Summary: Crash on Intel Sandybridge Mobile with Vertex Buffer Objects and select mode OpenGL rendering Severity: normal Classification: Unclassified OS: All Reporter: kal...@gmail.com Hardware: Other Status: NEW Version: 9.0 Component: Mesa core Product: Mesa Hi, this is an old bug we are encountering in blender. My system is Ubuntu 12.10, Intel® Core™ i5-2410M CPU @ 2.30GHz × 4, 4GB ram, Hybrid NVIDIA GT 540M - Intel(R) Sandybridge Mobile To reproduce, * start blender * enable VBO in user preferences (Ctrl-Alt-U) under the system tab. * try selecting any object on 3D viewport (right clicking on the default Cube for instance) My guess is that it has to do with select mode rendering and vertex buffer object combination. I would do a trunk compile but I am using optimus on my system and I fear I may destabilize something. If it is safe and would help, I could attempt it. 
The backtrace is: #0 0x7fffe26e4a3f in run_vp (ctx=, stage=) at ../../../../../src/mesa/tnl/t_vb_program.c:389 #1 0x7fffe26e192d in _tnl_run_pipeline (ctx=0x57078e0) at ../../../../../src/mesa/tnl/t_pipeline.c:163 #2 0x7fffe26e1f26 in _tnl_draw_prims (ctx=ctx@entry=0x57078e0, arrays=arrays@entry=0x7ffd5760, prim=prim@entry=0x7ffd63e0, nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd5740, min_index=, max_index=0) at ../../../../../src/mesa/tnl/t_draw.c:524 #3 0x7fffe26e2a8f in _tnl_vbo_draw_prims (ctx=ctx@entry=0x57078e0, prim=prim@entry=0x7ffd63e0, nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd5740, index_bounds_valid=index_bounds_valid@entry=1 '\001', min_index=min_index@entry=0, max_index=0, tfb_vertcount=tfb_vertcount@entry=0x0) at ../../../../../src/mesa/tnl/t_draw.c:424 #4 0x7fffe26d3301 in vbo_rebase_prims (ctx=ctx@entry=0x57078e0, arrays=arrays@entry=0x58572f8, prim=prim@entry=0x7ffd63e0, nr_prims=nr_prims@entry=1, ib=0x7ffd5740, ib@entry=0x7ffd63c0, min_index=min_index@entry=4294967295, max_index=max_index@entry=4294967295, draw=draw@entry=0x7fffe26e2a20 <_tnl_vbo_draw_prims>) at ../../../../../src/mesa/vbo/vbo_rebase.c:233 ---Type to continue, or q to quit--- #5 0x7fffe26e1b09 in _tnl_draw_prims (ctx=ctx@entry=0x57078e0, arrays=arrays@entry=0x58572f8, prim=prim@entry=0x7ffd63e0, nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd63c0, min_index=4294967295, max_index=4294967295) at ../../../../../src/mesa/tnl/t_draw.c:467 #6 0x7fffe2b70ac3 in brw_draw_prims (ctx=0x57078e0, prim=0x7ffd63e0, nr_prims=1, ib=0x7ffd63c0, index_bounds_valid=, min_index=4294967295, max_index=4294967295, tfb_vertcount=0x0) at brw_draw.c:581 #7 0x7fffe26cf3aa in vbo_handle_primitive_restart ( ctx=ctx@entry=0x57078e0, prim=prim@entry=0x7ffd63e0, nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd63c0, index_bounds_valid=index_bounds_valid@entry=0 '\000', min_index=min_index@entry=4294967295, max_index=max_index@entry=4294967295) at ../../../../../src/mesa/vbo/vbo_exec_array.c:570 #8 0x7fffe26d0384 
in vbo_validated_drawrangeelements ( ctx=ctx@entry=0x57078e0, mode=mode@entry=1, index_bounds_valid=index_bounds_valid@entry=0 '\000', start=start@entry=4294967295, end=end@entry=4294967295, count=count@entry=24, type=type@entry=5125, indices=indices@entry=0x0, basevertex=basevertex@entry=0, numInstances=numInstances@entry=1, baseInstance=baseInstance@entry=0) at ../../../../../src/mesa/vbo/vbo_exec_array.c:867 ---Type to continue, or q to quit--- #9 0x7fffe26d06f4 in vbo_exec_DrawElements (mode=1, count=24, type=5125, indices=0x0) at ../../../../../src/mesa/vbo/vbo_exec_array.c:997 #10 0x010ed899 in cdDM_drawEdges () #11 0x00d24cf1 in draw_mesh_object_outline.isra.5 () #12 0x00d2e18e in draw_mesh_object () #13 0x00d31d4e in draw_object () #14 0x00d21e80 in view3d_opengl_select () #15 0x00d1720c in mixed_bones_object_selectbuffer () #16 0x00d1a180 in mouse_select () #17 0x00d1aa48 in view3d_select_invoke () #18 0x00c65c52 in wm_operator_invoke () #19 0x00c66cae in wm_handler_operator_call () #20 0x00c66f6d in wm_handlers_do_intern () #21 0x00c676b6 in wm_handlers_do () #22 0x00c67b4b in wm_event_do_handlers () #23 0x00c60c88 in WM_main () #24 0x00c23d65 in main () -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 61149] Crash on Intel Sandybridge Mobile with Vertex Buffer Objects and select mode OpenGL rendering
https://bugs.freedesktop.org/show_bug.cgi?id=61149 --- Comment #1 from Antony Riakiotakis --- Attaching full backtrace. Some information that might prove useful, we are using glDrawElements as a draw call. #0 0x7fffe2900a3f in run_vp (ctx=, stage=) at ../../../../../src/mesa/tnl/t_vb_program.c:389 ptr = 0x8000f7fb0ff4 size = 3 stride = 12 data = 0x8000f7fb0ff4 attr = tnl = 0x37e0450 store = 0x3922b80 VB = 0x37e0bc8 program = machine = 0x3923250 outputs = {0, 0, 1, 0 , 3797438540, 32767, 0, 0, 3801491299, 32767, 43, 6, 0, 0, 43, 0, 57137600, 0, 15, 0, 57129920, 0, 58425864, 0, 0} numOutputs = 1 i = j = __PRETTY_FUNCTION__ = "run_vp" #1 0x7fffe28fd92d in _tnl_run_pipeline (ctx=0x367bbc0) at ../../../../../src/mesa/tnl/t_pipeline.c:163 s = 0x37e0710 tnl = 0x37e0450 i = #2 0x7fffe28fdf26 in _tnl_draw_prims (ctx=ctx@entry=0x367bbc0, arrays=arrays@entry=0x7ffd57f0, prim=prim@entry=0x7ffd6470, nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd57d0, min_index=, max_index=0) at ../../../../../src/mesa/tnl/t_draw.c:524 this_nr_prims = bo = {0x4a27050, 0x0 , 0x7271f328 , 0x0, 0x4c26f20, 0x3739f40, 0x373a028, 0x0, 0x7fffe258504c, 0x0, 0x0, 0x1001e, 0x7fff} nr_bo = 1 inst = tnl = 0x37e0450 max = max_basevertex = i = #3 0x7fffe28fea8f in _tnl_vbo_draw_prims (ctx=ctx@entry=0x367bbc0, prim=prim@entry=0x7ffd6470, nr_prims=nr_prims@entry=1, ib=ib@entry=0x7ffd57d0, index_bounds_valid=index_bounds_valid@entry=1 '\001', min_index=min_index@entry=0, max_index=0, tfb_vertcount=tfb_vertcount@entry=0x0) at ../../../../../src/mesa/tnl/t_draw.c:424 arrays = 0x7ffd57f0 #4 0x7fffe28ef301 in vbo_rebase_prims (ctx=ctx@entry=0x367bbc0, arrays=arrays@entry=0x37cd728, prim=prim@entry=0x7ffd6470, nr_prims=nr_prims@entry=1, ib=0x7ffd57d0, ib@entry=0x7ffd6450, min_index=min_index@entry=4294967295, max_index=max_index@entry=4294967295, draw=draw@entry=0x7fffe28fea20 <_tnl_vbo_draw_prims>) at ../../../../../src/mesa/vbo/vbo_rebase.c:233 tmp_arrays = {{Size = 3, Type = 5126, Format = 6408, Stride = 0, StrideB 
= 12, Ptr = 0xfff4 , Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 12, BufferObj = 0x4a27050, _MaxElement = 36}, {Size = 4, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d1e0 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 16, BufferObj = 0x374ac40, _MaxElement = 0}, { Size = 3, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d1f0 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 12, BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 4, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d200 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 16, BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 1, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d210 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 4, BufferObj = 0x374ac40, _MaxElement = 0}, { Size = 1, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d220 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 4, BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 1, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d230 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 4, BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 1, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d240 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 4, BufferObj = 0x374ac40, _MaxElement = 0}, { Size = 2, Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d250 "", Enabled = 1 '\001', Normalized = 0 '\000', Integer = 0 '\000', InstanceDivisor = 0, _ElementSize = 8, BufferObj = 0x374ac40, _MaxElement = 0}, {Size = 1, 
Type = 5126, Format = 6408, Stride = 0, StrideB = 0, Ptr = 0x367d260 "", Enabled = 1 '\001', Normal
Re: [Mesa-dev] [PATCH] i965: Avoid segfault in gen6_upload_state
On 02/19/2013 04:27 PM, Carl Worth wrote:
> This fixes a bug introduced in commit 258453716f001eab1288d99765213 and
> triggered whenever "rb" is NULL.
>
> Fixes bug #59445:
>
>     [SNB/IVB/HSW Bisected]Oglc draw-buffers2(advanced.blending.none) segfault
>     https://bugs.freedesktop.org/show_bug.cgi?id=59445
> ---
>
> I don't know under what conditions "rb" might be NULL, but it's clear
> that it's possible and expected as there is earlier code in this
> function that checks it, (and sets rb_type specifically in that case).
>
> So if someone could help me write a more descriptive commit message,
> that would be great.
>
> Also, I notice that similar code in brw_cc.c uses a different
> condition here:
>
>     if (ctx->DrawBuffer->Visual.alphaBits == 0) {

I don't know what cases could cause rb to be NULL either. There is code
earlier that checks this case (near the top of the function), so it
doesn't seem to be an error condition. Could this be for the window? Ken
or Eric should know for sure. Either way, ctx->DrawBuffer->Visual
contains either the window configuration or a mirror of the state for
the current FBO. It should always be safe to use that.

Using ctx->DrawBuffer->Visual.alphaBits will ensure that you get the
correct answer even when rb is NULL.

> So an alternate fix could be to switch to something like that. Please
> let me know if one version or the other is cleaner, (and both could be
> made to match).
>
>  src/mesa/drivers/dri/i965/gen6_cc.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c
> index d32f636..7ac5d5f 100644
> --- a/src/mesa/drivers/dri/i965/gen6_cc.c
> +++ b/src/mesa/drivers/dri/i965/gen6_cc.c
> @@ -126,7 +126,7 @@ gen6_upload_blend_state(struct brw_context *brw)
>         * not read the alpha channel, but will instead use the correct
>         * implicit value for alpha.
>         */
> -      if (!_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE))
> +      if (rb && !_mesa_base_format_has_channel(rb->_BaseFormat, GL_TEXTURE_ALPHA_TYPE))
>        {
>           srcRGB = brw_fix_xRGB_alpha(srcRGB);
>           srcA = brw_fix_xRGB_alpha(srcA);
[Mesa-dev] [PATCH 2/4] i965/fs: Improve CSE performance by expiring some available expressions.
We're already walking the list, and we can easily know when something has no reason to be in the list any longer, so take a brief extra step to reduce our worst-case runtime (an oglconform test that emits the maximum instructions in a fragment program). I don't actually know what the worst-case runtime was, because it was too long and I got bored. --- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 44479d8..09c7fa6 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -88,6 +88,7 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb) void *mem_ctx = ralloc_context(this->mem_ctx); + int ip = block->start_ip; for (fs_inst *inst = (fs_inst *)block->start; inst != block->end->next; inst = (fs_inst *) inst->next) { @@ -153,18 +154,33 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb) } } - /* Kill all AEB entries that use the destination. */ foreach_list_safe(entry_node, aeb) { aeb_entry *entry = (aeb_entry *)entry_node; for (int i = 0; i < 3; i++) { +fs_reg *src_reg = &entry->generator->src[i]; + +/* Kill all AEB entries that use the destination we just + * overwrote. + */ if (inst->overwrites_reg(entry->generator->src[i])) { entry->remove(); ralloc_free(entry); break; } + +/* Kill any AEB entries using registers that don't get reused any + * more -- a sure sign they'll fail operands_match(). + */ +if (src_reg->file == GRF && virtual_grf_use[src_reg->reg] < ip) { + entry->remove(); + ralloc_free(entry); + break; +} } } + + ip++; } ralloc_free(mem_ctx); @@ -180,6 +196,8 @@ fs_visitor::opt_cse() { bool progress = false; + calculate_live_intervals(); + cfg_t cfg(this); for (int b = 0; b < cfg.num_blocks; b++) { -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
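The expiry rule can be sketched outside the compiler. The assumption here, which the diff suggests but does not spell out, is that virtual_grf_use[reg] holds the instruction index of the last use of a register; an available expression that reads a register whose last use is already behind the current ip can never match a later instruction. `AebEntry`, `last_use` and `expire_entries` are hypothetical names for this illustration.

```cpp
#include <list>
#include <vector>

// Hypothetical available-expression entry: just the registers it reads.
struct AebEntry {
   int src[3]; // register numbers, -1 for unused slots
};

// Drop every entry that reads a register whose last use is before `ip`.
// `last_use[reg]` is assumed to be the index of the final instruction
// that reads `reg`.
static void expire_entries(std::list<AebEntry> &aeb,
                           const std::vector<int> &last_use, int ip)
{
   for (std::list<AebEntry>::iterator it = aeb.begin(); it != aeb.end(); ) {
      bool dead = false;
      for (int i = 0; i < 3; i++) {
         int reg = it->src[i];
         if (reg >= 0 && last_use[reg] < ip) {
            dead = true;
            break;
         }
      }
      if (dead)
         it = aeb.erase(it); // erase returns the next valid iterator
      else
         ++it;
   }
}
```

Pruning like this keeps the list short on long straight-line programs, so each new instruction is compared against fewer candidates.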
[Mesa-dev] [PATCH 3/4] mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)
We were allocating an adjacency_list entry for every possible interference
that could get created, but that usually doesn't happen.  We can save a lot
of memory by resizing the array on demand.
---
 src/mesa/program/register_allocate.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/program/register_allocate.c b/src/mesa/program/register_allocate.c
index 88793db..5862c78 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -120,6 +120,7 @@ struct ra_node {
     */
    GLboolean *adjacency;
    unsigned int *adjacency_list;
+   unsigned int adjacency_list_size;
    unsigned int adjacency_count;
    /** @} */
 
@@ -307,6 +308,15 @@ static void
 ra_add_node_adjacency(struct ra_graph *g, unsigned int n1, unsigned int n2)
 {
    g->nodes[n1].adjacency[n2] = GL_TRUE;
+
+   if (g->nodes[n1].adjacency_count >=
+       g->nodes[n1].adjacency_list_size) {
+      g->nodes[n1].adjacency_list_size *= 2;
+      g->nodes[n1].adjacency_list = reralloc(g, g->nodes[n1].adjacency_list,
+                                             unsigned int,
+                                             g->nodes[n1].adjacency_list_size);
+   }
+
    g->nodes[n1].adjacency_list[g->nodes[n1].adjacency_count] = n2;
    g->nodes[n1].adjacency_count++;
 }
@@ -326,7 +336,9 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count)
 
    for (i = 0; i < count; i++) {
       g->nodes[i].adjacency = rzalloc_array(g, GLboolean, count);
-      g->nodes[i].adjacency_list = ralloc_array(g, unsigned int, count);
+      g->nodes[i].adjacency_list_size = 4;
+      g->nodes[i].adjacency_list =
+         ralloc_array(g, unsigned int, g->nodes[i].adjacency_list_size);
       g->nodes[i].adjacency_count = 0;
       ra_add_node_adjacency(g, i, i);
       g->nodes[i].reg = NO_REG;
-- 
1.7.10.4
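The grow-on-demand scheme is the usual doubling array. A rough standalone sketch follows; it uses plain malloc/realloc rather than the ralloc/reralloc calls in the patch, and `struct uint_list`, `uint_list_init` and `uint_list_push` are made-up names for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one node's adjacency list. */
struct uint_list {
    unsigned int *data;
    unsigned int size;  /* allocated slots */
    unsigned int count; /* used slots */
};

/* Start small (4 slots, as in the patch) instead of reserving room for
 * every node in the graph. Returns 1 on success, 0 on allocation failure. */
static int uint_list_init(struct uint_list *l)
{
    l->size = 4;
    l->count = 0;
    l->data = malloc(l->size * sizeof(*l->data));
    return l->data != NULL;
}

/* Append a value, doubling the allocation when it is full.
 * Returns 1 on success, 0 on allocation failure (list left unchanged). */
static int uint_list_push(struct uint_list *l, unsigned int value)
{
    assert(l->size > 0);

    if (l->count >= l->size) {
        unsigned int new_size = l->size * 2;
        unsigned int *grown = realloc(l->data, new_size * sizeof(*grown));

        if (!grown)
            return 0;
        l->data = grown;
        l->size = new_size;
    }

    l->data[l->count++] = value;
    return 1;
}
```

Doubling keeps the amortized cost of an append constant while the memory used tracks the number of interferences actually recorded, not the worst case.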
[Mesa-dev] [PATCH 1/4] i965/fs: Improve live variables calculation performance.
We can execute way fewer instructions by doing our boolean manipulation on an "int" of bits at a time, while also reducing our working set size. Reduces compile time of L4D2's slowest shader from 4s to 1.1s (-72.4% +/- 0.2%, n=10) --- .../drivers/dri/i965/brw_fs_live_variables.cpp | 44 +++- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 10 +++-- 2 files changed, 30 insertions(+), 24 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index db8f397..e7de43e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -40,7 +40,7 @@ using namespace brw; */ /** - * Sets up the use[] and def[] arrays. + * Sets up the use[] and def[] bitsets. * * The basic-block-level live variable analysis needs to know which * variables get used before they're completely defined, and which @@ -67,8 +67,8 @@ fs_live_variables::setup_def_use() if (inst->src[i].file == GRF) { int reg = inst->src[i].reg; - if (!bd[b].def[reg]) - bd[b].use[reg] = true; + if (!BITSET_TEST(bd[b].def, reg)) + BITSET_SET(bd[b].use, reg); } } @@ -82,8 +82,8 @@ fs_live_variables::setup_def_use() !inst->force_uncompressed && !inst->force_sechalf) { int reg = inst->dst.reg; - if (!bd[b].use[reg]) - bd[b].def[reg] = true; +if (!BITSET_TEST(bd[b].use, reg)) + BITSET_SET(bd[b].def, reg); } ip++; @@ -107,12 +107,12 @@ fs_live_variables::compute_live_variables() for (int b = 0; b < cfg->num_blocks; b++) { /* Update livein */ -for (int i = 0; i < num_vars; i++) { - if (bd[b].use[i] || (bd[b].liveout[i] && !bd[b].def[i])) { - if (!bd[b].livein[i]) { - bd[b].livein[i] = true; - cont = true; - } +for (int i = 0; i < bitset_words; i++) { +BITSET_WORD new_livein = (bd[b].use[i] | + (bd[b].liveout[i] & ~bd[b].def[i])); + if (new_livein & ~bd[b].livein[i]) { + bd[b].livein[i] |= new_livein; + cont = true; } } @@ -121,9 +121,11 @@ fs_live_variables::compute_live_variables() 
bblock_link *link = (bblock_link *)block_node; bblock_t *block = link->block; - for (int i = 0; i < num_vars; i++) { - if (bd[block->block_num].livein[i] && !bd[b].liveout[i]) { - bd[b].liveout[i] = true; + for (int i = 0; i < bitset_words; i++) { + BITSET_WORD new_liveout = (bd[block->block_num].livein[i] & + ~bd[b].liveout[i]); + if (new_liveout & ~bd[b].liveout[i]) { + bd[b].liveout[i] |= new_liveout; cont = true; } } @@ -140,11 +142,13 @@ fs_live_variables::fs_live_variables(fs_visitor *v, cfg_t *cfg) num_vars = v->virtual_grf_count; bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks); + bitset_words = (ALIGN(v->virtual_grf_count, BITSET_WORDBITS) / + BITSET_WORDBITS); for (int i = 0; i < cfg->num_blocks; i++) { - bd[i].def = rzalloc_array(mem_ctx, bool, num_vars); - bd[i].use = rzalloc_array(mem_ctx, bool, num_vars); - bd[i].livein = rzalloc_array(mem_ctx, bool, num_vars); - bd[i].liveout = rzalloc_array(mem_ctx, bool, num_vars); + bd[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); + bd[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); + bd[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); + bd[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); } setup_def_use(); @@ -208,12 +212,12 @@ fs_visitor::calculate_live_intervals() for (int b = 0; b < cfg.num_blocks; b++) { for (int i = 0; i < num_vars; i++) { -if (livevars.bd[b].livein[i]) { +if (BITSET_TEST(livevars.bd[b].livein, i)) { def[i] = MIN2(def[i], cfg.blocks[b]->start_ip); use[i] = MAX2(use[i], cfg.blocks[b]->start_ip); } -if (livevars.bd[b].liveout[i]) { +if (BITSET_TEST(livevars.bd[b].liveout, i)) { def[i] = MIN2(def[i], cfg.blocks[b]->end_ip); use[i] = MAX2(use[i], cfg.blocks[b]->end_ip); } diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index 5f7e67e..1cde5f4 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ 
-26,6 +26,7 @@ */ #include "brw_fs.h" +#include "main/bitset.h" namespace brw { @@ -36,18 +37,18 @@ struct block_data {
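The core of the speedup is that the livein update runs on a whole word of flags at once. A minimal sketch of that step, outside the driver, might look like the following; `word_t` and `update_livein` are assumptions made for this example and only loosely mirror BITSET_WORD and the loop in compute_live_variables().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int word_t; /* hypothetical stand-in for BITSET_WORD */

/* One dataflow step for a single block:
 *    livein |= use | (liveout & ~def)
 * evaluated one word of bits at a time rather than one bool at a time.
 * Returns 1 if any new bit was set, so the caller knows to iterate again. */
static int update_livein(word_t *livein, const word_t *use,
                         const word_t *def, const word_t *liveout,
                         unsigned int words)
{
    int changed = 0;

    assert(livein != NULL && use != NULL && def != NULL && liveout != NULL);

    for (unsigned int i = 0; i < words; i++) {
        word_t new_livein = use[i] | (liveout[i] & ~def[i]);

        if (new_livein & ~livein[i]) {
            livein[i] |= new_livein;
            changed = 1;
        }
    }

    return changed;
}
```

With 32-bit words the inner loop runs roughly 1/32 as many iterations as the per-variable bool version, and the arrays take about an eighth of the memory if a bool is one byte.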
[Mesa-dev] [PATCH 4/4] mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).
After the previous fix that almost removes an allocation of 4*n^2
bytes, we can use a bitset to reduce another allocation from n^2 bytes
to n^2/8 bytes.

Between the previous commit and this one, the peak heap size for an
oglconform ARB_fragment_program max instructions test on i965 goes from
4GB to 255MB.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
---
 src/mesa/program/register_allocate.c | 12
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/mesa/program/register_allocate.c b/src/mesa/program/register_allocate.c
index 5862c78..a9064c3 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -75,6 +75,7 @@
 #include "main/imports.h"
 #include "main/macros.h"
 #include "main/mtypes.h"
+#include "main/bitset.h"
 #include "register_allocate.h"
 
 #define NO_REG ~0
@@ -118,7 +119,7 @@ struct ra_node {
     * List of which nodes this node interferes with.  This should be
     * symmetric with the other node.
     */
-   GLboolean *adjacency;
+   BITSET_WORD *adjacency;
    unsigned int *adjacency_list;
    unsigned int adjacency_list_size;
    unsigned int adjacency_count;
@@ -307,7 +308,7 @@ ra_set_finalize(struct ra_regs *regs, unsigned int **q_values)
 static void
 ra_add_node_adjacency(struct ra_graph *g, unsigned int n1, unsigned int n2)
 {
-   g->nodes[n1].adjacency[n2] = GL_TRUE;
+   BITSET_SET(g->nodes[n1].adjacency, n2);
 
    if (g->nodes[n1].adjacency_count >=
        g->nodes[n1].adjacency_list_size) {
@@ -335,11 +336,14 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count)
    g->stack = rzalloc_array(g, unsigned int, count);
 
    for (i = 0; i < count; i++) {
-      g->nodes[i].adjacency = rzalloc_array(g, GLboolean, count);
+      int bitset_count = ALIGN(count, BITSET_WORDBITS) / BITSET_WORDBITS;
+      g->nodes[i].adjacency = rzalloc_array(g, BITSET_WORD, bitset_count);
+
       g->nodes[i].adjacency_list_size = 4;
       g->nodes[i].adjacency_list =
          ralloc_array(g, unsigned int, g->nodes[i].adjacency_list_size);
       g->nodes[i].adjacency_count = 0;
+
       ra_add_node_adjacency(g, i, i);
       g->nodes[i].reg = NO_REG;
    }
@@ -358,7 +362,7 @@ void
 ra_add_node_interference(struct ra_graph *g,
                          unsigned int n1, unsigned int n2)
 {
-   if (!g->nodes[n1].adjacency[n2]) {
+   if (!BITSET_TEST(g->nodes[n1].adjacency, n2)) {
       ra_add_node_adjacency(g, n1, n2);
       ra_add_node_adjacency(g, n2, n1);
    }
-- 
1.7.10.4
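To make the n^2 to n^2/8 arithmetic concrete, here is a small sketch of a per-node adjacency row stored as a bitset. The helpers `words_for`, `bit_set`, `bit_test` and the two `row_bytes_*` functions are hypothetical and only imitate what BITSET_SET/BITSET_TEST and the ALIGN() expression do in the patch; they are not the Mesa macros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int word_t; /* hypothetical stand-in for BITSET_WORD */
#define WORD_BITS ((unsigned int)(sizeof(word_t) * CHAR_BIT))

/* Number of words needed to hold `nbits` flags, rounded up. */
static unsigned int words_for(unsigned int nbits)
{
    return (nbits + WORD_BITS - 1) / WORD_BITS;
}

static void bit_set(word_t *set, unsigned int b)
{
    assert(set != NULL);
    set[b / WORD_BITS] |= (word_t)1 << (b % WORD_BITS);
}

static int bit_test(const word_t *set, unsigned int b)
{
    assert(set != NULL);
    return (int)((set[b / WORD_BITS] >> (b % WORD_BITS)) & 1u);
}

/* Bytes for one node's adjacency row with one byte per flag. */
static size_t row_bytes_bool(unsigned int count)
{
    return (size_t)count * sizeof(unsigned char);
}

/* Bytes for the same row stored as a bitset. */
static size_t row_bytes_bitset(unsigned int count)
{
    return (size_t)words_for(count) * sizeof(word_t);
}
```

For 1024 nodes a row shrinks from 1024 bytes to 128, and since there is one row per node the total goes from n^2 to about n^2/8 bytes.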
Re: [Mesa-dev] [PATCH 3/9] glsl: Convert mix() to use a new ir_triop_lrp opcode.
Not much to say about the code (the theory sounds sane), but I was
wondering about the comment. Why did GLSL actually implement this as
x * (1 - a) + y * a? The usual way to do a lerp would be (y - x) * a + x,
i.e. two ops on most GPUs (sub+mad, or sub+mul+add). But I'm wondering if
that sacrifices precision or gets Infs wrong or something (this is the way
the gallivm code implements TGSI_OPCODE_LRP). I guess strict IEEE
conformance would really forbid that optimization though...

Roland

On 20.02.2013 02:03, Matt Turner wrote:
> From: Kenneth Graunke
>
> Many GPUs have an instruction to do linear interpolation which is more
> efficient than simply performing the algebra necessary (two multiplies,
> an add, and a subtract).
>
> Pattern matching or peepholing this is more desirable, but can be
> tricky. By using an opcode, we can at least make shaders which use the
> mix() built-in get the more efficient behavior.
>
> Currently, all consumers lower ir_triop_lrp. Subsequent patches will
> actually generate different code.
>
> v2 [mattst88]:
>    - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
>      subsequent patch and ir_triop_lrp translated directly.
> > Reviewed-by: Matt Turner > --- > src/glsl/builtins/ir/mix.ir| 14 +- > src/glsl/ir.cpp|4 +++ > src/glsl/ir.h |7 + > src/glsl/ir_constant_expression.cpp| 13 ++ > src/glsl/ir_optimization.h |1 + > src/glsl/ir_validate.cpp |6 > src/glsl/lower_instructions.cpp| 35 > > src/mesa/drivers/dri/i965/brw_shader.cpp |3 +- > src/mesa/program/ir_to_mesa.cpp|6 - > src/mesa/state_tracker/st_glsl_to_tgsi.cpp |1 + > 10 files changed, 81 insertions(+), 9 deletions(-) > > diff --git a/src/glsl/builtins/ir/mix.ir b/src/glsl/builtins/ir/mix.ir > index 70ae13c..e666532 100644 > --- a/src/glsl/builtins/ir/mix.ir > +++ b/src/glsl/builtins/ir/mix.ir > @@ -4,49 +4,49 @@ > (declare (in) float arg0) > (declare (in) float arg1) > (declare (in) float arg2)) > - ((return (expression float + (expression float * (var_ref arg0) > (expression float - (constant float (1.00)) (var_ref arg2))) (expression > float * (var_ref arg1) (var_ref arg2)) > + ((return (expression float lrp (var_ref arg0) (var_ref arg1) (var_ref > arg2) > > (signature vec2 > (parameters > (declare (in) vec2 arg0) > (declare (in) vec2 arg1) > (declare (in) vec2 arg2)) > - ((return (expression vec2 + (expression vec2 * (var_ref arg0) > (expression vec2 - (constant float (1.00)) (var_ref arg2))) (expression > vec2 * (var_ref arg1) (var_ref arg2)) > + ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref > arg2) > > (signature vec3 > (parameters > (declare (in) vec3 arg0) > (declare (in) vec3 arg1) > (declare (in) vec3 arg2)) > - ((return (expression vec3 + (expression vec3 * (var_ref arg0) > (expression vec3 - (constant float (1.00)) (var_ref arg2))) (expression > vec3 * (var_ref arg1) (var_ref arg2)) > + ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref > arg2) > > (signature vec4 > (parameters > (declare (in) vec4 arg0) > (declare (in) vec4 arg1) > (declare (in) vec4 arg2)) > - ((return (expression vec4 + (expression vec4 * (var_ref arg0) > (expression vec4 - (constant float (1.00)) (var_ref 
arg2))) (expression > vec4 * (var_ref arg1) (var_ref arg2)) > + ((return (expression vec4 lrp (var_ref arg0) (var_ref arg1) (var_ref > arg2) > > (signature vec2 > (parameters > (declare (in) vec2 arg0) > (declare (in) vec2 arg1) > (declare (in) float arg2)) > - ((return (expression vec2 + (expression vec2 * (var_ref arg0) > (expression float - (constant float (1.00)) (var_ref arg2))) (expression > vec2 * (var_ref arg1) (var_ref arg2)) > + ((return (expression vec2 lrp (var_ref arg0) (var_ref arg1) (var_ref > arg2) > > (signature vec3 > (parameters > (declare (in) vec3 arg0) > (declare (in) vec3 arg1) > (declare (in) float arg2)) > - ((return (expression vec3 + (expression vec3 * (var_ref arg0) > (expression float - (constant float (1.00)) (var_ref arg2))) (expression > vec3 * (var_ref arg1) (var_ref arg2)) > + ((return (expression vec3 lrp (var_ref arg0) (var_ref arg1) (var_ref > arg2) > > (signature vec4 > (parameters > (declare (in) vec4 arg0) > (declare (in) vec4 arg1) > (declare (in) float arg2)) > - ((return (expression vec4 + (expression vec4 * (var_ref arg0) > (expression float - (constant float (1.00)) (var_ref arg2))) (expression > vec4 * (var_ref ar
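On the precision question raised in the reply above, a small single-precision experiment shows one way the two formulations can differ. This is only an illustration of rounding behaviour at the end point a = 1; it does not claim to be the reason the GLSL built-in was written that way. The function names are made up for the example, and the `volatile` is there just to force the difference to be rounded to float on compilers that keep intermediates in wider registers.

```c
#include <assert.h>

/* x * (1 - a) + y * a, the form the mix() built-in expanded to. */
static float lerp_glsl(float x, float y, float a)
{
    assert(a >= 0.0f && a <= 1.0f);
    return x * (1.0f - a) + y * a;
}

/* (y - x) * a + x, the sub+mad form. */
static float lerp_two_op(float x, float y, float a)
{
    volatile float d = y - x; /* rounded to single precision here */

    assert(a >= 0.0f && a <= 1.0f);
    return d * a + x;
}
```

With x = 1e8f, y = 1.0f and a = 1.0f, the first form returns exactly y, because x * 0 is 0 and y * 1 is y. In the second form y - x rounds to -1e8f (the spacing between floats near 1e8 is 8), so the result is 0 rather than 1. For well-scaled inputs such as lerp_two_op(2.0f, 4.0f, 0.5f) both forms agree.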
[Mesa-dev] [PATCH 1/4] i965/fs: Remove duplicate scan_inst->mlen check.
This condition is already checked 20 lines below.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |    5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bdb6616..56358f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2070,11 +2070,6 @@ fs_visitor::compute_to_mrf()
              * into a compute-to-MRF.
              */
 
-            /* SENDs can only write to GRFs, so no compute-to-MRF. */
-            if (scan_inst->mlen) {
-               break;
-            }
-
             /* If it's predicated, it (probably) didn't populate all
              * the channels.  We might be able to rewrite everything
              * that writes that reg, but it would require smarter
-- 
1.7.8.6
[Mesa-dev] [PATCH 2/4] i965/gen7: Relax restrictions on fake MRFs.
Gen6 has write-only MRF registers, and for ease of implementation we
partition off 16 general purpose registers to act as MRFs on Gen7.

Knowing that our Gen7 MRFs are actually GRFs, we can potentially do
things we can't do with real MRFs:
   - read from them;
   - return values directly to them from a send instruction; and
   - compute directly to them with math instructions.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c |    6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 8cdbb21..8ed8c4a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1709,7 +1709,8 @@ void brw_math( struct brw_compile *p,
    if (intel->gen >= 6) {
       struct brw_instruction *insn = next_insn(p, BRW_OPCODE_MATH);
 
-      assert(dest.file == BRW_GENERAL_REGISTER_FILE);
+      assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+             (intel->gen >= 7 && dest.file == BRW_MESSAGE_REGISTER_FILE));
       assert(src.file == BRW_GENERAL_REGISTER_FILE);
 
       assert(dest.hstride == BRW_HORIZONTAL_STRIDE_1);
@@ -1773,7 +1774,8 @@ void brw_math2(struct brw_compile *p,
    (void) intel;
 
 
-   assert(dest.file == BRW_GENERAL_REGISTER_FILE);
+   assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
+          (intel->gen >= 7 && dest.file == BRW_MESSAGE_REGISTER_FILE));
    assert(src0.file == BRW_GENERAL_REGISTER_FILE);
    assert(src1.file == BRW_GENERAL_REGISTER_FILE);
 
-- 
1.7.8.6
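The relaxed assertion amounts to a small predicate on the hardware generation and the destination register file. As a sketch only, with a made-up enum and function name rather than the real brw definitions:

```c
#include <assert.h>

/* Hypothetical register-file tags for this example. */
enum reg_file {
    REG_FILE_GRF, /* general register file */
    REG_FILE_MRF  /* message register file */
};

/* A MATH destination must be a GRF. From Gen7 on, MRFs are GRFs set aside
 * for messages, so they are acceptable as well. Returns 1 if allowed. */
static int math_dest_ok(int gen, enum reg_file file)
{
    assert(gen > 0);
    return file == REG_FILE_GRF || (gen >= 7 && file == REG_FILE_MRF);
}
```

So math_dest_ok(6, REG_FILE_MRF) is 0 while math_dest_ok(7, REG_FILE_MRF) is 1, which is the case the later patches in this series rely on.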
[Mesa-dev] [PATCH 3/4] i965/fs/gen7: Allow MATH instructions to have MRF as a destination.
total instructions in shared programs: 1376297 -> 1375626 (-0.05%)
instructions in affected programs:     35977 -> 35306 (-1.87%)
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 56358f7..999be86 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2090,7 +2090,7 @@ fs_visitor::compute_to_mrf()
             if (scan_inst->mlen)
                break;
 
-            if (intel->gen >= 6) {
+            if (intel->gen < 7) {
                /* gen6 math instructions must have the destination be
                 * GRF, so no compute-to-MRF for them.
                 */
-- 
1.7.8.6
[Mesa-dev] [PATCH 4/4] i965/vs/gen7: Allow MATH instructions to have MRF as a destination.
total instructions in shared programs: 346873 -> 346847 (-0.01%)
instructions in affected programs:     364 -> 338 (-7.14%)

(All affected shaders are from Lightsmark)
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index d5b7cb7..6454f82 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -753,7 +753,7 @@ vec4_visitor::opt_register_coalesce()
             if (scan_inst->mlen)
                break;
 
-            if (intel->gen >= 6) {
+            if (intel->gen < 7) {
                /* gen6 math instructions must have the destination be
                 * GRF, so no compute-to-MRF for them.
                 */
-- 
1.7.8.6
[Mesa-dev] [Bug 61153] New: [softpipe] piglit interpolation-noperspective-gl_BackColor-flat-vertex regression
https://bugs.freedesktop.org/show_bug.cgi?id=61153

Priority: medium
Bug ID: 61153
Keywords: regression
CC: bri...@vmware.com
Assignee: mesa-dev@lists.freedesktop.org
Summary: [softpipe] piglit interpolation-noperspective-gl_BackColor-flat-vertex regression
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: v...@freedesktop.org
Hardware: x86-64 (AMD64)
Status: NEW
Version: git
Component: Other
Product: Mesa

mesa: 076403c30d9f5cc79374e30d9f6007b08a63bf2d (master)

$ ./bin/shader_runner generated_tests/spec/glsl-1.30/execution/interpolation/interpolation-noperspective-gl_BackColor-flat-vertex.shader_test -auto
Mesa warning: failed to remap glClampColorARB
Mesa warning: failed to remap glTexBufferARB
Mesa warning: failed to remap glFramebufferTextureARB
Mesa warning: failed to remap glVertexAttribDivisorARB
Mesa warning: failed to remap glProgramParameteriARB
Probe at (159,45)
  Expected: 0.272727 0.181818 0.545455 1.00
  Observed: 0.474510 0.160784 0.364706 1.00
Probe at (192,38)
  Expected: 0.153846 0.153846 0.692308 1.00
  Observed: 0.27 0.13 0.596078 1.00
Probe at (216,33)
  Expected: 0.07 0.13 0.80 1.00
  Observed: 0.117647 0.117647 0.764706 1.00
Probe at (166,83)
  Expected: 0.17 0.33 0.50 1.00
  Observed: 0.294118 0.294118 0.415686 1.00
Probe at (196,71)
  Expected: 0.071429 0.285714 0.642857 1.00
  Observed: 0.125490 0.250980 0.627451 1.00
Probe at (136,136)
  Expected: 0.181818 0.545455 0.272727 1.00
  Observed: 0.317647 0.478431 0.203922 1.00
Probe at (173,115)
  Expected: 0.076923 0.461538 0.461538 1.00
  Observed: 0.129412 0.403922 0.462745 1.00
Probe at (145,166)
  Expected: 0.08 0.67 0.25 1.00
  Observed: 0.149020 0.603922 0.247059 1.00
PIGLIT: {'result': 'fail' }

4a938ef7136a89c828ebb16effe1bc5bea08b7d7 is the first bad commit
commit 4a938ef7136a89c828ebb16effe1bc5bea08b7d7
Author: Brian Paul
Date:   Mon Jan 21 11:32:49 2013 -0700

    draw: add new debug code and comments in clip code template

    In debug builds, set clipped vertex window coordinates to NaN values to
    help debugging.  Otherwise, we're just leaving the coordinate in clip
    space and it's invalid to use it later expecting it to be a window coord.

    Reviewed-by: José Fonseca

:04 04 1c9a933b785f0a56ebf2f13874aacffc4183e976 b87acb6dca3e64abd30f9da1d7483ac61364fb9e M src
bisect run success

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH] mesa: Don't install glEvalMesh in the beginend dispatch table
Ian Romanick writes:

> From: Ian Romanick
>
> NOTE: This is a candidate for the 9.1 branch.
>
> Signed-off-by: Ian Romanick
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59740
> Cc: Eric Anholt

I had "make a GL 1.1 testcase like gl-1.0/beginend" on the back burner
because of this, but I'll take a patch even before then.

Reviewed-by: Eric Anholt
[Mesa-dev] [PATCH] radeonsi: Fix memory leak in si_shader_select.
Fixes resource leak defect reported by Coverity.

Signed-off-by: Vinson Lee
---
 src/gallium/drivers/radeonsi/si_state.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index d20e3ff..7f76cb5 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1940,6 +1940,7 @@ int si_shader_select(struct pipe_context *ctx,
 			R600_ERR("Failed to build shader variant (type=%u) %d\n",
 				 sel->type, r);
 			sel->current = NULL;
+			FREE(shader);
 			return r;
 		}
 
-- 
1.8.1.2
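The defect class here is an early return on an error path after an allocation has been made. A stripped-down sketch of the same shape is below; `struct variant`, `struct selector`, `build_variant` and `select_variant` are invented for this example and do not correspond to the radeonsi structures.

```c
#include <assert.h>
#include <stdlib.h>

struct variant { int key; };                   /* hypothetical shader variant */
struct selector { struct variant *current; };  /* hypothetical selector state */

/* Stand-in for the build step; in this sketch it fails for negative keys. */
static int build_variant(struct variant *v, int key)
{
    if (key < 0)
        return -1;
    v->key = key;
    return 0;
}

/* Allocate and build a variant. On success the selector takes ownership.
 * On failure the fresh allocation is released before returning, which is
 * the step the patch adds. */
static int select_variant(struct selector *sel, int key)
{
    struct variant *v;
    int r;

    assert(sel != NULL);

    v = calloc(1, sizeof(*v));
    if (!v)
        return -1;

    r = build_variant(v, key);
    if (r) {
        sel->current = NULL;
        free(v); /* without this, v leaks on every failed build */
        return r;
    }

    sel->current = v;
    return 0;
}
```

The rule of thumb is that every exit between the allocation and the hand-off of ownership needs a matching release.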