Re: [Mesa-dev] [PATCH] i965: Implement ARB_texture_mirror_clamp.
I have one minor nitpick (see below). But either way, with the subject fixed (as mentioned by Matt), this is: Reviewed-by: Rico Schüller On 21.10.2013 07:24, Kenneth Graunke wrote: > This passes Piglit's texwrap tests (after applying Rico's patch to > make them use this extension). > > Cc: Rico Schüller > Cc: Ian Romanick > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 2 ++ > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > 2 files changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c > b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c > index b716d61..db7ab60 100644 > --- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c > +++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c > @@ -71,6 +71,8 @@ translate_wrap_mode(GLenum wrap, bool using_nearest) >return BRW_TEXCOORDMODE_CLAMP_BORDER; > case GL_MIRRORED_REPEAT: >return BRW_TEXCOORDMODE_MIRROR; > + case GL_MIRROR_CLAMP_TO_EDGE_EXT: > + return BRW_TEXCOORDMODE_MIRROR_ONCE; I'd prefer GL_MIRROR_CLAMP_TO_EDGE instead of GL_MIRROR_CLAMP_TO_EDGE_EXT but as it is the same value it really shouldn't matter. > default: >return BRW_TEXCOORDMODE_WRAP; > } > diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c > b/src/mesa/drivers/dri/i965/intel_extensions.c > index 803d090..87cc87d 100644 > --- a/src/mesa/drivers/dri/i965/intel_extensions.c > +++ b/src/mesa/drivers/dri/i965/intel_extensions.c > @@ -75,6 +75,7 @@ intelInitExtensions(struct gl_context *ctx) > ctx->Extensions.ARB_texture_env_crossbar = true; > ctx->Extensions.ARB_texture_env_dot3 = true; > ctx->Extensions.ARB_texture_float = true; > + ctx->Extensions.ARB_texture_mirror_clamp_to_edge = true; > ctx->Extensions.ARB_texture_non_power_of_two = true; > ctx->Extensions.ARB_texture_rg = true; > ctx->Extensions.ARB_texture_rgb10_a2ui = true; > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
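For readers new to this wrap mode: mirror-clamp-to-edge mirrors the texture once around zero and then clamps. That matches the name of the hardware mode the patch maps it to, BRW_TEXCOORDMODE_MIRROR_ONCE. Below is a minimal CPU-side sketch of nearest-texel selection under that rule. It is based on my reading of the GL wrap-mode definition: mirror(a) = a for a >= 0, otherwise -(1 + a), followed by a clamp to [0, size - 1]. The function names are hypothetical; this is not the i965 or Mesa code.

```c
#include <assert.h>

/* floor() for the small coordinates used here, without needing libm. */
static int floor_to_int(float u)
{
    int i = (int)u;                 /* truncates toward zero */
    return (float)i > u ? i - 1 : i;
}

/* mirror(a): identity for a >= 0, otherwise reflect so -1 -> 0, -2 -> 1. */
static int mirror_index(int a)
{
    return a >= 0 ? a : -(1 + a);
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Texel index chosen by NEAREST filtering under mirror-clamp-to-edge,
 * for an unnormalized coordinate u (u = s * size). */
static int mirror_clamp_to_edge_nearest(float u, int size)
{
    assert(size > 0);
    return clamp_int(mirror_index(floor_to_int(u)), 0, size - 1);
}
```

Coordinates in [-size, 0) see a mirrored copy of the texture. Anything farther out in either direction sticks to the edge texel.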
Re: [Mesa-dev] [PATCH] gallium: new, unified pipe_context::set_sampler_views() function
On 16/10/13 03:23, Emil Velikov wrote: > On 08/10/13 01:23, Brian Paul wrote: [...] >> This change touches quite a few files. I've probably missed >> something in drivers or state trackers that I can't test. >> Please test if you're able. Thanks. >> --- > Will run a quick piglit with and w/o on my nv50 this weekend. > Running a quick piglit test has proven a bit trickier than last time. With that said, I see no regressions on my nv50, apart from the 2-3 tests with somewhat random results :) Tested-by: Emil Velikov Cheers Emil
[Mesa-dev] [PATCH] docs: Update docs for ARB_texture_mirror_clamp_to_edge.
Signed-off-by: Rico Schüller --- docs/GL3.txt | 2 +- docs/relnotes/10.0.html | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index a56e7fe..e8e797a 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -173,7 +173,7 @@ ARB_clear_texture not started ARB_enhanced_layouts not started ARB_multi_bind not started ARB_query_buffer_object not started -ARB_texture_mirror_clamp_to_edge not started +ARB_texture_mirror_clamp_to_edge DONE (i965, nv30, nv50, nvc0, r300, r600, radeonsi, swrast) ARB_texture_stencil8 not started ARB_vertex_type_10f_11f_11f_rev not started diff --git a/docs/relnotes/10.0.html b/docs/relnotes/10.0.html index 0b25f49..ef550d1 100644 --- a/docs/relnotes/10.0.html +++ b/docs/relnotes/10.0.html @@ -48,6 +48,7 @@ Note: some of the new features are only available with certain drivers. GL_ARB_conservative_depth on i965. GL_ARB_texture_gather on i965. GL_ARB_texture_query_levels on i965. +GL_ARB_texture_mirror_clamp_to_edge. GL_KHR_debug -- 1.7.11.7
Re: [Mesa-dev] [PATCH 1/2] gallivm: implement seamless cube filtering
Looks great AFAICT. Jose - Original Message - > From: Roland Scheidegger > > For seamless cube filtering it is necessary to determine new faces and new > coords per sample. The logic for this is _seriously_ complex (what needs > to happen is very "asymmetric" wrt face, x/y under/overflow), further > complicated by the fact that if the 4 samples are in a corner (meaning we > actually only have 3 samples, and all 3 are on different faces) then > falling off the edge happens on _both_ the x and y axes simultaneously. > There was a noticeable performance hit in mesa's cubemap demo when seamless > filtering was forced on and the logic was always executed (just below 10 percent or so in a debug build, when > disabling all filtering hacks, otherwise it would probably be a bit more), > hence use a branch so that the logic only runs if any > of the pixels in a quad (or in two quads) actually hit this. With that there > was no measurable performance hit in the cubemap demo (neither in a debug nor > release build), but this will vary (the cubemap demo very rarely hits edges). > It might also be different on other CPUs, as this forces the SoA sampling path, which > potentially can be quite a bit slower. > Note that for corners, this code gets all 3 samples that actually > exist right, and the 4th texel will simply be the same as one of the others, > meaning that the filter weights will be a bit wrong. This however should be > enough for full OpenGL (but not d3d10) compliance. 
> --- > src/gallium/auxiliary/gallivm/lp_bld_sample.c | 138 +++ > src/gallium/auxiliary/gallivm/lp_bld_sample.h | 13 ++ > src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 257 > + > 3 files changed, 368 insertions(+), 40 deletions(-) > > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c > b/src/gallium/auxiliary/gallivm/lp_bld_sample.c > index 1c35200..a032d9d 100644 > --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c > +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c > @@ -1402,6 +1402,144 @@ lp_build_unnormalized_coords(struct > lp_build_sample_context *bld, > } > } > > +/** > + * Generate new coords and faces for cubemap texels falling off the face. > + * > + * @param face face (center) of the pixel > + * @param x0 lower x coord > + * @param x1 higher x coord (must be x0 + 1) > + * @param y0 lower y coord > + * @param y1 higher y coord (must be y0 + 1) > + * @param max_coord texture cube (level) size - 1 > + * @param next_faces new face values when falling off > + * @param next_xcoords new x coord values when falling off > + * @param next_ycoords new y coord values when falling off > + * > + * The arrays hold the new values when under/overflow of > + * lower x, higher x, lower y, higher y coord would occur (in this order). > + * next_xcoords/next_ycoords have two entries each (for both new lower and > + * higher coord). > + */ > +void > +lp_build_cube_new_coords(struct lp_build_context *ivec_bld, > +LLVMValueRef face, > +LLVMValueRef x0, > +LLVMValueRef x1, > +LLVMValueRef y0, > +LLVMValueRef y1, > +LLVMValueRef max_coord, > +LLVMValueRef next_faces[4], > +LLVMValueRef next_xcoords[4][2], > +LLVMValueRef next_ycoords[4][2]) > +{ > + /* > +* Lookup tables aren't nice for simd code hence try some logic here. 
> +* (Note that while it would not be necessary to do per-sample (4) > lookups > +* when using a LUT as it's impossible that texels fall off of positive > +* and negative edges simultaneously, it would however be necessary to > +* do 2 lookups for corner handling as in this case texels both fall off > +* of x and y axes.) > +*/ > + /* > +* Next faces (for face 012345): > +* x < 0.0 : 451110 > +* x >= 1.0 : 540001 > +* y < 0.0 : 225422 > +* y >= 1.0 : 334533 > +* Hence nfx+ (and nfy+) == nfx- (nfy-) xor 1 > +* nfx-: face > 1 ? (face == 5 ? 0 : 1) : (4 + face & 1) > +* nfy+: face & ~4 > 1 ? face + 2 : 3; > +* This could also use pshufb instead, but would need (manually coded) > +* ssse3 intrinsic (llvm won't do non-constant shuffles). > +*/ > + struct gallivm_state *gallivm = ivec_bld->gallivm; > + LLVMValueRef sel, sel_f2345, sel_f23, sel_f2, tmpsel, tmp; > + LLVMValueRef faceand1, sel_fand1, maxmx0, maxmx1, maxmy0, maxmy1; > + LLVMValueRef c2 = lp_build_const_int_vec(gallivm, ivec_bld->type, 2); > + LLVMValueRef c3 = lp_build_const_int_vec(gallivm, ivec_bld->type, 3); > + LLVMValueRef c4 = lp_build_const_int_vec(gallivm, ivec_bld->type, 4); > + LLVMValueRef c5 = lp_build_const_int_vec(gallivm, ivec_bld->type, 5); > + > + sel = lp_build_cmp(ivec_bld, PIPE_FUNC_EQUAL, face, c5); > + tmpsel = lp_build_select(ivec_bld, sel, ivec_bld->zero, ivec_bld
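The face tables in the comment above can be checked mechanically. The following is a minimal scalar C sketch, not the gallivm SIMD code. It encodes the nfx- and nfy+ formulas and derives the other two directions by flipping bit 0. It assumes the usual gallium face order 0..5 = +X, -X, +Y, -Y, +Z, -Z, and the function names are hypothetical. In C, `+` binds tighter than `&`, so the comment's "4 + face & 1" has to be written as `4 + (face & 1)`.

```c
#include <assert.h>

/* Neighbour face when a texel falls off the low x edge (x < 0.0). */
static int next_face_x_neg(int face)
{
    assert(face >= 0 && face < 6);
    return face > 1 ? (face == 5 ? 0 : 1) : 4 + (face & 1);
}

/* High x edge: the partner of the low-x neighbour (flip bit 0). */
static int next_face_x_pos(int face)
{
    return next_face_x_neg(face) ^ 1;
}

/* Neighbour face when a texel falls off the high y edge (y >= 1.0). */
static int next_face_y_pos(int face)
{
    assert(face >= 0 && face < 6);
    return (face & ~4) > 1 ? face + 2 : 3;
}

/* Low y edge: the partner of the high-y neighbour (flip bit 0). */
static int next_face_y_neg(int face)
{
    return next_face_y_pos(face) ^ 1;
}
```

Calling each function for faces 0 through 5 should reproduce the rows 451110, 540001, 225422 and 334533 from the comment.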
Re: [Mesa-dev] [PATCH] gallium: new, unified pipe_context::set_sampler_views() function
On 10/21/2013 02:09 AM, Emil Velikov wrote: On 16/10/13 03:23, Emil Velikov wrote: On 08/10/13 01:23, Brian Paul wrote: [...] This change touches quite a few files. I've probably missed something in drivers or state trackers that I can't test. Please test if you're able. Thanks. --- Will run a quick piglit with and w/o on my nv50 this weekend. Running a quick piglit test has proven a bit trickier than last time. With that said I see no regressions on my nv50, apart from the 2-3 tests with somewhat random result :) Tested-by: Emil Velikov OK, thanks. I'll probably push the patch later today. -Brian
[Mesa-dev] libGL without X
I have a headless platform I need OpenGL to work on that does not have X. It is x86 with Intel HD 4000 graphics. Ultimately, I'm just wanting to use OpenGL to render to memory for encoding to H.264 and streaming. I'm trying to build Mesa for this platform without X and cannot get it to build libGL.so. What am I missing here? Is it not possible to use OpenGL without X? I was hoping I could use OpenGL with EGL for testing purposes.
Re: [Mesa-dev] [PATCH 1/2] mesa: remove remnants of GL_MESA_shader_debug
With the one comment below taken care of, the series is Reviewed-by: Ian Romanick On 10/19/2013 07:34 AM, Brian Paul wrote: > This extension never saw any real use so remove it. > --- > include/GL/gl.h | 20 > src/mapi/glapi/gen/gl_API.xml | 32 > 2 files changed, 52 deletions(-) > > diff --git a/include/GL/gl.h b/include/GL/gl.h > index babb746..968032c 100644 > --- a/include/GL/gl.h > +++ b/include/GL/gl.h > @@ -2086,26 +2086,6 @@ typedef void (APIENTRYP PFNGLMULTITEXCOORD4SVARBPROC) > (GLenum target, const GLsh > > > > -#if GL_ARB_shader_objects > - > -#ifndef GL_MESA_shader_debug > -#define GL_MESA_shader_debug 1 > - > -#define GL_DEBUG_OBJECT_MESA 0x8759 > -#define GL_DEBUG_PRINT_MESA 0x875A > -#define GL_DEBUG_ASSERT_MESA 0x875B > - > -GLAPI GLhandleARB GLAPIENTRY glCreateDebugObjectMESA (void); > -GLAPI void GLAPIENTRY glClearDebugLogMESA (GLhandleARB obj, GLenum logType, > GLenum shaderType); > -GLAPI void GLAPIENTRY glGetDebugLogMESA (GLhandleARB obj, GLenum logType, > GLenum shaderType, GLsizei maxLength, > - GLsizei *length, GLcharARB > *debugLog); > -GLAPI GLsizei GLAPIENTRY glGetDebugLogLengthMESA (GLhandleARB obj, GLenum > logType, GLenum shaderType); > - > -#endif /* GL_MESA_shader_debug */ > - > -#endif /* GL_ARB_shader_objects */ > - > - > /* > * ???. GL_MESA_packed_depth_stencil > * XXX obsolete > diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml > index 48fce36..30ab9c9 100644 > --- a/src/mapi/glapi/gen/gl_API.xml > +++ b/src/mapi/glapi/gen/gl_API.xml > @@ -13027,38 +13027,6 @@ > > > > - > - > - > - You also need to remove the enums from src/mesa/main/tests/enum_strings.cpp. I suspect 'make check' will fail otherwise. > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > - > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] docs: Mark GLSL 1.50, 3.30, and geometry shaders done for i965.
With the issue that Kaelyn pointed out resolved, Reviewed-by: Ian Romanick On 10/18/2013 03:12 PM, Matt Turner wrote: > --- > docs/GL3.txt | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/docs/GL3.txt b/docs/GL3.txt > index c269f19..a7c7ae6 100644 > --- a/docs/GL3.txt > +++ b/docs/GL3.txt > @@ -63,9 +63,9 @@ Signed normalized textures (GL_EXT_texture_snorm) DONE > (i965, r300, r600) > > GL 3.2: > > -Core/compatibility profiles DONE > -GLSL 1.50 in progress > -Geometry shaders (GL_ARB_geometry_shader4)partially done > +Core/compatibility profiles DONE (i965) > +GLSL 1.50 DONE (i965) > +Geometry shaders DONE (i965) > BGRA vertex order (GL_ARB_vertex_array_bgra) DONE (i965, r300, > r600, swrast) > Base vertex offset(GL_ARB_draw_elements_base_vertex) DONE (i965, r300, > r600, swrast) > Frag shader coord (GL_ARB_fragment_coord_conventions) DONE (i965, r300, > r600, swrast) > @@ -79,7 +79,7 @@ GLX_ARB_create_context_profileDONE > > GL 3.3: > > -GLSL 3.30 new features in this > version pretty much done > +GLSL 3.30 DONE (i965) > GL_ARB_blend_func_extendedDONE (i965, r600, > softpipe) > GL_ARB_explicit_attrib_location DONE (i915, i965, > r300, r600, swrast) > GL_ARB_occlusion_query2 DONE (i965, r300, > r600, swrast) >
Re: [Mesa-dev] Ivybridge support for ARB_transform_feedback2
On 10/17/2013 11:09 PM, Kenneth Graunke wrote: > Here's my implementation of ARB_transform_feedback2. I believe it's > complete; it passes all of our Piglit tests and a lot of Intel's > oglconform tests. > > This should work out of the box on Ivybridge and Baytrail. It won't > work on Haswell at the moment, due to restrictions on register writes > (to be solved in a future kernel version). Patch 9 will need to be > replaced with something that detects whether or not we can write > registers from userspace batchbuffers. > > In the meantime, I figured I'd send out the rest for review. > > Porting this back to Sandybridge is probably doable, but annoying. > Sandybridge doesn't have the MI_LOAD_REGISTER_MEM command, so we'd have > to map the buffers and use MI_LOAD_REGISTER_IMM. Seems pretty gross. > Plus, transform feedback is done very differently pre-Ivybridge. I'm > not sure it's worth it, seeing as it's a GL 4.0 feature. I assume this is just to support glDrawTransformFeedback? Can you add that information to http://dri.freedesktop.org/wiki/I965Todo/ ?
Re: [Mesa-dev] [PATCH 3/8] mesa: Pass number of samples as a program state variable
On Fri, Oct 18, 2013 at 2:44 PM, Paul Berry wrote: > On 14 October 2013 10:12, Anuj Phogat wrote: >> >> Number of samples will be required in fragment shader program by new >> GLSL builtin uniform "gl_NumSamples". >> >> Signed-off-by: Anuj Phogat >> --- >> src/mesa/program/prog_statevars.c | 11 +++ >> src/mesa/program/prog_statevars.h | 2 ++ >> 2 files changed, 13 insertions(+) >> >> diff --git a/src/mesa/program/prog_statevars.c >> b/src/mesa/program/prog_statevars.c >> index 145c07c..8f798da 100644 >> --- a/src/mesa/program/prog_statevars.c >> +++ b/src/mesa/program/prog_statevars.c >> @@ -349,6 +349,9 @@ _mesa_fetch_state(struct gl_context *ctx, const >> gl_state_index state[], >> } >>} >>return; >> + case STATE_NUM_SAMPLES: >> + ((int *)value)[0] = ctx->DrawBuffer->Visual.samples; >> + return; >> case STATE_DEPTH_RANGE: >>value[0] = ctx->Viewport.Near; /* near */ >>value[1] = ctx->Viewport.Far; /* far*/ >> @@ -665,6 +668,9 @@ _mesa_program_state_flags(const gl_state_index >> state[STATE_LENGTH]) >> case STATE_PROGRAM_MATRIX: >>return _NEW_TRACK_MATRIX; >> >> + case STATE_NUM_SAMPLES: >> + return _NEW_MULTISAMPLE; > > > I think this should be _NEW_BUFFERS. _NEW_MULTISAMPLE is only flagged when > something in gl_multisample_attrib changes, and nothing in that category > affects ctx->DrawBuffer->Visual.samples. Right. Thanks for noticing this. I'll fix it. 
> With that fixed, this patch is: > > Reviewed-by: Paul Berry > >> >> + >> case STATE_DEPTH_RANGE: >>return _NEW_VIEWPORT; >> >> @@ -852,6 +858,9 @@ append_token(char *dst, gl_state_index k) >> case STATE_TEXENV_COLOR: >>append(dst, "texenv"); >>break; >> + case STATE_NUM_SAMPLES: >> + append(dst, "num.samples"); >> + break; >> case STATE_DEPTH_RANGE: >>append(dst, "depth.range"); >>break; >> @@ -1027,6 +1036,8 @@ _mesa_program_state_string(const gl_state_index >> state[STATE_LENGTH]) >>break; >> case STATE_FOG_COLOR: >>break; >> + case STATE_NUM_SAMPLES: >> + break; >> case STATE_DEPTH_RANGE: >>break; >> case STATE_FRAGMENT_PROGRAM: >> diff --git a/src/mesa/program/prog_statevars.h >> b/src/mesa/program/prog_statevars.h >> index ec22b73..c3081c4 100644 >> --- a/src/mesa/program/prog_statevars.h >> +++ b/src/mesa/program/prog_statevars.h >> @@ -103,6 +103,8 @@ typedef enum gl_state_index_ { >> >> STATE_TEXENV_COLOR, >> >> + STATE_NUM_SAMPLES, >> + >> STATE_DEPTH_RANGE, >> >> STATE_VERTEX_PROGRAM, >> -- >> 1.8.1.4
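Paul's point can be made concrete. Each state variable is tied to the dirty bit that should trigger a re-fetch. gl_NumSamples reads ctx->DrawBuffer->Visual.samples, which changes with the bound framebuffer rather than with the multisample attribute group. The toy C model below illustrates that mapping. The DIRTY_* and VAR_* names and their bit values are hypothetical stand-ins, not Mesa's _NEW_* constants or the real _mesa_program_state_flags().

```c
#include <assert.h>

/* Hypothetical dirty bits standing in for _NEW_VIEWPORT,
 * _NEW_MULTISAMPLE and _NEW_BUFFERS (real values differ). */
enum dirty_bits {
    DIRTY_VIEWPORT    = 1u << 0,
    DIRTY_MULTISAMPLE = 1u << 1,
    DIRTY_BUFFERS     = 1u << 2,
};

enum state_var {
    VAR_DEPTH_RANGE,
    VAR_NUM_SAMPLES,
};

/* Which dirty bit invalidates each state variable.  The sample count
 * comes from the draw framebuffer, so it must follow buffer changes. */
static unsigned state_flags(enum state_var v)
{
    switch (v) {
    case VAR_DEPTH_RANGE: return DIRTY_VIEWPORT;
    case VAR_NUM_SAMPLES: return DIRTY_BUFFERS;
    }
    assert(!"unknown state variable");
    return 0;
}

/* Nonzero if a state change described by 'dirty' requires re-fetching v. */
static int needs_refetch(enum state_var v, unsigned dirty)
{
    return (state_flags(v) & dirty) != 0;
}
```

If the sample count were tied to the multisample bit instead, binding a framebuffer with a different sample count would leave a stale gl_NumSamples value in the uniform.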
[Mesa-dev] [PATCH] i965/vec4: Reduce working set size of live variables computation.
Orbital Explorer was generating a 4000 instruction geometry shader, which was taking 275 trips through dead code elimination and register coalescing, each of which updated live variables to get its work done, and invalidated those live variables afterwards. By using bitfields instead of bools (reducing the working set size by a factor of 8) in live variables analysis, it drops from 88% of the profile to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul says 3+ minutes) to 10.5 seconds. Compare to f179f419d1d0a03fad36c2b0a58e8b853bae6118 on the FS side. --- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 41 -- .../drivers/dri/i965/brw_vec4_live_variables.h | 10 +++--- 2 files changed, 28 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index db3787b..f6675c8 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -83,8 +83,8 @@ vec4_live_variables::setup_def_use() for (int j = 0; j < 4; j++) { int c = BRW_GET_SWZ(inst->src[i].swizzle, j); - if (!bd[b].def[reg * 4 + c]) - bd[b].use[reg * 4 + c] = true; + if (!BITSET_TEST(bd[b].def, reg * 4 + c)) + BITSET_SET(bd[b].use, reg * 4 + c); } } } @@ -99,8 +99,8 @@ vec4_live_variables::setup_def_use() for (int c = 0; c < 4; c++) { if (inst->dst.writemask & (1 << c)) { int reg = inst->dst.reg; - if (!bd[b].use[reg * 4 + c]) - bd[b].def[reg * 4 + c] = true; + if (!BITSET_TEST(bd[b].use, reg * 4 + c)) + BITSET_SET(bd[b].def, reg * 4 + c); } } } @@ -126,12 +126,12 @@ vec4_live_variables::compute_live_variables() for (int b = 0; b < cfg->num_blocks; b++) { /* Update livein */ -for (int i = 0; i < num_vars; i++) { - if (bd[b].use[i] || (bd[b].liveout[i] && !bd[b].def[i])) { - if (!bd[b].livein[i]) { - bd[b].livein[i] = true; - cont = true; - } +for (int i = 0; i < bitset_words; i++) { +BITSET_WORD new_livein = (bd[b].use[i] | + 
(bd[b].liveout[i] & ~bd[b].def[i])); +if (new_livein & ~bd[b].livein[i]) { + bd[b].livein[i] |= new_livein; + cont = true; } } @@ -140,9 +140,11 @@ vec4_live_variables::compute_live_variables() bblock_link *link = (bblock_link *)block_node; bblock_t *block = link->block; - for (int i = 0; i < num_vars; i++) { - if (bd[block->block_num].livein[i] && !bd[b].liveout[i]) { - bd[b].liveout[i] = true; + for (int i = 0; i < bitset_words; i++) { + BITSET_WORD new_liveout = (bd[block->block_num].livein[i] & + ~bd[b].liveout[i]); + if (new_liveout) { + bd[b].liveout[i] |= new_liveout; cont = true; } } @@ -159,11 +161,12 @@ vec4_live_variables::vec4_live_variables(vec4_visitor *v, cfg_t *cfg) num_vars = v->virtual_grf_count * 4; bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks); + bitset_words = BITSET_WORDS(num_vars); for (int i = 0; i < cfg->num_blocks; i++) { - bd[i].def = rzalloc_array(mem_ctx, bool, num_vars); - bd[i].use = rzalloc_array(mem_ctx, bool, num_vars); - bd[i].livein = rzalloc_array(mem_ctx, bool, num_vars); - bd[i].liveout = rzalloc_array(mem_ctx, bool, num_vars); + bd[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); + bd[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); + bd[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); + bd[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); } setup_def_use(); @@ -248,12 +251,12 @@ vec4_visitor::calculate_live_intervals() for (int b = 0; b < cfg.num_blocks; b++) { for (int i = 0; i < livevars.num_vars; i++) { -if (livevars.bd[b].livein[i]) { +if (BITSET_TEST(livevars.bd[b].livein, i)) { start[i / 4] = MIN2(start[i / 4], cfg.blocks[b]->start_ip); end[i / 4] = MAX2(end[i / 4], cfg.blocks[b]->start_ip); } -if (livevars.bd[b].liveout[i]) { +if (BITSET_TEST(livevars.bd[b].liveout, i)) { start[i / 4] = MIN2(start[i / 4], cfg.blocks[b]->end_ip); end[i / 4] = MAX2(end[i / 4], cfg.blocks[b]->end_ip); } diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h index 296468
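The word-at-a-time update in compute_live_variables() can be sketched on its own. This minimal C version relies on locally defined macros modeled on Mesa's BITSET_* helpers (32-bit words assumed) and on a hypothetical update_livein() helper. Packing 32 variables per word shrinks the working set. It also lets a single OR / AND-NOT cover 32 variables at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins modeled on Mesa's bitset macros: one bit per
 * (virtual GRF, channel) variable, packed into machine words. */
typedef unsigned int BITSET_WORD;
#define BITSET_WORDBITS (sizeof(BITSET_WORD) * 8)
#define BITSET_WORDS(bits) (((bits) + BITSET_WORDBITS - 1) / BITSET_WORDBITS)
#define BITSET_TEST(x, b) \
    (((x)[(b) / BITSET_WORDBITS] >> ((b) % BITSET_WORDBITS)) & 1u)
#define BITSET_SET(x, b) \
    ((x)[(b) / BITSET_WORDBITS] |= 1u << ((b) % BITSET_WORDBITS))

/* One dataflow step for a block: livein |= use | (liveout & ~def).
 * Returns true if any new bit appeared, meaning the fixed-point
 * iteration over all blocks has to run again. */
static bool update_livein(BITSET_WORD *livein, const BITSET_WORD *use,
                          const BITSET_WORD *def, const BITSET_WORD *liveout,
                          int words)
{
    bool progress = false;
    for (int i = 0; i < words; i++) {
        BITSET_WORD new_livein = use[i] | (liveout[i] & ~def[i]);
        if (new_livein & ~livein[i]) {
            livein[i] |= new_livein;
            progress = true;
        }
    }
    return progress;
}
```

A variable that is live out of a block and also defined in that block (bit 41 in the test) does not become live in. A second call with unchanged inputs reports no progress, which is what ends the iteration.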
Re: [Mesa-dev] [PATCH] i965: Implement ARB_texture_mirror_clamp.
On 10/21/2013 12:58 AM, Rico Schüller wrote: > I have one minor nitpick (see below). But either way, with the subject > fixed (as mentioned by Matt), this is: > Reviewed-by: Rico Schüller > > On 21.10.2013 07:24, Kenneth Graunke wrote: >> This passes Piglit's texwrap tests (after applying Rico's patch to >> make them use this extension). >> >> Cc: Rico Schüller >> Cc: Ian Romanick >> Signed-off-by: Kenneth Graunke >> --- >> src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 2 ++ >> src/mesa/drivers/dri/i965/intel_extensions.c | 1 + >> 2 files changed, 3 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c >> b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c >> index b716d61..db7ab60 100644 >> --- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c >> +++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c >> @@ -71,6 +71,8 @@ translate_wrap_mode(GLenum wrap, bool using_nearest) >>return BRW_TEXCOORDMODE_CLAMP_BORDER; >> case GL_MIRRORED_REPEAT: >>return BRW_TEXCOORDMODE_MIRROR; >> + case GL_MIRROR_CLAMP_TO_EDGE_EXT: >> + return BRW_TEXCOORDMODE_MIRROR_ONCE; > I'd prefer GL_MIRROR_CLAMP_TO_EDGE instead of > GL_MIRROR_CLAMP_TO_EDGE_EXT but as it is the same value it really > shouldn't matter. Me too. My system GL headers didn't have "GL_MIRROR_CLAMP_TO_EDGE", so I didn't realize it existed. But it does! Thanks! --Ken
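Some system GL headers predate the core name. A guarded fallback define keeps code building while still using the preferred spelling. The sketch below relies on the point Rico makes above: both names share one enum value, which is 0x8743. The static_assert (C11) catches any header that disagrees.

```c
#include <assert.h>

/* Fallbacks for GL headers that lack one of the spellings; both names
 * refer to the same enum value. */
#ifndef GL_MIRROR_CLAMP_TO_EDGE_EXT
#define GL_MIRROR_CLAMP_TO_EDGE_EXT 0x8743
#endif
#ifndef GL_MIRROR_CLAMP_TO_EDGE
#define GL_MIRROR_CLAMP_TO_EDGE 0x8743
#endif

/* Compile-time check that switching to the core name changes nothing. */
static_assert(GL_MIRROR_CLAMP_TO_EDGE == GL_MIRROR_CLAMP_TO_EDGE_EXT,
              "core and EXT spellings must name the same enum value");
```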
Re: [Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.
Chia-I Wu writes: > On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner wrote: >> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt wrote: >>> Previously, the best thing we had was to schedule the things unblocked by >>> the current instruction, on the hope that it would be consuming two values >>> at the end of their live intervals while only producing one new value. >>> Sometimes that wasn't the case. >>> >>> Now, when an instruction is the first user of a GRF we schedule (i.e. it >>> will probably be the virtual_grf_def[] instruction after computing live >>> intervals again), penalize it by how many regs it would take up. When an >>> instruction is the last user of a GRF we have to schedule (when it will >>> probably be the virtual_grf_end[] instruction), give it a boost by how >>> many regs it would free. >>> >>> The new functions are made virtual (only 1 of 2 really needs to be >>> virtual) because I expect we'll soon lift the pre-regalloc scheduling >>> heuristic over to the vec4 backend. >>> >>> shader-db: >>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%) >>> instructions in affected programs: 10292 -> 9140 (-11.19%) >>> GAINED:121 >>> LOST: 38 >>> >>> Improves tropics performance at my current settings by 4.50602% +/- >>> 2.60694% (n=5). No difference on Lightsmark (n=5). No difference on >>> GLB2.7 (n=11). >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445 >>> --- >> >> I think we're on the right track by considering register pressure when >> scheduling, but one aspect we're not considering is simply how many >> registers we think we're using. >> >> If I understand correctly, the pre-register allocation wants to >> shorten live intervals as much as possible which reduces register >> pressure but at the cost of larger stalls and less instruction level >> parallelism. 
>> We end up scheduling things like >> >> produce result 4 >> produce result 3 >> produce result 2 >> produce result 1 >> use result 1 >> use result 2 >> use result 3 >> use result 4 >> >> (this is why the MRF writes for the FB write are always done in the >> reverse order) > In this example, it will actually be > > produce result 4 > use result 4 > produce result 3 > use result 3 > produce result 2 > use result 2 > produce result 1 > use result 1 > > and post-regalloc will schedule again to something like > > produce result 4 > produce result 3 > produce result 2 > produce result 1 > use result 4 > use result 3 > use result 2 > use result 1 > > The pre-regalloc scheduling attempts to consume the results as soon as > they are available. > > The FB write is done in reverse order because, when a result is available, > its consumers are scheduled in reverse order. The epilog of fragment > shaders is usually like this: > > placeholder_halt > mov m1, g1 > mov m2, g2 > mov m3, g3 > mov m4, g4 > send > > MOVs depend on placeholder_halt, and send depends on MOVs. The > scheduler will schedule it as follows: > > placeholder_halt > mov m4, g4 > mov m3, g3 > mov m2, g2 > mov m1, g1 > send > > The order can be corrected with the change proposed here > > http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html > > But there is no point in making the change if the current heuristic for > pre-regalloc is going to be reworked.
Flipping the order in which we prefer ties (on betterthanlifo-2): commit 11a511576e465f02875f39c452561775a97416a1 Author: Eric Anholt Date: Mon Oct 21 11:45:53 2013 -0700 otherway diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/ index 9a480b4..b123015 100644 --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp @@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule() * it's the first use of a GRF, reduce its score since it means it * should be increasing register pressure. */ - for (schedule_node *node = (schedule_node *)instructions.get_tail(); - node != instructions.get_head()->prev; - node = (schedule_node *)node->prev) { + for (schedule_node *node = (schedule_node *)instructions.get_head(); + node != instructions.get_tail()->next; + node = (schedule_node *)node->next) { schedule_node *n = (schedule_node *)node; fs_inst *inst = (fs_inst *)n->inst; gives: total instructions in shared programs: 1544638 -> 1546794 (0.14%) instructions in affected programs: 7163 -> 9319 (30.10%) GAINED:16 LOST: 289 with massive spilling on tropics, and a bit on lightsmark and csgo.
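The heuristic from the patch description works in two directions. It penalizes an instruction by the registers it would newly occupy and boosts it by the registers it would free. That heuristic, together with the tie-order experiment above, can be modeled in a few lines. This is a simplified C sketch. struct candidate, pressure_score() and choose() are hypothetical, and the model ignores latency, dependencies and the other factors the real fs_instruction_scheduler weighs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one ready-to-schedule instruction. */
struct candidate {
    int regs_defined_first; /* regs of GRFs this would start (first use) */
    int regs_freed_last;    /* regs of GRFs this would end (last use) */
};

/* Penalize starting new live ranges, reward ending them. */
static int pressure_score(const struct candidate *c)
{
    return c->regs_freed_last - c->regs_defined_first;
}

/* Return the index of the best-scoring candidate.  On ties, keep the
 * earliest entry when prefer_later is 0, or take the latest when it
 * is 1; the experiment above shows this choice alone moves results. */
static size_t choose(const struct candidate *cands, size_t n, int prefer_later)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        int s = pressure_score(&cands[i]);
        int b = pressure_score(&cands[best]);
        if (s > b || (prefer_later && s == b))
            best = i;
    }
    return best;
}
```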
Re: [Mesa-dev] libGL without X
Ken, I assume the new ABI for libOpenGL.so is not far enough along to be usable in production, correct? Our application is quite big and already written against OpenGL, so moving to GLESv2 or 3.0 would be a considerable effort; that is not an option. Do you know the minimal set of X libraries needed to build libGL? On Mon, Oct 21, 2013 at 11:56 AM, Kenneth Graunke wrote: > On 10/21/2013 07:05 AM, Chris Healy wrote: > > I have a headless platform I need OpenGL to work on that does not have > > X. It is x86 with Intel HD 4000 graphics. > > > > Ultimately, I'm just wanting to use OpenGL to render to memory for > > encoding to H.264 and streaming. > > > > I'm trying to build Mesa for this platform without X and cannot get it > > to build libGL.so. > > > > What am I missing here? Is it not possible to use OpenGL without X? I > > was hoping I could use OpenGL with EGL for testing purposes. > > Unfortunately, libGL.so contains both the OpenGL and GLX interfaces, so > I don't think it's possible today. People are working on a new ABI, > libOpenGL.so, which doesn't include GLX. So eventually, it should be > possible. > > You can definitely use EGL + OpenGL ES 3.0 (libGLESv2.so) today. > > --Ken
Re: [Mesa-dev] Ivybridge support for ARB_transform_feedback2
On 10/21/2013 08:40 AM, Ian Romanick wrote: > On 10/17/2013 11:09 PM, Kenneth Graunke wrote: >> Here's my implementation of ARB_transform_feedback2. I believe it's >> complete; it passes all of our Piglit tests and a lot of Intel's >> oglconform tests. >> >> This should work out of the box on Ivybridge and Baytrail. It won't >> work on Haswell at the moment, due to restrictions on register writes >> (to be solved in a future kernel version). Patch 9 will need to be >> replaced with something that detects whether or not we can write >> registers from userspace batchbuffers. >> >> In the meantime, I figured I'd send out the rest for review. >> >> Porting this back to Sandybridge is probably doable, but annoying. >> Sandybridge doesn't have the MI_LOAD_REGISTER_MEM command, so we'd have >> to map the buffers and use MI_LOAD_REGISTER_IMM. Seems pretty gross. >> Plus, transform feedback is done very differently pre-Ivybridge. I'm >> not sure it's worth it, seeing as it's a GL 4.0 feature. > > I assume this is just to support glDrawTransformFeedback? No, it's to support glResumeTransformFeedback. glDrawTransformFeedback actually just reads pipeline statistics counters and leaves them free-running. > Can you add that information to http://dri.freedesktop.org/wiki/I965Todo/ ? Actually, I'm probably wrong...on Gen7 we use MI_LOAD_REGISTER_MEM to copy offsets into the SO_WRITE_OFFSET(n) registers. But on Sandybridge, XFB is done using the geometry shader, so it works entirely differently. I don't think there is a register to load. I'll just let whoever looks into it figure it out. Not much insight anyway. --Ken
Re: [Mesa-dev] libGL without X
On 10/21/2013 07:05 AM, Chris Healy wrote: > I have a headless platform I need OpenGL to work on that does not have > X. It is x86 with Intel HD 4000 graphics. > > Ultimately, I'm just wanting to use OpenGL to render to memory for > encoding to H.264 and streaming. > > I'm trying to build Mesa for this platform without X and cannot get it > to build libGL.so. > > What am I missing here? Is it not possible to use OpenGL without X? I > was hoping I could use OpenGL with EGL for testing purposes. Unfortunately, libGL.so contains both the OpenGL and GLX interfaces, so I don't think it's possible today. People are working on a new ABI, libOpenGL.so, which doesn't include GLX. So eventually, it should be possible. You can definitely use EGL + OpenGL ES 3.0 (libGLESv2.so) today. --Ken
Re: [Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
- Original Message - > From: Tom Stellard > To: llvm-comm...@cs.uiuc.edu > Cc: mesa-dev@lists.freedesktop.org; Tom Stellard > Sent: Friday, 11 October 2013 20:10 > Subject: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause > > From: Tom Stellard > > Reading the special OQAP register pops the top value off the LDS > input queue and returns it to the instruction. This queue is > invalidated at the end of an ALU clause and leaving values in the queue > can lead to GPU hangs. This means that if we load a value into the queue, > we must use it before the end of the clause. > > This fixes some hangs in the OpenCV test suite. > --- > lib/Target/R600/R600MachineScheduler.cpp | 25 + > lib/Target/R600/R600MachineScheduler.h | 4 ++-- > test/CodeGen/R600/lds-input-queue.ll | 26 ++ > 3 files changed, 41 insertions(+), 14 deletions(-) > create mode 100644 test/CodeGen/R600/lds-input-queue.ll > > diff --git a/lib/Target/R600/R600MachineScheduler.cpp > b/lib/Target/R600/R600MachineScheduler.cpp > index 6c26d9e..611b7f4 100644 > --- a/lib/Target/R600/R600MachineScheduler.cpp > +++ b/lib/Target/R600/R600MachineScheduler.cpp > @@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) > { > } > > > - // We want to scheduled AR defs as soon as possible to make sure they > aren't > - // put in a different ALU clause from their uses. > - if (!SU && !UnscheduledARDefs.empty()) { > - SU = UnscheduledARDefs[0]; > - UnscheduledARDefs.erase(UnscheduledARDefs.begin()); > + // We want to scheduled defs that cannot be live outside of this clause > + // as soon as possible to make sure they aren't put in a different > + // ALU clause from their uses. 
> + if (!SU && !UnscheduledNoLiveOutDefs.empty()) { > + SU = UnscheduledNoLiveOutDefs[0]; > + UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin()); > NextInstKind = IDAlu; > } > > @@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) > { > > // We want to schedule the AR uses as late as possible to make sure that > // the AR defs have been released. > - if (!SU && !UnscheduledARUses.empty()) { > - SU = UnscheduledARUses[0]; > - UnscheduledARUses.erase(UnscheduledARUses.begin()); > + if (!SU && !UnscheduledNoLiveOutUses.empty()) { > + SU = UnscheduledNoLiveOutUses[0]; > + UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin()); Can we use std::queue instead of a std::vector for UnscheduledNoLiveOutUses ? I had to use a vector because I needed to be able to pop non topmost SUnit in some case (to fit Instruction Group const read limitation) but I would rather avoid erase(iterator) call when possible. > NextInstKind = IDAlu; > } > > @@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit *SU) > { > > int IK = getInstKind(SU); > > - // Check for AR register defines > + // Check for registers that do not live across ALU clauses. 
> for (MachineInstr::const_mop_iterator I = > SU->getInstr()->operands_begin(), > E = > SU->getInstr()->operands_end(); > I != E; ++I) { > - if (I->isReg() && I->getReg() == AMDGPU::AR_X) { > + if (I->isReg() && (I->getReg() == AMDGPU::AR_X || > I->getReg() == AMDGPU::OQAP)) { > if (I->isDef()) { > - UnscheduledARDefs.push_back(SU); > + UnscheduledNoLiveOutDefs.push_back(SU); > } else { > - UnscheduledARUses.push_back(SU); > + UnscheduledNoLiveOutUses.push_back(SU); > } > return; > } > diff --git a/lib/Target/R600/R600MachineScheduler.h > b/lib/Target/R600/R600MachineScheduler.h > index 0a6f120..db2e188 100644 > --- a/lib/Target/R600/R600MachineScheduler.h > +++ b/lib/Target/R600/R600MachineScheduler.h > @@ -53,8 +53,8 @@ class R600SchedStrategy : public MachineSchedStrategy { > > std::vector Available[IDLast], Pending[IDLast]; > std::vector AvailableAlus[AluLast]; > - std::vector UnscheduledARDefs; > - std::vector UnscheduledARUses; > + std::vector UnscheduledNoLiveOutDefs; > + std::vector UnscheduledNoLiveOutUses; > std::vector PhysicalRegCopy; > > InstKind CurInstKind; > diff --git a/test/CodeGen/R600/lds-input-queue.ll > b/test/CodeGen/R600/lds-input-queue.ll > new file mode 100644 > index 000..548b41c > --- /dev/null > +++ b/test/CodeGen/R600/lds-input-queue.ll > @@ -0,0 +1,26 @@ > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s Does the test work with -verify-machineinstrs flag set ? > +; > +; This test checks that the lds input queue will is empty at the end of > +; the ALU clause. > + > +; CHECK-LABEL: @lds_input_queue > +; CHECK: LDS_READ_RET * OQAP > +; CHECK-NOT: ALU clause > +; CHECK: MOV T{{[0-9]\.[XYZW]}}, OQAP > + > +@local_mem = internal addrspace(3) unnamed_addr global [2 x i32] [i32 1, i32 > 2], align 4 >
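Vincent's concern is that `erase(begin())` on a `std::vector` shifts every remaining element on each pop. A minimal C sketch of the alternative, assuming simplified hypothetical types (`struct sunit`, `struct operand`, `REG_AR_X` and `REG_OQAP` stand in for the LLVM classes and the `AMDGPU::` registers): nodes that touch a register which cannot live across ALU clauses are sorted into def and use FIFOs the same way `releaseBottomNode()` does it, and popping the front only advances a head index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register ids standing in for AMDGPU::AR_X and AMDGPU::OQAP. */
enum { REG_AR_X = 1, REG_OQAP = 2, REG_OTHER = 3 };

struct operand { int reg; bool is_def; };

struct sunit {
   int id;
   const struct operand *ops;
   size_t num_ops;
};

/* Bounded FIFO with a moving head index: popping the front is O(1), unlike
 * std::vector::erase(begin()), which shifts every remaining element.
 * This simple version never wraps around, so it holds at most FIFO_CAP pushes. */
#define FIFO_CAP 64
struct fifo {
   const struct sunit *items[FIFO_CAP];
   size_t head, tail;
};

static bool fifo_empty(const struct fifo *q) { return q->head == q->tail; }

static void fifo_push(struct fifo *q, const struct sunit *su)
{
   assert(q->tail < FIFO_CAP);
   q->items[q->tail++] = su;
}

static const struct sunit *fifo_pop(struct fifo *q)
{
   assert(!fifo_empty(q));
   return q->items[q->head++];
}

/* Mirrors the operand loop in releaseBottomNode(): the first operand that
 * references a no-live-out register decides whether the node goes to the
 * defs or the uses list. Returns false if no such register is referenced. */
static bool classify(const struct sunit *su, struct fifo *defs, struct fifo *uses)
{
   for (size_t i = 0; i < su->num_ops; i++) {
      int reg = su->ops[i].reg;
      if (reg == REG_AR_X || reg == REG_OQAP) {
         fifo_push(su->ops[i].is_def ? defs : uses, su);
         return true;
      }
   }
   return false;
}
```

In the real C++ code, a `std::queue` (backed by `std::deque` by default) gives the same O(1) `pop()` without hand-rolled bookkeeping.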
Re: [Mesa-dev] libGL without X
Actually, I think I found the answer to the minimal amount of X libs necessary. I got rid of --disable-glx from my build config and ran into the following when running configure: checking for XF86VIDMODE... no checking for DRIGL... no configure: error: Package requirements (x11 xext xdamage xfixes x11-xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8) were not met: No package 'x11' found No package 'xext' found No package 'xdamage' found No package 'xfixes' found No package 'x11-xcb' found No package 'xcb-glx' found No package 'xcb-dri2' found Is there a way around needing all of these just to build libGL if I just want to run OpenGL with EGL and write to memory? On Mon, Oct 21, 2013 at 12:03 PM, Chris Healy wrote: > Ken, > > I assume the new ABI for libOpenGL.so is not far enough along to be usable > in production, correct? > > Our application is quite big and already written against OpenGL so moving > to GLESv2 or 3.0 would be a considerable effort so this is not an option. > > Do you know the minimal amount of X libs necessary to support building > libGL? > > > On Mon, Oct 21, 2013 at 11:56 AM, Kenneth Graunke > wrote: > >> On 10/21/2013 07:05 AM, Chris Healy wrote: >> > I have a headless platform I need OpenGL to work on that does not have >> > X. It is x86 with Intel HD 4000 graphics. >> > >> > Ultimately, I'm just wanting to use OpenGL to render to memory for >> > encoding to H.264 and streaming. >> > >> > I'm trying to build Mesa for this platform without X and cannot get it >> > to build libGL.so. >> > >> > What am I missing here? Is it not possible to use OpenGL without X? I >> > was hoping I could use OpenGL with EGL for testing purposes. >> >> Unfortunately, libGL.so contains both the OpenGL and GLX interfaces, so >> I don't think it's possible today. People are working on a new ABI, >> libOpenGL.so, which doesn't include GLX. So eventually, it should be >> possible. >> >> You can definitely use EGL + OpenGL ES 3.0 (libGLESv2.so) today. 
>> >> --Ken >> > >
[Mesa-dev] [PATCH 1/2] r600/llvm: Fix texbuf for pre EG gen
--- src/gallium/drivers/r600/r600_llvm.c | 29 + 1 file changed, 29 insertions(+) diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c index 34dd3ad..d7fa5f8 100644 --- a/src/gallium/drivers/r600/r600_llvm.c +++ b/src/gallium/drivers/r600/r600_llvm.c @@ -427,6 +427,35 @@ static void llvm_emit_tex( emit_data->output[0] = build_intrinsic(gallivm->builder, "llvm.R600.load.texbuf", emit_data->dst_type, args, 2, LLVMReadNoneAttribute); + if (ctx->chip_class >= EVERGREEN) + return; + ctx->uses_tex_buffers = true; + LLVMDumpValue(emit_data->output[0]); + emit_data->output[0] = LLVMBuildBitCast(gallivm->builder, + emit_data->output[0], LLVMVectorType(bld_base->base.int_elem_type, 4), + ""); + LLVMValueRef Mask = llvm_load_const_buffer(bld_base, + lp_build_const_int32(gallivm, 0), + LLVM_R600_BUFFER_INFO_CONST_BUFFER); + Mask = LLVMBuildBitCast(gallivm->builder, Mask, + LLVMVectorType(bld_base->base.int_elem_type, 4), ""); + emit_data->output[0] = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_AND, + emit_data->output[0], + Mask); + LLVMValueRef WComponent = LLVMBuildExtractElement(gallivm->builder, + emit_data->output[0], lp_build_const_int32(gallivm, 3), ""); + Mask = llvm_load_const_buffer(bld_base, lp_build_const_int32(gallivm, 1), + LLVM_R600_BUFFER_INFO_CONST_BUFFER); + Mask = LLVMBuildExtractElement(gallivm->builder, Mask, + lp_build_const_int32(gallivm, 0), ""); + Mask = LLVMBuildBitCast(gallivm->builder, Mask, + bld_base->base.int_elem_type, ""); + WComponent = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_OR, + WComponent, Mask); + emit_data->output[0] = LLVMBuildInsertElement(gallivm->builder, + emit_data->output[0], WComponent, lp_build_const_int32(gallivm, 3), ""); + emit_data->output[0] = LLVMBuildBitCast(gallivm->builder, + emit_data->output[0], LLVMVectorType(bld_base->base.elem_type, 4), ""); } return; default: -- 1.8.3.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
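The patch above carries no commit message, so the following is only a per-texel restatement of the IR it builds, written as plain C on the raw 32-bit channel bits. `texbuf_fixup` and its parameters are hypothetical names; what the two buffer-info constants are meant to hold is not stated in the patch, so the sketch makes no claim about it.

```c
#include <assert.h>
#include <stdint.h>

/* Per-texel equivalent of the pre-Evergreen texbuf fixup in the patch:
 *  1. AND all four channels with a per-channel mask (buffer-info constant 0),
 *  2. OR the .w channel with the .x value of buffer-info constant 1.
 * Channels are handled as raw bits, matching the bitcasts to <4 x i32>. */
static void texbuf_fixup(uint32_t texel[4], const uint32_t mask[4], uint32_t w_bits)
{
   for (int c = 0; c < 4; c++)
      texel[c] &= mask[c];
   texel[3] |= w_bits;
}
```

With an all-ones mask on .x/.y, a zero mask on .z/.w and `w_bits` equal to the bit pattern of 1.0f, a texel comes out with .z cleared and .w forced to 1.0f.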
[Mesa-dev] [PATCH 2/2] r600/llvm: Fix isampleBuffer on preEG
--- src/gallium/drivers/r600/r600_llvm.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c index d7fa5f8..5afe3cb 100644 --- a/src/gallium/drivers/r600/r600_llvm.c +++ b/src/gallium/drivers/r600/r600_llvm.c @@ -415,9 +415,22 @@ static void llvm_emit_tex( case TGSI_OPCODE_TXQ: { struct radeon_llvm_context * ctx = radeon_llvm_context(bld_base); ctx->uses_tex_buffers = true; - LLVMValueRef offset = lp_build_const_int32(bld_base->base.gallivm, 0); + bool isEgPlus = (ctx->chip_class >= EVERGREEN); + LLVMValueRef offset = lp_build_const_int32(bld_base->base.gallivm, + isEgPlus ? 0 : 1); LLVMValueRef cvecval = llvm_load_const_buffer(bld_base, offset, LLVM_R600_BUFFER_INFO_CONST_BUFFER); + if (!isEgPlus) { + LLVMValueRef maskval[4] = { + lp_build_const_int32(gallivm, 1), + lp_build_const_int32(gallivm, 2), + lp_build_const_int32(gallivm, 3), + lp_build_const_int32(gallivm, 0), + }; + LLVMValueRef mask = LLVMConstVector(maskval, 4); + cvecval = LLVMBuildShuffleVector(gallivm->builder, cvecval, cvecval, + mask, ""); + } emit_data->output[0] = cvecval; return; } -- 1.8.3.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
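For pre-Evergreen chips this patch reads buffer-info slot 1 instead of slot 0 and rotates the loaded vector with the shuffle mask {1, 2, 3, 0}. A scalar C sketch of what that shuffle computes (since both shuffle operands are the same vector, every mask index selects from `v`); `shuffle4` is a hypothetical helper, and why the size ends up in that layout on pre-EG hardware is not explained in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of LLVMBuildShuffleVector(v, v, <1, 2, 3, 0>, ""):
 * result[i] = v[mask[i]], so the mask {1, 2, 3, 0} rotates the
 * components left by one: (y, z, w, x). */
static void shuffle4(const uint32_t v[4], const int mask[4], uint32_t out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = v[mask[i]];
}
```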
Re: [Mesa-dev] libGL without X
On Mon, Oct 21, 2013 at 4:05 PM, Chris Healy wrote: > I have a headless platform I need OpenGL to work on that does not have X. > It is x86 with Intel HD 4000 graphics. > > Ultimately, I'm just wanting to use OpenGL to render to memory for encoding > to H.264 and streaming. > > I'm trying to build Mesa for this platform without X and cannot get it to > build libGL.so. > > What am I missing here? Is it not possible to use OpenGL without X? I was > hoping I could use OpenGL with EGL for testing purposes. If you build mesa with GBM support, you should be able to render without X.
Re: [Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.
Matt Turner writes: > On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt wrote: >> Previously, the best thing we had was to schedule the things unblocked by >> the current instruction, on the hope that it would be consuming two values >> at the end of their live intervals while only producing one new value. >> Sometimes that wasn't the case. >> >> Now, when an instruction is the first user of a GRF we schedule (i.e. it >> will probably be the virtual_grf_def[] instruction after computing live >> intervals again), penalize it by how many regs it would take up. When an >> instruction is the last user of a GRF we have to schedule (when it will >> probably be the virtual_grf_end[] instruction), give it a boost by how >> many regs it would free. >> >> The new functions are made virtual (only 1 of 2 really needs to be >> virtual) because I expect we'll soon lift the pre-regalloc scheduling >> heuristic over to the vec4 backend. >> >> shader-db: >> total instructions in shared programs: 1512756 -> 1511604 (-0.08%) >> instructions in affected programs: 10292 -> 9140 (-11.19%) >> GAINED:121 >> LOST: 38 >> >> Improves tropics performance at my current settings by 4.50602% +/- >> 2.60694% (n=5). No difference on Lightsmark (n=5). No difference on >> GLB2.7 (n=11). >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445 >> --- > > I think we're on the right track by considering register pressure when > scheduling, but one aspect we're not considering is simply how many > registers we think we're using. > > If I understand correctly, the pre-register allocation wants to > shorten live intervals as much as possible which reduces register > pressure but at the cost of larger stalls and less instruction level > parallelism. 
We end up scheduling things like > > produce result 4 > produce result 3 > produce result 2 > produce result 1 > use result 1 > use result 2 > use result 3 > use result 4 > > (this is why the MRF writes for the FB write are always done in the > reverse order) > > Take the main shader from FillTestC24Z16 in GLB2.5 or 2.7 as an > example. Before texture-grf we serialized the eight texture sends. > After that branch landed, we scheduled the code much better, leading > to a performance improvement. > > This patch causes us again to serialize the 8 texture ops in > GLB25_FillTestC24Z16, like we did before texture-from-grf. It reduces > performance from 7.0 billion texels/sec to ~6.5 on IVB. This is mostly a problem, as far as I can see, of unfortunate GRF choices between the send sources and dests. I haven't seen an easy way out of that beyond what we're doing with the round_robin flag in the register allocator already, so let's play with scheduling some more for the moment... > Can we accurately track the number of registers in use and decide what > to do based on that? An attempt to do this is on betterthanlifo-3 of my tree. The quick results: total instructions in shared programs: 1599565 -> 1599757 (0.01%) instructions in affected programs: 2014 -> 2206 (9.53%) GAINED:22 LOST: 110 That's not at all what I hoped for. But maybe the problem is that we end up faced with a ton of multiplies of components of texture results and we don't know which one we should pick next once we've picked one of them? Maybe if we give a higher weight to things that will help finish off a VGRF's use? I present betterthanlifo-6: anholt@eliezer:anholt/src/shader-db% ./report.py sched-lifo3 sched-lifo6 total instructions in shared programs: 1606060 -> 1606060 (0.00%) instructions in affected programs: 0 -> 0 GAINED:0 LOST: 0 Well that wasn't the result I was expecting. 
But it kinda makes sense: Once we've scheduled processing of .x, the next thing we'll probably choose even in the absence of weighting is .y, not some *other* texture which had been inserted into the list at a totally separate time.

Looking at performance going from betterthanlifo-2 to betterthanlifo-3:

GLB2.7: 1.39845% +/- 0.797931% (n=15/16)
lm: No difference (n=3)
minecraft: No difference (n=10)
tropics: -4.12118% +/- 2.48834% (n=4)
nexuiz: No difference (n=8)
openarena: -1.46747% +/- 1.08201% (n=110)

At this point I think I want to go forward with -2 (this patch) as opposed to -3.

(Note: Results presented in this thread, after the original patch posting, are on top of glsl-cse, trying to reduce the significance of that one crazy Tropics shader that spawned all this flailing about in register allocation).
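The heuristic in the patch description, stripped down to a scoring function. This is a hypothetical C sketch: `struct vgrf_ref`, `pressure_benefit()` and `choose()` are illustrative names rather than the i965 scheduler's code, and the real scheduler combines this score with its other priorities. Starting a live range (first scheduled user of a GRF) costs its size; ending one (last remaining user) earns its size back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one virtual GRF touched by a candidate. */
struct vgrf_ref {
   int size;        /* registers the virtual GRF occupies */
   bool first_user; /* no other user of this GRF has been scheduled yet */
   bool last_user;  /* this candidate is the last unscheduled user */
};

/* Penalize starting live ranges, reward ending them. Higher is better. */
static int pressure_benefit(const struct vgrf_ref *refs, int count)
{
   int benefit = 0;
   for (int i = 0; i < count; i++) {
      if (refs[i].first_user)
         benefit -= refs[i].size;
      if (refs[i].last_user)
         benefit += refs[i].size;
   }
   return benefit;
}

/* Pick the candidate with the highest benefit; ties keep the earliest one. */
static int choose(const int *benefits, int n)
{
   int best = 0;
   for (int i = 1; i < n; i++)
      if (benefits[i] > benefits[best])
         best = i;
   return best;
}
```

Under this scoring a texture send that starts a multi-register result always loses to candidates that retire values, which is consistent with the serialized texture sends Matt reports in FillTestC24Z16.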
Re: [Mesa-dev] glBufferSubData() GPU stall reduction (DOTA2 optimization).
The only feedback would be that it would be nice if patch 8 were broken down somewhat. But, the only suggestions I could think of for possible items to split out were:

* intel_bufferobj_buffer: remove flag parameter
* set gpu_active_start/end in a lead-up patch

Another question for patch 8. Would gpu_active_start/end be able to also handle the job of gpu_active? (Set them out of range when !gpu_active?) Doesn't seem all that important though.

Series (with or without changes mentioned above):
Reviewed-by: Jordan Justen

On Tue, 2013-10-08 at 14:00 -0700, Eric Anholt wrote:
> Since it sounds like valve won't be able to fix dota2's rendering to use
> ARB_mbr soon, here's a series to add just a little bit of tracking that
> works around most of the overhead of not using ARB_mbr with their
> rendering pattern. 7.69854% +/- 0.909163% (n=3) fps improvement with
> default settings. We could also leverage this for some apps that misuse
> ARB_mbr in the future.
>
> This doesn't look like it affects GLB2.7, which has a very special-looking
> access pattern to its BO.
>
> Code is also on the "subdata" branch of my tree.
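Jordan's question can be checked with a small C sketch. Only the names `gpu_active_start`/`gpu_active_end` come from the thread; the struct, the functions and the overlap rule are assumptions for illustration, not the code on Eric's "subdata" branch. An empty range (start >= end) stands in for `gpu_active == false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracking: [gpu_active_start, gpu_active_end) is the byte range
 * the GPU may still be reading. Offsets are assumed not to overflow. */
struct bo_tracking {
   uint32_t gpu_active_start;
   uint32_t gpu_active_end;
};

/* "Out of range" encoding of idle: start past end, so nothing overlaps. */
static void mark_idle(struct bo_tracking *t)
{
   t->gpu_active_start = UINT32_MAX;
   t->gpu_active_end = 0;
}

/* Grow the active range to cover a new GPU access. */
static void mark_gpu_use(struct bo_tracking *t, uint32_t offset, uint32_t size)
{
   if (offset < t->gpu_active_start)
      t->gpu_active_start = offset;
   if (offset + size > t->gpu_active_end)
      t->gpu_active_end = offset + size;
}

/* Replaces a separate gpu_active flag. */
static bool gpu_active(const struct bo_tracking *t)
{
   return t->gpu_active_start < t->gpu_active_end;
}

/* A glBufferSubData() of [offset, offset + size) only needs to stall (or take
 * a slower path) if it overlaps the range the GPU may be using. */
static bool subdata_would_stall(const struct bo_tracking *t, uint32_t offset, uint32_t size)
{
   return offset < t->gpu_active_end && offset + size > t->gpu_active_start;
}
```

Because the idle encoding makes every overlap test fail, the separate flag becomes redundant in this sketch, at the cost of slightly less obvious code.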
Re: [Mesa-dev] [PATCH] i965: Only emit interpolation setup if there are actual FS inputs.
Kenneth Graunke writes:
> Dead code elimination would get rid of the extra instructions, but
> skipping this saves iterations through the optimization loop.
>
> From shader-db:
>
>     N       Min   Max   Median   Avg         Stddev
> x   14672   3     16    3        3.1334515   0.59904168
> +   14672   1     16    3        2.8955153   0.77732963
> Difference at 95.0% confidence
>     -0.237936 +/- 0.0158798
>     -7.59342% +/- 0.506783%
>     (Student's t, pooled s = 0.693935)
>
> Embarrassingly, the classic shadow mapping shader:
>
>    void main() { }
>
> used to require three iterations through the optimization loop.
> With this patch, it only requires one (which makes no progress).

Reviewed-by: Eric Anholt
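The ministat summary (iterations through the optimization loop, x = before, + = after) can be reproduced by hand. Both samples have N = 14672, so the pooled standard deviation reduces to the root mean square of the two, and for this many degrees of freedom Student's t is essentially 1.96:

```latex
\begin{aligned}
\Delta &= \bar{x}_{+} - \bar{x}_{x} = 2.8955153 - 3.1334515 = -0.237936 \\
\frac{\Delta}{\bar{x}_{x}} &= \frac{-0.237936}{3.1334515} \approx -7.593\,\% \\
s_p &= \sqrt{\frac{s_x^2 + s_{+}^2}{2}} = \sqrt{\frac{0.59904168^2 + 0.77732963^2}{2}} \approx 0.693935 \\
t_{0.975}\, s_p \sqrt{\frac{2}{N}} &\approx 1.96 \times 0.693935 \times \sqrt{\frac{2}{14672}} \approx 0.01588
  \quad\left(\approx 0.507\,\% \text{ of } \bar{x}_{x}\right)
\end{aligned}
```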
[Mesa-dev] [PATCH 1/2] i965: Fix glTexImage when packing alignment != cpp
Fixes texture corruption of Weston clients on cairo-glesv2 backend. Commit 49ed599 introduced the bug. Corruption occurred when glTexSubImage called intel_texsubimage_tiled_memcpy() with: x,y=10,9 w,h=7,7 format=GL_ALPHA(0x1906) type=GL_UNSIGNED_BYTE(0x1401) gl_format=MESA_FORMAT_A8(0x18) packing.alignment=4 The function miscalculated the source image's stride as w*cpp=7 without taking into account the packing alignment. The actual stride was 8. CC: Frank Henigman CC: Kristian Høgsberg Reported-by: U. Artie Eoff Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70435 Signed-off-by: Chad Versace --- This series lives on my branch bug-70435. Kristian verified that it fixed weston-terminal. src/mesa/drivers/dri/i965/intel_tex_subimage.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index 5cfdbd9..157108f 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c @@ -27,6 +27,7 @@ **/ #include "main/bufferobj.h" +#include "main/image.h" #include "main/macros.h" #include "main/mtypes.h" #include "main/pbo.h" @@ -532,6 +533,7 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, { struct brw_context *brw = brw_context(ctx); struct intel_texture_image *image = intel_texture_image(texImage); + int src_pitch; /* The miptree's buffer. */ drm_intel_bo *bo; @@ -544,6 +546,11 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, /* This fastpath is restricted to specific texture types: level 0 of * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support * more types. +* +* FINISHME: The restrictions below on packing alignment and packing row +* length are likely unneeded now because we calculate the source stride +* with _mesa_image_row_stride. However, before removing the restrictions +* we need tests.
*/ if (!brw->has_llc || type != GL_UNSIGNED_BYTE || @@ -609,6 +616,8 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, return false; } + src_pitch = _mesa_image_row_stride(packing, width, format, type); + /* We postponed printing this message until having committed to executing * the function. */ @@ -618,8 +627,8 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, linear_to_tiled( xoffset * cpp, (xoffset + width) * cpp, yoffset, yoffset + height, - bo->virtual, pixels - (xoffset + yoffset * width) * cpp, - image->mt->region->pitch, width * cpp, + bo->virtual, pixels - yoffset * src_pitch - xoffset * cpp, + image->mt->region->pitch, src_pitch, brw->has_swizzling, image->mt->region->tiling, mem_copy -- 1.8.3.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
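The stride fix can be checked by hand with the numbers from the commit message. A minimal C sketch, assuming GL_UNPACK_ROW_LENGTH is 0 and no skip pixels/rows (the real `_mesa_image_row_stride()` also handles row length and bitmap formats); `aligned_row_stride` and `src_origin_offset` are hypothetical helpers, not Mesa functions. Each client row is padded up to the unpack alignment, and the pointer handed to `linear_to_tiled()` is shifted back so that pixel (xoffset, yoffset) maps onto `pixels` itself.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes between the starts of consecutive client-memory rows when each row
 * is padded to `alignment` bytes (GL_UNPACK_ALIGNMENT: 1, 2, 4 or 8). */
static int aligned_row_stride(int width, int cpp, int alignment)
{
   int bytes = width * cpp;
   return (bytes + alignment - 1) / alignment * alignment;
}

/* Byte offset from `pixels` to the virtual origin passed as the source:
 * origin + yoffset * src_pitch + xoffset * cpp == pixels. */
static ptrdiff_t src_origin_offset(int xoffset, int yoffset, int src_pitch, int cpp)
{
   return -((ptrdiff_t)yoffset * src_pitch + (ptrdiff_t)xoffset * cpp);
}
```

For the failing upload (x,y = 10,9, width 7, A8 so cpp = 1, alignment 4) the padded stride is 8 and the origin sits 82 bytes before `pixels`; the old code used a stride of 7 and an offset of 73, so every row after the first was read from the wrong place.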
[Mesa-dev] [PATCH 2/2] i965: Print more debuginfo in intel_texsubimage_memcpy()
Print info about packing, format, type, and tiling. This will help debug future issues with this fastpath. CC: Frank Henigman Signed-off-by: Chad Versace --- src/mesa/drivers/dri/i965/intel_tex_subimage.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index 157108f..0384bcc 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c @@ -621,8 +621,14 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, /* We postponed printing this message until having committed to executing * the function. */ - DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d)\n", - __FUNCTION__, texImage->Level, xoffset, yoffset, width, height); + DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d) format=0x%x type=0x%x " + "gl_format=0x%x tiling=%d " + "packing=(alignment=%d row_length=%d skip_pixels=%d skip_rows=%d) " + "for_glTexImage=%d\n", + __FUNCTION__, texImage->Level, xoffset, yoffset, width, height, + format, type, texImage->TexFormat, image->mt->region->tiling, + packing->Alignment, packing->RowLength, packing->SkipPixels, + packing->SkipRows, for_glTexImage); linear_to_tiled( xoffset * cpp, (xoffset + width) * cpp, -- 1.8.3.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] glsl: Use saved values instead of recomputing them.
On 16 October 2013 16:56, Matt Turner wrote: > --- > src/glsl/opt_algebraic.cpp | 12 > 1 file changed, 4 insertions(+), 8 deletions(-) > Series is: Reviewed-by: Paul Berry > > diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp > index 3e5802e..b915f3c 100644 > --- a/src/glsl/opt_algebraic.cpp > +++ b/src/glsl/opt_algebraic.cpp > @@ -257,11 +257,9 @@ ir_algebraic_visitor::handle_expression(ir_expression > *ir) > * folding. > */ >if (op_const[0] && !op_const[1]) > -reassociate_constant(ir, 0, op_const[0], > - ir->operands[1]->as_expression()); > +reassociate_constant(ir, 0, op_const[0], op_expr[1]); >if (op_const[1] && !op_const[0]) > -reassociate_constant(ir, 1, op_const[1], > - ir->operands[0]->as_expression()); > +reassociate_constant(ir, 1, op_const[1], op_expr[0]); >break; > > case ir_binop_sub: > @@ -315,11 +313,9 @@ ir_algebraic_visitor::handle_expression(ir_expression > *ir) > * constant folding. > */ >if (op_const[0] && !op_const[1]) > -reassociate_constant(ir, 0, op_const[0], > - ir->operands[1]->as_expression()); > +reassociate_constant(ir, 0, op_const[0], op_expr[1]); >if (op_const[1] && !op_const[0]) > -reassociate_constant(ir, 1, op_const[1], > - ir->operands[0]->as_expression()); > +reassociate_constant(ir, 1, op_const[1], op_expr[0]); > >break; > > -- > 1.8.3.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50: implement multisample textures
This is a port of 4da54c91d24da ("nvc0: implement multisample textures") to nv50. When coupled with the patch to only report 16 texture samplers (to fix crashes), all of the Piglit tests in spec/arb_texture_multisample pass. --- .../nouveau/codegen/nv50_ir_lowering_nv50.cpp |5 ++- src/gallium/drivers/nouveau/nv50/nv50_context.c| 46 src/gallium/drivers/nouveau/nv50/nv50_miptree.c|2 + src/gallium/drivers/nouveau/nv50/nv50_screen.c |3 +- src/gallium/drivers/nouveau/nv50/nv50_tex.c| 20 +++-- 5 files changed, 70 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp index caaf09f..d5d1f1e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp @@ -569,6 +569,7 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i) const int arg = i->tex.target.getArgCount(); const int dref = arg; const int lod = i->tex.target.isShadow() ? (arg + 1) : arg; + const int lyr = arg - (i->tex.target.isMS() ? 
2 : 1); // dref comes before bias/lod if (i->tex.target.isShadow()) @@ -577,11 +578,11 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i) // array index must be converted to u32 if (i->tex.target.isArray()) { - Value *layer = i->getSrc(arg - 1); + Value *layer = i->getSrc(lyr); LValue *src = new_LValue(func, FILE_GPR); bld.mkCvt(OP_CVT, TYPE_U32, src, TYPE_F32, layer); bld.mkOp2(OP_MIN, TYPE_U32, src, src, bld.loadImm(NULL, 511)); - i->setSrc(arg - 1, src); + i->setSrc(lyr, src); if (i->tex.target.isCube()) { std::vector acube, a2d; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.c b/src/gallium/drivers/nouveau/nv50/nv50_context.c index b6bdf79..45e3f5d 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_context.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_context.c @@ -194,6 +194,10 @@ nv50_invalidate_resource_storage(struct nouveau_context *ctx, return ref; } +static void +nv50_context_get_sample_position(struct pipe_context *, unsigned, unsigned, + float *); + struct pipe_context * nv50_create(struct pipe_screen *pscreen, void *priv) { @@ -237,6 +241,7 @@ nv50_create(struct pipe_screen *pscreen, void *priv) pipe->flush = nv50_flush; pipe->texture_barrier = nv50_texture_barrier; + pipe->get_sample_position = nv50_context_get_sample_position; if (!screen->cur_ctx) { screen->cur_ctx = nv50; @@ -315,3 +320,44 @@ nv50_bufctx_fence(struct nouveau_bufctx *bufctx, boolean on_flush) nv50_resource_validate(res, (unsigned)ref->priv_data); } } + +static void +nv50_context_get_sample_position(struct pipe_context *pipe, + unsigned sample_count, unsigned sample_index, + float *xy) +{ + static const uint8_t ms1[1][2] = { { 0x8, 0x8 } }; + static const uint8_t ms2[2][2] = { + { 0x4, 0x4 }, { 0xc, 0xc } }; /* surface coords (0,0), (1,0) */ + static const uint8_t ms4[4][2] = { + { 0x6, 0x2 }, { 0xe, 0x6 }, /* (0,0), (1,0) */ + { 0x2, 0xa }, { 0xa, 0xe } }; /* (0,1), (1,1) */ + static const uint8_t ms8[8][2] = { + { 0x1, 0x7 }, { 0x5, 0x3 }, /* (0,0), (1,0) */ + { 0x3, 
0xd }, { 0x7, 0xb }, /* (0,1), (1,1) */ + { 0x9, 0x5 }, { 0xf, 0x1 }, /* (2,0), (3,0) */ + { 0xb, 0xf }, { 0xd, 0x9 } }; /* (2,1), (3,1) */ +#if 0 + /* NOTE: NVA3+ has alternative modes for MS2 and MS8, currently not used */ + static const uint8_t ms8_alt[8][2] = { + { 0x9, 0x5 }, { 0x7, 0xb }, /* (2,0), (1,1) */ + { 0xd, 0x9 }, { 0x5, 0x3 }, /* (3,1), (1,0) */ + { 0x3, 0xd }, { 0x1, 0x7 }, /* (0,1), (0,0) */ + { 0xb, 0xf }, { 0xf, 0x1 } }; /* (2,1), (3,0) */ +#endif + + const uint8_t (*ptr)[2]; + + switch (sample_count) { + case 0: + case 1: ptr = ms1; break; + case 2: ptr = ms2; break; + case 4: ptr = ms4; break; + case 8: ptr = ms8; break; + default: + assert(0); + return; /* bad sample count -> undefined locations */ + } + xy[0] = ptr[sample_index][0] * 0.0625f; + xy[1] = ptr[sample_index][1] * 0.0625f; +} diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c index 513d8f9..1963a4a 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c @@ -277,6 +277,8 @@ nv50_miptree_init_layout_tiled(struct nv50_miptree *mt) */ d = mt->layout_3d ? pt->depth0 : 1; + assert(!mt->ms_mode || !pt->last_level); + for (l = 0; l <= pt->last_level; ++l) { struct nv50_miptree_level *lvl = &mt->level[l]; unsigned tsx, tsy, tsz; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_s
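The tables in `nv50_context_get_sample_position()` store each sample's position inside the pixel as a 4-bit fixed-point value in sixteenths, and multiplying by 0.0625f (1/16) turns it into the float returned through `xy`. A reduced C sketch for the 4x case, reusing the `ms4` table from the patch; `sample_position_4x` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* 4x sample positions in 1/16-pixel units, copied from the patch: { x, y }. */
static const uint8_t ms4[4][2] = {
   { 0x6, 0x2 }, { 0xe, 0x6 },
   { 0x2, 0xa }, { 0xa, 0xe },
};

/* Convert a 4-bit fixed-point position to a float in [0, 1). */
static void sample_position_4x(unsigned sample_index, float xy[2])
{
   assert(sample_index < 4);
   xy[0] = ms4[sample_index][0] * 0.0625f;
   xy[1] = ms4[sample_index][1] * 0.0625f;
}
```

Sample 0 of the 4x pattern, for example, sits at (6/16, 2/16) = (0.375, 0.125).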
[Mesa-dev] Bug with glBlitFramebufferEXT(), and a piglit test for it
Hello, everyone,

Attached is a new test for piglit which exposes a bug in Mesa's software rendering (and another bug in hardware rendering, but that's not its main purpose).

The bug is as follows. With software rendering, after doing a buffer swap, glBlitFramebufferEXT() appears to copy the whole framebuffer instead of just the specified region. This breaks clutter and cogl, since they keep track of a dirty region themselves, and they use blits instead of full buffer swaps to avoid updating the whole display on every frame unnecessarily.

What is happening is actually a bit more complicated. glBlitFramebufferEXT()'s basic machinery works correctly, but if there has been a buffer swap before it, the following happens:

1. Draw some stuff (say, to GL_BACK).
2. Swap buffers. As far as I can tell, this just causes an XPutImage() from the GL_BACK buffer to the X window.
3. Draw some stuff to GL_BACK.
4. Do glBlitFramebufferEXT() from GL_BACK to GL_FRONT with the area you are interested in.
5. Internally, Mesa sees that the buffer for GL_FRONT has not been created yet, so it creates it and does the blit.
6. Do glFlush() so that GL_FRONT actually gets sent to the screen. This causes an XPutImage() of the *whole* of GL_FRONT, thus giving incorrect results: the only area that should have been updated is the one from (4), i.e. just the blit.

If you run the test program with hardware acceleration, it will work correctly. But if you run it with LIBGL_ALWAYS_SOFTWARE=1, it will fail.

For a related bug, do the following: in the test program, change the line that says

#define SWAP_BUFFERS_BEFORE_BLIT 1

from 1 to 0. Run the program again; this time it will work correctly with software rendering, but at least on my box it fails with hardware rendering (Intel).

I don't know enough about Mesa's internals to fix this quickly. Any help is appreciated.
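A toy C model of steps 4 to 6 may make the failure easier to see. Everything here is hypothetical (`struct fb`, `blit_rect`, `present_rect`, `present_all` are illustrative, not Mesa or X APIs): buffers are 8x8 arrays of one byte per pixel, the freshly created GL_FRONT starts zeroed, and "presenting" copies to a screen array. Pushing only the blitted rectangle keeps the rest of the screen intact, while pushing the whole front buffer, as step 6 describes, overwrites pixels no blit ever wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FB_W 8
#define FB_H 8

/* Toy framebuffer: one byte per pixel. */
struct fb { uint8_t px[FB_H][FB_W]; };

/* Copy the rectangle [x0, x1) x [y0, y1) from src to dst and leave the rest
 * of dst untouched: what clutter/cogl expect from a partial blit. */
static void blit_rect(struct fb *dst, const struct fb *src,
                      int x0, int y0, int x1, int y1)
{
   for (int y = y0; y < y1; y++)
      memcpy(&dst->px[y][x0], &src->px[y][x0], (size_t)(x1 - x0));
}

/* Expected presentation: only the damaged rectangle reaches the screen. */
static void present_rect(struct fb *screen, const struct fb *front,
                         int x0, int y0, int x1, int y1)
{
   blit_rect(screen, front, x0, y0, x1, y1);
}

/* Observed presentation (step 6): the whole front buffer is pushed. */
static void present_all(struct fb *screen, const struct fb *front)
{
   *screen = *front;
}
```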
Thanks, Federico >From 5c565b6cb053b3917be826276b8e0d2254699a8f Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Thu, 17 Oct 2013 14:52:31 -0500 Subject: [PATCH] fbo-blit-after-swap: New test for partial blits after a buffer swap The clutter/cogl libraries try to minimize the area that gets updated on every frame. They do this by doing glBlitFramebufferEXT() from the back buffer to the front buffer. However, this is buggy with software rendering if there has been a buffer swap *before* the first blit from the back buffer to the front buffer. In this case, Mesa copies the whole back buffer into the front buffer, instead of just the requested region. --- tests/all.tests | 1 + tests/fbo/CMakeLists.gl.txt | 1 + tests/fbo/fbo-blit-after-swap.c | 136 3 files changed, 138 insertions(+) create mode 100644 tests/fbo/fbo-blit-after-swap.c diff --git a/tests/all.tests b/tests/all.tests index 7ab841e..6c92ebf 100644 --- a/tests/all.tests +++ b/tests/all.tests @@ -1163,6 +1163,7 @@ for format in ('rgba', 'depth', 'stencil'): test_name = ' '.join(['framebuffer-blit-levels', test_mode, format]) arb_framebuffer_object[test_name] = PlainExecTest(test_name + ' -auto') add_plain_test(arb_framebuffer_object, 'fbo-alpha') +add_plain_test(arb_framebuffer_object, 'fbo-blit-after-swap') add_plain_test(arb_framebuffer_object, 'fbo-blit-stretch') add_plain_test(arb_framebuffer_object, 'fbo-blit-scaled-linear') add_plain_test(arb_framebuffer_object, 'fbo-attachments-blit-scaled-linear') diff --git a/tests/fbo/CMakeLists.gl.txt b/tests/fbo/CMakeLists.gl.txt index 588fe26..3ad9ec0 100644 --- a/tests/fbo/CMakeLists.gl.txt +++ b/tests/fbo/CMakeLists.gl.txt @@ -31,6 +31,7 @@ piglit_add_executable (fbo-alpha fbo-alpha.c) piglit_add_executable (fbo-luminance-alpha fbo-luminance-alpha.c) piglit_add_executable (fbo-bind-renderbuffer fbo-bind-renderbuffer.c) piglit_add_executable (fbo-blit fbo-blit.c) +piglit_add_executable (fbo-blit-after-swap fbo-blit-after-swap.c) piglit_add_executable 
(fbo-blit-d24s8 fbo-blit-d24s8.c) piglit_add_executable (fbo-blit-stretch fbo-blit-stretch.cpp) piglit_add_executable (fbo-blending-formats fbo-blending-formats.c) diff --git a/tests/fbo/fbo-blit-after-swap.c b/tests/fbo/fbo-blit-after-swap.c new file mode 100644 index 000..38fc870 --- /dev/null +++ b/tests/fbo/fbo-blit-after-swap.c @@ -0,0 +1,136 @@ +/* + * Copyright 2013 Suse, Inc. + * Copyright © 2011 Henri Verbeet + * Copyright 2011 Red Hat, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * pa
Re: [Mesa-dev] [PATCH 1/2] r600/llvm: Fix texbuf for pre EG gen
On Mon, Oct 21, 2013 at 10:02:12PM +0200, Vincent Lejeune wrote:

Can you add an explanation to the commit messages for both patches about
what was wrong with the old code?

Thanks,
Tom

> ---
>  src/gallium/drivers/r600/r600_llvm.c | 29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
> index 34dd3ad..d7fa5f8 100644
> --- a/src/gallium/drivers/r600/r600_llvm.c
> +++ b/src/gallium/drivers/r600/r600_llvm.c
> @@ -427,6 +427,35 @@ static void llvm_emit_tex(
>              emit_data->output[0] = build_intrinsic(gallivm->builder,
>                      "llvm.R600.load.texbuf",
>                      emit_data->dst_type, args, 2, LLVMReadNoneAttribute);
> +            if (ctx->chip_class >= EVERGREEN)
> +                return;
> +            ctx->uses_tex_buffers = true;
> +            LLVMDumpValue(emit_data->output[0]);
> +            emit_data->output[0] = LLVMBuildBitCast(gallivm->builder,
> +                emit_data->output[0], LLVMVectorType(bld_base->base.int_elem_type, 4),
> +                "");
> +            LLVMValueRef Mask = llvm_load_const_buffer(bld_base,
> +                lp_build_const_int32(gallivm, 0),
> +                LLVM_R600_BUFFER_INFO_CONST_BUFFER);
> +            Mask = LLVMBuildBitCast(gallivm->builder, Mask,
> +                LLVMVectorType(bld_base->base.int_elem_type, 4), "");
> +            emit_data->output[0] = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_AND,
> +                emit_data->output[0],
> +                Mask);
> +            LLVMValueRef WComponent = LLVMBuildExtractElement(gallivm->builder,
> +                emit_data->output[0], lp_build_const_int32(gallivm, 3), "");
> +            Mask = llvm_load_const_buffer(bld_base, lp_build_const_int32(gallivm, 1),
> +                LLVM_R600_BUFFER_INFO_CONST_BUFFER);
> +            Mask = LLVMBuildExtractElement(gallivm->builder, Mask,
> +                lp_build_const_int32(gallivm, 0), "");
> +            Mask = LLVMBuildBitCast(gallivm->builder, Mask,
> +                bld_base->base.int_elem_type, "");
> +            WComponent = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_OR,
> +                WComponent, Mask);
> +            emit_data->output[0] = LLVMBuildInsertElement(gallivm->builder,
> +                emit_data->output[0], WComponent, lp_build_const_int32(gallivm, 3), "");
> +            emit_data->output[0] = LLVMBuildBitCast(gallivm->builder,
> +                emit_data->output[0], LLVMVectorType(bld_base->base.elem_type, 4), "");
>          }
>          return;
>      default:
> --
> 1.8.3.1
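The IR built above can be read as plain integer arithmetic on the fetched texel. The following sketch models it on the CPU, under the assumption that a texel is four 32-bit floats; the helper name `fixup_texbuf` is hypothetical, and the mask values used in the checks are arbitrary examples, not what the driver actually stores in the buffer-info constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU model of the pre-Evergreen fix-up: reinterpret the texel
 * as four 32-bit integers, AND every channel with a per-channel mask
 * (buffer-info constant 0), OR a second value (x channel of buffer-info
 * constant 1) into W only, then reinterpret the result as floats again. */
static void fixup_texbuf(float texel[4], const uint32_t and_mask[4],
                         uint32_t w_or)
{
    uint32_t bits[4];

    memcpy(bits, texel, sizeof(bits));   /* bitcast <4 x float> -> <4 x i32> */
    for (int i = 0; i < 4; i++)
        bits[i] &= and_mask[i];          /* TGSI_OPCODE_AND on all channels */
    bits[3] |= w_or;                     /* TGSI_OPCODE_OR on W only */
    memcpy(texel, bits, sizeof(bits));   /* bitcast back to <4 x float> */
}
```

With a mask that keeps X and Y and clears Z and W, and the bit pattern of 1.0f OR-ed into W, the texel {1, 1, 1, 1} becomes {1, 1, 0, 1}.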
[Mesa-dev] [PATCH] egl: Enable EGL_EXT_client_extensions
Insert two fields into _egl_global to hold the client extensions and
statically initialize them:

    ClientExtensions // a struct of bools
    ClientExtensionString

Post-patch, Mesa supports exactly one client extension,
EGL_EXT_client_extensions.

Signed-off-by: Chad Versace
---
 src/egl/main/eglapi.c     | 8 +++-
 src/egl/main/eglglobals.c | 8
 src/egl/main/eglglobals.h | 7 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 2d8653f..66f96de 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -87,6 +87,7 @@
 #include
 #include

+#include "eglglobals.h"
 #include "eglcontext.h"
 #include "egldisplay.h"
 #include "egltypedefs.h"
@@ -354,10 +355,15 @@ eglTerminate(EGLDisplay dpy)
 const char * EGLAPIENTRY
 eglQueryString(EGLDisplay dpy, EGLint name)
 {
-   _EGLDisplay *disp = _eglLockDisplay(dpy);
+   _EGLDisplay *disp;
    _EGLDriver *drv;
    const char *ret;

+   if (dpy == EGL_NO_DISPLAY && name == EGL_EXTENSIONS) {
+      RETURN_EGL_SUCCESS(NULL, _eglGlobal.ClientExtensionString);
+   }
+
+   disp = _eglLockDisplay(dpy);
    _EGL_CHECK_DISPLAY(disp, NULL, drv);
    ret = drv->API.QueryString(drv, disp, name);

diff --git a/src/egl/main/eglglobals.c b/src/egl/main/eglglobals.c
index f53f078..5c2fddf 100644
--- a/src/egl/main/eglglobals.c
+++ b/src/egl/main/eglglobals.c
@@ -47,6 +47,14 @@ struct _egl_global _eglGlobal =
       _eglUnloadDrivers, /* always called last */
       _eglFiniDisplay
    },
+
+   /* ClientExtensions */
+   {
+      true /* EGL_EXT_client_extensions */
+   },
+
+   /* ClientExtensionsString */
+   "EGL_EXT_client_extensions"
 };

diff --git a/src/egl/main/eglglobals.h b/src/egl/main/eglglobals.h
index b40e30e..63428f7 100644
--- a/src/egl/main/eglglobals.h
+++ b/src/egl/main/eglglobals.h
@@ -31,6 +31,7 @@
 #ifndef EGLGLOBALS_INCLUDED
 #define EGLGLOBALS_INCLUDED

+#include
 #include "egltypedefs.h"
 #include "eglmutex.h"

@@ -48,6 +49,12 @@ struct _egl_global
    EGLint NumAtExitCalls;
    void (*AtExitCalls[10])(void);
+
+   struct _egl_client_extensions {
+      bool EXT_client_extensions;
+   } ClientExtensions;
+
+   const char *ClientExtensionString;
 };
--
1.8.3.1
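With this patch, a client can call eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS) before creating any display and search the returned space-separated list. A minimal sketch of that search follows; the helper `has_extension` is hypothetical. It matches whole tokens only, since a plain strstr() would also accept a name that is merely a prefix of a longer extension, and it tolerates a NULL list, which is what the pre-patch path (_EGL_CHECK_DISPLAY on a null display) returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole, space-separated token in `list`,
 * e.g. the string returned by eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS).
 * A NULL list (query not supported) simply yields false. */
static bool has_extension(const char *list, const char *name)
{
    size_t len = strlen(name);

    if (list == NULL || len == 0)
        return false;

    for (const char *p = list; (p = strstr(p, name)) != NULL; p++) {
        bool starts = (p == list) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');

        if (starts && ends)
            return true;
    }
    return false;
}
```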
[Mesa-dev] [Bug 70743] New: Compilation on VS2013
https://bugs.freedesktop.org/show_bug.cgi?id=70743

          Priority: medium
            Bug ID: 70743
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: Compilation on VS2013
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: scott.freedesk...@h4ck3r.net
          Hardware: Other
            Status: NEW
           Version: unspecified
         Component: Mesa core
           Product: Mesa

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 70743] Compilation on VS2013
https://bugs.freedesktop.org/show_bug.cgi?id=70743

--- Comment #1 from Scott Graham ---
Doesn't compile due to changes in VS2013's standard library.
[Mesa-dev] [Bug 70743] Compilation on VS2013
https://bugs.freedesktop.org/show_bug.cgi?id=70743

--- Comment #2 from Scott Graham ---
Created attachment 87961
  --> https://bugs.freedesktop.org/attachment.cgi?id=87961&action=edit
compile fix for vs2013
[Mesa-dev] [Bug 70743] Compilation on VS2013
https://bugs.freedesktop.org/show_bug.cgi?id=70743

--- Comment #3 from Scott Graham ---
Comment on attachment 87961
  --> https://bugs.freedesktop.org/attachment.cgi?id=87961
compile fix for vs2013

Index: include/c99/stdbool.h
===================================================================
--- include/c99/stdbool.h (revision 229946)
+++ include/c99/stdbool.h (working copy)
@@ -35,7 +35,8 @@
 #define bool _Bool

 /* For compilers that don't have the builtin _Bool type. */
-#if defined(_MSC_VER) || (__STDC_VERSION__ < 199901L && __GNUC__ < 3)
+#if (defined(_MSC_VER) && _MSC_VER < 1800) || \
+    (defined __GNUC__ && __STDC_VERSION__ < 199901L && __GNUC__ < 3)
 typedef unsigned char _Bool;
 #endif

Index: src/mesa/main/querymatrix.c
===================================================================
--- src/mesa/main/querymatrix.c (revision 229946)
+++ src/mesa/main/querymatrix.c (working copy)
@@ -37,6 +37,7 @@
 #define FLOAT_TO_FIXED(x) ((GLfixed) ((x) * 65536.0))

 #if defined(_MSC_VER)
+#if _MSC_VER < 1800  // Not required on VS2013 and above.
 /* Oddly, the fpclassify() function doesn't exist in such a form
  * on MSVC.  This is an implementation using slightly different
  * lower-level Windows functions.
@@ -69,6 +70,7 @@
        return FP_NAN;
    }
 }
+#endif  // _MSC_VER < 1800

 #elif defined(__APPLE__) || defined(__CYGWIN__) || defined(__FreeBSD__) || \
      defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || \
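The second hunk compiles Mesa's own fpclassify() fallback only when _MSC_VER < 1800, presumably because VS2013's <math.h> already provides the C99 classification macro and the local definition clashes with it. Code that can rely on a C99-conforming <math.h> can simply use the standard macro, as in this sketch (the `classify` helper is hypothetical).

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Map a double to a short label using the C99 fpclassify() macro from
 * <math.h>; no hand-written fallback is needed when the C library
 * provides it. */
static const char *classify(double x)
{
    switch (fpclassify(x)) {
    case FP_NAN:       return "nan";
    case FP_INFINITE:  return "inf";
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    default:           return "unknown";
    }
}
```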
[Mesa-dev] [Bug 70743] Compilation on VS2013
https://bugs.freedesktop.org/show_bug.cgi?id=70743

Scott Graham changed:

           What    |Removed                     |Added
 Attachment #87961|0                           |1
        is obsolete|                            |

--- Comment #4 from Scott Graham ---
Created attachment 87963
  --> https://bugs.freedesktop.org/attachment.cgi?id=87963&action=edit
fix compilation on 2013
[Mesa-dev] [PATCH] R600/SI: fix MIMG writemask adjustment
From: Marek Olšák This fixes piglit: - shaders/glsl-fs-texture2d-masked - shaders/glsl-fs-texture2d-masked-4 Signed-off-by: Marek Olšák Reviewed-by: Tom Stellard --- lib/Target/R600/SIISelLowering.cpp | 27 +++-- test/CodeGen/R600/llvm.SI.sample-masked.ll | 93 ++ 2 files changed, 114 insertions(+), 6 deletions(-) create mode 100644 test/CodeGen/R600/llvm.SI.sample-masked.ll diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp index 2c9270e..bfc9e8d 100644 --- a/lib/Target/R600/SIISelLowering.cpp +++ b/lib/Target/R600/SIISelLowering.cpp @@ -1065,7 +1065,9 @@ static unsigned SubIdx2Lane(unsigned Idx) { void SITargetLowering::adjustWritemask(MachineSDNode *&Node, SelectionDAG &DAG) const { SDNode *Users[4] = { }; - unsigned Writemask = 0, Lane = 0; + unsigned Lane = 0; + unsigned OldDmask = Node->getConstantOperandVal(0); + unsigned NewDmask = 0; // Try to figure out the used register components for (SDNode::use_iterator I = Node->use_begin(), E = Node->use_end(); @@ -1076,29 +1078,42 @@ void SITargetLowering::adjustWritemask(MachineSDNode *&Node, I->getMachineOpcode() != TargetOpcode::EXTRACT_SUBREG) return; +// Lane means which subreg of %VGPRa_VGPRb_VGPRc_VGPRd is used. +// Note that subregs are packed, i.e. Lane==0 is the first bit set +// in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second bit +// set, etc. Lane = SubIdx2Lane(I->getConstantOperandVal(1)); +// Set which texture component corresponds to the lane. 
+unsigned Comp; +for (unsigned i = 0, Dmask = OldDmask; i <= Lane; i++) { + assert(Dmask); + Comp = ffs(Dmask)-1; + Dmask &= ~(1 << Comp); +} + // Abort if we have more than one user per component if (Users[Lane]) return; Users[Lane] = *I; -Writemask |= 1 << Lane; +NewDmask |= 1 << Comp; } - // Abort if all components are used - if (Writemask == 0xf) + // Abort if there's no change + if (NewDmask == OldDmask) return; // Adjust the writemask in the node std::vector Ops; - Ops.push_back(DAG.getTargetConstant(Writemask, MVT::i32)); + Ops.push_back(DAG.getTargetConstant(NewDmask, MVT::i32)); for (unsigned i = 1, e = Node->getNumOperands(); i != e; ++i) Ops.push_back(Node->getOperand(i)); Node = (MachineSDNode*)DAG.UpdateNodeOperands(Node, Ops.data(), Ops.size()); // If we only got one lane, replace it with a copy - if (Writemask == (1U << Lane)) { + // (if NewDmask has only one bit set...) + if (NewDmask && (NewDmask & (NewDmask-1)) == 0) { SDValue RC = DAG.getTargetConstant(AMDGPU::VReg_32RegClassID, MVT::i32); SDNode *Copy = DAG.getMachineNode(TargetOpcode::COPY_TO_REGCLASS, SDLoc(), Users[Lane]->getValueType(0), diff --git a/test/CodeGen/R600/llvm.SI.sample-masked.ll b/test/CodeGen/R600/llvm.SI.sample-masked.ll new file mode 100644 index 000..454e48b --- /dev/null +++ b/test/CodeGen/R600/llvm.SI.sample-masked.ll @@ -0,0 +1,93 @@ +;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck %s + +; CHECK-LABEL: @v1 +; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 13 +define void @v1(i32 %a1) { +entry: + %0 = insertelement <1 x i32> undef, i32 %a1, i32 0 + %1 = call <4 x float> @llvm.SI.sample.v1i32(<1 x i32> %0, <32 x i8> undef, <16 x i8> undef, i32 0) + %2 = extractelement <4 x float> %1, i32 0 + %3 = extractelement <4 x float> %1, i32 2 + %4 = extractelement <4 x float> %1, i32 3 + call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4) + ret void +} + +; CHECK-LABEL: @v2 +; CHECK: IMAGE_SAMPLE 
VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 11 +define void @v2(i32 %a1) { +entry: + %0 = insertelement <1 x i32> undef, i32 %a1, i32 0 + %1 = call <4 x float> @llvm.SI.sample.v1i32(<1 x i32> %0, <32 x i8> undef, <16 x i8> undef, i32 0) + %2 = extractelement <4 x float> %1, i32 0 + %3 = extractelement <4 x float> %1, i32 1 + %4 = extractelement <4 x float> %1, i32 3 + call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4) + ret void +} + +; CHECK-LABEL: @v3 +; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 14 +define void @v3(i32 %a1) { +entry: + %0 = insertelement <1 x i32> undef, i32 %a1, i32 0 + %1 = call <4 x float> @llvm.SI.sample.v1i32(<1 x i32> %0, <32 x i8> undef, <16 x i8> undef, i32 0) + %2 = extractelement <4 x float> %1, i32 1 + %3 = extractelement <4 x float> %1, i32 2 + %4 = extractelement <4 x float> %1, i32 3 + call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4) + ret void +} + +; CHECK-LABEL: @v4 +; CHECK: IMAGE_SAMPLE VGPR{{[[0-9]}}_VGPR{{[0-9]}}_VGPR{{[0-9]}}, 7 +define void @v4(i32 %a1) { +entry: + %0 = insertelement <1 x i32> undef, i32 %a1, i32
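The comment added in adjustWritemask() is the heart of the fix: Lane N holds the N-th set bit of the old dmask, which equals component N only when the old dmask is 0xf. A plain-C sketch of that mapping follows; the helper names are hypothetical, and a loop stands in for the ffs() call the patch uses, to avoid depending on POSIX <strings.h>. The first two checks mirror the v1 and v3 CHECK lines (dmask 13 and 14).

```c
#include <assert.h>
#include <stdbool.h>

/* Return the texture component (0 = X ... 3 = W) held in result lane `lane`
 * for MIMG dmask `dmask`. Results are packed, so lane N is the N-th set bit
 * of dmask counting from bit 0. Returns -1 if dmask has too few bits set. */
static int lane_to_component(unsigned dmask, unsigned lane)
{
    for (int comp = 0; comp < 4; comp++) {
        if (dmask & (1u << comp)) {
            if (lane == 0)
                return comp;
            lane--;
        }
    }
    return -1;
}

/* Build the reduced dmask from the lanes that actually have users. */
static unsigned shrink_dmask(unsigned old_dmask, const bool used_lane[4])
{
    unsigned new_dmask = 0;

    for (unsigned lane = 0; lane < 4; lane++) {
        int comp;

        if (!used_lane[lane])
            continue;
        comp = lane_to_component(old_dmask, lane);
        if (comp < 0)
            continue;   /* lane not covered by old_dmask */
        new_dmask |= 1u << comp;
    }
    return new_dmask;
}

/* True if exactly one bit is set, i.e. only one component survives. */
static bool single_component(unsigned dmask)
{
    return dmask != 0 && (dmask & (dmask - 1)) == 0;
}
```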
Re: [Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.
On Tue, Oct 22, 2013 at 3:05 AM, Eric Anholt wrote: > Chia-I Wu writes: > >> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner wrote: >>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt wrote: Previously, the best thing we had was to schedule the things unblocked by the current instruction, on the hope that it would be consuming two values at the end of their live intervals while only producing one new value. Sometimes that wasn't the case. Now, when an instruction is the first user of a GRF we schedule (i.e. it will probably be the virtual_grf_def[] instruction after computing live intervals again), penalize it by how many regs it would take up. When an instruction is the last user of a GRF we have to schedule (when it will probably be the virtual_grf_end[] instruction), give it a boost by how many regs it would free. The new functions are made virtual (only 1 of 2 really needs to be virtual) because I expect we'll soon lift the pre-regalloc scheduling heuristic over to the vec4 backend. shader-db: total instructions in shared programs: 1512756 -> 1511604 (-0.08%) instructions in affected programs: 10292 -> 9140 (-11.19%) GAINED:121 LOST: 38 Improves tropics performance at my current settings by 4.50602% +/- 2.60694% (n=5). No difference on Lightsmark (n=5). No difference on GLB2.7 (n=11). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445 --- >>> >>> I think we're on the right track by considering register pressure when >>> scheduling, but one aspect we're not considering is simply how many >>> registers we think we're using. >>> >>> If I understand correctly, the pre-register allocation wants to >>> shorten live intervals as much as possible which reduces register >>> pressure but at the cost of larger stalls and less instruction level >>> parallelism. 
>>> We end up scheduling things like
>>>
>>> produce result 4
>>> produce result 3
>>> produce result 2
>>> produce result 1
>>> use result 1
>>> use result 2
>>> use result 3
>>> use result 4
>>>
>>> (this is why the MRF writes for the FB write are always done in the
>>> reverse order)
>> In this example, it will actually be
>>
>> produce result 4
>> use result 4
>> produce result 3
>> use result 3
>> produce result 2
>> use result 2
>> produce result 1
>> use result 1
>>
>> and post-regalloc will schedule again to something like
>>
>> produce result 4
>> produce result 3
>> produce result 2
>> produce result 1
>> use result 4
>> use result 3
>> use result 2
>> use result 1
>>
>> The pre-regalloc scheduling attempts to consume the results as soon as
>> they are available.
>>
>> FB write is done in reverse order because, when a result is available,
>> its consumers are scheduled in reverse order. The epilog of fragment
>> shaders is usually like this:
>>
>> placeholder_halt
>> mov m1, g1
>> mov m2, g2
>> mov m3, g3
>> mov m4, g4
>> send
>>
>> MOVs depend on placeholder_halt, and send depends on MOVs. The
>> scheduler will schedule it as follows:
>>
>> placeholder_halt
>> mov m4, g4
>> mov m3, g3
>> mov m2, g2
>> mov m1, g1
>> send
>>
>> The order can be corrected with the change proposed here
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>>
>> But there is no point in making the change if the current heuristic for
>> pre-regalloc is going to be reworked.
> > Flipping the order in which we prefer ties (on betterthanlifo-2): > > commit 11a511576e465f02875f39c452561775a97416a1 > Author: Eric Anholt > Date: Mon Oct 21 11:45:53 2013 -0700 > > otherway > > diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > b/src/mesa/ > index 9a480b4..b123015 100644 > --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > @@ -1049,9 +1049,9 @@ > fs_instruction_scheduler::choose_instruction_to_schedule() > * it's the first use of a GRF, reduce its score since it means it > * should be increasing register pressure. > */ > - for (schedule_node *node = (schedule_node *)instructions.get_tail(); > - node != instructions.get_head()->prev; > - node = (schedule_node *)node->prev) { > + for (schedule_node *node = (schedule_node *)instructions.get_head(); > + node != instructions.get_head()->next; > + node = (schedule_node *)node->next) { > schedule_node *n = (schedule_node *)node; > fs_inst *inst = (fs_inst *)n->inst; > > gives: > > total instructions in shared programs: 1544638 -> 1546794 (0.14%) > instructions in affected programs: 7163 -> 9319 (30.10%) > GAINED:16 > LOST: 289 > > with massive spilling on tropics, and a bit on lightsmark and csgo. Children of a
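The heuristic in Eric's commit message can be expressed as a scoring function. The sketch below is a simplified model, not the i965 scheduler: the `candidate` struct, its fields, and the rule that ties go to the earliest candidate are assumptions made for illustration. Starting a GRF's live range costs its size in registers; ending one earns its size back.

```c
#include <assert.h>

/* Hypothetical per-candidate summary; not the i965 data structures. */
struct candidate {
    int base_score;      /* whatever the scheduler already prefers */
    int regs_first_use;  /* regs of GRFs this instruction would start */
    int regs_last_use;   /* regs of GRFs this instruction would free */
};

/* Penalize starting live ranges, reward ending them. */
static int pressure_score(const struct candidate *c)
{
    return c->base_score - c->regs_first_use + c->regs_last_use;
}

/* Pick the highest-scoring candidate; ties go to the earliest index. */
static int choose(const struct candidate *cands, int n)
{
    int best = -1, best_score = 0;

    for (int i = 0; i < n; i++) {
        int s = pressure_score(&cands[i]);

        if (best < 0 || s > best_score) {
            best = i;
            best_score = s;
        }
    }
    return best;
}
```

Which way ties are broken is a separate design choice; the shader-db numbers quoted above for flipping the iteration order suggest it can matter a great deal.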
[Mesa-dev] [Bug 70743] Compilation on VS2013
https://bugs.freedesktop.org/show_bug.cgi?id=70743

Stephane Marchesin changed:

           What    |Removed                     |Added
                 CC|                            |marche...@icps.u-strasbg.fr
[Mesa-dev] i965: Improving FS register spilling performance.
In the process of trying to work around the spilling in the huge Unigine soft-shadowing shaders, I got to wondering whether we couldn't just reduce the cost of spilling to the point of "I don't care".

Notably, gen7 has a nice message for doing unspills where you don't need to set up the message beyond passing in g0. It turns out to be a slight win. Unfortunately, the complementary spill message was a loss.

You can find the code in this submission on the gen7-scratch-read branch of my tree, and the code I'm not trying to push is on gen7-scratch-write.
[Mesa-dev] [PATCH 1/5] i965/fs: Fix broken register spilling debug code.
Now that reg spilling generates new vgrfs, we were looping forever if you
ever turned it on.  Instead, move the debug code into the register
allocator right near where we'd be doing spilling anyway, which should
more accurately reflect how register spilling occurs in the wild.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp              |  7 ---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 11 +++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 65a4b66..5a8a45e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3091,13 +3091,6 @@ fs_visitor::run()
       assign_curb_setup();
       assign_urb_setup();

-      if (0) {
-         /* Debug of register spilling: Go spill everything. */
-         for (int i = 0; i < virtual_grf_count; i++) {
-            spill_reg(i);
-         }
-      }
-
       if (0)
          assign_regs_trivial();
       else {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 157c9ae..7826cd4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -461,6 +461,17 @@ fs_visitor::assign_regs()
    if (brw->gen >= 7)
       setup_mrf_hack_interference(g, first_mrf_hack_node);

+   /* Debug of register spilling: Go spill everything. */
+   if (0) {
+      int reg = choose_spill_reg(g);
+
+      if (reg != -1) {
+         spill_reg(reg);
+         ralloc_free(g);
+         return false;
+      }
+   }
+
    if (!ra_allocate_no_spills(g)) {
       /* Failed to allocate registers.  Spill a reg, and the caller will
        * loop back into here to try again.
--
1.8.4.rc3
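The infinite loop in the removed debug block is easy to reproduce in a toy model: `for (int i = 0; i < virtual_grf_count; i++) spill_reg(i);` re-reads a bound that every spill_reg() call now increases. The sketch below (hypothetical `toy_` names, not Mesa code) shows a bounded variant that snapshots the count first; the patch itself takes a different route and spills one register per allocation attempt from inside assign_regs().

```c
#include <assert.h>

/* Toy model: spilling a register allocates `new_regs_per_spill` fresh
 * virtual registers, the way spill_reg() now creates new vgrfs. */
struct toy_visitor {
    int virtual_grf_count;
    int spills;
};

static void toy_spill_reg(struct toy_visitor *v, int reg, int new_regs_per_spill)
{
    (void)reg;   /* the toy does not track which register was spilled */
    v->spills++;
    v->virtual_grf_count += new_regs_per_spill;
}

/* Spill every register that existed before we started. Comparing against
 * the live v->virtual_grf_count instead would chase the registers just
 * created and, with new_regs_per_spill >= 1, never terminate. */
static void spill_everything(struct toy_visitor *v, int new_regs_per_spill)
{
    const int original_count = v->virtual_grf_count;

    for (int i = 0; i < original_count; i++)
        toy_spill_reg(v, i, new_regs_per_spill);
}
```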
[Mesa-dev] [PATCH 3/5] i965/fs: Fix register unspills from a reg_offset.
We were clearing the reg_offset before trying to use it.  Oops.  Fixes
glsl-fs-texture2drect with the reg spilling debug enabled.
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index ed0ce0d..a7ca319 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -642,13 +642,13 @@ fs_visitor::spill_reg(int spill_reg)
          if (inst->src[i].file == GRF &&
              inst->src[i].reg == spill_reg) {
             int regs_read = inst->regs_read(this, i);
+            int subset_spill_offset = (spill_offset +
+                                       reg_size * inst->src[i].reg_offset);

             inst->src[i].reg = virtual_grf_alloc(regs_read);
             inst->src[i].reg_offset = 0;

-            emit_unspill(inst, inst->src[i],
-                         spill_offset + reg_size * inst->src[i].reg_offset,
-                         regs_read);
+            emit_unspill(inst, inst->src[i], subset_spill_offset, regs_read);
          }
       }
--
1.8.4.rc3
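The ordering bug is easy to see in isolation. In this sketch, `toy_src` is a hypothetical stand-in for an fs_reg operand: the buggy variant zeroes reg_offset and then uses it, so every unspill reads from the start of the spilled register, while the fixed variant computes the scratch offset first, as the patch does. `reg_size` plays the role of the per-register scratch size introduced in PATCH 2/5 (dispatch_width * sizeof(float)).

```c
#include <assert.h>

/* Hypothetical stand-in for an fs_reg source operand. */
struct toy_src {
    int reg;
    int reg_offset;
};

/* Buggy order: reg_offset is cleared before it is used, so the computed
 * scratch offset always points at the start of the spilled register. */
static unsigned offset_buggy(struct toy_src *src, unsigned spill_offset,
                             unsigned reg_size, int new_reg)
{
    src->reg = new_reg;
    src->reg_offset = 0;
    return spill_offset + reg_size * (unsigned)src->reg_offset;
}

/* Fixed order: compute the scratch offset first, then retarget the operand. */
static unsigned offset_fixed(struct toy_src *src, unsigned spill_offset,
                             unsigned reg_size, int new_reg)
{
    unsigned subset_spill_offset =
        spill_offset + reg_size * (unsigned)src->reg_offset;

    src->reg = new_reg;
    src->reg_offset = 0;
    return subset_spill_offset;
}
```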
[Mesa-dev] [PATCH 2/5] i965/fs: Fix register spilling for 16-wide.
Things blew up when I enabled the debug register spill code without
disabling 16-wide, so I decided to just fix 16-wide spilling.

We still don't generate 16-wide when register spilling happens as part of
allocation (since we expect it to be slower), but now we can experiment
with allowing it in some cases in the future.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp    |  8
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 15 ---
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index fa15f7b..6c8fb76 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -718,8 +718,8 @@ fs_generator::generate_spill(fs_inst *inst, struct brw_reg src)
    brw_MOV(p,
            retype(brw_message_reg(inst->base_mrf + 1), BRW_REGISTER_TYPE_UD),
            retype(src, BRW_REGISTER_TYPE_UD));
-   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf), 1,
-                                 inst->offset);
+   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
+                                 inst->mlen, inst->offset);
 }

 void
@@ -727,8 +727,8 @@ fs_generator::generate_unspill(fs_inst *inst, struct brw_reg dst)
 {
    assert(inst->mlen != 0);

-   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf), 1,
-                                inst->offset);
+   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
+                                dispatch_width / 8, inst->offset);
 }

 void
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 7826cd4..ed0ce0d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -540,7 +540,7 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst, uint32_t spill_offset,
       inst->insert_before(unspill_inst);

       dst.reg_offset++;
-      spill_offset += REG_SIZE;
+      spill_offset += dispatch_width * sizeof(float);
    }
 }

@@ -624,10 +624,11 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
 void
 fs_visitor::spill_reg(int spill_reg)
 {
+   int reg_size = dispatch_width * sizeof(float);
    int size = virtual_grf_sizes[spill_reg];
    unsigned int spill_offset = c->last_scratch;
    assert(ALIGN(spill_offset, 16) == spill_offset); /* oword read/write req. */
-   c->last_scratch += size * REG_SIZE;
+   c->last_scratch += size * reg_size;

    /* Generate spill/unspill instructions for the objects being
     * spilled.  Right now, we spill or unspill the whole thing to a
@@ -646,7 +647,7 @@ fs_visitor::spill_reg(int spill_reg)
             inst->src[i].reg_offset = 0;

             emit_unspill(inst, inst->src[i],
-                         spill_offset + REG_SIZE * inst->src[i].reg_offset,
+                         spill_offset + reg_size * inst->src[i].reg_offset,
                          regs_read);
          }
       }
@@ -654,7 +655,7 @@ fs_visitor::spill_reg(int spill_reg)
       if (inst->dst.file == GRF &&
           inst->dst.reg == spill_reg) {
          int subset_spill_offset = (spill_offset +
-                                    REG_SIZE * inst->dst.reg_offset);
+                                    reg_size * inst->dst.reg_offset);
          inst->dst.reg = virtual_grf_alloc(inst->regs_written);
          inst->dst.reg_offset = 0;

@@ -677,11 +678,11 @@ fs_visitor::spill_reg(int spill_reg)
             fs_inst *spill_inst = new(mem_ctx) fs_inst(FS_OPCODE_SPILL,
                                                        reg_null_f, spill_src);
             spill_src.reg_offset++;
-            spill_inst->offset = subset_spill_offset + chan * REG_SIZE;
+            spill_inst->offset = subset_spill_offset + chan * reg_size;
             spill_inst->ir = inst->ir;
             spill_inst->annotation = inst->annotation;
-            spill_inst->base_mrf = 14;
-            spill_inst->mlen = 2; /* header, value */
+            spill_inst->mlen = 1 + dispatch_width / 8; /* header, value */
+            spill_inst->base_mrf = 16 - spill_inst->mlen;
             inst->insert_after(spill_inst);
          }
       }
--
1.8.4.rc3
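The arithmetic this patch introduces can be checked in isolation. The sketch below (hypothetical names, assuming a 4-byte float) computes the scratch bytes per register unit, the message length (header plus dispatch_width / 8 data registers), and the first MRF, chosen so the message ends at MRF 15, for both dispatch widths.

```c
#include <assert.h>

/* Spill parameters as computed in spill_reg() after this patch. */
struct spill_params {
    int reg_size;   /* scratch bytes per register unit */
    int mlen;       /* message length in registers: header + data */
    int base_mrf;   /* first MRF of the message; last one is MRF 15 */
};

static struct spill_params spill_params_for(int dispatch_width)
{
    struct spill_params p;

    p.reg_size = dispatch_width * (int)sizeof(float);
    p.mlen = 1 + dispatch_width / 8;
    p.base_mrf = 16 - p.mlen;
    return p;
}
```

For 8-wide dispatch this reproduces the previously hard-coded base_mrf = 14 and mlen = 2, with a reg_size of 32 bytes; for 16-wide the message grows to three registers starting at MRF 13.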
[Mesa-dev] [PATCH 4/5] i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE
I'm going to be introducing gen7 variants, and the previous naming was going to get confusing. --- src/mesa/drivers/dri/i965/brw_defines.h | 7 +++ src/mesa/drivers/dri/i965/brw_fs.cpp| 4 ++-- src/mesa/drivers/dri/i965/brw_fs.h | 4 ++-- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 12 ++-- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 12 +++- src/mesa/drivers/dri/i965/brw_shader.cpp| 14 +- src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp| 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 ++-- 10 files changed, 33 insertions(+), 36 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 5ba9d45..72a0891 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -774,14 +774,15 @@ enum opcode { SHADER_OPCODE_SHADER_TIME_ADD, + SHADER_OPCODE_GEN4_SCRATCH_READ, + SHADER_OPCODE_GEN4_SCRATCH_WRITE, + FS_OPCODE_DDX, FS_OPCODE_DDY, FS_OPCODE_PIXEL_X, FS_OPCODE_PIXEL_Y, FS_OPCODE_CINTERP, FS_OPCODE_LINTERP, - FS_OPCODE_SPILL, - FS_OPCODE_UNSPILL, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7, FS_OPCODE_VARYING_PULL_CONSTANT_LOAD, @@ -795,8 +796,6 @@ enum opcode { FS_OPCODE_PLACEHOLDER_HALT, VS_OPCODE_URB_WRITE, - VS_OPCODE_SCRATCH_READ, - VS_OPCODE_SCRATCH_WRITE, VS_OPCODE_PULL_CONSTANT_LOAD, VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, VS_OPCODE_UNPACK_FLAGS_SIMD4X2, diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 5a8a45e..c9ea731 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -763,11 +763,11 @@ fs_visitor::implied_mrf_writes(fs_inst *inst) case FS_OPCODE_FB_WRITE: return 2; case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD: - case FS_OPCODE_UNSPILL: + case SHADER_OPCODE_GEN4_SCRATCH_READ: return 1; case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD: 
return inst->mlen; - case FS_OPCODE_SPILL: + case SHADER_OPCODE_GEN4_SCRATCH_WRITE: return 2; default: assert(!"not reached"); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index b5aed23..f9c87c7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -519,8 +519,8 @@ private: void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src); void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src, bool negate_value); - void generate_spill(fs_inst *inst, struct brw_reg src); - void generate_unspill(fs_inst *inst, struct brw_reg dst); + void generate_scratch_write(fs_inst *inst, struct brw_reg src); + void generate_scratch_read(fs_inst *inst, struct brw_reg dst); void generate_uniform_pull_constant_load(fs_inst *inst, struct brw_reg dst, struct brw_reg index, struct brw_reg offset); diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 6c8fb76..6aebc41 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -711,7 +711,7 @@ fs_generator::generate_discard_jump(fs_inst *inst) } void -fs_generator::generate_spill(fs_inst *inst, struct brw_reg src) +fs_generator::generate_scratch_write(fs_inst *inst, struct brw_reg src) { assert(inst->mlen != 0); @@ -723,7 +723,7 @@ fs_generator::generate_spill(fs_inst *inst, struct brw_reg src) } void -fs_generator::generate_unspill(fs_inst *inst, struct brw_reg dst) +fs_generator::generate_scratch_read(fs_inst *inst, struct brw_reg dst) { assert(inst->mlen != 0); @@ -1509,12 +1509,12 @@ fs_generator::generate_code(exec_list *instructions) generate_ddy(inst, dst, src[0], c->key.render_to_fbo); break; - case FS_OPCODE_SPILL: -generate_spill(inst, src[0]); + case SHADER_OPCODE_GEN4_SCRATCH_WRITE: +generate_scratch_write(inst, src[0]); break; - case FS_OPCODE_UNSPILL: -generate_unspill(inst, dst); + case 
SHADER_OPCODE_GEN4_SCRATCH_READ: +generate_scratch_read(inst, dst); break; case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD: diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp index a7ca319..75090a6 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp @@ -527,7 +527,8 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst, uint32_t spill_offset,
[Mesa-dev] [PATCH 5/5] i965/fs: Use the gen7 scratch read opcode when possible.
This avoids a lot of message setup we had to do otherwise. Improves GLB2.7 performance with register spilling force enabled by 1.6442% +/- 0.553218% (n=4). --- src/mesa/drivers/dri/i965/brw_defines.h| 7 src/mesa/drivers/dri/i965/brw_eu.h | 5 +++ src/mesa/drivers/dri/i965/brw_eu_emit.c| 41 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++ src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 21 +++ .../drivers/dri/i965/brw_schedule_instructions.cpp | 12 +++ src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++ 8 files changed, 93 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 72a0891..276ab44 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -776,6 +776,7 @@ enum opcode { SHADER_OPCODE_GEN4_SCRATCH_READ, SHADER_OPCODE_GEN4_SCRATCH_WRITE, + SHADER_OPCODE_GEN7_SCRATCH_READ, FS_OPCODE_DDX, FS_OPCODE_DDY, @@ -1135,6 +1136,12 @@ enum brw_message_target { #define GEN7_DATAPORT_DC_BYTE_SCATTERED_WRITE 12 #define GEN7_DATAPORT_DC_UNTYPED_SURFACE_WRITE 13 +#define GEN7_DATAPORT_SCRATCH_READ((1 << 18) | \ + (0 << 17)) +#define GEN7_DATAPORT_SCRATCH_WRITE ((1 << 18) | \ + (1 << 17)) +#define GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT12 + /* HSW */ #define HSW_DATAPORT_DC_PORT0_OWORD_BLOCK_READ 0 #define HSW_DATAPORT_DC_PORT0_UNALIGNED_OWORD_BLOCK_READ1 diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 072310d..a307948 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -379,6 +379,11 @@ void brw_oword_block_write_scratch(struct brw_compile *p, int num_regs, GLuint offset); +void gen7_block_read_scratch(struct brw_compile *p, + struct brw_reg dest, + int num_regs, + GLuint offset); + void brw_shader_time_add(struct brw_compile *p, struct brw_reg payload, uint32_t surf_index); diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 8efd679..accf324 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2055,6 +2055,47 @@ brw_oword_block_read_scratch(struct brw_compile *p, } } +void +gen7_block_read_scratch(struct brw_compile *p, +struct brw_reg dest, +int num_regs, +GLuint offset) +{ + dest = retype(dest, BRW_REGISTER_TYPE_UW); + + struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND); + + assert(insn->header.predicate_control == 0); + insn->header.compression_control = BRW_COMPRESSION_NONE; + + brw_set_dest(p, insn, dest); + + /* The HW requires that the header is present; this is to get the g0.5 +* scratch offset. +*/ + bool header_present = true; + brw_set_src0(p, insn, brw_vec8_grf(0, 0)); + + brw_set_message_descriptor(p, insn, + GEN7_SFID_DATAPORT_DATA_CACHE, + 1, /* mlen: just g0 */ + num_regs, + header_present, + false); + + insn->bits3.ud |= GEN7_DATAPORT_SCRATCH_READ; + + assert(num_regs == 1 || num_regs == 2 || num_regs == 4); + insn->bits3.ud |= (num_regs - 1) << GEN7_DATAPORT_SCRATCH_NUM_REGS_SHIFT; + + /* The "HWORD" unit in the docs just happens to mean "the size of a +* register" +*/ + offset /= REG_SIZE; + assert(offset < (1 << 12)); + insn->bits3.ud |= offset; +} + /** * Read a float[4] vector from the data port Data Cache (const buffer). * Location (in buffer) should be a multiple of 16. diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index f9c87c7..432f3df 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -521,6 +521,7 @@ private: bool negate_value); void generate_scratch_write(fs_inst *inst, struct brw_reg src); void generate_scratch_read(fs_inst *inst, struct brw_reg dst); + void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst); void generate_uniform_pull_constant_load(fs_inst *inst, struct brw_reg dst, struct brw_reg index, struct b
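The descriptor bits that gen7_block_read_scratch() ORs into the SEND instruction can be checked in isolation. The sketch below recomputes only those bits from the GEN7_DATAPORT_SCRATCH_* values in the patch. The 32-byte register size and the sketch_* names are assumptions for illustration. The mlen, rlen and header-present bits that brw_set_message_descriptor() sets are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a gen7 GRF register is 32 bytes (256 bits). */
#define SKETCH_REG_SIZE 32

/* Bit layout taken from the GEN7_DATAPORT_SCRATCH_* defines in the patch. */
#define SKETCH_SCRATCH_READ ((1u << 18) | (0u << 17))
#define SKETCH_SCRATCH_NUM_REGS_SHIFT 12

/* Returns only the scratch-specific descriptor bits.  Unlike the patch,
 * this sketch also requires a register-aligned byte offset. */
uint32_t sketch_scratch_read_bits(int num_regs, uint32_t offset_bytes)
{
   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);
   assert(offset_bytes % SKETCH_REG_SIZE == 0);

   uint32_t bits = SKETCH_SCRATCH_READ;

   /* The block size is encoded as num_regs - 1, i.e. 0, 1 or 3. */
   bits |= (uint32_t)(num_regs - 1) << SKETCH_SCRATCH_NUM_REGS_SHIFT;

   /* The offset is counted in "HWORDs", i.e. whole registers. */
   uint32_t offset_regs = offset_bytes / SKETCH_REG_SIZE;
   assert(offset_regs < (1u << 12));
   return bits | offset_regs;
}
```

For example, a two-register read at byte offset 64 yields a block-size field of 1 and an offset field of 2.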
Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
Kenneth Graunke writes: > DrawTransformFeedback() needs to obtain the number of vertices written > to a particular stream during the last Begin/EndTransformFeedback block. > The new driver hook returns exactly that information. > > Gallium drivers already implement this functionality by passing the > transform feedback object to the drawing function. I prefer to avoid > this for two reasons: > > 1. Complexity: > > Normally, the drawing function takes an array of _mesa_prim objects, > each of which specifies a vertex count. If tfb_vertcount != NULL, > however, there will only be one _mesa_prim object with an invalid > vertex count (of 1), so it needs to be ignored. > > Since the _mesa_prim pointers are const, you can't even override it to > the proper value; you need to pass around extra "ignore that, here's > the real count" parameters. > > The drawing function is already terribly complicated, so I don't want to > make it even more complicated. > > 2. Primitive restart: > > vbo_draw_arrays() performs software primitive restart, splitting a draw > call in two when necessary. vbo_draw_transform_feedback() currently > doesn't because it has no idea how many vertices need to be drawn. The > new driver hook gives it that information, allowing us to reuse the > existing vbo_draw_arrays() code to do everything right. This interface means synchronizing with the GPU, which sucks when we have the ability to actually do DTFB in the hardware pipeline (Indirect Parameter Enable of 3DPRIMITIVE). We could mostly use the hw pipelined version only, as long as we had core contexts (meaning that we don't need vertex start/count to figure out how much user vertex array data to upload). But given that we have sw primitive restart on some lame hardware that we want to support this on, we've got to have this path anyway. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
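The design discussed above can be sketched as follows: the VBO module asks the driver for a vertex count, then reuses the ordinary draw-arrays path. All type and function names are hypothetical stand-ins, not Mesa's dd_function_table or vbo interfaces. The count is computed as primitives written times vertices per primitive, which is how patch 8/9 describes the i965 implementation.

```c
#include <assert.h>

/* Hypothetical transform feedback object; not Mesa's real structure. */
struct sketch_xfb_object { unsigned prims_written; unsigned verts_per_prim; };

struct sketch_driver_funcs {
   /* Analogous to the proposed GetTransformFeedbackVertexCount() hook:
    * report how many vertices were written to `stream` during the last
    * Begin/EndTransformFeedback block.  A real driver may have to wait on
    * the GPU here, which is the synchronization cost raised in the review. */
   unsigned (*get_xfb_vertex_count)(struct sketch_xfb_object *obj,
                                    unsigned stream);
};

/* Records what the ordinary draw-arrays path was asked to draw. */
struct sketch_draw_log { unsigned start, count; };

void sketch_draw_arrays(struct sketch_draw_log *log, unsigned start,
                        unsigned count)
{
   /* The existing path (including software primitive restart) would run
    * here, now with a known vertex count. */
   log->start = start;
   log->count = count;
}

void sketch_draw_transform_feedback(const struct sketch_driver_funcs *funcs,
                                    struct sketch_xfb_object *obj,
                                    unsigned stream,
                                    struct sketch_draw_log *log)
{
   unsigned count = funcs->get_xfb_vertex_count(obj, stream);
   sketch_draw_arrays(log, 0, count);
}

/* A trivial single-stream hook: primitives written times verts/prim. */
unsigned sketch_count_from_prims(struct sketch_xfb_object *obj,
                                 unsigned stream)
{
   (void)stream;
   return obj->prims_written * obj->verts_per_prim;
}
```

The point of the design is that the draw path never sees a placeholder count; it only ever receives a real vertex count.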
Re: [Mesa-dev] libGL without X
I would still need to build Mesa with X so that libGL is built though, correct? On Mon, Oct 21, 2013 at 2:03 PM, Erik Faye-Lund wrote: > On Mon, Oct 21, 2013 at 4:05 PM, Chris Healy wrote: > > I have a headless platform I need OpenGL to work on that does not have X. > > It is x86 with Intel HD 4000 graphics. > > > > Ultimately, I'm just wanting to use OpenGL to render to memory for > encoding > > to H.264 and streaming. > > > > I'm trying to build Mesa for this platform without X and cannot get it to > > build libGL.so. > > > > What am I missing here? Is it not possible to use OpenGL without X? I > was > > hoping I could use OpenGL with EGL for testing purposes. > > If you build mesa with GBM support, you should be able to render without X.
Re: [Mesa-dev] [PATCH 9/9] i965: Enable the ARB_transform_feedback2 extension on Gen7+.
Kenneth Graunke writes: > All the necessary pieces are now in place. > > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/intel_extensions.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c > b/src/mesa/drivers/dri/i965/intel_extensions.c > index 334be05..c09ee39 100644 > --- a/src/mesa/drivers/dri/i965/intel_extensions.c > +++ b/src/mesa/drivers/dri/i965/intel_extensions.c > @@ -133,6 +133,10 @@ intelInitExtensions(struct gl_context *ctx) >ctx->Const.GLSLVersion = 120; > _mesa_override_glsl_version(ctx); > > + if (brw->gen >= 7) { > + ctx->Extensions.ARB_transform_feedback2 = true; > + } If HSW doesn't actually work because the kernel won't let us run the commands, I think we shouldn't turn it on on hsw.
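The gating Eric asks for amounts to excluding Haswell from the Gen7+ check. The sketch below assumes that Haswell reports gen 7 and is told apart by a separate flag. The struct and the is_haswell field name are hypothetical stand-ins, not a quote of brw_context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the brw_context fields the check needs. */
struct sketch_brw { int gen; bool is_haswell; };

/* Enable ARB_transform_feedback2 on Gen7+, but leave it off on Haswell
 * while the kernel refuses the commands the implementation relies on. */
bool sketch_enable_xfb2(const struct sketch_brw *brw)
{
   return brw->gen >= 7 && !brw->is_haswell;
}
```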
Re: [Mesa-dev] [PATCH 8/9] i965: Implement glDrawTransformFeedback().
Kenneth Graunke writes: > Implementing the GetTransformFeedbackVertexCount() driver hook allows > the VBO module to call us with the right number of vertices. > > The hardware doesn't directly count the number of vertices written by > SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply > by the number of vertices per primitive. > > Unfortunately, counting the number of primitives generated is tricky: > a program might pause a transform feedback operation, start a second one > with a different object, then switch back and resume. Both transform > feedback operations share the SO_NUM_PRIMS_WRITTEN counters. > > To work around this, we save the counter values at Begin, Pause, Resume, > and End. This "bookends" each section where transform feedback is > active for the current object. Adding up differences of pairs gives > us the number of primitives generated. (This is similar to what we > do for occlusion queries on platforms without hardware contexts.) > > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_context.c| 2 + > src/mesa/drivers/dri/i965/brw_context.h| 26 > src/mesa/drivers/dri/i965/gen6_sol.c | 1 + > src/mesa/drivers/dri/i965/gen7_sol_state.c | 190 > - > 4 files changed, 218 insertions(+), 1 deletion(-) > +/** > + * Tally the number of primitives generated so far. > + * > + * The buffer contains a series of pairs: > + * (, ) ; > + * (, ) ; > + * > + * For each stream, we subtract the pair of values (end - start) to get the > + * number of primitives generated during one section. We accumulate these > + * values, adding them up to get the total number of primitives generated. > + */ > +static void > +gen7_tally_prims_generated(struct brw_context *brw, > + struct brw_transform_feedback_object *obj) > +{ > + /* If the current batch is still contributing to the number of primitives > +* generated, flush it now so the results will be present when mapped. 
> +*/ > + if (drm_intel_bo_references(brw->batch.bo, obj->prim_count_bo)) > + intel_batchbuffer_flush(brw); > + > + if (unlikely(brw->perf_debug && drm_intel_bo_busy(obj->prim_count_bo))) > + perf_debug("Stalling for # of transform feedback primitives > written.\n"); > + > + drm_intel_bo_map(obj->prim_count_bo, false); > + uint64_t *prim_counts = obj->prim_count_bo->virtual; > + > + assert(obj->prim_count_buffer_index % 2 * BRW_MAX_XFB_STREAMS == 0); I think you want parens around "2 * BRW_MAX_XFB_STREAMS" here. I was really impressed with how legible I found this patch. Thanks! All but patches 4, 9 are: Reviewed-by: Eric Anholt and 9 gets r-b with the obvious change.
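The bookend scheme from the patch description can be sketched in plain C, together with the precedence fix Eric asks for. The scheme saves counters at Begin, Pause, Resume and End, then sums end minus start for each stream over every saved pair. The buffer layout (all start counters of a section, followed by all end counters) and the stream count of 4 are assumptions, because the layout comment quoted above is incomplete. This is not the code from gen7_sol_state.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for BRW_MAX_XFB_STREAMS. */
#define SKETCH_MAX_STREAMS 4

/* Assumed layout: each section is SKETCH_MAX_STREAMS start counters
 * followed by SKETCH_MAX_STREAMS end counters.  `totals` must be zeroed
 * by the caller; the per-section differences are accumulated into it. */
void sketch_tally(const uint64_t *counts, int num_values,
                  uint64_t totals[SKETCH_MAX_STREAMS])
{
   /* Parenthesised as suggested in the review. */
   assert(num_values % (2 * SKETCH_MAX_STREAMS) == 0);

   for (int base = 0; base < num_values; base += 2 * SKETCH_MAX_STREAMS) {
      for (int s = 0; s < SKETCH_MAX_STREAMS; s++) {
         uint64_t start = counts[base + s];
         uint64_t end = counts[base + SKETCH_MAX_STREAMS + s];
         totals[s] += end - start;
      }
   }
}
```

Without the parentheses, `x % 2 * N == 0` parses as `((x % 2) * N) == 0`, which only checks that the index is even.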
Re: [Mesa-dev] [PATCH 1/3] glsl: Use saved values instead of recomputing them.
Matt Turner writes: > --- > src/glsl/opt_algebraic.cpp | 12 > 1 file changed, 4 insertions(+), 8 deletions(-) > > diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp > index 3e5802e..b915f3c 100644 > --- a/src/glsl/opt_algebraic.cpp > +++ b/src/glsl/opt_algebraic.cpp > @@ -257,11 +257,9 @@ ir_algebraic_visitor::handle_expression(ir_expression > *ir) > * folding. > */ >if (op_const[0] && !op_const[1]) > - reassociate_constant(ir, 0, op_const[0], > - ir->operands[1]->as_expression()); > + reassociate_constant(ir, 0, op_const[0], op_expr[1]); >if (op_const[1] && !op_const[0]) > - reassociate_constant(ir, 1, op_const[1], > - ir->operands[0]->as_expression()); > + reassociate_constant(ir, 1, op_const[1], op_expr[0]); >break; > > case ir_binop_sub: > @@ -315,11 +313,9 @@ ir_algebraic_visitor::handle_expression(ir_expression > *ir) > * constant folding. > */ >if (op_const[0] && !op_const[1]) > - reassociate_constant(ir, 0, op_const[0], > - ir->operands[1]->as_expression()); > + reassociate_constant(ir, 0, op_const[0], op_expr[1]); >if (op_const[1] && !op_const[0]) > - reassociate_constant(ir, 1, op_const[1], > - ir->operands[0]->as_expression()); > + reassociate_constant(ir, 1, op_const[1], op_expr[0]); > >break; Series is: Reviewed-by: Eric Anholt
Re: [Mesa-dev] [PATCH 5/8] i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.
Paul Berry writes: > On 18 October 2013 17:04, Eric Anholt wrote: >> Putting these fixups for a couple of weird cases in just MOV and ADD >> feels wrong to me, but maybe when I understand better what's going on >> it'll seem more natural. >> > > Another possibility I'd be equally happy with would be to put the fixup at > the top of vec4_generator::generate_vec4_instruction(), before the switch > statement. It would look something like this: > >if (dst.width == BRW_WIDTH_4) { > /* This happens in attribute fixups for "dual instanced" geometry >* shaders, since they use attributes that are vec4's. Since the exec >* width is only 4, it's essential that the caller set >* force_writemask_all in order to make sure the instruction is > executed >* regardless of which channels are enabled. >*/ > assert(inst->force_writemask_all); > > /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to > satisfy >* the following register region restrictions (from Graphics BSpec: >* 3D-Media-GPGPU Engine > EU Overview > Registers and Register > Regions >* > Register Region Restrictions) >* >* 1. ExecSize must be greater than or equal to Width. >* >* 2. If ExecSize = Width and HorzStride != 0, VertStride must be > set >*to Width * HorzStride." >*/ > for (int i = 0; i < 3; i++) { > if (src[i].file == BRW_GENERAL_REGISTER_FILE) > src[i] = stride(src[i], 4, 4, 1); > } >} > > Does that seem better to you? I actually think I like it slightly better > because by making the assertion more general, I caught another case where I > think I should be setting force_writemask_all to be on the safe side (the > "clear r0.2" instruction in the gs prolog). I like this better -- it makes more sense to me for the fixup to be non-opcode-specific. And I think I get the problem now -- our registers would make a ton of sense as <4;4,1> in general (that's how I think of our GRFs in align16, at least!), except that we can't because then we'd guess an execsize of 4.
I'm in favor of the kill-guess-execsize plan, even if we leave it in place for gen4/5 SF/CLIP threads (which bounce execsize all over the place iirc and would be a pain to convert) and only convert the new backends. Another option: How about instead of that assert in brw_eu_emit.c, we just smash the vstride to be width * hstride? We know the vstride doesn't matter, because you're only using execsize components, so let's just not bother our brw_eu.c callers with this little problem.
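Eric's alternative can be sketched on a simplified region description: silently rewrite vstride to width * hstride instead of asserting. The struct uses plain element counts, whereas brw_reg and the hardware store these fields in an encoded, logarithmic form. The width clamp mirrors Paul's rewrite of <8;8,1> to <4;4,1> and is an assumption about how rule 1 would be handled. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical region <vstride;width,hstride> in plain element counts. */
struct sketch_region { unsigned vstride, width, hstride; };

struct sketch_region
sketch_fixup_region(struct sketch_region r, unsigned exec_size)
{
   /* Rule 1: ExecSize must be greater than or equal to Width. */
   if (r.width > exec_size)
      r.width = exec_size;

   /* Rule 2: if ExecSize == Width and HorzStride != 0, VertStride must be
    * Width * HorzStride.  Only exec_size elements (a single row) are read,
    * so the original vstride never affected the result. */
   if (exec_size == r.width && r.hstride != 0)
      r.vstride = r.width * r.hstride;

   return r;
}
```

Both of the cases from Paul's comment, <8;8,1> and <0;4,1> with an execution size of 4, come out as <4;4,1>. Scalar <0;1,0> regions are left alone.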
Re: [Mesa-dev] libGL without X
On Tue, Oct 22, 2013 at 2:58 AM, Chris Healy wrote: > On Mon, Oct 21, 2013 at 2:03 PM, Erik Faye-Lund wrote: >> On Mon, Oct 21, 2013 at 4:05 PM, Chris Healy wrote: >> > I have a headless platform I need OpenGL to work on that does not have >> > X. >> > It is x86 with Intel HD 4000 graphics. >> > >> > Ultimately, I'm just wanting to use OpenGL to render to memory for >> > encoding >> > to H.264 and streaming. >> > >> > I'm trying to build Mesa for this platform without X and cannot get it >> > to >> > build libGL.so. >> > >> > What am I missing here? Is it not possible to use OpenGL without X? I >> > was >> > hoping I could use OpenGL with EGL for testing purposes. >> >> If you build mesa with GBM support, you should be able to render without >> X. > > I would still need to build Mesa with X so that libGL is built though, > correct? > Probably, yeah. But you can run the resulting binary without an X server running. Dunno if that's sufficient for you or not.
[Mesa-dev] [PATCH] gallivm: implement fully accurate corner filtering for seamless cube maps
From: Roland Scheidegger d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use different filtering code, since we need per-texel weights, which our 2d lerp doesn't (and can't) provide. And of course the (now per element) weights need to be adjusted too for it to work. Invoke the new filtering code whenever there's an edge, to keep things simpler: it works for edges as well as corners, though of course it's only needed for corners. More ugly code for not much gain, but at least a hacked-up cubemap demo shows very nice corners now... Not sure yet if and how this should be configurable... --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 138 +++-- 1 file changed, 130 insertions(+), 8 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c index 8e2d0d9..5d3511d 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c @@ -840,7 +840,11 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, const LLVMValueRef *offsets, LLVMValueRef colors_out[4]) { + LLVMBuilderRef builder = bld->gallivm->builder; + struct lp_build_context *ivec_bld = &bld->int_coord_bld; + struct lp_build_context *coord_bld = &bld->coord_bld; const unsigned dims = bld->dims; + struct lp_build_if_state edge_if; LLVMValueRef width_vec; LLVMValueRef height_vec; LLVMValueRef depth_vec; @@ -848,6 +852,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, LLVMValueRef flt_width_vec; LLVMValueRef flt_height_vec; LLVMValueRef flt_depth_vec; + LLVMValueRef fall_off[4], have_edge; LLVMValueRef z1 = NULL; LLVMValueRef z00 = NULL, z01 = NULL, z10 = NULL, z11 = NULL; LLVMValueRef x00 = NULL, x01 = NULL, x10 = NULL, x11 = NULL; @@ -856,6 +861,7 @@
lp_build_sample_image_linear(struct lp_build_sample_context *bld, LLVMValueRef xs[4], ys[4], zs[4]; LLVMValueRef neighbors[2][2][4]; int chan, texel_index; + boolean silly_but_accurate_cube_corner_filtering = TRUE; lp_build_extract_image_sizes(bld, &bld->int_size_bld, @@ -918,12 +924,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, } } else { - LLVMBuilderRef builder = bld->gallivm->builder; - struct lp_build_context *ivec_bld = &bld->int_coord_bld; - struct lp_build_context *coord_bld = &bld->coord_bld; - struct lp_build_if_state edge_if; - LLVMValueRef new_faces[4], new_xcoords[4][2], new_ycoords[4][2]; - LLVMValueRef fall_off[4], coord, have_edge; + LLVMValueRef new_faces[4], new_xcoords[4][2], new_ycoords[4][2], coord; LLVMValueRef fall_off_ym_notxm, fall_off_ym_notxp; LLVMValueRef fall_off_yp_notxm, fall_off_yp_notxp; LLVMValueRef x0, x1, y0, y1, y0_clamped, y1_clamped; @@ -1074,7 +1075,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, if (linear_mask) { /* - * Whack filter weights into place. Whatever pixel had more weight is + * Whack filter weights into place. Whatever texel had more weight is * the one which should have been selected by nearest filtering hence * just use 100% weight for it. */ @@ -1135,7 +1136,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, } else { /* 2D/3D texture */ - LLVMValueRef colors0[4]; + LLVMValueRef colors0[4], colorss[4]; /* get x0/x1 texels at y1 */ lp_build_sample_texel_soa(bld, @@ -1149,6 +1150,111 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, row_stride_vec, img_stride_vec, data_ptr, mipoffsets, neighbors[1][1]); + /* + * To avoid having to duplicate linear_mask / fetch code use + * another branch (with same edge condition) here (note that + * since we're using another branch anyway we COULD restrict this + * rather easily to just corners). 
+ */ + if (silly_but_accurate_cube_corner_filtering && + bld->static_texture_state->target == PIPE_TEXTURE_CUBE && + bld->static_sampler_state->seamless_cube_map) { + LLVMValueRef w00, w01, w10, w11, wx0, wy0; + LLVMValueRef c_weight, c00, c01, c10, c11; + LLVMValueRef one_third, tmp; + + colorss[0] = lp_build_alloca(bld->gallivm, coord_bld->vec_type, "cs"); + colorss[1] = lp_build_alloca(bld->gallivm, coord_bld->vec_type, "cs"); + colorss[2] = lp_build_alloca(bld->gallivm, coord_bld->vec_type, "cs"); + colorss[3] = lp_build_alloca(bld->gallivm, coord_bld->vec_type,
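The d3d10 corner rule described in the commit message is easy to state for a single sample. Start from the usual bilinear weights, then spread the missing corner texel's weight evenly over the other three so that the weights still sum to 1. This is a scalar illustration under assumed names and texel ordering. The gallivm code operates on SoA vectors and builds per-element weights with LLVM.

```c
#include <assert.h>

/* Bilinear weights for a 2x2 footprint, ordered (assumption):
 * w[0] = (x0,y0), w[1] = (x1,y0), w[2] = (x0,y1), w[3] = (x1,y1). */
void sketch_bilinear_weights(float wx, float wy, float w[4])
{
   w[0] = (1.0f - wx) * (1.0f - wy);
   w[1] = wx * (1.0f - wy);
   w[2] = (1.0f - wx) * wy;
   w[3] = wx * wy;
}

/* At a cube corner, texel `missing` does not exist: give each of the
 * three remaining texels an equal third of its weight. */
void sketch_cube_corner_weights(float w[4], int missing)
{
   assert(missing >= 0 && missing < 4);
   float share = w[missing] / 3.0f;
   for (int i = 0; i < 4; i++)
      w[i] = (i == missing) ? 0.0f : w[i] + share;
}
```

Sampling exactly at the corner (wx = wy = 0.5) therefore blends the three real texels with one third each, instead of the 0.25/0.25/0.25 a plain 2D lerp would give after dropping the missing texel.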
[Mesa-dev] [PATCH] clover: Refuse to create context with invalid properties
The specs say that clCreateContext returns an error "if platform value specified in properties is not a valid platform". The original approach fails if an invalid value other than a NULL pointer is provided. Fixes piglit cl-api-create-context. Signed-off-by: Jan Vesely --- src/gallium/state_trackers/clover/api/context.cpp | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/gallium/state_trackers/clover/api/context.cpp b/src/gallium/state_trackers/clover/api/context.cpp index 7b020a6..67adf8f 100644 --- a/src/gallium/state_trackers/clover/api/context.cpp +++ b/src/gallium/state_trackers/clover/api/context.cpp @@ -34,14 +34,19 @@ clCreateContext(const cl_context_properties *d_props, cl_uint num_devs, void *user_data, cl_int *r_errcode) try { auto props = obj(d_props); auto devs = objs(d_devs, num_devs); + cl_platform_id platform; + cl_uint num_platforms; if (!pfn_notify && user_data) throw error(CL_INVALID_VALUE); + + int ret = clGetPlatformIDs(1, &platform, &num_platforms); + if (ret || !num_platforms) + throw error(CL_INVALID_PLATFORM); for (auto &prop : props) { - if (prop.first == CL_CONTEXT_PLATFORM) - obj(prop.second.as()); - else + if (prop.first != CL_CONTEXT_PLATFORM || + prop.second.as() != platform) throw error(CL_INVALID_PROPERTY); } -- 1.8.3.1
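The patch's check can be illustrated on a raw property list. Per the OpenCL API, that list is a zero-terminated sequence of (name, value) pairs. The sketch accepts only a platform property whose value equals the single platform the implementation exposes. The value is compared rather than dereferenced, so any bogus value is rejected, not just NULL. The typedef, constants and function name are hypothetical stand-ins so the sketch compiles without <CL/cl.h>. As in the patch, both failure cases report the same "invalid property" error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types and values come from <CL/cl.h>. */
typedef intptr_t sketch_props_t;
enum { SKETCH_CONTEXT_PLATFORM = 1, SKETCH_SUCCESS = 0,
       SKETCH_INVALID_PROPERTY = -1 };

int sketch_check_props(const sketch_props_t *props, void *only_platform)
{
   if (props == NULL)
      return SKETCH_SUCCESS; /* NULL: let the implementation pick */

   /* Names sit at even indices; a zero name terminates the list. */
   for (size_t i = 0; props[i] != 0; i += 2) {
      if (props[i] != SKETCH_CONTEXT_PLATFORM ||
          (void *)props[i + 1] != only_platform)
         return SKETCH_INVALID_PROPERTY;
   }
   return SKETCH_SUCCESS;
}
```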
[Mesa-dev] [PATCH 1/2] glsl: mark variables produced by lower_named_interface_blocks.
These variables will need to be treated specially by program_resource_visitor, so that they can be addressed through the API using their interface block name (and array index, for interface block arrays). --- src/glsl/ir.h | 12 src/glsl/lower_named_interface_blocks.cpp | 2 ++ 2 files changed, 14 insertions(+) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index aac8cbb..91eb4c6 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -579,6 +579,18 @@ public: unsigned location_frac:2; /** +* Non-zero if this variable was created by lowering a named interface +* block which was not an array. +*/ + unsigned from_named_ifc_block_nonarray:1; + + /** +* Non-zero if this variable was created by lowering a named interface +* block which was an array. +*/ + unsigned from_named_ifc_block_array:1; + + /** * \brief Layout qualifier for gl_FragDepth. * * This is not equal to \c ir_depth_layout_none if and only if this diff --git a/src/glsl/lower_named_interface_blocks.cpp b/src/glsl/lower_named_interface_blocks.cpp index f415252..6329d5a 100644 --- a/src/glsl/lower_named_interface_blocks.cpp +++ b/src/glsl/lower_named_interface_blocks.cpp @@ -140,6 +140,7 @@ flatten_named_interface_blocks_declarations::run(exec_list *instructions) new(mem_ctx) ir_variable(iface_t->fields.structure[i].type, var_name, (ir_variable_mode) var->mode); + new_var->from_named_ifc_block_nonarray = 1; } else { const glsl_type *new_array_type = glsl_type::get_array_instance( @@ -149,6 +150,7 @@ flatten_named_interface_blocks_declarations::run(exec_list *instructions) new(mem_ctx) ir_variable(new_array_type, var_name, (ir_variable_mode) var->mode); + new_var->from_named_ifc_block_array = 1; } new_var->location = iface_t->fields.structure[i].location; -- 1.8.4.1
[Mesa-dev] [PATCH 2/2] glsl: Account for interface block lowering in program_resource_visitor.
When program_resource_visitor visits variables that were created by lower_named_interface_blocks, it needs to do extra work to un-do the effects of lower_named_interface_blocks and construct the proper API names. Fixes piglit test spec/glsl-1.50/execution/interface-blocks-api-access-members. --- src/glsl/link_uniforms.cpp | 58 +- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp index 4bd4034..ea71b30 100644 --- a/src/glsl/link_uniforms.cpp +++ b/src/glsl/link_uniforms.cpp @@ -75,7 +75,63 @@ program_resource_visitor::process(ir_variable *var) */ /* Only strdup the name if we actually will need to modify it. */ - if (t->is_record() || (t->is_array() && t->fields.array->is_record())) { + if (var->from_named_ifc_block_array) { + /* lower_named_interface_blocks created this variable by lowering an + * interface block array to an array variable. For example if the + * original source code was: + * + * out Blk { vec4 bar } foo[3]; + * + * Then the variable is now: + * + * out vec4 bar[3]; + * + * We need to visit each array element using the names constructed like + * so: + * + * Blk[0].bar + * Blk[1].bar + * Blk[2].bar + */ + assert(t->is_array()); + const glsl_type *ifc_type = var->get_interface_type(); + char *name = ralloc_strdup(NULL, ifc_type->name); + size_t name_length = strlen(name); + for (unsigned i = 0; i < t->length; i++) { + size_t new_length = name_length; + ralloc_asprintf_rewrite_tail(&name, &new_length, "[%u].%s", i, + var->name); + /* Note: row_major is only meaningful for uniform blocks, and + * lowering is only applied to non-uniform interface blocks, so we + * can safely pass false for row_major. + */ + recursion(var->type, &name, new_length, false, NULL); + } + ralloc_free(name); + } else if (var->from_named_ifc_block_nonarray) { + /* lower_named_interface_blocks created this variable by lowering a + * named interface block (non-array) to an ordinary variable. 
For + * example if the original source code was: + * + * out Blk { vec4 bar } foo; + * + * Then the variable is now: + * + * out vec4 bar; + * + * We need to visit this variable using the name: + * + * Blk.bar + */ + const glsl_type *ifc_type = var->get_interface_type(); + char *name = ralloc_asprintf(NULL, "%s.%s", ifc_type->name, var->name); + /* Note: row_major is only meaningful for uniform blocks, and lowering + * is only applied to non-uniform interface blocks, so we can safely + * pass false for row_major. + */ + recursion(var->type, &name, strlen(name), false, NULL); + ralloc_free(name); + } else if (t->is_record() || (t->is_array() && t->fields.array->is_record())) { char *name = ralloc_strdup(NULL, var->name); recursion(var->type, &name, strlen(name), false, NULL); ralloc_free(name); -- 1.8.4.1
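The name reconstruction described in the comments above reduces to string formatting: "Blk[i].bar" for each element of a lowered block array, and "Blk.bar" for a non-array block. This sketch uses snprintf into a caller-provided buffer, not the ralloc_asprintf_rewrite_tail() and ralloc_asprintf() calls the patch uses. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the API-visible name of a lowered interface block member.
 * index >= 0: element of a block array, e.g. "Blk[1].bar".
 * index < 0:  non-array block, e.g. "Blk.bar".
 * Returns what snprintf returns, so callers can detect truncation. */
int sketch_ifc_member_name(char *buf, size_t size, const char *block,
                           int index, const char *member)
{
   if (index < 0)
      return snprintf(buf, size, "%s.%s", block, member);
   return snprintf(buf, size, "%s[%d].%s", block, index, member);
}
```

Visiting an array block of length 3 would call this with indices 0, 1 and 2, producing Blk[0].bar, Blk[1].bar and Blk[2].bar as in the patch comment.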
Re: [Mesa-dev] [PATCH] clover: Refuse to create context with invalid properties
Jan Vesely writes: > the specs say that clCreateContext returns error > "if platform value specified in properties is not a valid platform" > > The original approach fails if invalid value other than NULL pointer is > provided. > > Fixes piglit cl-api-create-context. > Honestly, I don't think this test makes much sense. It's unreasonable to expect that the CL will be able to catch any bad pointer you give it as an argument and fail gracefully. The only reliable solution that comes to my mind would be to build a global hash table for each CL object type that keeps track of the valid objects that have been allocated. That seems like a lot of effort for the sole purpose of finding out if the user is doing something *very* stupid and very unlikely. That said, we're already doing three forms of object validation: first, the pointers provided by the user are compared against NULL; second, we make sure that the dispatch table pointer is at the correct location in memory; third, if the object is part of a non-trivial class hierarchy, as is the case for events and memory objects, we use RTTI to make sure that the object is of the expected type. I don't think we want or need more validation; it would probably be more useful to drop that test from piglit. Apparently nVidia's libOpenCL fails the test as well. Thanks.
> Signed-off-by: Jan Vesely > --- > src/gallium/state_trackers/clover/api/context.cpp | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/state_trackers/clover/api/context.cpp > b/src/gallium/state_trackers/clover/api/context.cpp > index 7b020a6..67adf8f 100644 > --- a/src/gallium/state_trackers/clover/api/context.cpp > +++ b/src/gallium/state_trackers/clover/api/context.cpp > @@ -34,14 +34,19 @@ clCreateContext(const cl_context_properties *d_props, > cl_uint num_devs, > void *user_data, cl_int *r_errcode) try { > auto props = obj(d_props); > auto devs = objs(d_devs, num_devs); > + cl_platform_id platform; > + cl_uint num_platforms; > > if (!pfn_notify && user_data) >throw error(CL_INVALID_VALUE); > + > + int ret = clGetPlatformIDs(1, &platform, &num_platforms); > + if (ret || !num_platforms) > + throw error(CL_INVALID_PLATFORM); > > for (auto &prop : props) { > - if (prop.first == CL_CONTEXT_PLATFORM) > - obj(prop.second.as()); > - else > + if (prop.first != CL_CONTEXT_PLATFORM || > + prop.second.as() != platform) > throw error(CL_INVALID_PROPERTY); > } > > -- > 1.8.3.1
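The first two forms of validation Francisco lists can be shown on an ICD-style object layout, in which every CL object begins with a pointer to the implementation's dispatch table. The layout and all names below are hypothetical stand-ins, not clover's classes. The third form (RTTI) has no C equivalent and is not modelled. The sketch also shows why arbitrary bad pointers cannot be caught gracefully: the dispatch check has to read through the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dispatch table and object header. */
struct sketch_dispatch { int dummy; };
static const struct sketch_dispatch sketch_the_dispatch = { 0 };

struct sketch_object { const struct sketch_dispatch *dispatch; };

bool sketch_is_valid_object(const void *p)
{
   /* Form 1: reject NULL. */
   if (p == NULL)
      return false;

   /* Form 2: the dispatch pointer must sit at the expected location and
    * point at our table.  This dereferences p, so a wild pointer can
    * still crash here instead of being rejected. */
   return ((const struct sketch_object *)p)->dispatch == &sketch_the_dispatch;
}
```

A global table of live objects, as suggested above, would avoid the dereference, at the cost of a lookup on every API call.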
Re: [Mesa-dev] [PATCH] i965: Only emit interpolation setup if there are actual FS inputs.
Reviewed-by: Matt Turner