Re: [Mesa-dev] [PATCH] nvc0: fix geometry program revalidation of clipping params
What piglit test does this fix? On Sat, Jul 11, 2015 at 7:13 PM, Ilia Mirkin wrote: > Signed-off-by: Ilia Mirkin > Cc: mesa-sta...@lists.freedesktop.org > --- > > Even though in practice a geometry program will never be using UCP's, > we still were revalidating (aka recompiling) the program when more > clip planes became enabled (which also are used for regular clip > distances). > > This seems like it should have led to massive fail, but I guess you > don't change the number of clip planes when using geometry shaders. > But I'm going to put this through a full piglit run just in case > there's something I'm missing. > > src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c > b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c > index 785e52e..11f2b10 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c > @@ -339,7 +339,7 @@ nvc0_check_program_ucps(struct nvc0_context *nvc0, >nvc0_vertprog_validate(nvc0); > else > if (likely(vp == nvc0->gmtyprog)) > - nvc0_vertprog_validate(nvc0); > + nvc0_gmtyprog_validate(nvc0); > else >nvc0_tevlprog_validate(nvc0); > } > -- > 2.3.6 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > -- Best regards, Samuel Pitoiset. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().
I think Chris did not review this one: Reviewed-by: Iago Toral Quiroga Iago On Fri, 2015-07-10 at 11:44 -0700, Matt Turner wrote: > BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)" > respectively to allow declaring local variables used by intervening > OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be > in the same control flow. > --- > src/mesa/drivers/dri/i965/brw_draw.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_draw.c > b/src/mesa/drivers/dri/i965/brw_draw.c > index 69ad4d4..ec13473 100644 > --- a/src/mesa/drivers/dri/i965/brw_draw.c > +++ b/src/mesa/drivers/dri/i965/brw_draw.c > @@ -261,17 +261,17 @@ static void brw_emit_prim(struct brw_context *brw, >indirect_flag = 0; > } > > + BEGIN_BATCH(brw->gen >= 7 ? 7 : 6); > + > if (brw->gen >= 7) { >if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT) > predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE; >else > predicate_enable = 0; > > - BEGIN_BATCH(7); >OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | > predicate_enable); >OUT_BATCH(hw_prim | vertex_access_type); > } else { > - BEGIN_BATCH(6); >OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) | > hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT | > vertex_access_type); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] nv50: turn samples counts off during blit
Fixes the following piglit test: occlusion_query_meta_no_fragments Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_surface.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_surface.c b/src/gallium/drivers/nouveau/nv50/nv50_surface.c index dc9852d..66eccc2 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_surface.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_surface.c @@ -1432,6 +1432,7 @@ static void nv50_blit(struct pipe_context *pipe, const struct pipe_blit_info *info) { struct nv50_context *nv50 = nv50_context(pipe); + struct nouveau_pushbuf *push = nv50->base.pushbuf; boolean eng3d = FALSE; if (util_format_is_depth_or_stencil(info->dst.resource->format)) { @@ -1493,10 +1494,20 @@ nv50_blit(struct pipe_context *pipe, const struct pipe_blit_info *info) info->src.box.height != -info->dst.box.height)) eng3d = TRUE; + if (nv50->screen->num_occlusion_queries_active) { + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); + PUSH_DATA (push, 0); + } + if (!eng3d) nv50_blit_eng2d(nv50, info); else nv50_blit_3d(nv50, info); + + if (nv50->screen->num_occlusion_queries_active) { + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); + PUSH_DATA (push, 1); + } } static void -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] nv50: add nesting support for occlusion queries
This is loosely based on nvc0. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 27 -- src/gallium/drivers/nouveau/nv50/nv50_screen.h | 2 ++ 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index 81f7474..80d3fd2 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -49,6 +49,7 @@ struct nv50_query { uint32_t offset; /* base + i * 32 */ uint8_t state; boolean is64bit; + int nesting; /* only used for occlusion queries */ struct nouveau_mm_allocation *mm; struct nouveau_fence *fence; }; @@ -175,11 +176,16 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq) switch (q->type) { case PIPE_QUERY_OCCLUSION_COUNTER: - PUSH_SPACE(push, 4); - BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); - PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); - PUSH_DATA (push, 1); + q->nesting = nv50->screen->num_occlusion_queries_active++; + if (q->nesting) { + nv50_query_get(push, q, 0x10, 0x0100f002); + } else { + PUSH_SPACE(push, 4); + BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); + PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); + PUSH_DATA (push, 1); + } break; case PIPE_QUERY_PRIMITIVES_GENERATED: nv50_query_get(push, q, 0x10, 0x06805002); @@ -223,9 +229,11 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query *pq) switch (q->type) { case PIPE_QUERY_OCCLUSION_COUNTER: nv50_query_get(push, q, 0, 0x0100f002); - PUSH_SPACE(push, 2); - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); - PUSH_DATA (push, 0); + if (--nv50->screen->num_occlusion_queries_active == 0) { + PUSH_SPACE(push, 2); + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); + PUSH_DATA (push, 0); + } break; case PIPE_QUERY_PRIMITIVES_GENERATED: nv50_query_get(push, q, 0, 0x06805002); @@ -396,8 +404,7 @@ nv50_render_condition(struct pipe_context *pipe, case PIPE_QUERY_OCCLUSION_COUNTER: case PIPE_QUERY_OCCLUSION_PREDICATE: if (likely(!condition)) { -/* XXX: Placeholder, handle nesting here if available */ -if (unlikely(false)) +if (unlikely(q->nesting)) cond = wait ? NV50_3D_COND_MODE_NOT_EQUAL : NV50_3D_COND_MODE_ALWAYS; else diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h b/src/gallium/drivers/nouveau/nv50/nv50_screen.h index 881051b..3a12a1f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h @@ -54,6 +54,8 @@ struct nv50_screen { struct nv50_context *cur_ctx; struct nv50_graph_state save_state; + int num_occlusion_queries_active; + struct nouveau_bo *code; struct nouveau_bo *uniforms; struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses
On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote: > The util/hash_table was intended to be a fast hash table > replacement for the program/hash_table see 35fd61bd99c1 and 72e55bb6888ff. > > This replaces some more uses of the old hash table and also > destroys the interface_types hash table when _mesa_glsl_release_types() > is called which wasn't previously being done. > --- > Was looking at the remaining program/hash_table uses and noticed that > interface_types wasnt being freed so thought I'd fix that and replace the > hash while I was there. > > No measurable compile time changes to the public shader-db > > src/glsl/glsl_types.cpp | 85 > ++--- > src/glsl/glsl_types.h | 2 +- > 2 files changed, 46 insertions(+), 41 deletions(-) > > diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp > index 281ff51..255bd69 100644 > --- a/src/glsl/glsl_types.cpp > +++ b/src/glsl/glsl_types.cpp > @@ -25,7 +25,7 @@ > #include "main/core.h" /* for Elements, MAX2 */ > #include "glsl_parser_extras.h" > #include "glsl_types.h" > -#include "program/hash_table.h" > +#include "util/hash_table.h" > > > mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP; > @@ -329,14 +329,19 @@ _mesa_glsl_release_types(void) > * necessary. > */ > if (glsl_type::array_types != NULL) { > - hash_table_dtor(glsl_type::array_types); > + _mesa_hash_table_destroy(glsl_type::array_types, NULL); >glsl_type::array_types = NULL; > } > > if (glsl_type::record_types != NULL) { > - hash_table_dtor(glsl_type::record_types); > + _mesa_hash_table_destroy(glsl_type::record_types, NULL); >glsl_type::record_types = NULL; > } > + > + if (glsl_type::interface_types != NULL) { > + _mesa_hash_table_destroy(glsl_type::interface_types, NULL); > + glsl_type::interface_types = NULL; > + } I think it is probably best to put the destruction of interface_types in a separate patch, it is a different issue after all. You can add my Reviewed-by on that patch. With that and a couple of other minor nitpicks I mention below fixed, this is: Reviewed-by: Iago Toral Quiroga > } > > > @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type *base, > unsigned array_size) > mtx_lock(&glsl_type::mutex); > > if (array_types == NULL) { > - array_types = hash_table_ctor(64, hash_table_string_hash, > - hash_table_string_compare); > + array_types = _mesa_hash_table_create(NULL, _mesa_key_hash_string, > +_mesa_key_string_equal); > } > > - const glsl_type *t = (glsl_type *) hash_table_find(array_types, key); > - > - if (t == NULL) { > + const struct hash_entry *entry = _mesa_hash_table_search(array_types, > key); > + if (entry == NULL) { >mtx_unlock(&glsl_type::mutex); > - t = new glsl_type(base, array_size); > + const glsl_type *t = new glsl_type(base, array_size); >mtx_lock(&glsl_type::mutex); > > - hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, > key)); > + entry = _mesa_hash_table_insert(array_types, > + ralloc_strdup(mem_ctx, key), > + (void *) t); > } > > - assert(t->base_type == GLSL_TYPE_ARRAY); > - assert(t->length == array_size); > - assert(t->fields.array == base); > + assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY); > + assert(((glsl_type *)entry->data)->length == array_size); > + assert(((glsl_type *)entry->data)->fields.array == base); Other parts of this file put a blank between the type cast and the variable, so I would add that here (and in all other places where you cast entry to glsl_type* in this patch). > mtx_unlock(&glsl_type::mutex); > > - return t; > + return (glsl_type *)entry->data; > } > > > @@ -722,19 +728,13 @@ glsl_type::record_compare(const glsl_type *b) const > } > > > -int > +bool > glsl_type::record_key_compare(const void *a, const void *b) > { > const glsl_type *const key1 = (glsl_type *) a; > const glsl_type *const key2 = (glsl_type *) b; > > - /* Return zero is the types match (there is zero difference) or non-zero > -* otherwise. > -*/ > - if (strcmp(key1->name, key2->name) != 0) > - return 1; > - > - return !key1->record_compare(key2); > + return strcmp(key1->name, key2->name) == 0 && key1->record_compare(key2); > } > > > @@ -772,25 +772,27 @@ glsl_type::get_record_instance(const glsl_struct_field > *fields, > mtx_lock(&glsl_type::mutex); > > if (record_types == NULL) { > - record_types = hash_table_ctor(64, record_key_hash, > record_key_compare); > + record_types = _mesa_hash_table_create(NULL, record_key_hash, > + record_key_compare); > } > > - const glsl_type *t = (glsl_type *) hash_
[Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't want to do anything with NV50_3D_BIND_TSC2, just limit the maximum number of samplers to 16 like for nvc0. This fixes dmesg fails with the following piglit test: max-samplers But the test still fails. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 6583a35..46ae0b8 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -286,7 +286,7 @@ nv50_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader, case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS: /* The chip could handle more sampler views than samplers */ case PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS: - return MIN2(32, PIPE_MAX_SAMPLERS); + return MIN2(16, PIPE_MAX_SAMPLERS); case PIPE_SHADER_CAP_DOUBLES: case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED: case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On 01/07/15 01:37, Jason Ekstrand wrote: > If we can avoid duplication in the texturing code, that would be > really nice. Could we do this as a refactor of the old code and then > a much smaller NIR function that calls some shared code? That's what > we did for FS and it worked ok. I looked at the layout of your code > and, after you finish reading instruction inputs, very little of it is > actually NIR-specific. Our wip second batch of commits has now a version that refactors the old code. But I have a doubt. As part of this refactor, I created a new method to be used by brw_fs and brw_vec4, that returns the ir texture opcode based on the nir texture opcode. You can take a look to that method here [1]. The issue with this method is that ir_texture_opcode is defined at glsl/ir.h inside a #ifdef __cplusplus. So I needed to rename brw_nir.c to brw_nir.cpp. On a previous early review [2] you mentioned that it would be better to keep it as brw_nir.c unless we have a good reason. Is using this enum good reason to do the renaming? If not, the other option I see is modify glsl/ir.h and move the enum definitions to the top, outside the ifdef __cplusplus, as other headers (like glsl/glsl_types.h) have. BR [1] https://github.com/Igalia/mesa/commit/e81ce150ef931e50b6cb1aae0b42a79f448863f5 [2] https://bugs.freedesktop.org/show_bug.cgi?id=89580#c7 > --Jason > > On Fri, Jun 26, 2015 at 1:07 AM, Eduardo Lima Mitev wrote: >> From: Alejandro Piñeiro >> >> This sets up the basic structure and placeholders for the implementation of >> texture operations, which will be added incrementally in subsequent patches. >> >> Apart from helping split the texture support into smaller patches, this patch >> is useful to get a simplified picture of the steps involved in the emission >> of >> texture operations. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >> --- >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 120 >> - >> 1 file changed, 119 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> index 08a70b0..c3638a0 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> @@ -1412,7 +1412,125 @@ vec4_visitor::nir_swizzle_result(nir_tex_instr >> *instr, dst_reg dest, >> void >> vec4_visitor::nir_emit_texture(nir_tex_instr *instr) >> { >> - /* @TODO: Not yet implemented */ >> + unsigned sampler = instr->sampler_index; >> + src_reg sampler_reg = src_reg(sampler); >> + >> + const glsl_type *dest_type; >> + >> + /* Load the texture operation sources */ >> + for (unsigned i = 0; i < instr->num_srcs; i++) { >> + src_reg src = get_nir_src(instr->src[i].src); >> + >> + switch (instr->src[i].src_type) { >> + case nir_tex_src_comparitor: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_coord: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_ddx: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_ddy: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_lod: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_ms_index: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_offset: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_sampler_offset: >> + /* @TODO: not yet implemented */ >> + break; >> + >> + case nir_tex_src_projector: >> + unreachable("Should be lowered by do_lower_texture_projection"); >> + >> + case nir_tex_src_bias: >> + unreachable("LOD bias is not valid for vertex shaders.\n"); >> + >> + default: >> + unreachable("unknown texture source"); >> + } >> + } >> + >> + /* Get the texture operation opcode */ >> + enum opcode opcode = shader_opcode_for_nir_opcode(instr->op); >> + >> + /* Build the instruction */ >> + enum glsl_base_type dest_base_type = >> + brw_glsl_base_type_for_nir_type(instr->dest_type); >> + >> + dest_type = >> + glsl_type::get_instance(dest_base_type, >> nir_tex_instr_dest_size(instr), 1); >> + >> + vec4_instruction *inst = new(mem_ctx) >> + vec4_instruction(opcode, dst_reg(this, dest_type)); >> + >> + for (unsigned i = 0; i < 3; i++) { >> + if (instr->const_offset[i] != 0) { >> + inst->offset = brw_texture_offset(instr->const_offset, 3); >> + break; >> + } >> + } >> + >> + /* The message header is necessary for: >> +* - Texel offsets >> +* - Gather channel selection >> +* - Sampler indices too large to fit in a 4-bit value. >> +*/ >> + inst->header_size = >> + (inst->offset != 0 || >> + instr->op == nir_texop_t
Re: [Mesa-dev] [PATCH v2] glsl: avoid compiler's segfault when processing operators with void arguments
On 11/07/15 19:38, Renaud Gaubert wrote: > This is done by returning an rvalue of type void in the > ast_function_expression::hir function instead of a void expression. > > This produces (in the case of the ternary) an hir with a call > to the void returning function and an assignment of a void variable > which will be optimized out (the assignment) during the optimization > pass. > > This fix results in having a valid subexpression in the many > different cases where the subexpressions are functions whose > return values are void. > > Thus preventing to dereference NULL in the following cases: > * binary operator > * unary operators > * ternary operator > * comparison operators (except equal and nequal operator) > > Equal and nequal had to be handled as a special case because > instead of segfaulting on a forbidden syntax it was now accepting > expressions with a void return value on either (or both) side of > the expression. > > Signed-off-by: Renaud Gaubert > Reviewed-by: Gabriel Laskar > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85252 > > --- > Piglit tests were sent to the Piglit mailing list: > * glsl-1.10 Adds tests on how void functions are handled > > src/glsl/ast_function.cpp | 9 - > src/glsl/ast_to_hir.cpp | 9 - > 2 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp > index 92e26bf..ac32723 100644 > --- a/src/glsl/ast_function.cpp > +++ b/src/glsl/ast_function.cpp > @@ -1785,7 +1785,14 @@ ast_function_expression::hir(exec_list *instructions, >/* an error has already been emitted */ >value = ir_rvalue::error_value(ctx); >} else { > - value = generate_call(instructions, sig, &actual_parameters, state); > +value = generate_call(instructions, sig, &actual_parameters, state); > +if (!value) { > + ir_variable *const tmp = new(ctx) > ir_variable(glsl_type::void_type, > + "void_var", > + ir_var_temporary); > + instructions->push_tail(tmp); > + value = new(ctx) ir_dereference_variable(tmp); > +} Indention. You used two spaces indention instead of three. With that fixed, Reviewed-by: Samuel Iglesias Gonsálvez If you don't have commit rights, I can fix the indention, add the R-b and push the patch to master (no need of v3 patch). Just tell me to do so (*). Thanks for your contribution! Sam (*) Also for your piglit patch. If nobody has pushed your piglit patch yet, I can do it. >} > >return value; > diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp > index 8cb46be..0d0ad2a 100644 > --- a/src/glsl/ast_to_hir.cpp > +++ b/src/glsl/ast_to_hir.cpp > @@ -1270,7 +1270,14 @@ ast_expression::do_hir(exec_list *instructions, > *applied to one operand that can make them match, in which > *case this conversion is done." > */ > - if ((!apply_implicit_conversion(op[0]->type, op[1], state) > + > + if (op[0]->type == glsl_type::void_type || op[1]->type == > glsl_type::void_type) { > + _mesa_glsl_error(& loc, state, "`%s': wrong operand types: " > + "no operation `%1$s' exists that takes a left-hand " > + "operand of type 'void' or a right operand of type " > + "'void'", (this->oper == ast_equal) ? "==" : "!="); > + error_emitted = true; > + } else if ((!apply_implicit_conversion(op[0]->type, op[1], state) > && !apply_implicit_conversion(op[1]->type, op[0], state)) >|| (op[0]->type != op[1]->type)) { > _mesa_glsl_error(& loc, state, "operands of `%s' must have the same > " > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965/fs: Don't use the pixel interpolater for centroid interpolation
Chris Forbes writes: > Nitpicks aside, I don't think this is a great idea now that you've got > the SKL PI working. Can you explain why you don't think this is a good idea? Is it because it is an optimisation for something that is not known to be a big use case so carrying around the extra code just adds unnecessary maintenance burden? I could agree with that so I'm happy to abandon the patch for now if that's the general consensus. > I also think it's broken -- you need to arrange to have the centroid > barycentric coords delivered to the FS thread, which won't be > happening if this is the *only* use of them. Masked in the tests, > because they compare with a centroid-qualified input. [I'm assuming > you don't always get these delivered to the FS in SKL, but no docs > access...] The changes to brw_compute_barycentric_interp_modes in the patch ensure that the centroid barycentric coords are delivered whenever interpolateAtCentroid is used in a shader. I don't think this is a problem. At least it seems to work in a simple test without using a separate varying with the centroid qualifier. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/78] i965: Take is_scalar_shader_stage() method out to allow reuse
On Fri, Jul 10, 2015 at 8:53 AM, Eduardo Lima Mitev wrote: > On 06/30/2015 06:58 PM, Jason Ekstrand wrote: >> On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev wrote: >>> This patch makes public the is_scalar_shader_stage() method in brw_shader, >>> and >>> renames it to brw_compiler_is_scalar_shader_stage(). The plan is to later >>> reuse it >>> in brw_nir, to enable/disable optimization passes depending on the type >>> of shader stage. >> >> I'm not so sure that this is a good plan. It assumes that whether we >> are doing a scalar or vec4 compile is based entirely on the shader >> stage and some static information (possibly based on environment >> variables). Ken and I were talking around the office and we may want >> to use both SIMD4x2 and SIMD8 mode for geometry shaders depending on >> the number of inputs, etc. This won't work in the given paradigm. >> > > If I understand correctly, what you propose is having a function to > dynamically choose the type of shader (scalar vs. vec4) when compiling > the shader, using not only gen and stage, but also actual application > data. I think this is a good idea and will allow experimenting with > different combinations of shaders with real input data. > > However, I wonder if we this should be added later after more elaborated > thoughts on what exactly do we need and where to plug it. I have been > experimenting with a function following the use case you mentioned, > choosing shader backend based on inputs to a GS. But honestly it feels > like a blind guess from my side to what actually you and Ken have in mind. > > Current patch basically reuses the function we already have to select > the shader, so what I propose is to move forward with it for the moment > (adding the missing MESA_SHADER_COMPUTE) and discuss how to extend it to > factor in dynamic data; or perhaps you can explain us your proposal with > a bit more detail. > > WDYT? My primary issue was with having it based on a is_scalar_stage function at all. It seems like a better long-term plan would be to simply pass an is_scalar boolean into those lowering functions. That said, I haven't taken a real hard look as to what calls those functions and whether or not that's actually what we want either. Make sense? If there's no better place to make that determination further up the call chain, then this is fine for now. --Jason >>> The new method accepts a brw_compiler instead of a brw_context. This is done >>> for consistency, since the actual info we need (scalar_vs) is in >>> brw_compiler, >>> and fetching in through brw_content->intelScreen->compiler seems like too >>> much indirection. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >>> --- >>> src/mesa/drivers/dri/i965/brw_shader.cpp | 22 ++ >>> src/mesa/drivers/dri/i965/brw_shader.h | 13 + >>> 2 files changed, 19 insertions(+), 16 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp >>> b/src/mesa/drivers/dri/i965/brw_shader.cpp >>> index 0b53647..3b99046 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp >>> @@ -182,19 +182,6 @@ brw_shader_precompile(struct gl_context *ctx, >>> return true; >>> } >>> >>> -static inline bool >>> -is_scalar_shader_stage(struct brw_context *brw, int stage) >>> -{ >>> - switch (stage) { >>> - case MESA_SHADER_FRAGMENT: >>> - return true; >>> - case MESA_SHADER_VERTEX: >>> - return brw->intelScreen->compiler->scalar_vs; >>> - default: >>> - return false; >>> - } >>> -} >>> - >>> static void >>> brw_lower_packing_builtins(struct brw_context *brw, >>> gl_shader_stage shader_type, >>> @@ -205,7 +192,8 @@ brw_lower_packing_builtins(struct brw_context *brw, >>> | LOWER_PACK_UNORM_2x16 >>> | LOWER_UNPACK_UNORM_2x16; >>> >>> - if (is_scalar_shader_stage(brw, shader_type)) { >>> + if (brw_compiler_is_scalar_shader_stage(brw->intelScreen->compiler, >>> + shader_type)) { >>>ops |= LOWER_UNPACK_UNORM_4x8 >>> | LOWER_UNPACK_SNORM_4x8 >>> | LOWER_PACK_UNORM_4x8 >>> @@ -218,7 +206,8 @@ brw_lower_packing_builtins(struct brw_context *brw, >>> * lowering is needed. For SOA code, the Half2x16 ops must be >>> * scalarized. >>> */ >>> - if (is_scalar_shader_stage(brw, shader_type)) { >>> + if (brw_compiler_is_scalar_shader_stage(brw->intelScreen->compiler, >>> + shader_type)) { >>> ops |= LOWER_PACK_HALF_2x16_TO_SPLIT >>> | LOWER_UNPACK_HALF_2x16_TO_SPLIT; >>>} >>> @@ -294,7 +283,8 @@ process_glsl_ir(struct brw_context *brw, >>> do { >>>progress = false; >>> >>> - if (is_scalar_shader_stage(brw, shader->Stage)) { >>> + if (brw_compiler_is_scalar_shader_stage(brw->intelScreen->compiler, >>> +
Re: [Mesa-dev] [PATCH 16/78] i965/nir/vec4: Implement store_output intrinsic
On Wed, Jul 8, 2015 at 11:54 PM, Eduardo Lima Mitev wrote: > On 06/30/2015 06:51 PM, Jason Ekstrand wrote: >> On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev wrote: >>> The index into the output_reg array where to store the destination register >>> is >>> fetched from the nir_outputs map built during nir_setup_outputs stage. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 17 +++-- >>> 1 file changed, 15 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> index 8a2d335..55d4490 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> @@ -520,10 +520,23 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr >>> *instr) >>> } >>> >>> case nir_intrinsic_store_output_indirect: >>> + has_indirect = true; >>>/* fallthrough */ >>> - case nir_intrinsic_store_output: >>> - /* @TODO: Not yet implemented */ >>> + case nir_intrinsic_store_output: { >>> + int offset = instr->const_index[0]; >>> + int output = nir_outputs[offset]; >>> + >>> + src = get_nir_src(instr->src[0], nir_output_types[offset]); >>> + dest = dst_reg(src); >>> + >>> + dest.writemask = brw_writemask_for_size(instr->num_components); >>> + >>> + if (has_indirect) >>> + dest.reladdr = new(mem_ctx) src_reg(get_nir_src(instr->src[1])); >>> + >>> + output_reg[output] = dest; >> >> I'm very confused about the amount of indirection going on here. It >> seems to me that we should be setting these outputs up in >> setup_outputs() rather than storring off a map from ints to other ints >> and setting it up here. I didn't make this comment on the patch for >> setup_outputs() because I wanted to wait to see it used before I >> commented on it. >> >> I'm guessing you did it this way because the nir_assign_var_locations >> is giving you bogus values. If so, then it might be better to just >> assign variable locations in setup_outputs() rather than having a >> remap table. The whole point of nir_lower_io is to make IO easy for >> the back-end. If you need a re-map table, then it's no longer making >> it easy and we need to think more about what's going on. >> --Jason >> > > That double indirection felt bad since the beginning, but it was needed > to store the original variable's location (var->data.location). Let me > explain: > > We are (re)using the plumbering in vec4_visitor to setup URB, so the > only thing we need to do is to store the out register in "output_reg" > map at the correct location. And that location is given by the original > location in the shader (var->data.location). > > So, in this case, "nir_assign_var_locations" pass, which constructs > var->data.driver_location, is not useful to us, except to give us > consecutive indexes to construct the other map we have, the type map, > which is needed to carry the correct type from the original variable to > the output register. If nir_assign_var_locations isn't doing anything for you, don't call it. You'll need to do something with var->data.driver_location. If what you really want is var->data.location, then just copy that to var->data.driver_location when you do nir_setup_outputs. Or (depending on how the URB setup works, I don't actually know), put the actual URB location in var->data.driver_location when you walk the outputs. From there, you have two options. One would be to setup output_reg at the same time with the correct types right away and emit a MOV when you get a store_output. (Copy propagation should clean up the MOV.) For what it's worth, I don't think the type matters; a URB write just writes data to something so as long as you don't have a type mismatch in a MOV, the hardware won't care. The other option, would be to directly emit the URB write in store_output. At the moment, it may be better to take the first option since that better matches what the FS does right now. But both should work fine. > So, before knowing that I could modify nir_lower_io, my best shot at > transferring the original variable location was to create this > nir_outputs map. Now, what I have done is to put that value in > const_index[1] of the intrinsic instruction, which was previously > unused. What do you think? > > That removes the offset to offset map, but we still need the type map. > > About your comment on initializing the register during setup stage, I'm > a bit confused: the register that we need to store is not available > during setup stage, because we still don't have local registers allocated. What do you mean? Because you don't have the destination of the output_write intrinsic allocated? Even if the register has a file of BAD_FILE, you could still store the type there. Also, as I said above, the hardware shouldn't care about the types of data. As long as
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On Wed, Jul 1, 2015 at 1:07 AM, Alejandro Piñeiro wrote: > On 01/07/15 01:37, Jason Ekstrand wrote: >> If we can avoid duplication in the texturing code, that would be >> really nice. Could we do this as a refactor of the old code and then >> a much smaller NIR function that calls some shared code? > > We worked under the assumption that the IR path would be eventually > removed, so having a nir self-contained version would make that removal > easier. > >> That's what >> we did for FS and it worked ok. > > Yes, I realized that the fs nir path was reusing code, instead of a new > implementation. But as the fs nir code was doing some nir to ir > structure/enum mappings, I assumed that was a first version of the > texture support, and that it would be modified in the future when ir get > ridded. As mentioned, I went directly to a nir self-contained version, > so when ir get ridded, it would not be needed to modify the nir path, > but just remove the ir one. In the FS, it's useful for two reasons. One was to share code between the old and new visitors. The other is because we have 3 different texture paths in the FS for different gens. In vec4, we only have one so it's a little less justified. However, it will make code review easier. >> I looked at the layout of your code >> and, after you finish reading instruction inputs, very little of it is >> actually NIR-specific. > > Yes, current nir texture support is basically using the nir structures > to get all the info, and then just adapt the old code at vec4_visitor, > so it would be possible to reuse the code. In any case, as it is right > now happening with the fs path, probably it would be needed to do some > mappings between nir to ir structures/enums. > > I will work now on a version of the support based on refactor the > existing code, in order to be used by the nir and ir paths. > > BR > >> --Jason >> >> On Fri, Jun 26, 2015 at 1:07 AM, Eduardo Lima Mitev wrote: >>> From: Alejandro Piñeiro >>> >>> This sets up the basic structure and placeholders for the implementation of >>> texture operations, which will be added incrementally in subsequent patches. >>> >>> Apart from helping split the texture support into smaller patches, this >>> patch >>> is useful to get a simplified picture of the steps involved in the emission >>> of >>> texture operations. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 120 >>> - >>> 1 file changed, 119 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> index 08a70b0..c3638a0 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> @@ -1412,7 +1412,125 @@ vec4_visitor::nir_swizzle_result(nir_tex_instr >>> *instr, dst_reg dest, >>> void >>> vec4_visitor::nir_emit_texture(nir_tex_instr *instr) >>> { >>> - /* @TODO: Not yet implemented */ >>> + unsigned sampler = instr->sampler_index; >>> + src_reg sampler_reg = src_reg(sampler); >>> + >>> + const glsl_type *dest_type; >>> + >>> + /* Load the texture operation sources */ >>> + for (unsigned i = 0; i < instr->num_srcs; i++) { >>> + src_reg src = get_nir_src(instr->src[i].src); >>> + >>> + switch (instr->src[i].src_type) { >>> + case nir_tex_src_comparitor: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_coord: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_ddx: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_ddy: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_lod: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_ms_index: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_offset: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_sampler_offset: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_projector: >>> + unreachable("Should be lowered by do_lower_texture_projection"); >>> + >>> + case nir_tex_src_bias: >>> + unreachable("LOD bias is not valid for vertex shaders.\n"); >>> + >>> + default: >>> + unreachable("unknown texture source"); >>> + } >>> + } >>> + >>> + /* Get the texture operation opcode */ >>> + enum opcode opcode = shader_opcode_for_nir_opcode(instr->op); >>> + >>> + /* Build the instruction */ >>> + enum glsl_base_type dest_base_type = >>> + brw_glsl_base_type_for_nir_type(instr->dest_type); >>> + >>> + dest_type = >>> + glsl_type::get_instance(dest_base_type, >>> nir_tex_instr_dest_si
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On Mon, Jul 13, 2015 at 3:00 AM, Alejandro Piñeiro wrote: > On 01/07/15 01:37, Jason Ekstrand wrote: >> If we can avoid duplication in the texturing code, that would be >> really nice. Could we do this as a refactor of the old code and then >> a much smaller NIR function that calls some shared code? That's what >> we did for FS and it worked ok. I looked at the layout of your code >> and, after you finish reading instruction inputs, very little of it is >> actually NIR-specific. > > Our wip second batch of commits has now a version that refactors the old > code. But I have a doubt. As part of this refactor, I created a new > method to be used by brw_fs and brw_vec4, that returns the ir texture > opcode based on the nir texture opcode. You can take a look to that > method here [1]. The issue with this method is that ir_texture_opcode is > defined at glsl/ir.h inside a #ifdef __cplusplus. So I needed to rename > brw_nir.c to brw_nir.cpp. On a previous early review [2] you mentioned > that it would be better to keep it as brw_nir.c unless we have a good > reason. Is using this enum good reason to do the renaming? If not, the > other option I see is modify glsl/ir.h and move the enum definitions to > the top, outside the ifdef __cplusplus, as other headers (like > glsl/glsl_types.h) have. Go ahead and make it a cpp file if you have to. Another option would be to do the map the other way: ir_opcode -> nir_tex_op and refactor the code to use the nir_tex_op. That's probably what we should be doing in the FS now that the old visitor is gone. --Jason > BR > > [1] > https://github.com/Igalia/mesa/commit/e81ce150ef931e50b6cb1aae0b42a79f448863f5 > [2] https://bugs.freedesktop.org/show_bug.cgi?id=89580#c7 >> --Jason >> >> On Fri, Jun 26, 2015 at 1:07 AM, Eduardo Lima Mitev wrote: >>> From: Alejandro Piñeiro >>> >>> This sets up the basic structure and placeholders for the implementation of >>> texture operations, which will be added incrementally in subsequent patches. >>> >>> Apart from helping split the texture support into smaller patches, this >>> patch >>> is useful to get a simplified picture of the steps involved in the emission >>> of >>> texture operations. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 120 >>> - >>> 1 file changed, 119 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> index 08a70b0..c3638a0 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> @@ -1412,7 +1412,125 @@ vec4_visitor::nir_swizzle_result(nir_tex_instr >>> *instr, dst_reg dest, >>> void >>> vec4_visitor::nir_emit_texture(nir_tex_instr *instr) >>> { >>> - /* @TODO: Not yet implemented */ >>> + unsigned sampler = instr->sampler_index; >>> + src_reg sampler_reg = src_reg(sampler); >>> + >>> + const glsl_type *dest_type; >>> + >>> + /* Load the texture operation sources */ >>> + for (unsigned i = 0; i < instr->num_srcs; i++) { >>> + src_reg src = get_nir_src(instr->src[i].src); >>> + >>> + switch (instr->src[i].src_type) { >>> + case nir_tex_src_comparitor: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_coord: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_ddx: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_ddy: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_lod: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_ms_index: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_offset: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_sampler_offset: >>> + /* @TODO: not yet implemented */ >>> + break; >>> + >>> + case nir_tex_src_projector: >>> + unreachable("Should be lowered by do_lower_texture_projection"); >>> + >>> + case nir_tex_src_bias: >>> + unreachable("LOD bias is not valid for vertex shaders.\n"); >>> + >>> + default: >>> + unreachable("unknown texture source"); >>> + } >>> + } >>> + >>> + /* Get the texture operation opcode */ >>> + enum opcode opcode = shader_opcode_for_nir_opcode(instr->op); >>> + >>> + /* Build the instruction */ >>> + enum glsl_base_type dest_base_type = >>> + brw_glsl_base_type_for_nir_type(instr->dest_type); >>> + >>> + dest_type = >>> + glsl_type::get_instance(dest_base_type, >>> nir_tex_instr_dest_size(instr), 1); >>> + >>> + vec4_instruction *inst = new(mem_ctx) >>> + vec4_instruction(opcode, dst_reg(this, dest_type)); >>> + >>> + for (un
[Mesa-dev] [PATCH] i965/fs: Make the texturing helpers take NIR opcodes instead of old IR ones
Now that the old GLSL IR visitor code is gone, having the remap is silly. --- src/mesa/drivers/dri/i965/brw_fs.h | 12 +-- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 18 +--- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 135 ++- 3 files changed, 75 insertions(+), 90 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 5243079..1690f4a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -109,7 +109,7 @@ public: void compute_clip_distance(gl_clip_plane *clip_planes); uint32_t gather_channel(int orig_chan, uint32_t sampler); - void swizzle_result(ir_texture_opcode op, int dest_components, + void swizzle_result(nir_texop op, int dest_components, fs_reg orig_val, uint32_t sampler); int type_size(const struct glsl_type *type); @@ -207,28 +207,28 @@ public: void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); fs_reg rescale_texcoord(fs_reg coordinate, int coord_components, bool is_rect, uint32_t sampler, int texunit); - fs_inst *emit_texture_gen4(ir_texture_opcode op, fs_reg dst, + fs_inst *emit_texture_gen4(nir_texop op, fs_reg dst, fs_reg coordinate, int coord_components, fs_reg shadow_comp, fs_reg lod, fs_reg lod2, int grad_components, uint32_t sampler); - fs_inst *emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst, + fs_inst *emit_texture_gen4_simd16(nir_texop op, fs_reg dst, fs_reg coordinate, int vector_elements, fs_reg shadow_c, fs_reg lod, uint32_t sampler); - fs_inst *emit_texture_gen5(ir_texture_opcode op, fs_reg dst, + fs_inst *emit_texture_gen5(nir_texop op, fs_reg dst, fs_reg coordinate, int coord_components, fs_reg shadow_comp, fs_reg lod, fs_reg lod2, int grad_components, fs_reg sample_index, uint32_t sampler, bool has_offset); - fs_inst *emit_texture_gen7(ir_texture_opcode op, fs_reg dst, + fs_inst *emit_texture_gen7(nir_texop op, fs_reg dst, fs_reg coordinate, int coord_components, fs_reg shadow_comp, fs_reg lod, fs_reg lod2, int grad_components, fs_reg sample_index, fs_reg mcs, fs_reg sampler, fs_reg offset_value); - void emit_texture(ir_texture_opcode op, + void emit_texture(nir_texop op, const glsl_type *dest_type, fs_reg coordinate, int components, fs_reg shadow_c, diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index caf1300..d8a6f3c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1739,23 +1739,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) glsl_type::get_instance(dest_base_type, nir_tex_instr_dest_size(instr), 1); - ir_texture_opcode op; - switch (instr->op) { - case nir_texop_lod: op = ir_lod; break; - case nir_texop_query_levels: op = ir_query_levels; break; - case nir_texop_tex: op = ir_tex; break; - case nir_texop_tg4: op = ir_tg4; break; - case nir_texop_txb: op = ir_txb; break; - case nir_texop_txd: op = ir_txd; break; - case nir_texop_txf: op = ir_txf; break; - case nir_texop_txf_ms: op = ir_txf_ms; break; - case nir_texop_txl: op = ir_txl; break; - case nir_texop_txs: op = ir_txs; break; - default: - unreachable("unknown texture opcode"); - } - - emit_texture(op, dest_type, coordinate, instr->coord_components, + emit_texture(instr->op, dest_type, coordinate, instr->coord_components, shadow_comparitor, lod, lod2, lod_components, sample_index, tex_offset, mcs, gather_component, is_cube_array, is_rect, sampler, sampler_reg, texunit); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..c726dcc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -78,7 +78,7 @@ fs_visitor::emit_vs_system_value(int location) } fs_inst * -fs_visitor::emit_texture_gen4(ir_texture_opcode op, fs_reg dst, +fs_visitor::emit_texture_gen4(nir_texop op, fs_reg dst, fs_reg coordinate, int coord_components, fs_reg shadow_c, fs_reg lod, fs_reg dPdy, int grad_components, @@ -106,13
Re: [Mesa-dev] Extension to get Mesa IRs (Was: [Bug 91173])
On 12/07/15 01:49, Ilia Mirkin wrote: On Thu, Jul 2, 2015 at 4:54 PM, Jose Fonseca wrote: On 02/07/15 19:45, Ilia Mirkin wrote: On Thu, Jul 2, 2015 at 2:31 PM, Jose Fonseca wrote: On 02/07/15 17:49, Ilia Mirkin wrote: On Thu, Jul 2, 2015 at 12:40 PM, Jose Fonseca wrote: On 02/07/15 17:24, Ilia Mirkin wrote: On Thu, Jul 2, 2015 at 12:17 PM, Jose Fonseca wrote: Ah OK. So I guess tilers will have to disable their render queues for this one. Which seems like a reasonable trade-off... I don't see why. This is a purely SW query. So I don't see why the HW needs to see any difference. It just won't have compiled the shaders, I think. I guess this could force it. AFAIK, tiles defer the _rendering_, not the compilation. At least llvmpipe compiles everything at draw time. That said, glretrace already does glReadPixels when dumping state, so one way or the other, when inspecting state in qapitrace, everything will be flushed and and synched. But that's too late -- you said the glGetActiveBla would go right after the draw call. Presumably if you did it right after glReadPixels it'd end up seeing the state left over from a blit or something? Fair enough. It's the first thing after glDraw. Forget about glReadpixels. I guess just still don't understand what's special about tilers. But I don't think it's pertinent now. What's special about tilers is that they defer renders. Compiling the program can similarly get deferred because they can. (And sometimes entire renders get dropped due to clears, etc.) Should it get deferred? Dunno. I don't even remember if freedreno defers compilation, and never knew what vc4 did. Perhaps the API should instead be glEnable(GL_PROGRAM_SAVE_DUMP) glProgramDumpDebugInfo(progid, callback) which would then optionally dump any info associated with that program. That way it doesn't even have to be internally active (due to a subsequent blit or who-knows-what). But it would rely on that program having been previously-drawn-with which would have generated the relevant data. Doing this immediately after draw call is no problem at all. I don't think it's worth complicating things by allowing a lag between draw and shader extraction. It just makes things more unreliable which defeats the point. Would it really complicate things though? Internally, it can never drop the debug info since a program might later be reused wholesale and there won't be another compilation, so it has to store the info on the program object. Program object might not exist (e.g. when debugging fixed-function). And the concept of program object looses meaning in the downstream layers (e.g inside gallium pipe drivers, where TGSI can come from all sort of utility modules and not just GLSL). I have little doubts: for this to be feasible, it's imperative this applies to the immediately validated state. Our stack has too many layer to do anything else: it would be complex and buggy. Jose Jose, Were you planning on working on something like this? I could _really_ use it for some bugs I'm tracking down (and failing thus far), unfortunately the shaders are unreadable and get compiled very far away from time of use, which makes it harder to track. No, I'm afraid I don't have the time myself. It's not directly useful to anything I'm working on at the moment. My goal was only to help come up with good design for this, so that if/when somebody couldn't resist the urge to scratch this itch, there was a tentative design/plan in place already. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Make the texturing helpers take NIR opcodes instead of old IR ones
Jason Ekstrand writes: > Now that the old GLSL IR visitor code is gone, having the remap is silly. > --- > src/mesa/drivers/dri/i965/brw_fs.h | 12 +-- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 18 +--- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 135 > ++- > 3 files changed, 75 insertions(+), 90 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > index 5243079..1690f4a 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -109,7 +109,7 @@ public: > void compute_clip_distance(gl_clip_plane *clip_planes); > > uint32_t gather_channel(int orig_chan, uint32_t sampler); > - void swizzle_result(ir_texture_opcode op, int dest_components, > + void swizzle_result(nir_texop op, int dest_components, > fs_reg orig_val, uint32_t sampler); > > int type_size(const struct glsl_type *type); > @@ -207,28 +207,28 @@ public: > void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); > fs_reg rescale_texcoord(fs_reg coordinate, int coord_components, > bool is_rect, uint32_t sampler, int texunit); > - fs_inst *emit_texture_gen4(ir_texture_opcode op, fs_reg dst, > + fs_inst *emit_texture_gen4(nir_texop op, fs_reg dst, >fs_reg coordinate, int coord_components, >fs_reg shadow_comp, >fs_reg lod, fs_reg lod2, int grad_components, >uint32_t sampler); > - fs_inst *emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst, > + fs_inst *emit_texture_gen4_simd16(nir_texop op, fs_reg dst, > fs_reg coordinate, int vector_elements, > fs_reg shadow_c, fs_reg lod, > uint32_t sampler); > - fs_inst *emit_texture_gen5(ir_texture_opcode op, fs_reg dst, > + fs_inst *emit_texture_gen5(nir_texop op, fs_reg dst, >fs_reg coordinate, int coord_components, >fs_reg shadow_comp, >fs_reg lod, fs_reg lod2, int grad_components, >fs_reg sample_index, uint32_t sampler, >bool has_offset); > - fs_inst *emit_texture_gen7(ir_texture_opcode op, fs_reg dst, > + fs_inst *emit_texture_gen7(nir_texop op, fs_reg dst, >fs_reg coordinate, int coord_components, >fs_reg shadow_comp, >fs_reg lod, fs_reg lod2, int grad_components, >fs_reg sample_index, fs_reg mcs, fs_reg > sampler, >fs_reg offset_value); > - void emit_texture(ir_texture_opcode op, > + void emit_texture(nir_texop op, > const glsl_type *dest_type, > fs_reg coordinate, int components, > fs_reg shadow_c, Hold on, I'm about to send a series that gets rid of the ir_texture_opcode argument of these gen-specific functions and replaces it with a backend_instruction opcode. If you wait a little this change will involve much less churn. > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index caf1300..d8a6f3c 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -1739,23 +1739,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, > nir_tex_instr *instr) >glsl_type::get_instance(dest_base_type, nir_tex_instr_dest_size(instr), >1); > > - ir_texture_opcode op; > - switch (instr->op) { > - case nir_texop_lod: op = ir_lod; break; > - case nir_texop_query_levels: op = ir_query_levels; break; > - case nir_texop_tex: op = ir_tex; break; > - case nir_texop_tg4: op = ir_tg4; break; > - case nir_texop_txb: op = ir_txb; break; > - case nir_texop_txd: op = ir_txd; break; > - case nir_texop_txf: op = ir_txf; break; > - case nir_texop_txf_ms: op = ir_txf_ms; break; > - case nir_texop_txl: op = ir_txl; break; > - case nir_texop_txs: op = ir_txs; break; > - default: > - unreachable("unknown texture opcode"); > - } > - > - emit_texture(op, dest_type, coordinate, instr->coord_components, > + emit_texture(instr->op, dest_type, coordinate, instr->coord_components, > shadow_comparitor, lod, lod2, lod_components, sample_index, > tex_offset, mcs, gather_component, > is_cube_array, is_rect, sampler, sampler_reg, texunit); > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > index 94d6a58..c726dcc 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > +++ b/src/mesa/dr
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On 13/07/15 14:05, Jason Ekstrand wrote: > On Mon, Jul 13, 2015 at 3:00 AM, Alejandro Piñeiro > wrote: >> On 01/07/15 01:37, Jason Ekstrand wrote: >>> If we can avoid duplication in the texturing code, that would be >>> really nice. Could we do this as a refactor of the old code and then >>> a much smaller NIR function that calls some shared code? That's what >>> we did for FS and it worked ok. I looked at the layout of your code >>> and, after you finish reading instruction inputs, very little of it is >>> actually NIR-specific. >> Our wip second batch of commits has now a version that refactors the old >> code. But I have a doubt. As part of this refactor, I created a new >> method to be used by brw_fs and brw_vec4, that returns the ir texture >> opcode based on the nir texture opcode. You can take a look to that >> method here [1]. The issue with this method is that ir_texture_opcode is >> defined at glsl/ir.h inside a #ifdef __cplusplus. So I needed to rename >> brw_nir.c to brw_nir.cpp. On a previous early review [2] you mentioned >> that it would be better to keep it as brw_nir.c unless we have a good >> reason. Is using this enum good reason to do the renaming? If not, the >> other option I see is modify glsl/ir.h and move the enum definitions to >> the top, outside the ifdef __cplusplus, as other headers (like >> glsl/glsl_types.h) have. > Go ahead and make it a cpp file if you have to. Ok, thanks. > Another option would > be to do the map the other way: ir_opcode -> nir_tex_op and refactor > the code to use the nir_tex_op. But even if we do the map in the other way, the helper function would be using the enum type defined at glsl/ir.h, so we would need to make it a cpp file in any case. So for now, I will keep things simple, and keep the current mapping, and do the change later. > That's probably what we should be > doing in the FS now that the old visitor is gone. > --Jason > >> BR >> >> [1] >> https://github.com/Igalia/mesa/commit/e81ce150ef931e50b6cb1aae0b42a79f448863f5 >> [2] https://bugs.freedesktop.org/show_bug.cgi?id=89580#c7 >>> --Jason >>> >>> On Fri, Jun 26, 2015 at 1:07 AM, Eduardo Lima Mitev >>> wrote: From: Alejandro Piñeiro This sets up the basic structure and placeholders for the implementation of texture operations, which will be added incrementally in subsequent patches. Apart from helping split the texture support into smaller patches, this patch is useful to get a simplified picture of the steps involved in the emission of texture operations. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 120 - 1 file changed, 119 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 08a70b0..c3638a0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1412,7 +1412,125 @@ vec4_visitor::nir_swizzle_result(nir_tex_instr *instr, dst_reg dest, void vec4_visitor::nir_emit_texture(nir_tex_instr *instr) { - /* @TODO: Not yet implemented */ + unsigned sampler = instr->sampler_index; + src_reg sampler_reg = src_reg(sampler); + + const glsl_type *dest_type; + + /* Load the texture operation sources */ + for (unsigned i = 0; i < instr->num_srcs; i++) { + src_reg src = get_nir_src(instr->src[i].src); + + switch (instr->src[i].src_type) { + case nir_tex_src_comparitor: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_coord: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_ddx: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_ddy: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_lod: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_ms_index: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_offset: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_sampler_offset: + /* @TODO: not yet implemented */ + break; + + case nir_tex_src_projector: + unreachable("Should be lowered by do_lower_texture_projection"); + + case nir_tex_src_bias: + unreachable("LOD bias is not valid for vertex shaders.\n"); + + default: + unreachable("unknown texture source"); + } + } + + /* Get the texture operation opcode */ + enum
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On Jul 13, 2015 7:14 AM, "Alejandro Piñeiro" wrote: > > > > On 13/07/15 14:05, Jason Ekstrand wrote: > > On Mon, Jul 13, 2015 at 3:00 AM, Alejandro Piñeiro wrote: > >> On 01/07/15 01:37, Jason Ekstrand wrote: > >>> If we can avoid duplication in the texturing code, that would be > >>> really nice. Could we do this as a refactor of the old code and then > >>> a much smaller NIR function that calls some shared code? That's what > >>> we did for FS and it worked ok. I looked at the layout of your code > >>> and, after you finish reading instruction inputs, very little of it is > >>> actually NIR-specific. > >> Our wip second batch of commits has now a version that refactors the old > >> code. But I have a doubt. As part of this refactor, I created a new > >> method to be used by brw_fs and brw_vec4, that returns the ir texture > >> opcode based on the nir texture opcode. You can take a look to that > >> method here [1]. The issue with this method is that ir_texture_opcode is > >> defined at glsl/ir.h inside a #ifdef __cplusplus. So I needed to rename > >> brw_nir.c to brw_nir.cpp. On a previous early review [2] you mentioned > >> that it would be better to keep it as brw_nir.c unless we have a good > >> reason. Is using this enum good reason to do the renaming? If not, the > >> other option I see is modify glsl/ir.h and move the enum definitions to > >> the top, outside the ifdef __cplusplus, as other headers (like > >> glsl/glsl_types.h) have. > > Go ahead and make it a cpp file if you have to. > > Ok, thanks. > > > Another option would > > be to do the map the other way: ir_opcode -> nir_tex_op and refactor > > the code to use the nir_tex_op. > > But even if we do the map in the other way, the helper function would be > using the enum type defined at glsl/ir.h, so we would need to make it a > cpp file in any case. So for now, I will keep things simple, and keep > the current mapping, and do the change later. The difference is that, with the patch I sent a few hours ago, you only need to do the conversion one place so there's no need for the helper. > > That's probably what we should be > > doing in the FS now that the old visitor is gone. > > --Jason > > > >> BR > >> > >> [1] > >> https://github.com/Igalia/mesa/commit/e81ce150ef931e50b6cb1aae0b42a79f448863f5 > >> [2] https://bugs.freedesktop.org/show_bug.cgi?id=89580#c7 > >>> --Jason > >>> > >>> On Fri, Jun 26, 2015 at 1:07 AM, Eduardo Lima Mitev wrote: > From: Alejandro Piñeiro > > This sets up the basic structure and placeholders for the implementation of > texture operations, which will be added incrementally in subsequent patches. > > Apart from helping split the texture support into smaller patches, this patch > is useful to get a simplified picture of the steps involved in the emission of > texture operations. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 > --- > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 120 - > 1 file changed, 119 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > index 08a70b0..c3638a0 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > @@ -1412,7 +1412,125 @@ vec4_visitor::nir_swizzle_result(nir_tex_instr *instr, dst_reg dest, > void > vec4_visitor::nir_emit_texture(nir_tex_instr *instr) > { > - /* @TODO: Not yet implemented */ > + unsigned sampler = instr->sampler_index; > + src_reg sampler_reg = src_reg(sampler); > + > + const glsl_type *dest_type; > + > + /* Load the texture operation sources */ > + for (unsigned i = 0; i < instr->num_srcs; i++) { > + src_reg src = get_nir_src(instr->src[i].src); > + > + switch (instr->src[i].src_type) { > + case nir_tex_src_comparitor: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_coord: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_ddx: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_ddy: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_lod: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_ms_index: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_offset: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_sampler_offset: > + /* @TODO: not yet implemented */ > + break; > + > + case nir_tex_src_projector:
Re: [Mesa-dev] [PATCH] i965/fs: Make the texturing helpers take NIR opcodes instead of old IR ones
On Jul 13, 2015 6:04 AM, "Francisco Jerez" wrote: > > Jason Ekstrand writes: > > > Now that the old GLSL IR visitor code is gone, having the remap is silly. > > --- > > src/mesa/drivers/dri/i965/brw_fs.h | 12 +-- > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 18 +--- > > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 135 ++- > > 3 files changed, 75 insertions(+), 90 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h > > index 5243079..1690f4a 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.h > > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > > @@ -109,7 +109,7 @@ public: > > void compute_clip_distance(gl_clip_plane *clip_planes); > > > > uint32_t gather_channel(int orig_chan, uint32_t sampler); > > - void swizzle_result(ir_texture_opcode op, int dest_components, > > + void swizzle_result(nir_texop op, int dest_components, > > fs_reg orig_val, uint32_t sampler); > > > > int type_size(const struct glsl_type *type); > > @@ -207,28 +207,28 @@ public: > > void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); > > fs_reg rescale_texcoord(fs_reg coordinate, int coord_components, > > bool is_rect, uint32_t sampler, int texunit); > > - fs_inst *emit_texture_gen4(ir_texture_opcode op, fs_reg dst, > > + fs_inst *emit_texture_gen4(nir_texop op, fs_reg dst, > >fs_reg coordinate, int coord_components, > >fs_reg shadow_comp, > >fs_reg lod, fs_reg lod2, int grad_components, > >uint32_t sampler); > > - fs_inst *emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst, > > + fs_inst *emit_texture_gen4_simd16(nir_texop op, fs_reg dst, > > fs_reg coordinate, int vector_elements, > > fs_reg shadow_c, fs_reg lod, > > uint32_t sampler); > > - fs_inst *emit_texture_gen5(ir_texture_opcode op, fs_reg dst, > > + fs_inst *emit_texture_gen5(nir_texop op, fs_reg dst, > >fs_reg coordinate, int coord_components, > >fs_reg shadow_comp, > >fs_reg lod, fs_reg lod2, int grad_components, > >fs_reg sample_index, uint32_t sampler, > >bool has_offset); > > - fs_inst *emit_texture_gen7(ir_texture_opcode op, fs_reg dst, > > + fs_inst *emit_texture_gen7(nir_texop op, fs_reg dst, > >fs_reg coordinate, int coord_components, > >fs_reg shadow_comp, > >fs_reg lod, fs_reg lod2, int grad_components, > >fs_reg sample_index, fs_reg mcs, fs_reg sampler, > >fs_reg offset_value); > > - void emit_texture(ir_texture_opcode op, > > + void emit_texture(nir_texop op, > > const glsl_type *dest_type, > > fs_reg coordinate, int components, > > fs_reg shadow_c, > > Hold on, I'm about to send a series that gets rid of the > ir_texture_opcode argument of these gen-specific functions and replaces > it with a backend_instruction opcode. If you wait a little this change > will involve much less churn. Fine with me. > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > index caf1300..d8a6f3c 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > @@ -1739,23 +1739,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) > >glsl_type::get_instance(dest_base_type, nir_tex_instr_dest_size(instr), > >1); > > > > - ir_texture_opcode op; > > - switch (instr->op) { > > - case nir_texop_lod: op = ir_lod; break; > > - case nir_texop_query_levels: op = ir_query_levels; break; > > - case nir_texop_tex: op = ir_tex; break; > > - case nir_texop_tg4: op = ir_tg4; break; > > - case nir_texop_txb: op = ir_txb; break; > > - case nir_texop_txd: op = ir_txd; break; > > - case nir_texop_txf: op = ir_txf; break; > > - case nir_texop_txf_ms: op = ir_txf_ms; break; > > - case nir_texop_txl: op = ir_txl; break; > > - case nir_texop_txs: op = ir_txs; break; > > - default: > > - unreachable("unknown texture opcode"); > > - } > > - > > - emit_texture(op, dest_type, coordinate, instr->coord_components, > > + emit_texture(instr->op, dest_type, coordinate, instr->coord_components, > > shadow_comparitor, lod, lod2, lod_components, sample_index, > > tex_offset, mcs, gather_component, > > is_cube_array, is_rect, sampler, sampler_reg,
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
On 09/07/15 22:05, Marek Olšák wrote: I'd like to discuss one more thing that will affect whether image slots will be global (shared by all shaders) or not. Which image unit an image uniform uses is not a compile-time thing, but it's specified later using glUniform1i. That means we need a per-shader table that maps image uniforms to global image units. One possible solution is to add this pipe_context function: void (*set_shader_image_mapping)( struct pipe_context *, unsigned shader, unsigned start_decl_index, unsigned count, unsigned *decl_to_slot_mapping); This is only required if the shader image slots are global and not per-shader. (if they are per-shader, st/mesa can reorder the slots for each shader independently just like it already does for textures and UBOs) Shader storage buffer objects suffer from the same issue. Atomic counters don't. Therefore, image slots must be per-shader (like sampler slots) to avoid this craziness and keep things simple. Sounds OK to me too. D3D11's UAV binding points are global, but that can be easily accomodated by binding the same UAVs array on all stages. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On Jul 13, 2015 7:17 AM, "Jason Ekstrand" wrote: > > > On Jul 13, 2015 7:14 AM, "Alejandro Piñeiro" wrote: > > > > > > > > On 13/07/15 14:05, Jason Ekstrand wrote: > > > On Mon, Jul 13, 2015 at 3:00 AM, Alejandro Piñeiro < apinhe...@igalia.com> wrote: > > >> On 01/07/15 01:37, Jason Ekstrand wrote: > > >>> If we can avoid duplication in the texturing code, that would be > > >>> really nice. Could we do this as a refactor of the old code and then > > >>> a much smaller NIR function that calls some shared code? That's what > > >>> we did for FS and it worked ok. I looked at the layout of your code > > >>> and, after you finish reading instruction inputs, very little of it is > > >>> actually NIR-specific. > > >> Our wip second batch of commits has now a version that refactors the old > > >> code. But I have a doubt. As part of this refactor, I created a new > > >> method to be used by brw_fs and brw_vec4, that returns the ir texture > > >> opcode based on the nir texture opcode. You can take a look to that > > >> method here [1]. The issue with this method is that ir_texture_opcode is > > >> defined at glsl/ir.h inside a #ifdef __cplusplus. So I needed to rename > > >> brw_nir.c to brw_nir.cpp. On a previous early review [2] you mentioned > > >> that it would be better to keep it as brw_nir.c unless we have a good > > >> reason. Is using this enum good reason to do the renaming? If not, the > > >> other option I see is modify glsl/ir.h and move the enum definitions to > > >> the top, outside the ifdef __cplusplus, as other headers (like > > >> glsl/glsl_types.h) have. > > > Go ahead and make it a cpp file if you have to. > > > > Ok, thanks. > > > > > Another option would > > > be to do the map the other way: ir_opcode -> nir_tex_op and refactor > > > the code to use the nir_tex_op. > > > > But even if we do the map in the other way, the helper function would be > > using the enum type defined at glsl/ir.h, so we would need to make it a > > cpp file in any case. So for now, I will keep things simple, and keep > > the current mapping, and do the change later. > > The difference is that, with the patch I sent a few hours ago, you only need to do the conversion one place so there's no need for the helper. For whatever it's worth I'm also OK with just duplicating the 10/lines of code to do the conversion and not bothering with the helper. > > > That's probably what we should be > > > doing in the FS now that the old visitor is gone. > > > --Jason > > > > > >> BR > > >> > > >> [1] > > >> https://github.com/Igalia/mesa/commit/e81ce150ef931e50b6cb1aae0b42a79f448863f5 > > >> [2] https://bugs.freedesktop.org/show_bug.cgi?id=89580#c7 > > >>> --Jason > > >>> > > >>> On Fri, Jun 26, 2015 at 1:07 AM, Eduardo Lima Mitev < el...@igalia.com> wrote: > > From: Alejandro Piñeiro > > > > This sets up the basic structure and placeholders for the implementation of > > texture operations, which will be added incrementally in subsequent patches. > > > > Apart from helping split the texture support into smaller patches, this patch > > is useful to get a simplified picture of the steps involved in the emission of > > texture operations. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 > > --- > > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 120 - > > 1 file changed, 119 insertions(+), 1 deletion(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > index 08a70b0..c3638a0 100644 > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > @@ -1412,7 +1412,125 @@ vec4_visitor::nir_swizzle_result(nir_tex_instr *instr, dst_reg dest, > > void > > vec4_visitor::nir_emit_texture(nir_tex_instr *instr) > > { > > - /* @TODO: Not yet implemented */ > > + unsigned sampler = instr->sampler_index; > > + src_reg sampler_reg = src_reg(sampler); > > + > > + const glsl_type *dest_type; > > + > > + /* Load the texture operation sources */ > > + for (unsigned i = 0; i < instr->num_srcs; i++) { > > + src_reg src = get_nir_src(instr->src[i].src); > > + > > + switch (instr->src[i].src_type) { > > + case nir_tex_src_comparitor: > > + /* @TODO: not yet implemented */ > > + break; > > + > > + case nir_tex_src_coord: > > + /* @TODO: not yet implemented */ > > + break; > > + > > + case nir_tex_src_ddx: > > + /* @TODO: not yet implemented */ > > + break; > > + > > + case nir_tex_src_ddy: > > + /* @TODO: not yet implemented */ > > + break; > > + > > + case nir_tex_src_lod: > > + /* @TODO: not yet implemented */ > > +
Re: [Mesa-dev] [PATCH 3/4] radeon, r200: allow hyperz for radeon DRM module v2
On Sat, Jul 11, 2015 at 8:48 AM, Roland Scheidegger wrote: > Am 11.07.2015 um 11:25 schrieb Marek Olšák: >> On Fri, Jul 10, 2015 at 11:02 PM, Roland Scheidegger >> wrote: >>> Am 10.07.2015 um 19:41 schrieb Emil Velikov: On 10 July 2015 at 13:18, Roland Scheidegger wrote: > Am 10.07.2015 um 05:44 schrieb Michel Dänzer: >> On 10.07.2015 05:13, Emil Velikov wrote: >>> The original code only half considered hyperz as an option. As per >>> previous commit "major != 2 cannot occur" we can simply things, and >>> allow users to set the option if they choose to do so. >>> >>> Signed-off-by: Emil Velikov >>> --- >>> src/mesa/drivers/dri/r200/r200_context.c | 10 ++ >>> src/mesa/drivers/dri/radeon/radeon_context.c | 9 ++--- >>> 2 files changed, 4 insertions(+), 15 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/r200/r200_context.c >>> b/src/mesa/drivers/dri/r200/r200_context.c >>> index 40cc50a..2a42ab3 100644 >>> --- a/src/mesa/drivers/dri/r200/r200_context.c >>> +++ b/src/mesa/drivers/dri/r200/r200_context.c >>> @@ -225,14 +225,8 @@ GLboolean r200CreateContext( gl_api api, >>> rmesa->radeon.initialMaxAnisotropy = >>> driQueryOptionf(&rmesa->radeon.optionCache, >>> >>> "def_max_anisotropy"); >>> >>> - if ( sPriv->drm_version.major == 1 >>> - && driQueryOptionb( &rmesa->radeon.optionCache, "hyperz" ) ) { >>> - if ( sPriv->drm_version.minor < 13 ) >>> - fprintf( stderr, "DRM version 1.%d too old to support HyperZ, " >>> - "disabling.\n", sPriv->drm_version.minor ); >>> - else >>> - rmesa->using_hyperz = GL_TRUE; >>> - } >> >> This code only set rmesa->using_hyperz = GL_TRUE if >> sPriv->drm_version.major == 1. It was disabled for KMS in commit >> e541845959761e9f47d14ade6b58a32db04ef7e4 ("r200: Fix piglit paths >> test."). >> >> >>> + if (driQueryOptionb( &rmesa->radeon.optionCache, "hyperz")) >>> + rmesa->using_hyperz = GL_TRUE; >> >> This enables it again for KMS. Maybe that's okay though, especially if >> the driconf option is disabled by default. > > > Oh you're right. The reason given though why it was disabled looks bogus > to me ("Piglit doesn't like HyperZ warning so disable it for kms." ???), > and I can't see why that would have only applied to r200, not r100. So > it should be fine. (Of course, you will get more failures with that > enabled with piglit, some things just plain won't work, but that was > just the case with UMS too, and the reason why it never was enabled by > default.) > Yes without Roland's knowledge if hyperz is supposed to work for KMS the current code is quite ambiguous. If you guys prefer I can simply rip out the whole thing, then again hyperz is disabled by default so no harm should follow with this patch. I don't mind either way. Emil >>> >>> I'd say keep the option (for both drivers) for now. >>> I've got some r200 in a dust bin actually, haven't touched it in years... >>> I think it should work in the same way as it does on r100. >>> The depth buffer metadata (i.e. the bits saying if a tile is compressed >>> or not etc.) is really fixed onchip cache, and there's no attempt to >>> grab or restore that data when the depth buffer is changed, which >>> obviously isn't quite right... The other limitation is that you cannot >>> read or write the depth buffer directly (well you can but you get back >>> garbage - not being able to do a glReadPixel on the depth buffer alone >>> will cause lots of piglit failures). UMS could also do fast z clear, >>> this is pretty simple as the command would just set the on-chip tile >>> bits to the "cleared" state, but this never made it to KMS - the rest of >>> hyperz should work without this, however, though I'm not entirely sure. >>> IIRC the r300 has actually pretty much the same hw limitations, however >>> the driver there fixes these issues (though I think that chip had some >>> nicer way of fast z clear). >> >> Not really, there is just a packet to clear the on-chip tile bits and >> we always clear the whole surface. > Yes but for r100/r200 I don't know if such a packet exists. The kernel > did all the effort for figuring out the actual on-chip addresses to > clear (of course, with the shared z buffer of late, that way actually > multiple apps ran with hyperz enabled in parallel correctly, as long as > they were placed "far enough apart" on the screen (so the tiles wouldn't > intersect). > It is possible though such a packet exists on r100/r200 too, and we just > never figured it out :-). If there wasn't a special packet, I think you could just write to some registers to get the same effect. Alex > > Roland > >> >> For reading back zbuffers, a full-sc
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
Reviewed-by: Ilia Mirkin On Mon, Jul 13, 2015 at 5:24 AM, Samuel Pitoiset wrote: > NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't > want to do anything with NV50_3D_BIND_TSC2, just limit the maximum > number of samplers to 16 like for nvc0. > > This fixes dmesg fails with the following piglit test: > max-samplers > > But the test still fails. > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c > b/src/gallium/drivers/nouveau/nv50/nv50_screen.c > index 6583a35..46ae0b8 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c > @@ -286,7 +286,7 @@ nv50_screen_get_shader_param(struct pipe_screen *pscreen, > unsigned shader, > case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS: >/* The chip could handle more sampler views than samplers */ > case PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS: > - return MIN2(32, PIPE_MAX_SAMPLERS); > + return MIN2(16, PIPE_MAX_SAMPLERS); > case PIPE_SHADER_CAP_DOUBLES: > case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED: > case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: > -- > 2.4.5 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nv50: add nesting support for occlusion queries
This doesn't fix up nv50_query_result. And there's also all the rotate logic in nvc0 which seems to think it's required. Quite frankly I never fully grokked all that stuff, but it seems highly relevant to this. On Mon, Jul 13, 2015 at 4:49 AM, Samuel Pitoiset wrote: > This is loosely based on nvc0. > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_query.c | 27 > -- > src/gallium/drivers/nouveau/nv50/nv50_screen.h | 2 ++ > 2 files changed, 19 insertions(+), 10 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c > b/src/gallium/drivers/nouveau/nv50/nv50_query.c > index 81f7474..80d3fd2 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c > @@ -49,6 +49,7 @@ struct nv50_query { > uint32_t offset; /* base + i * 32 */ > uint8_t state; > boolean is64bit; > + int nesting; /* only used for occlusion queries */ > struct nouveau_mm_allocation *mm; > struct nouveau_fence *fence; > }; > @@ -175,11 +176,16 @@ nv50_query_begin(struct pipe_context *pipe, struct > pipe_query *pq) > > switch (q->type) { > case PIPE_QUERY_OCCLUSION_COUNTER: > - PUSH_SPACE(push, 4); > - BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); > - PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); > - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > - PUSH_DATA (push, 1); > + q->nesting = nv50->screen->num_occlusion_queries_active++; > + if (q->nesting) { > + nv50_query_get(push, q, 0x10, 0x0100f002); > + } else { > + PUSH_SPACE(push, 4); > + BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); > + PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); > + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > + PUSH_DATA (push, 1); > + } >break; > case PIPE_QUERY_PRIMITIVES_GENERATED: >nv50_query_get(push, q, 0x10, 0x06805002); > @@ -223,9 +229,11 @@ nv50_query_end(struct pipe_context *pipe, struct > pipe_query *pq) > switch (q->type) { > case PIPE_QUERY_OCCLUSION_COUNTER: >nv50_query_get(push, q, 0, 0x0100f002); > - PUSH_SPACE(push, 2); > - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > - PUSH_DATA (push, 0); > + if (--nv50->screen->num_occlusion_queries_active == 0) { > + PUSH_SPACE(push, 2); > + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > + PUSH_DATA (push, 0); > + } >break; > case PIPE_QUERY_PRIMITIVES_GENERATED: >nv50_query_get(push, q, 0, 0x06805002); > @@ -396,8 +404,7 @@ nv50_render_condition(struct pipe_context *pipe, >case PIPE_QUERY_OCCLUSION_COUNTER: >case PIPE_QUERY_OCCLUSION_PREDICATE: > if (likely(!condition)) { > -/* XXX: Placeholder, handle nesting here if available */ > -if (unlikely(false)) > +if (unlikely(q->nesting)) > cond = wait ? NV50_3D_COND_MODE_NOT_EQUAL : > NV50_3D_COND_MODE_ALWAYS; > else > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h > b/src/gallium/drivers/nouveau/nv50/nv50_screen.h > index 881051b..3a12a1f 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h > +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h > @@ -54,6 +54,8 @@ struct nv50_screen { > struct nv50_context *cur_ctx; > struct nv50_graph_state save_state; > > + int num_occlusion_queries_active; > + > struct nouveau_bo *code; > struct nouveau_bo *uniforms; > struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ > -- > 2.4.5 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] applegl: Provide requirements of _SET_DrawBuffers
On 18 June 2015 at 06:53, Julien Isorce wrote: > From: Jon TURNEY > > _SET_DrawBuffers requires driDispatchRemapTable, > so we need to link with libmesa for remap.c. > libmesa requires the C++ linker. > > Also need to arrange to call _mesa_init_remap_table() > to initialize the remap table. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90311 > Signed-off-by: Jon TURNEY > --- > src/glx/Makefile.am | 5 - > src/glx/apple/apple_glapi.c | 3 +++ > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/glx/Makefile.am b/src/glx/Makefile.am > index 6e50e09..de24496 100644 > --- a/src/glx/Makefile.am > +++ b/src/glx/Makefile.am > @@ -139,7 +139,10 @@ libglx_la_SOURCES += \ > applegl_glx.c > > SUBDIRS += apple > -libglx_la_LIBADD += $(builddir)/apple/libappleglx.la > +libglx_la_LIBADD += \ > + $(builddir)/apple/libappleglx.la \ > + $(top_builddir)/src/mesa/libmesa.la > +nodist_EXTRA_lib@GL_LIB@_la_SOURCES = dummy.cpp > endif > > GL_LIBS = \ > diff --git a/src/glx/apple/apple_glapi.c b/src/glx/apple/apple_glapi.c > index 4d19f7f..849044b 100644 > --- a/src/glx/apple/apple_glapi.c > +++ b/src/glx/apple/apple_glapi.c > @@ -39,6 +39,7 @@ > #include > > #include "main/glheader.h" > +#include "main/remap.h" > #include "glapi.h" > #include "glapitable.h" > #include "main/dispatch.h" > @@ -54,6 +55,8 @@ static void _apple_glapi_create_table(void) { > if (__applegl_api) > return; > > +_mesa_init_remap_table(); > + > __ogl_framework_api = > _glapi_create_table_from_handle(apple_cgl_get_dl_handle(), "gl"); > assert(__ogl_framework_api); > So as mentioned previously glx (libGL) should interact with the dispatch directly, rather than have any knowledge in the implementation detail (which lies in mesa). src/glx/indirect_glx.c is a nice example, where __glXNewIndirectAPI (autogenerated from mapi/glapi/gen/glX_proto_send.py) is used to create the dispatch table and then set it with _glapi_set_dispatch(). AppleDRI uses similar approach _glapi_create_table_from_handle(generated from mapi/glapi/gen/gl_gentable.py), but there are things that could be improved/fixed. - __ogl_framework_api does not need to be over a thousand entries long. There are six overrides + __ogl_framework_api->Scissor used. So tweaking the generator to directly produce __applegl_api and(?) a slimmed down __ogl_framework_api sounds like a good idea to me. Further on one could also: - Drop the multiple forward declarations of __ogl_framework_api. - Wrap/implement appledri around __GLXDRI(display|screen) shoving storing things like dl_handle, thus allowing things to be torn down on exit. Namely I'm suggesting to cleanup and test Jon's earlier work. - apple_glapi_set_dispatch is called "too late" - at applegl_bind_context. According to the docs/spec one should be able to fetch the function pointers without any context let alone a currently bound one. - As a follow up from the above struct glx_context_vtable::get_proc_address could be nuked, making the struct fit nicely into 32/64 byte cache :-) Obviously if you can find an alternative way (but not as hacky as this one) that'll be great. Cheers, Emil P.S. Why is there no appledriproto package, similar to xf86driproto and friends ? Having appledri{,str}.h duplicated seems fragile and ill-advised. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] r200: fix fbo rendering by disabling optimized texture format chooser
Hi Roland, Can I put forward a couple of trivial requrest On 11 July 2015 at 19:39, wrote: > From: Roland Scheidegger > > It is rather unfortunate that we don't know if a texture is going to be used > as a rt later, and we lack the means to do something about a format chosen > which we can't render to directly, so disable this and always chose renderable > format for rgba8 textures. > This addresses an issue raised on (old) bug 51658 Please use the full bugzilla link. > with gnome-shell, don't know > if that's still applicable but it might fix other things as well. > --- > src/mesa/drivers/dri/radeon/radeon_texture.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/radeon/radeon_texture.c > b/src/mesa/drivers/dri/radeon/radeon_texture.c > index edfd48b..95496b1 100644 > --- a/src/mesa/drivers/dri/radeon/radeon_texture.c > +++ b/src/mesa/drivers/dri/radeon/radeon_texture.c > @@ -224,7 +224,19 @@ static mesa_format > radeonChooseTexFormat(radeonContextPtr rmesa, > const GLuint ui = 1; > const GLubyte littleEndian = *((const GLubyte *)&ui); > > - if (fbo) > + > + /* Unfortunately, regardless the fbo flag, we might still be asked to > +* attach a texture to a fbo later, which then won't succeed if we > chose > +* one which isn't renderable. And unlike more exotic formats, apps > aren't > +* really prepared for the incomplete framebuffer this results in > (they'd > +* have to retry with same internalFormat even, just different > +* srcFormat/srcType, which can't really be expected anyway). > +* Ideally, we'd defer format selection until later (if the texture is > +* used as a rt it's likely there's never data uploaded to it before > attached > +* to a fbo), but this isn't really possible, so for now just always > use > +* a renderable format. > +*/ > + if (1||fbo) Some spaces around || will greatly improve readability. Cheers Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] radeon: fix some potential big endian issues
On 11 July 2015 at 19:39, wrote: > @@ -186,17 +172,24 @@ static inline void emit_cb_setup(struct r100_context > *r100, > > /* XXX others? BE/LE? */ Drop the BE/LE part of the comment ? > switch (mesa_format) { > +/* le */ > case MESA_FORMAT_B8G8R8A8_UNORM: > case MESA_FORMAT_B8G8R8X8_UNORM: > +/* be */ > +case MESA_FORMAT_A8R8G8B8_UNORM: > +case MESA_FORMAT_X8R8G8B8_UNORM: Something like the following might be easier for you/others X months down the line. +/* The former format of each pair is for LE while latter for BE systems. */ switch (mesa_format) { case MESA_FORMAT_B8G8R8A8_UNORM: +case MESA_FORMAT_A8R8G8B8_UNORM: case MESA_FORMAT_B8G8R8X8_UNORM: +case MESA_FORMAT_X8R8G8B8_UNORM: Same suggestion applies for 4/4. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On 13 July 2015 at 10:24, Samuel Pitoiset wrote: > NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't > want to do anything with NV50_3D_BIND_TSC2 Is there some widely know issue with it ? mesa & envytools does not mention anything (based of a quick scan). Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On 13/07/15 16:26, Jason Ekstrand wrote: > > > > > > > Go ahead and make it a cpp file if you have to. > > > > > > Ok, thanks. > > > > > > > Another option would > > > > be to do the map the other way: ir_opcode -> nir_tex_op and refactor > > > > the code to use the nir_tex_op. > > > > > > But even if we do the map in the other way, the helper function > would be > > > using the enum type defined at glsl/ir.h, so we would need to make > it a > > > cpp file in any case. So for now, I will keep things simple, and keep > > > the current mapping, and do the change later. > > > > The difference is that, with the patch I sent a few hours ago, you > only need to do the conversion one place so there's no need for the > helper. > > For whatever it's worth I'm also OK with just duplicating the 10/lines > of code to do the conversion and not bothering with the helper. > I just realized that today Francisco Jerez mentioned that is working to remove the ir->nir texture op mapping on FS, replacing it with backend instruction opcode: http://lists.freedesktop.org/archives/mesa-dev/2015-July/088696.html So probably it would be better to wait for his change, and avoid the need of that mapping on the vec4 path too. That would remove the need of the helper function and the cpp renaming. That sounds ok or should we go on independently of Francisco's work? BR -- Alejandro Piñeiro (apinhe...@igalia.com) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On Mon, Jul 13, 2015 at 12:08 PM, Emil Velikov wrote: > On 13 July 2015 at 10:24, Samuel Pitoiset wrote: >> NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't >> want to do anything with NV50_3D_BIND_TSC2 > Is there some widely know issue with it ? mesa & envytools does not > mention anything (based of a quick scan). It's for an extra 2 system samplers (not sure what for, really). I've never seen the blob touch them. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On 13/07/2015 18:08, Emil Velikov wrote: Is there some widely know issue with it ? mesa & envytools does not mention anything (based of a quick scan). There is no know issue with NV50_3D_BIND_TSC2, but Ilia doesn't seem to want to use it. :-) That's why I followed his suggestion by reducing the max. number of samplers to 16 like for nvc0. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965/fs: Don't use the pixel interpolater for centroid interpolation
Oh, never mind - I see there's another hunk that my mailer had folded away for some reason. I'm happy that it's correct now :) On Jul 13, 2015 23:33, "Neil Roberts" wrote: > Chris Forbes writes: > > > Nitpicks aside, I don't think this is a great idea now that you've got > > the SKL PI working. > > Can you explain why you don't think this is a good idea? Is it because > it is an optimisation for something that is not known to be a big use > case so carrying around the extra code just adds unnecessary maintenance > burden? I could agree with that so I'm happy to abandon the patch for > now if that's the general consensus. > > > I also think it's broken -- you need to arrange to have the centroid > > barycentric coords delivered to the FS thread, which won't be > > happening if this is the *only* use of them. Masked in the tests, > > because they compare with a centroid-qualified input. [I'm assuming > > you don't always get these delivered to the FS in SKL, but no docs > > access...] > > The changes to brw_compute_barycentric_interp_modes in the patch ensure > that the centroid barycentric coords are delivered whenever > interpolateAtCentroid is used in a shader. I don't think this is a > problem. At least it seems to work in a simple test without using a > separate varying with the centroid qualifier. > > Regards, > - Neil > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] nv50: add nesting support for occlusion queries
This is loosely based on nvc0. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 29 -- src/gallium/drivers/nouveau/nv50/nv50_screen.h | 2 ++ 2 files changed, 20 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index 81f7474..a5b95c1 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -49,6 +49,7 @@ struct nv50_query { uint32_t offset; /* base + i * 32 */ uint8_t state; boolean is64bit; + int nesting; /* only used for occlusion queries */ struct nouveau_mm_allocation *mm; struct nouveau_fence *fence; }; @@ -175,11 +176,16 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq) switch (q->type) { case PIPE_QUERY_OCCLUSION_COUNTER: - PUSH_SPACE(push, 4); - BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); - PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); - PUSH_DATA (push, 1); + q->nesting = nv50->screen->num_occlusion_queries_active++; + if (q->nesting) { + nv50_query_get(push, q, 0x10, 0x0100f002); + } else { + PUSH_SPACE(push, 4); + BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); + PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); + PUSH_DATA (push, 1); + } break; case PIPE_QUERY_PRIMITIVES_GENERATED: nv50_query_get(push, q, 0x10, 0x06805002); @@ -223,9 +229,11 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query *pq) switch (q->type) { case PIPE_QUERY_OCCLUSION_COUNTER: nv50_query_get(push, q, 0, 0x0100f002); - PUSH_SPACE(push, 2); - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); - PUSH_DATA (push, 0); + if (--nv50->screen->num_occlusion_queries_active == 0) { + PUSH_SPACE(push, 2); + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); + PUSH_DATA (push, 0); + } break; case PIPE_QUERY_PRIMITIVES_GENERATED: nv50_query_get(push, q, 0, 0x06805002); @@ -319,7 +327,7 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq, res8[0] = TRUE; break; case PIPE_QUERY_OCCLUSION_COUNTER: /* u32 sequence, u32 count, u64 time */ - res64[0] = q->data[1]; + res64[0] = q->data[1] - q->data[5]; break; case PIPE_QUERY_PRIMITIVES_GENERATED: /* u64 count, u64 time */ case PIPE_QUERY_PRIMITIVES_EMITTED: /* u64 count, u64 time */ @@ -396,8 +404,7 @@ nv50_render_condition(struct pipe_context *pipe, case PIPE_QUERY_OCCLUSION_COUNTER: case PIPE_QUERY_OCCLUSION_PREDICATE: if (likely(!condition)) { -/* XXX: Placeholder, handle nesting here if available */ -if (unlikely(false)) +if (unlikely(q->nesting)) cond = wait ? NV50_3D_COND_MODE_NOT_EQUAL : NV50_3D_COND_MODE_ALWAYS; else diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h b/src/gallium/drivers/nouveau/nv50/nv50_screen.h index 881051b..3a12a1f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h @@ -54,6 +54,8 @@ struct nv50_screen { struct nv50_context *cur_ctx; struct nv50_graph_state save_state; + int num_occlusion_queries_active; + struct nouveau_bo *code; struct nouveau_bo *uniforms; struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] nv50: turn samples counts off during blit
Reviewed-by: Ilia Mirkin On Mon, Jul 13, 2015 at 4:49 AM, Samuel Pitoiset wrote: > Fixes the following piglit test: > occlusion_query_meta_no_fragments > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_surface.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_surface.c > b/src/gallium/drivers/nouveau/nv50/nv50_surface.c > index dc9852d..66eccc2 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_surface.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_surface.c > @@ -1432,6 +1432,7 @@ static void > nv50_blit(struct pipe_context *pipe, const struct pipe_blit_info *info) > { > struct nv50_context *nv50 = nv50_context(pipe); > + struct nouveau_pushbuf *push = nv50->base.pushbuf; > boolean eng3d = FALSE; > > if (util_format_is_depth_or_stencil(info->dst.resource->format)) { > @@ -1493,10 +1494,20 @@ nv50_blit(struct pipe_context *pipe, const struct > pipe_blit_info *info) > info->src.box.height != -info->dst.box.height)) >eng3d = TRUE; > > + if (nv50->screen->num_occlusion_queries_active) { > + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > + PUSH_DATA (push, 0); > + } > + > if (!eng3d) >nv50_blit_eng2d(nv50, info); > else >nv50_blit_3d(nv50, info); > + > + if (nv50->screen->num_occlusion_queries_active) { > + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > + PUSH_DATA (push, 1); > + } > } > > static void > -- > 2.4.5 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nv50: add nesting support for occlusion queries
Reviewed-by: Ilia Mirkin On Mon, Jul 13, 2015 at 12:49 PM, Samuel Pitoiset wrote: > This is loosely based on nvc0. > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_query.c | 29 > -- > src/gallium/drivers/nouveau/nv50/nv50_screen.h | 2 ++ > 2 files changed, 20 insertions(+), 11 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c > b/src/gallium/drivers/nouveau/nv50/nv50_query.c > index 81f7474..a5b95c1 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c > @@ -49,6 +49,7 @@ struct nv50_query { > uint32_t offset; /* base + i * 32 */ > uint8_t state; > boolean is64bit; > + int nesting; /* only used for occlusion queries */ > struct nouveau_mm_allocation *mm; > struct nouveau_fence *fence; > }; > @@ -175,11 +176,16 @@ nv50_query_begin(struct pipe_context *pipe, struct > pipe_query *pq) > > switch (q->type) { > case PIPE_QUERY_OCCLUSION_COUNTER: > - PUSH_SPACE(push, 4); > - BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); > - PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); > - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > - PUSH_DATA (push, 1); > + q->nesting = nv50->screen->num_occlusion_queries_active++; > + if (q->nesting) { > + nv50_query_get(push, q, 0x10, 0x0100f002); > + } else { > + PUSH_SPACE(push, 4); > + BEGIN_NV04(push, NV50_3D(COUNTER_RESET), 1); > + PUSH_DATA (push, NV50_3D_COUNTER_RESET_SAMPLECNT); > + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > + PUSH_DATA (push, 1); > + } >break; > case PIPE_QUERY_PRIMITIVES_GENERATED: >nv50_query_get(push, q, 0x10, 0x06805002); > @@ -223,9 +229,11 @@ nv50_query_end(struct pipe_context *pipe, struct > pipe_query *pq) > switch (q->type) { > case PIPE_QUERY_OCCLUSION_COUNTER: >nv50_query_get(push, q, 0, 0x0100f002); > - PUSH_SPACE(push, 2); > - BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > - PUSH_DATA (push, 0); > + if (--nv50->screen->num_occlusion_queries_active == 0) { > + PUSH_SPACE(push, 2); > + BEGIN_NV04(push, NV50_3D(SAMPLECNT_ENABLE), 1); > + PUSH_DATA (push, 0); > + } >break; > case PIPE_QUERY_PRIMITIVES_GENERATED: >nv50_query_get(push, q, 0, 0x06805002); > @@ -319,7 +327,7 @@ nv50_query_result(struct pipe_context *pipe, struct > pipe_query *pq, >res8[0] = TRUE; >break; > case PIPE_QUERY_OCCLUSION_COUNTER: /* u32 sequence, u32 count, u64 time */ > - res64[0] = q->data[1]; > + res64[0] = q->data[1] - q->data[5]; >break; > case PIPE_QUERY_PRIMITIVES_GENERATED: /* u64 count, u64 time */ > case PIPE_QUERY_PRIMITIVES_EMITTED: /* u64 count, u64 time */ > @@ -396,8 +404,7 @@ nv50_render_condition(struct pipe_context *pipe, >case PIPE_QUERY_OCCLUSION_COUNTER: >case PIPE_QUERY_OCCLUSION_PREDICATE: > if (likely(!condition)) { > -/* XXX: Placeholder, handle nesting here if available */ > -if (unlikely(false)) > +if (unlikely(q->nesting)) > cond = wait ? NV50_3D_COND_MODE_NOT_EQUAL : > NV50_3D_COND_MODE_ALWAYS; > else > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h > b/src/gallium/drivers/nouveau/nv50/nv50_screen.h > index 881051b..3a12a1f 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h > +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h > @@ -54,6 +54,8 @@ struct nv50_screen { > struct nv50_context *cur_ctx; > struct nv50_graph_state save_state; > > + int num_occlusion_queries_active; > + > struct nouveau_bo *code; > struct nouveau_bo *uniforms; > struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ > -- > 2.4.5 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965/bdw: Fix setting the instancing state for the SGVS element
When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS instruction is sent to create a sort of element to store the generated values. The last instruction in this chunk of code looks like it was trying to set the instancing state for the element using the 3DSTATE_VF_INSTANCING instruction. However it was sending brw->vb.nr_buffers instead of the element index. This instruction is supposed to take an element index and that is how it is used further down in the function so the previous code looks wrong. Perhaps previously the number of buffers coincidentally matched the number of enabled elements so the value was generally correct anyway. In a subsequent patch I want to change a bit how it chooses the SGVS element index so this needs to be fixed. Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/gen8_draw_upload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index 1af90ec..f7d9952 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -74,7 +74,7 @@ gen8_emit_vertices(struct brw_context *brw) BEGIN_BATCH(3); OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2)); - OUT_BATCH(brw->vb.nr_buffers | GEN8_VF_INSTANCING_ENABLE); + OUT_BATCH(vue | GEN8_VF_INSTANCING_ENABLE); OUT_BATCH(0); ADVANCE_BATCH(); } else { -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] i965: Fix problems with gl_Vertex/InstanceID and glPolygonMode
This series fixes these two related bugs: https://bugs.freedesktop.org/show_bug.cgi?id=84677 https://bugs.freedesktop.org/show_bug.cgi?id=91292 The first bug has had a fix for a while but until now I wasn't able to figure out what to do on BDW. I had previously made a fix for the second bug but as pointed out by the original bug reporter the two bugs interact and the fixes would conflict. This series has a v2 of the patch so that the two fixes work together. I've posted a patch to make the Piglit testing more thorough here: http://patchwork.freedesktop.org/patch/54320/ In the v1 for patch 3 Chris Forbes had a comment that it would be better to move the edge flag element within brw->vb.* instead of while emitting the state. I think this would be a good idea except it won't help in this case now that it works with the SGVS element as well because there is no input for that at all in brw->vb.*. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3 v2] i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used
When the edge flag element is enabled then the elements are slightly reordered so that the edge flag is always the last one. This was confusing the code to upload the 3DSTATE_VF_INSTANCING state because that is uploaded with a separate loop which has an instruction for each element. The indices used in these instructions weren't taking into account the reordering so the state would be incorrect. v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope when gl_VertexID is used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292 Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/gen8_draw_upload.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index 2bac5ff..1b48643 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -246,13 +246,24 @@ gen8_emit_vertices(struct brw_context *brw) } ADVANCE_BATCH(); - for (unsigned i = 0; i < brw->vb.nr_enabled; i++) { + for (unsigned i = 0, j = 0; i < brw->vb.nr_enabled; i++) { const struct brw_vertex_element *input = brw->vb.enabled[i]; const struct brw_vertex_buffer *buffer = &brw->vb.buffers[input->buffer]; + unsigned element_index; + + /* The edge flag element is reordered to be the last one in the code + * above so we need to compensate for that in the element indices used + * below. + */ + if (input == gen6_edgeflag_input) + element_index = nr_elements - 1; + else + element_index = j++; BEGIN_BATCH(3); OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2)); - OUT_BATCH(i | (buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0)); + OUT_BATCH(element_index | +(buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0)); OUT_BATCH(buffer->step_rate); ADVANCE_BATCH(); } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3 v2] i965: Swap the order of the vertex ID and edge flag attributes
The edge flag data on Gen6+ is passed through the fixed function hardware as an extra attribute. According to the PRM it must be the last valid VERTEX_ELEMENT structure. However if the vertex ID is also used then another extra element is added to source the VID. This made it so the vertex ID is in the wrong register in the vertex shader and the edge attribute is no longer in the last element. v2: Also implement for BDW+ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677 Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 +++ src/mesa/drivers/dri/i965/gen8_draw_upload.c | 56 +--- 2 files changed, 57 insertions(+), 29 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c index 320e40e..67a8dfd 100644 --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c @@ -787,21 +787,6 @@ static void brw_emit_vertices(struct brw_context *brw) ((i * 4) << BRW_VE1_DST_OFFSET_SHIFT)); } - if (brw->gen >= 6 && gen6_edgeflag_input) { - uint32_t format = - brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); - - OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) | -GEN6_VE0_VALID | -GEN6_VE0_EDGE_FLAG_ENABLE | -(format << BRW_VE0_FORMAT_SHIFT) | -(gen6_edgeflag_input->offset << BRW_VE0_SRC_OFFSET_SHIFT)); - OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << BRW_VE1_COMPONENT_0_SHIFT) | -(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | -(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | -(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); - } - if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) { uint32_t dw0 = 0, dw1 = 0; uint32_t comp0 = BRW_VE1_COMPONENT_STORE_0; @@ -842,6 +827,21 @@ static void brw_emit_vertices(struct brw_context *brw) OUT_BATCH(dw1); } + if (brw->gen >= 6 && gen6_edgeflag_input) { + uint32_t format = + brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); + + OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) | +GEN6_VE0_VALID | +GEN6_VE0_EDGE_FLAG_ENABLE | +(format << BRW_VE0_FORMAT_SHIFT) | +(gen6_edgeflag_input->offset << BRW_VE0_SRC_OFFSET_SHIFT)); + OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << BRW_VE1_COMPONENT_0_SHIFT) | +(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | +(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | +(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); + } + ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index f7d9952..2bac5ff 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -40,16 +40,25 @@ gen8_emit_vertices(struct brw_context *brw) { struct gl_context *ctx = &brw->ctx; uint32_t mocs_wb = brw->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB; + bool uses_edge_flag; brw_prepare_vertices(brw); brw_prepare_shader_draw_parameters(brw); + uses_edge_flag = (ctx->Polygon.FrontMode != GL_FILL || + ctx->Polygon.BackMode != GL_FILL); + if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) { unsigned vue = brw->vb.nr_enabled; - WARN_ONCE(brw->vs.prog_data->inputs_read & VERT_BIT_EDGEFLAG, -"Using VID/IID with edgeflags, need to reorder the " -"vertex attributes"); + /* The element for the edge flags must always be last, so we have to + * insert the SGVS before it in that case. + */ + if (uses_edge_flag) { + assert(vue > 0); + vue--; + } + WARN_ONCE(vue >= 33, "Trying to insert VID/IID past 33rd vertex element, " "need to reorder the vertex attrbutes."); @@ -138,7 +147,18 @@ gen8_emit_vertices(struct brw_context *brw) ADVANCE_BATCH(); } - unsigned nr_elements = brw->vb.nr_enabled + brw->vs.prog_data->uses_vertexid; + /* Normally we don't need an element for the SGVS attribute because the +* 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an +* element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if the +* vertex ID is used then it needs an element for the base vertex buffer. +* Additionally if there is an edge flag element then the SGVS can't be +* inserted past that so we need a dummy element to ensure that the edge +* flag is the last one. +*/ + bool needs_sgvs_element = (brw->vs.prog_data->uses_vertexid || + (brw->vs.prog_data->use
Re: [Mesa-dev] [PATCH] nvc0: fix geometry program revalidation of clipping params
Any one which, after using a geometry shader, enables an extra clip distance. i.e. none. On Mon, Jul 13, 2015 at 4:16 AM, Samuel Pitoiset wrote: > What piglit test does this fix? > > On Sat, Jul 11, 2015 at 7:13 PM, Ilia Mirkin wrote: >> >> Signed-off-by: Ilia Mirkin >> Cc: mesa-sta...@lists.freedesktop.org >> --- >> >> Even though in practice a geometry program will never be using UCP's, >> we still were revalidating (aka recompiling) the program when more >> clip planes became enabled (which also are used for regular clip >> distances). >> >> This seems like it should have led to massive fail, but I guess you >> don't change the number of clip planes when using geometry shaders. >> But I'm going to put this through a full piglit run just in case >> there's something I'm missing. >> >> src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >> b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >> index 785e52e..11f2b10 100644 >> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >> @@ -339,7 +339,7 @@ nvc0_check_program_ucps(struct nvc0_context *nvc0, >>nvc0_vertprog_validate(nvc0); >> else >> if (likely(vp == nvc0->gmtyprog)) >> - nvc0_vertprog_validate(nvc0); >> + nvc0_gmtyprog_validate(nvc0); >> else >>nvc0_tevlprog_validate(nvc0); >> } >> -- >> 2.3.6 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > > -- > Best regards, > Samuel Pitoiset. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0: fix geometry program revalidation of clipping params
This was, btw, introduced in commit 3a8ae6ac243b (nvc0: adapt to new clip state). Back then there was no real geometry support yet. On Mon, Jul 13, 2015 at 2:05 PM, Ilia Mirkin wrote: > Any one which, after using a geometry shader, enables an extra clip > distance. i.e. none. > > On Mon, Jul 13, 2015 at 4:16 AM, Samuel Pitoiset > wrote: >> What piglit test does this fix? >> >> On Sat, Jul 11, 2015 at 7:13 PM, Ilia Mirkin wrote: >>> >>> Signed-off-by: Ilia Mirkin >>> Cc: mesa-sta...@lists.freedesktop.org >>> --- >>> >>> Even though in practice a geometry program will never be using UCP's, >>> we still were revalidating (aka recompiling) the program when more >>> clip planes became enabled (which also are used for regular clip >>> distances). >>> >>> This seems like it should have led to massive fail, but I guess you >>> don't change the number of clip planes when using geometry shaders. >>> But I'm going to put this through a full piglit run just in case >>> there's something I'm missing. >>> >>> src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >>> b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >>> index 785e52e..11f2b10 100644 >>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c >>> @@ -339,7 +339,7 @@ nvc0_check_program_ucps(struct nvc0_context *nvc0, >>>nvc0_vertprog_validate(nvc0); >>> else >>> if (likely(vp == nvc0->gmtyprog)) >>> - nvc0_vertprog_validate(nvc0); >>> + nvc0_gmtyprog_validate(nvc0); >>> else >>>nvc0_tevlprog_validate(nvc0); >>> } >>> -- >>> 2.3.6 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> >> >> >> >> -- >> Best regards, >> Samuel Pitoiset. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On 13 July 2015 at 17:31, Ilia Mirkin wrote: > On Mon, Jul 13, 2015 at 12:08 PM, Emil Velikov > wrote: >> On 13 July 2015 at 10:24, Samuel Pitoiset wrote: >>> NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't >>> want to do anything with NV50_3D_BIND_TSC2 >> Is there some widely know issue with it ? mesa & envytools does not >> mention anything (based of a quick scan). > > It's for an extra 2 system samplers (not sure what for, really). I've > never seen the blob touch them. I guess that they were expecting for the minimum to be bumped with OpenGL 4.x (from 16 with OpenGL 3.0). Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On 13 July 2015 at 17:32, samuel.pitoiset wrote: > On 13/07/2015 18:08, Emil Velikov wrote: >> >> Is there some widely know issue with it ? mesa & envytools does not >> mention anything (based of a quick scan). > > > There is no know issue with NV50_3D_BIND_TSC2, but Ilia doesn't seem to want > to use it. :-) > That's why I followed his suggestion by reducing the max. number of samplers > to 16 like for nvc0. > Ack, thanks for the confirmation. Currently it reads like a warning sign "stay away from XXX" - so if you want to you can soften if a bit. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On Mon, Jul 13, 2015 at 2:28 PM, Emil Velikov wrote: > On 13 July 2015 at 17:32, samuel.pitoiset wrote: >> On 13/07/2015 18:08, Emil Velikov wrote: >>> >>> Is there some widely know issue with it ? mesa & envytools does not >>> mention anything (based of a quick scan). >> >> >> There is no know issue with NV50_3D_BIND_TSC2, but Ilia doesn't seem to want >> to use it. :-) >> That's why I followed his suggestion by reducing the max. number of samplers >> to 16 like for nvc0. >> > Ack, thanks for the confirmation. Currently it reads like a warning > sign "stay away from XXX" - so if you want to you can soften if a bit. "stay away from BIND_TSC2" is right. It's very much not to be used for regular applications from what I can tell. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On 13 July 2015 at 19:30, Ilia Mirkin wrote: > On Mon, Jul 13, 2015 at 2:28 PM, Emil Velikov > wrote: >> On 13 July 2015 at 17:32, samuel.pitoiset wrote: >>> On 13/07/2015 18:08, Emil Velikov wrote: Is there some widely know issue with it ? mesa & envytools does not mention anything (based of a quick scan). >>> >>> >>> There is no know issue with NV50_3D_BIND_TSC2, but Ilia doesn't seem to want >>> to use it. :-) >>> That's why I followed his suggestion by reducing the max. number of samplers >>> to 16 like for nvc0. >>> >> Ack, thanks for the confirmation. Currently it reads like a warning >> sign "stay away from XXX" - so if you want to you can soften if a bit. > > "stay away from BIND_TSC2" is right. It's very much not to be used for > regular applications from what I can tell. I guess I'm having a dull moment and cannot justify the "hostility" towards it. Is it solely based upon "the blob does not use it so we shouldn't either" or is there something else ? Perhaps they cannot be bothered to hook it up. Should that be a discouragement towards using it in nouveau ? -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: limit the maximum number of samplers to 16
On Mon, Jul 13, 2015 at 2:44 PM, Emil Velikov wrote: > On 13 July 2015 at 19:30, Ilia Mirkin wrote: >> On Mon, Jul 13, 2015 at 2:28 PM, Emil Velikov >> wrote: >>> On 13 July 2015 at 17:32, samuel.pitoiset wrote: On 13/07/2015 18:08, Emil Velikov wrote: > > Is there some widely know issue with it ? mesa & envytools does not > mention anything (based of a quick scan). There is no know issue with NV50_3D_BIND_TSC2, but Ilia doesn't seem to want to use it. :-) That's why I followed his suggestion by reducing the max. number of samplers to 16 like for nvc0. >>> Ack, thanks for the confirmation. Currently it reads like a warning >>> sign "stay away from XXX" - so if you want to you can soften if a bit. >> >> "stay away from BIND_TSC2" is right. It's very much not to be used for >> regular applications from what I can tell. > I guess I'm having a dull moment and cannot justify the "hostility" > towards it. Is it solely based upon "the blob does not use it so we > shouldn't either" or is there something else ? Perhaps they cannot be > bothered to hook it up. Should that be a discouragement towards using > it in nouveau ? It's an extra 2 (not 16 as the original patch thought!) samplers. Whole separate system for setting them. Seems like they may have been separate for a reason. There's no API reason to expose them either. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/11] Bye bye __NOT_HAVE_DRM_H
On 8 July 2015 at 18:07, Emil Velikov wrote: > Hi all, > > As double-negatives are not too appealing in the English language I've > decided to nuke the common ifndef __NOT_HAVE_DRM_H. > > The first three patches are trivial bugfix + cleanups which I've noticed > while browsing through, 04-09 remove all occurrences of the lastly macro > (HAVE_LIBDRM is used instead). With the last two dropping kms_swrast > from the scons/android build. > > Note: I'm not 100% sure if the scons build sets the HAVE_LIBDRM macro, > despite that it builds just fine. If anyone can confirm that will be > appreciated. > For anyone willing to take a look - please focus on 1 & 4, everything else is dead obvious :-) [PATCH 01/11] configure.ac: do not set HAVE_DRI(23) when libdrm is missing [PATCH 04/11] dri_interface: drop __NOT_HAVE_DRM_H magic Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] i965: Split batch emission from relocation functions.
So that everything writing to the batch between BEGIN_BATCH() and ADVANCE_BATCH() goes through OUT_BATCH. Reviewed-by: Chris Wilson --- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 30 ++- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 34 ++- 2 files changed, 30 insertions(+), 34 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index d93ee6e..f82958f 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -393,11 +393,11 @@ _intel_batchbuffer_flush(struct brw_context *brw, /* This is the only way buffers get added to the validate list. */ -bool -intel_batchbuffer_emit_reloc(struct brw_context *brw, - drm_intel_bo *buffer, - uint32_t read_domains, uint32_t write_domain, -uint32_t delta) +uint32_t +intel_batchbuffer_reloc(struct brw_context *brw, +drm_intel_bo *buffer, +uint32_t read_domains, uint32_t write_domain, +uint32_t delta) { int ret; @@ -411,16 +411,14 @@ intel_batchbuffer_emit_reloc(struct brw_context *brw, * case the buffer doesn't move and we can short-circuit the relocation * processing in the kernel */ - intel_batchbuffer_emit_dword(brw, buffer->offset64 + delta); - - return true; + return buffer->offset64 + delta; } -bool -intel_batchbuffer_emit_reloc64(struct brw_context *brw, - drm_intel_bo *buffer, - uint32_t read_domains, uint32_t write_domain, - uint32_t delta) +uint64_t +intel_batchbuffer_reloc64(struct brw_context *brw, + drm_intel_bo *buffer, + uint32_t read_domains, uint32_t write_domain, + uint32_t delta) { int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used, buffer, delta, @@ -432,11 +430,7 @@ intel_batchbuffer_emit_reloc64(struct brw_context *brw, * case the buffer doesn't move and we can short-circuit the relocation * processing in the kernel */ - uint64_t offset = buffer->offset64 + delta; - intel_batchbuffer_emit_dword(brw, offset); - intel_batchbuffer_emit_dword(brw, offset >> 32); - - return true; + return buffer->offset64 + delta; } diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index e58eae4..c0456f3 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -57,16 +57,16 @@ void intel_batchbuffer_data(struct brw_context *brw, const void *data, GLuint bytes, enum brw_gpu_ring ring); -bool intel_batchbuffer_emit_reloc(struct brw_context *brw, - drm_intel_bo *buffer, - uint32_t read_domains, - uint32_t write_domain, - uint32_t offset); -bool intel_batchbuffer_emit_reloc64(struct brw_context *brw, -drm_intel_bo *buffer, -uint32_t read_domains, -uint32_t write_domain, -uint32_t offset); +uint32_t intel_batchbuffer_reloc(struct brw_context *brw, + drm_intel_bo *buffer, + uint32_t read_domains, + uint32_t write_domain, + uint32_t offset); +uint64_t intel_batchbuffer_reloc64(struct brw_context *brw, + drm_intel_bo *buffer, + uint32_t read_domains, + uint32_t write_domain, + uint32_t offset); static inline uint32_t float_as_int(float f) { union { @@ -164,14 +164,16 @@ intel_batchbuffer_advance(struct brw_context *brw) #define BEGIN_BATCH_BLT(n) intel_batchbuffer_begin(brw, n, BLT_RING) #define OUT_BATCH(d) intel_batchbuffer_emit_dword(brw, d) #define OUT_BATCH_F(f) intel_batchbuffer_emit_float(brw, f) -#define OUT_RELOC(buf, read_domains, write_domain, delta) do { \ - intel_batchbuffer_emit_reloc(brw, buf, \ - read_domains, write_domain, delta); \ -} while (0) +#define OUT_RELOC(buf, read_domains, write_domain, delta) \ + OUT_BATCH(intel_batchbuffer_reloc(brw, buf, read_domains, write_domain, \ + delta)) /* Handle 48-bit address relocations for Gen8+ */ -#define OUT_RELOC64(buf, read_domains, write_domain, delta) do {
[Mesa-dev] [PATCH 1/4] i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().
BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)" respectively to allow declaring local variables used by intervening OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be in the same control flow. Reviewed-by: Iago Toral Quiroga --- src/mesa/drivers/dri/i965/brw_draw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index 69ad4d4..ec13473 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -261,17 +261,17 @@ static void brw_emit_prim(struct brw_context *brw, indirect_flag = 0; } + BEGIN_BATCH(brw->gen >= 7 ? 7 : 6); + if (brw->gen >= 7) { if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT) predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE; else predicate_enable = 0; - BEGIN_BATCH(7); OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | predicate_enable); OUT_BATCH(hw_prim | vertex_access_type); } else { - BEGIN_BATCH(6); OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) | hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT | vertex_access_type); -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] i965: Add and use USED_BATCH macro.
The next patch will replace the .used field with an on-demand calculation of batchbuffer usage. --- src/mesa/drivers/dri/i965/brw_blorp.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_performance_monitor.c | 6 +++--- src/mesa/drivers/dri/i965/brw_state_batch.c | 4 ++-- src/mesa/drivers/dri/i965/brw_urb.c | 4 ++-- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 20 ++-- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 9 ++--- 6 files changed, 25 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp b/src/mesa/drivers/dri/i965/brw_blorp.cpp index 2ccfae1..eac1f00 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp @@ -226,7 +226,7 @@ retry: intel_batchbuffer_require_space(brw, estimated_max_batch_usage, RENDER_RING); intel_batchbuffer_save_state(brw); drm_intel_bo *saved_bo = brw->batch.bo; - uint32_t saved_used = brw->batch.used; + uint32_t saved_used = USED_BATCH(brw->batch); uint32_t saved_state_batch_offset = brw->batch.state_batch_offset; switch (brw->gen) { @@ -245,7 +245,7 @@ retry: * reserved enough space that a wrap will never happen. */ assert(brw->batch.bo == saved_bo); - assert((brw->batch.used - saved_used) * 4 + + assert((USED_BATCH(brw->batch) - saved_used) * 4 + (saved_state_batch_offset - brw->batch.state_batch_offset) < estimated_max_batch_usage); /* Shut up compiler warnings on release build */ diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c b/src/mesa/drivers/dri/i965/brw_performance_monitor.c index 0a12375..7e90e8a 100644 --- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c +++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c @@ -710,7 +710,7 @@ emit_mi_report_perf_count(struct brw_context *brw, /* Make sure the commands to take a snapshot fits in a single batch. */ intel_batchbuffer_require_space(brw, MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4, RENDER_RING); - int batch_used = brw->batch.used; + int batch_used = USED_BATCH(brw->batch); /* Reports apparently don't always get written unless we flush first. */ brw_emit_mi_flush(brw); @@ -754,7 +754,7 @@ emit_mi_report_perf_count(struct brw_context *brw, brw_emit_mi_flush(brw); (void) batch_used; - assert(brw->batch.used - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4); + assert(USED_BATCH(brw->batch) - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4); } /** @@ -1386,7 +1386,7 @@ void brw_perf_monitor_new_batch(struct brw_context *brw) { assert(brw->batch.ring == RENDER_RING); - assert(brw->gen < 6 || brw->batch.used == 0); + assert(brw->gen < 6 || USED_BATCH(brw->batch) == 0); if (brw->perfmon.oa_users == 0) return; diff --git a/src/mesa/drivers/dri/i965/brw_state_batch.c b/src/mesa/drivers/dri/i965/brw_state_batch.c index a405a80..d79e0ea 100644 --- a/src/mesa/drivers/dri/i965/brw_state_batch.c +++ b/src/mesa/drivers/dri/i965/brw_state_batch.c @@ -87,7 +87,7 @@ brw_annotate_aub(struct brw_context *brw) drm_intel_aub_annotation annotations[annotation_count]; int a = 0; make_annotation(&annotations[a++], AUB_TRACE_TYPE_BATCH, 0, - 4*brw->batch.used); + 4 * USED_BATCH(brw->batch)); for (int i = brw->state_batch_count; i-- > 0; ) { uint32_t type = brw->state_batch_list[i].type; uint32_t start_offset = brw->state_batch_list[i].offset; @@ -136,7 +136,7 @@ __brw_state_batch(struct brw_context *brw, * space, then flush and try again. */ if (batch->state_batch_offset < size || - offset < 4*batch->used + batch->reserved_space) { + offset < 4 * USED_BATCH(*batch) + batch->reserved_space) { intel_batchbuffer_flush(brw); offset = ROUND_DOWN_TO(batch->state_batch_offset - size, alignment); } diff --git a/src/mesa/drivers/dri/i965/brw_urb.c b/src/mesa/drivers/dri/i965/brw_urb.c index 6fcf1b0..8fc06ba 100644 --- a/src/mesa/drivers/dri/i965/brw_urb.c +++ b/src/mesa/drivers/dri/i965/brw_urb.c @@ -249,8 +249,8 @@ void brw_upload_urb_fence(struct brw_context *brw) uf.bits1.cs_fence = brw->urb.size; /* erratum: URB_FENCE must not cross a 64byte cacheline */ - if ((brw->batch.used & 15) > 12) { - int pad = 16 - (brw->batch.used & 15); + if ((USED_BATCH(brw->batch) & 15) > 12) { + int pad = 16 - (USED_BATCH(brw->batch) & 15); do brw->batch.map[brw->batch.used++] = MI_NOOP; while (--pad); diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index f82958f..628a7b7 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -94,7 +94,7 @@ intel_batchbuffer_reset_to_saved(struct brw_context *brw) drm_intel_gem_bo_clear_relocs(brw->batch.bo, brw->b
[Mesa-dev] [PATCH 4/4] i965: Optimize batchbuffer macros.
Previously OUT_BATCH was just a macro around an inline function which does brw->batch.map[brw->batch.used++] = dword; When making consecutive calls to intel_batchbuffer_emit_dword() the compiler isn't able to recognize that we're writing consecutive memory locations or that it doesn't need to write batch.used back to memory each time. We can avoid both of these problems by making a local pointer to the next location in the batch in BEGIN_BATCH(). Cuts 18k from the .text size. text data bss dec hex filename 4946956 19515226192 5168300 4edcac i965_dri.so before 4928956 19515226192 5150300 4e965c i965_dri.so after This series (including commit c0433948) improves performance of Synmark OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge. --- Again, Chris makes an excellent suggestion that in turn improves performance! src/mesa/drivers/dri/i965/brw_context.h | 5 ++- src/mesa/drivers/dri/i965/brw_draw_upload.c | 12 -- src/mesa/drivers/dri/i965/brw_urb.c | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 19 - src/mesa/drivers/dri/i965/intel_batchbuffer.h | 55 ++- src/mesa/drivers/dri/i965/intel_blit.c| 19 - 6 files changed, 70 insertions(+), 42 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 44d1aea..34a49b2 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -873,7 +873,8 @@ struct intel_batchbuffer { #ifdef DEBUG uint16_t emit, total; #endif - uint16_t used, reserved_space; + uint16_t reserved_space; + uint32_t *map_next; uint32_t *map; uint32_t *cpu_map; #define BATCH_SZ (8192*sizeof(uint32_t)) @@ -883,7 +884,7 @@ struct intel_batchbuffer { bool needs_sol_reset; struct { - uint16_t used; + uint32_t *map_next; int reloc_count; } saved; }; diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c index 320e40e..7cb091d 100644 --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c @@ -604,14 +604,15 @@ brw_prepare_shader_draw_parameters(struct brw_context *brw) /** * Emit a VERTEX_BUFFER_STATE entry (part of 3DSTATE_VERTEX_BUFFERS). */ -static void +static uint32_t * emit_vertex_buffer_state(struct brw_context *brw, unsigned buffer_nr, drm_intel_bo *bo, unsigned bo_ending_address, unsigned bo_offset, unsigned stride, - unsigned step_rate) + unsigned step_rate, + uint32_t *__map) { struct gl_context *ctx = &brw->ctx; uint32_t dw0; @@ -643,7 +644,10 @@ emit_vertex_buffer_state(struct brw_context *brw, OUT_BATCH(0); } OUT_BATCH(step_rate); + + return __map; } +#define EMIT_VERTEX_BUFFER_STATE(...) __map = emit_vertex_buffer_state(__VA_ARGS__, __map) static void brw_emit_vertices(struct brw_context *brw) { @@ -704,14 +708,14 @@ static void brw_emit_vertices(struct brw_context *brw) OUT_BATCH((_3DSTATE_VERTEX_BUFFERS << 16) | (4 * nr_buffers - 1)); for (i = 0; i < brw->vb.nr_buffers; i++) { struct brw_vertex_buffer *buffer = &brw->vb.buffers[i]; - emit_vertex_buffer_state(brw, i, buffer->bo, buffer->bo->size - 1, + EMIT_VERTEX_BUFFER_STATE(brw, i, buffer->bo, buffer->bo->size - 1, buffer->offset, buffer->stride, buffer->step_rate); } if (brw->vs.prog_data->uses_vertexid) { - emit_vertex_buffer_state(brw, brw->vb.nr_buffers, + EMIT_VERTEX_BUFFER_STATE(brw, brw->vb.nr_buffers, brw->draw.draw_params_bo, brw->draw.draw_params_bo->size - 1, brw->draw.draw_params_offset, diff --git a/src/mesa/drivers/dri/i965/brw_urb.c b/src/mesa/drivers/dri/i965/brw_urb.c index 8fc06ba..6078c38 100644 --- a/src/mesa/drivers/dri/i965/brw_urb.c +++ b/src/mesa/drivers/dri/i965/brw_urb.c @@ -252,7 +252,7 @@ void brw_upload_urb_fence(struct brw_context *brw) if ((USED_BATCH(brw->batch) & 15) > 12) { int pad = 16 - (USED_BATCH(brw->batch) & 15); do -brw->batch.map[brw->batch.used++] = MI_NOOP; + *brw->batch.map_next++ = MI_NOOP; while (--pad); } diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 628a7b7..088ffd2 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -48,6 +48,7 @@ intel_batchbuffer_init(struct brw_context *brw) if (!brw->has_llc) { brw->batch.cpu_map = malloc(BATCH_SZ); b
Re: [Mesa-dev] [PATCH 4/5] i965: Split batch emission from relocation functions.
On Sat, Jul 11, 2015 at 11:08 AM, Chris Wilson wrote: > On Fri, Jul 10, 2015 at 11:44:58AM -0700, Matt Turner wrote: >> So that everything writing to the batch between BEGIN_BATCH() and >> ADVANCE_BATCH() goes through OUT_BATCH. > > Reviewed-by: Chris Wilson > >> +#define OUT_RELOC64(buf, read_domains, write_domain, delta) do {\ >> + uint64_t reloc64 = intel_batchbuffer_reloc64(brw, buf, read_domains, \ >> +write_domain, delta); \ >> + OUT_BATCH(reloc64); \ >> + OUT_BATCH(reloc64 >> 32);\ >> } while (0) > > Is the compiler smart enough to convert this to movq? It is not, but I'm not totally sure that's better -- an unaligned movq might cross a cache boundary which is apparently pretty expensive. > The pointer emission variant at least makes it trivial to use >*((uint64_t *)map)++ = reloc64 > if so desired. I'll do some performance tests on top of my v3 series. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/5] i965: Optimize batchbuffer macros.
On Sat, Jul 11, 2015 at 11:02 AM, Chris Wilson wrote: > On Fri, Jul 10, 2015 at 11:44:59AM -0700, Matt Turner wrote: >> Previously OUT_BATCH was just a macro around an inline function which >> does >> >>brw->batch.map[brw->batch.used++] = dword; >> >> When making consecutive calls to intel_batchbuffer_emit_dword() the >> compiler isn't able to recognize that we're writing consecutive memory >> locations or that it doesn't need to write batch.used back to memory >> each time. >> >> We can avoid both of these problems by making a local pointer to the >> next location in the batch in BEGIN_BATCH(), indexing it with a local >> variable, and incrementing batch.used once in ADVANCE_BATCH(). >> >> Cuts 18k from the .text size. >> >>text data bss dec hex filename >> 4946956 19515226192 5168300 4edcac i965_dri.so before >> 4928588 19515226192 5149932 4e94ec i965_dri.so after >> >> This series (including commit c0433948) improves performance of Synmark >> OglBatch7 by 3.64514% +/- 0.298131% (n=282) on Ivybridge. >> --- >> That -4.19005% +/- 1.15188% (n=30) regression on Ivybridge is now a >> performance improvement! Thanks Chris for the help! >> >> src/mesa/drivers/dri/i965/intel_batchbuffer.c | 8 ++--- >> src/mesa/drivers/dri/i965/intel_batchbuffer.h | 52 >> +++ >> 2 files changed, 41 insertions(+), 19 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c >> b/src/mesa/drivers/dri/i965/intel_batchbuffer.c >> index f82958f..93f2872 100644 >> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c >> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c >> @@ -395,13 +395,13 @@ _intel_batchbuffer_flush(struct brw_context *brw, >> */ >> uint32_t >> intel_batchbuffer_reloc(struct brw_context *brw, >> -drm_intel_bo *buffer, >> +drm_intel_bo *buffer, uint32_t offset, >> uint32_t read_domains, uint32_t write_domain, >> uint32_t delta) >> { >> int ret; >> >> - ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used, >> + ret = drm_intel_bo_emit_reloc(brw->batch.bo, offset, >>buffer, delta, >>read_domains, write_domain); >> assert(ret == 0); >> @@ -416,11 +416,11 @@ intel_batchbuffer_reloc(struct brw_context *brw, >> >> uint64_t >> intel_batchbuffer_reloc64(struct brw_context *brw, >> - drm_intel_bo *buffer, >> + drm_intel_bo *buffer, uint32_t offset, >>uint32_t read_domains, uint32_t write_domain, >>uint32_t delta) >> { >> - int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used, >> + int ret = drm_intel_bo_emit_reloc(brw->batch.bo, offset, >> buffer, delta, >> read_domains, write_domain); >> assert(ret == 0); >> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h >> b/src/mesa/drivers/dri/i965/intel_batchbuffer.h >> index c0456f3..6342c97 100644 >> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h >> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h >> @@ -59,14 +59,16 @@ void intel_batchbuffer_data(struct brw_context *brw, >> >> uint32_t intel_batchbuffer_reloc(struct brw_context *brw, >> drm_intel_bo *buffer, >> + uint32_t offset, >> uint32_t read_domains, >> uint32_t write_domain, >> - uint32_t offset); >> + uint32_t delta); >> uint64_t intel_batchbuffer_reloc64(struct brw_context *brw, >> drm_intel_bo *buffer, >> + uint32_t offset, >> uint32_t read_domains, >> uint32_t write_domain, >> - uint32_t offset); >> + uint32_t delta); >> static inline uint32_t float_as_int(float f) >> { >> union { >> @@ -160,23 +162,43 @@ intel_batchbuffer_advance(struct brw_context *brw) >> #endif >> } >> >> -#define BEGIN_BATCH(n) intel_batchbuffer_begin(brw, n, RENDER_RING) >> -#define BEGIN_BATCH_BLT(n) intel_batchbuffer_begin(brw, n, BLT_RING) >> -#define OUT_BATCH(d) intel_batchbuffer_emit_dword(brw, d) >> -#define OUT_BATCH_F(f) intel_batchbuffer_emit_float(brw, f) >> -#define OUT_RELOC(buf, read_domains, write_domain, delta) \ >> - OUT_BATCH(intel_batchbuffer_reloc(brw, buf, read_domains, write_domain, \ >> - delta)) >> +#define BEGIN_BATCH(n) do {\ >> + intel_batchbuffer_begin(brw, (n), RENDER_RING); \ >> + uint32_t *__map = &brw->batch.map[brw
[Mesa-dev] [PATCH] i965: Fix 32 bit build warnings in intel_get_yf_ys_bo_size()
Along with fixing the type of pitch parameter, patch also changes the types of few local variables and function return type. Warnings fixed are: intel_mipmap_tree.c:671:7: warning: passing argument 3 of 'intel_get_yf_ys_bo_size' from incompatible pointer type intel_mipmap_tree.c:563:1: note: expected 'uint64_t *' but argument is of type 'long unsigned int *' Reported-by: Kenneth Graunke Signed-off-by: Anuj Phogat --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index fb896a9..1529651 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -559,14 +559,14 @@ intel_lower_compressed_format(struct brw_context *brw, mesa_format format) } /* This function computes Yf/Ys tiled bo size, alignment and pitch. */ -static uint64_t +static unsigned long intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment, -uint64_t *pitch) +unsigned long *pitch) { const uint32_t bpp = mt->cpp * 8; const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1; uint32_t tile_width, tile_height; - uint64_t stride, size, aligned_y; + unsigned long stride, size, aligned_y; assert(mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] [v3] i965: Split out gen8 push constant state upload
On Fri, Jul 10, 2015 at 02:24:42PM -0700, Ben Widawsky wrote: > On Fri, Jul 10, 2015 at 12:03:54PM -0700, Matt Turner wrote: > > I don't think putting Intel-internal links in the commit message is a good > > idea. > > > > Ken's made similar comments to me. > > > > Also, so much off the wall commentary... > > Maybe my definition of "off the wall" is different than yours. The only thing > off the wall to me, was the bit about missing Mark. It was *some* off the wall > commentary. > > That aside though, I think the internal links is a good point and thing to > discuss... I've had a couple of cases already where I, or Neil benefited from > the Jenkins links being there to try to figure out some later regression. I > can > sympathize with not having internal links in the history since it isn't > accessible to anyone. Earlier, I would have fought somewhat strongly for the > links, except that when Mark moved servers he didn't preserve the old links, > so > that made me feel like it's a lot more transient than I initially felt. > > However, I think it's really valuable for us to have them in the patches, > especially for review by some of the internal folks - like isn't it great to > see > for yourself that I ran it? I suppose I can discard the URLs before pushing. > The > cases I mentioned above would have benefited just as well having the links on > the list and not in the commit history (albeit a bit harder to find). Any > opposition to that? It's worth pointing out in this discussion that old results are pruned after so many days. It's simply not practical for us to store all of the results forever. Dylan signature.asc Description: Digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Mark constant static data as const.
--- src/mesa/drivers/dri/i965/brw_curbe.c | 2 +- src/mesa/drivers/dri/i965/brw_draw_upload.c | 44 ++--- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_curbe.c b/src/mesa/drivers/dri/i965/brw_curbe.c index befd7a9..a149ce3 100644 --- a/src/mesa/drivers/dri/i965/brw_curbe.c +++ b/src/mesa/drivers/dri/i965/brw_curbe.c @@ -176,7 +176,7 @@ void brw_upload_cs_urb_state(struct brw_context *brw) ADVANCE_BATCH(); } -static GLfloat fixed_plane[6][4] = { +static const GLfloat fixed_plane[6][4] = { { 0,0, -1, 1 }, { 0,0,1, 1 }, { 0, -1,0, 1 }, diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c index 7cb091d..c95f0c3 100644 --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c @@ -40,7 +40,7 @@ #include "intel_batchbuffer.h" #include "intel_buffer_objects.h" -static GLuint double_types[5] = { +static const GLuint double_types[5] = { 0, BRW_SURFACEFORMAT_R64_FLOAT, BRW_SURFACEFORMAT_R64G64_FLOAT, @@ -48,7 +48,7 @@ static GLuint double_types[5] = { BRW_SURFACEFORMAT_R64G64B64A64_FLOAT }; -static GLuint float_types[5] = { +static const GLuint float_types[5] = { 0, BRW_SURFACEFORMAT_R32_FLOAT, BRW_SURFACEFORMAT_R32G32_FLOAT, @@ -56,7 +56,7 @@ static GLuint float_types[5] = { BRW_SURFACEFORMAT_R32G32B32A32_FLOAT }; -static GLuint half_float_types[5] = { +static const GLuint half_float_types[5] = { 0, BRW_SURFACEFORMAT_R16_FLOAT, BRW_SURFACEFORMAT_R16G16_FLOAT, @@ -64,7 +64,7 @@ static GLuint half_float_types[5] = { BRW_SURFACEFORMAT_R16G16B16A16_FLOAT }; -static GLuint fixed_point_types[5] = { +static const GLuint fixed_point_types[5] = { 0, BRW_SURFACEFORMAT_R32_SFIXED, BRW_SURFACEFORMAT_R32G32_SFIXED, @@ -72,7 +72,7 @@ static GLuint fixed_point_types[5] = { BRW_SURFACEFORMAT_R32G32B32A32_SFIXED, }; -static GLuint uint_types_direct[5] = { +static const GLuint uint_types_direct[5] = { 0, BRW_SURFACEFORMAT_R32_UINT, BRW_SURFACEFORMAT_R32G32_UINT, @@ -80,7 +80,7 @@ static GLuint uint_types_direct[5] = { BRW_SURFACEFORMAT_R32G32B32A32_UINT }; -static GLuint uint_types_norm[5] = { +static const GLuint uint_types_norm[5] = { 0, BRW_SURFACEFORMAT_R32_UNORM, BRW_SURFACEFORMAT_R32G32_UNORM, @@ -88,7 +88,7 @@ static GLuint uint_types_norm[5] = { BRW_SURFACEFORMAT_R32G32B32A32_UNORM }; -static GLuint uint_types_scale[5] = { +static const GLuint uint_types_scale[5] = { 0, BRW_SURFACEFORMAT_R32_USCALED, BRW_SURFACEFORMAT_R32G32_USCALED, @@ -96,7 +96,7 @@ static GLuint uint_types_scale[5] = { BRW_SURFACEFORMAT_R32G32B32A32_USCALED }; -static GLuint int_types_direct[5] = { +static const GLuint int_types_direct[5] = { 0, BRW_SURFACEFORMAT_R32_SINT, BRW_SURFACEFORMAT_R32G32_SINT, @@ -104,7 +104,7 @@ static GLuint int_types_direct[5] = { BRW_SURFACEFORMAT_R32G32B32A32_SINT }; -static GLuint int_types_norm[5] = { +static const GLuint int_types_norm[5] = { 0, BRW_SURFACEFORMAT_R32_SNORM, BRW_SURFACEFORMAT_R32G32_SNORM, @@ -112,7 +112,7 @@ static GLuint int_types_norm[5] = { BRW_SURFACEFORMAT_R32G32B32A32_SNORM }; -static GLuint int_types_scale[5] = { +static const GLuint int_types_scale[5] = { 0, BRW_SURFACEFORMAT_R32_SSCALED, BRW_SURFACEFORMAT_R32G32_SSCALED, @@ -120,7 +120,7 @@ static GLuint int_types_scale[5] = { BRW_SURFACEFORMAT_R32G32B32A32_SSCALED }; -static GLuint ushort_types_direct[5] = { +static const GLuint ushort_types_direct[5] = { 0, BRW_SURFACEFORMAT_R16_UINT, BRW_SURFACEFORMAT_R16G16_UINT, @@ -128,7 +128,7 @@ static GLuint ushort_types_direct[5] = { BRW_SURFACEFORMAT_R16G16B16A16_UINT }; -static GLuint ushort_types_norm[5] = { +static const GLuint ushort_types_norm[5] = { 0, BRW_SURFACEFORMAT_R16_UNORM, BRW_SURFACEFORMAT_R16G16_UNORM, @@ -136,7 +136,7 @@ static GLuint ushort_types_norm[5] = { BRW_SURFACEFORMAT_R16G16B16A16_UNORM }; -static GLuint ushort_types_scale[5] = { +static const GLuint ushort_types_scale[5] = { 0, BRW_SURFACEFORMAT_R16_USCALED, BRW_SURFACEFORMAT_R16G16_USCALED, @@ -144,7 +144,7 @@ static GLuint ushort_types_scale[5] = { BRW_SURFACEFORMAT_R16G16B16A16_USCALED }; -static GLuint short_types_direct[5] = { +static const GLuint short_types_direct[5] = { 0, BRW_SURFACEFORMAT_R16_SINT, BRW_SURFACEFORMAT_R16G16_SINT, @@ -152,7 +152,7 @@ static GLuint short_types_direct[5] = { BRW_SURFACEFORMAT_R16G16B16A16_SINT }; -static GLuint short_types_norm[5] = { +static const GLuint short_types_norm[5] = { 0, BRW_SURFACEFORMAT_R16_SNORM, BRW_SURFACEFORMAT_R16G16_SNORM, @@ -160,7 +160,7 @@ static GLuint short_types_norm[5] = { BRW_SURFACEFORMAT_R16G16B16A16_SNORM }; -static GLuint short_types_scale[5] = { +static const GL
[Mesa-dev] [PATCH 11/13] program: Avoid double promotion.
--- src/mesa/program/prog_execute.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c index 77274e2..2c52d0d 100644 --- a/src/mesa/program/prog_execute.c +++ b/src/mesa/program/prog_execute.c @@ -623,7 +623,7 @@ _mesa_execute_program(struct gl_context * ctx, GLfloat a[4], result[4]; fetch_vector1(&inst->SrcReg[0], machine, a); result[0] = result[1] = result[2] = result[3] - = (GLfloat) cos(a[0]); + = cosf(a[0]); store_vector4(inst, machine, result); } break; @@ -776,7 +776,7 @@ _mesa_execute_program(struct gl_context * ctx, if (inst->SrcReg[0].File != PROGRAM_UNDEFINED) { GLfloat a[4]; fetch_vector1(&inst->SrcReg[0], machine, a); - cond = (a[0] != 0.0); + cond = (a[0] != 0.0F); } else { cond = eval_condition(machine, inst); @@ -834,7 +834,7 @@ _mesa_execute_program(struct gl_context * ctx, val = -FLT_MAX; } else { - val = (float)(log(a[0]) * 1.442695F); + val = logf(a[0]) * 1.442695F; } result[0] = result[1] = result[2] = result[3] = val; store_vector4(inst, machine, result); @@ -853,10 +853,10 @@ _mesa_execute_program(struct gl_context * ctx, result[1] = a[0]; /* XXX we could probably just use pow() here */ if (a[0] > 0.0F) { - if (a[1] == 0.0 && a[3] == 0.0) + if (a[1] == 0.0F && a[3] == 0.0F) result[2] = 1.0F; else - result[2] = (GLfloat) pow(a[1], a[3]); + result[2] = powf(a[1], a[3]); } else { result[2] = 0.0F; @@ -886,12 +886,12 @@ _mesa_execute_program(struct gl_context * ctx, int exponent; GLfloat mantissa = frexpf(t[0], &exponent); q[0] = (GLfloat) (exponent - 1); - q[1] = (GLfloat) (2.0 * mantissa); /* map [.5, 1) -> [1, 2) */ + q[1] = 2.0F * mantissa; /* map [.5, 1) -> [1, 2) */ /* The fast LOG2 macro doesn't meet the precision * requirements. */ - q[2] = (float)(log(t[0]) * 1.442695F); + q[2] = logf(t[0]) * 1.442695F; } } else { @@ -1051,7 +1051,7 @@ _mesa_execute_program(struct gl_context * ctx, fetch_vector1(&inst->SrcReg[0], machine, a); fetch_vector1(&inst->SrcReg[1], machine, b); result[0] = result[1] = result[2] = result[3] - = (GLfloat) pow(a[0], b[0]); + = powf(a[0], b[0]); store_vector4(inst, machine, result); } break; @@ -1095,10 +1095,10 @@ _mesa_execute_program(struct gl_context * ctx, { GLfloat a[4], result[4]; fetch_vector1(&inst->SrcReg[0], machine, a); -result[0] = (GLfloat) cos(a[0]); -result[1] = (GLfloat) sin(a[0]); -result[2] = 0.0;/* undefined! */ -result[3] = 0.0;/* undefined! */ +result[0] = cosf(a[0]); +result[1] = sinf(a[0]); +result[2] = 0.0F;/* undefined! */ +result[3] = 0.0F;/* undefined! */ store_vector4(inst, machine, result); } break; @@ -1161,7 +1161,7 @@ _mesa_execute_program(struct gl_context * ctx, GLfloat a[4], result[4]; fetch_vector1(&inst->SrcReg[0], machine, a); result[0] = result[1] = result[2] = result[3] - = (GLfloat) sin(a[0]); + = sinf(a[0]); store_vector4(inst, machine, result); } break; @@ -1360,7 +1360,7 @@ _mesa_execute_program(struct gl_context * ctx, * zero, we'd probably be fine except for an assert in * IROUND_POS() which gets triggered by the inf values created. */ -if (texcoord[3] != 0.0) { +if (texcoord[3] != 0.0F) { texcoord[0] /= texcoord[3]; texcoord[1] /= texcoord[3]; texcoord[2] /= texcoord[3]; @@ -1380,7 +1380,7 @@ _mesa_execute_program(struct gl_context * ctx, fetch_vector4(&inst->SrcReg[0], machine, texcoord); if (inst->TexSrcTarget != TEXTURE_CUBE_INDEX && -texcoord[3] != 0.0) { +texcoord[3] != 0.0F) { texcoord[0] /= texcoord[3]; texcoord[1] /= texcoord[3]; texcoord[2] /= texcoord[3]; -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.fr
[Mesa-dev] [PATCH 04/13] i965: Use float calculations when double is unnecessary.
Literals without an f/F suffix are of type double, and implicit conversion rules specify that the float in (float op double) be converted to a double before the operation is performed. I believe float execution was intended (in nearly all cases) or is sufficient (in the case of gen7_urb.c). Removes a lot of float <-> double conversion instructions and replaces many double instructions with float instructions which are cheaper. text data bss dec hex filename 4928659 19516026192 5150011 4e953b i965_dri.so before 4928315 19515226192 5149659 4e93db i965_dri.so after --- src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 22 +++--- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c| 4 ++-- src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c | 4 ++-- src/mesa/drivers/dri/i965/brw_misc_state.c | 4 ++-- src/mesa/drivers/dri/i965/brw_sampler_state.c | 4 ++-- src/mesa/drivers/dri/i965/brw_sf_state.c | 9 + src/mesa/drivers/dri/i965/brw_state_cache.c| 2 +- src/mesa/drivers/dri/i965/brw_util.h | 4 ++-- src/mesa/drivers/dri/i965/gen6_multisample_state.c | 4 ++-- src/mesa/drivers/dri/i965/gen6_sf_state.c | 2 +- src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +- src/mesa/drivers/dri/i965/gen7_urb.c | 2 +- src/mesa/drivers/dri/i965/gen8_sf_state.c | 2 +- 14 files changed, 35 insertions(+), 34 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp index 1561b59..205c905 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp @@ -1285,8 +1285,8 @@ brw_blorp_blit_program::translate_dst_to_src() /* Round the float coordinates down to nearest integer */ emit_rndd(Xp_f, X_f); emit_rndd(Yp_f, Y_f); - emit_mul(X_f, Xp_f, brw_imm_f(1 / key->x_scale)); - emit_mul(Y_f, Yp_f, brw_imm_f(1 / key->y_scale)); + emit_mul(X_f, Xp_f, brw_imm_f(1.0f / key->x_scale)); + emit_mul(Y_f, Yp_f, brw_imm_f(1.0f / key->y_scale)); SWAP_XY_AND_XPYP(); } else if (!key->bilinear_filter) { /* Round the float coordinates down to nearest integer by moving to @@ -1442,7 +1442,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned num_samples) for (int j = 0; j < 4; ++j) { emit_mul(offset(texture_data[0], 2*j), offset(vec8(texture_data[0]), 2*j), - brw_imm_f(1.0/num_samples)); + brw_imm_f(1.0f / num_samples)); } } @@ -1475,9 +1475,9 @@ brw_blorp_blit_program::manual_blend_bilinear(unsigned num_samples) /* Compute pixel coordinates */ emit_add(vec16(x_sample_coords), Xp_f, - brw_imm_f((float)(i & 0x1) * (1.0 / key->x_scale))); + brw_imm_f((float)(i & 0x1) * (1.0f / key->x_scale))); emit_add(vec16(y_sample_coords), Yp_f, - brw_imm_f((float)((i >> 1) & 0x1) * (1.0 / key->y_scale))); + brw_imm_f((float)((i >> 1) & 0x1) * (1.0f / key->y_scale))); emit_mov(vec16(X), x_sample_coords); emit_mov(vec16(Y), y_sample_coords); @@ -1789,7 +1789,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, GLfloat src1, * so 0.5 provides the necessary correction. */ multiplier = scale; - offset = src0 + (-dst0 + 0.5) * scale; + offset = src0 + (-dst0 + 0.5f) * scale; } else { /* When mirroring X we need: * src_x - src_x0 = dst_x1 - dst_x - 0.5 @@ -1797,7 +1797,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, GLfloat src1, * src_x = src_x0 + (dst_x1 -dst_x - 0.5) * scale */ multiplier = -scale; - offset = src0 + (dst1 - 0.5) * scale; + offset = src0 + (dst1 - 0.5f) * scale; } } @@ -1952,8 +1952,8 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct brw_context *brw, /* Scaling factors used for bilinear filtering in multisample scaled * blits. */ - wm_prog_key.x_scale = 2.0; - wm_prog_key.y_scale = src_mt->num_samples / 2.0; + wm_prog_key.x_scale = 2.0f; + wm_prog_key.y_scale = src_mt->num_samples / 2.0f; if (filter == GL_LINEAR && src.num_samples <= 1 && dst.num_samples <= 1) wm_prog_key.bilinear_filter = true; @@ -2000,9 +2000,9 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct brw_context *brw, x1 = wm_push_consts.dst_x1 = roundf(dst_x1); y1 = wm_push_consts.dst_y1 = roundf(dst_y1); wm_push_consts.rect_grid_x1 = (minify(src_mt->logical_width0, src_level) * - wm_prog_key.x_scale - 1.0); + wm_prog_key.x_scale - 1.0f); wm_push_consts.rect_grid_y1 = (minify(src_mt->logical_height0, src_level) * - wm_prog_key.y_scale - 1.0); +
[Mesa-dev] [PATCH 06/13] gallium/auxiliary: Avoid double promotion.
--- src/gallium/auxiliary/util/u_format_rgb9e5.h | 2 +- src/gallium/auxiliary/util/u_math.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h b/src/gallium/auxiliary/util/u_format_rgb9e5.h index 7a01f7f..d1ace3f 100644 --- a/src/gallium/auxiliary/util/u_format_rgb9e5.h +++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h @@ -75,7 +75,7 @@ typedef union { static INLINE float rgb9e5_ClampRange(float x) { - if (x > 0.0) { + if (x > 0.0f) { if (x >= MAX_RGB9E5) { return MAX_RGB9E5; } else { diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h index 3b4040f..9c3cb6a 100644 --- a/src/gallium/auxiliary/util/u_math.h +++ b/src/gallium/auxiliary/util/u_math.h @@ -240,7 +240,7 @@ util_iround(float f) static INLINE boolean util_is_approx(float a, float b, float tol) { - return fabs(b - a) <= tol; + return fabsf(b - a) <= tol; } -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/13] gallium/auxiliary: Use exp2(x) instead of pow(2.0, x).
--- src/gallium/auxiliary/util/u_format_rgb9e5.h | 6 +++--- src/gallium/auxiliary/util/u_math.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h b/src/gallium/auxiliary/util/u_format_rgb9e5.h index c2a3f6f..7a01f7f 100644 --- a/src/gallium/auxiliary/util/u_format_rgb9e5.h +++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h @@ -115,8 +115,8 @@ static INLINE unsigned float3_to_rgb9e5(const float rgb[3]) exp_shared = MAX2(-RGB9E5_EXP_BIAS-1, rgb9e5_FloorLog2(maxrgb)) + 1 + RGB9E5_EXP_BIAS; assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP); assert(exp_shared >= 0); - /* This pow function could be replaced by a table. */ - denom = pow(2, exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS); + /* This exp2 function could be replaced by a table. */ + denom = exp2(exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS); maxm = (int) floor(maxrgb / denom + 0.5); if (maxm == MAX_RGB9E5_MANTISSA+1) { @@ -154,7 +154,7 @@ static INLINE void rgb9e5_to_float3(unsigned rgb, float retval[3]) v.raw = rgb; exponent = v.field.biasedexponent - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS; - scale = (float) pow(2, exponent); + scale = exp2f(exponent); retval[0] = v.field.r * scale; retval[1] = v.field.g * scale; diff --git a/src/gallium/auxiliary/util/u_math.c b/src/gallium/auxiliary/util/u_math.c index ae9e951..c58af91 100644 --- a/src/gallium/auxiliary/util/u_math.c +++ b/src/gallium/auxiliary/util/u_math.c @@ -48,7 +48,7 @@ init_pow2_table(void) { int i; for (i = 0; i < POW2_TABLE_SIZE; i++) - pow2_table[i] = (float) pow(2.0, (i - POW2_TABLE_OFFSET) / POW2_TABLE_SCALE); + pow2_table[i] = exp2f((i - POW2_TABLE_OFFSET) / POW2_TABLE_SCALE); } -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/13] program: Use exp2(x) instead of pow(2.0, x).
--- src/mesa/program/prog_execute.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c index 46260b5..77274e2 100644 --- a/src/mesa/program/prog_execute.c +++ b/src/mesa/program/prog_execute.c @@ -723,7 +723,7 @@ _mesa_execute_program(struct gl_context * ctx, * result.z = result.x * APPX(result.y) * We do what the ARB extension says. */ - q[2] = (GLfloat) pow(2.0, t[0]); + q[2] = exp2f(t[0]); } q[1] = t[0] - floor_t0; q[3] = 1.0F; @@ -734,7 +734,7 @@ _mesa_execute_program(struct gl_context * ctx, { GLfloat a[4], result[4], val; fetch_vector1(&inst->SrcReg[0], machine, a); -val = (GLfloat) pow(2.0, a[0]); +val = exp2f(a[0]); /* if (IS_INF_OR_NAN(val)) val = 1.0e10; -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/13] util: Avoid double promition.
--- src/util/register_allocate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c index 2ad8c3c..95be20f 100644 --- a/src/util/register_allocate.c +++ b/src/util/register_allocate.c @@ -648,7 +648,7 @@ ra_get_best_spill_node(struct ra_graph *g) float cost = g->nodes[n].spill_cost; float benefit; - if (cost <= 0.0) + if (cost <= 0.0f) continue; if (g->nodes[n].in_stack) -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/13] vbo: Avoid double promotion.
--- src/mesa/vbo/vbo_context.c| 6 +++--- src/mesa/vbo/vbo_exec_array.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c index fd1ffe2..e3eb286 100644 --- a/src/mesa/vbo/vbo_context.c +++ b/src/mesa/vbo/vbo_context.c @@ -37,9 +37,9 @@ static GLuint check_size( const GLfloat *attr ) { - if (attr[3] != 1.0) return 4; - if (attr[2] != 0.0) return 3; - if (attr[1] != 0.0) return 2; + if (attr[3] != 1.0F) return 4; + if (attr[2] != 0.0F) return 3; + if (attr[1] != 0.0F) return 2; return 1; } diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c index 72b8206..b73aa97 100644 --- a/src/mesa/vbo/vbo_exec_array.c +++ b/src/mesa/vbo/vbo_exec_array.c @@ -255,7 +255,7 @@ check_array_data(struct gl_context *ctx, struct gl_client_array *array, GLint k; for (k = 0; k < array->Size; k++) { if (IS_INF_OR_NAN(f[k]) || - f[k] >= 1.0e20 || f[k] <= -1.0e10) { + f[k] >= 1.0e20F || f[k] <= -1.0e10F) { printf("Bad array data:\n"); printf(" Element[%u].%u = %f\n", j, k, f[k]); printf(" Array %u at %p\n", attrib, (void* ) array); @@ -263,7 +263,7 @@ check_array_data(struct gl_context *ctx, struct gl_client_array *array, array->Type, array->Size, array->Stride); printf(" Address/offset %p in Buffer Object %u\n", array->Ptr, array->BufferObj->Name); - f[k] = 1.0; /* XXX replace the bad value! */ + f[k] = 1.0F; /* XXX replace the bad value! */ } /*assert(!IS_INF_OR_NAN(f[k]));*/ } -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/13] nir: Avoid double promition.
--- src/glsl/nir/nir_opcodes.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py index 56e96d9..df5b7e2 100644 --- a/src/glsl/nir/nir_opcodes.py +++ b/src/glsl/nir/nir_opcodes.py @@ -474,10 +474,10 @@ else """) opcode("ldexp", 0, tfloat, [0, 0], [tfloat, tint], "", """ -dst = ldexp(src0, src1); +dst = ldexpf(src0, src1); /* flush denormals to zero. */ if (!isnormal(dst)) - dst = copysign(0.0f, src0); + dst = copysignf(0.0f, src0); """) # Combines the first component of each input to make a 2-component vector. -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/13] swrast: Avoid double promotion.
--- src/mesa/swrast/s_aaline.c| 28 ++-- src/mesa/swrast/s_aalinetemp.h| 4 ++-- src/mesa/swrast/s_atifragshader.c | 4 ++-- src/mesa/swrast/s_copypix.c | 6 +++--- src/mesa/swrast/s_drawpix.c | 12 ++-- src/mesa/swrast/s_fragprog.c | 4 ++-- src/mesa/swrast/s_lines.c | 4 ++-- src/mesa/swrast/s_points.c| 10 +- src/mesa/swrast/s_span.c | 10 +- src/mesa/swrast/s_texcombine.c| 6 +++--- src/mesa/swrast/s_texfilter.c | 8 src/mesa/swrast/s_tritemp.h | 2 +- src/mesa/swrast/s_zoom.c | 2 +- 13 files changed, 50 insertions(+), 50 deletions(-) diff --git a/src/mesa/swrast/s_aaline.c b/src/mesa/swrast/s_aaline.c index f3258e8..de5b42b 100644 --- a/src/mesa/swrast/s_aaline.c +++ b/src/mesa/swrast/s_aaline.c @@ -116,11 +116,11 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat y1, const GLfloat b = pz * py; const GLfloat c = px * px + py * py; const GLfloat d = -(a * x0 + b * y0 + c * z0); - if (a == 0.0 && b == 0.0 && c == 0.0 && d == 0.0) { - plane[0] = 0.0; - plane[1] = 0.0; - plane[2] = 1.0; - plane[3] = 0.0; + if (a == 0.0F && b == 0.0F && c == 0.0F && d == 0.0F) { + plane[0] = 0.0F; + plane[1] = 0.0F; + plane[2] = 1.0F; + plane[3] = 0.0F; } else { plane[0] = a; @@ -135,9 +135,9 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat y1, static inline void constant_plane(GLfloat value, GLfloat plane[4]) { - plane[0] = 0.0; - plane[1] = 0.0; - plane[2] = -1.0; + plane[0] = 0.0F; + plane[1] = 0.0F; + plane[2] = -1.0F; plane[3] = value; } @@ -160,8 +160,8 @@ static inline GLfloat solve_plane_recip(GLfloat x, GLfloat y, const GLfloat plane[4]) { const GLfloat denom = plane[3] + plane[0] * x + plane[1] * y; - if (denom == 0.0) - return 0.0; + if (denom == 0.0F) + return 0.0F; else return -plane[2] / denom; } @@ -374,7 +374,7 @@ segment(struct gl_context *ctx, if (x0 < x1) { xLeft = x0 - line->halfWidth; xRight = x1 + line->halfWidth; - if (line->dy >= 0.0) { + if (line->dy >= 0.0F) { yBot = y0 - 3.0F * line->halfWidth; yTop = y0 + line->halfWidth; } @@ -386,7 +386,7 @@ segment(struct gl_context *ctx, else { xLeft = x1 - line->halfWidth; xRight = x0 + line->halfWidth; - if (line->dy <= 0.0) { + if (line->dy <= 0.0F) { yBot = y1 - 3.0F * line->halfWidth; yTop = y1 + line->halfWidth; } @@ -420,7 +420,7 @@ segment(struct gl_context *ctx, if (y0 < y1) { yBot = y0 - line->halfWidth; yTop = y1 + line->halfWidth; - if (line->dx >= 0.0) { + if (line->dx >= 0.0F) { xLeft = x0 - 3.0F * line->halfWidth; xRight = x0 + line->halfWidth; } @@ -432,7 +432,7 @@ segment(struct gl_context *ctx, else { yBot = y1 - line->halfWidth; yTop = y0 + line->halfWidth; - if (line->dx <= 0.0) { + if (line->dx <= 0.0F) { xLeft = x1 - 3.0F * line->halfWidth; xRight = x1 + line->halfWidth; } diff --git a/src/mesa/swrast/s_aalinetemp.h b/src/mesa/swrast/s_aalinetemp.h index f1d078f..bebb131 100644 --- a/src/mesa/swrast/s_aalinetemp.h +++ b/src/mesa/swrast/s_aalinetemp.h @@ -44,7 +44,7 @@ NAME(plot)(struct gl_context *ctx, struct LineInfo *line, int ix, int iy) (void) swrast; - if (coverage == 0.0) + if (coverage == 0.0F) return; line->span.end++; @@ -123,7 +123,7 @@ NAME(line)(struct gl_context *ctx, const SWvertex *v0, const SWvertex *v1) ctx->Const.MinLineWidthAA, ctx->Const.MaxLineWidthAA); - if (line.len == 0.0 || IS_INF_OR_NAN(line.len)) + if (line.len == 0.0F || IS_INF_OR_NAN(line.len)) return; INIT_SPAN(line.span, GL_LINE); diff --git a/src/mesa/swrast/s_atifragshader.c b/src/mesa/swrast/s_atifragshader.c index 9e029db..2974dee 100644 --- a/src/mesa/swrast/s_atifragshader.c +++ b/src/mesa/swrast/s_atifragshader.c @@ -436,13 +436,13 @@ execute_shader(struct gl_context *ctx, const struct ati_fragment_shader *shader, for (i = 0; i < 3; i++) { dst[optype][i] = (src[optype][2][i] > - 0.5) ? src[optype][0][i] : src[optype][1][i]; + 0.5F) ? src[optype][0][i] : src[optype][1][i]; } } else { dst[optype][3] = (src[optype][2][3] > -0.5) ? src[optype][0][3] : src[optype][1][3]; +0.5F) ? src[optype][0][3] : src[optype][1][3]; } break; diff --g
[Mesa-dev] [PATCH 12/13] mesa/math: Avoid double promotion.
--- src/mesa/math/m_clip_tmp.h | 20 ++--- src/mesa/math/m_matrix.c | 70 +++--- src/mesa/math/m_norm_tmp.h | 2 +- 3 files changed, 46 insertions(+), 46 deletions(-) diff --git a/src/mesa/math/m_clip_tmp.h b/src/mesa/math/m_clip_tmp.h index e289be7..60c0004 100644 --- a/src/mesa/math/m_clip_tmp.h +++ b/src/mesa/math/m_clip_tmp.h @@ -194,13 +194,13 @@ static GLvector4f * TAG(cliptest_points3)( GLvector4f *clip_vec, STRIDE_LOOP { const GLfloat cx = from[0], cy = from[1], cz = from[2]; GLubyte mask = 0; - if (cx > 1.0) mask |= CLIP_RIGHT_BIT; - else if (cx < -1.0) mask |= CLIP_LEFT_BIT; - if (cy > 1.0) mask |= CLIP_TOP_BIT; - else if (cy < -1.0) mask |= CLIP_BOTTOM_BIT; + if (cx > 1.0F) mask |= CLIP_RIGHT_BIT; + else if (cx < -1.0F) mask |= CLIP_LEFT_BIT; + if (cy > 1.0F) mask |= CLIP_TOP_BIT; + else if (cy < -1.0F) mask |= CLIP_BOTTOM_BIT; if (viewport_z_clip) { -if (cz > 1.0) mask |= CLIP_FAR_BIT; -else if (cz < -1.0) mask |= CLIP_NEAR_BIT; +if (cz > 1.0F) mask |= CLIP_FAR_BIT; +else if (cz < -1.0F) mask |= CLIP_NEAR_BIT; } clipMask[i] = mask; tmpOrMask |= mask; @@ -230,10 +230,10 @@ static GLvector4f * TAG(cliptest_points2)( GLvector4f *clip_vec, STRIDE_LOOP { const GLfloat cx = from[0], cy = from[1]; GLubyte mask = 0; - if (cx > 1.0) mask |= CLIP_RIGHT_BIT; - else if (cx < -1.0) mask |= CLIP_LEFT_BIT; - if (cy > 1.0) mask |= CLIP_TOP_BIT; - else if (cy < -1.0) mask |= CLIP_BOTTOM_BIT; + if (cx > 1.0F) mask |= CLIP_RIGHT_BIT; + else if (cx < -1.0F) mask |= CLIP_LEFT_BIT; + if (cy > 1.0F) mask |= CLIP_TOP_BIT; + else if (cy < -1.0F) mask |= CLIP_BOTTOM_BIT; clipMask[i] = mask; tmpOrMask |= mask; tmpAndMask &= mask; diff --git a/src/mesa/math/m_matrix.c b/src/mesa/math/m_matrix.c index 6a42c6c..6522200 100644 --- a/src/mesa/math/m_matrix.c +++ b/src/mesa/math/m_matrix.c @@ -380,7 +380,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat ) if (fabsf(r3[0])>fabsf(r2[0])) SWAP_ROWS(r3, r2); if (fabsf(r2[0])>fabsf(r1[0])) SWAP_ROWS(r2, r1); if (fabsf(r1[0])>fabsf(r0[0])) SWAP_ROWS(r1, r0); - if (0.0 == r0[0]) return GL_FALSE; + if (0.0F == r0[0]) return GL_FALSE; /* eliminate first variable */ m1 = r1[0]/r0[0]; m2 = r2[0]/r0[0]; m3 = r3[0]/r0[0]; @@ -388,31 +388,31 @@ static GLboolean invert_matrix_general( GLmatrix *mat ) s = r0[2]; r1[2] -= m1 * s; r2[2] -= m2 * s; r3[2] -= m3 * s; s = r0[3]; r1[3] -= m1 * s; r2[3] -= m2 * s; r3[3] -= m3 * s; s = r0[4]; - if (s != 0.0) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; } + if (s != 0.0F) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; } s = r0[5]; - if (s != 0.0) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; } + if (s != 0.0F) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; } s = r0[6]; - if (s != 0.0) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; } + if (s != 0.0F) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; } s = r0[7]; - if (s != 0.0) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; } + if (s != 0.0F) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; } /* choose pivot - or die */ if (fabsf(r3[1])>fabsf(r2[1])) SWAP_ROWS(r3, r2); if (fabsf(r2[1])>fabsf(r1[1])) SWAP_ROWS(r2, r1); - if (0.0 == r1[1]) return GL_FALSE; + if (0.0F == r1[1]) return GL_FALSE; /* eliminate second variable */ m2 = r2[1]/r1[1]; m3 = r3[1]/r1[1]; r2[2] -= m2 * r1[2]; r3[2] -= m3 * r1[2]; r2[3] -= m2 * r1[3]; r3[3] -= m3 * r1[3]; - s = r1[4]; if (0.0 != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; } - s = r1[5]; if (0.0 != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; } - s = r1[6]; if (0.0 != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; } - s = r1[7]; if (0.0 != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; } + s = r1[4]; if (0.0F != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; } + s = r1[5]; if (0.0F != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; } + s = r1[6]; if (0.0F != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; } + s = r1[7]; if (0.0F != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; } /* choose pivot - or die */ if (fabsf(r3[2])>fabsf(r2[2])) SWAP_ROWS(r3, r2); - if (0.0 == r2[2]) return GL_FALSE; + if (0.0F == r2[2]) return GL_FALSE; /* eliminate third variable */ m3 = r3[2]/r2[2]; @@ -421,7 +421,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat ) r3[7] -= m3 * r2[7]; /* last check */ - if (0.0 == r3[3]) return GL_FALSE; + if (0.0F == r3[3]) return GL_FALSE; s = 1.0F/r3[3]; /* now back substitute row 3 */ r3[4] *= s; r3[5] *= s; r3[6] *= s; r3[7] *= s; @@ -490,26 +490,26 @@ static GLboolean invert_matrix_3d_general( GLmatrix *mat ) */ pos = neg = 0.0;
[Mesa-dev] [PATCH 09/13] tnl: Avoid double promotion.
There are a couple of unrelated changes in t_vb_lighttmp.h that I hope you'll excuse -- there's a block of code that's duplicated modulo a few trivial differences that I took the liberty of fixing. --- src/mesa/tnl/t_draw.c | 2 +- src/mesa/tnl/t_rasterpos.c | 6 +++--- src/mesa/tnl/t_vb_fog.c | 6 +++--- src/mesa/tnl/t_vb_light.c | 16 src/mesa/tnl/t_vb_lighttmp.h| 16 +++- src/mesa/tnl/t_vb_normals.c | 4 ++-- src/mesa/tnl/t_vertex_generic.c | 2 +- 7 files changed, 25 insertions(+), 27 deletions(-) diff --git a/src/mesa/tnl/t_draw.c b/src/mesa/tnl/t_draw.c index 6adf1dc..713c6a9 100644 --- a/src/mesa/tnl/t_draw.c +++ b/src/mesa/tnl/t_draw.c @@ -257,7 +257,7 @@ static GLboolean *_tnl_import_edgeflag( struct gl_context *ctx, GLuint i; for (i = 0; i < count; i++) { - *bptr++ = ((GLfloat *)ptr)[0] == 1.0; + *bptr++ = ((GLfloat *)ptr)[0] == 1.0F; ptr += stride; } diff --git a/src/mesa/tnl/t_rasterpos.c b/src/mesa/tnl/t_rasterpos.c index 7ef50ea..4bd9ac8 100644 --- a/src/mesa/tnl/t_rasterpos.c +++ b/src/mesa/tnl/t_rasterpos.c @@ -148,7 +148,7 @@ shade_rastpos(struct gl_context *ctx, SUB_3V(VP, light->_Position, vertex); /* d = length(VP) */ d = (GLfloat) LEN_3FV( VP ); -if (d > 1.0e-6) { +if (d > 1.0e-6F) { /* normalize VP */ GLfloat invd = 1.0F / d; SELF_SCALE_SCALAR_3V(VP, invd); @@ -172,7 +172,7 @@ shade_rastpos(struct gl_context *ctx, } } - if (attenuation < 1e-3) + if (attenuation < 1e-3F) continue; n_dot_VP = DOT3( normal, VP ); @@ -219,7 +219,7 @@ shade_rastpos(struct gl_context *ctx, shine = ctx->Light.Material.Attrib[MAT_ATTRIB_FRONT_SHININESS][0]; spec_coef = powf(n_dot_h, shine); - if (spec_coef > 1.0e-10) { + if (spec_coef > 1.0e-10F) { if (ctx->Light.Model.ColorControl==GL_SEPARATE_SPECULAR_COLOR) { ACC_SCALE_SCALAR_3V( specularContrib, spec_coef, light->_MatSpecular[0]); diff --git a/src/mesa/tnl/t_vb_fog.c b/src/mesa/tnl/t_vb_fog.c index 1ca72f8..5489ed6 100644 --- a/src/mesa/tnl/t_vb_fog.c +++ b/src/mesa/tnl/t_vb_fog.c @@ -45,8 +45,8 @@ struct fog_stage_data { #define FOG_STAGE_DATA(stage) ((struct fog_stage_data *)stage->privatePtr) #define FOG_EXP_TABLE_SIZE 256 -#define FOG_MAX (10.0) -#define EXP_FOG_MAX .0006595 +#define FOG_MAX (10.0F) +#define EXP_FOG_MAX .0006595F #define FOG_INCR (FOG_MAX/FOG_EXP_TABLE_SIZE) static GLfloat exp_table[FOG_EXP_TABLE_SIZE]; static GLfloat inited = 0; @@ -54,7 +54,7 @@ static GLfloat inited = 0; #if 1 #define NEG_EXP( result, narg ) \ do { \ - GLfloat f = (GLfloat) (narg * (1.0/FOG_INCR)); \ + GLfloat f = (GLfloat) (narg * (1.0F / FOG_INCR)); \ GLint k = (GLint) f; \ if (k > FOG_EXP_TABLE_SIZE-2) \ result = (GLfloat) EXP_FOG_MAX; \ diff --git a/src/mesa/tnl/t_vb_light.c b/src/mesa/tnl/t_vb_light.c index dbd57fa..df9073e 100644 --- a/src/mesa/tnl/t_vb_light.c +++ b/src/mesa/tnl/t_vb_light.c @@ -137,23 +137,23 @@ validate_shine_table( struct gl_context *ctx, GLuint side, GLfloat shininess ) break; m = s->tab; - m[0] = 0.0; - if (shininess == 0.0) { + m[0] = 0.0F; + if (shininess == 0.0F) { for (j = 1 ; j <= SHINE_TABLE_SIZE ; j++) - m[j] = 1.0; + m[j] = 1.0F; } else { for (j = 1 ; j < SHINE_TABLE_SIZE ; j++) { GLdouble t, x = j / (GLfloat) (SHINE_TABLE_SIZE - 1); -if (x < 0.005) /* underflow check */ - x = 0.005; +if (x < 0.005F) /* underflow check */ + x = 0.005F; t = pow(x, shininess); - if (t > 1e-20) + if (t > 1e-20F) m[j] = (GLfloat) t; else - m[j] = 0.0; + m[j] = 0.0F; } -m[SHINE_TABLE_SIZE] = 1.0; +m[SHINE_TABLE_SIZE] = 1.0F; } s->shininess = shininess; diff --git a/src/mesa/tnl/t_vb_lighttmp.h b/src/mesa/tnl/t_vb_lighttmp.h index f8786ac..3aebcd4 100644 --- a/src/mesa/tnl/t_vb_lighttmp.h +++ b/src/mesa/tnl/t_vb_lighttmp.h @@ -112,7 +112,7 @@ static void TAG(light_rgba_spec)( struct gl_context *ctx, GLint side; GLfloat contrib[3]; GLfloat attenuation; -GLfloat VP[3]; /* unit vector from vertex to light */ +GLfloat VP[3]; /* unit vector from vertex to light */ GLfloat n_dot_VP; /* n dot VP */ GLfloat *h; @@ -129,7 +129,7 @@ static void TAG(lig
[Mesa-dev] [PATCH 00/13] Avoid double promotion
In the process of looking at the assembly generated from the OUT_BATCH series, I noticed a bunch of float <-> double round trips. Mostly this arises because unsuffixed floating-point literals are of type double, and (float op double) is a double operation in which the float is implicitly converted to double. This leads to some awful code for simple things like float x; ... x += 1.0; x is converted to double, 1.0 is added, and then x is converted back to a float. :( Most of the series is just adding f/F to floating-point literals (I tried to use the suffix the surrounding code used) but there are also some changes like s/sin/sinf/. gcc has a warning flag (-Wdouble-promotion) that identifies places where floats are implicitly cast to doubles (that I only found after I found and fixed a bunch of things by looking at assembly) but it's not really useful in general because float arguments are always cast to double when passed as arguments to varargs functions like printf (why?), and it warns about that, generating a lot of noise. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/13] mesa: Use floats for viewport bounds.
ARB_viewport_array specifies that DEPTH_RANGE consists of double- precision parameters (corresponding commit d4dc35987), and a preparatory commit (6340e609a) added _mesa_get_viewport_xform() which returned double-precision scale[3] and translate[3] vectors, even though X, Y, Width, and Height were still floats. All users of _mesa_get_viewport_xform() immediately convert the double scale and translation vectors into floats (which were floats originally, but were converted to doubles in _mesa_get_viewport_xform(), sigh). i965 at least cannot consume doubles (see SF_CLIP_VIEWPORT). If we want to pass doubles to hardware, we should have a different function that does that. --- src/mesa/drivers/dri/i915/i915_state.c | 2 +- src/mesa/drivers/dri/i965/brw_sf_state.c| 2 +- src/mesa/drivers/dri/i965/gen6_viewport_state.c | 2 +- src/mesa/drivers/dri/i965/gen7_viewport_state.c | 2 +- src/mesa/drivers/dri/i965/gen8_viewport_state.c | 2 +- src/mesa/drivers/dri/r200/r200_state.c | 2 +- src/mesa/drivers/dri/radeon/radeon_state.c | 2 +- src/mesa/main/viewport.c| 14 +++--- src/mesa/main/viewport.h| 2 +- src/mesa/math/m_matrix.c| 4 ++-- src/mesa/math/m_matrix.h| 4 ++-- src/mesa/state_tracker/st_atom_viewport.c | 2 +- src/mesa/tnl/t_context.c| 2 +- src/mesa/tnl/t_rasterpos.c | 2 +- 14 files changed, 22 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i915/i915_state.c b/src/mesa/drivers/dri/i915/i915_state.c index 5f10b84..4c83073 100644 --- a/src/mesa/drivers/dri/i915/i915_state.c +++ b/src/mesa/drivers/dri/i915/i915_state.c @@ -402,7 +402,7 @@ void intelCalcViewport(struct gl_context * ctx) { struct intel_context *intel = intel_context(ctx); - double scale[3], translate[3]; + float scale[3], translate[3]; _mesa_get_viewport_xform(ctx, 0, scale, translate); diff --git a/src/mesa/drivers/dri/i965/brw_sf_state.c b/src/mesa/drivers/dri/i965/brw_sf_state.c index 5d98922..3be6e4a 100644 --- a/src/mesa/drivers/dri/i965/brw_sf_state.c +++ b/src/mesa/drivers/dri/i965/brw_sf_state.c @@ -45,7 +45,7 @@ static void upload_sf_vp(struct brw_context *brw) struct gl_context *ctx = &brw->ctx; struct brw_sf_viewport *sfv; GLfloat y_scale, y_bias; - double scale[3], translate[3]; + float scale[3], translate[3]; const bool render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer); sfv = brw_state_batch(brw, AUB_TRACE_SF_VP_STATE, diff --git a/src/mesa/drivers/dri/i965/gen6_viewport_state.c b/src/mesa/drivers/dri/i965/gen6_viewport_state.c index 7c8d884..11b9a36 100644 --- a/src/mesa/drivers/dri/i965/gen6_viewport_state.c +++ b/src/mesa/drivers/dri/i965/gen6_viewport_state.c @@ -101,7 +101,7 @@ gen6_upload_sf_vp(struct brw_context *brw) } for (unsigned i = 0; i < ctx->Const.MaxViewports; i++) { - double scale[3], translate[3]; + float scale[3], translate[3]; /* _NEW_VIEWPORT */ _mesa_get_viewport_xform(ctx, i, scale, translate); diff --git a/src/mesa/drivers/dri/i965/gen7_viewport_state.c b/src/mesa/drivers/dri/i965/gen7_viewport_state.c index b655205..c75dc99 100644 --- a/src/mesa/drivers/dri/i965/gen7_viewport_state.c +++ b/src/mesa/drivers/dri/i965/gen7_viewport_state.c @@ -53,7 +53,7 @@ gen7_upload_sf_clip_viewport(struct brw_context *brw) } for (unsigned i = 0; i < ctx->Const.MaxViewports; i++) { - double scale[3], translate[3]; + float scale[3], translate[3]; _mesa_get_viewport_xform(ctx, i, scale, translate); /* According to the "Vertex X,Y Clamping and Quantization" section of diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c b/src/mesa/drivers/dri/i965/gen8_viewport_state.c index 2d8eeb1..2692ad5 100644 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c @@ -53,7 +53,7 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw) } for (unsigned i = 0; i < ctx->Const.MaxViewports; i++) { - double scale[3], translate[3]; + float scale[3], translate[3]; _mesa_get_viewport_xform(ctx, i, scale, translate); /* _NEW_VIEWPORT: Viewport Matrix Elements */ diff --git a/src/mesa/drivers/dri/r200/r200_state.c b/src/mesa/drivers/dri/r200/r200_state.c index 6fe70b5..68485dd 100644 --- a/src/mesa/drivers/dri/r200/r200_state.c +++ b/src/mesa/drivers/dri/r200/r200_state.c @@ -1546,7 +1546,7 @@ void r200UpdateWindow( struct gl_context *ctx ) GLfloat xoffset = 0; GLfloat yoffset = dPriv ? (GLfloat) dPriv->h : 0; const GLboolean render_to_fbo = (ctx->DrawBuffer ? _mesa_is_user_fbo(ctx->DrawBuffer) : 0); - double scale[3], translate[3]; + float scale[3], translate[3]; GLfloat y_scale, y_bias; if (render_to_fbo) { diff --git a/src/mesa/drivers/dri/radeon/radeon_state.c b/src/mesa/drivers/dri/
[Mesa-dev] [PATCH 13/13] mesa: Avoid double promotion.
--- src/mesa/main/ffvertex_prog.c | 10 +- src/mesa/main/fog.c | 2 +- src/mesa/main/get.c | 2 +- src/mesa/main/light.c | 30 +++--- src/mesa/main/lines.c | 4 ++-- src/mesa/main/multisample.c | 4 ++-- src/mesa/main/pack.c | 14 +++--- src/mesa/main/pixel.c | 4 ++-- src/mesa/main/pixeltransfer.c | 8 src/mesa/main/points.c| 8 src/mesa/main/readpix.c | 4 ++-- src/mesa/main/samplerobj.c| 2 +- src/mesa/main/texparam.c | 2 +- src/mesa/swrast_setup/ss_tritmp.h | 4 ++-- 14 files changed, 49 insertions(+), 49 deletions(-) diff --git a/src/mesa/main/ffvertex_prog.c b/src/mesa/main/ffvertex_prog.c index 70adaf8..95b428d 100644 --- a/src/mesa/main/ffvertex_prog.c +++ b/src/mesa/main/ffvertex_prog.c @@ -189,15 +189,15 @@ static void make_state_key( struct gl_context *ctx, struct state_key *key ) if (light->Enabled) { key->unit[i].light_enabled = 1; - if (light->EyePosition[3] == 0.0) + if (light->EyePosition[3] == 0.0F) key->unit[i].light_eyepos3_is_zero = 1; - if (light->SpotCutoff == 180.0) + if (light->SpotCutoff == 180.0F) key->unit[i].light_spotcutoff_is_180 = 1; - if (light->ConstantAttenuation != 1.0 || - light->LinearAttenuation != 0.0 || - light->QuadraticAttenuation != 0.0) + if (light->ConstantAttenuation != 1.0F || + light->LinearAttenuation != 0.0F || + light->QuadraticAttenuation != 0.0F) key->unit[i].light_attenuated = 1; } } diff --git a/src/mesa/main/fog.c b/src/mesa/main/fog.c index 3bce289..45f343d 100644 --- a/src/mesa/main/fog.c +++ b/src/mesa/main/fog.c @@ -115,7 +115,7 @@ _mesa_Fogfv( GLenum pname, const GLfloat *params ) ctx->Fog.Mode = m; break; case GL_FOG_DENSITY: -if (*params<0.0) { +if (*params<0.0F) { _mesa_error( ctx, GL_INVALID_VALUE, "glFog" ); return; } diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 3d6d639..785a9b5 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -626,7 +626,7 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu break; case GL_EDGE_FLAG: - v->value_bool = ctx->Current.Attrib[VERT_ATTRIB_EDGEFLAG][0] == 1.0; + v->value_bool = ctx->Current.Attrib[VERT_ATTRIB_EDGEFLAG][0] == 1.0F; break; case GL_READ_BUFFER: diff --git a/src/mesa/main/light.c b/src/mesa/main/light.c index 4021dbe..fe2ce8c 100644 --- a/src/mesa/main/light.c +++ b/src/mesa/main/light.c @@ -143,7 +143,7 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum pname, const GLfloat *pa COPY_3V(light->SpotDirection, params); break; case GL_SPOT_EXPONENT: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); assert(params[0] <= ctx->Const.MaxSpotExponent); if (light->SpotExponent == params[0]) return; @@ -151,12 +151,12 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum pname, const GLfloat *pa light->SpotExponent = params[0]; break; case GL_SPOT_CUTOFF: - assert(params[0] == 180.0 || (params[0] >= 0.0 && params[0] <= 90.0)); + assert(params[0] == 180.0F || (params[0] >= 0.0F && params[0] <= 90.0F)); if (light->SpotCutoff == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); light->SpotCutoff = params[0]; - light->_CosCutoff = (GLfloat) (cos(light->SpotCutoff * M_PI / 180.0)); + light->_CosCutoff = (cosf(light->SpotCutoff * M_PI / 180.0)); if (light->_CosCutoff < 0) light->_CosCutoff = 0; if (light->SpotCutoff != 180.0F) @@ -165,21 +165,21 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum pname, const GLfloat *pa light->_Flags &= ~LIGHT_SPOT; break; case GL_CONSTANT_ATTENUATION: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); if (light->ConstantAttenuation == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); light->ConstantAttenuation = params[0]; break; case GL_LINEAR_ATTENUATION: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); if (light->LinearAttenuation == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); light->LinearAttenuation = params[0]; break; case GL_QUADRATIC_ATTENUATION: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); if (light->QuadraticAttenuation == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); @@ -238,31 +238,31 @@ _mesa_Lightfv( GLenum light, GLenum pname, const GLfloat *params ) params = temp; break; case GL_SPOT_EXPONENT: - if (params[0]
Re: [Mesa-dev] [PATCH 61/78] i965/nir/vec4: Add skeleton implementation of nir_emit_texture()
On Mon, Jul 13, 2015 at 9:09 AM, Alejandro Piñeiro wrote: > > > On 13/07/15 16:26, Jason Ekstrand wrote: >> >> >> >> > > > Go ahead and make it a cpp file if you have to. >> > > >> > > Ok, thanks. >> > > >> > > > Another option would >> > > > be to do the map the other way: ir_opcode -> nir_tex_op and refactor >> > > > the code to use the nir_tex_op. >> > > >> > > But even if we do the map in the other way, the helper function >> would be >> > > using the enum type defined at glsl/ir.h, so we would need to make >> it a >> > > cpp file in any case. So for now, I will keep things simple, and keep >> > > the current mapping, and do the change later. >> > >> > The difference is that, with the patch I sent a few hours ago, you >> only need to do the conversion one place so there's no need for the >> helper. >> >> For whatever it's worth I'm also OK with just duplicating the 10/lines >> of code to do the conversion and not bothering with the helper. >> > > I just realized that today Francisco Jerez mentioned that is working to > remove the ir->nir texture op mapping on FS, replacing it with backend > instruction opcode: > http://lists.freedesktop.org/archives/mesa-dev/2015-July/088696.html > > So probably it would be better to wait for his change, and avoid the > need of that mapping on the vec4 path too. That would remove the need of > the helper function and the cpp renaming. > > That sounds ok or should we go on independently of Francisco's work? I think it's probably best to just move foward with no assumptions of code-sharing between vec4 and FS. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/13] gallium/auxiliary: Use exp2(x) instead of pow(2.0, x).
Did you replace 2 of them by exp2f but one by exp2f on purpose? I don't think we can use exp2/exp2f in gallium. This requires msvc 2013 (all of exp2, exp2f, powf are c99, powf is supported by older msvc but the others are not). I guess though could throw some wrappers into c99_math.h. Roland Am 14.07.2015 um 01:22 schrieb Matt Turner: > --- > src/gallium/auxiliary/util/u_format_rgb9e5.h | 6 +++--- > src/gallium/auxiliary/util/u_math.c | 2 +- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h > b/src/gallium/auxiliary/util/u_format_rgb9e5.h > index c2a3f6f..7a01f7f 100644 > --- a/src/gallium/auxiliary/util/u_format_rgb9e5.h > +++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h > @@ -115,8 +115,8 @@ static INLINE unsigned float3_to_rgb9e5(const float > rgb[3]) > exp_shared = MAX2(-RGB9E5_EXP_BIAS-1, rgb9e5_FloorLog2(maxrgb)) + 1 + > RGB9E5_EXP_BIAS; > assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP); > assert(exp_shared >= 0); > - /* This pow function could be replaced by a table. */ > - denom = pow(2, exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS); > + /* This exp2 function could be replaced by a table. */ > + denom = exp2(exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS); > > maxm = (int) floor(maxrgb / denom + 0.5); > if (maxm == MAX_RGB9E5_MANTISSA+1) { > @@ -154,7 +154,7 @@ static INLINE void rgb9e5_to_float3(unsigned rgb, float > retval[3]) > > v.raw = rgb; > exponent = v.field.biasedexponent - RGB9E5_EXP_BIAS - > RGB9E5_MANTISSA_BITS; > - scale = (float) pow(2, exponent); > + scale = exp2f(exponent); > > retval[0] = v.field.r * scale; > retval[1] = v.field.g * scale; > diff --git a/src/gallium/auxiliary/util/u_math.c > b/src/gallium/auxiliary/util/u_math.c > index ae9e951..c58af91 100644 > --- a/src/gallium/auxiliary/util/u_math.c > +++ b/src/gallium/auxiliary/util/u_math.c > @@ -48,7 +48,7 @@ init_pow2_table(void) > { > int i; > for (i = 0; i < POW2_TABLE_SIZE; i++) > - pow2_table[i] = (float) pow(2.0, (i - POW2_TABLE_OFFSET) / > POW2_TABLE_SCALE); > + pow2_table[i] = exp2f((i - POW2_TABLE_OFFSET) / POW2_TABLE_SCALE); > } > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/13] mesa: overhaul the glGetCompressedTexImage code
Same idea as the previous patch. --- src/mesa/main/texgetimage.c | 345 +--- 1 file changed, 196 insertions(+), 149 deletions(-) diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index b74517a..bcdcfe8 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -1369,109 +1369,161 @@ _mesa_GetTextureImage(GLuint texture, GLint level, GLenum format, GLenum type, /** - * Do error checking for a glGetCompressedTexImage() call. - * \return GL_TRUE if any error, GL_FALSE if no errors. + * Do error checking for getting compressed texture images. + * \return true if any error, false if no errors. */ -static GLboolean +static bool getcompressedteximage_error_check(struct gl_context *ctx, - struct gl_texture_image *texImage, - GLenum target, - GLint level, GLsizei clientMemSize, - GLvoid *img, bool dsa) + struct gl_texture_object *texObj, + GLenum target, GLint level, + GLsizei clientMemSize, GLvoid *img, + const char *caller) { - const GLint maxLevels = _mesa_max_texture_levels(ctx, target); - GLuint compressedSize, dimensions; - const char *suffix = dsa ? "ture" : ""; + struct gl_texture_image *texImage; + GLint maxLevels; + const bool dsa = (target == 0); - assert(texImage); + if (!texObj || texObj->Target == 0) { + _mesa_error(ctx, GL_INVALID_OPERATION, "%s(invalid texture)", caller); + return true; + } + + if (dsa) { + target = texObj->Target; + } if (!legal_getteximage_target(ctx, target, dsa)) { - _mesa_error(ctx, GL_INVALID_ENUM, - "glGetCompressedTex%sImage(target=%s)", suffix, - _mesa_lookup_enum_by_nr(target)); - return GL_TRUE; + _mesa_error(ctx, GL_INVALID_ENUM, "%s(target=%s)", caller, + _mesa_lookup_enum_by_nr(texObj->Target)); + return true; } + maxLevels = _mesa_max_texture_levels(ctx, target); assert(maxLevels != 0); if (level < 0 || level >= maxLevels) { _mesa_error(ctx, GL_INVALID_VALUE, - "glGetCompressedTex%sImage(bad level = %d)", suffix, level); - return GL_TRUE; + "%s(bad level = %d)", caller, level); + return true; + } + + texImage = _mesa_select_tex_image(texObj, target, level); + if (!texImage) { + /* non-existant texture image */ + _mesa_error(ctx, GL_INVALID_VALUE, "%s(level)", caller); + return true; } if (!_mesa_is_format_compressed(texImage->TexFormat)) { _mesa_error(ctx, GL_INVALID_OPERATION, - "glGetCompressedTex%sImage(texture is not compressed)", - suffix); - return GL_TRUE; + "%s(texture is not compressed)", caller); + return true; + } + + return false; +} + + +static GLsizei +packed_compressed_size(GLuint dimensions, mesa_format format, + GLsizei width, GLsizei height, GLsizei depth, + const struct gl_pixelstore_attrib *packing) +{ + struct compressed_pixelstore st; + GLsizei totalBytes; + + _mesa_compute_compressed_pixelstore(dimensions, format, + width, height, depth, + packing, &st); + totalBytes = + (st.CopySlices - 1) * st.TotalRowsPerSlice * st.TotalBytesPerRow + + st.SkipBytes + + (st.CopyRowsPerSlice - 1) * st.TotalBytesPerRow + + st.CopyBytesPerRow; + + return totalBytes; +} + + +/** + * Common helper for all glGetCompressed-teximage functions. + */ +static void +get_compressed_texture_image(struct gl_context *ctx, + struct gl_texture_object *texObj, + GLenum target, GLint level, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLint depth, + GLsizei bufSize, GLvoid *pixels, + const char *caller) +{ + struct gl_texture_image *texImage; + unsigned firstFace, numFaces, i, imageStride; + GLsizei totalBytes; + GLuint dimensions; + + FLUSH_VERTICES(ctx, 0); + + if (getcompressedteximage_error_check(ctx, texObj, target, level, + bufSize, pixels, caller)) { + return; + } + + texImage = _mesa_select_tex_image(texObj, target, level); + assert(texImage); /* should have been error checked already */ + + /* check sub-image dimensions */ + if (width == WHOLE_WIDTH && height == WHOLE_HEIGHT && depth == WHOLE_DEPTH) { + /* getting whole image */ + width = texImage->Width; + height = texImage->Height; + depth = texImage->Dept
[Mesa-dev] [PATCH 02/13] meta: add offset, width, height parameters to decompress_texture_image()
In preparation for decompressing texture sub images. --- src/mesa/drivers/common/meta.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index 54c3d5a..34a8e4b 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -2966,15 +2966,14 @@ static bool decompress_texture_image(struct gl_context *ctx, struct gl_texture_image *texImage, GLuint slice, + GLint xoffset, GLint yoffset, + GLsizei width, GLsizei height, GLenum destFormat, GLenum destType, GLvoid *dest) { struct decompress_state *decompress = &ctx->Meta->Decompress; struct decompress_fbo_state *decompress_fbo; struct gl_texture_object *texObj = texImage->TexObject; - const GLint width = texImage->Width; - const GLint height = texImage->Height; - const GLint depth = texImage->Height; const GLenum target = texObj->Target; GLenum rbFormat; GLenum faceTarget; @@ -3093,7 +3092,7 @@ decompress_texture_image(struct gl_context *ctx, memset(verts, 0, sizeof(verts)); _mesa_meta_setup_texture_coords(faceTarget, slice, - 0, 0, width, height, + xoffset, yoffset, width, height, texImage->Width, texImage->Height, texImage->Depth, verts[0].tex, @@ -3224,7 +3223,8 @@ _mesa_meta_GetTexImage(struct gl_context *ctx, else { dst = pixels; } - result = decompress_texture_image(ctx, texImage, slice, + result = decompress_texture_image(ctx, texImage, slice, 0, 0, + texImage->Width, texImage->Height, format, type, dst); if (!result) break; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/13] mesa: add API dispatch for GL_ARB_get_texture_sub_image
This adds the new glGetTextureSubImage() and glGetCompressedTextureSubImage() functions. Also update the dispatch sanity test program. --- src/mapi/glapi/gen/ARB_get_texture_sub_image.xml | 42 src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml| 3 ++ src/mesa/main/tests/dispatch_sanity.cpp | 5 +++ 4 files changed, 51 insertions(+) create mode 100644 src/mapi/glapi/gen/ARB_get_texture_sub_image.xml diff --git a/src/mapi/glapi/gen/ARB_get_texture_sub_image.xml b/src/mapi/glapi/gen/ARB_get_texture_sub_image.xml new file mode 100644 index 000..6acc92d --- /dev/null +++ b/src/mapi/glapi/gen/ARB_get_texture_sub_image.xml @@ -0,0 +1,42 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +{ + + + + diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 5b163b0..170898c 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -135,6 +135,7 @@ API_XML = \ ARB_framebuffer_object.xml \ ARB_geometry_shader4.xml \ ARB_get_program_binary.xml \ + ARB_get_texture_sub_image.xml \ ARB_gpu_shader_fp64.xml \ ARB_gpu_shader5.xml \ ARB_instanced_arrays.xml \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 2f33075..9cc2c3a 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -12890,4 +12890,7 @@ http://www.w3.org/2001/XInclude"/> +http://www.w3.org/2001/XInclude"/> + + diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 800720b..cc89acb 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -930,6 +930,11 @@ const struct function common_desktop_functions_possible[] = { /* GL_EXT_polygon_offset_clamp */ { "glPolygonOffsetClampEXT", 11, -1 }, + + /* GL_ARB_get_texture_sub_image */ + { "glGetTextureSubImage", 12, -1 }, + { "glGetCompressedTextureSubImage", 12, -1 }, + { NULL, 0, -1 } }; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/13] meta: handle subimages in _mesa_meta_setup_texture_coords()
v2: fix depth, total_depth mix-up in meta.h, per Laura Ekstrand. --- src/mesa/drivers/common/meta.c | 88 +- src/mesa/drivers/common/meta.h | 6 +- src/mesa/drivers/common/meta_generate_mipmap.c | 4 +- 3 files changed, 65 insertions(+), 33 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index 9a75019..54c3d5a 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -2449,30 +2449,53 @@ _mesa_meta_Bitmap(struct gl_context *ctx, /** * Compute the texture coordinates for the four vertices of a quad for - * drawing a 2D texture image or slice of a cube/3D texture. + * drawing a 2D texture image or slice of a cube/3D texture. The offset + * and width, height specify a sub-region of the 2D image. + * * \param faceTarget GL_TEXTURE_1D/2D/3D or cube face name * \param slice slice of a 1D/2D array texture or 3D texture - * \param width width of the texture image - * \param height height of the texture image + * \param xoffset X position of sub texture + * \param yoffset Y position of sub texture + * \param width width of the sub texture image + * \param height height of the sub texture image + * \param total_width total width of the texture image + * \param total_height total height of the texture image + * \param total_depth total depth of the texture image * \param coords0/1/2/3 returns the computed texcoords */ void _mesa_meta_setup_texture_coords(GLenum faceTarget, GLint slice, +GLint xoffset, +GLint yoffset, GLint width, GLint height, -GLint depth, +GLint total_width, +GLint total_height, +GLint total_depth, GLfloat coords0[4], GLfloat coords1[4], GLfloat coords2[4], GLfloat coords3[4]) { - static const GLfloat st[4][2] = { - {0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f} - }; + float st[4][2]; GLuint i; + const float s0 = (float) xoffset / (float) total_width; + const float s1 = (float) (xoffset + width) / (float) total_width; + const float t0 = (float) yoffset / (float) total_height; + const float t1 = (float) (yoffset + height) / (float) total_height; GLfloat r; + /* setup the reference texcoords */ + st[0][0] = s0; + st[0][1] = t0; + st[1][0] = s1; + st[1][1] = t0; + st[2][0] = s1; + st[2][1] = t1; + st[3][0] = s0; + st[3][1] = t1; + if (faceTarget == GL_TEXTURE_CUBE_MAP_ARRAY) faceTarget = GL_TEXTURE_CUBE_MAP_POSITIVE_X + slice % 6; @@ -2489,52 +2512,52 @@ _mesa_meta_setup_texture_coords(GLenum faceTarget, case GL_TEXTURE_3D: case GL_TEXTURE_2D_ARRAY: if (faceTarget == GL_TEXTURE_3D) { - assert(slice < depth); - assert(depth >= 1); - r = (slice + 0.5f) / depth; + assert(slice < total_depth); + assert(total_depth >= 1); + r = (slice + 0.5f) / total_depth; } else if (faceTarget == GL_TEXTURE_2D_ARRAY) r = (float) slice; else r = 0.0F; - coords0[0] = 0.0F; /* s */ - coords0[1] = 0.0F; /* t */ + coords0[0] = st[0][0]; /* s */ + coords0[1] = st[0][1]; /* t */ coords0[2] = r; /* r */ - coords1[0] = 1.0F; - coords1[1] = 0.0F; + coords1[0] = st[1][0]; + coords1[1] = st[1][1]; coords1[2] = r; - coords2[0] = 1.0F; - coords2[1] = 1.0F; + coords2[0] = st[2][0]; + coords2[1] = st[2][1]; coords2[2] = r; - coords3[0] = 0.0F; - coords3[1] = 1.0F; + coords3[0] = st[3][0]; + coords3[1] = st[3][1]; coords3[2] = r; break; case GL_TEXTURE_RECTANGLE_ARB: - coords0[0] = 0.0F; /* s */ - coords0[1] = 0.0F; /* t */ + coords0[0] = (float) xoffset; /* s */ + coords0[1] = (float) yoffset; /* t */ coords0[2] = 0.0F; /* r */ - coords1[0] = (float) width; - coords1[1] = 0.0F; + coords1[0] = (float) (xoffset + width); + coords1[1] = (float) yoffset; coords1[2] = 0.0F; - coords2[0] = (float) width; - coords2[1] = (float) height; + coords2[0] = (float) (xoffset + width); + coords2[1] = (float) (yoffset + height); coords2[2] = 0.0F; - coords3[0] = 0.0F; - coords3[1] = (float) height; + coords3[0] = (float) xoffset; + coords3[1] = (float) (yoffset + height); coords3[2] = 0.0F; break; case GL_TEXTURE_1D_ARRAY: - coords0[0] = 0.0F; /* s */ + coords0[0] = st[0][0]; /* s */ coords0[1] = (float) slice; /* t */ coords0[2] = 0.0F; /* r */ - coords
[Mesa-dev] [PATCH 10/13] mesa: add new _mesa_Get[Compressed]TextureSubImage() functions
Simple implementations in terms of get_[compressed_]texture_image(). --- src/mesa/main/texgetimage.c | 38 ++ src/mesa/main/texgetimage.h | 15 +++ 2 files changed, 53 insertions(+) diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index bcdcfe8..07da5f4 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -1367,6 +1367,22 @@ _mesa_GetTextureImage(GLuint texture, GLint level, GLenum format, GLenum type, } +void GLAPIENTRY +_mesa_GetTextureSubImage(GLuint texture, GLint level, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLsizei depth, + GLenum format, GLenum type, GLsizei bufSize, + void *pixels) +{ + GET_CURRENT_CONTEXT(ctx); + struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, texture); + + get_texture_image(ctx, texObj, 0, level, + xoffset, yoffset, zoffset, width, height, depth, + format, type, bufSize, pixels, "glGetTextureSubImage"); +} + + /** * Do error checking for getting compressed texture images. @@ -1423,6 +1439,10 @@ getcompressedteximage_error_check(struct gl_context *ctx, } +/** + * Compute the number of bytes which will be written when retrieving + * a sub-region of a compressed texture. + */ static GLsizei packed_compressed_size(GLuint dimensions, mesa_format format, GLsizei width, GLsizei height, GLsizei depth, @@ -1637,3 +1657,21 @@ _mesa_GetCompressedTextureImage(GLuint texture, GLint level, bufSize, pixels, "glGetCompressedTextureImage"); } + + +void APIENTRY +_mesa_GetCompressedTextureSubImage(GLuint texture, GLint level, + GLint xoffset, GLint yoffset, + GLint zoffset, GLsizei width, + GLsizei height, GLsizei depth, + GLsizei bufSize, void *pixels) +{ + GET_CURRENT_CONTEXT(ctx); + struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, texture); + + get_compressed_texture_image(ctx, texObj, 0, level, +xoffset, yoffset, zoffset, +width, height, depth, +bufSize, pixels, +"glGetCompressedTextureImage"); +} diff --git a/src/mesa/main/texgetimage.h b/src/mesa/main/texgetimage.h index 040495a..63c75eb 100644 --- a/src/mesa/main/texgetimage.h +++ b/src/mesa/main/texgetimage.h @@ -71,6 +71,14 @@ _mesa_GetTextureImage(GLuint texture, GLint level, GLenum format, GLenum type, GLsizei bufSize, GLvoid *pixels); extern void GLAPIENTRY +_mesa_GetTextureSubImage(GLuint texture, GLint level, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLsizei depth, + GLenum format, GLenum type, GLsizei bufSize, + void *pixels); + + +extern void GLAPIENTRY _mesa_GetCompressedTexImage(GLenum target, GLint lod, GLvoid *img); extern void GLAPIENTRY @@ -81,4 +89,11 @@ extern void GLAPIENTRY _mesa_GetCompressedTextureImage(GLuint texture, GLint level, GLsizei bufSize, GLvoid *pixels); +extern void APIENTRY +_mesa_GetCompressedTextureSubImage(GLuint texture, GLint level, + GLint xoffset, GLint yoffset, + GLint zoffset, GLsizei width, + GLsizei height, GLsizei depth, + GLsizei bufSize, void *pixels); + #endif /* TEXGETIMAGE_H */ -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/13] mesa: replace Driver.GetCompressedTexImage() w/ GetCompressedTexSubImage()
For now, pass offsets of zero and width/height/depth equal to the whole image. --- src/mesa/drivers/common/driverfuncs.c | 2 +- src/mesa/main/dd.h | 9 ++--- src/mesa/main/texgetimage.c| 28 src/mesa/main/texgetimage.h| 9 ++--- src/mesa/state_tracker/st_cb_texture.c | 2 +- 5 files changed, 30 insertions(+), 20 deletions(-) diff --git a/src/mesa/drivers/common/driverfuncs.c b/src/mesa/drivers/common/driverfuncs.c index ce99620..6fe42b1 100644 --- a/src/mesa/drivers/common/driverfuncs.c +++ b/src/mesa/drivers/common/driverfuncs.c @@ -101,7 +101,7 @@ _mesa_init_driver_functions(struct dd_function_table *driver) driver->TestProxyTexImage = _mesa_test_proxy_teximage; driver->CompressedTexImage = _mesa_store_compressed_teximage; driver->CompressedTexSubImage = _mesa_store_compressed_texsubimage; - driver->GetCompressedTexImage = _mesa_GetCompressedTexImage_sw; + driver->GetCompressedTexSubImage = _mesa_GetCompressedTexSubImage_sw; driver->BindTexture = NULL; driver->NewTextureObject = _mesa_new_texture_object; driver->DeleteTexture = _mesa_delete_texture_object; diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h index 754abd5..0a54e7a 100644 --- a/src/mesa/main/dd.h +++ b/src/mesa/main/dd.h @@ -335,9 +335,12 @@ struct dd_function_table { /** * Called by glGetCompressedTexImage. */ - void (*GetCompressedTexImage)(struct gl_context *ctx, - struct gl_texture_image *texImage, - GLvoid *data); + void (*GetCompressedTexSubImage)(struct gl_context *ctx, +struct gl_texture_image *texImage, +GLint xoffset, GLint yoffset, +GLint zoffset, GLsizei width, +GLint height, GLint depth, +GLvoid *data); /*@}*/ /** diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index 30a7d06..fb3c2c8 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -772,13 +772,16 @@ _mesa_GetTexSubImage_sw(struct gl_context *ctx, /** - * This is the software fallback for Driver.GetCompressedTexImage(). + * This is the software fallback for Driver.GetCompressedTexSubImage(). * All error checking will have been done before this routine is called. */ void -_mesa_GetCompressedTexImage_sw(struct gl_context *ctx, - struct gl_texture_image *texImage, - GLvoid *img) +_mesa_GetCompressedTexSubImage_sw(struct gl_context *ctx, + struct gl_texture_image *texImage, + GLint xoffset, GLint yoffset, + GLint zoffset, GLsizei width, + GLint height, GLint depth, + GLvoid *img) { const GLuint dimensions = _mesa_get_texture_dimensions(texImage->TexObject->Target); @@ -787,10 +790,8 @@ _mesa_GetCompressedTexImage_sw(struct gl_context *ctx, GLubyte *dest; _mesa_compute_compressed_pixelstore(dimensions, texImage->TexFormat, - texImage->Width, texImage->Height, - texImage->Depth, - &ctx->Pack, - &store); + width, height, depth, + &ctx->Pack, &store); if (_mesa_is_bufferobj(ctx->Pack.BufferObj)) { /* pack texture image into a PBO */ @@ -816,8 +817,8 @@ _mesa_GetCompressedTexImage_sw(struct gl_context *ctx, GLubyte *src; /* map src texture buffer */ - ctx->Driver.MapTextureImage(ctx, texImage, slice, - 0, 0, texImage->Width, texImage->Height, + ctx->Driver.MapTextureImage(ctx, texImage, zoffset + slice, + xoffset, yoffset, width, height, GL_MAP_READ_BIT, &src, &srcRowStride); if (src) { @@ -828,7 +829,7 @@ _mesa_GetCompressedTexImage_sw(struct gl_context *ctx, src += srcRowStride; } - ctx->Driver.UnmapTextureImage(ctx, texImage, slice); + ctx->Driver.UnmapTextureImage(ctx, texImage, zoffset + slice); /* Advance to next slice */ dest += store.TotalBytesPerRow * (store.TotalRowsPerSlice - store.CopyRowsPerSlice); @@ -1323,7 +1324,10 @@ _mesa_get_compressed_texture_image(struct gl_context *ctx, _mesa_lock_texture(ctx, texObj); { - ctx->Driver.GetCompressedTexImage(ctx, texImage, pixels); + ctx->Driver.GetCompressedTexSubImage(ctx, texImage, + 0, 0, 0, + texImage->Wi
[Mesa-dev] [PATCH 03/13] mesa: replace Driver.GetTexImage with GetTexSubImage()
The new driver hook has x/y/zoffset and width/height/depth parameters for the new glGetTextureSubImage() function. The meta code and gallium state tracker are updated to handle the new parameters. Callers to Driver.GetTexSubImage() pass in offsets=0 and sizes equal to the whole texture size. --- src/mesa/drivers/common/driverfuncs.c | 2 +- src/mesa/drivers/common/meta.c | 22 -- src/mesa/drivers/common/meta.h | 8 +--- src/mesa/main/dd.h | 10 ++ src/mesa/main/debug.c | 4 +++- src/mesa/main/mipmap.c | 9 ++--- src/mesa/main/texgetimage.c| 15 ++- src/mesa/main/texgetimage.h| 9 + src/mesa/state_tracker/st_cb_texture.c | 33 - 9 files changed, 68 insertions(+), 44 deletions(-) diff --git a/src/mesa/drivers/common/driverfuncs.c b/src/mesa/drivers/common/driverfuncs.c index 71c1a76..ce99620 100644 --- a/src/mesa/drivers/common/driverfuncs.c +++ b/src/mesa/drivers/common/driverfuncs.c @@ -94,7 +94,7 @@ _mesa_init_driver_functions(struct dd_function_table *driver) driver->QuerySamplesForFormat = _mesa_query_samples_for_format; driver->TexImage = _mesa_store_teximage; driver->TexSubImage = _mesa_store_texsubimage; - driver->GetTexImage = _mesa_meta_GetTexImage; + driver->GetTexSubImage = _mesa_meta_GetTexSubImage; driver->ClearTexSubImage = _mesa_meta_ClearTexSubImage; driver->CopyTexSubImage = _mesa_meta_CopyTexSubImage; driver->GenerateMipmap = _mesa_meta_GenerateMipmap; diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index 34a8e4b..12045eb 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -3196,15 +3196,17 @@ decompress_texture_image(struct gl_context *ctx, * from core Mesa. */ void -_mesa_meta_GetTexImage(struct gl_context *ctx, - GLenum format, GLenum type, GLvoid *pixels, - struct gl_texture_image *texImage) +_mesa_meta_GetTexSubImage(struct gl_context *ctx, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLsizei depth, + GLenum format, GLenum type, GLvoid *pixels, + struct gl_texture_image *texImage) { if (_mesa_is_format_compressed(texImage->TexFormat)) { GLuint slice; bool result = true; - for (slice = 0; slice < texImage->Depth; slice++) { + for (slice = 0; slice < depth; slice++) { void *dst; if (texImage->TexObject->Target == GL_TEXTURE_2D_ARRAY || texImage->TexObject->Target == GL_TEXTURE_CUBE_MAP_ARRAY) { @@ -3216,15 +3218,14 @@ _mesa_meta_GetTexImage(struct gl_context *ctx, struct gl_pixelstore_attrib packing = ctx->Pack; packing.SkipPixels = 0; packing.SkipRows = 0; -dst = _mesa_image_address3d(&packing, pixels, texImage->Width, -texImage->Height, format, type, -slice, 0, 0); +dst = _mesa_image_address3d(&packing, pixels, width, height, +format, type, slice, 0, 0); } else { dst = pixels; } - result = decompress_texture_image(ctx, texImage, slice, 0, 0, - texImage->Width, texImage->Height, + result = decompress_texture_image(ctx, texImage, slice, + xoffset, yoffset, width, height, format, type, dst); if (!result) break; @@ -3234,7 +3235,8 @@ _mesa_meta_GetTexImage(struct gl_context *ctx, return; } - _mesa_GetTexImage_sw(ctx, format, type, pixels, texImage); + _mesa_GetTexSubImage_sw(ctx, xoffset, yoffset, zoffset, + width, height, depth, format, type, pixels, texImage); } diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h index f5b74c4..fe43915 100644 --- a/src/mesa/drivers/common/meta.h +++ b/src/mesa/drivers/common/meta.h @@ -560,9 +560,11 @@ _mesa_meta_ClearTexSubImage(struct gl_context *ctx, const GLvoid *clearValue); extern void -_mesa_meta_GetTexImage(struct gl_context *ctx, - GLenum format, GLenum type, GLvoid *pixels, - struct gl_texture_image *texImage); +_mesa_meta_GetTexSubImage(struct gl_context *ctx, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLsizei depth, + GLenum format, GLenum type, GLvoid *pixels, + struct gl_texture_image *texImage); extern void _mesa_meta_DrawTex(struct gl_context *
[Mesa-dev] [PATCH 04/13] mesa: plumb offset/size parameters through GetTexSubImage code
Needed for GL_ARB_get_texture_sub_image. But at this point, the offsets are always zero and the sizes match the whole texture image. v2: Fixes, suggestions from Laura Ekstrand: * Fix calls to ctx->Driver.UnmapTextureImage() to pass the correct slice value. * Added comments and assertions to check zoffset+depth<=tex->Depth before the 'img' loops. * Added a new zoffset==0 assert in get_tex_memcpy(). --- src/mesa/main/texgetimage.c | 137 ++-- 1 file changed, 80 insertions(+), 57 deletions(-) diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index 01f1a15..d17dd52 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -75,12 +75,11 @@ type_needs_clamping(GLenum type) */ static void get_tex_depth(struct gl_context *ctx, GLuint dimensions, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLint depth, GLenum format, GLenum type, GLvoid *pixels, struct gl_texture_image *texImage) { - const GLint width = texImage->Width; - GLint height = texImage->Height; - GLint depth = texImage->Depth; GLint img, row; GLfloat *depthRow = malloc(width * sizeof(GLfloat)); @@ -94,14 +93,15 @@ get_tex_depth(struct gl_context *ctx, GLuint dimensions, height = 1; } + assert(zoffset + depth <= texImage->Depth); for (img = 0; img < depth; img++) { GLubyte *srcMap; GLint srcRowStride; /* map src texture buffer */ - ctx->Driver.MapTextureImage(ctx, texImage, img, - 0, 0, width, height, GL_MAP_READ_BIT, - &srcMap, &srcRowStride); + ctx->Driver.MapTextureImage(ctx, texImage, zoffset + img, + xoffset, yoffset, width, height, + GL_MAP_READ_BIT, &srcMap, &srcRowStride); if (srcMap) { for (row = 0; row < height; row++) { @@ -113,7 +113,7 @@ get_tex_depth(struct gl_context *ctx, GLuint dimensions, _mesa_pack_depth_span(ctx, width, dest, type, depthRow, &ctx->Pack); } - ctx->Driver.UnmapTextureImage(ctx, texImage, img); + ctx->Driver.UnmapTextureImage(ctx, texImage, zoffset + img); } else { _mesa_error(ctx, GL_OUT_OF_MEMORY, "glGetTexImage"); @@ -130,26 +130,26 @@ get_tex_depth(struct gl_context *ctx, GLuint dimensions, */ static void get_tex_depth_stencil(struct gl_context *ctx, GLuint dimensions, + GLint xoffset, GLint yoffset, GLint zoffset, + GLsizei width, GLsizei height, GLint depth, GLenum format, GLenum type, GLvoid *pixels, struct gl_texture_image *texImage) { - const GLint width = texImage->Width; - const GLint height = texImage->Height; - const GLint depth = texImage->Depth; GLint img, row; assert(format == GL_DEPTH_STENCIL); assert(type == GL_UNSIGNED_INT_24_8 || type == GL_FLOAT_32_UNSIGNED_INT_24_8_REV); + assert(zoffset + depth <= texImage->Depth); for (img = 0; img < depth; img++) { GLubyte *srcMap; GLint rowstride; /* map src texture buffer */ - ctx->Driver.MapTextureImage(ctx, texImage, img, - 0, 0, width, height, GL_MAP_READ_BIT, - &srcMap, &rowstride); + ctx->Driver.MapTextureImage(ctx, texImage, zoffset + img, + xoffset, yoffset, width, height, + GL_MAP_READ_BIT, &srcMap, &rowstride); if (srcMap) { for (row = 0; row < height; row++) { @@ -166,7 +166,7 @@ get_tex_depth_stencil(struct gl_context *ctx, GLuint dimensions, } } - ctx->Driver.UnmapTextureImage(ctx, texImage, img); + ctx->Driver.UnmapTextureImage(ctx, texImage, zoffset + img); } else { _mesa_error(ctx, GL_OUT_OF_MEMORY, "glGetTexImage"); @@ -180,12 +180,11 @@ get_tex_depth_stencil(struct gl_context *ctx, GLuint dimensions, */ static void get_tex_stencil(struct gl_context *ctx, GLuint dimensions, +GLint xoffset, GLint yoffset, GLint zoffset, +GLsizei width, GLsizei height, GLint depth, GLenum format, GLenum type, GLvoid *pixels, struct gl_texture_image *texImage) { - const GLint width = texImage->Width; - const GLint height = texImage->Height; - const GLint depth = texImage->Depth; GLint img, row; assert(format == GL_STENCIL_INDEX); @@ -195,8 +194,9 @@ get_tex_stencil(struct gl_context *ctx, GLuint dimensions, GLint rowstride; /* map src texture buffer */ - ctx->Driver.MapTextureImage(ctx, texImage, img, - 0, 0, width, height, GL_MAP_READ_BIT, + ctx->Driver.MapTextureImage(ctx, texImage, z
[Mesa-dev] [PATCH 07/13] mesa: 80-column wrapping in texgetimage.c
--- src/mesa/main/texgetimage.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index fb3c2c8..e180a4c 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -449,7 +449,8 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, GLuint dimensions, rebaseSwizzle[1] = MESA_FORMAT_SWIZZLE_ZERO; rebaseSwizzle[2] = MESA_FORMAT_SWIZZLE_ZERO; rebaseSwizzle[3] = MESA_FORMAT_SWIZZLE_W; -} else if (texImage->_BaseFormat != _mesa_get_format_base_format(texFormat)) { +} else if (texImage->_BaseFormat != + _mesa_get_format_base_format(texFormat)) { needsRebase = _mesa_compute_rgba2base2rgba_component_mapping(texImage->_BaseFormat, rebaseSwizzle); @@ -531,8 +532,8 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, GLuint dimensions, /* If we had to rebase, we have already handled that */ needsRebase = false; - /* If we were lucky and our RGBA conversion matches the dst format, then - * we are done. + /* If we were lucky and our RGBA conversion matches the dst format, + * then we are done. */ if (!need_convert) goto do_swap; @@ -832,7 +833,8 @@ _mesa_GetCompressedTexSubImage_sw(struct gl_context *ctx, ctx->Driver.UnmapTextureImage(ctx, texImage, zoffset + slice); /* Advance to next slice */ - dest += store.TotalBytesPerRow * (store.TotalRowsPerSlice - store.CopyRowsPerSlice); + dest += store.TotalBytesPerRow * (store.TotalRowsPerSlice - + store.CopyRowsPerSlice); } else { _mesa_error(ctx, GL_OUT_OF_MEMORY, "glGetCompresssedTexImage"); @@ -953,7 +955,8 @@ getteximage_error_check(struct gl_context *ctx, "glGetTex%sImage(format mismatch)", suffix); return GL_TRUE; } - else if (!_mesa_is_stencil_format(format) && _mesa_is_enum_format_integer(format) != + else if (!_mesa_is_stencil_format(format) && +_mesa_is_enum_format_integer(format) != _mesa_is_format_integer(texImage->TexFormat)) { _mesa_error(ctx, GL_INVALID_OPERATION, "glGetTex%sImage(format mismatch)", suffix); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/13] mesa: make _mesa_get_[compressed_]texture_image() static
These functions are only called from teximage.c --- src/mesa/main/texgetimage.c | 24 src/mesa/main/texgetimage.h | 7 --- 2 files changed, 12 insertions(+), 19 deletions(-) diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index d17dd52..30a7d06 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -1004,12 +1004,12 @@ getteximage_error_check(struct gl_context *ctx, * \param dsa True when the caller is an ARB_direct_state_access function, *false otherwise */ -void -_mesa_get_texture_image(struct gl_context *ctx, -struct gl_texture_object *texObj, -struct gl_texture_image *texImage, GLenum target, -GLint level, GLenum format, GLenum type, -GLsizei bufSize, GLvoid *pixels, bool dsa) +static void +get_texture_image(struct gl_context *ctx, + struct gl_texture_object *texObj, + struct gl_texture_image *texImage, GLenum target, + GLint level, GLenum format, GLenum type, + GLsizei bufSize, GLvoid *pixels, bool dsa) { assert(texObj); assert(texImage); @@ -1103,8 +1103,8 @@ _mesa_GetnTexImageARB(GLenum target, GLint level, GLenum format, if (!texImage) return; - _mesa_get_texture_image(ctx, texObj, texImage, target, level, format, type, - bufSize, pixels, false); + get_texture_image(ctx, texObj, texImage, target, level, format, type, + bufSize, pixels, false); } @@ -1179,8 +1179,8 @@ _mesa_GetTextureImage(GLuint texture, GLint level, GLenum format, texImage = texObj->Image[i][level]; assert(texImage); - _mesa_get_texture_image(ctx, texObj, texImage, texObj->Target, level, - format, type, bufSize, pixels, true); + get_texture_image(ctx, texObj, texImage, texObj->Target, level, + format, type, bufSize, pixels, true); image_stride = _mesa_image_image_stride(&ctx->Pack, texImage->Width, texImage->Height, format, @@ -1194,8 +1194,8 @@ _mesa_GetTextureImage(GLuint texture, GLint level, GLenum format, if (!texImage) return; - _mesa_get_texture_image(ctx, texObj, texImage, texObj->Target, level, - format, type, bufSize, pixels, true); + get_texture_image(ctx, texObj, texImage, texObj->Target, level, +format, type, bufSize, pixels, true); } } diff --git a/src/mesa/main/texgetimage.h b/src/mesa/main/texgetimage.h index 611b1bd..5124851 100644 --- a/src/mesa/main/texgetimage.h +++ b/src/mesa/main/texgetimage.h @@ -49,13 +49,6 @@ _mesa_GetCompressedTexImage_sw(struct gl_context *ctx, GLvoid *data); extern void -_mesa_get_texture_image(struct gl_context *ctx, -struct gl_texture_object *texObj, -struct gl_texture_image *texImage, GLenum target, -GLint level, GLenum format, GLenum type, -GLsizei bufSize, GLvoid *pixels, bool dsa); - -extern void _mesa_get_compressed_texture_image( struct gl_context *ctx, struct gl_texture_object *texObj, struct gl_texture_image *texImage, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/13] mesa: enable GL_ARB_get_texture_sub_image for all drivers
--- src/mesa/main/extensions.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index 4176a69..d2df4fb 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -121,6 +121,7 @@ static const struct extension extension_table[] = { { "GL_ARB_framebuffer_object", o(ARB_framebuffer_object), GL, 2005 }, { "GL_ARB_framebuffer_sRGB",o(EXT_framebuffer_sRGB), GL, 1998 }, { "GL_ARB_get_program_binary", o(dummy_true), GL, 2010 }, + { "GL_ARB_get_texture_sub_image", o(dummy_true), GL, 2014 }, { "GL_ARB_gpu_shader5", o(ARB_gpu_shader5), GLC,2010 }, { "GL_ARB_gpu_shader_fp64", o(ARB_gpu_shader_fp64), GLC,2010 }, { "GL_ARB_half_float_pixel",o(dummy_true), GL, 2003 }, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/13] mesa: overhaul the glGetTexImage code
1. Reorganize the error checking code. 2. Lay groundwork for getting sub images. 3. Implement _mesa_GetnTexImageARB(), _mesa_GetTexImage() and _mesa_GetTextureImage() all in terms of get_texture_image(). --- src/mesa/main/texgetimage.c | 539 +--- 1 file changed, 352 insertions(+), 187 deletions(-) diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c index e180a4c..b74517a 100644 --- a/src/mesa/main/texgetimage.c +++ b/src/mesa/main/texgetimage.c @@ -49,6 +49,13 @@ #include "format_utils.h" #include "pixeltransfer.h" + +/** Some magic numbers used when getting whole texture image */ +#define WHOLE_WIDTH -123 +#define WHOLE_HEIGHT -123 +#define WHOLE_DEPTH -123 + + /** * Can the given type represent negative values? */ @@ -891,27 +898,59 @@ legal_getteximage_target(struct gl_context *ctx, GLenum target, bool dsa) /** - * Do error checking for a glGetTex(ture)Image() call. - * \return GL_TRUE if any error, GL_FALSE if no errors. + * Do error checking for all (non-compressed) get-texture-image functions. + * \param target user-provided texture target, or 0 if called from DSA function + * \return true if any error, false if no errors. */ -static GLboolean +static bool getteximage_error_check(struct gl_context *ctx, -struct gl_texture_image *texImage, +struct gl_texture_object *texObj, GLenum target, GLint level, GLenum format, GLenum type, GLsizei clientMemSize, -GLvoid *pixels, bool dsa) +GLvoid *pixels, const char *caller) { - const GLint maxLevels = _mesa_max_texture_levels(ctx, target); - const GLuint dimensions = (target == GL_TEXTURE_3D) ? 3 : 2; - GLenum baseFormat; - const char *suffix = dsa ? "ture" : ""; + struct gl_texture_image *texImage; + GLenum baseFormat, err; + GLint maxLevels; + const bool dsa = (target == 0); - assert(texImage); - assert(maxLevels != 0); + if (!texObj || texObj->Target == 0) { + _mesa_error(ctx, GL_INVALID_OPERATION, "%s(invalid texture)", caller); + return true; + } + + if (dsa) { + target = texObj->Target; + } + + if (!legal_getteximage_target(ctx, target, dsa)) { + _mesa_error(ctx, GL_INVALID_ENUM, "%s(target=%s)", caller, + _mesa_lookup_enum_by_nr(texObj->Target)); + return true; + } + + err = _mesa_error_check_format_and_type(ctx, format, type); + if (err != GL_NO_ERROR) { + _mesa_error(ctx, err, "%s(format/type)", caller); + return true; + } + + maxLevels = _mesa_max_texture_levels(ctx, target); if (level < 0 || level >= maxLevels) { _mesa_error(ctx, GL_INVALID_VALUE, - "glGetTex%sImage(level out of range)", suffix); - return GL_TRUE; + "%s(level = %d)", caller, level); + return true; + } + + if (target == GL_TEXTURE_CUBE_MAP) { + target = GL_TEXTURE_CUBE_MAP_POSITIVE_X; + } + + texImage = _mesa_select_tex_image(texObj, target, level); + if (!texImage) { + /* non-existant texture image */ + _mesa_error(ctx, GL_INVALID_OPERATION, "%s", caller); + return true; } /* @@ -927,247 +966,325 @@ getteximage_error_check(struct gl_context *ctx, if (_mesa_is_color_format(format) && !_mesa_is_color_format(baseFormat)) { _mesa_error(ctx, GL_INVALID_OPERATION, - "glGetTex%sImage(format mismatch)", suffix); - return GL_TRUE; + "%s(format mismatch)", caller); + return true; } else if (_mesa_is_depth_format(format) && !_mesa_is_depth_format(baseFormat) && !_mesa_is_depthstencil_format(baseFormat)) { _mesa_error(ctx, GL_INVALID_OPERATION, - "glGetTex%sImage(format mismatch)", suffix); - return GL_TRUE; + "%s(format mismatch)", caller); + return true; } else if (_mesa_is_stencil_format(format) && !ctx->Extensions.ARB_texture_stencil8) { _mesa_error(ctx, GL_INVALID_ENUM, - "glGetTex%sImage(format=GL_STENCIL_INDEX)", suffix); - return GL_TRUE; + "%s(format=GL_STENCIL_INDEX)", caller); + return true; } else if (_mesa_is_ycbcr_format(format) && !_mesa_is_ycbcr_format(baseFormat)) { _mesa_error(ctx, GL_INVALID_OPERATION, - "glGetTex%sImage(format mismatch)", suffix); - return GL_TRUE; + "%s(format mismatch)", caller); + return true; } else if (_mesa_is_depthstencil_format(format) && !_mesa_is_depthstencil_format(baseFormat)) { _mesa_error(ctx, GL_INVALID_OPERATION, - "glGetTex%sImage(format mismatch)", suffix); - return GL_TRUE; + "%s(format mismatch)", caller); + return true; } else if (!_mesa_is_stencil_format(forma
[Mesa-dev] [PATCH 13/13] docs: document that GL_ARB_get_texture_sub_image is completed
--- docs/GL3.txt | 2 +- docs/relnotes/10.7.0.html | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 33a282e..2144257 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -202,7 +202,7 @@ GL 4.5, GLSL 4.50: - Sampler object DONE - Program Pipeline objectDONE - Query object DONE (will require changes when GL_ARB_query_buffer_object lands) - GL_ARB_get_texture_sub_image started (Brian Paul) + GL_ARB_get_texture_sub_image DONE (all drivers) GL_ARB_shader_texture_image_samples not started GL_ARB_texture_barrier DONE (nv50, nvc0, r600, radeonsi) GL_KHR_context_flush_control DONE (all - but needs GLX/EXT extension to be useful) diff --git a/docs/relnotes/10.7.0.html b/docs/relnotes/10.7.0.html index 42ea807..4533e78 100644 --- a/docs/relnotes/10.7.0.html +++ b/docs/relnotes/10.7.0.html @@ -44,6 +44,7 @@ Note: some of the new features are only available with certain drivers. +GL_ARB_get_texture_sub_image for all drivers GL_AMD_vertex_shader_viewport_index on radeonsi GL_ARB_fragment_layer_viewport on radeonsi GL_ARB_framebuffer_no_attachments on i965 -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 Shawn Starr changed: What|Removed |Added CC||shawn.st...@rogers.com -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/13] gallium/auxiliary: Use exp2(x) instead of pow(2.0, x).
On Mon, Jul 13, 2015 at 5:13 PM, Roland Scheidegger wrote: > Did you replace 2 of them by exp2f but one by exp2f on purpose? Yes, two of them are converting their results to floats so they were converted to exp2f(). Upon inspection, the argument of the one exp2 is just an integer... can't we convert that to (double)(1 << arg)? > I don't think we can use exp2/exp2f in gallium. This requires msvc 2013 > (all of exp2, exp2f, powf are c99, powf is supported by older msvc but > the others are not). I guess though could throw some wrappers into > c99_math.h. Yeah, if we need to put wrappers in c99_math.h I can do that. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/13] mesa: add API dispatch for GL_ARB_get_texture_sub_image
On Jul 13, 2015 9:21 PM, "Brian Paul" wrote: > > This adds the new glGetTextureSubImage() and > glGetCompressedTextureSubImage() functions. Also update the > dispatch sanity test program. > --- > src/mapi/glapi/gen/ARB_get_texture_sub_image.xml | 42 > src/mapi/glapi/gen/Makefile.am | 1 + > src/mapi/glapi/gen/gl_API.xml| 3 ++ > src/mesa/main/tests/dispatch_sanity.cpp | 5 +++ > 4 files changed, 51 insertions(+) > create mode 100644 src/mapi/glapi/gen/ARB_get_texture_sub_image.xml > > diff --git a/src/mapi/glapi/gen/ARB_get_texture_sub_image.xml b/src/mapi/glapi/gen/ARB_get_texture_sub_image.xml > new file mode 100644 > index 000..6acc92d > --- /dev/null > +++ b/src/mapi/glapi/gen/ARB_get_texture_sub_image.xml > @@ -0,0 +1,42 @@ > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +{ I'm surprised this parsed ok... > + > + > + > + > diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am > index 5b163b0..170898c 100644 > --- a/src/mapi/glapi/gen/Makefile.am > +++ b/src/mapi/glapi/gen/Makefile.am > @@ -135,6 +135,7 @@ API_XML = \ > ARB_framebuffer_object.xml \ > ARB_geometry_shader4.xml \ > ARB_get_program_binary.xml \ > + ARB_get_texture_sub_image.xml \ > ARB_gpu_shader_fp64.xml \ > ARB_gpu_shader5.xml \ > ARB_instanced_arrays.xml \ > diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml > index 2f33075..9cc2c3a 100644 > --- a/src/mapi/glapi/gen/gl_API.xml > +++ b/src/mapi/glapi/gen/gl_API.xml > @@ -12890,4 +12890,7 @@ > > http://www.w3.org/2001/XInclude"/> > > +http://www.w3.org/2001/XInclude"/> I believe these are usually included in ext number order. > + > + > > diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp > index 800720b..cc89acb 100644 > --- a/src/mesa/main/tests/dispatch_sanity.cpp > +++ b/src/mesa/main/tests/dispatch_sanity.cpp > @@ -930,6 +930,11 @@ const struct function common_desktop_functions_possible[] = { > > /* GL_EXT_polygon_offset_clamp */ > { "glPolygonOffsetClampEXT", 11, -1 }, > + > + /* GL_ARB_get_texture_sub_image */ > + { "glGetTextureSubImage", 12, -1 }, > + { "glGetCompressedTextureSubImage", 12, -1 }, > + > { NULL, 0, -1 } > }; > > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix 32 bit build warnings in intel_get_yf_ys_bo_size()
On Monday, July 13, 2015 03:35:16 PM Anuj Phogat wrote: > Along with fixing the type of pitch parameter, patch also changes > the types of few local variables and function return type. > > Warnings fixed are: > intel_mipmap_tree.c:671:7: warning: passing argument 3 of > 'intel_get_yf_ys_bo_size' from incompatible pointer type > > intel_mipmap_tree.c:563:1: note: expected 'uint64_t *' but > argument is of type 'long unsigned int *' > > Reported-by: Kenneth Graunke > Signed-off-by: Anuj Phogat > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index fb896a9..1529651 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -559,14 +559,14 @@ intel_lower_compressed_format(struct brw_context *brw, > mesa_format format) > } > > /* This function computes Yf/Ys tiled bo size, alignment and pitch. */ > -static uint64_t > +static unsigned long > intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment, > -uint64_t *pitch) > +unsigned long *pitch) > { > const uint32_t bpp = mt->cpp * 8; > const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1; > uint32_t tile_width, tile_height; > - uint64_t stride, size, aligned_y; > + unsigned long stride, size, aligned_y; > > assert(mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE); This looks good to me. I don't think anything should overflow, though I didn't check thoroughly. Thanks Anuj :) Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses
On Monday, July 13, 2015 11:21:05 AM Iago Toral wrote: > On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote: > > @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type *base, > > unsigned array_size) > > mtx_lock(&glsl_type::mutex); > > > > if (array_types == NULL) { > > - array_types = hash_table_ctor(64, hash_table_string_hash, > > - hash_table_string_compare); > > + array_types = _mesa_hash_table_create(NULL, _mesa_key_hash_string, > > +_mesa_key_string_equal); > > } > > > > - const glsl_type *t = (glsl_type *) hash_table_find(array_types, key); > > - > > - if (t == NULL) { > > + const struct hash_entry *entry = _mesa_hash_table_search(array_types, > > key); > > + if (entry == NULL) { > >mtx_unlock(&glsl_type::mutex); > > - t = new glsl_type(base, array_size); > > + const glsl_type *t = new glsl_type(base, array_size); > >mtx_lock(&glsl_type::mutex); > > > > - hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, > > key)); > > + entry = _mesa_hash_table_insert(array_types, > > + ralloc_strdup(mem_ctx, key), > > + (void *) t); > > } > > > > - assert(t->base_type == GLSL_TYPE_ARRAY); > > - assert(t->length == array_size); > > - assert(t->fields.array == base); > > + assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY); > > + assert(((glsl_type *)entry->data)->length == array_size); > > + assert(((glsl_type *)entry->data)->fields.array == base); > > Other parts of this file put a blank between the type cast and the > variable, so I would add that here (and in all other places where you > cast entry to glsl_type* in this patch). Or...why not continue to have a local variable t, and just set t = entry->data? Then these could all stay the same, and there would be less casting. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses
On Mon, 2015-07-13 at 22:19 -0700, Kenneth Graunke wrote: > On Monday, July 13, 2015 11:21:05 AM Iago Toral wrote: > > On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote: > > > @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type > > > *base, unsigned array_size) > > > mtx_lock(&glsl_type::mutex); > > > > > > if (array_types == NULL) { > > > - array_types = hash_table_ctor(64, hash_table_string_hash, > > > - hash_table_string_compare); > > > + array_types = _mesa_hash_table_create(NULL, > > > _mesa_key_hash_string, > > > +_mesa_key_string_equal); > > > } > > > > > > - const glsl_type *t = (glsl_type *) hash_table_find(array_types, > > > key); > > > - > > > - if (t == NULL) { > > > + const struct hash_entry *entry = > > > _mesa_hash_table_search(array_types, key); > > > + if (entry == NULL) { > > >mtx_unlock(&glsl_type::mutex); > > > - t = new glsl_type(base, array_size); > > > + const glsl_type *t = new glsl_type(base, array_size); > > >mtx_lock(&glsl_type::mutex); > > > > > > - hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, > > > key)); > > > + entry = _mesa_hash_table_insert(array_types, > > > + ralloc_strdup(mem_ctx, key), > > > + (void *) t); > > > } > > > > > > - assert(t->base_type == GLSL_TYPE_ARRAY); > > > - assert(t->length == array_size); > > > - assert(t->fields.array == base); > > > + assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY); > > > + assert(((glsl_type *)entry->data)->length == array_size); > > > + assert(((glsl_type *)entry->data)->fields.array == base); > > > > Other parts of this file put a blank between the type cast and the > > variable, so I would add that here (and in all other places where you > > cast entry to glsl_type* in this patch). > > Or...why not continue to have a local variable t, and just set t = > entry->data? Then these could all stay the same, and there would be > less casting. I did have it like that but in my opinion it just looked messy. 2 extra lines vs 3 extra casts all of which are in asserts(), it felt like I was making the surrounding code worse for the sake of the asserts. I've pushed this already so I guess it doesn't matter now anyway. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] loader: libudev vs sysfs vs libdrm
>> - libdrm: used as a last resource fall-back after the above two. the >> sole option used by *BSD, MacOS and Android. Hi, Unless I am missing something, libdrm is not used on MacOS. On osx, EGL_PLATFORM=x11 eglGetDisplay(NULL) calls dri2_initialize_x11 which fallback to dri2_initialize_x11_swrast. Though libdrm can be built and then EGL_PLATFORM=drm eglGetDisplay(NULL) calls dri2_initialize_drm which fallback to swrast . But not sure how gbm surfaces are useful. And there is no kernel side for drm on darwin right ? Speaking about egl, on linux+x11, the driver name is retrieved with xcb_dri2_connect_driver_name (except swrast which is harcoded) and then loaded with dri2_open_driver (= dlopen of %s_dri.so) So this mechanism is not either libudev, sysfs or libdrm. Am I missing something (or is it part of sysfs) ? Or the X server internally will use one of these 3 ? Also any plan to support eglQueryDevicesEXT ( https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_device_enumeration.txt) ? On 9 July 2015 at 22:00, Emil Velikov wrote: > On 09/07/15 15:40, Axel Davy wrote: > > On 09/07/2015 15:33, Emil Velikov wrote : > >> > >> Yakes forgot about that one. From a quick look we can (I know it feels > >> durty) use readlink( > >> /sys/dev/char/$(major):$(minor)). > >> > >> Everything seems to be there - bus type and exact location of the > >> device on the bus. It's limited to sysfs users, but they are more than > >> the libudev ones (perhaps not by much). How does that sound ? > >> > >> Cheers, > >> Emil > >> > > The idea behind using udev for the tag was that arbitrary tag could be > > used, > > especially to identify usb devices. > > > So avoid breaking peoples' setup the above example can be used to > produce the very same tag(s). Regardles if it's a device on the pci or > usb bus. Speaking of usb, are there any devices supported with upstream > mesa ? > > -Emil > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev