Re: [Mesa-dev] [PATCH 3/5] i965: Add envvar to debug the optimization passes.
On 04/11/2014 10:29 PM, Matt Turner wrote: > --- > src/mesa/drivers/dri/i965/intel_debug.c | 1 + > src/mesa/drivers/dri/i965/intel_debug.h | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_debug.c > b/src/mesa/drivers/dri/i965/intel_debug.c > index dc74202..44a38b4 100644 > --- a/src/mesa/drivers/dri/i965/intel_debug.c > +++ b/src/mesa/drivers/dri/i965/intel_debug.c > @@ -65,6 +65,7 @@ static const struct dri_debug_control debug_control[] = { > { "no16", DEBUG_NO16 }, > { "blorp", DEBUG_BLORP }, > { "nodualobj", DEBUG_NO_DUAL_OBJECT_GS }, > + { "optimizer", DEBUG_OPTIMIZER }, Seems handy. How about INTEL_DEBUG=opt or opts? optimizer is a bit wordy compared to the other options. But, whatever you like is fine. > { NULL,0 } > }; > > diff --git a/src/mesa/drivers/dri/i965/intel_debug.h > b/src/mesa/drivers/dri/i965/intel_debug.h > index d4f1666..52a47cf 100644 > --- a/src/mesa/drivers/dri/i965/intel_debug.h > +++ b/src/mesa/drivers/dri/i965/intel_debug.h > @@ -61,6 +61,7 @@ extern uint64_t INTEL_DEBUG; > #define DEBUG_NO160x2000 > #define DEBUG_VUE 0x4000 > #define DEBUG_NO_DUAL_OBJECT_GS 0x8000 > +#define DEBUG_OPTIMIZER 0x1 > > #ifdef HAVE_ANDROID_PLATFORM > #define LOG_TAG "INTEL-MESA" > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/5] i965: Let dump_instructions write to a file.
On 04/11/2014 10:29 PM, Matt Turner wrote: > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 129 > +- > src/mesa/drivers/dri/i965/brw_fs.h| 2 + > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 +- > src/mesa/drivers/dri/i965/brw_shader.cpp | 20 +++- > src/mesa/drivers/dri/i965/brw_shader.h| 2 + > src/mesa/drivers/dri/i965/brw_vec4.cpp| 104 + > src/mesa/drivers/dri/i965/brw_vec4.h | 1 + > 7 files changed, 154 insertions(+), 106 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 85a5463..c1d79e1 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -2810,197 +2810,218 @@ fs_visitor::lower_uniform_pull_constant_loads() > void > fs_visitor::dump_instructions() > { > + dump_instructions(NULL); > +} > + > +void > +fs_visitor::dump_instructions(const char *name) > +{ > + invalidate_live_intervals(); > calculate_register_pressure(); > + FILE *file = stderr; > + if (name) { > + file = fopen(name, "w"); > + } You're playing with fire here. Opening files is dangerous, considering that the X server ofter runs as root and executes this code. One mistake in debugging code, and...root exploit. We may want to restrict this to non-root users by default, and maybe only in debug builds... At any rate, I'd like to see patches 3-5 respun as follows: 1. Make dump_instruction() and dump_instructions() take FILE * parameters, and do the s/stderr/file/g sed job you've done here. This is the bulk of the change, and easily justifiable - we can always use stdout vs. stderr or the like. 2. Introduce any code that opens files for writing. With this being separate, we can think through the security implications. 3. Your actual optimization debugging. I would probably squash patch 3 into this patch, since it doesn't really stand alone. *shrug* [snip] > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h > b/src/mesa/drivers/dri/i965/brw_vec4.h > index 159a5bd..43c3f64 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.h > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h > @@ -597,6 +597,7 @@ public: > bool process_move_condition(ir_rvalue *ir); > > void dump_instruction(backend_instruction *inst); > + void dump_instruction(backend_instruction *inst, FILE *file); Usually, this is done via a default argument: void dump_instruction(backend_instruction *inst, FILE *file = stderr); That would probably be cleaner. > > void visit_atomic_counter_intrinsic(ir_call *ir); signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/5] i965/fs: Debug the optimization passes by dumping instr to file.
On 04/11/2014 10:29 PM, Matt Turner wrote: > With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a > file each time an optimization pass makes progress. This lets you easily > diff successive files to see what an optimization pass did. > > Example filenames written when running glxgears: >fs8-00-00-start >fs8-00-01-04-opt_copy_propagate >fs8-00-01-06-dead_code_eliminate >fs8-00-01-12-compute_to_mrf >fs8-00-02-06-dead_code_eliminate >| | | | >| | | `-- optimization pass name >| | | >| | `-- optimization pass number in the loop >| | >| `-- optimization loop interation >| >`-- shader program number > > Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs, > so that we can diff instruction lists across loop interations without > the register numbers being changes. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 56 > +++- > 1 file changed, 43 insertions(+), 13 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index c1d79e1..666cfa5b 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1704,6 +1704,9 @@ fs_visitor::split_virtual_grfs() > void > fs_visitor::compact_virtual_grfs() > { > + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) > + return; > + > /* Mark which virtual GRFs are used, and count how many. */ > int remap_table[this->virtual_grf_count]; > memset(remap_table, -1, sizeof(remap_table)); > @@ -3258,25 +3261,52 @@ fs_visitor::run() > >opt_drop_redundant_mov_to_flags(); > > +#define OPT(pass, args...) \ > + ({ \ > + pass_num++; \ > + bool progress = pass(args); \ > + \ > + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && progress) { \ > + char filename[64];\ > + snprintf(filename, 64, "fs%d-%02d-%02d-%02d-" #pass, \ > + dispatch_width, shader_prog->Name, iteration, pass_num); \ > + \ > + backend_visitor::dump_instructions(filename); >\ > + }\ > + \ > + progress;\ > + }) I like the use of the macro. But...folding in the "progress = ... || progress" might be nice too. It would also let us stick to standard C here, rather than GNU extensions. > + > + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) { > + char filename[64]; > + snprintf(filename, 64, "fs%d-%02d-00-start", > + dispatch_width, shader_prog->Name); I am really unexcited about fixed length buffers, although you did do snprintf, and it'll probably not be a problem in practice. Why not just asprintf or ralloc_asprintf it and then free it? It's no more code and will just work. > + > + backend_visitor::dump_instructions(filename); > + } > + >bool progress; > + int iteration = 0; >do { >progress = false; > + iteration++; > + int pass_num = 0; > > compact_virtual_grfs(); > > - progress = remove_duplicate_mrf_writes() || progress; > - > - progress = opt_algebraic() || progress; > - progress = opt_cse() || progress; > - progress = opt_copy_propagate() || progress; > - progress = opt_peephole_predicated_break() || progress; > - progress = dead_code_eliminate() || progress; > - progress = dead_code_eliminate_local() || progress; > - progress = opt_peephole_sel() || progress; > - progress = dead_control_flow_eliminate(this) || progress; > - progress = opt_saturate_propagation() || progress; > - progress = register_coalesce() || progress; > - progress = compute_to_mrf() || progress; > + progress = OPT(remove_duplicate_mrf_writes) || progress; > + > + progress = OPT(opt_algebraic) || progress; > + progress = OPT(opt_cse) || progress; > + progress = OPT(opt_copy_propagate) || progress; > + progress = OPT(opt_peephole_predicated_break) || progress; > + progress = OPT(dead_code_eliminate) || progress; > + progress = OPT(dead_code_eliminate_local) || progress; > + progress = OPT(opt_peephole_sel) || progress; > + progress = OPT(dead_control_flow_eliminate, this)
Re: [Mesa-dev] [PATCH 1/5] dri: Expand driParseDebugString return value to uint64_t.
On 04/11/2014 10:29 PM, Matt Turner wrote: > Users will downcast if they don't have >32 debug flags. > --- > src/mesa/drivers/dri/common/utils.c | 7 +++ > src/mesa/drivers/dri/common/utils.h | 4 ++-- > 2 files changed, 5 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/drivers/dri/common/utils.c > b/src/mesa/drivers/dri/common/utils.c > index 1f29e0b..eee77ec 100644 > --- a/src/mesa/drivers/dri/common/utils.c > +++ b/src/mesa/drivers/dri/common/utils.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > #include "main/macros.h" > #include "main/mtypes.h" > #include "main/cpuinfo.h" > @@ -40,14 +41,12 @@ > #include "dri_util.h" > > > -unsigned > +uint64_t > driParseDebugString( const char * debug, >const struct dri_debug_control * control ) > { > - unsigned flag; > - > + uint64_t flag = 0; > > - flag = 0; > if ( debug != NULL ) { >while( control->string != NULL ) { >if ( !strcmp( debug, "all" ) || > diff --git a/src/mesa/drivers/dri/common/utils.h > b/src/mesa/drivers/dri/common/utils.h > index 0941434..3760c38 100644 > --- a/src/mesa/drivers/dri/common/utils.h > +++ b/src/mesa/drivers/dri/common/utils.h > @@ -34,10 +34,10 @@ > > struct dri_debug_control { > const char * string; > -unsigned flag; > +uint64_t flag; > }; > > -extern unsigned driParseDebugString( const char * debug, > +extern uint64_t driParseDebugString( const char * debug, > const struct dri_debug_control * control ); > > extern unsigned driGetRendererString( char * buffer, > Patches 1-2 are: Reviewed-by: Kenneth Graunke signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/5] i965: Let dump_instructions write to a file.
On Sun, Apr 13, 2014 at 12:12 AM, Kenneth Graunke wrote: > On 04/11/2014 10:29 PM, Matt Turner wrote: >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 129 >> +- >> src/mesa/drivers/dri/i965/brw_fs.h| 2 + >> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 +- >> src/mesa/drivers/dri/i965/brw_shader.cpp | 20 +++- >> src/mesa/drivers/dri/i965/brw_shader.h| 2 + >> src/mesa/drivers/dri/i965/brw_vec4.cpp| 104 + >> src/mesa/drivers/dri/i965/brw_vec4.h | 1 + >> 7 files changed, 154 insertions(+), 106 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index 85a5463..c1d79e1 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -2810,197 +2810,218 @@ fs_visitor::lower_uniform_pull_constant_loads() >> void >> fs_visitor::dump_instructions() >> { >> + dump_instructions(NULL); >> +} >> + >> +void >> +fs_visitor::dump_instructions(const char *name) >> +{ >> + invalidate_live_intervals(); >> calculate_register_pressure(); >> + FILE *file = stderr; >> + if (name) { >> + file = fopen(name, "w"); >> + } > > You're playing with fire here. Opening files is dangerous, considering > that the X server ofter runs as root and executes this code. One > mistake in debugging code, and...root exploit. We may want to restrict > this to non-root users by default, and maybe only in debug builds... > > At any rate, I'd like to see patches 3-5 respun as follows: > > 1. Make dump_instruction() and dump_instructions() take FILE * > parameters, and do the s/stderr/file/g sed job you've done here. This > is the bulk of the change, and easily justifiable - we can always use > stdout vs. stderr or the like. > > 2. Introduce any code that opens files for writing. With this being > separate, we can think through the security implications. > > 3. Your actual optimization debugging. I would probably squash patch 3 > into this patch, since it doesn't really stand alone. *shrug* > > [snip] > >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h >> b/src/mesa/drivers/dri/i965/brw_vec4.h >> index 159a5bd..43c3f64 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.h >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h >> @@ -597,6 +597,7 @@ public: >> bool process_move_condition(ir_rvalue *ir); >> >> void dump_instruction(backend_instruction *inst); >> + void dump_instruction(backend_instruction *inst, FILE *file); > > Usually, this is done via a default argument: > > void dump_instruction(backend_instruction *inst, FILE *file = stderr); > > That would probably be cleaner. I didn't do that on purpose, because gdb doesn't handle default parameters when you call functions from it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/5] i965/fs: Debug the optimization passes by dumping instr to file.
On Sun, Apr 13, 2014 at 12:16 AM, Kenneth Graunke wrote: > On 04/11/2014 10:29 PM, Matt Turner wrote: >> With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a >> file each time an optimization pass makes progress. This lets you easily >> diff successive files to see what an optimization pass did. >> >> Example filenames written when running glxgears: >>fs8-00-00-start >>fs8-00-01-04-opt_copy_propagate >>fs8-00-01-06-dead_code_eliminate >>fs8-00-01-12-compute_to_mrf >>fs8-00-02-06-dead_code_eliminate >>| | | | >>| | | `-- optimization pass name >>| | | >>| | `-- optimization pass number in the loop >>| | >>| `-- optimization loop interation >>| >>`-- shader program number >> >> Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs, >> so that we can diff instruction lists across loop interations without >> the register numbers being changes. >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 56 >> +++- >> 1 file changed, 43 insertions(+), 13 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index c1d79e1..666cfa5b 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -1704,6 +1704,9 @@ fs_visitor::split_virtual_grfs() >> void >> fs_visitor::compact_virtual_grfs() >> { >> + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) >> + return; >> + >> /* Mark which virtual GRFs are used, and count how many. */ >> int remap_table[this->virtual_grf_count]; >> memset(remap_table, -1, sizeof(remap_table)); >> @@ -3258,25 +3261,52 @@ fs_visitor::run() >> >>opt_drop_redundant_mov_to_flags(); >> >> +#define OPT(pass, args...) \ >> + ({ \ >> + pass_num++; \ >> + bool progress = pass(args); \ >> + \ >> + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && progress) { \ >> + char filename[64];\ >> + snprintf(filename, 64, "fs%d-%02d-%02d-%02d-" #pass, \ >> + dispatch_width, shader_prog->Name, iteration, pass_num); \ >> + \ >> + backend_visitor::dump_instructions(filename); >> \ >> + }\ >> + \ >> + progress;\ >> + }) > > I like the use of the macro. But...folding in the "progress = ... || > progress" might be nice too. It would also let us stick to standard C > here, rather than GNU extensions. I don't see a problem with using extensions, especially when we already use them. But decluttering the loop seems like a good idea. >> + >> + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) { >> + char filename[64]; >> + snprintf(filename, 64, "fs%d-%02d-00-start", >> + dispatch_width, shader_prog->Name); > > I am really unexcited about fixed length buffers, although you did do > snprintf, and it'll probably not be a problem in practice. Why not just > asprintf or ralloc_asprintf it and then free it? It's no more code and > will just work. Given the lengths of our optimization pass names, 64 seems like plenty, and if it's not that worst that happens is we lose a few characters from our optimization pass's name. My compiler complains about not checking the return type of asprintf, and adding a memory allocation failure check in the macro felt really stupid. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/10] winsys/radeon: remove cs_write_reloc, add simpler cs_get_reloc
For this series: Reviewed-by: Christian König BTW: Can we get rid of RADEON_FLUSH_ASYNC and always flush asynchronously? Just calling cs_sync_flush directly after the flush should have the same effect as synchronous flushing. Christian. Am 12.04.2014 18:34, schrieb Marek Olšák: From: Marek Olšák The only difference is that it doesn't write to the CS and only returns the index. --- src/gallium/drivers/r300/r300_cs.h| 3 ++- src/gallium/drivers/r300/r300_emit.c | 4 ++-- src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 26 +- src/gallium/winsys/radeon/drm/radeon_winsys.h | 19 ++- 4 files changed, 23 insertions(+), 29 deletions(-) diff --git a/src/gallium/drivers/r300/r300_cs.h b/src/gallium/drivers/r300/r300_cs.h index 744e19e..748d6ea 100644 --- a/src/gallium/drivers/r300/r300_cs.h +++ b/src/gallium/drivers/r300/r300_cs.h @@ -109,7 +109,8 @@ #define OUT_CS_RELOC(r) do { \ assert((r)); \ assert((r)->cs_buf); \ -cs_winsys->cs_write_reloc(cs_copy, (r)->cs_buf); \ +OUT_CS(0xc0001000); /* PKT3_NOP */ \ +OUT_CS(cs_winsys->cs_get_reloc(cs_copy, (r)->cs_buf) * 4); \ CS_USED_DW(2); \ } while (0) diff --git a/src/gallium/drivers/r300/r300_emit.c b/src/gallium/drivers/r300/r300_emit.c index d99b919..9f16413 100644 --- a/src/gallium/drivers/r300/r300_emit.c +++ b/src/gallium/drivers/r300/r300_emit.c @@ -1043,8 +1043,8 @@ void r300_emit_vertex_arrays_swtcl(struct r300_context *r300, boolean indexed) OUT_CS(0); assert(r300->vbo_cs); -cs_winsys->cs_write_reloc(cs_copy, r300->vbo_cs); -CS_USED_DW(2); +OUT_CS(0xc0001000); /* PKT3_NOP */ +OUT_CS(r300->rws->cs_get_reloc(r300->cs, r300->vbo_cs) * 4); END_CS; } diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index b55eb80..4ce1717 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -313,6 +313,14 @@ static unsigned radeon_drm_cs_add_reloc(struct radeon_winsys_cs *rcs, return index; } +static int radeon_drm_cs_get_reloc(struct radeon_winsys_cs *rcs, + struct radeon_winsys_cs_handle *buf) +{ +struct radeon_drm_cs *cs = radeon_drm_cs(rcs); + +return radeon_get_reloc(cs->csc, (struct radeon_bo*)buf, NULL); +} + static boolean radeon_drm_cs_validate(struct radeon_winsys_cs *rcs) { struct radeon_drm_cs *cs = radeon_drm_cs(rcs); @@ -359,22 +367,6 @@ static boolean radeon_drm_cs_memory_below_limit(struct radeon_winsys_cs *rcs, ui return status; } -static void radeon_drm_cs_write_reloc(struct radeon_winsys_cs *rcs, - struct radeon_winsys_cs_handle *buf) -{ -struct radeon_drm_cs *cs = radeon_drm_cs(rcs); -struct radeon_bo *bo = (struct radeon_bo*)buf; -unsigned index = radeon_get_reloc(cs->csc, bo, NULL); - -if (index == -1) { -fprintf(stderr, "radeon: Cannot get a relocation in %s.\n", __func__); -return; -} - -OUT_CS(&cs->base, 0xc0001000); -OUT_CS(&cs->base, index * RELOC_DWORDS); -} - void radeon_drm_cs_emit_ioctl_oneshot(struct radeon_drm_cs *cs, struct radeon_cs_context *csc) { unsigned i; @@ -650,9 +642,9 @@ void radeon_drm_cs_init_functions(struct radeon_drm_winsys *ws) ws->base.cs_create = radeon_drm_cs_create; ws->base.cs_destroy = radeon_drm_cs_destroy; ws->base.cs_add_reloc = radeon_drm_cs_add_reloc; +ws->base.cs_get_reloc = radeon_drm_cs_get_reloc; ws->base.cs_validate = radeon_drm_cs_validate; ws->base.cs_memory_below_limit = radeon_drm_cs_memory_below_limit; -ws->base.cs_write_reloc = radeon_drm_cs_write_reloc; ws->base.cs_flush = radeon_drm_cs_flush; ws->base.cs_set_flush_callback = radeon_drm_cs_set_flush; ws->base.cs_is_buffer_referenced = radeon_bo_is_referenced; diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h b/src/gallium/winsys/radeon/drm/radeon_winsys.h index 485e925..320989c 100644 --- a/src/gallium/winsys/radeon/drm/radeon_winsys.h +++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h @@ -450,6 +450,16 @@ struct radeon_winsys { enum radeon_bo_priority priority); /** + * Return the index of an already-added buffer. + * + * \param csCommand stream + * \param buf Buffer + * \return The buffer index, or -1 if the buffer has not been added. + */ +int (*cs_get_reloc)(struct radeon_winsys_cs *cs, +struct radeon_winsys_cs_handle *buf); + +/** * Return TRUE if there is enough memory in VRAM and GTT for the relocs * added so far. If the validation fails, all the relocations which have * been added since the last call of cs_validate will be removed and @@ -470,15 +480,6 @@ struct radeon_winsys { boolean (*cs_memory_
Re: [Mesa-dev] [PATCH 4/5] i965: Let dump_instructions write to a file.
On 04/13/2014 12:53 AM, Matt Turner wrote: > On Sun, Apr 13, 2014 at 12:12 AM, Kenneth Graunke > wrote: >> On 04/11/2014 10:29 PM, Matt Turner wrote: >>> --- >>> src/mesa/drivers/dri/i965/brw_fs.cpp | 129 >>> +- >>> src/mesa/drivers/dri/i965/brw_fs.h| 2 + >>> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 +- >>> src/mesa/drivers/dri/i965/brw_shader.cpp | 20 +++- >>> src/mesa/drivers/dri/i965/brw_shader.h| 2 + >>> src/mesa/drivers/dri/i965/brw_vec4.cpp| 104 + >>> src/mesa/drivers/dri/i965/brw_vec4.h | 1 + >>> 7 files changed, 154 insertions(+), 106 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> index 85a5463..c1d79e1 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>> @@ -2810,197 +2810,218 @@ fs_visitor::lower_uniform_pull_constant_loads() >>> void >>> fs_visitor::dump_instructions() >>> { >>> + dump_instructions(NULL); >>> +} >>> + >>> +void >>> +fs_visitor::dump_instructions(const char *name) >>> +{ >>> + invalidate_live_intervals(); >>> calculate_register_pressure(); >>> + FILE *file = stderr; >>> + if (name) { >>> + file = fopen(name, "w"); >>> + } >> >> You're playing with fire here. Opening files is dangerous, considering >> that the X server ofter runs as root and executes this code. One >> mistake in debugging code, and...root exploit. We may want to restrict >> this to non-root users by default, and maybe only in debug builds... >> >> At any rate, I'd like to see patches 3-5 respun as follows: >> >> 1. Make dump_instruction() and dump_instructions() take FILE * >> parameters, and do the s/stderr/file/g sed job you've done here. This >> is the bulk of the change, and easily justifiable - we can always use >> stdout vs. stderr or the like. >> >> 2. Introduce any code that opens files for writing. With this being >> separate, we can think through the security implications. >> >> 3. Your actual optimization debugging. I would probably squash patch 3 >> into this patch, since it doesn't really stand alone. *shrug* >> >> [snip] >> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h >>> b/src/mesa/drivers/dri/i965/brw_vec4.h >>> index 159a5bd..43c3f64 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h >>> @@ -597,6 +597,7 @@ public: >>> bool process_move_condition(ir_rvalue *ir); >>> >>> void dump_instruction(backend_instruction *inst); >>> + void dump_instruction(backend_instruction *inst, FILE *file); >> >> Usually, this is done via a default argument: >> >> void dump_instruction(backend_instruction *inst, FILE *file = stderr); >> >> That would probably be cleaner. > > I didn't do that on purpose, because gdb doesn't handle default > parameters when you call functions from it. Oh, that makes sense. Adding extra overloads seems fine then. --Ken signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/7] translate_sse: Rename translate_buffer_variant to translate_group.
From: Andreas Hartmetz It is a better description of what goes on and it's shorter, too. --- src/gallium/auxiliary/translate/translate_sse.c | 86 - 1 file changed, 43 insertions(+), 43 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index 24d8017..e25d450 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -61,7 +61,7 @@ struct translate_buffer * This describes a group of trivially compatible translations. * ptr is only used at runtime (not code generation time). */ -struct translate_buffer_variant +struct translate_group { unsigned buffer_index; unsigned instance_divisor; @@ -115,11 +115,11 @@ struct translate_sse unsigned nr_buffers; /* Multiple buffer variants can map to a single buffer. */ - struct translate_buffer_variant buffer_variant[TRANSLATE_MAX_ATTRIBS]; - unsigned nr_buffer_variants; + struct translate_group group[TRANSLATE_MAX_ATTRIBS]; + unsigned nr_groups; /* Multiple elements can map to a single buffer variant. */ - unsigned element_to_buffer_variant[TRANSLATE_MAX_ATTRIBS]; + unsigned element_to_group[TRANSLATE_MAX_ATTRIBS]; boolean use_instancing; unsigned instance_id; @@ -1085,17 +1085,17 @@ init_inputs(struct translate_sse *p, unsigned index_size) struct x86_reg start_instance = x86_make_disp(p->machine_EDI, get_offset(p, &p->start_instance)); - for (i = 0; i < p->nr_buffer_variants; i++) { - struct translate_buffer_variant *variant = &p->buffer_variant[i]; - struct translate_buffer *buffer = &p->buffer[variant->buffer_index]; + for (i = 0; i < p->nr_groups; i++) { + struct translate_group *group = &p->group[i]; + struct translate_buffer *buffer = &p->buffer[group->buffer_index]; - if (!index_size || variant->instance_divisor) { + if (!index_size || group->instance_divisor) { struct x86_reg buf_max_index = x86_make_disp(p->machine_EDI, get_offset(p, &buffer->max_index)); struct x86_reg buf_stride = x86_make_disp(p->machine_EDI, get_offset(p, &buffer->stride)); struct x86_reg buf_ptr = -x86_make_disp(p->machine_EDI, get_offset(p, &variant->ptr)); +x86_make_disp(p->machine_EDI, get_offset(p, &group->ptr)); struct x86_reg buf_base_ptr = x86_make_disp(p->machine_EDI, get_offset(p, &buffer->base_ptr)); struct x86_reg elt = p->idx_ESI; @@ -1104,13 +1104,13 @@ init_inputs(struct translate_sse *p, unsigned index_size) /* Calculate pointer to first attrib: * base_ptr + stride * index, where index depends on instance divisor */ - if (variant->instance_divisor) { + if (group->instance_divisor) { /* Start with instance = instance_id * which is true if divisor is 1. */ x86_mov(p->func, tmp_EAX, instance_id); -if (variant->instance_divisor != 1) { +if (group->instance_divisor != 1) { struct x86_reg tmp_EDX = p->tmp2_EDX; struct x86_reg tmp_ECX = p->src_ECX; @@ -1118,7 +1118,7 @@ init_inputs(struct translate_sse *p, unsigned index_size) * instance divisor is power of two. */ x86_xor(p->func, tmp_EDX, tmp_EDX); - x86_mov_reg_imm(p->func, tmp_ECX, variant->instance_divisor); + x86_mov_reg_imm(p->func, tmp_ECX, group->instance_divisor); x86_div(p->func, tmp_ECX); /* EAX = EDX:EAX / ECX */ /* instance = (instance_id - start_instance) / divisor + @@ -1153,7 +1153,7 @@ init_inputs(struct translate_sse *p, unsigned index_size) /* In the linear case, keep the buffer pointer instead of the * index number. */ - if (!index_size && p->nr_buffer_variants == 1) { + if (!index_size && p->nr_groups == 1) { x64_rexw(p->func); x86_mov(p->func, elt, tmp_EAX); } @@ -1175,14 +1175,14 @@ get_buffer_ptr(struct translate_sse *p, if (var_idx == ELEMENT_BUFFER_INSTANCE_ID) { return x86_make_disp(p->machine_EDI, get_offset(p, &p->instance_id)); } - if (!index_size && p->nr_buffer_variants == 1) { + if (!index_size && p->nr_groups == 1) { return p->idx_ESI; } - else if (!index_size || p->buffer_variant[var_idx].instance_divisor) { + else if (!index_size || p->group[var_idx].instance_divisor) { struct x86_reg ptr = p->src_ECX; struct x86_reg buf_ptr = x86_make_disp(p->machine_EDI, - get_offset(p, &p->buffer_variant[var_idx].ptr)); + get_offset(p, &p->group[var_idx].ptr)); x64_rexw(p->func); x86_mov(p->func, ptr, buf_ptr); @@ -1190,17 +1190,17 @@ get_buffer_ptr(struct
[Mesa-dev] [PATCH 5/7] translate_sse: Preserve low bit during unsigned -> float conversion.
From: Andreas Hartmetz --- src/gallium/auxiliary/translate/translate_sse.c | 35 - 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index dace722..03d8276 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -71,7 +71,7 @@ struct translate_group #define ELEMENT_BUFFER_INSTANCE_ID 1001 -#define NUM_CONSTS 7 +#define NUM_CONSTS 8 enum { @@ -81,6 +81,7 @@ enum CONST_INV_32767, CONST_INV_65535, CONST_INV_2147483647, + CONST_INV_4294967295, CONST_255 }; @@ -92,6 +93,7 @@ static float consts[NUM_CONSTS][4] = { C(1.0 / 32767.0), C(1.0 / 65535.0), C(1.0 / 2147483647.0), + C(1.0 / 4294967295.0), C(255.0) }; @@ -515,6 +517,7 @@ translate_attr_convert(struct translate_sse *p, || a->output_format == PIPE_FORMAT_R32G32B32_FLOAT || a->output_format == PIPE_FORMAT_R32G32B32A32_FLOAT)) { struct x86_reg dataXMM = x86_make_reg(file_XMM, 0); + struct x86_reg dataXMM2 = x86_make_reg(file_XMM, 1); for (i = 0; i < output_desc->nr_channels; ++i) { if (swizzle[i] == UTIL_FORMAT_SWIZZLE_0 @@ -546,17 +549,38 @@ translate_attr_convert(struct translate_sse *p, */ sse2_punpcklbw(p->func, dataXMM, get_const(p, CONST_IDENTITY)); sse2_punpcklbw(p->func, dataXMM, get_const(p, CONST_IDENTITY)); + sse2_cvtdq2ps(p->func, dataXMM, dataXMM); break; case 16: sse2_punpcklwd(p->func, dataXMM, get_const(p, CONST_IDENTITY)); + sse2_cvtdq2ps(p->func, dataXMM, dataXMM); break; -case 32: /* we lose precision here */ +case 32: /* we lose precision if value > 2^23 - 1 */ + /* ...but try to keep all bits for value <= 2^23 - 1 */ + /* this could be done much smarter for 1 or 2 dwords of input +* on X64, using sth like "cvtsi2ss xmm0, rax" (note *R*ax) */ + + /* save the low bit */ + sse_movss(p->func, dataXMM2, dataXMM); + /* right shift & convert, losing the low bit - must clear +* high bit because there is no unsigned convert instruction */ sse2_psrld_imm(p->func, dataXMM, 1); + sse2_cvtdq2ps(p->func, dataXMM, dataXMM); + + /* convert low bit to float */ + sse2_pslld_imm(p->func, dataXMM2, 31); + sse2_psrld_imm(p->func, dataXMM2, 31); + sse2_cvtdq2ps(p->func, dataXMM2, dataXMM2); + + /* mostly undo the right-shift by 1 */ + sse_addps(p->func, dataXMM, dataXMM); + /* add back the lost low bit */ + sse_addps(p->func, dataXMM, dataXMM2); break; default: return FALSE; } -sse2_cvtdq2ps(p->func, dataXMM, dataXMM); + if (input_desc->channel[0].normalized) { struct x86_reg factor; switch (input_desc->channel[0].size) { @@ -567,7 +591,7 @@ translate_attr_convert(struct translate_sse *p, factor = get_const(p, CONST_INV_65535); break; case 32: - factor = get_const(p, CONST_INV_2147483647); + factor = get_const(p, CONST_INV_4294967295); break; default: assert(0); @@ -579,9 +603,6 @@ translate_attr_convert(struct translate_sse *p, } sse_mulps(p->func, dataXMM, factor); } -else if (input_desc->channel[0].size == 32) - /* compensate for the bit we threw away to fit u32 into s32 */ - sse_addps(p->func, dataXMM, dataXMM); break; case UTIL_FORMAT_TYPE_SIGNED: if (!(x86_target_caps(p->func) & X86_SSE2)) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/7] translate_sse fixes
Hello, I noticed and incorrect uint->float conversion of vertex shader input in an application I'm working on, so I set out to fix it. In the meantime the application changed to avoid that conversion because it seems broken in too many drivers. So anyway, this series contains two actual bug fixes in patches 4 and 5. The others are non-functional changes, mostly for readability. Andreas Hartmetz (7): translate_sse: Explain what struct translate_buffer_variant is for. translate_sse: Rename translate_buffer_variant to translate_group. translate_sse: Make those comments less abstract. translate_sse: Use the correct buffer index in this fast path. translate_sse: Preserve low bit during unsigned -> float conversion. translate_sse: Be more specific in comment about loss of precision. translate_sse: Remove a stray break; src/gallium/auxiliary/translate/translate_sse.c | 140 ++-- 1 file changed, 84 insertions(+), 56 deletions(-) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] translate_sse: Remove a stray break;
From: Andreas Hartmetz --- src/gallium/auxiliary/translate/translate_sse.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index ca02b7a..992c5f6 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -651,8 +651,6 @@ translate_attr_convert(struct translate_sse *p, sse_mulps(p->func, dataXMM, factor); } break; - -break; case UTIL_FORMAT_TYPE_FLOAT: if (input_desc->channel[0].size != 32 && input_desc->channel[0].size != 64) { -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] translate_sse: Be more specific in comment about loss of precision.
From: Andreas Hartmetz The only loss of precision here due to intrinsic properties of unsigned int and float. --- src/gallium/auxiliary/translate/translate_sse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index 03d8276..ca02b7a 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -622,7 +622,7 @@ translate_attr_convert(struct translate_sse *p, sse2_punpcklwd(p->func, dataXMM, dataXMM); sse2_psrad_imm(p->func, dataXMM, 16); break; -case 32: /* we lose precision here */ +case 32: /* we lose precision if abs(value) > 2^23 - 1 */ break; default: return FALSE; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] translate_sse: Use the correct buffer index in this fast path.
From: Andreas Hartmetz It is possible that there are multiple input buffers but only one is relevant for translation. Then there will be only a single translation group, which might need to source data from a buffer index != 0. Fixes wrong vertex shader inputs as observed while debugging with an application and driver combination that requires translation of a vertex attribute in a non-trivial set of attributes and input buffers. --- src/gallium/auxiliary/translate/translate_sse.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index 8dd30c4..dace722 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -1235,8 +1235,10 @@ static boolean incr_inputs(struct translate_sse *p, unsigned index_size) { if (!index_size && p->nr_groups == 1) { + const unsigned buffer_index = p->group[0].buffer_index; struct x86_reg stride = - x86_make_disp(p->machine_EDI, get_offset(p, &p->buffer[0].stride)); + x86_make_disp(p->machine_EDI, + get_offset(p, &p->buffer[buffer_index].stride)); if (p->group[0].instance_divisor == 0) { x64_rexw(p->func); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] translate_sse: Explain what struct translate_buffer_variant is for.
From: Andreas Hartmetz --- src/gallium/auxiliary/translate/translate_sse.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index 1b698cd..24d8017 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -54,6 +54,13 @@ struct translate_buffer unsigned max_index; }; +/* We can only trivially chain together compatible translations. Additional + * state changes (e.g. reloading input pointers) are required between + * incompatible translations. So we group compatible translations together + * to minimize those state changes. + * This describes a group of trivially compatible translations. + * ptr is only used at runtime (not code generation time). + */ struct translate_buffer_variant { unsigned buffer_index; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] translate_sse: Make those comments less abstract.
From: Andreas Hartmetz --- src/gallium/auxiliary/translate/translate_sse.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index e25d450..8dd30c4 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -114,11 +114,11 @@ struct translate_sse struct translate_buffer buffer[TRANSLATE_MAX_ATTRIBS]; unsigned nr_buffers; - /* Multiple buffer variants can map to a single buffer. */ + /* A buffer can require multiple groups to translate all elements */ struct translate_group group[TRANSLATE_MAX_ATTRIBS]; unsigned nr_groups; - /* Multiple elements can map to a single buffer variant. */ + /* A group can translate multiple elements */ unsigned element_to_group[TRANSLATE_MAX_ATTRIBS]; boolean use_instancing; @@ -1515,7 +1515,7 @@ translate_sse2_create(const struct translate_key *key) } /* - * Map vertex element to vertex buffer group. + * Find or create a compatible group for the element */ for (j = 0; j < p->nr_groups; j++) { if (p->group[j].buffer_index == -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: Disable Z16 in all APIs.
We originally thought that GL 3.0 required GL_DEPTH_COMPONENT16 to map exactly to Z16. However, we misread the specification, thanks in part to LaTeX reordering the tables in the PDF. Page 180 of the GL 3.0 specification (glspec30.20080923.pdf) says: "[...] memory allocation per texture component is assigned by the GL to match the allocations listed in tables 3.16-3.18 as closely as possible. [...] Required Texture Formats [...] In addition, implementations are required to support the following sized internal formats. Requesting one of these internal formats for any texture type will allocate exactly the internal component sizes and types shown for that format in tables 3.16-3.17:" Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16 or table 3.17. It appears in table 3.18, where the "exact" rule doesn't apply, and it falls back to the "closely as possible" rule. The confusing part is that the ordering of the tables in the PDF is: Table 3.16 (pages 182-184) Table 3.18 (bottom of page 184 to top of 185) Table 3.17 (page 185) Presumably, people saw table 3.16, then saw the table immediately following with DEPTH_COMPONENT* formats, and assumed it was 3.17. Based on a batch by Chia-I Wu, but without the driconf option to force Z16 to be used. It's not required, and there's apparently no benefit to actually using it. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_surface_formats.c | 6 -- 1 file changed, 6 deletions(-) Sorry if this is a duplicate of an earlier patch...the only one I could find in my inbox was the one with the driconf option. diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index 196f139..5907dd9 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -608,7 +608,6 @@ brw_init_surface_formats(struct brw_context *brw) brw->format_supported_as_render_target[MESA_FORMAT_Z24_UNORM_S8_UINT] = true; brw->format_supported_as_render_target[MESA_FORMAT_Z24_UNORM_X8_UINT] = true; brw->format_supported_as_render_target[MESA_FORMAT_S_UINT8] = true; - brw->format_supported_as_render_target[MESA_FORMAT_Z_UNORM16] = true; brw->format_supported_as_render_target[MESA_FORMAT_Z_FLOAT32] = true; brw->format_supported_as_render_target[MESA_FORMAT_Z32_FLOAT_S8X24_UINT] = true; @@ -630,12 +629,7 @@ brw_init_surface_formats(struct brw_context *brw) * * Other speculation is that we may be hitting increased fragment shader * execution from GL_LEQUAL/GL_EQUAL depth tests at reduced precision. -* -* However, desktop GL 3.0+ require that you get exactly 16 bits when -* asking for DEPTH_COMPONENT16, so we have to respect that. */ - if (_mesa_is_desktop_gl(ctx)) - ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true; /* On hardware that lacks support for ETC1, we map ETC1 to RGBX * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1. -- 1.9.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Update comments about Z16 being slow.
We've learned a few things since we originally disabled Z16; this attempts to summarize the issue. I am no expert on this subject, though, so the comment may not be totally accurate. I did some benchmarking on GM45 and Ironlake, and discovered that for GLBenchmark 2.7 EgyptHD, using Z16 was 3% slower on GM45 (n=15), and 4.5% slower on Ironlake (n=95). So, we can drop the "on Ivybridge" aspect of the comment - it's always slower. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_surface_formats.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index cef4020..196f139 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -620,13 +620,16 @@ brw_init_surface_formats(struct brw_context *brw) ctx->TextureFormatSupported[MESA_FORMAT_Z_FLOAT32] = true; ctx->TextureFormatSupported[MESA_FORMAT_Z32_FLOAT_S8X24_UINT] = true; - /* It appears that Z16 is slower than Z24 (on Intel Ivybridge and newer -* hardware at least), so there's no real reason to prefer it unless you're -* under memory (not memory bandwidth) pressure. Our speculation is that -* this is due to either increased fragment shader execution from -* GL_LEQUAL/GL_EQUAL depth tests at the reduced precision, or due to -* increased depth stalls from a cacheline-based heuristic for detecting -* depth stalls. + /* Benchmarking shows that Z16 is slower than Z24, so there's no reason to +* use it unless you're under memory (not memory bandwidth) pressure. +* +* Apparently, the GPU's depth scoreboarding works on a 32-bit granularity, +* which corresponds to one pixel in the depth buffer for Z24 or Z32 formats. +* However, it corresponds to two pixels with Z16, which means both need to +* hit the early depth case in order for it to happen. +* +* Other speculation is that we may be hitting increased fragment shader +* execution from GL_LEQUAL/GL_EQUAL depth tests at reduced precision. * * However, desktop GL 3.0+ require that you get exactly 16 bits when * asking for DEPTH_COMPONENT16, so we have to respect that. -- 1.9.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Disable Z16 in all APIs.
Both are Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev