Re: [Mesa-dev] [PATCH 3/5] i965: Add envvar to debug the optimization passes.

2014-04-13 Thread Kenneth Graunke
On 04/11/2014 10:29 PM, Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/intel_debug.c | 1 +
>  src/mesa/drivers/dri/i965/intel_debug.h | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_debug.c 
> b/src/mesa/drivers/dri/i965/intel_debug.c
> index dc74202..44a38b4 100644
> --- a/src/mesa/drivers/dri/i965/intel_debug.c
> +++ b/src/mesa/drivers/dri/i965/intel_debug.c
> @@ -65,6 +65,7 @@ static const struct dri_debug_control debug_control[] = {
> { "no16",  DEBUG_NO16 },
> { "blorp", DEBUG_BLORP },
> { "nodualobj", DEBUG_NO_DUAL_OBJECT_GS },
> +   { "optimizer", DEBUG_OPTIMIZER },

Seems handy.  How about INTEL_DEBUG=opt or opts?  optimizer is a bit
wordy compared to the other options.  But, whatever you like is fine.

> { NULL,0 }
>  };
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_debug.h 
> b/src/mesa/drivers/dri/i965/intel_debug.h
> index d4f1666..52a47cf 100644
> --- a/src/mesa/drivers/dri/i965/intel_debug.h
> +++ b/src/mesa/drivers/dri/i965/intel_debug.h
> @@ -61,6 +61,7 @@ extern uint64_t INTEL_DEBUG;
>  #define DEBUG_NO160x2000
>  #define DEBUG_VUE 0x4000
>  #define DEBUG_NO_DUAL_OBJECT_GS 0x8000
> +#define DEBUG_OPTIMIZER   0x1
>  
>  #ifdef HAVE_ANDROID_PLATFORM
>  #define LOG_TAG "INTEL-MESA"
> 




signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] i965: Let dump_instructions write to a file.

2014-04-13 Thread Kenneth Graunke
On 04/11/2014 10:29 PM, Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp  | 129 
> +-
>  src/mesa/drivers/dri/i965/brw_fs.h|   2 +
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |   2 +-
>  src/mesa/drivers/dri/i965/brw_shader.cpp  |  20 +++-
>  src/mesa/drivers/dri/i965/brw_shader.h|   2 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp| 104 +
>  src/mesa/drivers/dri/i965/brw_vec4.h  |   1 +
>  7 files changed, 154 insertions(+), 106 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 85a5463..c1d79e1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2810,197 +2810,218 @@ fs_visitor::lower_uniform_pull_constant_loads()
>  void
>  fs_visitor::dump_instructions()
>  {
> +   dump_instructions(NULL);
> +}
> +
> +void
> +fs_visitor::dump_instructions(const char *name)
> +{
> +   invalidate_live_intervals();
> calculate_register_pressure();
> +   FILE *file = stderr;
> +   if (name) {
> +  file = fopen(name, "w");
> +   }

You're playing with fire here.  Opening files is dangerous, considering
that the X server ofter runs as root and executes this code.  One
mistake in debugging code, and...root exploit.  We may want to restrict
this to non-root users by default, and maybe only in debug builds...

At any rate, I'd like to see patches 3-5 respun as follows:

1. Make dump_instruction() and dump_instructions() take FILE *
parameters, and do the s/stderr/file/g sed job you've done here.  This
is the bulk of the change, and easily justifiable - we can always use
stdout vs. stderr or the like.

2. Introduce any code that opens files for writing.  With this being
separate, we can think through the security implications.

3. Your actual optimization debugging.  I would probably squash patch 3
into this patch, since it doesn't really stand alone.  *shrug*

[snip]

> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 159a5bd..43c3f64 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -597,6 +597,7 @@ public:
> bool process_move_condition(ir_rvalue *ir);
>  
> void dump_instruction(backend_instruction *inst);
> +   void dump_instruction(backend_instruction *inst, FILE *file);

Usually, this is done via a default argument:

void dump_instruction(backend_instruction *inst, FILE *file = stderr);

That would probably be cleaner.

>  
> void visit_atomic_counter_intrinsic(ir_call *ir);



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965/fs: Debug the optimization passes by dumping instr to file.

2014-04-13 Thread Kenneth Graunke
On 04/11/2014 10:29 PM, Matt Turner wrote:
> With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a
> file each time an optimization pass makes progress. This lets you easily
> diff successive files to see what an optimization pass did.
> 
> Example filenames written when running glxgears:
>fs8-00-00-start
>fs8-00-01-04-opt_copy_propagate
>fs8-00-01-06-dead_code_eliminate
>fs8-00-01-12-compute_to_mrf
>fs8-00-02-06-dead_code_eliminate
>|   |  |   |
>|   |  |   `-- optimization pass name
>|   |  |
>|   |  `-- optimization pass number in the loop
>|   |
>|   `-- optimization loop interation
>|
>`-- shader program number
> 
> Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs,
> so that we can diff instruction lists across loop interations without
> the register numbers being changes.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 56 
> +++-
>  1 file changed, 43 insertions(+), 13 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index c1d79e1..666cfa5b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1704,6 +1704,9 @@ fs_visitor::split_virtual_grfs()
>  void
>  fs_visitor::compact_virtual_grfs()
>  {
> +   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER))
> +  return;
> +
> /* Mark which virtual GRFs are used, and count how many. */
> int remap_table[this->virtual_grf_count];
> memset(remap_table, -1, sizeof(remap_table));
> @@ -3258,25 +3261,52 @@ fs_visitor::run()
>  
>opt_drop_redundant_mov_to_flags();
>  
> +#define OPT(pass, args...) \
> +   ({  \
> +  pass_num++;  \
> +  bool progress = pass(args);  \
> +   \
> +  if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && progress) {   \
> + char filename[64];\
> + snprintf(filename, 64, "fs%d-%02d-%02d-%02d-" #pass,  \
> +  dispatch_width, shader_prog->Name, iteration, pass_num); \
> +   \
> + backend_visitor::dump_instructions(filename);   
>\
> +  }\
> +   \
> +  progress;\
> +   })

I like the use of the macro.  But...folding in the "progress = ... ||
progress" might be nice too.  It would also let us stick to standard C
here, rather than GNU extensions.

> +
> +  if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
> + char filename[64];
> + snprintf(filename, 64, "fs%d-%02d-00-start",
> +  dispatch_width, shader_prog->Name);

I am really unexcited about fixed length buffers, although you did do
snprintf, and it'll probably not be a problem in practice.  Why not just
asprintf or ralloc_asprintf it and then free it?  It's no more code and
will just work.

> +
> + backend_visitor::dump_instructions(filename);
> +  }
> +
>bool progress;
> +  int iteration = 0;
>do {
>progress = false;
> + iteration++;
> + int pass_num = 0;
>  
>   compact_virtual_grfs();
>  
> -  progress = remove_duplicate_mrf_writes() || progress;
> -
> -  progress = opt_algebraic() || progress;
> -  progress = opt_cse() || progress;
> -  progress = opt_copy_propagate() || progress;
> - progress = opt_peephole_predicated_break() || progress;
> -  progress = dead_code_eliminate() || progress;
> -  progress = dead_code_eliminate_local() || progress;
> - progress = opt_peephole_sel() || progress;
> - progress = dead_control_flow_eliminate(this) || progress;
> - progress = opt_saturate_propagation() || progress;
> - progress = register_coalesce() || progress;
> -  progress = compute_to_mrf() || progress;
> + progress = OPT(remove_duplicate_mrf_writes) || progress;
> +
> + progress = OPT(opt_algebraic) || progress;
> + progress = OPT(opt_cse) || progress;
> + progress = OPT(opt_copy_propagate) || progress;
> + progress = OPT(opt_peephole_predicated_break) || progress;
> + progress = OPT(dead_code_eliminate) || progress;
> + progress = OPT(dead_code_eliminate_local) || progress;
> + progress = OPT(opt_peephole_sel) || progress;
> + progress = OPT(dead_control_flow_eliminate, this) 

Re: [Mesa-dev] [PATCH 1/5] dri: Expand driParseDebugString return value to uint64_t.

2014-04-13 Thread Kenneth Graunke
On 04/11/2014 10:29 PM, Matt Turner wrote:
> Users will downcast if they don't have >32 debug flags.
> ---
>  src/mesa/drivers/dri/common/utils.c | 7 +++
>  src/mesa/drivers/dri/common/utils.h | 4 ++--
>  2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/common/utils.c 
> b/src/mesa/drivers/dri/common/utils.c
> index 1f29e0b..eee77ec 100644
> --- a/src/mesa/drivers/dri/common/utils.c
> +++ b/src/mesa/drivers/dri/common/utils.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "main/macros.h"
>  #include "main/mtypes.h"
>  #include "main/cpuinfo.h"
> @@ -40,14 +41,12 @@
>  #include "dri_util.h"
>  
>  
> -unsigned
> +uint64_t
>  driParseDebugString( const char * debug, 
>const struct dri_debug_control * control  )
>  {
> -   unsigned   flag;
> -
> +   uint64_t flag = 0;
>  
> -   flag = 0;
> if ( debug != NULL ) {
>while( control->string != NULL ) {
>if ( !strcmp( debug, "all" ) ||
> diff --git a/src/mesa/drivers/dri/common/utils.h 
> b/src/mesa/drivers/dri/common/utils.h
> index 0941434..3760c38 100644
> --- a/src/mesa/drivers/dri/common/utils.h
> +++ b/src/mesa/drivers/dri/common/utils.h
> @@ -34,10 +34,10 @@
>  
>  struct dri_debug_control {
>  const char * string;
> -unsigned flag;
> +uint64_t flag;
>  };
>  
> -extern unsigned driParseDebugString( const char * debug,
> +extern uint64_t driParseDebugString( const char * debug,
>  const struct dri_debug_control * control );
>  
>  extern unsigned driGetRendererString( char * buffer,
> 

Patches 1-2 are:
Reviewed-by: Kenneth Graunke 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] i965: Let dump_instructions write to a file.

2014-04-13 Thread Matt Turner
On Sun, Apr 13, 2014 at 12:12 AM, Kenneth Graunke  wrote:
> On 04/11/2014 10:29 PM, Matt Turner wrote:
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp  | 129 
>> +-
>>  src/mesa/drivers/dri/i965/brw_fs.h|   2 +
>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |   2 +-
>>  src/mesa/drivers/dri/i965/brw_shader.cpp  |  20 +++-
>>  src/mesa/drivers/dri/i965/brw_shader.h|   2 +
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp| 104 +
>>  src/mesa/drivers/dri/i965/brw_vec4.h  |   1 +
>>  7 files changed, 154 insertions(+), 106 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 85a5463..c1d79e1 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -2810,197 +2810,218 @@ fs_visitor::lower_uniform_pull_constant_loads()
>>  void
>>  fs_visitor::dump_instructions()
>>  {
>> +   dump_instructions(NULL);
>> +}
>> +
>> +void
>> +fs_visitor::dump_instructions(const char *name)
>> +{
>> +   invalidate_live_intervals();
>> calculate_register_pressure();
>> +   FILE *file = stderr;
>> +   if (name) {
>> +  file = fopen(name, "w");
>> +   }
>
> You're playing with fire here.  Opening files is dangerous, considering
> that the X server ofter runs as root and executes this code.  One
> mistake in debugging code, and...root exploit.  We may want to restrict
> this to non-root users by default, and maybe only in debug builds...
>
> At any rate, I'd like to see patches 3-5 respun as follows:
>
> 1. Make dump_instruction() and dump_instructions() take FILE *
> parameters, and do the s/stderr/file/g sed job you've done here.  This
> is the bulk of the change, and easily justifiable - we can always use
> stdout vs. stderr or the like.
>
> 2. Introduce any code that opens files for writing.  With this being
> separate, we can think through the security implications.
>
> 3. Your actual optimization debugging.  I would probably squash patch 3
> into this patch, since it doesn't really stand alone.  *shrug*
>
> [snip]
>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
>> b/src/mesa/drivers/dri/i965/brw_vec4.h
>> index 159a5bd..43c3f64 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>> @@ -597,6 +597,7 @@ public:
>> bool process_move_condition(ir_rvalue *ir);
>>
>> void dump_instruction(backend_instruction *inst);
>> +   void dump_instruction(backend_instruction *inst, FILE *file);
>
> Usually, this is done via a default argument:
>
> void dump_instruction(backend_instruction *inst, FILE *file = stderr);
>
> That would probably be cleaner.

I didn't do that on purpose, because gdb doesn't handle default
parameters when you call functions from it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965/fs: Debug the optimization passes by dumping instr to file.

2014-04-13 Thread Matt Turner
On Sun, Apr 13, 2014 at 12:16 AM, Kenneth Graunke  wrote:
> On 04/11/2014 10:29 PM, Matt Turner wrote:
>> With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a
>> file each time an optimization pass makes progress. This lets you easily
>> diff successive files to see what an optimization pass did.
>>
>> Example filenames written when running glxgears:
>>fs8-00-00-start
>>fs8-00-01-04-opt_copy_propagate
>>fs8-00-01-06-dead_code_eliminate
>>fs8-00-01-12-compute_to_mrf
>>fs8-00-02-06-dead_code_eliminate
>>|   |  |   |
>>|   |  |   `-- optimization pass name
>>|   |  |
>>|   |  `-- optimization pass number in the loop
>>|   |
>>|   `-- optimization loop interation
>>|
>>`-- shader program number
>>
>> Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs,
>> so that we can diff instruction lists across loop interations without
>> the register numbers being changes.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 56 
>> +++-
>>  1 file changed, 43 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index c1d79e1..666cfa5b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -1704,6 +1704,9 @@ fs_visitor::split_virtual_grfs()
>>  void
>>  fs_visitor::compact_virtual_grfs()
>>  {
>> +   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER))
>> +  return;
>> +
>> /* Mark which virtual GRFs are used, and count how many. */
>> int remap_table[this->virtual_grf_count];
>> memset(remap_table, -1, sizeof(remap_table));
>> @@ -3258,25 +3261,52 @@ fs_visitor::run()
>>
>>opt_drop_redundant_mov_to_flags();
>>
>> +#define OPT(pass, args...) \
>> +   ({  \
>> +  pass_num++;  \
>> +  bool progress = pass(args);  \
>> +   \
>> +  if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && progress) {   \
>> + char filename[64];\
>> + snprintf(filename, 64, "fs%d-%02d-%02d-%02d-" #pass,  \
>> +  dispatch_width, shader_prog->Name, iteration, pass_num); \
>> +   \
>> + backend_visitor::dump_instructions(filename);  
>> \
>> +  }\
>> +   \
>> +  progress;\
>> +   })
>
> I like the use of the macro.  But...folding in the "progress = ... ||
> progress" might be nice too.  It would also let us stick to standard C
> here, rather than GNU extensions.

I don't see a problem with using extensions, especially when we
already use them. But decluttering the loop seems like a good idea.

>> +
>> +  if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
>> + char filename[64];
>> + snprintf(filename, 64, "fs%d-%02d-00-start",
>> +  dispatch_width, shader_prog->Name);
>
> I am really unexcited about fixed length buffers, although you did do
> snprintf, and it'll probably not be a problem in practice.  Why not just
> asprintf or ralloc_asprintf it and then free it?  It's no more code and
> will just work.

Given the lengths of our optimization pass names, 64 seems like
plenty, and if it's not that worst that happens is we lose a few
characters from our optimization pass's name. My compiler complains
about not checking the return type of asprintf, and adding a memory
allocation failure check in the macro felt really stupid.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/10] winsys/radeon: remove cs_write_reloc, add simpler cs_get_reloc

2014-04-13 Thread Christian König

For this series: Reviewed-by: Christian König 

BTW: Can we get rid of RADEON_FLUSH_ASYNC and always flush 
asynchronously? Just calling cs_sync_flush directly after the flush 
should have the same effect as synchronous flushing.


Christian.

Am 12.04.2014 18:34, schrieb Marek Olšák:

From: Marek Olšák 

The only difference is that it doesn't write to the CS and only returns
the index.
---
  src/gallium/drivers/r300/r300_cs.h|  3 ++-
  src/gallium/drivers/r300/r300_emit.c  |  4 ++--
  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 26 +-
  src/gallium/winsys/radeon/drm/radeon_winsys.h | 19 ++-
  4 files changed, 23 insertions(+), 29 deletions(-)

diff --git a/src/gallium/drivers/r300/r300_cs.h 
b/src/gallium/drivers/r300/r300_cs.h
index 744e19e..748d6ea 100644
--- a/src/gallium/drivers/r300/r300_cs.h
+++ b/src/gallium/drivers/r300/r300_cs.h
@@ -109,7 +109,8 @@
  #define OUT_CS_RELOC(r) do { \
  assert((r)); \
  assert((r)->cs_buf); \
-cs_winsys->cs_write_reloc(cs_copy, (r)->cs_buf); \
+OUT_CS(0xc0001000); /* PKT3_NOP */ \
+OUT_CS(cs_winsys->cs_get_reloc(cs_copy, (r)->cs_buf) * 4); \
  CS_USED_DW(2); \
  } while (0)
  
diff --git a/src/gallium/drivers/r300/r300_emit.c b/src/gallium/drivers/r300/r300_emit.c

index d99b919..9f16413 100644
--- a/src/gallium/drivers/r300/r300_emit.c
+++ b/src/gallium/drivers/r300/r300_emit.c
@@ -1043,8 +1043,8 @@ void r300_emit_vertex_arrays_swtcl(struct r300_context 
*r300, boolean indexed)
  OUT_CS(0);
  
  assert(r300->vbo_cs);

-cs_winsys->cs_write_reloc(cs_copy, r300->vbo_cs);
-CS_USED_DW(2);
+OUT_CS(0xc0001000); /* PKT3_NOP */
+OUT_CS(r300->rws->cs_get_reloc(r300->cs, r300->vbo_cs) * 4);
  END_CS;
  }
  
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c

index b55eb80..4ce1717 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -313,6 +313,14 @@ static unsigned radeon_drm_cs_add_reloc(struct 
radeon_winsys_cs *rcs,
  return index;
  }
  
+static int radeon_drm_cs_get_reloc(struct radeon_winsys_cs *rcs,

+   struct radeon_winsys_cs_handle *buf)
+{
+struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
+
+return radeon_get_reloc(cs->csc, (struct radeon_bo*)buf, NULL);
+}
+
  static boolean radeon_drm_cs_validate(struct radeon_winsys_cs *rcs)
  {
  struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
@@ -359,22 +367,6 @@ static boolean radeon_drm_cs_memory_below_limit(struct 
radeon_winsys_cs *rcs, ui
  return status;
  }
  
-static void radeon_drm_cs_write_reloc(struct radeon_winsys_cs *rcs,

-  struct radeon_winsys_cs_handle *buf)
-{
-struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
-struct radeon_bo *bo = (struct radeon_bo*)buf;
-unsigned index = radeon_get_reloc(cs->csc, bo, NULL);
-
-if (index == -1) {
-fprintf(stderr, "radeon: Cannot get a relocation in %s.\n", __func__);
-return;
-}
-
-OUT_CS(&cs->base, 0xc0001000);
-OUT_CS(&cs->base, index * RELOC_DWORDS);
-}
-
  void radeon_drm_cs_emit_ioctl_oneshot(struct radeon_drm_cs *cs, struct 
radeon_cs_context *csc)
  {
  unsigned i;
@@ -650,9 +642,9 @@ void radeon_drm_cs_init_functions(struct radeon_drm_winsys 
*ws)
  ws->base.cs_create = radeon_drm_cs_create;
  ws->base.cs_destroy = radeon_drm_cs_destroy;
  ws->base.cs_add_reloc = radeon_drm_cs_add_reloc;
+ws->base.cs_get_reloc = radeon_drm_cs_get_reloc;
  ws->base.cs_validate = radeon_drm_cs_validate;
  ws->base.cs_memory_below_limit = radeon_drm_cs_memory_below_limit;
-ws->base.cs_write_reloc = radeon_drm_cs_write_reloc;
  ws->base.cs_flush = radeon_drm_cs_flush;
  ws->base.cs_set_flush_callback = radeon_drm_cs_set_flush;
  ws->base.cs_is_buffer_referenced = radeon_bo_is_referenced;
diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h 
b/src/gallium/winsys/radeon/drm/radeon_winsys.h
index 485e925..320989c 100644
--- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
+++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
@@ -450,6 +450,16 @@ struct radeon_winsys {
   enum radeon_bo_priority priority);
  
  /**

+ * Return the index of an already-added buffer.
+ *
+ * \param csCommand stream
+ * \param buf   Buffer
+ * \return  The buffer index, or -1 if the buffer has not been 
added.
+ */
+int (*cs_get_reloc)(struct radeon_winsys_cs *cs,
+struct radeon_winsys_cs_handle *buf);
+
+/**
   * Return TRUE if there is enough memory in VRAM and GTT for the relocs
   * added so far. If the validation fails, all the relocations which have
   * been added since the last call of cs_validate will be removed and
@@ -470,15 +480,6 @@ struct radeon_winsys {
  boolean (*cs_memory_

Re: [Mesa-dev] [PATCH 4/5] i965: Let dump_instructions write to a file.

2014-04-13 Thread Kenneth Graunke
On 04/13/2014 12:53 AM, Matt Turner wrote:
> On Sun, Apr 13, 2014 at 12:12 AM, Kenneth Graunke  
> wrote:
>> On 04/11/2014 10:29 PM, Matt Turner wrote:
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp  | 129 
>>> +-
>>>  src/mesa/drivers/dri/i965/brw_fs.h|   2 +
>>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |   2 +-
>>>  src/mesa/drivers/dri/i965/brw_shader.cpp  |  20 +++-
>>>  src/mesa/drivers/dri/i965/brw_shader.h|   2 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp| 104 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.h  |   1 +
>>>  7 files changed, 154 insertions(+), 106 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 85a5463..c1d79e1 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -2810,197 +2810,218 @@ fs_visitor::lower_uniform_pull_constant_loads()
>>>  void
>>>  fs_visitor::dump_instructions()
>>>  {
>>> +   dump_instructions(NULL);
>>> +}
>>> +
>>> +void
>>> +fs_visitor::dump_instructions(const char *name)
>>> +{
>>> +   invalidate_live_intervals();
>>> calculate_register_pressure();
>>> +   FILE *file = stderr;
>>> +   if (name) {
>>> +  file = fopen(name, "w");
>>> +   }
>>
>> You're playing with fire here.  Opening files is dangerous, considering
>> that the X server ofter runs as root and executes this code.  One
>> mistake in debugging code, and...root exploit.  We may want to restrict
>> this to non-root users by default, and maybe only in debug builds...
>>
>> At any rate, I'd like to see patches 3-5 respun as follows:
>>
>> 1. Make dump_instruction() and dump_instructions() take FILE *
>> parameters, and do the s/stderr/file/g sed job you've done here.  This
>> is the bulk of the change, and easily justifiable - we can always use
>> stdout vs. stderr or the like.
>>
>> 2. Introduce any code that opens files for writing.  With this being
>> separate, we can think through the security implications.
>>
>> 3. Your actual optimization debugging.  I would probably squash patch 3
>> into this patch, since it doesn't really stand alone.  *shrug*
>>
>> [snip]
>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.h
>>> index 159a5bd..43c3f64 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>>> @@ -597,6 +597,7 @@ public:
>>> bool process_move_condition(ir_rvalue *ir);
>>>
>>> void dump_instruction(backend_instruction *inst);
>>> +   void dump_instruction(backend_instruction *inst, FILE *file);
>>
>> Usually, this is done via a default argument:
>>
>> void dump_instruction(backend_instruction *inst, FILE *file = stderr);
>>
>> That would probably be cleaner.
> 
> I didn't do that on purpose, because gdb doesn't handle default
> parameters when you call functions from it.

Oh, that makes sense.  Adding extra overloads seems fine then.

--Ken



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/7] translate_sse: Rename translate_buffer_variant to translate_group.

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

It is a better description of what goes on and it's shorter, too.
---
 src/gallium/auxiliary/translate/translate_sse.c | 86 -
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index 24d8017..e25d450 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -61,7 +61,7 @@ struct translate_buffer
  * This describes a group of trivially compatible translations.
  * ptr is only used at runtime (not code generation time).
  */
-struct translate_buffer_variant
+struct translate_group
 {
unsigned buffer_index;
unsigned instance_divisor;
@@ -115,11 +115,11 @@ struct translate_sse
unsigned nr_buffers;
 
/* Multiple buffer variants can map to a single buffer. */
-   struct translate_buffer_variant buffer_variant[TRANSLATE_MAX_ATTRIBS];
-   unsigned nr_buffer_variants;
+   struct translate_group group[TRANSLATE_MAX_ATTRIBS];
+   unsigned nr_groups;
 
/* Multiple elements can map to a single buffer variant. */
-   unsigned element_to_buffer_variant[TRANSLATE_MAX_ATTRIBS];
+   unsigned element_to_group[TRANSLATE_MAX_ATTRIBS];
 
boolean use_instancing;
unsigned instance_id;
@@ -1085,17 +1085,17 @@ init_inputs(struct translate_sse *p, unsigned 
index_size)
struct x86_reg start_instance =
   x86_make_disp(p->machine_EDI, get_offset(p, &p->start_instance));
 
-   for (i = 0; i < p->nr_buffer_variants; i++) {
-  struct translate_buffer_variant *variant = &p->buffer_variant[i];
-  struct translate_buffer *buffer = &p->buffer[variant->buffer_index];
+   for (i = 0; i < p->nr_groups; i++) {
+  struct translate_group *group = &p->group[i];
+  struct translate_buffer *buffer = &p->buffer[group->buffer_index];
 
-  if (!index_size || variant->instance_divisor) {
+  if (!index_size || group->instance_divisor) {
  struct x86_reg buf_max_index =
 x86_make_disp(p->machine_EDI, get_offset(p, &buffer->max_index));
  struct x86_reg buf_stride =
 x86_make_disp(p->machine_EDI, get_offset(p, &buffer->stride));
  struct x86_reg buf_ptr =
-x86_make_disp(p->machine_EDI, get_offset(p, &variant->ptr));
+x86_make_disp(p->machine_EDI, get_offset(p, &group->ptr));
  struct x86_reg buf_base_ptr =
 x86_make_disp(p->machine_EDI, get_offset(p, &buffer->base_ptr));
  struct x86_reg elt = p->idx_ESI;
@@ -1104,13 +1104,13 @@ init_inputs(struct translate_sse *p, unsigned 
index_size)
  /* Calculate pointer to first attrib:
   *   base_ptr + stride * index, where index depends on instance 
divisor
   */
- if (variant->instance_divisor) {
+ if (group->instance_divisor) {
 /* Start with instance = instance_id
  * which is true if divisor is 1.
  */
 x86_mov(p->func, tmp_EAX, instance_id);
 
-if (variant->instance_divisor != 1) {
+if (group->instance_divisor != 1) {
struct x86_reg tmp_EDX = p->tmp2_EDX;
struct x86_reg tmp_ECX = p->src_ECX;
 
@@ -1118,7 +1118,7 @@ init_inputs(struct translate_sse *p, unsigned index_size)
 *   instance divisor is power of two.
 */
x86_xor(p->func, tmp_EDX, tmp_EDX);
-   x86_mov_reg_imm(p->func, tmp_ECX, variant->instance_divisor);
+   x86_mov_reg_imm(p->func, tmp_ECX, group->instance_divisor);
x86_div(p->func, tmp_ECX);   /* EAX = EDX:EAX / ECX */
 
/* instance = (instance_id - start_instance) / divisor + 
@@ -1153,7 +1153,7 @@ init_inputs(struct translate_sse *p, unsigned index_size)
  /* In the linear case, keep the buffer pointer instead of the
   * index number.
   */
- if (!index_size && p->nr_buffer_variants == 1) {
+ if (!index_size && p->nr_groups == 1) {
 x64_rexw(p->func);
 x86_mov(p->func, elt, tmp_EAX);
  }
@@ -1175,14 +1175,14 @@ get_buffer_ptr(struct translate_sse *p,
if (var_idx == ELEMENT_BUFFER_INSTANCE_ID) {
   return x86_make_disp(p->machine_EDI, get_offset(p, &p->instance_id));
}
-   if (!index_size && p->nr_buffer_variants == 1) {
+   if (!index_size && p->nr_groups == 1) {
   return p->idx_ESI;
}
-   else if (!index_size || p->buffer_variant[var_idx].instance_divisor) {
+   else if (!index_size || p->group[var_idx].instance_divisor) {
   struct x86_reg ptr = p->src_ECX;
   struct x86_reg buf_ptr =
  x86_make_disp(p->machine_EDI,
-   get_offset(p, &p->buffer_variant[var_idx].ptr));
+   get_offset(p, &p->group[var_idx].ptr));
 
   x64_rexw(p->func);
   x86_mov(p->func, ptr, buf_ptr);
@@ -1190,17 +1190,17 @@ get_buffer_ptr(struct

[Mesa-dev] [PATCH 5/7] translate_sse: Preserve low bit during unsigned -> float conversion.

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

---
 src/gallium/auxiliary/translate/translate_sse.c | 35 -
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index dace722..03d8276 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -71,7 +71,7 @@ struct translate_group
 
 #define ELEMENT_BUFFER_INSTANCE_ID  1001
 
-#define NUM_CONSTS 7
+#define NUM_CONSTS 8
 
 enum
 {
@@ -81,6 +81,7 @@ enum
CONST_INV_32767,
CONST_INV_65535,
CONST_INV_2147483647,
+   CONST_INV_4294967295,
CONST_255
 };
 
@@ -92,6 +93,7 @@ static float consts[NUM_CONSTS][4] = {
C(1.0 / 32767.0),
C(1.0 / 65535.0),
C(1.0 / 2147483647.0),
+   C(1.0 / 4294967295.0),
C(255.0)
 };
 
@@ -515,6 +517,7 @@ translate_attr_convert(struct translate_sse *p,
 || a->output_format == PIPE_FORMAT_R32G32B32_FLOAT
 || a->output_format == PIPE_FORMAT_R32G32B32A32_FLOAT)) {
   struct x86_reg dataXMM = x86_make_reg(file_XMM, 0);
+  struct x86_reg dataXMM2 = x86_make_reg(file_XMM, 1);
 
   for (i = 0; i < output_desc->nr_channels; ++i) {
  if (swizzle[i] == UTIL_FORMAT_SWIZZLE_0
@@ -546,17 +549,38 @@ translate_attr_convert(struct translate_sse *p,
 */
sse2_punpcklbw(p->func, dataXMM, get_const(p, CONST_IDENTITY));
sse2_punpcklbw(p->func, dataXMM, get_const(p, CONST_IDENTITY));
+   sse2_cvtdq2ps(p->func, dataXMM, dataXMM);
break;
 case 16:
sse2_punpcklwd(p->func, dataXMM, get_const(p, CONST_IDENTITY));
+   sse2_cvtdq2ps(p->func, dataXMM, dataXMM);
break;
-case 32:   /* we lose precision here */
+case 32:   /* we lose precision if value > 2^23 - 1 */
+   /* ...but try to keep all bits for value <= 2^23 - 1 */
+   /* this could be done much smarter for 1 or 2 dwords of input
+* on X64, using sth like "cvtsi2ss xmm0, rax" (note *R*ax) */
+
+   /* save the low bit */
+   sse_movss(p->func, dataXMM2, dataXMM);
+   /* right shift & convert, losing the low bit - must clear
+* high bit because there is no unsigned convert instruction */
sse2_psrld_imm(p->func, dataXMM, 1);
+   sse2_cvtdq2ps(p->func, dataXMM, dataXMM);
+
+   /* convert low bit to float */
+   sse2_pslld_imm(p->func, dataXMM2, 31);
+   sse2_psrld_imm(p->func, dataXMM2, 31);
+   sse2_cvtdq2ps(p->func, dataXMM2, dataXMM2);
+
+   /* mostly undo the right-shift by 1 */
+   sse_addps(p->func, dataXMM, dataXMM);
+   /* add back the lost low bit */
+   sse_addps(p->func, dataXMM, dataXMM2);
break;
 default:
return FALSE;
 }
-sse2_cvtdq2ps(p->func, dataXMM, dataXMM);
+
 if (input_desc->channel[0].normalized) {
struct x86_reg factor;
switch (input_desc->channel[0].size) {
@@ -567,7 +591,7 @@ translate_attr_convert(struct translate_sse *p,
   factor = get_const(p, CONST_INV_65535);
   break;
case 32:
-  factor = get_const(p, CONST_INV_2147483647);
+  factor = get_const(p, CONST_INV_4294967295);
   break;
default:
   assert(0);
@@ -579,9 +603,6 @@ translate_attr_convert(struct translate_sse *p,
}
sse_mulps(p->func, dataXMM, factor);
 }
-else if (input_desc->channel[0].size == 32)
-   /* compensate for the bit we threw away to fit u32 into s32 */
-   sse_addps(p->func, dataXMM, dataXMM);
 break;
  case UTIL_FORMAT_TYPE_SIGNED:
 if (!(x86_target_caps(p->func) & X86_SSE2))
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/7] translate_sse fixes

2014-04-13 Thread Andreas Hartmetz
Hello, I noticed and incorrect uint->float conversion of vertex shader
input in an application I'm working on, so I set out to fix it.
In the meantime the application changed to avoid that conversion
because it seems broken in too many drivers.

So anyway, this series contains two actual bug fixes in patches 4 and
5. The others are non-functional changes, mostly for readability.

Andreas Hartmetz (7):
  translate_sse: Explain what struct translate_buffer_variant is for.
  translate_sse: Rename translate_buffer_variant to translate_group.
  translate_sse: Make those comments less abstract.
  translate_sse: Use the correct buffer index in this fast path.
  translate_sse: Preserve low bit during unsigned -> float conversion.
  translate_sse: Be more specific in comment about loss of precision.
  translate_sse: Remove a stray break;

 src/gallium/auxiliary/translate/translate_sse.c | 140 ++--
 1 file changed, 84 insertions(+), 56 deletions(-)

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/7] translate_sse: Remove a stray break;

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

---
 src/gallium/auxiliary/translate/translate_sse.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index ca02b7a..992c5f6 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -651,8 +651,6 @@ translate_attr_convert(struct translate_sse *p,
sse_mulps(p->func, dataXMM, factor);
 }
 break;
-
-break;
  case UTIL_FORMAT_TYPE_FLOAT:
 if (input_desc->channel[0].size != 32
 && input_desc->channel[0].size != 64) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/7] translate_sse: Be more specific in comment about loss of precision.

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

The only loss of precision here due to intrinsic properties of
unsigned int and float.
---
 src/gallium/auxiliary/translate/translate_sse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index 03d8276..ca02b7a 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -622,7 +622,7 @@ translate_attr_convert(struct translate_sse *p,
sse2_punpcklwd(p->func, dataXMM, dataXMM);
sse2_psrad_imm(p->func, dataXMM, 16);
break;
-case 32:   /* we lose precision here */
+case 32:   /* we lose precision if abs(value) > 2^23 - 1 */
break;
 default:
return FALSE;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/7] translate_sse: Use the correct buffer index in this fast path.

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

It is possible that there are multiple input buffers but only one is
relevant for translation. Then there will be only a single translation
group, which might need to source data from a buffer index != 0.

Fixes wrong vertex shader inputs as observed while debugging with an
application and driver combination that requires translation of a
vertex attribute in a non-trivial set of attributes and input buffers.
---
 src/gallium/auxiliary/translate/translate_sse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index 8dd30c4..dace722 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -1235,8 +1235,10 @@ static boolean
 incr_inputs(struct translate_sse *p, unsigned index_size)
 {
if (!index_size && p->nr_groups == 1) {
+  const unsigned buffer_index = p->group[0].buffer_index;
   struct x86_reg stride =
- x86_make_disp(p->machine_EDI, get_offset(p, &p->buffer[0].stride));
+ x86_make_disp(p->machine_EDI,
+   get_offset(p, &p->buffer[buffer_index].stride));
 
   if (p->group[0].instance_divisor == 0) {
  x64_rexw(p->func);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] translate_sse: Explain what struct translate_buffer_variant is for.

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

---
 src/gallium/auxiliary/translate/translate_sse.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index 1b698cd..24d8017 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -54,6 +54,13 @@ struct translate_buffer
unsigned max_index;
 };
 
+/* We can only trivially chain together compatible translations. Additional
+ * state changes (e.g. reloading  input pointers) are required between
+ * incompatible translations. So we group compatible translations together
+ * to minimize those state changes.
+ * This describes a group of trivially compatible translations.
+ * ptr is only used at runtime (not code generation time).
+ */
 struct translate_buffer_variant
 {
unsigned buffer_index;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/7] translate_sse: Make those comments less abstract.

2014-04-13 Thread Andreas Hartmetz
From: Andreas Hartmetz 

---
 src/gallium/auxiliary/translate/translate_sse.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index e25d450..8dd30c4 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -114,11 +114,11 @@ struct translate_sse
struct translate_buffer buffer[TRANSLATE_MAX_ATTRIBS];
unsigned nr_buffers;
 
-   /* Multiple buffer variants can map to a single buffer. */
+   /* A buffer can require multiple groups to translate all elements */
struct translate_group group[TRANSLATE_MAX_ATTRIBS];
unsigned nr_groups;
 
-   /* Multiple elements can map to a single buffer variant. */
+   /* A group can translate multiple elements */
unsigned element_to_group[TRANSLATE_MAX_ATTRIBS];
 
boolean use_instancing;
@@ -1515,7 +1515,7 @@ translate_sse2_create(const struct translate_key *key)
  }
 
  /*
-  * Map vertex element to vertex buffer group.
+  * Find or create a compatible group for the element
   */
  for (j = 0; j < p->nr_groups; j++) {
 if (p->group[j].buffer_index ==
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965: Disable Z16 in all APIs.

2014-04-13 Thread Kenneth Graunke
We originally thought that GL 3.0 required GL_DEPTH_COMPONENT16 to map
exactly to Z16.  However, we misread the specification, thanks in part
to LaTeX reordering the tables in the PDF.

Page 180 of the GL 3.0 specification (glspec30.20080923.pdf) says:
"[...] memory allocation per texture component is assigned by the GL to
match the allocations listed in tables 3.16-3.18 as closely as possible.
[...]

Required Texture Formats
[...]
In addition, implementations are required to support the following sized
internal formats.  Requesting one of these internal formats for any
texture type will allocate exactly the internal component sizes and
types shown for that format in tables 3.16-3.17:"

Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16
or table 3.17.  It appears in table 3.18, where the "exact" rule doesn't
apply, and it falls back to the "closely as possible" rule.

The confusing part is that the ordering of the tables in the PDF is:

Table 3.16 (pages 182-184)
Table 3.18 (bottom of page 184 to top of 185)
Table 3.17 (page 185)

Presumably, people saw table 3.16, then saw the table immediately
following with DEPTH_COMPONENT* formats, and assumed it was 3.17.

Based on a batch by Chia-I Wu, but without the driconf option to force
Z16 to be used.  It's not required, and there's apparently no benefit
to actually using it.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 6 --
 1 file changed, 6 deletions(-)

Sorry if this is a duplicate of an earlier patch...the only one I could
find in my inbox was the one with the driconf option.

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 196f139..5907dd9 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -608,7 +608,6 @@ brw_init_surface_formats(struct brw_context *brw)
brw->format_supported_as_render_target[MESA_FORMAT_Z24_UNORM_S8_UINT] = 
true;
brw->format_supported_as_render_target[MESA_FORMAT_Z24_UNORM_X8_UINT] = 
true;
brw->format_supported_as_render_target[MESA_FORMAT_S_UINT8] = true;
-   brw->format_supported_as_render_target[MESA_FORMAT_Z_UNORM16] = true;
brw->format_supported_as_render_target[MESA_FORMAT_Z_FLOAT32] = true;
brw->format_supported_as_render_target[MESA_FORMAT_Z32_FLOAT_S8X24_UINT] = 
true;
 
@@ -630,12 +629,7 @@ brw_init_surface_formats(struct brw_context *brw)
 *
 * Other speculation is that we may be hitting increased fragment shader
 * execution from GL_LEQUAL/GL_EQUAL depth tests at reduced precision.
-*
-* However, desktop GL 3.0+ require that you get exactly 16 bits when
-* asking for DEPTH_COMPONENT16, so we have to respect that.
 */
-   if (_mesa_is_desktop_gl(ctx))
-  ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;
 
/* On hardware that lacks support for ETC1, we map ETC1 to RGBX
 * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.
-- 
1.9.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Update comments about Z16 being slow.

2014-04-13 Thread Kenneth Graunke
We've learned a few things since we originally disabled Z16; this attempts
to summarize the issue.  I am no expert on this subject, though, so the
comment may not be totally accurate.

I did some benchmarking on GM45 and Ironlake, and discovered that for
GLBenchmark 2.7 EgyptHD, using Z16 was 3% slower on GM45 (n=15), and
4.5% slower on Ironlake (n=95).  So, we can drop the "on Ivybridge"
aspect of the comment - it's always slower.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index cef4020..196f139 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -620,13 +620,16 @@ brw_init_surface_formats(struct brw_context *brw)
ctx->TextureFormatSupported[MESA_FORMAT_Z_FLOAT32] = true;
ctx->TextureFormatSupported[MESA_FORMAT_Z32_FLOAT_S8X24_UINT] = true;
 
-   /* It appears that Z16 is slower than Z24 (on Intel Ivybridge and newer
-* hardware at least), so there's no real reason to prefer it unless you're
-* under memory (not memory bandwidth) pressure.  Our speculation is that
-* this is due to either increased fragment shader execution from
-* GL_LEQUAL/GL_EQUAL depth tests at the reduced precision, or due to
-* increased depth stalls from a cacheline-based heuristic for detecting
-* depth stalls.
+   /* Benchmarking shows that Z16 is slower than Z24, so there's no reason to
+* use it unless you're under memory (not memory bandwidth) pressure.
+*
+* Apparently, the GPU's depth scoreboarding works on a 32-bit granularity,
+* which corresponds to one pixel in the depth buffer for Z24 or Z32 
formats.
+* However, it corresponds to two pixels with Z16, which means both need to
+* hit the early depth case in order for it to happen.
+*
+* Other speculation is that we may be hitting increased fragment shader
+* execution from GL_LEQUAL/GL_EQUAL depth tests at reduced precision.
 *
 * However, desktop GL 3.0+ require that you get exactly 16 bits when
 * asking for DEPTH_COMPONENT16, so we have to respect that.
-- 
1.9.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: Disable Z16 in all APIs.

2014-04-13 Thread Matt Turner
Both are

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev