Re: [Mesa-dev] [PATCH 1/3] winsys/radeon: Use separate caching buffer manager for each set of flags

2014-10-15 Thread Michel Dänzer

On 10.10.2014 18:43, Marek Olšák wrote:

I wonder if it wouldn't be nicer if the cache manager understood that
there are buffers with different flags, so that we don't have to have
so many of them.


Maybe, though it might increase CPU overhead in the cache manager?


Thanks for the review.


--
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #19 from Iago Toral  ---
Jason, for conversions where we cannot use a fast path in the master converter
(that is, where we need to unpack the src to rgba and then pack from rgba to
dst) you differentiate between 3 paths: uint (integer), float (is_signed ||
bits > 8) and ubyte (otherwise).

While the last two paths seem to be always valid (I think we have pack and
unpack functions for all relevant formats for these types), I am hitting an
assertion in the first path because there are no uint unpack functions for some
types. Indeed, _mesa_unpack_uint_rgba_row only supports unpacking from a
limited set of uint formats. I am hitting this path in cases where dst is an
integer type but src is not. The code will choose the integer even though the
implementation requires that both src and dst are integer, since we need an
uint unpack function for the src in this case.

A solution for this would be to forget about the uint path and always go with
the float path or the ubyte path, but I imagine that the uint path has benefits
when it can be used... I guess because we could avoid uint->{float,byte}
transformations.

So, I think we have two options:
1) Auto-generate uint unpack/pack functions for all required mesa formats.
2) Only use the integer path if both src and dst are integer types.

I think 1) is only useful when src is not an integer type (so we would be doing
a type conversion anyway), so maybe 2) is the most sensible approach in this
case. What do you think?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] winsys/radeon: Use separate caching buffer manager for each set of flags

2014-10-15 Thread Marek Olšák
With so many cache managers, memory usage might be a bigger problem.

Usually when a cache manager fails to allocate a new buffer, it clears
the cache and tries again. This is not so useful when there are a lot
of them, because the other managers aren't cleared.

A cache manager also has a limit on how large the cache can be to
prevent the cache from taking too much memory. Again, with several
cache managers, this becomes useless.

Finally, buffers are released from the cache during buffer_create and
buffer_destroy, so the managers really need to receive those 2 calls
regularly. If a cache manager is unused for a while, the cache won't
be cleared and it will occupy memory with old and unused buffers that
could otherwise be freed.

Marek

On Wed, Oct 15, 2014 at 9:27 AM, Michel Dänzer  wrote:
> On 10.10.2014 18:43, Marek Olšák wrote:
>>
>> I wonder if it wouldn't be nicer if the cache manager understood that
>> there are buffers with different flags, so that we don't have to have
>> so many of them.
>
>
> Maybe, though it might increase CPU overhead in the cache manager?
>
>
> Thanks for the review.
>
>
> --
> Earthling Michel Dänzer|  http://www.amd.com
> Libre software enthusiast  |Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] sampler validation

2014-10-15 Thread Tapani Pälli
Hi;

Here's a v2 of 'runtime sampler validation' patch. This patch does not 
fix existing failure with a Piglit test
'arb_separate_shader_object-active-sampler-conflict' because I consider
that a separate issue. I took a look and that test fails because
_mesa_validate_program_pipeline throws GL_INVALID_OPERATION which is
not read by Piglit's 'program_pipeline_check_status' or test app before
it switch to 'valid configuration' and renders again.


Tapani Pälli (1):
  mesa: validate sampler uniforms during gluniform calls

 src/mesa/main/context.c | 12 +++
 src/mesa/main/mtypes.h  |  1 +
 src/mesa/main/uniform_query.cpp | 44 +
 src/mesa/main/uniforms.c| 14 +
 4 files changed, 36 insertions(+), 35 deletions(-)

-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] mesa: validate sampler uniforms during gluniform calls

2014-10-15 Thread Tapani Pälli
Patch fixes 'glsl-2types-of-textures-on-same-unit' in WebGL conformance
test suite. No Piglit regressions, fixes gl-2.0-active-sampler-conflict.

To avoid adding potentially heavy check during draw (valid_to_render),
check is done during uniform updates by inspecting TexturesUsed mask.

A new boolean variable is introduced to cache validation state.

v2: take into account case where 2 uniforms use same unit (curro)
also do the check only when SSO is not in use, SSO has own
path for sampler validation.

Signed-off-by: Tapani Pälli 
---
 src/mesa/main/context.c | 12 +++
 src/mesa/main/mtypes.h  |  1 +
 src/mesa/main/uniform_query.cpp | 44 +
 src/mesa/main/uniforms.c| 14 +
 4 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 5a8f718..25b9bfc 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -133,6 +133,7 @@
 #include "program/prog_print.h"
 #include "math/m_matrix.h"
 #include "main/dispatch.h" /* for _gloffset_COUNT */
+#include "uniforms.h"
 
 #ifdef USE_SPARC_ASM
 #include "sparc/sparc.h"
@@ -1949,6 +1950,17 @@ _mesa_valid_to_render(struct gl_context *ctx, const char 
*where)
   }
}
 
+   /* If a program is active and SSO not in use, check if validation of
+* samplers succeeded for the active program. */
+   if (ctx->_Shader->ActiveProgram && ctx->_Shader != ctx->Pipeline.Current) {
+  char errMsg[100];
+  if (!_mesa_sampler_uniforms_are_valid(ctx->_Shader->ActiveProgram,
+errMsg, 100)) {
+ _mesa_error(ctx, GL_INVALID_OPERATION, "%s", errMsg);
+ return GL_FALSE;
+  }
+   }
+
if (ctx->DrawBuffer->_Status != GL_FRAMEBUFFER_COMPLETE_EXT) {
   _mesa_error(ctx, GL_INVALID_FRAMEBUFFER_OPERATION_EXT,
   "%s(incomplete framebuffer)", where);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 5e9453b..27670bd 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2873,6 +2873,7 @@ struct gl_shader_program
GLboolean LinkStatus;   /**< GL_LINK_STATUS */
GLboolean Validated;
GLboolean _Used;/**< Ever used for drawing? */
+   GLboolean SamplersValidated; /**< Samplers validated against texture units? 
*/
GLchar *InfoLog;
 
unsigned Version;   /**< GLSL version used for linking */
diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index 1592c9b..c2776c0 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -1064,42 +1064,16 @@ extern "C" bool
 _mesa_sampler_uniforms_are_valid(const struct gl_shader_program *shProg,
 char *errMsg, size_t errMsgLength)
 {
-   const glsl_type *unit_types[MAX_COMBINED_TEXTURE_IMAGE_UNITS];
-
-   memset(unit_types, 0, sizeof(unit_types));
-
-   for (unsigned i = 0; i < shProg->NumUserUniformStorage; i++) {
-  const struct gl_uniform_storage *const storage =
-&shProg->UniformStorage[i];
-  const glsl_type *const t = (storage->type->is_array())
-? storage->type->fields.array : storage->type;
-
-  if (!t->is_sampler())
-continue;
-
-  const unsigned count = MAX2(1, storage->type->array_size());
-  for (unsigned j = 0; j < count; j++) {
-const unsigned unit = storage->storage[j].i;
-
-/* The types of the samplers associated with a particular texture
- * unit must be an exact match.  Page 74 (page 89 of the PDF) of the
- * OpenGL 3.3 core spec says:
- *
- * "It is not allowed to have variables of different sampler
- * types pointing to the same texture image unit within a program
- * object."
- */
-if (unit_types[unit] == NULL) {
-   unit_types[unit] = t;
-} else if (unit_types[unit] != t) {
-   _mesa_snprintf(errMsg, errMsgLength,
-  "Texture unit %d is accessed both as %s and %s",
-  unit, unit_types[unit]->name, t->name);
-   return false;
-}
-  }
+   /* Shader does not have samplers. */
+   if (shProg->NumUserUniformStorage == 0)
+  return true;
+
+   if (!shProg->SamplersValidated) {
+  _mesa_snprintf(errMsg, errMsgLength,
+ "active samplers with a different type "
+ "refer to the same texture image unit");
+  return false;
}
-
return true;
 }
 
diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
index 0d0cbf5..598b4d4 100644
--- a/src/mesa/main/uniforms.c
+++ b/src/mesa/main/uniforms.c
@@ -75,12 +75,26 @@ _mesa_update_shader_textures_used(struct gl_shader_program 
*shProg,
memcpy(prog->SamplerUnits, shader->SamplerUnits, 
sizeof(prog->SamplerUnits));
memset(prog->TexturesUsed, 0, sizeof(prog->TexturesUsed));
 
+   shProg->SamplersValidated = GL_TRUE;
+

[Mesa-dev] [PATCH 2/2] r600g: Implement sm5 UBO/sampler indexing

2014-10-15 Thread Glenn Kennard
Caveat: Shaders using UBO/sampler indexing will
not be optimized by SB, due to SB not currently
supporting the necessary CF_INDEX_[01] index
registers.

Signed-off-by: Glenn Kennard 
---
 docs/GL3.txt   |  4 +--
 src/gallium/drivers/r600/eg_asm.c  | 52 ---
 src/gallium/drivers/r600/r600_asm.c| 58 +-
 src/gallium/drivers/r600/r600_asm.h|  9 +
 src/gallium/drivers/r600/r600_shader.c | 52 +++
 src/gallium/drivers/r600/r600_shader.h |  2 ++
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp |  8 -
 src/gallium/drivers/r600/sb/sb_sched.h |  2 ++
 8 files changed, 166 insertions(+), 21 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 5ccfdea..dba36e0 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -98,8 +98,8 @@ GL 4.0, GLSL 4.00:
   GL_ARB_draw_indirect DONE (i965, nvc0, 
radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5   DONE (i965, nvc0)
   - 'precise' qualifierDONE
-  - Dynamically uniform sampler array indices  DONE ()
-  - Dynamically uniform UBO array indices  DONE ()
+  - Dynamically uniform sampler array indices  DONE (r600)
+  - Dynamically uniform UBO array indices  DONE (r600)
   - Implicit signed -> unsigned conversionsDONE
   - Fused multiply-add DONE ()
   - Packing/bitfield/conversion functions  DONE (r600)
diff --git a/src/gallium/drivers/r600/eg_asm.c 
b/src/gallium/drivers/r600/eg_asm.c
index acb3040..295cb4d 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -43,10 +43,10 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct 
r600_bytecode_cf *cf)
/* prepend ALU_EXTENDED if we need more than 2 kcache 
sets */
if (cf->eg_alu_extended) {
bc->bytecode[id++] =
-   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE0(V_SQ_CF_INDEX_NONE) |
-   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE1(V_SQ_CF_INDEX_NONE) |
-   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE2(V_SQ_CF_INDEX_NONE) |
-   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE3(V_SQ_CF_INDEX_NONE) |
+   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE0(cf->kcache[0].index_mode) |
+   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE1(cf->kcache[1].index_mode) |
+   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE2(cf->kcache[2].index_mode) |
+   
S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE3(cf->kcache[3].index_mode) |

S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK2(cf->kcache[2].bank) |

S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK3(cf->kcache[3].bank) |

S_SQ_CF_ALU_WORD0_EXT_KCACHE_MODE2(cf->kcache[2].mode);
@@ -143,3 +143,47 @@ void eg_bytecode_export_read(struct r600_bytecode *bc,
output->comp_mask = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(word1);
 }
 #endif
+
+int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool 
inside_alu_clause)
+{
+   struct r600_bytecode_alu alu;
+   int r;
+   unsigned type;
+
+   assert(id < 2);
+   assert(bc->chip_class >= EVERGREEN);
+
+   if (bc->index_loaded[id])
+   return 0;
+
+   memset(&alu, 0, sizeof(alu));
+   alu.op = ALU_OP1_MOVA_INT;
+   alu.src[0].sel = bc->index_reg[id];
+   alu.src[0].chan = 0;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(bc, &alu);
+   if (r)
+   return r;
+
+   bc->ar_loaded = 0; /* clobbered */
+
+   memset(&alu, 0, sizeof(alu));
+   alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(bc, &alu);
+   if (r)
+   return r;
+
+   /* Must split ALU group as index only applies to following group */
+   if (inside_alu_clause) {
+   type = bc->cf_last->op;
+   if ((r = r600_bytecode_add_cf(bc))) {
+   return r;
+   }
+   bc->cf_last->op = type;
+   }
+
+   bc->index_loaded[id] = 1;
+
+   return 0;
+}
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 8aa69b5..ce3c2d1 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -819,6 +819,10 @@ static int merge_inst_groups(struct r600_bytecode *bc, 
struct r600_bytecode

[Mesa-dev] [PATCH 1/2] r600g: Implement sm5 interpolation functions

2014-10-15 Thread Glenn Kennard
Requires evergreen/cayman

Signed-off-by: Glenn Kennard 
---
 docs/GL3.txt |   2 +-
 src/gallium/drivers/r600/r600_shader.c   | 237 ++-
 src/gallium/drivers/r600/r600_state_common.c |   3 +
 3 files changed, 238 insertions(+), 4 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 07d1d2c..5ccfdea 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -107,7 +107,7 @@ GL 4.0, GLSL 4.00:
   - Geometry shader instancing DONE (r600)
   - Geometry shader multiple streams   DONE ()
   - Enhanced per-sample shadingDONE (r600)
-  - Interpolation functionsDONE ()
+  - Interpolation functionsDONE (r600)
   - New overload resolution rules  DONE
   GL_ARB_gpu_shader_fp64   started (Dave)
   GL_ARB_sample_shadingDONE (i965, nv50, nvc0, 
r600, radeonsi)
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 9e9a557..08125b7 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -761,10 +761,33 @@ static int allocate_system_value_inputs(struct 
r600_shader_ctx *ctx, int gpr_off
return 0;
}
 
+   /* need to scan shader for system values and 
interpolateAtSample/Offset/Centroid */
while (!tgsi_parse_end_of_tokens(&parse)) {
tgsi_parse_token(&parse);
 
-   if (parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_DECLARATION) {
+   if (parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_INSTRUCTION) {
+   const struct tgsi_full_instruction *inst = 
&parse.FullToken.FullInstruction;
+   if (inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_SAMPLE ||
+   inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_OFFSET ||
+   inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_CENTROID)
+   {
+   int interpolate, location, k;
+
+   if (inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_SAMPLE) {
+   location = TGSI_INTERPOLATE_LOC_CENTER;
+   inputs[1].enabled = true; /* needs 
SAMPLEID */
+   } else if (inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_OFFSET) {
+   location = TGSI_INTERPOLATE_LOC_CENTER;
+   /* Needs sample positions, currently 
those are always available */
+   } else {
+   location = 
TGSI_INTERPOLATE_LOC_CENTROID;
+   }
+
+   interpolate = 
ctx->info.input_interpolate[inst->Src[0].Register.Index];
+   k = eg_get_interpolator_index(interpolate, 
location);
+   ctx->eg_interpolators[k].enabled = true;
+   }
+   } else if (parse.FullToken.Token.Type == 
TGSI_TOKEN_TYPE_DECLARATION) {
struct tgsi_full_declaration *d = 
&parse.FullToken.FullDeclaration;
if (d->Declaration.File == TGSI_FILE_SYSTEM_VALUE) {
for (k = 0; k < Elements(inputs); k++) {
@@ -812,6 +835,7 @@ static int evergreen_gpr_count(struct r600_shader_ctx *ctx)
 {
int i;
int num_baryc;
+   struct tgsi_parse_context parse;
 
memset(&ctx->eg_interpolators, 0, sizeof(ctx->eg_interpolators));
 
@@ -831,6 +855,39 @@ static int evergreen_gpr_count(struct r600_shader_ctx *ctx)
ctx->eg_interpolators[k].enabled = TRUE;
}
 
+   if (tgsi_parse_init(&parse, ctx->tokens) != TGSI_PARSE_OK) {
+   return 0;
+   }
+
+   /* need to scan shader for system values and 
interpolateAtSample/Offset/Centroid */
+   while (!tgsi_parse_end_of_tokens(&parse)) {
+   tgsi_parse_token(&parse);
+
+   if (parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_INSTRUCTION) {
+   const struct tgsi_full_instruction *inst = 
&parse.FullToken.FullInstruction;
+   if (inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_SAMPLE ||
+   inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_OFFSET ||
+   inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_CENTROID)
+   {
+   int interpolate, location, k;
+
+   if (inst->Instruction.Opcode == 
TGSI_OPCODE_INTERP_SAMPLE) {
+   location = TGSI_INTERPOLATE_LOC_CENTER;
+   } else if (inst->Instruction.Opcode == 
TGSI_OPCO

[Mesa-dev] [Bug 84570] Borderlands 2: Constant frame rate drops while playing; really bad with additionl lighting

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84570

Kai  changed:

   What|Removed |Added

 Attachment #107574|0   |1
is obsolete||

--- Comment #18 from Kai  ---
Created attachment 107887
  --> https://bugs.freedesktop.org/attachment.cgi?id=107887&action=edit
Screenshot with
GALLIUM_HUD=fps,requested-VRAM+VRAM-usage,requested-GTT+GTT-usage showing FPS
drops (DynamicLights=true)

(In reply to Michel Dänzer from comment #16)
> (In reply to Kai from comment #15)
> > THAT is a LOT better! Even with DynamicLights on I only get occasional FPS
> > drops. Usually directly after entering a new area. Sometimes, when there's a
> > lot to draw, that moves, you can get the drops again as well.
> 
> And those remaining drops don't happen with other drivers?

I can't tell you that. I have only a R9 290 and I'm only using radeonsi.

> Lower FPS when
> more things are happening doesn't seem unexpected, and e.g. when entering a
> new area, the game code itself might slow down due to loading stuff etc.

That's quite possible, though I can see drops (as shown in the attached
screenshot), when I do nothing particularily intersting besides moving around
in an area which I've been in for some time. The really bad drops
(single-digit) indeed seem only to happen upon entering an area, ie. after a
loading screen.
Though I did hear from a friend – using a nVidia card with the proprietary
driver and DynamicLights=false – that she's seeing occasional
drops/micro-freezes as well.

Please note that this is all with the DynamicLights option set to true.

If there's some kind of info I can provide you with, which would make this
clear, let me know. Maybe Ian from Aspyr can help as well (especially: might
any option listed at  help
in debugging this? Are there other options that might produce helpful output?)?

(In reply to Michel Dänzer from comment #17)
> Can you attach a new GALLIUM_HUD screenshot corresponding to a remaining drop?

Done. As you can see those aren't as bad as they were initially and that's with
DynamicLights=true. Still they are noticeable. This graph shows a drop during a
fight (first drop till those smaller drops before the last big drop) and
afterwards I was running around in that same small area I've been fighting in
and was collecting the loot (from those smaller drops till the last big drop).
If you stand still the FPS usually go up to ~40 FPS and stay there with only
minor fluctuations.

Jumping and turning on the spot can trigger a big drop as well.


Overall though, I would call the game playable now. Not sure though, if we
should split this bug, so one can stay open for Intel and the Radeon version
can be closed (maybe with a follow-up along the lines of "Borderlands 2 shows
FPS drops – more optimization needed")?

Any chance those kernel patches will be applied to the 3.17 tree (point
releases)?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84570] Borderlands 2: Constant frame rate drops while playing; really bad with additionl lighting

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84570

--- Comment #19 from Andreas Hartmetz  ---
I've noticed that disabling CPU frequency scaling makes a big difference in the
severity of micro-hangs (3.17, dynamic lights off). My explanation is that,
when threads are waiting for each other, the scheduler doesn't know that the
apparently "idle-blocked" threads would all run faster overall if the CPU went
faster. Doesn't matter if that's wrong, the difference is there...

At least on AMD (some Intel CPUs use a special frequency scaling driver), you
can effectively disable CPU frequency scaling like so:
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Maybe that also helps a bit with dynamic lights on?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84570] Borderlands 2: Constant frame rate drops while playing; really bad with additionl lighting

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84570

--- Comment #20 from Ian C. Bullard  ---
(In reply to Kai from comment #18)
> That's quite possible, though I can see drops (as shown in the attached
> screenshot), when I do nothing particularily intersting besides moving
> around in an area which I've been in for some time. The really bad drops
> (single-digit) indeed seem only to happen upon entering an area, ie. after a
> loading screen.

The drops can occur even if you've been in an area for a while.  Borderlands is
a heavily streamed game so new textures/shaders can cause a hiccup.  We're
looking at ways to smooth this out.

> Though I did hear from a friend – using a nVidia card with the proprietary
> driver and DynamicLights=false – that she's seeing occasional
> drops/micro-freezes as well.

That fits with what I expected and matches what I said above. 

> If there's some kind of info I can provide you with, which would make this
> clear, let me know. Maybe Ian from Aspyr can help as well (especially: might
> any option listed at 
> help in debugging this? Are there other options that might produce helpful
> output?)?

I don't know of any command that can assist, sorry.  Most of the commands are
disabled in release builds.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84570] Borderlands 2: Constant frame rate drops while playing; really bad with additionl lighting

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84570

--- Comment #21 from Kai  ---
(In reply to Andreas Hartmetz from comment #19)
> I've noticed that disabling CPU frequency scaling makes a big difference in
> the severity of micro-hangs (3.17, dynamic lights off). My explanation is
> that, when threads are waiting for each other, the scheduler doesn't know
> that the apparently "idle-blocked" threads would all run faster overall if
> the CPU went faster. Doesn't matter if that's wrong, the difference is
> there...
> 
> At least on AMD (some Intel CPUs use a special frequency scaling driver),
> you can effectively disable CPU frequency scaling like so:
> echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> 
> Maybe that also helps a bit with dynamic lights on?

I have that Intel pstate driver for my CPU, but after feeding the performance
setting to scaling_governor the amount of drops goes down, the drops aren't as
severe (usually around 17 FPS and not single-digits; only exception: the first
five to ten seconds after loading screens) and the overall FPS rate is up by
~5-8 FPS. Thanks a lot for that tip!


(In reply to Ian C. Bullard from comment #20)
> (In reply to Kai from comment #18)
> > That's quite possible, though I can see drops (as shown in the attached
> > screenshot), when I do nothing particularily intersting besides moving
> > around in an area which I've been in for some time. The really bad drops
> > (single-digit) indeed seem only to happen upon entering an area, ie. after a
> > loading screen.
> 
> The drops can occur even if you've been in an area for a while.  Borderlands
> is a heavily streamed game so new textures/shaders can cause a hiccup. 
> We're looking at ways to smooth this out.

Thanks for clearing this up. Though I like to state that using the radeonsi
driver with the stack detailed in comment #15 gives you a very playable game
(almost no drops) with "DynamicLights=false" and only the smaller drops I
described in comment #18 with DynamicLights=true. From my POV that sounds like
"supporting" Radeons with the FLOSS driver should be ok or at least not worse
than nVidia cards. Especially when combined with Andreas' suggestion.
I've been playing for a few hours (alone and co-op) with this stack and
DynamicLights=true and it works for me; if you keep DynamicLights off it should
work for most, shouldn't it?

> > Though I did hear from a friend – using a nVidia card with the proprietary
> > driver and DynamicLights=false – that she's seeing occasional
> > drops/micro-freezes as well.
> 
> That fits with what I expected and matches what I said above. 

Oh, I thought your previous statements only related to Intel and AMD GPUs
(especially since nVidia is the only officially supported GPU vendor).

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84570] Borderlands 2: Constant frame rate drops while playing; really bad with additionl lighting

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84570

--- Comment #22 from Ian C. Bullard  ---
(In reply to Kai from comment #21)
> From my POV that sounds
> like "supporting" Radeons with the FLOSS driver should be ok or at least not
> worse than nVidia cards. Especially when combined with Andreas' suggestion.
> I've been playing for a few hours (alone and co-op) with this stack and
> DynamicLights=true and it works for me; if you keep DynamicLights off it
> should work for most, shouldn't it?

I'll ask QA to take a look when they have a chance (not soon) and get new
performance numbers.  If it does well enough we'll add support.

> Oh, I thought your previous statements only related to Intel and AMD GPUs
> (especially since nVidia is the only officially supported GPU vendor).

__GL_THREADED_OPTIMIZATIONS=1 smooths things out for Nvidia but there are
hitches every once in a while.

"Supported" for Aspyr means that we have tested it on that hardware/driver and
feel like the experience is good for players using that hardware/driver. 
Nvidia's proprietary driver did that, everything else didn't.  

If you want to stress test the driver and see the performance under high load,
I suggest using the Deploy Beacon part of the "Bright Lights, Flying City"
mission (http://borderlands.wikia.com/wiki/Bright_Lights,_Flying_City) when the
first round of robots arrive near the entrance to Overlook.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #20 from Jason Ekstrand  ---
(In reply to Iago Toral from comment #18)
> (In reply to Jason Ekstrand from comment #17)
> > (In reply to Iago Toral from comment #16)
> > > We also need new mesa_format enums and pack/unpack functions for byte
> > > swapped variants of non-array types.
> > 
> > Why?  Is this for the byte_swapped flag on GL upload/downloads?  If that's
> > the only reason, then I'd rather not bother adding mesa types.  We can
> > easily do a byte swap with _mesa_swizzle_and_convert.  While it's a little
> > bit of a performance hit, people shouldn't be using that attribute anyway.
> 
> I think that won't help in this case. We can only use
> _mesa_swizzle_and_convert for array types, but as I say, this involves
> non-array types like GL_UNSIGNED_INT_10_10_10_2. The master converter will
> not use _mesa_swizzle_and_convert with these, it goes through things like
> _mesa_unpack_rgba_row and _mesa_pack_float_rgba_row for example (which rely
> on auto-generated pack/unpack functions). Do you think we should try expand
> these to consider byte-swapping?

My inclination on byte-swapping it to keep it out of the master conversion
function.  It's really an oddity of the old GL api and not a core format
conversion thing.  No sanely written app should use byte swapping, so I'm not
too worried about making it fast.  My inclination there is to keep going
byteswapping and the other format conversion ops in TexImage, GetTexImag,
ReadPixels, and DrawPixels.  They can call _mesa_swizzle_and_convert to do the
byteswap operation befor calling into the actual format conversion function.

The downside here is that there are a few cases where we can do better. 
However, that's pretty much limited to GL_UNSIGNED_INT_8_8_8_8[_REV] and I
don't think it's worth the effort.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #21 from Jason Ekstrand  ---
(In reply to Iago Toral from comment #19)
> Jason, for conversions where we cannot use a fast path in the master
> converter (that is, where we need to unpack the src to rgba and then pack
> from rgba to dst) you differentiate between 3 paths: uint (integer), float
> (is_signed || bits > 8) and ubyte (otherwise).
> 
> While the last two paths seem to be always valid (I think we have pack and
> unpack functions for all relevant formats for these types), I am hitting an
> assertion in the first path because there are no uint unpack functions for
> some types. Indeed, _mesa_unpack_uint_rgba_row only supports unpacking from
> a limited set of uint formats. I am hitting this path in cases where dst is
> an integer type but src is not. The code will choose the integer even though
> the implementation requires that both src and dst are integer, since we need
> an uint unpack function for the src in this case.
> 
> A solution for this would be to forget about the uint path and always go
> with the float path or the ubyte path, but I imagine that the uint path has
> benefits when it can be used... I guess because we could avoid
> uint->{float,byte} transformations.
> 
> So, I think we have two options:
> 1) Auto-generate uint unpack/pack functions for all required mesa formats.
> 2) Only use the integer path if both src and dst are integer types.
> 
> I think 1) is only useful when src is not an integer type (so we would be
> doing a type conversion anyway), so maybe 2) is the most sensible approach
> in this case. What do you think?

2 sounds like the better plan.  Can you give an example of when we are hitting
that assertion?  I thought the GL spec forbid you from doing normalized ->
integer conversions so I'm not seeing how this happens.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

2014-10-15 Thread Jason Ekstrand
On Tue, Oct 14, 2014 at 4:42 PM, Kenneth Graunke 
wrote:

> According to INTEL_DEBUG=perf, "Borderlands: The Pre-Sequel" was
> stalling on nearly every glBufferSubData call, with very slightly
> overlapping busy ranges.
>
> It turns out the draw upload code was accidentally including an extra
> stride's worth of data in the vertex buffer size due to a simple
> off-by-one error.  We considered this extra bit of buffer space to be
> busy (in use by the GPU), when it was actually idle.
>

Nice Catch!


>
> The new diagram should make it easier to understand the formula.  It's
> basically what I drew on paper when working through an actual
> glDrawRangeElements call.
>
> Eliminates all glBufferSubData stalls in "Borderlands: The Pre-Sequel."
>
> Signed-off-by: Kenneth Graunke 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> No Piglit regressions on Haswell.  This might help Dota 2 and Serious Sam 3
> as well, but I haven't checked.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> index 5a12439..6cb653c 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> @@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw)
>offset = 0;
>size = intel_buffer->Base.Size;
> } else {
> +  /* Compute the size/amount of data referenced by the
> GPU.
> +   * If the data is interleaved, StrideB may be larger
> than
> +   * _ElementSize.  As an example, assume we have 2
> interleaved
> +   * attributes A and B.  The data is organized like this:
> +   *
> +   *   StrideEltSize
> +   *_,,_,
> +   *   /\  / \
> +   *A: ---   ---   ---   ---   ---   ---
> +   *B:---   ---   ---   ---   ---   ---
> +   *
> +   *   |= 4 elts ==|  (4-1) * Stride + EltSize
> +   *
> +   * max_index - min_index gives the number of elements
> that
> +   * will be referenced.  Say we're drawing 4 elements.
> On
> +   * the first three, we need the full stride in order to
> get
> +   * to the next element.  But on the last, we only want
> the
> +   * element size, since we don't actually read the other
> +   * interleaved vertex attributes stored beyond that.
> +   */
>offset = buffer->offset + min_index * buffer->stride;
> -  size = (buffer->stride * (max_index - min_index) +
> +  size = (buffer->stride * MAX2(max_index - min_index -
> 1, 0) +
>glarray->_ElementSize);
>

I'm not so sure about this new formula.  Looking at the docs on
glDrawRangeElements, the index range is inclusive on both ends meaning that
the number of elements drawn is max_index - min_index + 1.  If that's the
case, then the old formula was correct.  Are we using a half-open interval
in mesa?


> }
>  }
> --
> 2.1.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

2014-10-15 Thread Kenneth Graunke
On Wednesday, October 15, 2014 09:50:42 AM Jason Ekstrand wrote:
> On Tue, Oct 14, 2014 at 4:42 PM, Kenneth Graunke 
> wrote:
> 
> > According to INTEL_DEBUG=perf, "Borderlands: The Pre-Sequel" was
> > stalling on nearly every glBufferSubData call, with very slightly
> > overlapping busy ranges.
> >
> > It turns out the draw upload code was accidentally including an extra
> > stride's worth of data in the vertex buffer size due to a simple
> > off-by-one error.  We considered this extra bit of buffer space to be
> > busy (in use by the GPU), when it was actually idle.
> >
> 
> Nice Catch!
> 
> 
> >
> > The new diagram should make it easier to understand the formula.  It's
> > basically what I drew on paper when working through an actual
> > glDrawRangeElements call.
> >
> > Eliminates all glBufferSubData stalls in "Borderlands: The Pre-Sequel."
> >
> > Signed-off-by: Kenneth Graunke 
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >  src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > No Piglit regressions on Haswell.  This might help Dota 2 and Serious Sam 3
> > as well, but I haven't checked.
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > index 5a12439..6cb653c 100644
> > --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > @@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw)
> >offset = 0;
> >size = intel_buffer->Base.Size;
> > } else {
> > +  /* Compute the size/amount of data referenced by the
> > GPU.
> > +   * If the data is interleaved, StrideB may be larger
> > than
> > +   * _ElementSize.  As an example, assume we have 2
> > interleaved
> > +   * attributes A and B.  The data is organized like this:
> > +   *
> > +   *   StrideEltSize
> > +   *_,,_,
> > +   *   /\  / \
> > +   *A: ---   ---   ---   ---   ---   ---
> > +   *B:---   ---   ---   ---   ---   ---
> > +   *
> > +   *   |= 4 elts ==|  (4-1) * Stride + EltSize
> > +   *
> > +   * max_index - min_index gives the number of elements
> > that
> > +   * will be referenced.  Say we're drawing 4 elements.
> > On
> > +   * the first three, we need the full stride in order to
> > get
> > +   * to the next element.  But on the last, we only want
> > the
> > +   * element size, since we don't actually read the other
> > +   * interleaved vertex attributes stored beyond that.
> > +   */
> >offset = buffer->offset + min_index * buffer->stride;
> > -  size = (buffer->stride * (max_index - min_index) +
> > +  size = (buffer->stride * MAX2(max_index - min_index -
> > 1, 0) +
> >glarray->_ElementSize);
> >
> 
> I'm not so sure about this new formula.  Looking at the docs on
> glDrawRangeElements, the index range is inclusive on both ends meaning that
> the number of elements drawn is max_index - min_index + 1.  If that's the
> case, then the old formula was correct.  Are we using a half-open interval
> in mesa?

Yes, you're both right...I got tripped up by seeing (max_index - min_index) 
instead of (elements = max_index - min_index + 1) and then (elements - 1).

Looking over the code, everything sure looks correct.

Which leaves the question of why both Borderlands: TPS and Serious Sam 3 have 
all kinds of stalling with the current code, and work fine with no stalls with 
the patch.

I wonder if the games are incorrectly translating DirectX's 
DrawIndexedPrimitive call, which takes MinIndex and NumVertices, where 
NumVertices is apparently supposed to be (Max-Min+1).  I suppose I could ask...

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] ilo: Advertise DMA-BUF

2014-10-15 Thread Nick Sarnie
diff --git a/src/gallium/targets/pipe-loader/pipe_i965.c 
b/src/gallium/targets/pipe-loader/pipe_i965.c
index f4d447c..810dffc 100644
--- a/src/gallium/targets/pipe-loader/pipe_i965.c
+++ b/src/gallium/targets/pipe-loader/pipe_i965.c
@@ -21,6 +21,27 @@ create_screen(int fd)
 
return screen;
 }
+static const struct drm_conf_ret throttle_ret = {
+   .type = DRM_CONF_INT,
+   .val.val_int = 2,
+};
 
+static const struct drm_conf_ret share_fd_ret = {
+   .type = DRM_CONF_BOOL,
+   .val.val_int = true,
+};
+
+static const struct drm_conf_ret *drm_configuration(enum drm_conf conf)
+{
+   switch (conf) {
+   case DRM_CONF_THROTTLE:
+  return &throttle_ret;
+   case DRM_CONF_SHARE_FD:
+  return &share_fd_ret;
+   default:
+  break;
+   }
+   return NULL;
+}
 PUBLIC
-DRM_DRIVER_DESCRIPTOR("i965", "i915", create_screen, NULL)
+DRM_DRIVER_DESCRIPTOR("i965", "i915", create_screen, drm_configuration)
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] ilo: Advertise DMA-BUF

2014-10-15 Thread Commend Sarnex
Whoops, forgot to add the description. This is another trivial patch for
ilo, all the frameworks were already implemented. In hindsight, I should
have included this with my previous patch.

On Wed, Oct 15, 2014 at 4:08 PM, Nick Sarnie 
wrote:

> diff --git a/src/gallium/targets/pipe-loader/pipe_i965.c
> b/src/gallium/targets/pipe-loader/pipe_i965.c
> index f4d447c..810dffc 100644
> --- a/src/gallium/targets/pipe-loader/pipe_i965.c
> +++ b/src/gallium/targets/pipe-loader/pipe_i965.c
> @@ -21,6 +21,27 @@ create_screen(int fd)
>
> return screen;
>  }
> +static const struct drm_conf_ret throttle_ret = {
> +   .type = DRM_CONF_INT,
> +   .val.val_int = 2,
> +};
>
> +static const struct drm_conf_ret share_fd_ret = {
> +   .type = DRM_CONF_BOOL,
> +   .val.val_int = true,
> +};
> +
> +static const struct drm_conf_ret *drm_configuration(enum drm_conf conf)
> +{
> +   switch (conf) {
> +   case DRM_CONF_THROTTLE:
> +  return &throttle_ret;
> +   case DRM_CONF_SHARE_FD:
> +  return &share_fd_ret;
> +   default:
> +  break;
> +   }
> +   return NULL;
> +}
>  PUBLIC
> -DRM_DRIVER_DESCRIPTOR("i965", "i915", create_screen, NULL)
> +DRM_DRIVER_DESCRIPTOR("i965", "i915", create_screen, drm_configuration)
> --
> 1.9.1
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Drop the "target" parameter from NewBufferObject().

2014-10-15 Thread Kenneth Graunke
NewBufferObject took a "target" parameter, which it blindly passed to
_mesa_initialize_buffer_object(), which ignored it.

Not much point in passing it around.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i915/intel_buffer_objects.c|  4 ++--
 src/mesa/drivers/dri/i965/intel_buffer_objects.c|  4 ++--
 src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c|  4 ++--
 src/mesa/drivers/dri/radeon/radeon_buffer_objects.c |  5 ++---
 src/mesa/main/bufferobj.c   | 14 ++
 src/mesa/main/bufferobj.h   |  2 +-
 src/mesa/main/dd.h  |  2 +-
 src/mesa/main/shared.c  |  2 +-
 src/mesa/state_tracker/st_cb_bufferobjects.c|  4 ++--
 src/mesa/vbo/vbo_exec_api.c |  2 +-
 src/mesa/vbo/vbo_save_api.c |  4 +---
 11 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/intel_buffer_objects.c 
b/src/mesa/drivers/dri/i915/intel_buffer_objects.c
index 3da901b..ef06743 100644
--- a/src/mesa/drivers/dri/i915/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/i915/intel_buffer_objects.c
@@ -68,11 +68,11 @@ release_buffer(struct intel_buffer_object *intel_obj)
  * internal structure where somehow shared.
  */
 static struct gl_buffer_object *
-intel_bufferobj_alloc(struct gl_context * ctx, GLuint name, GLenum target)
+intel_bufferobj_alloc(struct gl_context * ctx, GLuint name)
 {
struct intel_buffer_object *obj = CALLOC_STRUCT(intel_buffer_object);
 
-   _mesa_initialize_buffer_object(ctx, &obj->Base, name, target);
+   _mesa_initialize_buffer_object(ctx, &obj->Base, name);
 
obj->buffer = NULL;
 
diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c 
b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
index 82e0744..dd797e0 100644
--- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
@@ -133,14 +133,14 @@ release_buffer(struct intel_buffer_object *intel_obj)
  * internal structure where somehow shared.
  */
 static struct gl_buffer_object *
-intel_bufferobj_alloc(struct gl_context * ctx, GLuint name, GLenum target)
+intel_bufferobj_alloc(struct gl_context * ctx, GLuint name)
 {
struct intel_buffer_object *obj = CALLOC_STRUCT(intel_buffer_object);
if (!obj) {
   _mesa_error_no_memory(__func__);
}
 
-   _mesa_initialize_buffer_object(ctx, &obj->Base, name, target);
+   _mesa_initialize_buffer_object(ctx, &obj->Base, name);
 
obj->buffer = NULL;
 
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c 
b/src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c
index 2bb633e..afccf35 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c
@@ -48,7 +48,7 @@ get_bufferobj_map(struct gl_context *ctx, struct 
gl_buffer_object *obj,
 }
 
 static struct gl_buffer_object *
-nouveau_bufferobj_new(struct gl_context *ctx, GLuint buffer, GLenum target)
+nouveau_bufferobj_new(struct gl_context *ctx, GLuint buffer)
 {
struct nouveau_bufferobj *nbo;
 
@@ -56,7 +56,7 @@ nouveau_bufferobj_new(struct gl_context *ctx, GLuint buffer, 
GLenum target)
if (!nbo)
return NULL;
 
-   _mesa_initialize_buffer_object(ctx, &nbo->base, buffer, target);
+   _mesa_initialize_buffer_object(ctx, &nbo->base, buffer);
 
return &nbo->base;
 }
diff --git a/src/mesa/drivers/dri/radeon/radeon_buffer_objects.c 
b/src/mesa/drivers/dri/radeon/radeon_buffer_objects.c
index 6294dbe..d9d4f5f 100644
--- a/src/mesa/drivers/dri/radeon/radeon_buffer_objects.c
+++ b/src/mesa/drivers/dri/radeon/radeon_buffer_objects.c
@@ -40,12 +40,11 @@ get_radeon_buffer_object(struct gl_buffer_object *obj)
 
 static struct gl_buffer_object *
 radeonNewBufferObject(struct gl_context * ctx,
-  GLuint name,
-  GLenum target)
+  GLuint name)
 {
 struct radeon_buffer_object *obj = CALLOC_STRUCT(radeon_buffer_object);
 
-_mesa_initialize_buffer_object(ctx, &obj->Base, name, target);
+_mesa_initialize_buffer_object(ctx, &obj->Base, name);
 
 obj->bo = NULL;
 
diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
index b6879ce..2f15a41 100644
--- a/src/mesa/main/bufferobj.c
+++ b/src/mesa/main/bufferobj.c
@@ -389,14 +389,14 @@ convert_clear_buffer_data(struct gl_context *ctx,
  * Default callback for the \c dd_function_table::NewBufferObject() hook.
  */
 static struct gl_buffer_object *
-_mesa_new_buffer_object( struct gl_context *ctx, GLuint name, GLenum target )
+_mesa_new_buffer_object(struct gl_context *ctx, GLuint name)
 {
struct gl_buffer_object *obj;
 
(void) ctx;
 
obj = MALLOC_STRUCT(gl_buffer_object);
-   _mesa_initialize_buffer_object(ctx, obj, name, target);
+   _mesa_initialize_buffer_object(ctx, obj, name);
return obj;
 }
 
@@ -494,12 +494,10 @@ _mesa_reference_buffer_object_

Re: [Mesa-dev] [PATCH] mesa: Drop the "target" parameter from NewBufferObject().

2014-10-15 Thread Brian Paul

On 10/15/2014 03:06 PM, Kenneth Graunke wrote:

NewBufferObject took a "target" parameter, which it blindly passed to
_mesa_initialize_buffer_object(), which ignored it.

Not much point in passing it around.

Signed-off-by: Kenneth Graunke 
---
  src/mesa/drivers/dri/i915/intel_buffer_objects.c|  4 ++--
  src/mesa/drivers/dri/i965/intel_buffer_objects.c|  4 ++--
  src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c|  4 ++--
  src/mesa/drivers/dri/radeon/radeon_buffer_objects.c |  5 ++---
  src/mesa/main/bufferobj.c   | 14 ++
  src/mesa/main/bufferobj.h   |  2 +-
  src/mesa/main/dd.h  |  2 +-
  src/mesa/main/shared.c  |  2 +-
  src/mesa/state_tracker/st_cb_bufferobjects.c|  4 ++--
  src/mesa/vbo/vbo_exec_api.c |  2 +-
  src/mesa/vbo/vbo_save_api.c |  4 +---
  11 files changed, 21 insertions(+), 26 deletions(-)



IIRC, the original intention was the target could be used as a hint to 
help with memory allocation/type, but as you see, it was never used.


Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r300/vdpau: enable again

2014-10-15 Thread David Heidelberger

Tested-by: David Heidelberger 
---
 src/gallium/targets/vdpau/Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/targets/vdpau/Makefile.am 
b/src/gallium/targets/vdpau/Makefile.am

index 2d6e2e7..c137d04 100644
--- a/src/gallium/targets/vdpau/Makefile.am
+++ b/src/gallium/targets/vdpau/Makefile.am
@@ -42,6 +42,7 @@ TARGET_LIB_DEPS = $(top_builddir)/src/loader/libloader.la

 include $(top_srcdir)/src/gallium/drivers/nouveau/Automake.inc

+include $(top_srcdir)/src/gallium/drivers/r300/Automake.inc
 include $(top_srcdir)/src/gallium/drivers/r600/Automake.inc
 include $(top_srcdir)/src/gallium/drivers/radeonsi/Automake.inc

--
2.1.2

>From 00cb95fa770fde065f47364af4e791fa87f46db3 Mon Sep 17 00:00:00 2001
From: David Heidelberger 
Date: Wed, 15 Oct 2014 23:47:22 +0200
Subject: [PATCH] r300/vdpau: enable again

Signed-off-by: David Heidelberger 
---
 src/gallium/targets/vdpau/Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/targets/vdpau/Makefile.am b/src/gallium/targets/vdpau/Makefile.am
index 2d6e2e7..c137d04 100644
--- a/src/gallium/targets/vdpau/Makefile.am
+++ b/src/gallium/targets/vdpau/Makefile.am
@@ -42,6 +42,7 @@ TARGET_LIB_DEPS = $(top_builddir)/src/loader/libloader.la
 
 include $(top_srcdir)/src/gallium/drivers/nouveau/Automake.inc
 
+include $(top_srcdir)/src/gallium/drivers/r300/Automake.inc
 include $(top_srcdir)/src/gallium/drivers/r600/Automake.inc
 include $(top_srcdir)/src/gallium/drivers/radeonsi/Automake.inc
 
-- 
2.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Lower constant arrays to uniform arrays.

2014-10-15 Thread Kenneth Graunke
Consider GLSL code such as:

   const ivec2 offsets[] =
  ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
  ivec2(0, -1),  ivec2(0, 0),  ivec2(0, 1),
  ivec2(1, -1),  ivec2(1, 0),  ivec2(1, 1));

   ivec2 offset = offsets[];

Both i965 and nv50 currently handle this very poorly.  On i965, this
becomes a pile of MOVs to load the immediate constants into registers,
a pile of scratch writes to move the whole array to memory, and one
scratch read to actually access the value - effectively the same as if
it were a non-constant array.

We'd much rather upload large blocks of constant data as uniform data,
so drivers can simply upload the data via constbufs, and not have to
populate it via shader instructions.

This is currently non-optional because both i965 and nouveau benefit
from it - we can revisit that if another driver actually benefits.

Improves performance in a terrain rendering microbenchmark by about 2x,
and cuts the number of instructions in about half.  Helps a lot of
"Natural Selection 2" shaders, as well as one "HOARD" shader.

total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
instructions in affected programs: 5880 -> 4186 (-28.81%)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
Signed-off-by: Kenneth Graunke 

Reviewers: suggestions for better max_array_access handling are welcome.
Reviewers: This changes the const-ness of the expression.  Is that a problem?
---
 src/glsl/Makefile.sources   |   1 +
 src/glsl/ir_optimization.h  |   1 +
 src/glsl/linker.cpp |   2 +
 src/glsl/lower_const_arrays_to_uniforms.cpp | 101 
 4 files changed, 105 insertions(+)
 create mode 100644 src/glsl/lower_const_arrays_to_uniforms.cpp

I've had this patch sitting around since April, and been pondering whether
we should improve it somehow.  But...it helps certain shaders a ton, and I
haven't seen anything hurt by it.  So I'm wondering if we should just land
it; we can always improve things later.

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index 0c55327..6aed52d 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -58,6 +58,7 @@ LIBGLSL_FILES = \
$(GLSL_SRCDIR)/loop_analysis.cpp \
$(GLSL_SRCDIR)/loop_controls.cpp \
$(GLSL_SRCDIR)/loop_unroll.cpp \
+   $(GLSL_SRCDIR)/lower_const_arrays_to_uniforms.cpp \
$(GLSL_SRCDIR)/lower_clip_distance.cpp \
$(GLSL_SRCDIR)/lower_discard.cpp \
$(GLSL_SRCDIR)/lower_discard_flow.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index e25857a..34e0b4b 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -114,6 +114,7 @@ bool lower_noise(exec_list *instructions);
 bool lower_variable_index_to_cond_assign(exec_list *instructions,
 bool lower_input, bool lower_output, bool lower_temp, bool lower_uniform);
 bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
+bool lower_const_arrays_to_uniforms(exec_list *instructions);
 bool lower_clip_distance(gl_shader *shader);
 void lower_output_reads(exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 47a722d..2a69b78 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2692,6 +2692,8 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 &ctx->Const.ShaderCompilerOptions[i],
 ctx->Const.NativeIntegers))
 ;
+
+  lower_const_arrays_to_uniforms(prog->_LinkedShaders[i]->ir);
}
 
/* Check and validate stream emissions in geometry shaders */
diff --git a/src/glsl/lower_const_arrays_to_uniforms.cpp 
b/src/glsl/lower_const_arrays_to_uniforms.cpp
new file mode 100644
index 000..2085086
--- /dev/null
+++ b/src/glsl/lower_const_arrays_to_uniforms.cpp
@@ -0,0 +1,101 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,

Re: [Mesa-dev] [PATCH] glsl: Lower constant arrays to uniform arrays.

2014-10-15 Thread Ian Romanick
I like the idea (when you suggested it in April), but I think this
implementation has some issues.  I believe making the constant array
into a uniform will cause it to be shown by the GL API (e.g., returned
by glGetActiveUniform) and counted against uniform limits.  Since I
doubt we have any piglit tests that explicitly check this, it's hard to
say with 100% certainty.

We already have some mechanism that prevents built-in uniforms from
being advertised by glGetActiveUniform, so that part shouldn't be too
hard to fix.  I'm not sure whether this should be counted against
uniform limits or not.

Also... if a shader has a mix of constant and non-constant indexing of
the array, will the constant indexed accesses still get constant propagated?

On 10/15/2014 05:32 PM, Kenneth Graunke wrote:
> Consider GLSL code such as:
> 
>const ivec2 offsets[] =
>   ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
>   ivec2(0, -1),  ivec2(0, 0),  ivec2(0, 1),
>   ivec2(1, -1),  ivec2(1, 0),  ivec2(1, 1));
> 
>ivec2 offset = offsets[];
> 
> Both i965 and nv50 currently handle this very poorly.  On i965, this
> becomes a pile of MOVs to load the immediate constants into registers,
> a pile of scratch writes to move the whole array to memory, and one
> scratch read to actually access the value - effectively the same as if
> it were a non-constant array.
> 
> We'd much rather upload large blocks of constant data as uniform data,
> so drivers can simply upload the data via constbufs, and not have to
> populate it via shader instructions.
> 
> This is currently non-optional because both i965 and nouveau benefit
> from it - we can revisit that if another driver actually benefits.
> 
> Improves performance in a terrain rendering microbenchmark by about 2x,
> and cuts the number of instructions in about half.  Helps a lot of
> "Natural Selection 2" shaders, as well as one "HOARD" shader.
> 
> total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
> instructions in affected programs: 5880 -> 4186 (-28.81%)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
> Signed-off-by: Kenneth Graunke 
> 
> Reviewers: suggestions for better max_array_access handling are welcome.
> Reviewers: This changes the const-ness of the expression.  Is that a problem?

Maybe... but I think this happens late enough that it doesn't matter.
We'd need some tests like:

const int a[] = int[](-1, -1, 2, -1);
uniform vec4 b[a[2]];  // size must be constant expr.
const int c = a[3];// initializer must be constant expr.

and other cases where only constant expressions can be used.

> ---
>  src/glsl/Makefile.sources   |   1 +
>  src/glsl/ir_optimization.h  |   1 +
>  src/glsl/linker.cpp |   2 +
>  src/glsl/lower_const_arrays_to_uniforms.cpp | 101 
> 
>  4 files changed, 105 insertions(+)
>  create mode 100644 src/glsl/lower_const_arrays_to_uniforms.cpp
> 
> I've had this patch sitting around since April, and been pondering whether
> we should improve it somehow.  But...it helps certain shaders a ton, and I
> haven't seen anything hurt by it.  So I'm wondering if we should just land
> it; we can always improve things later.
> 
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 0c55327..6aed52d 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -58,6 +58,7 @@ LIBGLSL_FILES = \
>   $(GLSL_SRCDIR)/loop_analysis.cpp \
>   $(GLSL_SRCDIR)/loop_controls.cpp \
>   $(GLSL_SRCDIR)/loop_unroll.cpp \
> + $(GLSL_SRCDIR)/lower_const_arrays_to_uniforms.cpp \
>   $(GLSL_SRCDIR)/lower_clip_distance.cpp \
>   $(GLSL_SRCDIR)/lower_discard.cpp \
>   $(GLSL_SRCDIR)/lower_discard_flow.cpp \
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index e25857a..34e0b4b 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -114,6 +114,7 @@ bool lower_noise(exec_list *instructions);
>  bool lower_variable_index_to_cond_assign(exec_list *instructions,
>  bool lower_input, bool lower_output, bool lower_temp, bool 
> lower_uniform);
>  bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
> +bool lower_const_arrays_to_uniforms(exec_list *instructions);
>  bool lower_clip_distance(gl_shader *shader);
>  void lower_output_reads(exec_list *instructions);
>  bool lower_packing_builtins(exec_list *instructions, int op_mask);
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index 47a722d..2a69b78 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -2692,6 +2692,8 @@ link_shaders(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>  &ctx->Const.ShaderCompilerOptions[i],
>  ctx->Const.NativeIntegers))
>;
> +
> +  lower_const_arrays_to_uniforms(prog->_LinkedShaders[i]->ir);
> }
>  
> 

Re: [Mesa-dev] [PATCH] mesa: Drop the "target" parameter from NewBufferObject().

2014-10-15 Thread Ian Romanick
On 10/15/2014 01:15 PM, Brian Paul wrote:
> On 10/15/2014 03:06 PM, Kenneth Graunke wrote:
>> NewBufferObject took a "target" parameter, which it blindly passed to
>> _mesa_initialize_buffer_object(), which ignored it.
>>
>> Not much point in passing it around.
>>
>> Signed-off-by: Kenneth Graunke 
>> ---
>>   src/mesa/drivers/dri/i915/intel_buffer_objects.c|  4 ++--
>>   src/mesa/drivers/dri/i965/intel_buffer_objects.c|  4 ++--
>>   src/mesa/drivers/dri/nouveau/nouveau_bufferobj.c|  4 ++--
>>   src/mesa/drivers/dri/radeon/radeon_buffer_objects.c |  5 ++---
>>   src/mesa/main/bufferobj.c   | 14 ++
>>   src/mesa/main/bufferobj.h   |  2 +-
>>   src/mesa/main/dd.h  |  2 +-
>>   src/mesa/main/shared.c  |  2 +-
>>   src/mesa/state_tracker/st_cb_bufferobjects.c|  4 ++--
>>   src/mesa/vbo/vbo_exec_api.c |  2 +-
>>   src/mesa/vbo/vbo_save_api.c |  4 +---
>>   11 files changed, 21 insertions(+), 26 deletions(-)
>>
> 
> IIRC, the original intention was the target could be used as a hint to
> help with memory allocation/type, but as you see, it was never used.

Right... I could have sworn that i915 used this to determine whether to
malloc memory (VBOs) or get GPU memory (PBOs).  I guess it must do that
some other way... that must be at glBufferData time.

Also

Reviewed-by: Ian Romanick 

> Reviewed-by: Brian Paul 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

2014-10-15 Thread Ian Romanick
On 10/15/2014 10:04 AM, Kenneth Graunke wrote:
> On Wednesday, October 15, 2014 09:50:42 AM Jason Ekstrand wrote:
>> On Tue, Oct 14, 2014 at 4:42 PM, Kenneth Graunke 
>> wrote:
>>
>>> According to INTEL_DEBUG=perf, "Borderlands: The Pre-Sequel" was
>>> stalling on nearly every glBufferSubData call, with very slightly
>>> overlapping busy ranges.
>>>
>>> It turns out the draw upload code was accidentally including an extra
>>> stride's worth of data in the vertex buffer size due to a simple
>>> off-by-one error.  We considered this extra bit of buffer space to be
>>> busy (in use by the GPU), when it was actually idle.
>>>
>>
>> Nice Catch!
>>
>>
>>>
>>> The new diagram should make it easier to understand the formula.  It's
>>> basically what I drew on paper when working through an actual
>>> glDrawRangeElements call.
>>>
>>> Eliminates all glBufferSubData stalls in "Borderlands: The Pre-Sequel."
>>>
>>> Signed-off-by: Kenneth Graunke 
>>> Cc: mesa-sta...@lists.freedesktop.org
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +-
>>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>>
>>> No Piglit regressions on Haswell.  This might help Dota 2 and Serious Sam 3
>>> as well, but I haven't checked.
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> b/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> index 5a12439..6cb653c 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> @@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw)
>>>offset = 0;
>>>size = intel_buffer->Base.Size;
>>> } else {
>>> +  /* Compute the size/amount of data referenced by the
>>> GPU.
>>> +   * If the data is interleaved, StrideB may be larger
>>> than
>>> +   * _ElementSize.  As an example, assume we have 2
>>> interleaved
>>> +   * attributes A and B.  The data is organized like this:
>>> +   *
>>> +   *   StrideEltSize
>>> +   *_,,_,
>>> +   *   /\  / \
>>> +   *A: ---   ---   ---   ---   ---   ---
>>> +   *B:---   ---   ---   ---   ---   ---
>>> +   *
>>> +   *   |= 4 elts ==|  (4-1) * Stride + EltSize
>>> +   *
>>> +   * max_index - min_index gives the number of elements
>>> that
>>> +   * will be referenced.  Say we're drawing 4 elements.
>>> On
>>> +   * the first three, we need the full stride in order to
>>> get
>>> +   * to the next element.  But on the last, we only want
>>> the
>>> +   * element size, since we don't actually read the other
>>> +   * interleaved vertex attributes stored beyond that.
>>> +   */
>>>offset = buffer->offset + min_index * buffer->stride;
>>> -  size = (buffer->stride * (max_index - min_index) +
>>> +  size = (buffer->stride * MAX2(max_index - min_index -
>>> 1, 0) +
>>>glarray->_ElementSize);
>>>
>>
>> I'm not so sure about this new formula.  Looking at the docs on
>> glDrawRangeElements, the index range is inclusive on both ends meaning that
>> the number of elements drawn is max_index - min_index + 1.  If that's the
>> case, then the old formula was correct.  Are we using a half-open interval
>> in mesa?
> 
> Yes, you're both right...I got tripped up by seeing (max_index - min_index) 
> instead of (elements = max_index - min_index + 1) and then (elements - 1).
> 
> Looking over the code, everything sure looks correct.
> 
> Which leaves the question of why both Borderlands: TPS and Serious Sam 3 have 
> all kinds of stalling with the current code, and work fine with no stalls 
> with the patch.
> 
> I wonder if the games are incorrectly translating DirectX's 
> DrawIndexedPrimitive call, which takes MinIndex and NumVertices, where 
> NumVertices is apparently supposed to be (Max-Min+1).  I suppose I could 
> ask...

That seems very likely.  This may be an optimization / work-around to
enable via driconf.  If two games make that mistake, you can be sure
there are others. :(

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev




signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] i965/vec4: Delete some dead code in visit(ir_expression *).

2014-10-15 Thread Kenneth Graunke
Nothing uses the vector_elements temporary variable.

Setting this->result.file is dead because we overwrite this->result a
few lines later.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 8 
 1 file changed, 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 8e32d8b..683d1f5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1289,14 +1289,6 @@ vec4_visitor::visit(ir_expression *ir)
   assert(!ir->operands[operand]->type->is_matrix());
}
 
-   int vector_elements = ir->operands[0]->type->vector_elements;
-   if (ir->operands[1]) {
-  vector_elements = MAX2(vector_elements,
-ir->operands[1]->type->vector_elements);
-   }
-
-   this->result.file = BAD_FILE;
-
/* Storage for our result.  Ideally for an assignment we'd be using
 * the actual storage for the result here, instead.
 */
-- 
2.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] i965/fs: Generate better code for ir_triop_csel.

2014-10-15 Thread Kenneth Graunke
Previously, we generated an extra CMP instruction:

   cmp.ge.f0(8)   g4<1>D  g2<0,1,0>F  0F
   cmp.nz.f0(8)   nullg4<8,8,1>D  0D
   (+f0) sel(8)   g120<1>Fg2.4<0,1,0>Fg3<0,1,0>F

The first operand is always a boolean, and we want to predicate the SEL
on that.  Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.

Now we generate:

   cmp.ge.f0(8)nullg2<0,1,0>F  0F
   (+f0) sel(8)g124<1>Fg2.4<0,1,0>Fg3<0,1,0>F

total instructions in shared programs: 5473459 -> 5473253 (-0.00%)
instructions in affected programs: 6219 -> 6013 (-3.31%)

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8a11b04..3ba5492 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -487,6 +487,19 @@ fs_visitor::visit(ir_expression *ir)
  return;
   break;
 
+   case ir_triop_csel:
+  ir->operands[1]->accept(this);
+  op[1] = this->result;
+  ir->operands[2]->accept(this);
+  op[2] = this->result;
+
+  emit_bool_to_cond_code(ir->operands[0]);
+
+  this->result = fs_reg(this, ir->type);
+  inst = emit(SEL(this->result, op[1], op[2]));
+  inst->predicate = BRW_PREDICATE_NORMAL;
+  return;
+
case ir_unop_interpolate_at_centroid:
case ir_binop_interpolate_at_offset:
case ir_binop_interpolate_at_sample:
@@ -1023,11 +1036,6 @@ fs_visitor::visit(ir_expression *ir)
   break;
 
case ir_triop_csel:
-  emit(CMP(reg_null_d, op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
-  inst = emit(BRW_OPCODE_SEL, this->result, op[1], op[2]);
-  inst->predicate = BRW_PREDICATE_NORMAL;
-  break;
-
case ir_unop_interpolate_at_centroid:
case ir_binop_interpolate_at_offset:
case ir_binop_interpolate_at_sample:
-- 
2.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] i965/vec4: Generate better code for ir_triop_csel.

2014-10-15 Thread Kenneth Graunke
Previously, we generated an extra CMP instruction:

   cmp.ge.f0(8)g6<1>D  g1<0,4,1>F  0F
   cmp.nz.f0(8)nullg6<4,4,1>D  0D
   (+f0) sel(8)g5<1>F  g1.4<0,4,1>Fg2<0,4,1>F

The first operand is always a boolean, and we want to predicate the SEL
on that.  Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.

Now we generate:

   cmp.ge.f0(8)nullg1<0,4,1>F  0F
   (+f0) sel(8)g5<1>F  g1.4<0,4,1>Fg2<0,4,1>F

No difference in shader-db.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 93aa533..4c517af 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1277,6 +1277,20 @@ vec4_visitor::visit(ir_expression *ir)
dst_reg result_dst(this, ir->type);
src_reg result_src(result_dst);
 
+   if (ir->operation == ir_triop_csel) {
+  ir->operands[1]->accept(this);
+  op[1] = this->result;
+  ir->operands[2]->accept(this);
+  op[2] = this->result;
+
+  enum brw_predicate predicate;
+  emit_bool_to_cond_code(ir->operands[0], &predicate);
+  inst = emit(BRW_OPCODE_SEL, result_dst, op[1], op[2]);
+  inst->predicate = predicate;
+  this->result = result_src;
+  return;
+   }
+
for (operand = 0; operand < ir->get_num_operands(); operand++) {
   this->result.file = BAD_FILE;
   ir->operands[operand]->accept(this);
-- 
2.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] i965/vec4: Simplify visit(ir_expression *)'s result_src/dst setup.

2014-10-15 Thread Kenneth Graunke
Using dst_reg(this, ir->type) automatically sets the writemask to the
proper size for the type; src_reg(dst_reg) preserves that.  This should
be equivalent, but less code.

Note that src_reg(dst_reg) either uses SWIZZLE_ or SWIZZLE_XYZW, so
the old code did need the manual writemask adjustment, since it
constructed the registers the other way around.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 683d1f5..93aa533 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1259,8 +1259,6 @@ vec4_visitor::visit(ir_expression *ir)
 {
unsigned int operand;
src_reg op[Elements(ir->operands)];
-   src_reg result_src;
-   dst_reg result_dst;
vec4_instruction *inst;
 
if (ir->operation == ir_binop_add) {
@@ -1273,6 +1271,12 @@ vec4_visitor::visit(ir_expression *ir)
 return;
}
 
+   /* Storage for our result.  Ideally for an assignment we'd be using
+* the actual storage for the result here, instead.
+*/
+   dst_reg result_dst(this, ir->type);
+   src_reg result_src(result_dst);
+
for (operand = 0; operand < ir->get_num_operands(); operand++) {
   this->result.file = BAD_FILE;
   ir->operands[operand]->accept(this);
@@ -1289,19 +1293,8 @@ vec4_visitor::visit(ir_expression *ir)
   assert(!ir->operands[operand]->type->is_matrix());
}
 
-   /* Storage for our result.  Ideally for an assignment we'd be using
-* the actual storage for the result here, instead.
-*/
-   result_src = src_reg(this, ir->type);
-   /* convenience for the emit functions below. */
-   result_dst = dst_reg(result_src);
/* If nothing special happens, this is the result. */
this->result = result_src;
-   /* Limit writes to the channels that will be used by result_src later.
-* This does limit this temp's use as a temporary for multi-instruction
-* sequences.
-*/
-   result_dst.writemask = (1 << ir->type->vector_elements) - 1;
 
switch (ir->operation) {
case ir_unop_logic_not:
-- 
2.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #22 from Iago Toral  ---
(In reply to Jason Ekstrand from comment #20)
> (In reply to Iago Toral from comment #18)
> > (In reply to Jason Ekstrand from comment #17)
> > > (In reply to Iago Toral from comment #16)
> > > > We also need new mesa_format enums and pack/unpack functions for byte
> > > > swapped variants of non-array types.
> > > 
> > > Why?  Is this for the byte_swapped flag on GL upload/downloads?  If that's
> > > the only reason, then I'd rather not bother adding mesa types.  We can
> > > easily do a byte swap with _mesa_swizzle_and_convert.  While it's a little
> > > bit of a performance hit, people shouldn't be using that attribute anyway.
> > 
> > I think that won't help in this case. We can only use
> > _mesa_swizzle_and_convert for array types, but as I say, this involves
> > non-array types like GL_UNSIGNED_INT_10_10_10_2. The master converter will
> > not use _mesa_swizzle_and_convert with these, it goes through things like
> > _mesa_unpack_rgba_row and _mesa_pack_float_rgba_row for example (which rely
> > on auto-generated pack/unpack functions). Do you think we should try expand
> > these to consider byte-swapping?
> 
> My inclination on byte-swapping it to keep it out of the master conversion
> function.  It's really an oddity of the old GL api and not a core format
> conversion thing.  No sanely written app should use byte swapping, so I'm
> not too worried about making it fast.  My inclination there is to keep going
> byteswapping and the other format conversion ops in TexImage, GetTexImag,
> ReadPixels, and DrawPixels.  They can call _mesa_swizzle_and_convert to do
> the byteswap operation befor calling into the actual format conversion
> function.

I'll try it this way then.

> The downside here is that there are a few cases where we can do better. 
> However, that's pretty much limited to GL_UNSIGNED_INT_8_8_8_8[_REV] and I
> don't think it's worth the effort.

I am already handling GL_UNSIGNED_INT_8_8_8_8[_REV] with byte swapping when we
generate the mesa_array_format from the gl format and type.(it is only a matter
of reverting the component mapping in this case), so if we want we can have
TexImage and cia handle byte-swapping for all formats but this one.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] winsys/radeon: Use a single buffer cache manager again

2014-10-15 Thread Michel Dänzer
From: Michel Dänzer 

The trick is to generate a unique buffer usage value for each possible
combination of domains and flags, with only one bit set each for the
domains and flags. This ensures pb_check_usage() only returns TRUE when
the domains and flags the cached buffer was created for exactly match
the requested ones.

Signed-off-by: Michel Dänzer 
---
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 23 +
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 30 ++-
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.h |  5 +---
 3 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 9518e53..2cfa43b 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -816,21 +816,24 @@ radeon_winsys_bo_create(struct radeon_winsys *rws,
 memset(&desc, 0, sizeof(desc));
 desc.base.alignment = alignment;
 
-/* Additional criteria for the cache manager. */
-desc.base.usage = domain;
+/* Only set one usage bit each for domains and flags, or the cache manager
+ * might consider different sets of domains / flags compatible
+ */
+if (domain == RADEON_DOMAIN_VRAM_GTT)
+desc.base.usage = 1 << 2;
+else
+desc.base.usage = domain >> 1;
+assert(flags < sizeof(desc.base.usage) * 8 - 3);
+desc.base.usage |= 1 << (flags + 3);
+
 desc.initial_domains = domain;
 desc.flags = flags;
 
 /* Assign a buffer manager. */
-assert(flags < RADEON_NUM_CACHE_MANAGERS);
-if (use_reusable_pool) {
-if (domain == RADEON_DOMAIN_VRAM)
-provider = ws->cman_vram[flags];
-else
-provider = ws->cman_gtt[flags];
-} else {
+if (use_reusable_pool)
+provider = ws->cman;
+else
 provider = ws->kman;
-}
 
 buffer = provider->create_buffer(provider, size, &desc.base);
 if (!buffer)
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index c67549e..caba373 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -441,7 +441,6 @@ static boolean do_winsys_init(struct radeon_drm_winsys *ws)
 static void radeon_winsys_destroy(struct radeon_winsys *rws)
 {
 struct radeon_drm_winsys *ws = (struct radeon_drm_winsys*)rws;
-int i;
 
 if (ws->thread) {
 ws->kill_thread = 1;
@@ -454,10 +453,7 @@ static void radeon_winsys_destroy(struct radeon_winsys 
*rws)
 pipe_mutex_destroy(ws->cmask_owner_mutex);
 pipe_mutex_destroy(ws->cs_stack_lock);
 
-for (i = 0; i < RADEON_NUM_CACHE_MANAGERS; i++) {
-ws->cman_gtt[i]->destroy(ws->cman_gtt[i]);
-ws->cman_vram[i]->destroy(ws->cman_vram[i]);
-}
+ws->cman->destroy(ws->cman);
 ws->kman->destroy(ws->kman);
 if (ws->gen >= DRV_R600) {
 radeon_surface_manager_free(ws->surf_man);
@@ -644,7 +640,6 @@ PUBLIC struct radeon_winsys *
 radeon_drm_winsys_create(int fd, radeon_screen_create_t screen_create)
 {
 struct radeon_drm_winsys *ws;
-int i;
 
 pipe_mutex_lock(fd_tab_mutex);
 if (!fd_tab) {
@@ -674,17 +669,10 @@ radeon_drm_winsys_create(int fd, radeon_screen_create_t 
screen_create)
 if (!ws->kman)
 goto fail;
 
-for (i = 0; i < RADEON_NUM_CACHE_MANAGERS; i++) {
-ws->cman_vram[i] = pb_cache_manager_create(ws->kman, 100, 2.0f, 0,
-   ws->info.vram_size / 8);
-if (!ws->cman_vram[i])
-goto fail;
-
-ws->cman_gtt[i] = pb_cache_manager_create(ws->kman, 100, 2.0f, 0,
-  ws->info.gart_size / 8);
-if (!ws->cman_gtt[i])
-goto fail;
-}
+ws->cman = pb_cache_manager_create(ws->kman, 100, 2.0f, 0,
+   (ws->info.vram_size + 
ws->info.gart_size) / 8);
+if (!ws->cman)
+goto fail;
 
 if (ws->gen >= DRV_R600) {
 ws->surf_man = radeon_surface_manager_new(fd);
@@ -739,12 +727,8 @@ radeon_drm_winsys_create(int fd, radeon_screen_create_t 
screen_create)
 
 fail:
 pipe_mutex_unlock(fd_tab_mutex);
-for (i = 0; i < RADEON_NUM_CACHE_MANAGERS; i++) {
-if (ws->cman_gtt[i])
-ws->cman_gtt[i]->destroy(ws->cman_gtt[i]);
-if (ws->cman_vram[i])
-ws->cman_vram[i]->destroy(ws->cman_vram[i]);
-}
+if (ws->cman)
+ws->cman->destroy(ws->cman);
 if (ws->kman)
 ws->kman->destroy(ws->kman);
 if (ws->surf_man)
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
index a2adf4b..1e0c632 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
@@ -41,8 +41,6 @@ enum radeon_gene