Re: [Mesa-dev] [PATCH] tgsi: lowering support for alpha test

2014-12-21 Thread Eric Anholt
Rob Clark  writes:

> Eric,
>
> we've managed to find the alpha-ref bitfield on a4xx, so suddenly I
> don't actually need this anymore..
>
> I've pushed a slightly updated version where I'd fixed a few more
> things (and was passing the fbo-alphatest-formats tests for me) to my
> github tree, in case you feel motivated to take it over:
>
>   
> https://github.com/freedreno/mesa/commit/70df2f9a2bbddf8e1608bab2a041a81b9b7c34e8
>
> what was left was to drop the const semantic and convert it over to an
> interface where driver passes in which const slot it wants to use.  To
> do that, we might want to re-work the tgsi_lowering interface slightly
> so the driver does the first tgsi_scan_shader() (so it knows # of
> consts) and tgsi_lowering only does the 2nd tgsi_scan_shader() in
> cases where the shader is transformed.  We seems like a sane change to
> me.

With NIR getting ready to land, I'm planning on doing lowering in that
instead.  It already has a way to describe state updates required for
uniforms, which I would just translate to my driver's per-uniform state
update enum.



signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 85380] FTBFS in Clover: /usr/lib/llvm-3.6/include/llvm/Config/Targets.def:26: undefined reference to `LLVMInitialize*Target'

2014-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=85380

Niels Ole Salscheider  changed:

   What|Removed |Added

 CC||niels_ole@salscheider-onlin
   ||e.de

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600: fix texture gradients instruction emission (v2)

2014-12-21 Thread Emil Velikov
Hello guys,

Can anyone backport the following patches for 10.3 and let me know if
we should drop them.
They have a bunch of non-trivial conflicts, thus i would rather avoid
butchering things up.

38ec1844193570486bf35e25a7a4339a00da127e r600: fix texture gradients
instruction emission (v2)
c88385603ae8d983314b736a9459bbf7d002cf11 r600g: do all CUBE ALU
operations before gradient texture operations (v2.1)
07ae69753c6818bcce5d4edaf2cca39c20e37f4c r600g: merge the TXQ and
BUFFER constant buffers (v1.1)
b10ddf962f2ca09073a13ad19817bf7c9b158294 r600g: fix fallout from last patch
91a827624c01d40613e97322632aaffe319540f1 r600g: make llvm code compile this time

Thanks
Emil

On 26 November 2014 at 20:46, Emil Velikov  wrote:
> Hello Dave,
>
> Cherry picking this commit for the 10.3 branch resulted in a number of
> non-trivial conflicts. Can you or anyone else familiar with the code
> backport this for 10.3 ?
>
> Thanks
> Emil
>
> On 18/11/14 22:53, Dave Airlie wrote:
>> From: Dave Airlie 
>>
>> The piglit tests were failing, and it appeared to be SB
>> optimising out things, but Glenn pointed out the gradients
>> are meant to be clause local, so we should emit the texture
>> instructions in the same clause. This moves things around
>> to always copy to a temp and then emit the texture clauses
>> for H/V.
>>
>> v2: Glenn pointed out we could get another ALU fetch in
>> the wrong place, so load the src gpr earlier as well.
>>
>> Fixes at least:
>> ./bin/tex-miplevel-selection textureGrad 2D
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/gallium/drivers/r600/r600_shader.c | 59 
>> ++
>>  1 file changed, 31 insertions(+), 28 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c 
>> b/src/gallium/drivers/r600/r600_shader.c
>> index 76daf2c..41caac3 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -5110,11 +5110,37 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
>>   }
>>
>>   if (inst->Instruction.Opcode == TGSI_OPCODE_TXD) {
>> + int temp_h, temp_v;
>>   /* TGSI moves the sampler to src reg 3 for TXD */
>>   sampler_src_reg = 3;
>>
>>   sampler_index_mode = inst->Src[sampler_src_reg].Indirect.Index 
>> == 2 ? 2 : 0; // CF_INDEX_1 : CF_INDEX_NONE
>>
>> + src_loaded = TRUE;
>> + for (i = 0; i < 3; i++) {
>> + int treg = r600_get_temp(ctx);
>> +
>> + if (i == 0)
>> + src_gpr = treg;
>> + else if (i == 1)
>> + temp_h = treg;
>> + else
>> + temp_v = treg;
>> +
>> + for (j = 0; j < 4; j++) {
>> + memset(&alu, 0, sizeof(struct 
>> r600_bytecode_alu));
>> + alu.op = ALU_OP1_MOV;
>> +r600_bytecode_src(&alu.src[0], 
>> &ctx->src[i], j);
>> +alu.dst.sel = treg;
>> +alu.dst.chan = j;
>> +if (j == 3)
>> +   alu.last = 1;
>> +alu.dst.write = 1;
>> +r = r600_bytecode_add_alu(ctx->bc, &alu);
>> +if (r)
>> +return r;
>> + }
>> + }
>>   for (i = 1; i < 3; i++) {
>>   /* set gradients h/v */
>>   memset(&tex, 0, sizeof(struct r600_bytecode_tex));
>> @@ -5125,35 +5151,12 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
>>   tex.resource_id = tex.sampler_id + 
>> R600_MAX_CONST_BUFFERS;
>>   tex.resource_index_mode = sampler_index_mode;
>>
>> - if (tgsi_tex_src_requires_loading(ctx, i)) {
>> - tex.src_gpr = r600_get_temp(ctx);
>> - tex.src_sel_x = 0;
>> - tex.src_sel_y = 1;
>> - tex.src_sel_z = 2;
>> - tex.src_sel_w = 3;
>> + tex.src_gpr = (i == 1) ? temp_h : temp_v;
>> + tex.src_sel_x = 0;
>> + tex.src_sel_y = 1;
>> + tex.src_sel_z = 2;
>> + tex.src_sel_w = 3;
>>
>> - for (j = 0; j < 4; j++) {
>> - memset(&alu, 0, sizeof(struct 
>> r600_bytecode_alu));
>> - alu.op = ALU_OP1_MOV;
>> -r600_bytecode_src(&alu.src[0], 
>> &ctx->src[i], j);
>> -alu.dst.sel = tex.src_gpr;
>> -alu.dst.chan = j;
>> -  

Re: [Mesa-dev] [PATCH 1/8] meta: Put _mesa_meta_in_progress in the header file

2014-12-21 Thread Ian Romanick
Jason asked on IRC what the overall results were for the series.  Here
is that data.  n=50 for all four.

Gl32Batch7  Gl32Multithread
32-bit  4.11846% +/- 0.216796%  1.97044% +/- 0.161024%
64-bit  5.48027% +/- 0.368804%  2.46053% +/- 0.19554%

On 12/19/2014 02:22 PM, Ian Romanick wrote:
> I should have also mentioned that this series is on the
> byt-cpu-optimizations branch of my fd.o tree.
> 
> On 12/19/2014 02:20 PM, Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> ...so that it can be inlined in the two places that call it.
>>
>> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
>> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
>> Gl32Batch7:
>>
>> 32-bit: No difference proven at 95.0% confidence (n=120)
>> 64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/mesa/drivers/common/meta.c | 10 --
>>  src/mesa/drivers/common/meta.h |  7 +--
>>  2 files changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
>> index 87532c1..3284ad1 100644
>> --- a/src/mesa/drivers/common/meta.c
>> +++ b/src/mesa/drivers/common/meta.c
>> @@ -1213,16 +1213,6 @@ _mesa_meta_end(struct gl_context *ctx)
>>  
>>  
>>  /**
>> - * Determine whether Mesa is currently in a meta state.
>> - */
>> -GLboolean
>> -_mesa_meta_in_progress(struct gl_context *ctx)
>> -{
>> -   return ctx->Meta->SaveStackDepth != 0;
>> -}
>> -
>> -
>> -/**
>>   * Convert Z from a normalized value in the range [0, 1] to an object-space
>>   * Z coordinate in [-1, +1] so that drawing at the new Z position with the
>>   * default/identity ortho projection results in the original Z value.
>> diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
>> index 6ecf3c0..c6aef01 100644
>> --- a/src/mesa/drivers/common/meta.h
>> +++ b/src/mesa/drivers/common/meta.h
>> @@ -446,8 +446,11 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield 
>> state);
>>  extern void
>>  _mesa_meta_end(struct gl_context *ctx);
>>  
>> -extern GLboolean
>> -_mesa_meta_in_progress(struct gl_context *ctx);
>> +static inline bool
>> +_mesa_meta_in_progress(struct gl_context *ctx)
>> +{
>> +   return ctx->Meta->SaveStackDepth != 0;
>> +}
>>  
>>  extern void
>>  _mesa_meta_fb_tex_blit_begin(const struct gl_context *ctx,
>>
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/8] meta: Put _mesa_meta_in_progress in the header file

2014-12-21 Thread Ian Romanick
On 12/19/2014 06:10 PM, Matt Turner wrote:
> On Fri, Dec 19, 2014 at 2:20 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> ...so that it can be inlined in the two places that call it.
> 
>if (brw->meta_in_progress != _mesa_meta_in_progress(ctx)) {
>   brw->meta_in_progress = _mesa_meta_in_progress(ctx);
>   brw->state.dirty.brw |= BRW_NEW_META_IN_PROGRESS;
>}
> 
> I suspect the compiler wasn't able to deduce that
> _mesa_meta_in_progress() was a pure function, so it actually called it
> twice. Inlining it obviously gives it that information.

The other problem is that one of the callers was in a different file, so
it would never be inlined there.  Of course, that callsite was never
executed in my testing.

> Reviewed-by: Matt Turner 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/8] i965: Micro-optimize brw_get_index_type

2014-12-21 Thread Ian Romanick
On 12/19/2014 06:22 PM, Matt Turner wrote:
> On Fri, Dec 19, 2014 at 2:20 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> With the switch-statement, GCC 4.8.3 produces a small pile of code with
>> a branch.
>>
>>  :
>>   00:   8b 54 24 04 mov0x4(%esp),%edx
>>   04:   b8 01 00 00 00  mov$0x1,%eax
>>   09:   81 fa 03 14 00 00   cmp$0x1403,%edx
>>   0f:   74 0d   je 1e 
>> 
>>   11:   31 c0   xor%eax,%eax
>>   13:   81 fa 05 14 00 00   cmp$0x1405,%edx
>>   19:   0f 94 c0sete   %al
>>   1c:   01 c0   add%eax,%eax
>>   1e:   c3  ret
>>
>> However, this could be two instructions.
>>
>>  :
>>   00:   2d 01 14 00 00  sub$0x1401,%eax
>>   05:   d1 e8   shr%eax
>>   07:   90  nop
>>   08:   90  nop
>>   09:   90  nop
>>   0a:   90  nop
>>   0b:   c3  ret
>>
>> The function was also moved to the header so that it could be inlined at
>> the two call sites.  Without this, 32-bit also needs to pull the
> 
> I would have liked to see what inlining the original function did, and
> then separately optimizing it, but oh well.
> 
>> parameter from the stack.  This means there is a push, a call, a move,
>> and a ret added to a two instruction function.  The above code shows the
>> function with __attribute__((regparm=1)), but even this adds several
>> extra instructions.  There is also an extra instruction on 64-bit to
>> move the parameter to %eax for the subtract.
>>
>> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
>> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
>> Gl32Batch7:
>>
>> 32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
>> 64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)
> 
> What are we benchmarking? +/- 0.35% on a 0.54% increase is kind of large.

Gl32Batch7.  I've noticed quite a few cases with that benchmark where
the standard deviation is quite large compared to the magnitude of the
results.

I tried to account for that with my test procedure, but it occasionally
gets tricked by results that are quite possible "no difference."  My
procedure has been as follows:

- Run before vs. after 40 times.
- While there is no difference with 99.5% confidence and we're run less
than 120 tests, run 10 more times.
- Report result with 95% confidence.

After the results that I sent out, I've increased this to 50 initial
runs, +25, and up to 150.  I used this for the overall results I sent a
few minutes ago.  On that run I got 1.23843% +/- 0.239952% on 32-bit an
no difference on 64-bit.

>> Signed-off-by: Ian Romanick 
>> ---
>>  src/mesa/drivers/dri/i965/brw_context.h  | 22 +-
>>  src/mesa/drivers/dri/i965/brw_draw_upload.c  | 13 +
>>  src/mesa/drivers/dri/i965/gen8_draw_upload.c |  2 +-
>>  3 files changed, 23 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
>> b/src/mesa/drivers/dri/i965/brw_context.h
>> index a63c483..92eb022 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.h
>> +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> @@ -1602,7 +1602,27 @@ gl_clip_plane *brw_select_clip_planes(struct 
>> gl_context *ctx);
>>  /* brw_draw_upload.c */
>>  unsigned brw_get_vertex_surface_type(struct brw_context *brw,
>>   const struct gl_client_array *glarray);
>> -unsigned brw_get_index_type(GLenum type);
>> +
>> +static inline unsigned
>> +brw_get_index_type(GLenum type)
>> +{
>> +   assert((type == GL_UNSIGNED_BYTE)
>> +  || (type == GL_UNSIGNED_SHORT)
>> +  || (type == GL_UNSIGNED_INT));
> 
> I don't think the extra parenthesis really make anything more clear?

Ok

>> +
>> +   /* The possible values for type are GL_UNSIGNED_BYTE (0x1401),
>> +* GL_UNSIGNED_SHORT (0x1403), and GL_UNSIGNED_INT (0x1405) which we want
>> +* to map to scale factors of 0, 1, and 2, respectively.  These scale
>> +* factors are then left-shfited by 8 to be in the correct position in 
>> the
> 
> typo: shifted

Oops.

>> +* CMD_INDEX_BUFFER packet.
>> +*
>> +* Subtracting 0x1401 gives 0, 2, and 4.  Shifting left by 7 afterwards
>> +* gives 0x, 0x0100, and 0x0200.  These just happen to be
>> +* the values the need to be written in the CMD_INDEX_BUFFER packet.
>> +*/
>> +   return (type - 0x1401) << 7;
> 
> Wow, nice.

I had a similar trick for index_bytes (in src/mesa/main/api_validate.c)
that used inline assembly to emit an ADC, but then I figured out I could
just delete the function (in patch 6). :)  Part of me wanted to send
that patch out anyway...

>> +}
>> +
>>  

Re: [Mesa-dev] [PATCH] mesa: Add mesa SHA-1 functions

2014-12-21 Thread Emil Velikov
On 20 December 2014 at 14:21, Olivier Galibert  wrote:
> Here is an implementation I've written myself, so no license issues.
>
Thanks OG,

Afaics the main issue is not the lack of implementation, but that
no-one wants to step up to "maintain" it.
Even adding code that is x2 the size is considered a better solution :'-(

If you're up-to the maintenance task, we can resolve all the issues
(linking, multi platform support) in half the size and a lot cleaner
build :-)

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] mesa: Always generate GL_INVALID_OPERATION in _mesa_GetProgramBinary

2014-12-21 Thread Ian Romanick
From: Ian Romanick 

There are no binary formats supported, so what are you doing?  At least
this gives the application developer some feedback about what's going
on.  The spec gives no guidance about what to do in this scenario.

Signed-off-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87516
---
 src/mesa/main/shaderapi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index c7e2633..92cafdc 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -1719,6 +1719,8 @@ _mesa_GetProgramBinary(GLuint program, GLsizei bufSize, 
GLsizei *length,
}
 
*length = 0;
+   _mesa_error(ctx, GL_INVALID_OPERATION,
+   "glGetProgramBinary(driver supports zero binary formats)");
 
(void) binaryFormat;
(void) binary;
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] mesa: Add a missing error check in _mesa_GetProgramBinary

2014-12-21 Thread Ian Romanick
From: Ian Romanick 

Signed-off-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87516
---
 src/mesa/main/shaderapi.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 108e3f5..c7e2633 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -1681,16 +1681,35 @@ _mesa_GetProgramBinary(GLuint program, GLsizei bufSize, 
GLsizei *length,
GLenum *binaryFormat, GLvoid *binary)
 {
struct gl_shader_program *shProg;
+   GLsizei length_dummy;
GET_CURRENT_CONTEXT(ctx);
 
shProg = _mesa_lookup_shader_program_err(ctx, program, 
"glGetProgramBinary");
if (!shProg)
   return;
 
+   /* The ARB_get_program_binary spec says:
+*
+* "If  is NULL, then no length is returned."
+*
+* Ensure that length always points to valid storage to avoid multiple NULL
+* pointer checks below.
+*/
+   if (length != NULL)
+  *length = &length_dummy;
+
+
+   /* The ARB_get_program_binary spec says:
+*
+* "When a program object's LINK_STATUS is FALSE, its program binary
+* length is zero, and a call to GetProgramBinary will generate an
+* INVALID_OPERATION error.
+*/
if (!shProg->LinkStatus) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glGetProgramBinary(program %u not linked)",
   shProg->Name);
+  *length = 0;
   return;
}
 
@@ -1699,12 +1718,7 @@ _mesa_GetProgramBinary(GLuint program, GLsizei bufSize, 
GLsizei *length,
   return;
}
 
-   /* The ARB_get_program_binary spec says:
-*
-* "If  is NULL, then no length is returned."
-*/
-   if (length != NULL)
-  *length = 0;
+   *length = 0;
 
(void) binaryFormat;
(void) binary;
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] mesa: Add missing error checks in _mesa_ProgramBinary

2014-12-21 Thread Ian Romanick
From: Ian Romanick 

Signed-off-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87516
---
 src/mesa/main/shaderapi.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 6d831f7..108e3f5 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -1723,8 +1723,31 @@ _mesa_ProgramBinary(GLuint program, GLenum binaryFormat,
 
(void) binaryFormat;
(void) binary;
-   (void) length;
-   _mesa_error(ctx, GL_INVALID_OPERATION, __FUNCTION__);
+
+   /* Section 2.3.1 (Errors) of the OpenGL 4.5 spec says:
+*
+* "If a negative number is provided where an argument of type sizei or
+* sizeiptr is specified, an INVALID_VALUE error is generated."
+*/
+   if (length < 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE, "glProgramBinary(length < 0)");
+  return;
+   }
+
+   /* The ARB_get_program_binary spec says:
+*
+* " and  must be those returned by a previous
+* call to GetProgramBinary, and  must be the length of the
+* program binary as returned by GetProgramBinary or GetProgramiv with
+*  PROGRAM_BINARY_LENGTH. Loading the program binary will fail,
+* setting the LINK_STATUS of  to FALSE, if these conditions
+* are not met."
+*
+* Since any value of binaryFormat passed "is not one of those specified as
+* allowable for [this] command, an INVALID_ENUM error is generated."
+*/
+   shProg->LinkStatus = GL_FALSE;
+   _mesa_error(ctx, GL_INVALID_ENUM, __FUNCTION__);
 }
 
 
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/8] i965: Micro-optimize brw_get_index_type

2014-12-21 Thread Matt Turner
On Sun, Dec 21, 2014 at 11:40 AM, Ian Romanick  wrote:
>>> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
>>> b/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> index 6e0cf3e..e46c54b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
>>> @@ -344,17 +344,6 @@ brw_get_vertex_surface_type(struct brw_context *brw,
>>> }
>>>  }
>>>
>>> -unsigned
>>> -brw_get_index_type(GLenum type)
>>> -{
>>> -   switch (type) {
>>> -   case GL_UNSIGNED_BYTE:  return BRW_INDEX_BYTE;
>>> -   case GL_UNSIGNED_SHORT: return BRW_INDEX_WORD;
>>> -   case GL_UNSIGNED_INT:   return BRW_INDEX_DWORD;
>>
>> The BRW_INDEX_* defines are now unused. I'd remove them.
>
> Separate patch or with this patch?

No preference really.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 87516] glProgramBinary violates spec

2014-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=87516

--- Comment #3 from Ian Romanick  ---
I sent some patches to the mesa-dev mailing list:

http://lists.freedesktop.org/archives/mesa-dev/2014-December/073210.html
http://lists.freedesktop.org/archives/mesa-dev/2014-December/073212.html
http://lists.freedesktop.org/archives/mesa-dev/2014-December/073211.html

The last one I'm a little iffy about.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 87516] glProgramBinary violates spec

2014-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=87516

--- Comment #4 from Leith Bade  ---
Looks good.

Are you able to send feedback upstream to the GL ARB about the vagueness around
having no support binaryFormats in the spec?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/8] i965/vec4: Use Vector-float immediates.

2014-12-21 Thread Matt Turner
This series adds support to i965's vec4 backend for using vector-float
immediates. The shader-db results are pretty nice:

total instructions in shared programs: 5889529 -> 5876617 (-0.22%)
instructions in affected programs: 465347 -> 452435 (-2.77%)

and there's still room for improvement. It helps 4914 shaders and hurts
none. 50 Valley and Heaven shaders are cut by 30% and there are a few
that do

   mov(8)  g25<1>F  [-1.5F, -1.5F, -0.5F, -1.5F]VF
   mov(8)  g27<1>F  [ 0.5F, -1.5F,  1.5F, -1.5F]VF
   mov(8)  g29<1>F  [-1.5F, -0.5F, -0.5F, -0.5F]VF
   mov(8)  g31<1>F  [ 0.5F, -0.5F,  1.5F, -0.5F]VF
   mov(8)  g33<1>F  [-1.5F,  0.5F, -0.5F,  0.5F]VF
   mov(8)  g35<1>F  [ 0.5F,  0.5F,  1.5F,  0.5F]VF
   mov(8)  g37<1>F  [-1.5F,  1.5F, -0.5F,  1.5F]VF
   mov(8)  g39<1>F  [ 0.5F,  1.5F,  1.5F,  1.5F]VF

before using each register once in a MAD instruction. I'll have to think
more about a general solution that would optimize that into a single

   mov(8)  g25<1>F  [-1.5F,  1.5F, -0.5F,  0.5F]VF

and then sources it 8 times with different swizzles. Probably something
like the constant-combining pass I've floated for the FS backend.

On a side note, can someone tell me how to run Heaven or Valley in an
automated benchmarking mode?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/8] i965/vec4: Add parameter to skip doing constant propagation.

2014-12-21 Thread Matt Turner
After CSEing some MOV ..., VF instructions we have code like

   mov tmp, [1F, 2F, 3F, 4F]VF
   mov r10, tmp
   mov r11, tmp
   ...
   use r10
   use r11

We want to copy propagate tmp into the uses of r10 and r11, but *not*
constant propagate the VF immediate into the uses of tmp.
---
 src/mesa/drivers/dri/i965/brw_vec4.h| 2 +-
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 4a8a467..75ecaf1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -369,7 +369,7 @@ public:
bool opt_reduce_swizzle();
bool dead_code_eliminate();
bool virtual_grf_interferes(int a, int b);
-   bool opt_copy_propagation();
+   bool opt_copy_propagation(bool do_constant_prop = true);
bool opt_cse_local(bblock_t *block);
bool opt_cse();
bool opt_algebraic();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 65564c9..9deaffa 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -330,7 +330,7 @@ try_copy_propagate(struct brw_context *brw, 
vec4_instruction *inst,
 }
 
 bool
-vec4_visitor::opt_copy_propagation()
+vec4_visitor::opt_copy_propagation(bool do_constant_prop)
 {
bool progress = false;
struct copy_entry entries[virtual_grf_reg_count];
@@ -395,7 +395,7 @@ vec4_visitor::opt_copy_propagation()
 if (c != 4)
continue;
 
-if (try_constant_propagate(brw, inst, i, &entry))
+ if (do_constant_prop && try_constant_propagate(brw, inst, i, &entry))
 progress = true;
 
 if (try_copy_propagate(brw, inst, i, &entry, reg))
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/8] i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().

2014-12-21 Thread Matt Turner
total instructions in shared programs: 5869005 -> 5868220 (-0.01%)
instructions in affected programs: 70208 -> 69423 (-1.12%)
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index c1aaeea..d36a735 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1798,7 +1798,11 @@ vec4_visitor::run()
   OPT(opt_register_coalesce);
} while (progress);
 
-   opt_vector_float();
+   if (opt_vector_float()) {
+  opt_cse();
+  opt_copy_propagation();
+  dead_code_eliminate();
+   }
 
if (failed)
   return false;
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/8] i965/vec4: Add pass to gather constants into a vector-float MOV.

2014-12-21 Thread Matt Turner
Currently only handles consecutive instructions with the same
destination that collectively write all channels.

total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs: 465236 -> 454449 (-2.32%)
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 61 ++
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 2 files changed, 62 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7619eb6..c1aaeea 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -336,6 +336,66 @@ src_reg::equals(const src_reg &r) const
  sizeof(fixed_hw_reg)) == 0);
 }
 
+bool
+vec4_visitor::opt_vector_float()
+{
+   bool progress = false;
+
+   int last_reg = -1;
+   int remaining_channels;
+   uint8_t imm[4];
+   int inst_count;
+   vec4_instruction *imm_inst[4];
+
+   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
+  if (last_reg != inst->dst.reg) {
+ last_reg = inst->dst.reg;
+ remaining_channels = WRITEMASK_XYZW;
+
+ inst_count = 0;
+  }
+
+  if (inst->opcode != BRW_OPCODE_MOV ||
+  inst->dst.writemask == WRITEMASK_XYZW ||
+  inst->src[0].file != IMM)
+ continue;
+
+  int vf = brw_float_to_vf(inst->src[0].fixed_hw_reg.dw1.f);
+  if (vf == -1)
+ continue;
+
+  if ((inst->dst.writemask & WRITEMASK_X) != 0)
+ imm[0] = vf;
+  if ((inst->dst.writemask & WRITEMASK_Y) != 0)
+ imm[1] = vf;
+  if ((inst->dst.writemask & WRITEMASK_Z) != 0)
+ imm[2] = vf;
+  if ((inst->dst.writemask & WRITEMASK_W) != 0)
+ imm[3] = vf;
+
+  imm_inst[inst_count++] = inst;
+
+  remaining_channels &= ~inst->dst.writemask;
+  if (remaining_channels == 0) {
+ vec4_instruction *mov = MOV(inst->dst, imm);
+ mov->dst.type = BRW_REGISTER_TYPE_F;
+ mov->dst.writemask = WRITEMASK_XYZW;
+ inst->insert_after(block, mov);
+ last_reg = -1;
+
+ for (int i = 0; i < inst_count; i++) {
+imm_inst[i]->remove(block);
+ }
+ progress = true;
+  }
+   }
+
+   if (progress)
+  invalidate_live_intervals();
+
+   return progress;
+}
+
 /* Replaces unused channels of a swizzle with channels that are used.
  *
  * For instance, this pass transforms
@@ -1738,6 +1798,7 @@ vec4_visitor::run()
   OPT(opt_register_coalesce);
} while (progress);
 
+   opt_vector_float();
 
if (failed)
   return false;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 0c44ad3..4a8a467 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -365,6 +365,7 @@ public:
void calculate_live_intervals();
void invalidate_live_intervals();
void split_virtual_grfs();
+   bool opt_vector_float();
bool opt_reduce_swizzle();
bool dead_code_eliminate();
bool virtual_grf_interferes(int a, int b);
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/8] i965/vec4: Allow constant propagation of VF immediates.

2014-12-21 Thread Matt Turner
total instructions in shared programs: 5877951 -> 5877012 (-0.02%)
instructions in affected programs: 155923 -> 154984 (-0.60%)

Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the
next commit.
---
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 +-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 9deaffa..9e47dd9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -50,7 +50,9 @@ is_direct_copy(vec4_instruction *inst)
   inst->dst.file == GRF &&
   !inst->dst.reladdr &&
   !inst->src[0].reladdr &&
-  inst->dst.type == inst->src[0].type);
+  (inst->dst.type == inst->src[0].type ||
+(inst->dst.type == BRW_REGISTER_TYPE_F &&
+ inst->src[0].type == BRW_REGISTER_TYPE_VF)));
 }
 
 static bool
@@ -77,6 +79,22 @@ is_channel_updated(vec4_instruction *inst, src_reg 
*values[4], int ch)
   inst->dst.writemask & (1 << BRW_GET_SWZ(src->swizzle, ch)));
 }
 
+static unsigned
+swizzle_vf_imm(unsigned vf4, unsigned swizzle)
+{
+   union {
+  unsigned vf4;
+  uint8_t vf[4];
+   } v = { vf4 }, ret;
+
+   ret.vf[0] = v.vf[BRW_GET_SWZ(swizzle, 0)];
+   ret.vf[1] = v.vf[BRW_GET_SWZ(swizzle, 1)];
+   ret.vf[2] = v.vf[BRW_GET_SWZ(swizzle, 2)];
+   ret.vf[3] = v.vf[BRW_GET_SWZ(swizzle, 3)];
+
+   return ret.vf4;
+}
+
 static bool
 try_constant_propagate(struct brw_context *brw, vec4_instruction *inst,
int arg, struct copy_entry *entry)
@@ -98,6 +116,8 @@ try_constant_propagate(struct brw_context *brw, 
vec4_instruction *inst,
if (inst->src[arg].abs) {
   if (value.type == BRW_REGISTER_TYPE_F) {
 value.fixed_hw_reg.dw1.f = fabs(value.fixed_hw_reg.dw1.f);
+  } else if (value.type == BRW_REGISTER_TYPE_VF) {
+ value.fixed_hw_reg.dw1.ud &= ~0x80808080;
   } else if (value.type == BRW_REGISTER_TYPE_D) {
 if (value.fixed_hw_reg.dw1.d < 0)
value.fixed_hw_reg.dw1.d = -value.fixed_hw_reg.dw1.d;
@@ -107,10 +127,16 @@ try_constant_propagate(struct brw_context *brw, 
vec4_instruction *inst,
if (inst->src[arg].negate) {
   if (value.type == BRW_REGISTER_TYPE_F)
 value.fixed_hw_reg.dw1.f = -value.fixed_hw_reg.dw1.f;
+  else if (value.type == BRW_REGISTER_TYPE_VF)
+ value.fixed_hw_reg.dw1.ud ^= 0x80808080;
   else
 value.fixed_hw_reg.dw1.ud = -value.fixed_hw_reg.dw1.ud;
}
 
+   if (value.type == BRW_REGISTER_TYPE_VF)
+  value.fixed_hw_reg.dw1.ud = swizzle_vf_imm(value.fixed_hw_reg.dw1.ud,
+ inst->src[arg].swizzle);
+
switch (inst->opcode) {
case BRW_OPCODE_MOV:
   inst->src[arg] = value;
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/8] i965/vec4: Perform CSE on MOV ..., VF instructions.

2014-12-21 Thread Matt Turner
Port of commit a28ad9d4 from the fs backend.

No shader-db changes since we don't emit MOV ..., VF instructions yet.
---
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index 7071213..37c930c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
@@ -48,6 +48,7 @@ static bool
 is_expression(const vec4_instruction *const inst)
 {
switch (inst->opcode) {
+   case BRW_OPCODE_MOV:
case BRW_OPCODE_SEL:
case BRW_OPCODE_NOT:
case BRW_OPCODE_AND:
@@ -154,11 +155,16 @@ vec4_visitor::opt_cse_local(bblock_t *block)
  }
 
  if (!found) {
-/* Our first sighting of this expression.  Create an entry. */
-aeb_entry *entry = ralloc(cse_ctx, aeb_entry);
-entry->tmp = src_reg(); /* file will be BAD_FILE */
-entry->generator = inst;
-aeb.push_tail(entry);
+if (inst->opcode != BRW_OPCODE_MOV ||
+(inst->opcode == BRW_OPCODE_MOV &&
+ inst->src[0].file == IMM &&
+ inst->src[0].type == BRW_REGISTER_TYPE_VF)) {
+   /* Our first sighting of this expression.  Create an entry. */
+   aeb_entry *entry = ralloc(cse_ctx, aeb_entry);
+   entry->tmp = src_reg(); /* file will be BAD_FILE */
+   entry->generator = inst;
+   aeb.push_tail(entry);
+}
  } else {
 /* This is at least our second sighting of this expression.
  * If we don't have a temporary already, make one.
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/8] i965: Add fs_reg/src_reg constructors that take vf[4].

2014-12-21 Thread Matt Turner
Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.

I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 9 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 +
 src/mesa/drivers/dri/i965/brw_vec4.h   | 1 +
 3 files changed, 19 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3639ed2..2837fc0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -585,6 +585,15 @@ fs_reg::fs_reg(uint32_t u)
 }
 
 /** Vector float immediate value constructor. */
+fs_reg::fs_reg(uint8_t vf[4])
+{
+   init();
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_VF;
+   memcpy(&this->fixed_hw_reg.dw1.ud, vf, sizeof(unsigned));
+}
+
+/** Vector float immediate value constructor. */
 fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3)
 {
init();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 2fb578e..b303eb6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -113,6 +113,15 @@ src_reg::src_reg(int32_t i)
this->fixed_hw_reg.dw1.d = i;
 }
 
+src_reg::src_reg(uint8_t vf[4])
+{
+   init();
+
+   this->file = IMM;
+   this->type = BRW_REGISTER_TYPE_VF;
+   memcpy(&this->fixed_hw_reg.dw1.ud, vf, sizeof(unsigned));
+}
+
 src_reg::src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3)
 {
init();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index be52fbc..0c44ad3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -82,6 +82,7 @@ public:
src_reg(float f);
src_reg(uint32_t u);
src_reg(int32_t i);
+   src_reg(uint8_t vf[4]);
src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3);
src_reg(struct brw_reg reg);
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/8] i965: Add support for saturating immediates.

2014-12-21 Thread Matt Turner
I don't feel great about unreachable("unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 16 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp | 47 
 src/mesa/drivers/dri/i965/brw_shader.h   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp   | 16 +++
 4 files changed, 80 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 2837fc0..789f967 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2292,6 +2292,22 @@ fs_visitor::opt_algebraic()
 
foreach_block_and_inst(block, fs_inst, inst, cfg) {
   switch (inst->opcode) {
+  case BRW_OPCODE_MOV:
+ if (inst->src[0].file != IMM)
+break;
+
+ if (inst->saturate) {
+if (inst->dst.type != inst->src[0].type)
+   unreachable("unimplemented: saturate mixed types");
+
+if (brw_saturate_immediate(inst->dst.type,
+   &inst->src[0].fixed_hw_reg)) {
+   inst->saturate = false;
+   progress = true;
+}
+ }
+ break;
+
   case BRW_OPCODE_MUL:
 if (inst->src[1].file != IMM)
continue;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 1e5227c..a2fc471 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -575,6 +575,53 @@ brw_instruction_name(enum opcode op)
unreachable("not reached");
 }
 
+bool
+brw_saturate_immediate(enum brw_reg_type type, struct brw_reg *reg)
+{
+   union {
+  unsigned ud;
+  int d;
+  float f;
+   } imm = { reg->dw1.ud }, sat_imm;
+
+   switch (type) {
+   case BRW_REGISTER_TYPE_UD:
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_UQ:
+   case BRW_REGISTER_TYPE_Q:
+  /* Nothing to do. */
+  return false;
+   case BRW_REGISTER_TYPE_UW:
+  sat_imm.ud = CLAMP(imm.ud, 0, USHRT_MAX);
+  break;
+   case BRW_REGISTER_TYPE_W:
+  sat_imm.d = CLAMP(imm.d, SHRT_MIN, SHRT_MAX);
+  break;
+   case BRW_REGISTER_TYPE_F:
+  sat_imm.f = CLAMP(imm.f, 0.0f, 1.0f);
+  break;
+   case BRW_REGISTER_TYPE_UB:
+  sat_imm.ud = CLAMP(imm.ud, 0, UCHAR_MAX);
+  break;
+   case BRW_REGISTER_TYPE_B:
+  sat_imm.d = CLAMP(imm.d, CHAR_MIN, CHAR_MAX);
+  break;
+   case BRW_REGISTER_TYPE_V:
+   case BRW_REGISTER_TYPE_UV:
+   case BRW_REGISTER_TYPE_VF:
+  unreachable("unimplemented: saturate vector immediate");
+   case BRW_REGISTER_TYPE_DF:
+   case BRW_REGISTER_TYPE_HF:
+  unreachable("unimplemented: saturate DF/HF immediate");
+   }
+
+   if (imm.ud != sat_imm.ud) {
+  reg->dw1.ud = sat_imm.ud;
+  return true;
+   }
+   return false;
+}
+
 backend_visitor::backend_visitor(struct brw_context *brw,
  struct gl_shader_program *shader_prog,
  struct gl_program *prog,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index 05434a7..233e224 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -192,6 +192,7 @@ enum brw_reg_type brw_type_for_base_type(const struct 
glsl_type *type);
 enum brw_conditional_mod brw_conditional_for_comparison(unsigned int op);
 uint32_t brw_math_function(enum opcode op);
 const char *brw_instruction_name(enum opcode op);
+bool brw_saturate_immediate(enum brw_reg_type type, struct brw_reg *reg);
 
 #ifdef __cplusplus
 extern "C" {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b303eb6..7619eb6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -572,6 +572,22 @@ vec4_visitor::opt_algebraic()
 
foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
   switch (inst->opcode) {
+  case BRW_OPCODE_MOV:
+ if (inst->src[0].file != IMM)
+break;
+
+ if (inst->saturate) {
+if (inst->dst.type != inst->src[0].type)
+   unreachable("unimplemented: saturate mixed types");
+
+if (brw_saturate_immediate(inst->dst.type,
+   &inst->src[0].fixed_hw_reg)) {
+   inst->saturate = false;
+   progress = true;
+}
+ }
+ break;
+
   case VEC4_OPCODE_UNPACK_UNIFORM:
  if (inst->src[0].file != UNIFORM) {
 inst->opcode = BRW_OPCODE_MOV;
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/8] i965/vec4: Do copy before constant propagation after opt_vector_float().

2014-12-21 Thread Matt Turner
total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs: 33140 -> 32745 (-1.19%)

From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:

total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs: 123444 -> 122110 (-1.08%)

with no programs hurt.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index d36a735..c9ba338 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1800,6 +1800,7 @@ vec4_visitor::run()
 
if (opt_vector_float()) {
   opt_cse();
+  opt_copy_propagation(false);
   opt_copy_propagation();
   dead_code_eliminate();
}
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Consider SEL.{GE, L} to be commutative operations.

2014-12-21 Thread Matt Turner
---
No shader-db changes, unfortunately.

 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   | 14 ++
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 18 --
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 38fae17..2488596 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -109,22 +109,28 @@ is_expression(const fs_inst *const inst)
 }
 
 static bool
-is_expression_commutative(enum opcode op)
+is_expression_commutative(const fs_inst *inst)
 {
-   switch (op) {
+   switch (inst->opcode) {
case BRW_OPCODE_AND:
case BRW_OPCODE_OR:
case BRW_OPCODE_XOR:
case BRW_OPCODE_ADD:
case BRW_OPCODE_MUL:
   return true;
+   case BRW_OPCODE_SEL:
+  if (inst->conditional_mod == BRW_CONDITIONAL_GE ||
+  inst->conditional_mod == BRW_CONDITIONAL_L) {
+ return true;
+  }
+  /* fallthrough */
default:
   return false;
}
 }
 
 static bool
-operands_match(fs_inst *a, fs_inst *b)
+operands_match(const fs_inst *a, fs_inst *b)
 {
fs_reg *xs = a->src;
fs_reg *ys = b->src;
@@ -133,7 +139,7 @@ operands_match(fs_inst *a, fs_inst *b)
   return xs[0].equals(ys[0]) &&
  ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
   (xs[2].equals(ys[1]) && xs[1].equals(ys[2])));
-   } else if (!is_expression_commutative(a->opcode)) {
+   } else if (!is_expression_commutative(a)) {
   bool match = true;
   for (int i = 0; i < a->sources; i++) {
  if (!xs[i].equals(ys[i])) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index 7071213..1b47de4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
@@ -88,28 +88,34 @@ is_expression(const vec4_instruction *const inst)
 }
 
 static bool
-is_expression_commutative(enum opcode op)
+is_expression_commutative(const vec4_instruction *inst)
 {
-   switch (op) {
+   switch (inst->opcode) {
case BRW_OPCODE_AND:
case BRW_OPCODE_OR:
case BRW_OPCODE_XOR:
case BRW_OPCODE_ADD:
case BRW_OPCODE_MUL:
   return true;
+   case BRW_OPCODE_SEL:
+  if (inst->conditional_mod == BRW_CONDITIONAL_GE ||
+  inst->conditional_mod == BRW_CONDITIONAL_L) {
+ return true;
+  }
+  /* fallthrough */
default:
   return false;
}
 }
 
 static bool
-operands_match(enum opcode op, src_reg *xs, src_reg *ys)
+operands_match(const vec4_instruction *inst, src_reg *xs, src_reg *ys)
 {
-   if (op == BRW_OPCODE_MAD) {
+   if (inst->opcode == BRW_OPCODE_MAD) {
   return xs[0].equals(ys[0]) &&
  ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
   (xs[2].equals(ys[1]) && xs[1].equals(ys[2])));
-   } else if (!is_expression_commutative(op)) {
+   } else if (!is_expression_commutative(inst)) {
   return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]);
} else {
   return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) ||
@@ -125,7 +131,7 @@ instructions_match(vec4_instruction *a, vec4_instruction *b)
   a->conditional_mod == b->conditional_mod &&
   a->dst.type == b->dst.type &&
   a->dst.writemask == b->dst.writemask &&
-  operands_match(a->opcode, a->src, b->src);
+  operands_match(a, a->src, b->src);
 }
 
 bool
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Consider SEL.{GE, L} to be commutative operations.

2014-12-21 Thread Kenneth Graunke
On Sunday, December 21, 2014 03:37:25 PM Matt Turner wrote:
> ---
> No shader-db changes, unfortunately.

Unsurprising :) This patch doesn't implement what you meant.

1. instructions_match() checks a->conditional_mod == b->conditional_mod.
   So you won't ever see mixed conditional mods in operands_match() or
   is_expression_commutative().  I suspect this is why you see no changes.

2. It looks like your patch allows CSE of (a >= b) with (b >= a),
   which are clearly different.  So any changes would be wrong.

What you want is to use brw_swap_cmod.  It's then trivial to handle /all/
comparitors, not just L/GE.  I think what you want is:

static bool
instructions match(fs_inst *a, fs_inst *b)
{
   bool match = a->opcode == b->opcode &&
a->saturate == b->saturate &&
a->predicate == b->predicate &&
a->predicate_inverse == b->predicate_inverse &&
a->dst.type == b->dst.type &&
a->sources == b->sources;

   if (a->is_tex()) {
   match = match &&
   a->offset == b->offset &&
   a->mlen == b->mlen &&
   a->regs_written == b->regs_written &&
   a->base_mrf == b->base_mrf &&
   a->eot == b->eot &&
   a->header_present == b->header_present &&
   a->shadow_compare == b->shadow_compare);
   }

   if (!match)
  return false;

   /* Comparisons match if both the comparitor and operands are flipped. */
   if (a->opcode == BRW_OPCODE_SEL &&
   a->conditional_mod == brw_swap_cmod(b->conditional_mod)
   a->src[0].equals(b->src[1]) && a->src[1].equals(b->src[0])) {
  return true;
   }

   return a->conditional_mod == b->conditional_mod && operands_match(a, b);
}

I'm also not sure why you special case SEL - it seems like CMP would work fine
as well, and would probably help even more.  Maybe just drop the opcode check.

FWIW, I haven't tested the above code.

--Ken

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/8] i965: Micro-optimize brw_get_index_type

2014-12-21 Thread Kenneth Graunke
On Friday, December 19, 2014 02:20:53 PM Ian Romanick wrote:
[snip]
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index a63c483..92eb022 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1602,7 +1602,27 @@ gl_clip_plane *brw_select_clip_planes(struct 
> gl_context *ctx);
>  /* brw_draw_upload.c */
>  unsigned brw_get_vertex_surface_type(struct brw_context *brw,
>   const struct gl_client_array *glarray);
> -unsigned brw_get_index_type(GLenum type);
> +
> +static inline unsigned
> +brw_get_index_type(GLenum type)
> +{
> +   assert((type == GL_UNSIGNED_BYTE)
> +  || (type == GL_UNSIGNED_SHORT)
> +  || (type == GL_UNSIGNED_INT));
> +
> +   /* The possible values for type are GL_UNSIGNED_BYTE (0x1401),
> +* GL_UNSIGNED_SHORT (0x1403), and GL_UNSIGNED_INT (0x1405) which we want
> +* to map to scale factors of 0, 1, and 2, respectively.  These scale
> +* factors are then left-shfited by 8 to be in the correct position in the
> +* CMD_INDEX_BUFFER packet.
> +*
> +* Subtracting 0x1401 gives 0, 2, and 4.  Shifting left by 7 afterwards
> +* gives 0x, 0x0100, and 0x0200.  These just happen to be
> +* the values the need to be written in the CMD_INDEX_BUFFER packet.
> +*/
> +   return (type - 0x1401) << 7;
> +}

Clever!

How about putting it in brw_draw.h?  It seems like the right place, and would
avoid cluttering brw_context.h with even more random stuff.

With Matt's suggestions, this is:
Reviewed-by: Kenneth Graunke 

I don't care about removing the BRW_INDEX_* #defines - either way's fine.

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Consider SEL.{GE, L} to be commutative operations.

2014-12-21 Thread Matt Turner
On Sun, Dec 21, 2014 at 7:50 PM, Kenneth Graunke  wrote:
> On Sunday, December 21, 2014 03:37:25 PM Matt Turner wrote:
>> ---
>> No shader-db changes, unfortunately.
>
> Unsurprising :) This patch doesn't implement what you meant.
>
> 1. instructions_match() checks a->conditional_mod == b->conditional_mod.
>So you won't ever see mixed conditional mods in operands_match() or
>is_expression_commutative().  I suspect this is why you see no changes.

I'm not expecting to CSE SELs with different conditional mods. Maybe
I'm misunderstanding.

> 2. It looks like your patch allows CSE of (a >= b) with (b >= a),
>which are clearly different.  So any changes would be wrong.

I don't think so? Those operations aren't implemented with SEL. I
think thinking about this as SEL+conditional mod is more confusing
than just MIN/MAX.

The intention of this patch is to allow MAX(a, b) and MAX(b, a) to be
recognized as identical expressions. Same for MIN.

> What you want is to use brw_swap_cmod.  It's then trivial to handle /all/
> comparitors, not just L/GE.  I think what you want is:
>
> static bool
> instructions match(fs_inst *a, fs_inst *b)
> {
>bool match = a->opcode == b->opcode &&
> a->saturate == b->saturate &&
> a->predicate == b->predicate &&
> a->predicate_inverse == b->predicate_inverse &&
> a->dst.type == b->dst.type &&
> a->sources == b->sources;
>
>if (a->is_tex()) {
>match = match &&
>a->offset == b->offset &&
>a->mlen == b->mlen &&
>a->regs_written == b->regs_written &&
>a->base_mrf == b->base_mrf &&
>a->eot == b->eot &&
>a->header_present == b->header_present &&
>a->shadow_compare == b->shadow_compare);
>}
>
>if (!match)
>   return false;
>
>/* Comparisons match if both the comparitor and operands are flipped. */
>if (a->opcode == BRW_OPCODE_SEL &&
>a->conditional_mod == brw_swap_cmod(b->conditional_mod)
>a->src[0].equals(b->src[1]) && a->src[1].equals(b->src[0])) {
>   return true;
>}
>
>return a->conditional_mod == b->conditional_mod && operands_match(a, b);
> }
>
> I'm also not sure why you special case SEL - it seems like CMP would work fine
> as well, and would probably help even more.  Maybe just drop the opcode check.

I think you're confused. :)

I'm special casing SEL because it's used to implement MIN/MAX and MIN
and MAX are commutative operations.

You're right that we could probably consider things like CMP with
different conditional mods as identical expressions (I'm not sure, the
comparison rules for CMP are non-obvious when you consider NaNs), but
that's not related to this patch.

The IVB PRM says about SEL with .l/.ge (specifically those, since they
are used to implement MIN/MAX):

For a sel instruction with a .l or .ge conditional modifier, if one
source is NaN and the other not NaN, the
non-NaN source is the result. If both sources are NaNs, the result is
NaN. For all other conditional
modifiers, if either source is NaN then src1 is selected.

That's the behavior I suggested GLSL should adopt in another thread.
That's the inspiration for this patch.

Does that clear it up?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context

2014-12-21 Thread Kenneth Graunke
On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote:
> From: Ian Romanick 
> 
> Instead of having an extra pointer indirection in one of the hottest
> loops in the driver.
> 
> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
> Gl32Batch7:
> 
> 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
> 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h  |  2 +-
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ---
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 92eb022..5f5f807 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1384,7 +1384,7 @@ struct brw_context
> } perfmon;
>  
> int num_atoms;
> -   const struct brw_tracked_state **atoms;
> +   const struct brw_tracked_state atoms[64];

Making this atoms[57] would save memory, and the static asserts make it easy
to remember to increase the size if we ever add additional atoms.

Honestly, it seems like there should be a better way to do this.  Currently,
the atoms list only depends on the generation number, and thus is identical
for all GL contexts.  Storing an additional 1.5 kB of redundant data in each
GL context seems a bit wasteful.

Would it help to make brw_upload_state cache the array and size?

void
brw_upload_state(struct brw_context *brw)
{
   static const struct brw_tracked_state *atoms = brw->atoms;
   static const int num_atoms = brw->num_atoms;

   ...

   for (i = 0; i < num_atoms; i++) {
  ...use atoms[i]...
   }

   ...
}

It seems like that would achieve the same effect - one indirection.

One downside to that approach is that we couldn't have separate atom
lists for API_CORE vs. API_COMPAT (while we could using your approach).
(We could skip stippling for core contexts, and skip GS/HS/DS for legacy
contexts, which might help a tiny bit...)

> /* If (INTEL_DEBUG & DEBUG_BATCH) */
> struct {
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index a579781..9a5d49a 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw )
> const struct brw_tracked_state **atoms;
> int num_atoms;
>  
> +   STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms));
> +   STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms));
> +   STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms));
> +   STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms));

I do like the static asserts.

> +
> brw_init_caches(brw);
>  
> if (brw->gen >= 8) {
> @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw )
>num_atoms = ARRAY_SIZE(gen4_atoms);
> }
>  
> -   brw->atoms = atoms;
> brw->num_atoms = num_atoms;
>  
> +   /* This is to work around brw_context::atoms being declared const.  We 
> want
> +* it to be const, but it needs to be initialized somehow!
> +*/
> +   struct brw_tracked_state *context_atoms =
> +  (struct brw_tracked_state *) &brw->atoms[0];

Ugly, but... not sure what else to suggest.

I suppose this patch is fine.  It does offer a small CPU overhead reduction in
the hottest path, for a small memory usage penalty.  I think we can do better,
but I don't know that it makes sense to block your patch over nebulous ideas
for future improvements.  We can do those in the future...

Reviewed-by: Kenneth Graunke 

> +
> +   for (int i = 0; i < num_atoms; i++)
> +  context_atoms[i] = *atoms[i];
> +
> while (num_atoms--) {
>assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw);
>assert((*atoms)->emit);
> @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw)
>prev = *state;
>  
>for (i = 0; i < brw->num_atoms; i++) {
> -  const struct brw_tracked_state *atom = brw->atoms[i];
> +  const struct brw_tracked_state *atom = &brw->atoms[i];
>struct brw_state_flags generated;
>  
>if (check_state(state, &atom->dirty)) {
> @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw)
> }
> else {
>for (i = 0; i < brw->num_atoms; i++) {
> -  const struct brw_tracked_state *atom = brw->atoms[i];
> +  const struct brw_tracked_state *atom = &brw->atoms[i];
>  
>if (check_state(state, &atom->dirty)) {
>   atom->emit(brw);

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context

2014-12-21 Thread Jason Ekstrand
On Sun, Dec 21, 2014 at 9:11 PM, Kenneth Graunke 
wrote:
>
> On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote:
> > From: Ian Romanick 
> >
> > Instead of having an extra pointer indirection in one of the hottest
> > loops in the driver.
> >
> > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
> > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
> > Gl32Batch7:
> >
> > 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
> > 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
> >
> > Signed-off-by: Ian Romanick 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h  |  2 +-
> >  src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ---
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> > index 92eb022..5f5f807 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1384,7 +1384,7 @@ struct brw_context
> > } perfmon;
> >
> > int num_atoms;
> > -   const struct brw_tracked_state **atoms;
> > +   const struct brw_tracked_state atoms[64];
>
> Making this atoms[57] would save memory, and the static asserts make it
> easy
> to remember to increase the size if we ever add additional atoms.
>
> Honestly, it seems like there should be a better way to do this.
> Currently,
> the atoms list only depends on the generation number, and thus is identical
> for all GL contexts.  Storing an additional 1.5 kB of redundant data in
> each
> GL context seems a bit wasteful.
>
> Would it help to make brw_upload_state cache the array and size?
>
> void
> brw_upload_state(struct brw_context *brw)
> {
>static const struct brw_tracked_state *atoms = brw->atoms;
>static const int num_atoms = brw->num_atoms;
>
>...
>
>for (i = 0; i < num_atoms; i++) {
>   ...use atoms[i]...
>}
>
>...
> }
>
> It seems like that would achieve the same effect - one indirection.
>

Yeah, but then we have deal with threading issues.  It might be ok, but I
think we'd have to be a bit more careful than that.


> One downside to that approach is that we couldn't have separate atom
> lists for API_CORE vs. API_COMPAT (while we could using your approach).
> (We could skip stippling for core contexts, and skip GS/HS/DS for legacy
> contexts, which might help a tiny bit...)
>

Another option would be to make the atoms list a screen thing and do the
copy at screen creation time.  That would allow it to be shared across
contexts without the above threading issues.  If we wanted, we could store
one core version and one compat version and have the context point to
whichever one it needs.  The problem with this is that it's wasteful in the
one-context case where we would have both core an compat lists.  I guess
you can't win them all.

> /* If (INTEL_DEBUG & DEBUG_BATCH) */
> > struct {
> > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > index a579781..9a5d49a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw )
> > const struct brw_tracked_state **atoms;
> > int num_atoms;
> >
> > +   STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms));
> > +   STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms));
> > +   STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms));
> > +   STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms));
>
> I do like the static asserts.
>
> > +
> > brw_init_caches(brw);
> >
> > if (brw->gen >= 8) {
> > @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw )
> >num_atoms = ARRAY_SIZE(gen4_atoms);
> > }
> >
> > -   brw->atoms = atoms;
> > brw->num_atoms = num_atoms;
> >
> > +   /* This is to work around brw_context::atoms being declared const.
> We want
> > +* it to be const, but it needs to be initialized somehow!
> > +*/
> > +   struct brw_tracked_state *context_atoms =
> > +  (struct brw_tracked_state *) &brw->atoms[0];
>
> Ugly, but... not sure what else to suggest.
>

I don't like it either, but I did exactly the same thing when I wrote it,
so I can't blame Ian.


>
> I suppose this patch is fine.  It does offer a small CPU overhead
> reduction in
> the hottest path, for a small memory usage penalty.  I think we can do
> better,
> but I don't know that it makes sense to block your patch over nebulous
> ideas
> for future improvements.  We can do those in the future...
>
> Reviewed-by: Kenneth Graunke 
>
> > +
> > +   for (int i = 0; i < num_atoms; i++)
> > +  context_atoms[i] = *atoms[i];
> > +
> > while (num_atoms--) {
> >assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw);
> >assert((*atoms)->emit);
> > @@ -607,7 +620,7 @@ void

Re: [Mesa-dev] [PATCH 4/8] mesa: Only validate shaders that can exist in the context

2014-12-21 Thread Kenneth Graunke
On Friday, December 19, 2014 02:20:55 PM Ian Romanick wrote:
> From: Ian Romanick 
> 
> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
> Gl32Batch7:
> 
> 32-bit: Difference at 95.0% confidence 0.495267% +/- 0.202063% (n=40)
> 64-bit: Difference at 95.0% confidence 3.57576% +/- 0.288175% (n=40)
> 
> Signed-off-by: Ian Romanick 

I'm guessing it would help legacy context programs even more.

> ---
>  src/mesa/main/context.c | 78 
> +++--
>  1 file changed, 49 insertions(+), 29 deletions(-)
> 
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index 400c158..74f9004 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -1903,49 +1903,69 @@ shader_linked_or_absent(struct gl_context *ctx,
>  GLboolean
>  _mesa_valid_to_render(struct gl_context *ctx, const char *where)
>  {
> -   bool from_glsl_shader[MESA_SHADER_COMPUTE] = { false };
> unsigned i;
>  
> /* This depends on having up to date derived state (shaders) */
> if (ctx->NewState)
>_mesa_update_state(ctx);
>  
> -   for (i = 0; i < MESA_SHADER_COMPUTE; i++) {
> -  if (!shader_linked_or_absent(ctx, ctx->_Shader->CurrentProgram[i],
> -   &from_glsl_shader[i], where))
> - return GL_FALSE;
> -   }
> +   if (ctx->API == API_OPENGL_CORE || ctx->API == API_OPENGLES2) {
> +  bool from_glsl_shader[MESA_SHADER_COMPUTE] = { false };
>  
> -   /* Any shader stages that are not supplied by the GLSL shader and have
> -* assembly shaders enabled must now be validated.
> -*/
> -   if (!from_glsl_shader[MESA_SHADER_VERTEX]
> -   && ctx->VertexProgram.Enabled && !ctx->VertexProgram._Enabled) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> -   "%s(vertex program not valid)", where);
> -  return GL_FALSE;
> -   }
> +  for (i = 0; i < MESA_SHADER_COMPUTE; i++) {
> + if (!shader_linked_or_absent(ctx, ctx->_Shader->CurrentProgram[i],
> +  &from_glsl_shader[i], where))
> +return GL_FALSE;
> +  }
>  
> -   /* FINISHME: If GL_NV_geometry_program4 is ever supported, the current
> -* FINISHME: geometry program should validated here.
> -*/
> -   (void) from_glsl_shader[MESA_SHADER_GEOMETRY];
> +  /* In OpenGL Core Profile and OpenGL ES 2.0 / 3.0, there are no 
> assembly
> +   * shaders.  Don't check state related to those.
> +   */

FWIW, I don't think we actually enforce this restriction.  Although we don't
advertise the extension, the API functions don't appear to contain any
API_OPENGL_CORE -> raise GL error code.

We might want to do that.  In the unlikely event that there are applications
attempting to use ARB programs in core profile without checking for the
extension, landing those checks first would make debugging the problem easier:

1. broken app somehow works
2. broken app starts returning INVALID_OPERATION
3. code to support broken applications is removed.

*shrug*.  It probably doesn't actually matter.

> +   } else {
> +  bool has_vertex_shader = false;
> +  bool has_fragment_shader = false;
> +
> +  /* In OpenGL Compatibility Profile, there is only vertex shader and
> +   * fragment shader.  We take this path also for API_OPENGLES because

"there is only vertex shader" sounds a little odd.  Perhaps "there are only
vertex and fragment shaders" or "there are only vertex and fragment shader 
stages"?

Whatever you decide is fine.

> +   * optimizing that path would make the other (more common) paths
> +   * slightly slower.
> +   */
> +  if (!shader_linked_or_absent(ctx,
> +   
> ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX],
> +   &has_vertex_shader, where))
> + return GL_FALSE;
>  
> -   if (!from_glsl_shader[MESA_SHADER_FRAGMENT]) {
> -  if (ctx->FragmentProgram.Enabled && !ctx->FragmentProgram._Enabled) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> -  "%s(fragment program not valid)", where);
> -  return GL_FALSE;
> -  }
> +  if (!shader_linked_or_absent(ctx,
> +   
> ctx->_Shader->CurrentProgram[MESA_SHADER_FRAGMENT],
> +   &has_fragment_shader, where))
> + return GL_FALSE;
>  
> -  /* If drawing to integer-valued color buffers, there must be an
> -   * active fragment shader (GL_EXT_texture_integer).
> +  /* Any shader stages that are not supplied by the GLSL shader and have
> +   * assembly shaders enabled must now be validated.
> */
> -  if (ctx->DrawBuffer && ctx->DrawBuffer->_IntegerColor) {
> +  if (!has_vertex_shader
> +  && ctx->VertexProgram.Enabled && !ctx->VertexProgram._Enabled) {
>   _mesa_error(ctx, GL_INVALID_OPERATION,
> -   

Re: [Mesa-dev] [PATCH 8/8] mesa: Micro-optimize _mesa_is_valid_prim_mode

2014-12-21 Thread Kenneth Graunke
On Friday, December 19, 2014 02:20:59 PM Ian Romanick wrote:
> From: Ian Romanick 
> 
> You would not believe the mess GCC 4.8.3 generated for the old
> switch-statement.
> 
> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
> Gl32Batch7:
> 
> 32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
> 64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)
> 
> The regression on 32-bit is odd.  Callgrind says the caller,
> _mesa_is_valid_prim_mode is faster.  Before it says 2,293,760
> cycles, and after it says 917,504.
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/main/api_validate.c | 30 --
>  1 file changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index b882f0e..9c2e29e 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -113,27 +113,21 @@ check_valid_to_render(struct gl_context *ctx, const 
> char *function)
>  bool
>  _mesa_is_valid_prim_mode(struct gl_context *ctx, GLenum mode)
>  {
> -   switch (mode) {
> -   case GL_POINTS:
> -   case GL_LINES:
> -   case GL_LINE_LOOP:
> -   case GL_LINE_STRIP:
> -   case GL_TRIANGLES:
> -   case GL_TRIANGLE_STRIP:
> -   case GL_TRIANGLE_FAN:
> +   /* The overwhelmingly common case is (mode <= GL_TRIANGLE_FAN).  Test that
> +* first and exit.  You would think that a switch-statement would be the
> +* right approach, but at least GCC 4.7.2 generates some pretty dire code

Perhaps update this to 4.8.3 to match your commit message?

> +* for the common case.
> +*/
> +   if (likely(mode <= GL_TRIANGLE_FAN))

I'm not sure if likely() is a good idea.  My understanding is the penalty for
getting it wrong is pretty hefty, and the other modes do happen.

Getting rid of the switch statement makes a lot of sense.  I like the new
approach.

With likely removed, this is:
Reviewed-by: Kenneth Graunke 

>return true;
> -   case GL_QUADS:
> -   case GL_QUAD_STRIP:
> -   case GL_POLYGON:
> +
> +   if (mode <= GL_POLYGON)
>return (ctx->API == API_OPENGL_COMPAT);
> -   case GL_LINES_ADJACENCY:
> -   case GL_LINE_STRIP_ADJACENCY:
> -   case GL_TRIANGLES_ADJACENCY:
> -   case GL_TRIANGLE_STRIP_ADJACENCY:
> +
> +   if (mode <= GL_TRIANGLE_STRIP_ADJACENCY)
>return _mesa_has_geometry_shaders(ctx);
> -   default:
> -  return false;
> -   }
> +
> +   return false;
>  }

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/8] meta: Put _mesa_meta_in_progress in the header file

2014-12-21 Thread Kenneth Graunke
On Friday, December 19, 2014 02:20:52 PM Ian Romanick wrote:
> From: Ian Romanick 
> 
> ...so that it can be inlined in the two places that call it.
> 
> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
> Gl32Batch7:
> 
> 32-bit: No difference proven at 95.0% confidence (n=120)
> 64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/common/meta.c | 10 --
>  src/mesa/drivers/common/meta.h |  7 +--
>  2 files changed, 5 insertions(+), 12 deletions(-)

I sent comments on patches 2, 3, 4, and 8.

The series is:
Reviewed-by: Kenneth Graunke 

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context

2014-12-21 Thread Kenneth Graunke
On Sunday, December 21, 2014 09:19:53 PM Jason Ekstrand wrote:
> On Sun, Dec 21, 2014 at 9:11 PM, Kenneth Graunke 
> wrote:
> >
> > On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote:
> > > From: Ian Romanick 
> > >
> > > Instead of having an extra pointer indirection in one of the hottest
> > > loops in the driver.
> > >
> > > On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
> > > for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
> > > Gl32Batch7:
> > >
> > > 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
> > > 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
> > >
> > > Signed-off-by: Ian Romanick 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.h  |  2 +-
> > >  src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ---
> > >  2 files changed, 17 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > > index 92eb022..5f5f807 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > > @@ -1384,7 +1384,7 @@ struct brw_context
> > > } perfmon;
> > >
> > > int num_atoms;
> > > -   const struct brw_tracked_state **atoms;
> > > +   const struct brw_tracked_state atoms[64];
> >
> > Making this atoms[57] would save memory, and the static asserts make it
> > easy
> > to remember to increase the size if we ever add additional atoms.
> >
> > Honestly, it seems like there should be a better way to do this.
> > Currently,
> > the atoms list only depends on the generation number, and thus is identical
> > for all GL contexts.  Storing an additional 1.5 kB of redundant data in
> > each
> > GL context seems a bit wasteful.
> >
> > Would it help to make brw_upload_state cache the array and size?
> >
> > void
> > brw_upload_state(struct brw_context *brw)
> > {
> >static const struct brw_tracked_state *atoms = brw->atoms;
> >static const int num_atoms = brw->num_atoms;
> >
> >...
> >
> >for (i = 0; i < num_atoms; i++) {
> >   ...use atoms[i]...
> >}
> >
> >...
> > }
> >
> > It seems like that would achieve the same effect - one indirection.
> >
> 
> Yeah, but then we have deal with threading issues.  It might be ok, but I
> think we'd have to be a bit more careful than that.

Should be doable.  But yeah, I guess it is kind of hacky.

> > One downside to that approach is that we couldn't have separate atom
> > lists for API_CORE vs. API_COMPAT (while we could using your approach).
> > (We could skip stippling for core contexts, and skip GS/HS/DS for legacy
> > contexts, which might help a tiny bit...)
> >
> 
> Another option would be to make the atoms list a screen thing and do the
> copy at screen creation time.  That would allow it to be shared across
> contexts without the above threading issues.  If we wanted, we could store
> one core version and one compat version and have the context point to
> whichever one it needs.  The problem with this is that it's wasteful in the
> one-context case where we would have both core an compat lists.  I guess
> you can't win them all.

Ah, store the actual atoms, like Ian's doing...but in the screen.

That's not a crazy plan either.

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev