Re: [Mesa-dev] [PATCH 09/12] R600/SI: replace AllReg_* with [SV]Src_*

2013-02-13 Thread Christian König

On 12.02.2013 21:49, Michel Dänzer wrote:

On Tue, 2013-02-12 at 18:13 +0100, Christian König wrote:

From: Christian König 

Mark all the operands that can also have an immediate.

Signed-off-by: Christian König 
---
  lib/Target/R600/SIInstrFormats.td |   32 +-
  lib/Target/R600/SIInstructions.td |  128 ++---
  lib/Target/R600/SIRegisterInfo.td |   10 ++-
  3 files changed, 87 insertions(+), 83 deletions(-)

diff --git a/lib/Target/R600/SIInstrFormats.td 
b/lib/Target/R600/SIInstrFormats.td
index a7a3558..473d3dc 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -22,25 +22,25 @@
  
//===--===//
  
  class VOP3_32  op, string opName, list pattern>

-  : VOP3 ;
+  : VOP3 ;

Note that I think all source operands can use inline constants, even
several different ones. Also, the same SGPR can be used for several
source operands. The only limitation is that at most one actual SGPR or
literal constant can be used by the VALU. Not sure how to model the
latter more accurately, but I think the former might be relatively
straightforward.


Yeah, agreed. My idea is that we handle the constraints (only one literal 
or SGPR in a VOP* encoding) after the initial selection: e.g. change 
VOP3 to have three VSrc_* operands, and in a post-selection pass pull in 
the inline constants and handle multiple SGPRs/literals.
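
The constraint discussed above (any number of inline constants, the same SGPR reused freely, but at most one distinct SGPR or literal per VALU instruction) can be sketched as the kind of check a post-selection pass might run. This is only an illustration: `SrcKind`, `Src` and `fitsVALUConstraint` are hypothetical names, not part of the R600 backend, and counting every literal as its own use is a conservative assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operand model for illustration; not LLVM's MachineOperand.
enum class SrcKind { VGPR, SGPR, InlineConst, Literal };

struct Src {
  SrcKind Kind;
  uint32_t Value; // register index or constant value
};

// Returns true if the source operands respect the limit discussed in the
// thread: VGPRs and inline constants are unrestricted, the same SGPR may be
// repeated, but only one distinct SGPR or one literal may be used in total.
bool fitsVALUConstraint(const std::vector<Src> &Srcs) {
  bool SeenSGPR = false;
  bool SeenLiteral = false;
  uint32_t SGPRIndex = 0;

  for (const Src &S : Srcs) {
    switch (S.Kind) {
    case SrcKind::VGPR:
    case SrcKind::InlineConst:
      break; // no restriction on these
    case SrcKind::SGPR:
      // A second, different SGPR (or an SGPR next to a literal) is rejected.
      if (SeenLiteral || (SeenSGPR && SGPRIndex != S.Value))
        return false;
      SeenSGPR = true;
      SGPRIndex = S.Value;
      break;
    case SrcKind::Literal:
      // Assumption: every literal counts as a separate use.
      if (SeenSGPR || SeenLiteral)
        return false;
      SeenLiteral = true;
      break;
    }
  }
  return true;
}
```

A pass built on such a check would copy any offending operand into a VGPR first; that legalization step is not shown here.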



BTW, please rebase this series on top of the output modifier fix I
pushed to the LLVM trunk, it added a couple more uses of SREG_LIT_0,
which I'm not sure how to properly merge with your changes.


Which repository is that? I'm currently still working on Tom's 
(git://people.freedesktop.org/~tstellar/llvm) master branch.



P.S. I noticed this warning that isn't there without your series:

/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp: In member function ‘virtual void 
{anonymous}::SIMCCodeEmitter::EncodeInstruction(const llvm::MCInst&, llvm::raw_ostream&, 
llvm::SmallVectorImpl&) const’:
/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp:174:53: warning: 
‘Imm.{anonymous}::IntFloatUnion::F’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]

I guess it can't tell that one of the union members will always be
initialized. Maybe convert

 else if (Op.isFPImm())
   Imm.F = Op.getFPImm();

to something like

 else {
   assert(Op.isFPImm());
   Imm.F = Op.getFPImm();
 }



Thx, missed that, going to fix it.
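
A self-contained sketch of why the suggested rewrite silences the warning: with a plain `else`, every path assigns the value before it is read, and the assert only documents the expected case. The `Operand` struct below is a hypothetical stand-in for the MC operand, and `memcpy` is used instead of the union so the sketch stays well-defined C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an MC operand: an integer or an FP immediate.
struct Operand {
  bool IsFP;
  int64_t IntVal;
  float FPVal;

  bool isImm() const { return !IsFP; }
  bool isFPImm() const { return IsFP; }
  int64_t getImm() const { return IntVal; }
  float getFPImm() const { return FPVal; }
};

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Returns the 32-bit pattern to emit for an immediate operand.
uint32_t immediateBits(const Operand &Op) {
  uint32_t Bits;
  if (Op.isImm()) {
    Bits = static_cast<uint32_t>(Op.getImm());
  } else {
    // Plain else: this branch always assigns, so Bits is initialized on
    // every path. The assert states the only case expected here.
    assert(Op.isFPImm() && "expected an FP immediate");
    float F = Op.getFPImm();
    std::memcpy(&Bits, &F, sizeof(Bits));
  }
  return Bits;
}
```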

Christian.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/12] R600/SI: move *_Helper definitions to SIInstrFormat.td

2013-02-13 Thread Christian König

On 13.02.2013 01:20, Tom Stellard wrote:

On Tue, Feb 12, 2013 at 06:13:19PM +0100, Christian König wrote:

From: Christian König 


SIInstrFormats.td should contain the instruction encoding definitions
and everything else should go in SIInstrInfo.td.  I got this backwards
when I first created these files, so really these helpers and everything
but the encoding definitions should be in SIInstrInfo.td.


I had already noticed that something is mixed up here, but wasn't sure 
enough about it to make a patch.


Well, I would suggest that we apply this one anyway and then just switch 
SIInstrInfo.td and SIInstrFormats.td.


Christian.



-Tom


Signed-off-by: Christian König 
---
  lib/Target/R600/SIInstrFormats.td |   66 +
  lib/Target/R600/SIInstrInfo.td|   66 -
  2 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/lib/Target/R600/SIInstrFormats.td 
b/lib/Target/R600/SIInstrFormats.td
index aea3b5a..7040115 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -144,3 +144,69 @@ class SOPC_32  op, string opName, list 
pattern>
  class SOPC_64  op, string opName, list pattern>
: SOPC ;
  
+class MIMG_Load_Helper  op, string asm> : MIMG <

+  op,
+  (outs VReg_128:$vdata),
+  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
+   i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
+   GPR4Align:$srsrc, GPR4Align:$ssamp),
+  asm,
+  []> {
+  let mayLoad = 1;
+  let mayStore = 0;
+}
+
+class MTBUF_Store_Helper  op, string asm, RegisterClass regClass> : MTBUF 
<
+  op,
+  (outs),
+  (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
+   i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
+   GPR4Align:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
+  asm,
+  []> {
+  let mayStore = 1;
+  let mayLoad = 0;
+}
+
+class MUBUF_Load_Helper  op, string asm, RegisterClass regClass> : MUBUF 
<
+  op,
+  (outs regClass:$dst),
+  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
+   i1imm:$lds, VReg_32:$vaddr, GPR4Align:$srsrc, i1imm:$slc,
+   i1imm:$tfe, SReg_32:$soffset),
+  asm,
+  []> {
+  let mayLoad = 1;
+  let mayStore = 0;
+}
+
+class MTBUF_Load_Helper  op, string asm, RegisterClass regClass> : MTBUF 
<
+  op,
+  (outs regClass:$dst),
+  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
+   i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align:$srsrc,
+   i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
+  asm,
+  []> {
+  let mayLoad = 1;
+  let mayStore = 0;
+}
+
+multiclass SMRD_Helper  op, string asm, RegisterClass dstClass> {
+  def _IMM : SMRD <
+ op, 1,
+ (outs dstClass:$dst),
+ (ins GPR2Align:$sbase, i32imm:$offset),
+ asm,
+ []
+  >;
+
+  def _SGPR : SMRD <
+  op, 0,
+  (outs dstClass:$dst),
+  (ins GPR2Align:$sbase, SReg_32:$soff),
+  asm,
+  []
+  >;
+}
+
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index b983e8a..aa156f3 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -484,71 +484,5 @@ class VOPC  op, dag ins, string asm, list 
pattern> :
  
  } // End Uses = [EXEC]
  
-class MIMG_Load_Helper  op, string asm> : MIMG <

-  op,
-  (outs VReg_128:$vdata),
-  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
-   i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
-   GPR4Align:$srsrc, GPR4Align:$ssamp),
-  asm,
-  []> {
-  let mayLoad = 1;
-  let mayStore = 0;
-}
-
-class MUBUF_Load_Helper  op, string asm, RegisterClass regClass> : MUBUF 
<
-  op,
-  (outs regClass:$dst),
-  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
-   i1imm:$lds, VReg_32:$vaddr, GPR4Align:$srsrc, i1imm:$slc,
-   i1imm:$tfe, SReg_32:$soffset),
-  asm,
-  []> {
-  let mayLoad = 1;
-  let mayStore = 0;
-}
-
-class MTBUF_Load_Helper  op, string asm, RegisterClass regClass> : MTBUF 
<
-  op,
-  (outs regClass:$dst),
-  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
-   i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align:$srsrc,
-   i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
-  asm,
-  []> {
-  let mayLoad = 1;
-  let mayStore = 0;
-}
-
-class MTBUF_Store_Helper  op, string asm, RegisterClass regClass> : MTBUF 
<
-  op,
-  (outs),
-  (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
-   i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
-   GPR4Align:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
-  asm,
-  []> {
-  let mayStore = 1;
-  let mayLoad = 0;
-}
-
-multiclass SMRD_Helper  op, string asm, RegisterClass dstClass> {
-  def _IMM : SMRD <
- op, 1,
- (outs dstClass:$dst),
- (ins GPR2Align:$sbase, i32imm:$offset),
- asm,
-

Re: [Mesa-dev] [PATCH 09/12] R600/SI: replace AllReg_* with [SV]Src_*

2013-02-13 Thread Michel Dänzer
On Wed, 2013-02-13 at 10:16 +0100, Christian König wrote: 
> On 12.02.2013 21:49, Michel Dänzer wrote:
> > On Tue, 2013-02-12 at 18:13 +0100, Christian König wrote:
> >> From: Christian König 
> >>
> >> Mark all the operands that can also have an immediate.
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>   lib/Target/R600/SIInstrFormats.td |   32 +-
> >>   lib/Target/R600/SIInstructions.td |  128 
> >> ++---
> >>   lib/Target/R600/SIRegisterInfo.td |   10 ++-
> >>   3 files changed, 87 insertions(+), 83 deletions(-)
> >>
> >> diff --git a/lib/Target/R600/SIInstrFormats.td 
> >> b/lib/Target/R600/SIInstrFormats.td
> >> index a7a3558..473d3dc 100644
> >> --- a/lib/Target/R600/SIInstrFormats.td
> >> +++ b/lib/Target/R600/SIInstrFormats.td
> >> @@ -22,25 +22,25 @@
> >>   
> >> //===--===//
> >>   
> >>   class VOP3_32  op, string opName, list pattern>
> >> -  : VOP3  >> VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), 
> >> opName, pattern>;
> >> +  : VOP3  >> VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), 
> >> opName, pattern>;
> > Note that I think all source operands can use inline constants, even
> > several different ones. Also, the same SGPR can be used for several
> > source operands. The only limitation is that at most one actual SGPR or
> > literal constant can be used by the VALU. Not sure how to model the
> > latter more accurately, but I think the former might be relatively
> > straightforward.
> 
> Yeah, agreed. My idea is that we handle the constraints (only one literal 
> or SGPR in a VOP* encoding) after the initial selection: e.g. change 
> VOP3 to have three VSrc_* operands, and in a post-selection pass pull in 
> the inline constants and handle multiple SGPRs/literals.

Sounds good.


> > BTW, please rebase this series on top of the output modifier fix I
> > pushed to the LLVM trunk, it added a couple more uses of SREG_LIT_0,
> > which I'm not sure how to properly merge with your changes.
> 
> Which repository is that? I'm currently still working on Tom's 
> (git://people.freedesktop.org/~tstellar/llvm) master branch.

The main LLVM repository, http://llvm.org/git/llvm.git /
https://llvm.org/svn/llvm-project/llvm/trunk . See Tom's post about
stabilizing his tree from January 31st.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] [PATCH 10/12] R600/SI: cleanup literal handling

2013-02-13 Thread Christian König

On 13.02.2013 01:39, Tom Stellard wrote:

[SNIP]

Way back when I first started working on the backend I was using
immediate operands in instructions defined to only use registers, and
it worked most of the time, but I ran into a few cases where some of the
passes weren't able to handle it.  So, I'm actually kind of surprised that this
works, but it's a nice simplification in any case.


I stumbled over a couple of problems while coding it, but they were not 
related to putting immediate values into operands that should only hold 
registers. Rather, the change surfaced a couple of other problems. For 
example, we shouldn't list f64 as a supported type in a RegisterClass 
and then forget to set a default class for f64 with "addRegisterClass"; 
that can lead to a whole bunch of problems, including segfaults.


Actually, when you look into the MCOperandInfo class, the only difference 
between an immediate operand and a register operand is that registers 
have a RegClass. Apart from that, this information is only available at 
pattern matching time, and there it doesn't seem to matter.


Christian.


Re: [Mesa-dev] [PATCH V3 04/19] glsl: add support for ARB_texture_multisample

2013-02-13 Thread Chris Forbes
Ouch. Thanks for catching that -- are there any other similar things I
need to be aware of?

On Wed, Feb 13, 2013 at 8:19 PM, Kenneth Graunke  wrote:
> On 02/08/2013 01:19 AM, Chris Forbes wrote:
>>
>> V2: - emit `sample` parameter properly for multisample texelFetch()
>>  - fix spurious whitespace change
>>  - introduce a new opcode ir_txf_ms rather than overloading the
>>existing ir_txf further. This makes doing the right thing in
>>the driver somewhat simpler.
>>
>> V3: - fix weird whitespace
>>
>> Signed-off-by: Chris Forbes 
>> Reviewed-by: Eric Anholt 
>> Reviewed-by: Paul Berry 
>
>
> Hey Chris,
>
> This patch needs to update ir.cpp's tex_opcode_strs[].  Otherwise,
> MESA_GLSL=dump segfaults when trying to print the texturing opcode.
>
> Thanks again for your great work!
>
> --Ken
>


Re: [Mesa-dev] [PATCH 10/12] R600/SI: cleanup literal handling

2013-02-13 Thread Christian König

On 13.02.2013 08:00, Michel Dänzer wrote:

On Tue, 2013-02-12 at 19:39 -0500, Tom Stellard wrote:

On Tue, Feb 12, 2013 at 06:13:22PM +0100, Christian König wrote:

From: Christian König 

Seems to be a lot simpler, and also paves the
way for further improvements.

[...]


diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 63ef9ef..e68bbe7 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
[...]
@@ -1180,7 +1141,7 @@ def : Pat<
  VReg_32:$buf_idx_vgpr),
(BUFFER_LOAD_FORMAT_XYZW imm:$attr_offset, 0, 1, 0, 0, 0,
 VReg_32:$buf_idx_vgpr, SReg_128:$tlst,
-   0, 0, (i32 SREG_LIT_0))
+   0, 0, 0x80)

What does this 0x80 mean?

It's the encoding of inline constant 0, i.e. the same as SREG_LIT_0 was.


Actually I wanted to clean that up before sending out the patches and 
correctly use SSrc_32 for SOFFSET, but somehow forgot about it.


Going to clean that up for the second version.
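
As a rough illustration of why 0x80 stands for the constant 0: to my understanding of the SI instruction set documentation, integer inline constants occupy a contiguous range of source-operand encodings starting at 128. The thread itself only confirms that 0x80 is inline constant 0; the rest of the range and the helper name `inlineIntEncoding` below are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Returns the source-operand encoding for V if it is representable as an
// integer inline constant, or -1 if a literal or register is needed instead.
// Assumed layout: 128..192 encode 0..64, 193..208 encode -1..-16.
int inlineIntEncoding(int32_t V) {
  if (V >= 0 && V <= 64)
    return 128 + V; // 0x80 is inline constant 0
  if (V >= -16 && V <= -1)
    return 192 - V; // -1 -> 193, ..., -16 -> 208
  return -1;
}
```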


diff --git a/lib/Target/R600/SILowerControlFlow.cpp 
b/lib/Target/R600/SILowerControlFlow.cpp
index 3780e40..051f460 100644
--- a/lib/Target/R600/SILowerControlFlow.cpp
+++ b/lib/Target/R600/SILowerControlFlow.cpp
@@ -158,10 +158,10 @@ void SILowerControlFlowPass::SkipIfDead(MachineInstr &MI) 
{
.addImm(0)
.addImm(1)
.addImm(1)
-  .addReg(AMDGPU::SREG_LIT_0)
-  .addReg(AMDGPU::SREG_LIT_0)
-  .addReg(AMDGPU::SREG_LIT_0)
-  .addReg(AMDGPU::SREG_LIT_0);
+  .addImm(0)
+  .addImm(0)
+  .addImm(0)
+  .addImm(0);

Because the EXP definition declares these operands as VReg_32, not
VSrc_32, this results in encoding 0 directly, i.e. VGPR0. It doesn't
really matter here for exporting to the NULL target, but then it might
be better to make that explicit by using an undefined value.


SI doesn't support inline literals in the EXP command (in contrast to 
R600), so using SREG_LIT_0 here was completely wrong in the first place 
and actually tried to export VGPR 128.


But as you mentioned, it doesn't really matter because we are exporting to 
the NULL target anyway. I'm going to change that to use VGPR0 here instead, 
which should make clear what's going on.
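
The mix-up can be made concrete with a small decoder sketch. It assumes, based on my reading of the SI documentation rather than on anything stated in this thread, that a general 9-bit source field places VGPR n at 256 + n, while a VGPR-only field such as the EXP sources is 8 bits wide and holds the VGPR index directly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// VGPR-only field (VReg_32-style operand): the value is the VGPR index.
std::string decodeVGPRField(uint32_t Enc) {
  assert(Enc < 256 && "VGPR-only field is 8 bits wide");
  return "VGPR" + std::to_string(Enc);
}

// General source field (VSrc_32-style operand). Only the two cases relevant
// to this thread are spelled out.
std::string decodeSrcField(uint32_t Enc) {
  assert(Enc < 512 && "general source field is 9 bits wide");
  if (Enc >= 256)
    return "VGPR" + std::to_string(Enc - 256);
  if (Enc == 128)
    return "inline constant 0";
  return "other (SGPR, special register or further constant)";
}
```

The same number, 128, therefore reads as inline constant 0 in a VSrc operand but as VGPR128 in an EXP source, which matches the behaviour described above.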


Christian.




[Mesa-dev] [PATCH v2] gles2: a stub implementation for GL_EXT_discard_framebuffer

2013-02-13 Thread Tapani Pälli
This patch implements a stub for GL_EXT_discard_framebuffer with
required checks listed by the extension specification. This extension
is required by GLBenchmark 2.5 when compiled with OpenGL ES 2.0
as the rendering backend.

Signed-off-by: Tapani Pälli 
---
 src/mapi/glapi/gen/es_EXT.xml   | 13 
 src/mesa/drivers/common/driverfuncs.c   |  1 +
 src/mesa/main/dd.h  |  4 ++-
 src/mesa/main/extensions.c  |  1 +
 src/mesa/main/fbobject.c| 54 +
 src/mesa/main/fbobject.h|  4 +++
 src/mesa/main/tests/dispatch_sanity.cpp |  2 ++
 7 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index 0f6746d..103c93a 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -663,6 +663,19 @@
 
 
 
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 
 
diff --git a/src/mesa/drivers/common/driverfuncs.c 
b/src/mesa/drivers/common/driverfuncs.c
index aab61e1..43c9de9 100644
--- a/src/mesa/drivers/common/driverfuncs.c
+++ b/src/mesa/drivers/common/driverfuncs.c
@@ -174,6 +174,7 @@ _mesa_init_driver_functions(struct dd_function_table 
*driver)
driver->ValidateFramebuffer = _mesa_validate_framebuffer;
 
driver->BlitFramebuffer = _swrast_BlitFramebuffer;
+   driver->DiscardFramebuffer = NULL;
 
_mesa_init_texture_barrier_functions(driver);
 
diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 9a75fd9..9c818cc 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -604,7 +604,7 @@ struct dd_function_table {
/*@}*/
 
/**
-* \name Functions for GL_EXT_framebuffer_{object,blit}.
+* \name Functions for GL_EXT_framebuffer_{object,blit,discard}.
 */
/*@{*/
struct gl_framebuffer * (*NewFramebuffer)(struct gl_context *ctx, GLuint 
name);
@@ -628,6 +628,8 @@ struct dd_function_table {
GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter);
+   void (*DiscardFramebuffer)(struct gl_context *ctx,
+  GLenum target, GLsizei numAttachments, const 
GLenum *attachments);
 
/**
 * \name Query objects
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 04435e0..2112a06 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -168,6 +168,7 @@ static const struct extension extension_table[] = {
{ "GL_EXT_blend_color", o(EXT_blend_color), 
GLL,1995 },
{ "GL_EXT_blend_equation_separate", 
o(EXT_blend_equation_separate), GL, 2003 },
{ "GL_EXT_blend_func_separate", o(EXT_blend_func_separate), 
GLL,1999 },
+   { "GL_EXT_discard_framebuffer", o(dummy_true),  
  ES1 | ES2, 2009 },
{ "GL_EXT_blend_minmax",o(EXT_blend_minmax),
GLL | ES1 | ES2, 1995 },
{ "GL_EXT_blend_subtract",  o(dummy_true),  
GLL,1995 },
{ "GL_EXT_clip_volume_hint",o(EXT_clip_volume_hint),
GL, 1996 },
diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 257f839..9acf3f5 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -3310,3 +3310,57 @@ _mesa_InvalidateFramebuffer(GLenum target, GLsizei 
numAttachments,
   0, 0, MAX_VIEWPORT_WIDTH, 
MAX_VIEWPORT_HEIGHT,
   "glInvalidateFramebuffer");
 }
+
+void GLAPIENTRY
+_mesa_DiscardFramebufferEXT(GLenum target, GLsizei numAttachments,
+const GLenum *attachments)
+{
+   struct gl_framebuffer *fb;
+   GLint i;
+
+   GET_CURRENT_CONTEXT(ctx);
+
+   fb = get_framebuffer_target(ctx, target);
+   if (!fb) {
+  _mesa_error(ctx, GL_INVALID_ENUM,
+ "glDiscardFramebufferEXT(target %s)",
+ _mesa_lookup_enum_by_nr(target));
+  return;
+   }
+
+   if (numAttachments < 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE,
+  "glDiscardFramebufferEXT(numAttachments < 0)");
+  return;
+   }
+
+   for(i = 0; i < numAttachments; i++) {
+
+  switch (attachments[i]) {
+  case GL_COLOR:
+  case GL_DEPTH:
+  case GL_STENCIL:
+ if (_mesa_is_user_fbo(fb))
+goto invalid_enum;
+ break;
+  case GL_COLOR_ATTACHMENT0:
+  case GL_DEPTH_ATTACHMENT:
+  case GL_STENCIL_ATTACHMENT:
+ if (_mesa_is_winsys_fbo(fb))
+goto invalid_enum;
+ break;
+  default:
+ goto invalid_enum;
+  }
+   }
+
+   if (ctx->Driver.DiscardFramebuffer)
+  ctx->Drive

Re: [Mesa-dev] [PATCH] R600: Do not fold single instruction with more that 3 kcache read

2013-02-13 Thread Tom Stellard
On Tue, Feb 12, 2013 at 10:45:27PM +0100, Vincent Lejeune wrote:
> It fixes around 100 tfb piglit tests and 16 glean tests.
> 
> NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Tom Stellard 

> ---
>  lib/Target/R600/AMDILISelDAGToDAG.cpp  |  2 ++
>  lib/Target/R600/R600LowerConstCopy.cpp |  2 +-
>  test/CodeGen/R600/kcache-fold.ll   | 52 
> ++
>  3 files changed, 55 insertions(+), 1 deletion(-)
>  create mode 100644 test/CodeGen/R600/kcache-fold.ll
> 
> diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp 
> b/lib/Target/R600/AMDILISelDAGToDAG.cpp
> index 2f34fe3..858eb5d 100644
> --- a/lib/Target/R600/AMDILISelDAGToDAG.cpp
> +++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
> @@ -358,6 +358,8 @@ bool AMDGPUDAGToDAGISel::FoldOperands(unsigned Opcode,
>  SDValue Operand = Ops[OperandIdx[i] - 1];
>  switch (Operand.getOpcode()) {
>  case AMDGPUISD::CONST_ADDRESS: {
> +  if (i == 2)
> +break;
>SDValue CstOffset;
>if (!Operand.getValueType().isVector() &&
>SelectGlobalValueConstantOffset(Operand.getOperand(0), CstOffset)) 
> {
> diff --git a/lib/Target/R600/R600LowerConstCopy.cpp 
> b/lib/Target/R600/R600LowerConstCopy.cpp
> index 2557e8f..c8c27a8 100644
> --- a/lib/Target/R600/R600LowerConstCopy.cpp
> +++ b/lib/Target/R600/R600LowerConstCopy.cpp
> @@ -180,7 +180,7 @@ bool 
> R600LowerConstCopy::runOnMachineFunction(MachineFunction &MF) {
>  int ConstMovSel =
>  TII->getOperandIdx(CstMov->getOpcode(), 
> R600Operands::SRC0_SEL);
>  unsigned ConstIndex = CstMov->getOperand(ConstMovSel).getImm();
> -if (canFoldInBundle(CP, ConstIndex)) {
> +if (MI->isInsideBundle() && canFoldInBundle(CP, ConstIndex)) {
>TII->setImmOperand(MI, OpTable[SrcOp][1], ConstIndex);
>MI->getOperand(SrcIdx).setReg(AMDGPU::ALU_CONST);
>  } else {
> diff --git a/test/CodeGen/R600/kcache-fold.ll 
> b/test/CodeGen/R600/kcache-fold.ll
> new file mode 100644
> index 000..382f78c
> --- /dev/null
> +++ b/test/CodeGen/R600/kcache-fold.ll
> @@ -0,0 +1,52 @@
> +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> +
> +; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
> +
> +define void @main() {
> +main_body:
> +  %0 = load <4 x float> addrspace(9)* null
> +  %1 = extractelement <4 x float> %0, i32 0
> +  %2 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 1)
> +  %3 = extractelement <4 x float> %2, i32 0
> +  %4 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 2)
> +  %5 = extractelement <4 x float> %4, i32 0
> +  %6 = fcmp ult float %1, 0.00e+00
> +  %7 = select i1 %6, float %3, float %5
> +  %8 = load <4 x float> addrspace(9)* null
> +  %9 = extractelement <4 x float> %8, i32 1
> +  %10 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 1)
> +  %11 = extractelement <4 x float> %10, i32 1
> +  %12 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 2)
> +  %13 = extractelement <4 x float> %12, i32 1
> +  %14 = fcmp ult float %9, 0.00e+00
> +  %15 = select i1 %14, float %11, float %13
> +  %16 = load <4 x float> addrspace(9)* null
> +  %17 = extractelement <4 x float> %16, i32 2
> +  %18 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 1)
> +  %19 = extractelement <4 x float> %18, i32 2
> +  %20 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 2)
> +  %21 = extractelement <4 x float> %20, i32 2
> +  %22 = fcmp ult float %17, 0.00e+00
> +  %23 = select i1 %22, float %19, float %21
> +  %24 = load <4 x float> addrspace(9)* null
> +  %25 = extractelement <4 x float> %24, i32 3
> +  %26 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 1)
> +  %27 = extractelement <4 x float> %26, i32 3
> +  %28 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] 
> addrspace(9)* null, i64 0, i32 2)
> +  %29 = extractelement <4 x float> %28, i32 3
> +  %30 = fcmp ult float %25, 0.00e+00
> +  %31 = select i1 %30, float %27, float %29
> +  %32 = call float @llvm.AMDIL.clamp.(float %7, float 0.00e+00, float 
> 1.00e+00)
> +  %33 = call float @llvm.AMDIL.clamp.(float %15, float 0.00e+00, float 
> 1.00e+00)
> +  %34 = call float @llvm.AMDIL.clamp.(float %23, float 0.00e+00, float 
> 1.00e+00)
> +  %35 = call float @llvm.AMDIL.clamp.(float %31, float 0.00e+00, float 
> 1.00e+00)
> +  %36 = insertelement <4 x float> undef, float %32, i32 0
> +  %37 = insertelement <4 x float> %36, float %33, i32 1
> +  %38 = insertelement <4 x float> %37, float %34, i32 2
> +  %39 = insertelement <4 x float> %38, float %35, i32 3
> +  call vo

Re: [Mesa-dev] [PATCH 1/2] st/mesa: fix texture buffer objects

2013-02-13 Thread Brian Paul

On 02/12/2013 06:11 PM, Marek Olšák wrote:

Broken by 624528834f53f54c7a934f929769b7e6b230a0b1.
---
  src/mesa/state_tracker/st_atom_texture.c |   14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_texture.c 
b/src/mesa/state_tracker/st_atom_texture.c
index 28327bc..fc2d690 100644
--- a/src/mesa/state_tracker/st_atom_texture.c
+++ b/src/mesa/state_tracker/st_atom_texture.c
@@ -234,11 +234,17 @@ update_single_texture(struct st_context *st,
 }

 /* Determine the format of the texture sampler view */
-   view_format = stObj->pt->format;
+   if (texObj->Target == GL_TEXTURE_BUFFER) {
+  view_format =
+ st_mesa_format_to_pipe_format(stObj->base._BufferObjectFormat);
+   }
+   else {
+  view_format = stObj->pt->format;


For buffer objects, isn't the stObj->pt->format field set like an 
ordinary texture?




-   /* If sRGB decoding is off, use the linear format */
-   if (samp->sRGBDecode == GL_SKIP_DECODE_EXT) {
-  view_format = util_format_linear(view_format);
+  /* If sRGB decoding is off, use the linear format */
+  if (samp->sRGBDecode == GL_SKIP_DECODE_EXT) {
+ view_format = util_format_linear(view_format);
+  }
 }

 /* if sampler view has changed dereference it */


Thanks for fixing this.

Reviewed-by: Brian Paul 



Re: [Mesa-dev] [PATCH 2/2] st/mesa: try to find exact format matching user format and type for DrawPixels

2013-02-13 Thread Brian Paul

On 02/12/2013 06:11 PM, Marek Olšák wrote:

---
  src/mesa/state_tracker/st_cb_drawpixels.c |   28 +++
  src/mesa/state_tracker/st_cb_texture.c|   30 ++---
  src/mesa/state_tracker/st_format.c|   35 +
  src/mesa/state_tracker/st_format.h|3 +++
  4 files changed, 59 insertions(+), 37 deletions(-)



Reviewed-by: Brian Paul 


Re: [Mesa-dev] [PATCH] gles2: a stub implementation for GL_EXT_discard_framebuffer

2013-02-13 Thread Ian Romanick

On 02/12/2013 11:04 PM, Tapani Pälli wrote:

On 02/12/2013 08:45 PM, Eric Anholt wrote:

Tapani Pälli  writes:


On 02/12/2013 12:38 AM, Eric Anholt wrote:

Tapani Pälli  writes:

---
   src/mapi/glapi/gen/es_EXT.xml | 13 +
   src/mesa/drivers/dri/intel/intel_extensions.c |  1 +
   src/mesa/main/dd.h|  4 ++-
   src/mesa/main/extensions.c|  1 +
   src/mesa/main/fbobject.c  | 40
+++
   src/mesa/main/fbobject.h  |  4 +++
   src/mesa/main/mtypes.h|  1 +
   src/mesa/main/tests/dispatch_sanity.cpp   |  1 +
   8 files changed, 64 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/intel/intel_extensions.c
b/src/mesa/drivers/dri/intel/intel_extensions.c
index bf5e2b5..a182139 100755
--- a/src/mesa/drivers/dri/intel/intel_extensions.c
+++ b/src/mesa/drivers/dri/intel/intel_extensions.c
@@ -65,6 +65,7 @@ intelInitExtensions(struct gl_context *ctx)
  ctx->Extensions.EXT_blend_equation_separate = true;
  ctx->Extensions.EXT_blend_func_separate = true;
  ctx->Extensions.EXT_blend_minmax = true;
+   ctx->Extensions.EXT_discard_framebuffer = true;
  ctx->Extensions.EXT_framebuffer_blit = true;
  ctx->Extensions.EXT_framebuffer_object = true;
  ctx->Extensions.EXT_fog_coord = true;
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 04435e0..f7bd6f2 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -168,6 +168,7 @@ static const struct extension extension_table[]
= {
  { "GL_EXT_blend_color",
o(EXT_blend_color), GLL,1995 },
  { "GL_EXT_blend_equation_separate",
o(EXT_blend_equation_separate), GL, 2003 },
  { "GL_EXT_blend_func_separate",
o(EXT_blend_func_separate), GLL,1999 },
+   { "GL_EXT_discard_framebuffer",
o(EXT_discard_framebuffer), ES2, 2009 },
  { "GL_EXT_blend_minmax",
o(EXT_blend_minmax),GLL | ES1 | ES2, 1995 },
  { "GL_EXT_blend_subtract",
o(dummy_true),  GLL,1995 },
  { "GL_EXT_clip_volume_hint",
o(EXT_clip_volume_hint),GL, 1996 },

Is there any reason not to just expose this stub extension everywhere,
since it requires nothing from the driver author?

I'm not aware of any users other than ES2, which is why I enabled it
only for ES2. I'll add ES1 too as requested by Ian and Chad.

I didn't mean gles1, I meant using o(dummy_true) so that every
driver gets it.



OK, now I got it. Yes, it can be enabled for every driver.


No, it can't.  This extension requires OES_framebuffer_object.  If we're 
going to enable this on ES1, it must, at the very least, use the same 
enable bit as OES_framebuffer_object.


+   { "GL_EXT_discard_framebuffer", 
o(EXT_framebuffer_object),ES1 | ES2, 
2009 },


I said this in my reply on 2/11:

http://lists.freedesktop.org/archives/mesa-dev/2013-February/034408.html
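
A simplified model of the point being made: an extension table entry is advertised only if the API matches and the enable bit it points at is set, so tying GL_EXT_discard_framebuffer to the framebuffer-object bit keeps it hidden on drivers without FBO support. The types and names below (`Extensions`, `ExtensionEntry`, `isAdvertised`, `withFramebufferObject`) are hypothetical and use a pointer-to-member in place of Mesa's `o()` offset macro.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical subset of per-context extension enable bits.
struct Extensions {
  bool dummy_true = true;              // always on
  bool EXT_framebuffer_object = false; // set by drivers that support FBOs
};

enum Api : unsigned { API_GL = 1u << 0, API_ES1 = 1u << 1, API_ES2 = 1u << 2 };

struct ExtensionEntry {
  const char *name;
  bool Extensions::*enable; // which enable bit gates this extension
  unsigned apis;            // bitmask of APIs that may expose it
};

// Gating on EXT_framebuffer_object instead of dummy_true, as requested above.
static const ExtensionEntry kTable[] = {
    {"GL_EXT_discard_framebuffer", &Extensions::EXT_framebuffer_object,
     API_ES1 | API_ES2},
};

bool isAdvertised(const Extensions &ext, unsigned api, const char *name) {
  for (const ExtensionEntry &e : kTable)
    if (std::strcmp(e.name, name) == 0)
      return (e.apis & api) != 0 && ext.*(e.enable);
  return false;
}

// Small helper to build a context state for experiments.
Extensions withFramebufferObject(bool on) {
  Extensions ext;
  ext.EXT_framebuffer_object = on;
  return ext;
}
```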


// Tapani



Re: [Mesa-dev] [PATCH 1/2] st/mesa: fix texture buffer objects

2013-02-13 Thread Marek Olšák
On Wed, Feb 13, 2013 at 4:03 PM, Brian Paul  wrote:
> On 02/12/2013 06:11 PM, Marek Olšák wrote:
>>
>> Broken by 624528834f53f54c7a934f929769b7e6b230a0b1.
>> ---
>>   src/mesa/state_tracker/st_atom_texture.c |   14 ++
>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_atom_texture.c
>> b/src/mesa/state_tracker/st_atom_texture.c
>> index 28327bc..fc2d690 100644
>> --- a/src/mesa/state_tracker/st_atom_texture.c
>> +++ b/src/mesa/state_tracker/st_atom_texture.c
>> @@ -234,11 +234,17 @@ update_single_texture(struct st_context *st,
>>  }
>>
>>  /* Determine the format of the texture sampler view */
>> -   view_format = stObj->pt->format;
>> +   if (texObj->Target == GL_TEXTURE_BUFFER) {
>> +  view_format =
>> + st_mesa_format_to_pipe_format(stObj->base._BufferObjectFormat);
>> +   }
>> +   else {
>> +  view_format = stObj->pt->format;
>
>
> For buffer objects, isn't the stObj->pt->format field set like an ordinary
> texture?

stObj->pt->target is PIPE_BUFFER, so the format is most probably R8_UNORM.

glTexBuffer behaves pretty much like create_sampler_view and specifies
the format for the buffer.
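
The selection logic of the patch can be sketched in isolation as follows. The enums and the `TextureObject` struct are hypothetical simplifications of the Mesa and Gallium types, and `linearFormat` handles just one sRGB format for illustration.

```cpp
#include <cassert>

enum class Target { Texture2D, TextureBuffer };
enum class Format { R8_UNORM, RGBA8_UNORM, RGBA8_SRGB, RGBA32_FLOAT };

struct TextureObject {
  Target target;
  Format bufferObjectFormat; // format passed to glTexBuffer
  Format resourceFormat;     // format of the backing resource (pt->format)
};

// Illustrative stand-in for util_format_linear(): maps sRGB to linear.
Format linearFormat(Format f) {
  return f == Format::RGBA8_SRGB ? Format::RGBA8_UNORM : f;
}

// Buffer textures take the view format from glTexBuffer, because the backing
// buffer resource has no meaningful format of its own. All other targets use
// the resource format, optionally with sRGB decoding skipped.
Format chooseViewFormat(const TextureObject &tex, bool skipSRGBDecode) {
  if (tex.target == Target::TextureBuffer)
    return tex.bufferObjectFormat;

  Format f = tex.resourceFormat;
  if (skipSRGBDecode)
    f = linearFormat(f);
  return f;
}
```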

Marek


[Mesa-dev] [PATCH] R600/SI: Fix INTERP_CONST.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

The important fix is that the constant interpolation value is stored in the
parameter slot P0, which is encoded as 2.

In addition, pass the parameter slot as an operand to V_INTERP_MOV_F32
instead of hardcoding it there, and add a special register class for the
parameter slots for type checking and pretty dumping.

NOTE: This is a candidate for the Mesa stable branch.

Signed-off-by: Michel Dänzer 
---
 lib/Target/R600/AMDGPUAsmPrinter.cpp | 3 +++
 lib/Target/R600/SIISelLowering.cpp   | 1 +
 lib/Target/R600/SIInstructions.td| 3 +--
 lib/Target/R600/SIRegisterInfo.td| 4 
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp 
b/lib/Target/R600/AMDGPUAsmPrinter.cpp
index 254e62e..544aed2 100644
--- a/lib/Target/R600/AMDGPUAsmPrinter.cpp
+++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
@@ -92,6 +92,9 @@ void AMDGPUAsmPrinter::EmitProgramInfo(MachineFunction &MF) {
 default: break;
 case AMDGPU::EXEC:
 case AMDGPU::M0:
+case AMDGPU::P0:
+case AMDGPU::P10:
+case AMDGPU::P20:
   continue;
 }
 
diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 92de379..c0b04e5 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -186,6 +186,7 @@ void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr 
*MI,
 
   BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
   .addOperand(dst)
+  .addReg(AMDGPU::P0)
   .addOperand(attr_chan)
   .addOperand(attr)
   .addReg(M0);
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index feb0188..d4f7b37 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -681,10 +681,9 @@ def V_INTERP_P2_F32 : VINTRP <
 def V_INTERP_MOV_F32 : VINTRP <
   0x0002,
   (outs VReg_32:$dst),
-  (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
+  (ins ParamReg:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
   "V_INTERP_MOV_F32",
   []> {
-  let VSRC = 0;
   let DisableEncoding = "$m0";
 }
 
diff --git a/lib/Target/R600/SIRegisterInfo.td 
b/lib/Target/R600/SIRegisterInfo.td
index ab36b87..46c8f91 100644
--- a/lib/Target/R600/SIRegisterInfo.td
+++ b/lib/Target/R600/SIRegisterInfo.td
@@ -23,6 +23,9 @@ def EXEC_HI : SIReg <"EXEC HI", 127>;
 def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
 def SCC : SIReg<"SCC", 253>;
 def M0 : SIReg <"M0", 124>;
+def P10 : SIReg <"P10", 0>;
+def P20 : SIReg <"P20", 1>;
+def P0 : SIReg <"P0", 2>;
 
 //Interpolation registers
 def PERSP_SAMPLE_I : SIReg <"PERSP_SAMPLE_I">;
@@ -187,4 +190,5 @@ def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
 def VCCReg : RegisterClass<"AMDGPU", [i1], 1, (add VCC)>;
 def EXECReg : RegisterClass<"AMDGPU", [i1], 1, (add EXEC)>;
 def M0Reg : RegisterClass<"AMDGPU", [i32], 32, (add M0)>;
+def ParamReg : RegisterClass<"AMDGPU", [i32], 32, (add P0, P10, P20)>;
 
-- 
1.8.1.3



[Mesa-dev] [PATCH] radeonsi: Fix up and enable flat shading.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

Requires corresponding LLVM R600 backend fix to work correctly, but even
without that it doesn't hang anymore.

13 more little piglits.

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Michel Dänzer 
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 11 +--
 src/gallium/drivers/radeonsi/radeonsi_shader.h |  1 +
 src/gallium/drivers/radeonsi/si_state.c|  5 ++---
 src/gallium/drivers/radeonsi/si_state_draw.c   | 10 +-
 4 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 7c57f4c..704f859 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -301,14 +301,8 @@ static void declare_input_fs(
/* XXX: Handle all possible interpolation modes */
switch (decl->Interp.Interpolate) {
case TGSI_INTERPOLATE_COLOR:
-   /* XXX: Flat shading hangs the GPU */
-   if (si_shader_ctx->rctx->queued.named.rasterizer &&
-   si_shader_ctx->rctx->queued.named.rasterizer->flatshade) {
-#if 0
+   if (si_shader_ctx->key.flatshade) {
intr_name = "llvm.SI.fs.interp.constant";
-#else
-   intr_name = "llvm.SI.fs.interp.linear.center";
-#endif
} else {
if (decl->Interp.Centroid)
intr_name = "llvm.SI.fs.interp.persp.centroid";
@@ -317,11 +311,8 @@ static void declare_input_fs(
}
break;
case TGSI_INTERPOLATE_CONSTANT:
-   /* XXX: Flat shading hangs the GPU */
-#if 0
intr_name = "llvm.SI.fs.interp.constant";
break;
-#endif
case TGSI_INTERPOLATE_LINEAR:
if (decl->Interp.Centroid)
intr_name = "llvm.SI.fs.interp.linear.centroid";
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h 
b/src/gallium/drivers/radeonsi/radeonsi_shader.h
index 07b2f9f..f54f67c 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
@@ -82,6 +82,7 @@ struct si_shader_key {
unsigned nr_cbufs:4;
unsigned color_two_side:1;
unsigned alpha_func:3;
+   unsigned flatshade:1;
float   alpha_ref;
 };
 
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 5cca085..6bd1055 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -419,8 +419,7 @@ static void *si_create_rs_state(struct pipe_context *ctx,
rs->offset_units = state->offset_units;
rs->offset_scale = state->offset_scale * 12.0f;
 
-   /* XXX: Flat shading hangs the GPU */
-   tmp = S_0286D4_FLAT_SHADE_ENA(0);
+   tmp = S_0286D4_FLAT_SHADE_ENA(rs->flatshade);
if (state->sprite_coord_enable) {
tmp |= S_0286D4_PNT_SPRITE_ENA(1) |

S_0286D4_PNT_SPRITE_OVRD_X(V_0286D4_SPI_PNT_SPRITE_SEL_S) |
@@ -1875,7 +1874,7 @@ static INLINE struct si_shader_key 
si_shader_selector_key(struct pipe_context *c
key.export_16bpc = rctx->export_16bpc;
if (rctx->queued.named.rasterizer) {
key.color_two_side = 
rctx->queued.named.rasterizer->two_side;
-   /*key.flatshade = 
rctx->queued.named.rasterizer->flatshade;*/
+   key.flatshade = 
rctx->queued.named.rasterizer->flatshade;
}
if (rctx->queued.named.dsa) {
key.alpha_func = rctx->queued.named.dsa->alpha_func;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 3704410..61dea74 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -128,11 +128,6 @@ static void si_pipe_shader_ps(struct pipe_context *ctx, 
struct si_pipe_shader *s
continue;
}
 
-   /* XXX: Flat shading hangs the GPU */
-   if (shader->shader.input[i].interpolate == 
TGSI_INTERPOLATE_CONSTANT ||
-   (shader->shader.input[i].interpolate == 
TGSI_INTERPOLATE_COLOR &&
-rctx->queued.named.rasterizer->flatshade))
-   have_linear = TRUE;
if (shader->shader.input[i].interpolate == 
TGSI_INTERPOLATE_LINEAR)
have_linear = TRUE;
if (shader->shader.input[i].interpolate == 
TGSI_INTERPOLATE_PERSPECTIVE)
@@ -327,15 +322,12 @@ static void si_update_spi_map(struct r600_context *rctx)
 bcolor:
tmp = 0;
 
-#if 0
-   /* XXX: Flat shading hangs the GPU */
if (name == TGSI_SEMANTIC

[Mesa-dev] [Bug 59187] [Steam] Black screen but audio song On TF2 (Intel HM 55/ Ironlake Mobile)

2013-02-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=59187

--- Comment #2 from Ian Romanick  ---
*** Bug 60481 has been marked as a duplicate of this bug. ***



[Mesa-dev] [PATCH] radeonsi: Fix w component of TGSI_SEMANTIC_POSITION fragment shader inputs.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

It's the reciprocal of the register value.

Fixes piglit fragcoord_w and glsl-fs-fragcoord-zw-perspective.

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Michel Dänzer 
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 704f859..7922928 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -263,6 +263,14 @@ static void declare_input_fs(
build_intrinsic(base->gallivm->builder,
"llvm.SI.fs.read.pos", input_type,
args, 1, LLVMReadNoneAttribute);
+
+   if (chan == 3)
+   /* RCP for fragcoord.w */
+   si_shader_ctx->radeon_bld.inputs[soa_index] =
+   LLVMBuildFDiv(gallivm->builder,
+ 
lp_build_const_float(gallivm, 1.0f),
+ 
si_shader_ctx->radeon_bld.inputs[soa_index],
+ "");
}
return;
}
-- 
1.8.1.3



Re: [Mesa-dev] [PATCH] R600/SI: Fix INTERP_CONST.

2013-02-13 Thread Tom Stellard
On Wed, Feb 13, 2013 at 04:34:23PM +0100, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> The important fix is that the constant interpolation value is stored in the
> parameter slot P0, which is encoded as 2.
> 
> In addition, pass the parameter slot as an operand to V_INTERP_MOV_F32
> instead of hardcoding it there, and add a special register class for the
> parameter slots for type checking and pretty dumping.
> 
> NOTE: This is a candidate for the Mesa stable branch.
> 
> Signed-off-by: Michel Dänzer 
> ---
>  lib/Target/R600/AMDGPUAsmPrinter.cpp | 3 +++
>  lib/Target/R600/SIISelLowering.cpp   | 1 +
>  lib/Target/R600/SIInstructions.td| 3 +--
>  lib/Target/R600/SIRegisterInfo.td| 4 
>  4 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp 
> b/lib/Target/R600/AMDGPUAsmPrinter.cpp
> index 254e62e..544aed2 100644
> --- a/lib/Target/R600/AMDGPUAsmPrinter.cpp
> +++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
> @@ -92,6 +92,9 @@ void AMDGPUAsmPrinter::EmitProgramInfo(MachineFunction &MF) 
> {
>  default: break;
>  case AMDGPU::EXEC:
>  case AMDGPU::M0:
> +case AMDGPU::P0:
> +case AMDGPU::P10:
> +case AMDGPU::P20:
>continue;
>  }
>  
> diff --git a/lib/Target/R600/SIISelLowering.cpp 
> b/lib/Target/R600/SIISelLowering.cpp
> index 92de379..c0b04e5 100644
> --- a/lib/Target/R600/SIISelLowering.cpp
> +++ b/lib/Target/R600/SIISelLowering.cpp
> @@ -186,6 +186,7 @@ void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr 
> *MI,
>  
>BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
>.addOperand(dst)
> +  .addReg(AMDGPU::P0)
>.addOperand(attr_chan)
>.addOperand(attr)
>.addReg(M0);
> diff --git a/lib/Target/R600/SIInstructions.td 
> b/lib/Target/R600/SIInstructions.td
> index feb0188..d4f7b37 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -681,10 +681,9 @@ def V_INTERP_P2_F32 : VINTRP <
>  def V_INTERP_MOV_F32 : VINTRP <
>0x0002,
>(outs VReg_32:$dst),
> -  (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
> +  (ins ParamReg:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
>"V_INTERP_MOV_F32",
>[]> {
> -  let VSRC = 0;
>let DisableEncoding = "$m0";
>  }
>  
> diff --git a/lib/Target/R600/SIRegisterInfo.td 
> b/lib/Target/R600/SIRegisterInfo.td
> index ab36b87..46c8f91 100644
> --- a/lib/Target/R600/SIRegisterInfo.td
> +++ b/lib/Target/R600/SIRegisterInfo.td
> @@ -23,6 +23,9 @@ def EXEC_HI : SIReg <"EXEC HI", 127>;
>  def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
>  def SCC : SIReg<"SCC", 253>;
>  def M0 : SIReg <"M0", 124>;
> +def P10 : SIReg <"P10", 0>;
> +def P20 : SIReg <"P20", 1>;
> +def P0 : SIReg <"P0", 2>;
>

These registers (and other read-only registers) should be added to the reserved
register list by SIRegisterInfo::getReservedRegs().  This will prevent
errors in the machine verifier pass, and also help the register
coalescer in some cases.
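
As a rough sketch of what such a reserved list amounts to, the following standalone C++ models it as one bit per register. The register numbering and the use of std::vector<bool> are assumptions made for this example; the real backend uses the TableGen-generated AMDGPU::* enumerators and LLVM's BitVector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical register numbering for this sketch only.
enum Reg : std::size_t { M0, P0, P10, P20, VGPR0, NumRegs };

// Mirrors the shape of getReservedRegs(): one bit per register, set for
// registers that the allocator must never hand out.
std::vector<bool> getReservedRegs() {
  std::vector<bool> reserved(NumRegs, false);
  reserved[P0] = true;
  reserved[P10] = true;
  reserved[P20] = true;
  return reserved;
}
```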


-Tom
>  //Interpolation registers
>  def PERSP_SAMPLE_I : SIReg <"PERSP_SAMPLE_I">;
> @@ -187,4 +190,5 @@ def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
>  def VCCReg : RegisterClass<"AMDGPU", [i1], 1, (add VCC)>;
>  def EXECReg : RegisterClass<"AMDGPU", [i1], 1, (add EXEC)>;
>  def M0Reg : RegisterClass<"AMDGPU", [i32], 32, (add M0)>;
> +def ParamReg : RegisterClass<"AMDGPU", [i32], 32, (add P0, P10, P20)>;
>  
> -- 
> 1.8.1.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] R600/SI: Fix INTERP_CONST.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

The important fix is that the constant interpolation value is stored in the
parameter slot P0, which is encoded as 2.

In addition, pass the parameter slot as an operand to V_INTERP_MOV_F32
instead of hardcoding it there, and add a special register class for the
parameter slots for type checking and pretty dumping.

NOTE: This is a candidate for the Mesa stable branch.

Signed-off-by: Michel Dänzer 
---

v2: Mark parameter slot registers as reserved.

 lib/Target/R600/AMDGPUAsmPrinter.cpp | 3 +++
 lib/Target/R600/SIISelLowering.cpp   | 1 +
 lib/Target/R600/SIInstructions.td| 3 +--
 lib/Target/R600/SIRegisterInfo.cpp   | 5 +
 lib/Target/R600/SIRegisterInfo.td| 4 
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp 
b/lib/Target/R600/AMDGPUAsmPrinter.cpp
index 254e62e..544aed2 100644
--- a/lib/Target/R600/AMDGPUAsmPrinter.cpp
+++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
@@ -92,6 +92,9 @@ void AMDGPUAsmPrinter::EmitProgramInfo(MachineFunction &MF) {
 default: break;
 case AMDGPU::EXEC:
 case AMDGPU::M0:
+case AMDGPU::P0:
+case AMDGPU::P10:
+case AMDGPU::P20:
   continue;
 }
 
diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 92de379..c0b04e5 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -186,6 +186,7 @@ void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr 
*MI,
 
   BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
   .addOperand(dst)
+  .addReg(AMDGPU::P0)
   .addOperand(attr_chan)
   .addOperand(attr)
   .addReg(M0);
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index feb0188..d4f7b37 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -681,10 +681,9 @@ def V_INTERP_P2_F32 : VINTRP <
 def V_INTERP_MOV_F32 : VINTRP <
   0x0002,
   (outs VReg_32:$dst),
-  (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
+  (ins ParamReg:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
   "V_INTERP_MOV_F32",
   []> {
-  let VSRC = 0;
   let DisableEncoding = "$m0";
 }
 
diff --git a/lib/Target/R600/SIRegisterInfo.cpp 
b/lib/Target/R600/SIRegisterInfo.cpp
index 88275c5..4e7cbf9 100644
--- a/lib/Target/R600/SIRegisterInfo.cpp
+++ b/lib/Target/R600/SIRegisterInfo.cpp
@@ -27,6 +27,11 @@ SIRegisterInfo::SIRegisterInfo(AMDGPUTargetMachine &tm,
 
 BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   BitVector Reserved(getNumRegs());
+
+  Reserved.set(AMDGPU::P0);
+  Reserved.set(AMDGPU::P10);
+  Reserved.set(AMDGPU::P20);
+
   return Reserved;
 }
 
diff --git a/lib/Target/R600/SIRegisterInfo.td 
b/lib/Target/R600/SIRegisterInfo.td
index ab36b87..46c8f91 100644
--- a/lib/Target/R600/SIRegisterInfo.td
+++ b/lib/Target/R600/SIRegisterInfo.td
@@ -23,6 +23,9 @@ def EXEC_HI : SIReg <"EXEC HI", 127>;
 def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
 def SCC : SIReg<"SCC", 253>;
 def M0 : SIReg <"M0", 124>;
+def P10 : SIReg <"P10", 0>;
+def P20 : SIReg <"P20", 1>;
+def P0 : SIReg <"P0", 2>;
 
 //Interpolation registers
 def PERSP_SAMPLE_I : SIReg <"PERSP_SAMPLE_I">;
@@ -187,4 +190,5 @@ def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
 def VCCReg : RegisterClass<"AMDGPU", [i1], 1, (add VCC)>;
 def EXECReg : RegisterClass<"AMDGPU", [i1], 1, (add EXEC)>;
 def M0Reg : RegisterClass<"AMDGPU", [i32], 32, (add M0)>;
+def ParamReg : RegisterClass<"AMDGPU", [i32], 32, (add P0, P10, P20)>;
 
-- 
1.8.1.3



Re: [Mesa-dev] [PATCH v2] R600/SI: Fix INTERP_CONST.

2013-02-13 Thread Tom Stellard
On Wed, Feb 13, 2013 at 05:07:43PM +0100, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> The important fix is that the constant interpolation value is stored in the
> parameter slot P0, which is encoded as 2.
> 
> In addition, pass the parameter slot as an operand to V_INTERP_MOV_F32
> instead of hardcoding it there, and add a special register class for the
> parameter slots for type checking and pretty dumping.
> 
> NOTE: This is a candidate for the Mesa stable branch.
> 
> Signed-off-by: Michel Dänzer 
Reviewed-by: Tom Stellard 
> ---
> 
> v2: Mark parameter slot registers as reserved.
> 
>  lib/Target/R600/AMDGPUAsmPrinter.cpp | 3 +++
>  lib/Target/R600/SIISelLowering.cpp   | 1 +
>  lib/Target/R600/SIInstructions.td| 3 +--
>  lib/Target/R600/SIRegisterInfo.cpp   | 5 +
>  lib/Target/R600/SIRegisterInfo.td| 4 
>  5 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp 
> b/lib/Target/R600/AMDGPUAsmPrinter.cpp
> index 254e62e..544aed2 100644
> --- a/lib/Target/R600/AMDGPUAsmPrinter.cpp
> +++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
> @@ -92,6 +92,9 @@ void AMDGPUAsmPrinter::EmitProgramInfo(MachineFunction &MF) 
> {
>  default: break;
>  case AMDGPU::EXEC:
>  case AMDGPU::M0:
> +case AMDGPU::P0:
> +case AMDGPU::P10:
> +case AMDGPU::P20:
>continue;
>  }
>  
> diff --git a/lib/Target/R600/SIISelLowering.cpp 
> b/lib/Target/R600/SIISelLowering.cpp
> index 92de379..c0b04e5 100644
> --- a/lib/Target/R600/SIISelLowering.cpp
> +++ b/lib/Target/R600/SIISelLowering.cpp
> @@ -186,6 +186,7 @@ void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr 
> *MI,
>  
>BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
>.addOperand(dst)
> +  .addReg(AMDGPU::P0)
>.addOperand(attr_chan)
>.addOperand(attr)
>.addReg(M0);
> diff --git a/lib/Target/R600/SIInstructions.td 
> b/lib/Target/R600/SIInstructions.td
> index feb0188..d4f7b37 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -681,10 +681,9 @@ def V_INTERP_P2_F32 : VINTRP <
>  def V_INTERP_MOV_F32 : VINTRP <
>0x0002,
>(outs VReg_32:$dst),
> -  (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
> +  (ins ParamReg:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
>"V_INTERP_MOV_F32",
>[]> {
> -  let VSRC = 0;
>let DisableEncoding = "$m0";
>  }
>  
> diff --git a/lib/Target/R600/SIRegisterInfo.cpp 
> b/lib/Target/R600/SIRegisterInfo.cpp
> index 88275c5..4e7cbf9 100644
> --- a/lib/Target/R600/SIRegisterInfo.cpp
> +++ b/lib/Target/R600/SIRegisterInfo.cpp
> @@ -27,6 +27,11 @@ SIRegisterInfo::SIRegisterInfo(AMDGPUTargetMachine &tm,
>  
>  BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
>BitVector Reserved(getNumRegs());
> +
> +  Reserved.set(AMDGPU::P0);
> +  Reserved.set(AMDGPU::P10);
> +  Reserved.set(AMDGPU::P20);
> +
>return Reserved;
>  }
>  
> diff --git a/lib/Target/R600/SIRegisterInfo.td 
> b/lib/Target/R600/SIRegisterInfo.td
> index ab36b87..46c8f91 100644
> --- a/lib/Target/R600/SIRegisterInfo.td
> +++ b/lib/Target/R600/SIRegisterInfo.td
> @@ -23,6 +23,9 @@ def EXEC_HI : SIReg <"EXEC HI", 127>;
>  def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
>  def SCC : SIReg<"SCC", 253>;
>  def M0 : SIReg <"M0", 124>;
> +def P10 : SIReg <"P10", 0>;
> +def P20 : SIReg <"P20", 1>;
> +def P0 : SIReg <"P0", 2>;
>  
>  //Interpolation registers
>  def PERSP_SAMPLE_I : SIReg <"PERSP_SAMPLE_I">;
> @@ -187,4 +190,5 @@ def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
>  def VCCReg : RegisterClass<"AMDGPU", [i1], 1, (add VCC)>;
>  def EXECReg : RegisterClass<"AMDGPU", [i1], 1, (add EXEC)>;
>  def M0Reg : RegisterClass<"AMDGPU", [i32], 32, (add M0)>;
> +def ParamReg : RegisterClass<"AMDGPU", [i32], 32, (add P0, P10, P20)>;
>  
> -- 
> 1.8.1.3
> 


Re: [Mesa-dev] [PATCH 07/12] R600/SI: move *_Helper definitions to SIInstrFormat.td

2013-02-13 Thread Tom Stellard
On Wed, Feb 13, 2013 at 10:18:23AM +0100, Christian König wrote:
> Am 13.02.2013 01:20, schrieb Tom Stellard:
> > On Tue, Feb 12, 2013 at 06:13:19PM +0100, Christian König wrote:
> >> From: Christian König 
> >>
> > SIInstrFormats.td should contain the instruction encoding definitions
> > and everything else should go in SIInstrInfo.td.  I got this backwards,
> > when I first created these files, so really these helpers and everything
> > but the encoding definitions should be in SIInstrInfo.td
> 
> I had already noticed that something is mixed up here, but wasn't sure 
> enough to make a patch.
> 
> Well, I would suggest that we apply this one anyway and then just switch 
> SIInstrInfo.td and SIInstrFormat.td.
>

Ok, this sounds good to me.

> Christian.
> 
> >
> > -Tom
> >
> >> Signed-off-by: Christian König 
> >> ---
> >>   lib/Target/R600/SIInstrFormats.td |   66 
> >> +
> >>   lib/Target/R600/SIInstrInfo.td|   66 
> >> -
> >>   2 files changed, 66 insertions(+), 66 deletions(-)
> >>
> >> diff --git a/lib/Target/R600/SIInstrFormats.td 
> >> b/lib/Target/R600/SIInstrFormats.td
> >> index aea3b5a..7040115 100644
> >> --- a/lib/Target/R600/SIInstrFormats.td
> >> +++ b/lib/Target/R600/SIInstrFormats.td
> >> @@ -144,3 +144,69 @@ class SOPC_32  op, string opName, list 
> >> pattern>
> >>   class SOPC_64  op, string opName, list pattern>
> >> : SOPC  >> opName, pattern>;
> >>   
> >> +class MIMG_Load_Helper  op, string asm> : MIMG <
> >> +  op,
> >> +  (outs VReg_128:$vdata),
> >> +  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
> >> +   i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
> >> +   GPR4Align:$srsrc, GPR4Align:$ssamp),
> >> +  asm,
> >> +  []> {
> >> +  let mayLoad = 1;
> >> +  let mayStore = 0;
> >> +}
> >> +
> >> +class MTBUF_Store_Helper  op, string asm, RegisterClass regClass> 
> >> : MTBUF <
> >> +  op,
> >> +  (outs),
> >> +  (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, 
> >> i1imm:$glc,
> >> +   i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
> >> +   GPR4Align:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
> >> +  asm,
> >> +  []> {
> >> +  let mayStore = 1;
> >> +  let mayLoad = 0;
> >> +}
> >> +
> >> +class MUBUF_Load_Helper  op, string asm, RegisterClass regClass> 
> >> : MUBUF <
> >> +  op,
> >> +  (outs regClass:$dst),
> >> +  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, 
> >> i1imm:$addr64,
> >> +   i1imm:$lds, VReg_32:$vaddr, GPR4Align:$srsrc, i1imm:$slc,
> >> +   i1imm:$tfe, SReg_32:$soffset),
> >> +  asm,
> >> +  []> {
> >> +  let mayLoad = 1;
> >> +  let mayStore = 0;
> >> +}
> >> +
> >> +class MTBUF_Load_Helper  op, string asm, RegisterClass regClass> 
> >> : MTBUF <
> >> +  op,
> >> +  (outs regClass:$dst),
> >> +  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, 
> >> i1imm:$addr64,
> >> +   i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, 
> >> GPR4Align:$srsrc,
> >> +   i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
> >> +  asm,
> >> +  []> {
> >> +  let mayLoad = 1;
> >> +  let mayStore = 0;
> >> +}
> >> +
> >> +multiclass SMRD_Helper  op, string asm, RegisterClass dstClass> {
> >> +  def _IMM : SMRD <
> >> + op, 1,
> >> + (outs dstClass:$dst),
> >> + (ins GPR2Align:$sbase, i32imm:$offset),
> >> + asm,
> >> + []
> >> +  >;
> >> +
> >> +  def _SGPR : SMRD <
> >> +  op, 0,
> >> +  (outs dstClass:$dst),
> >> +  (ins GPR2Align:$sbase, SReg_32:$soff),
> >> +  asm,
> >> +  []
> >> +  >;
> >> +}
> >> +
> >> diff --git a/lib/Target/R600/SIInstrInfo.td 
> >> b/lib/Target/R600/SIInstrInfo.td
> >> index b983e8a..aa156f3 100644
> >> --- a/lib/Target/R600/SIInstrInfo.td
> >> +++ b/lib/Target/R600/SIInstrInfo.td
> >> @@ -484,71 +484,5 @@ class VOPC  op, dag ins, string asm, 
> >> list pattern> :
> >>   
> >>   } // End Uses = [EXEC]
> >>   
> >> -class MIMG_Load_Helper  op, string asm> : MIMG <
> >> -  op,
> >> -  (outs VReg_128:$vdata),
> >> -  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
> >> -   i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
> >> -   GPR4Align:$srsrc, GPR4Align:$ssamp),
> >> -  asm,
> >> -  []> {
> >> -  let mayLoad = 1;
> >> -  let mayStore = 0;
> >> -}
> >> -
> >> -class MUBUF_Load_Helper  op, string asm, RegisterClass regClass> 
> >> : MUBUF <
> >> -  op,
> >> -  (outs regClass:$dst),
> >> -  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, 
> >> i1imm:$addr64,
> >> -   i1imm:$lds, VReg_32:$vaddr, GPR4Align:$srsrc, i1imm:$slc,
> >> -   i1imm:$tfe, SReg_32:$soffset),
> >> -  asm,
> >> -  []> {
> >> -  let mayLoad = 1;
> >> -  let mayStore = 0;
> >> -}
> >> -
> >> -class MTBUF_Load_Helper  op, string asm, RegisterClass regClass> 
> >> : MTBUF <
> >> -  op,
> >> -  (outs regClass:$dst),
> >> -  (ins i16imm:$offset, i

Re: [Mesa-dev] [PATCH 10/12] R600/SI: cleanup literal handling

2013-02-13 Thread Tom Stellard
On Wed, Feb 13, 2013 at 10:34:12AM +0100, Christian König wrote:
> Am 13.02.2013 01:39, schrieb Tom Stellard:
> > [SNIP]
> >
> > Way back when I first started working on the backend I was using
> > immediate operands in instructions defined to only use registers, and
> > it worked most of the time, but I ran into a few cases where some of the
> > passes weren't able to handle it.  So, I'm actually kind of surprised that
> > this works, but it's a nice simplification in any case.
> 
> I stumbled over a couple of problems while coding it, but they were not 
> related to the fact that I tried to put immediate values into operands 
> that should only have registers. Rather, the change surfaced a couple 
> of other problems. For example, we shouldn't use f64 as a supported type in 
> a RegisterClass and then forget to set a default class for f64 with 
> "addRegisterClass"; that can lead to a whole bunch of problems, 
> including segfaults.
>

It's highly likely that when I implemented it in the past, it was
uncovering bugs rather than causing them.  It was still pretty unstable
at that point.

> Actually, when you look into the MCOperandInfo class, the only difference 
> between an immediate operand and a register operand is that registers 
> have a RegClass. Apart from that, this information is only available at 
> pattern matching time, and there it doesn't seem to matter.
> 

The OpInfo information is stored in the MCInstrDesc object for each
instruction, so it can be accessed after pattern matching too.  However,
it looks like the only pass that does this is the MachineVerifier
pass, so this shouldn't be an issue.

-Tom


Re: [Mesa-dev] [PATCH 12/12] R600/SI: Add pattern to simplify i64 loading

2013-02-13 Thread Tom Stellard
On Tue, Feb 12, 2013 at 06:13:24PM +0100, Christian König wrote:
> From: Christian König 
> 
> Signed-off-by: Christian König 
> ---
>  lib/Target/R600/SIInstrInfo.td|4 
>  lib/Target/R600/SIInstructions.td |5 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index efc6015..8c4e5af 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -40,6 +40,10 @@ def IMM12bit : ImmLeaf <
>[{return isUInt<12>(Imm);}]
>  >;
>  
> +class InlineImm <ValueType vt> : ImmLeaf <vt, [{
> +  return -16 <= Imm && Imm <= 64;
> +}]>;
> +

I don't understand what this is checking for:

-16 <= Imm && Imm <= 64

Can you explain this?

>  class InstSI  pattern> :
>  AMDGPUInst {
>  
> diff --git a/lib/Target/R600/SIInstructions.td 
> b/lib/Target/R600/SIInstructions.td
> index f7d5cbf..19ea414 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1290,6 +1290,11 @@ def : Pat <
>(S_MOV_B32 fpimm:$imm)
>  >;
>  
> +def : Pat <
> +  (i64 InlineImm<i64>:$imm),
> +  (S_MOV_B64 InlineImm<i64>:$imm)
> +>;
> +
>  // i64 immediates aren't supported in hardware, split it into two 32bit 
> values
>  def : Pat <
>(i64 imm:$imm),
> -- 
> 1.7.9.5
> 


Re: [Mesa-dev] [PATCH 01/12] R600/structurizer: add class to find the Nearest Common Dominator

2013-02-13 Thread Tom Stellard
There's just the one cleanup on patch 10 that you mentioned, but
otherwise the series looks good to me.  Should we mark all these patches
as candidates for the stable branch?

For the series:

Reviewed-by: Tom Stellard 

On Tue, Feb 12, 2013 at 06:13:13PM +0100, Christian König wrote:
> From: Christian König 
> 
> Signed-off-by: Christian König 
> ---
>  lib/Target/R600/AMDGPUStructurizeCFG.cpp |   66 
> ++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/lib/Target/R600/AMDGPUStructurizeCFG.cpp 
> b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
> index 169d954..ad628c1 100644
> --- a/lib/Target/R600/AMDGPUStructurizeCFG.cpp
> +++ b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
> @@ -39,6 +39,7 @@ typedef SmallVector BBValueVector;
>  typedef SmallPtrSet BBSet;
>  
>  typedef DenseMap PhiMap;
> +typedef DenseMap DTN2UnsignedMap;
>  typedef DenseMap BBPhiMap;
>  typedef DenseMap BBPredicates;
>  typedef DenseMap PredMap;
> @@ -48,6 +49,71 @@ typedef DenseMap BB2BBVecMap;
>  
>  static const char *FlowBlockName = "Flow";
>  
> +/// @brief Find the nearest common dominator for multiple BasicBlocks
> +///
> +/// Helper class for AMDGPUStructurizeCFG
> +/// TODO: Maybe move into common code
> +class NearestCommonDominator {
> +
> +  DominatorTree *DT;
> +
> +  DTN2UnsignedMap IndexMap;
> +
> +  BasicBlock *Result;
> +  unsigned ResultIndex;
> +  bool ExplicitMentioned;
> +
> +public:
> +  /// \brief Start a new query
> +  NearestCommonDominator(DominatorTree *DomTree) {
> +DT = DomTree;
> +Result = 0;
> +  }
> +
> +  /// \brief Add BB to the resulting dominator
> +  void addBlock(BasicBlock *BB, bool Remember = true) {
> +
> +DomTreeNode *Node = DT->getNode(BB);
> +
> +if (Result == 0) {
> +  unsigned Numbering = 0;
> +  for (;Node;Node = Node->getIDom())
> +IndexMap[Node] = ++Numbering;
> +  Result = BB;
> +  ResultIndex = 1;
> +  ExplicitMentioned = Remember;
> +  return;
> +}
> +
> +for (;Node;Node = Node->getIDom())
> +  if (IndexMap.count(Node))
> +break;
> +  else
> +IndexMap[Node] = 0;
> +
> +assert(Node && "Dominator tree invalid!");
> +
> +unsigned Numbering = IndexMap[Node];
> +if (Numbering > ResultIndex) {
> +  Result = Node->getBlock();
> +  ResultIndex = Numbering;
> +  ExplicitMentioned = Remember && (Result == BB);
> +} else if (Numbering == ResultIndex) {
> +  ExplicitMentioned |= Remember;
> +}
> +  }
> +
> +  /// \brief Is "Result" one of the BBs added with "Remember" = True?
> +  bool wasResultExplicitMentioned() {
> +return ExplicitMentioned;
> +  }
> +
> +  /// \brief Get the query result
> +  BasicBlock *getResult() {
> +return Result;
> +  }
> +};
> +
>  /// @brief Transforms the control flow graph on one single entry/exit region
>  /// at a time.
>  ///
> -- 
> 1.7.9.5
> 


Re: [Mesa-dev] [PATCH 12/12] R600/SI: Add pattern to simplify i64 loading

2013-02-13 Thread Tom Stellard
On Wed, Feb 13, 2013 at 11:24:24AM -0500, Tom Stellard wrote:
> On Tue, Feb 12, 2013 at 06:13:24PM +0100, Christian König wrote:
> > From: Christian König 
> > 
> > Signed-off-by: Christian König 
> > ---
> >  lib/Target/R600/SIInstrInfo.td|4 
> >  lib/Target/R600/SIInstructions.td |5 +
> >  2 files changed, 9 insertions(+)
> > 
> > diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> > index efc6015..8c4e5af 100644
> > --- a/lib/Target/R600/SIInstrInfo.td
> > +++ b/lib/Target/R600/SIInstrInfo.td
> > @@ -40,6 +40,10 @@ def IMM12bit : ImmLeaf <
> >[{return isUInt<12>(Imm);}]
> >  >;
> >  
> > +class InlineImm <ValueType vt> : ImmLeaf <vt, [{
> > +  return -16 <= Imm && Imm <= 64;
> > +}]>;
> > +
> 
> I don't understand what this is checking for:
> 
> -16 <= Imm && Imm <= 64
> 
> Can you explain this?
>

Michel just explained this to me: -16 to 64 is the range of inline
constants.
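
A minimal standalone C++ sketch of the same range check, assuming nothing beyond the bounds quoted above (the function name is made up for this example and is not part of the backend):

```cpp
#include <cstdint>

// Sketch of the check performed by the InlineImm predicate: integers from
// -16 to 64 inclusive can be encoded inline, without a separate literal.
constexpr bool isInlineImm(std::int64_t imm) {
  return -16 <= imm && imm <= 64;
}
```

Values outside that range would fall through to the existing pattern that splits an i64 immediate into two 32-bit values.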

-Tom

> >  class InstSI  pattern> :
> >  AMDGPUInst {
> >  
> > diff --git a/lib/Target/R600/SIInstructions.td 
> > b/lib/Target/R600/SIInstructions.td
> > index f7d5cbf..19ea414 100644
> > --- a/lib/Target/R600/SIInstructions.td
> > +++ b/lib/Target/R600/SIInstructions.td
> > @@ -1290,6 +1290,11 @@ def : Pat <
> >(S_MOV_B32 fpimm:$imm)
> >  >;
> >  
> > +def : Pat <
> > +  (i64 InlineImm:$imm),
> > +  (S_MOV_B64 InlineImm:$imm)
> > +>;
> > +
> >  // i64 immediates aren't supported in hardware, split it into two 32bit 
> > values
> >  def : Pat <
> >(i64 imm:$imm),
> > -- 
> > 1.7.9.5
> > 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] R600: Fix tracking of implicit defs in the IndirectAddressing pass

2013-02-13 Thread Tom Stellard
From: Tom Stellard 

In some cases, we were losing track of live implicit registers which
was creating dead defs and causing the scheduler to produce invalid
code.

NOTE: This is a candidate for the Mesa stable branch.
---
 lib/Target/R600/AMDGPUIndirectAddressing.cpp |   35 -
 1 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/lib/Target/R600/AMDGPUIndirectAddressing.cpp 
b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
index 56aaf23..3efd5c7 100644
--- a/lib/Target/R600/AMDGPUIndirectAddressing.cpp
+++ b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
@@ -169,9 +169,9 @@ bool 
AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
 }
 
 if (RegisterAddressMap[Reg] == Address) {
-  if (!regHasExplicitDef(MRI, Reg)) {
-continue;
-  }
+//  if (!regHasExplicitDef(MRI, Reg)) {
+//continue;
+//  }
   PhiRegisters.push_back(Reg);
 }
   }
@@ -270,7 +270,8 @@ bool 
AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
   // instruction that uses indirect addressing. 
   BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
MI.getOperand(0).getReg())
-   .addReg(AddrReg);
+   .addReg(AddrReg)
+   .addReg(Reg, RegState::Implicit);
 }
   } else {
 // Indirect register access
@@ -292,8 +293,7 @@ bool 
AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
   // We only need to use REG_SEQUENCE for explicit defs, since the
   // register coalescer won't do anything with the implicit defs.
   MachineInstr *DefInstr = MRI.getVRegDef(Reg);
-  if (!DefInstr->getOperand(0).isReg() ||
-  DefInstr->getOperand(0).getReg() != Reg) {
+  if (!regHasExplicitDef(MRI, Reg)) {
 continue;
   }
 
@@ -310,6 +310,7 @@ bool 
AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
 
 
 Mov.addReg(IndirectReg, RegState::Implicit | RegState::Kill);
+Mov.addReg(LiveAddressRegisterMap[Address], RegState::Implicit);
 
   }
   MI.eraseFromParent();
@@ -321,6 +322,26 @@ bool 
AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
 bool AMDGPUIndirectAddressingPass::regHasExplicitDef(MachineRegisterInfo &MRI,
   unsigned Reg) const {
   MachineInstr *DefInstr = MRI.getVRegDef(Reg);
-  return DefInstr && DefInstr->getOperand(0).isReg() &&
+
+  if (!DefInstr) {
+return false;
+  }
+
+  if (DefInstr->getOpcode() == AMDGPU::PHI) {
+bool Explicit = false;
+for (MachineInstr::const_mop_iterator I = DefInstr->operands_begin(),
+  E = DefInstr->operands_end();
+  I != E; ++I) {
+  const MachineOperand &MO = *I;
+  if (!MO.isReg() || MO.isDef()) {
+continue;
+  }
+
+  Explicit = Explicit || regHasExplicitDef(MRI, MO.getReg());
+}
+return Explicit;
+  }
+
+  return DefInstr->getOperand(0).isReg() &&
  DefInstr->getOperand(0).getReg() == Reg;
 }
-- 
1.7.8.6
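
The PHI handling added to regHasExplicitDef() above can be illustrated
outside of LLVM. The following is a minimal sketch using toy types; ToyDef,
ToyDefMap and toyRegHasExplicitDef are hypothetical names invented for this
example and do not exist in the backend.

```cpp
#include <map>
#include <vector>

// Toy stand-in for a defining instruction; not an LLVM type.
struct ToyDef {
  bool IsPhi;                    // true if the def is a PHI node
  bool DefinesRegExplicitly;     // operand 0 is a register equal to Reg
  std::vector<unsigned> Inputs;  // PHI input registers (unused otherwise)
};

// Register number -> its single defining instruction (SSA form).
using ToyDefMap = std::map<unsigned, ToyDef>;

bool toyRegHasExplicitDef(const ToyDefMap &Defs, unsigned Reg) {
  auto It = Defs.find(Reg);
  if (It == Defs.end())
    return false;  // no defining instruction at all
  const ToyDef &Def = It->second;
  if (Def.IsPhi) {
    // A PHI counts as explicit if any of its inputs is explicitly defined.
    bool Explicit = false;
    for (unsigned In : Def.Inputs)
      Explicit = Explicit || toyRegHasExplicitDef(Defs, In);
    return Explicit;
  }
  return Def.DefinesRegExplicitly;
}
```

Note that this toy version has no guard against cyclic PHI chains, so it
would recurse without end on such input. Whether that can happen in the real
pass is not something the patch discusses.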



[Mesa-dev] [PATCH 2/5] R600: Fix scheduler crash caused by invalid MachinePointerInfo

2013-02-13 Thread Tom Stellard
From: Tom Stellard 

Kernel function arguments are lowered to loads from the PARAM_I address
space.  When creating these load instructions, we were initializing
their MachinePointerInfo with an Argument object that was not attached
to any function.  This was causing the MachineScheduler to crash when
it tried to access the parent of the Argument.

This has been fixed by initializing the MachinePointerInfo with an
UndefValue instead.

NOTE: This is a candidate for the Mesa stable branch.
---
 lib/Target/R600/R600ISelLowering.cpp |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/Target/R600/R600ISelLowering.cpp 
b/lib/Target/R600/R600ISelLowering.cpp
index a479cee..9c38522 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -1080,7 +1080,7 @@ SDValue R600TargetLowering::LowerFormalArguments(
 AMDGPUAS::PARAM_I_ADDRESS);
 SDValue Arg = DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, DAG.getRoot(),
 DAG.getConstant(ParamOffsetBytes, MVT::i32),
-   MachinePointerInfo(new Argument(PtrTy)),
+   
MachinePointerInfo(UndefValue::get(PtrTy)),
ArgVT, false, false, ArgBytes);
 InVals.push_back(Arg);
 ParamOffsetBytes += ArgBytes;
-- 
1.7.8.6



[Mesa-dev] [PATCH 3/5] R600: Mark all members of the TRegMem register class as reserved

2013-02-13 Thread Tom Stellard
From: Tom Stellard 

This stops the Machine Verifier from complaining about uses of undefined
physical registers.

NOTE: This is a candidate for the Mesa stable branch.
---
 lib/Target/R600/R600RegisterInfo.cpp |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/lib/Target/R600/R600RegisterInfo.cpp 
b/lib/Target/R600/R600RegisterInfo.cpp
index 33e858d..bbd7995 100644
--- a/lib/Target/R600/R600RegisterInfo.cpp
+++ b/lib/Target/R600/R600RegisterInfo.cpp
@@ -49,6 +49,12 @@ BitVector R600RegisterInfo::getReservedRegs(const 
MachineFunction &MF) const {
 Reserved.set(*I);
   }
 
+  for (TargetRegisterClass::iterator I = AMDGPU::TRegMemRegClass.begin(),
+ E = AMDGPU::TRegMemRegClass.end();
+ I !=  E; ++I) {
+Reserved.set(*I);
+  }
+
   const R600InstrInfo *RII = static_cast(&TII);
   std::vector IndirectRegs = RII->getIndirectReservedRegs(MF);
   for (std::vector::iterator I = IndirectRegs.begin(),
-- 
1.7.8.6



[Mesa-dev] [PATCH 4/5] R600: Add AR_X to the R600_TReg_X register class.

2013-02-13 Thread Tom Stellard
From: Tom Stellard 

NOTE: This is a candidate for the Mesa stable branch.
---
 lib/Target/R600/R600RegisterInfo.td |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/Target/R600/R600RegisterInfo.td 
b/lib/Target/R600/R600RegisterInfo.td
index 3812eb7..a7d847a 100644
--- a/lib/Target/R600/R600RegisterInfo.td
+++ b/lib/Target/R600/R600RegisterInfo.td
@@ -81,7 +81,7 @@ def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add 
(sequence "Addr%u_X",
 } // End isAllocatable = 0
 
 def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32,
-   (add (sequence "T%u_X", 0, 127))>;
+   (add (sequence "T%u_X", 0, 127), AR_X)>;
 
 def R600_TReg32_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
(add (sequence "T%u_Y", 0, 127))>;
-- 
1.7.8.6



[Mesa-dev] [PATCH 5/5] R600: initial scheduler code

2013-02-13 Thread Tom Stellard
From: Vadim Girlin 

This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently
it only tries to expose more parallelism for ALU instructions (this also
makes the distribution of GPR channels more uniform and increases the
chances of ALU instructions being packed together in a single VLIW group).
It also tries to reduce clause switching by grouping instructions of the
same kind (ALU/FETCH/CF) together.

Vincent Lejeune:
 - Support for VLIW4 Slot assignment
 - Recomputation of ScheduleDAG to get more parallelism opportunities

Tom Stellard:
 - Fix assertion failure when trying to determine an instruction's slot
   based on its destination register's class
 - Fix some compiler warnings
---
 lib/Target/R600/AMDGPUTargetMachine.cpp  |   17 +-
 lib/Target/R600/R600MachineScheduler.cpp |  462 ++
 lib/Target/R600/R600MachineScheduler.h   |  120 
 test/CodeGen/R600/fdiv.v4f32.ll  |6 +-
 4 files changed, 601 insertions(+), 4 deletions(-)
 create mode 100644 lib/Target/R600/R600MachineScheduler.cpp
 create mode 100644 lib/Target/R600/R600MachineScheduler.h

diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp 
b/lib/Target/R600/AMDGPUTargetMachine.cpp
index 821e864..e6070cd 100644
--- a/lib/Target/R600/AMDGPUTargetMachine.cpp
+++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
@@ -17,6 +17,7 @@
 #include "AMDGPU.h"
 #include "R600ISelLowering.h"
 #include "R600InstrInfo.h"
+#include "R600MachineScheduler.h"
 #include "SIISelLowering.h"
 #include "SIInstrInfo.h"
 #include "llvm/Analysis/Passes.h"
@@ -39,6 +40,14 @@ extern "C" void LLVMInitializeR600Target() {
   RegisterTargetMachine X(TheAMDGPUTarget);
 }
 
+static ScheduleDAGInstrs *createR600MachineScheduler(MachineSchedContext *C) {
+  return new ScheduleDAGMI(C, new R600SchedStrategy());
+}
+
+static MachineSchedRegistry
+SchedCustomRegistry("r600", "Run R600's custom scheduler",
+createR600MachineScheduler);
+
 AMDGPUTargetMachine::AMDGPUTargetMachine(const Target &T, StringRef TT,
 StringRef CPU, StringRef FS,
   TargetOptions Options,
@@ -70,7 +79,13 @@ namespace {
 class AMDGPUPassConfig : public TargetPassConfig {
 public:
   AMDGPUPassConfig(AMDGPUTargetMachine *TM, PassManagerBase &PM)
-: TargetPassConfig(TM, PM) {}
+: TargetPassConfig(TM, PM) {
+const AMDGPUSubtarget &ST = TM->getSubtarget();
+if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
+  enablePass(&MachineSchedulerID);
+  MachineSchedRegistry::setDefault(createR600MachineScheduler);
+}
+  }
 
   AMDGPUTargetMachine &getAMDGPUTargetMachine() const {
 return getTM();
diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
b/lib/Target/R600/R600MachineScheduler.cpp
new file mode 100644
index 000..3ebe369
--- /dev/null
+++ b/lib/Target/R600/R600MachineScheduler.cpp
@@ -0,0 +1,462 @@
+//===-- R600MachineScheduler.cpp - R600 Scheduler Interface -*- C++ 
-*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===--===//
+//
+/// \file
+/// \brief R600 Machine Scheduler interface
+// TODO: Scheduling is optimised for VLIW4 arch, modify it to support TRANS 
slot
+//
+//===--===//
+
+#define DEBUG_TYPE "misched"
+
+#include "R600MachineScheduler.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/LiveIntervalAnalysis.h"
+#include "llvm/Pass.h"
+#include "llvm/PassManager.h"
+#include 
+#include 
+using namespace llvm;
+
+/// \brief Recompute Output and Anti dependencies of incoming dag
+/// ScheduleDAGInstrs has a conservative policy about subregisters 
dependencies.
+/// All subreg write of a same superreg will be chained by Output/Anti deps.
+/// These artificial deps delay releases of MI and thus reduce parallelism
+/// oportunities. This function recompute the ScheduleDag to produce proper
+/// subreg aware dependencies.
+// Todo : It should also recompute Data dependencies
+static
+void RecomputeScheduleDAGMI(ScheduleDAGMI *dag) {
+
+  // Remove all Output/Anti deps
+  for (unsigned i = 0; i < dag->SUnits.size(); ++i) {
+SUnit &SU = dag->SUnits[i];
+for (SUnit::pred_iterator SUIt = SU.Preds.begin(), SUE = SU.Preds.end(); 
+SUIt != SUE; ++SUIt) {
+  SDep &SD = *SUIt;
+  SUnit *SUPred = SD.getSUnit();
+  if (SD.getKind() == SDep::Output) {
+SUPred->removePred(SD);
+  }
+}
+  }
+
+// Now recompute output/anti dependencies
+  for (unsigned i = 0; i < dag->SUnits.size(); ++i) {
+SUnit &SU = dag->SUnits[i];
+MachineOperand &DestMO = SU.getInstr()->getOperand(0);
+unsigned DestReg = SU.getInstr()->getOperand(0).getReg();
+DEBUG(dbgs() << "Recomputing deps for "; SU.dump(dag); dbgs() << "\n";);
+// Using LiveInterva

Re: [Mesa-dev] [PATCH v2] R600/SI: Fix INTERP_CONST.

2013-02-13 Thread Christian König

On 13.02.2013 17:07, Michel Dänzer wrote:

From: Michel Dänzer 

The important fix is that the constant interpolation value is stored in the
parameter slot P0, which is encoded as 2.

In addition, pass the parameter slot as an operand to V_INTERP_MOV_F32
instead of hardcoding it there, and add a special register class for the
parameter slots for type checking and pretty dumping.

NOTE: This is a candidate for the Mesa stable branch.

Signed-off-by: Michel Dänzer 
---

v2: Mark parameter slot registers as reserved.

  lib/Target/R600/AMDGPUAsmPrinter.cpp | 3 +++
  lib/Target/R600/SIISelLowering.cpp   | 1 +
  lib/Target/R600/SIInstructions.td| 3 +--
  lib/Target/R600/SIRegisterInfo.cpp   | 5 +
  lib/Target/R600/SIRegisterInfo.td| 4 
  5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp 
b/lib/Target/R600/AMDGPUAsmPrinter.cpp
index 254e62e..544aed2 100644
--- a/lib/Target/R600/AMDGPUAsmPrinter.cpp
+++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
@@ -92,6 +92,9 @@ void AMDGPUAsmPrinter::EmitProgramInfo(MachineFunction &MF) {
  default: break;
  case AMDGPU::EXEC:
  case AMDGPU::M0:
+case AMDGPU::P0:
+case AMDGPU::P10:
+case AMDGPU::P20:
continue;
  }
  
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp

index 92de379..c0b04e5 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -186,6 +186,7 @@ void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr 
*MI,
  
BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))

.addOperand(dst)
+  .addReg(AMDGPU::P0)
.addOperand(attr_chan)
.addOperand(attr)
.addReg(M0);
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index feb0188..d4f7b37 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -681,10 +681,9 @@ def V_INTERP_P2_F32 : VINTRP <
  def V_INTERP_MOV_F32 : VINTRP <
0x0002,
(outs VReg_32:$dst),
-  (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
+  (ins ParamReg:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
"V_INTERP_MOV_F32",
[]> {
-  let VSRC = 0;
let DisableEncoding = "$m0";
  }
  
diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp

index 88275c5..4e7cbf9 100644
--- a/lib/Target/R600/SIRegisterInfo.cpp
+++ b/lib/Target/R600/SIRegisterInfo.cpp
@@ -27,6 +27,11 @@ SIRegisterInfo::SIRegisterInfo(AMDGPUTargetMachine &tm,
  
  BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {

BitVector Reserved(getNumRegs());
+
+  Reserved.set(AMDGPU::P0);
+  Reserved.set(AMDGPU::P10);
+  Reserved.set(AMDGPU::P20);
+
return Reserved;
  }
  
diff --git a/lib/Target/R600/SIRegisterInfo.td b/lib/Target/R600/SIRegisterInfo.td

index ab36b87..46c8f91 100644
--- a/lib/Target/R600/SIRegisterInfo.td
+++ b/lib/Target/R600/SIRegisterInfo.td
@@ -23,6 +23,9 @@ def EXEC_HI : SIReg <"EXEC HI", 127>;
  def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
  def SCC : SIReg<"SCC", 253>;
  def M0 : SIReg <"M0", 124>;
+def P10 : SIReg <"P10", 0>;
+def P20 : SIReg <"P20", 1>;
+def P0 : SIReg <"P0", 2>;


I'm not sure if representing constants as registers is such a good idea. 
From the POV of the selection DAG, a register is primarily the destination 
of an operation.


Maybe using an Operand with a proper "PrintMethod" set would be 
better; take a look at ARMInstPrinter::printThumbSRImm to see how that's done.
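
As a rough illustration of that kind of print helper, here is a sketch of
what the name lookup could look like, assuming the encodings from the SIReg
definitions in the patch above (P10 = 0, P20 = 1, P0 = 2). The function name
printInterpSlot is hypothetical and the sketch does not use the real
MCInstPrinter interface.

```cpp
#include <string>

// Hypothetical helper: maps a parameter slot encoding to its printed name.
// Encodings follow the patch: P10 = 0, P20 = 1, P0 = 2.
std::string printInterpSlot(unsigned Encoding) {
  switch (Encoding) {
  case 0:
    return "P10";
  case 1:
    return "P20";
  case 2:
    return "P0";
  default:
    // Anything else is not a known slot; make that visible in the dump.
    return "invalid_param_" + std::to_string(Encoding);
  }
}
```

With something along these lines the slot could stay an immediate operand
and still be dumped by name.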


Christian.

  
  //Interpolation registers

  def PERSP_SAMPLE_I : SIReg <"PERSP_SAMPLE_I">;
@@ -187,4 +190,5 @@ def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
  def VCCReg : RegisterClass<"AMDGPU", [i1], 1, (add VCC)>;
  def EXECReg : RegisterClass<"AMDGPU", [i1], 1, (add EXEC)>;
  def M0Reg : RegisterClass<"AMDGPU", [i32], 32, (add M0)>;
+def ParamReg : RegisterClass<"AMDGPU", [i32], 32, (add P0, P10, P20)>;
  




[Mesa-dev] [PATCH] r600g: Report Instructions Group count with R600_DUMP_SHADERS=1

2013-02-13 Thread Vincent Lejeune
---
 src/gallium/drivers/r600/r600_asm.c | 7 ++-
 src/gallium/drivers/r600/r600_asm.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 3632aa5..eacdb0c 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -812,6 +812,8 @@ static int merge_inst_groups(struct r600_bytecode *bc, 
struct r600_bytecode_alu
 
/* looks like everything worked out right, apply the changes */
 
+   bc->nig --;
+
/* undo adding previus literals */
bc->cf_last->ndw -= align(prev_nliteral, 2);
 
@@ -1140,6 +1142,9 @@ int r600_bytecode_add_alu_type(struct r600_bytecode *bc,
if (nalu->dst.sel >= bc->ngpr) {
bc->ngpr = nalu->dst.sel + 1;
}
+   if (nalu->last) {
+   bc->nig ++;
+   }
LIST_ADDTAIL(&nalu->list, &bc->cf_last->alu);
/* each alu use 2 dwords */
bc->cf_last->ndw += 2;
@@ -2105,7 +2110,7 @@ void r600_bytecode_dump(struct r600_bytecode *bc)
chip = '6';
break;
}
-   fprintf(stderr, "bytecode %d dw -- %d gprs -\n", 
bc->ndw, bc->ngpr);
+   fprintf(stderr, "bytecode %d dw -- %d gprs -- %d ig-\n", 
bc->ndw, bc->ngpr, bc->nig);
fprintf(stderr, " %c\n", chip);
 
LIST_FOR_EACH_ENTRY(cf, &bc->cf, list) {
diff --git a/src/gallium/drivers/r600/r600_asm.h 
b/src/gallium/drivers/r600/r600_asm.h
index 03cd238..1638ca0 100644
--- a/src/gallium/drivers/r600/r600_asm.h
+++ b/src/gallium/drivers/r600/r600_asm.h
@@ -192,6 +192,7 @@ struct r600_bytecode {
struct r600_bytecode_cf *cf_last;
unsignedndw;
unsignedncf;
+   unsignednig; // Number of Instructions Group
unsignedngpr;
unsignednstack;
unsignednresource;
-- 
1.8.1.2
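
The counting rule used by the patch (one instruction group per ALU
instruction that has its "last" bit set) can be sketched in isolation.
ToyAlu and countInstructionGroups are hypothetical names for this example;
the driver itself keeps a running counter in struct r600_bytecode instead of
recounting.

```cpp
#include <vector>

// Toy ALU instruction: 'last' marks the final instruction of a group.
struct ToyAlu {
  bool last;
};

// Every instruction with 'last' set closes one instruction group.
unsigned countInstructionGroups(const std::vector<ToyAlu> &Alus) {
  unsigned nig = 0;
  for (const ToyAlu &A : Alus)
    if (A.last)
      ++nig;
  return nig;
}
```

Merging two groups clears one "last" bit in this model, which corresponds
to the decrement in merge_inst_groups().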



Re: [Mesa-dev] [PATCH 01/12] R600/structurizer: add class to find the Nearest Common Dominator

2013-02-13 Thread Michel Dänzer
On Wed, 2013-02-13 at 11:34 -0500, Tom Stellard wrote: 
> There's just the one cleanup on patch 10 that you mentioned, but
> otherwise the series looks good to me.  Should we mark all these patches
> as candidates for the stable branch?

I think so, at least the parts which prevent things such as Vincent's
MAD changes or switching to the Source scheduler from breaking stuff.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] [PATCH 01/12] R600/structurizer: add class to find the Nearest Common Dominator

2013-02-13 Thread Christian König

On 13.02.2013 18:11, Michel Dänzer wrote:
> On Wed, 2013-02-13 at 11:34 -0500, Tom Stellard wrote:
>> There's just the one cleanup on patch 10 that you mentioned, but
>> otherwise the series looks good to me.  Should we mark all these patches
>> as candidates for the stable branch?
> I think so, at least the parts which prevent things such as Vincent's
> MAD changes or switching to the Source scheduler from breaking stuff.


I still have no idea why the MAD change actually breaks anything. I'm 
working on OMOD/ABS/NEG folding in another branch and have Vincent's MAD 
in there, but I haven't had time to fully figure out what's wrong there. 
And unfortunately I don't think I will have time in the near future.


Christian.


Re: [Mesa-dev] [PATCH 01/12] R600/structurizer: add class to find the Nearest Common Dominator

2013-02-13 Thread Michel Dänzer
On Wed, 2013-02-13 at 18:17 +0100, Christian König wrote: 
> On 13.02.2013 18:11, Michel Dänzer wrote:
> > On Wed, 2013-02-13 at 11:34 -0500, Tom Stellard wrote:
> >> There's just the one cleanup on patch 10 that you mentioned, but
> >> otherwise the series looks good to me.  Should we mark all these patches
> >> as candidates for the stable branch?
> > I think so, at least the parts which prevent things such as Vincent's
> > MAD changes or switching to the Source scheduler from breaking stuff.
> 
> I still have no idea why the MAD change actually breaks anything. I'm 
> working on OMOD/ABS/NEG folding in another branch and have Vincent's MAD 
> in there, but I haven't had time to fully figure out what's wrong there. 
> And unfortunately I don't think I will have time in the near future.

Well, as I said, his MAD changes no longer break after this series. Does
anything speak against backporting the whole series to the stable tree?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] [PATCH V3 04/19] glsl: add support for ARB_texture_multisample

2013-02-13 Thread Kenneth Graunke

On 02/13/2013 01:39 AM, Chris Forbes wrote:

> Ouch. Thanks for catching that -- are there any other similar things I
> need to be aware of?


Not that I can think of, no.



[Mesa-dev] [PATCH 3/3] R600/SI: Add pattern for fceil.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

9 more little piglits with radeonsi.

NOTE: This is a candidate for the Mesa stable branch.

Signed-off-by: Michel Dänzer 
---
 lib/Target/R600/SIInstructions.td | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 43908ca..06e8ee6 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -608,7 +608,9 @@ defm V_FRACT_F32 : VOP1_32 <0x0020, "V_FRACT_F32",
   [(set VReg_32:$dst, (AMDGPUfract VSrc_32:$src0))]
 >;
 defm V_TRUNC_F32 : VOP1_32 <0x0021, "V_TRUNC_F32", []>;
-defm V_CEIL_F32 : VOP1_32 <0x0022, "V_CEIL_F32", []>;
+defm V_CEIL_F32 : VOP1_32 <0x0022, "V_CEIL_F32",
+  [(set VReg_32:$dst, (fceil VSrc_32:$src0))]
+>;
 defm V_RNDNE_F32 : VOP1_32 <0x0023, "V_RNDNE_F32",
   [(set VReg_32:$dst, (frint VSrc_32:$src0))]
 >;
-- 
1.8.1.3



[Mesa-dev] [PATCH 1/3] R600/SI: Add pattern for sign extension of i1 to i32.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

16 more little piglits with radeonsi.

NOTE: This is a candidate for the Mesa stable branch.

Signed-off-by: Michel Dänzer 
---
 lib/Target/R600/SIInstructions.td | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index d4f7b37..ad1eff7 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1404,6 +1404,11 @@ def : Pat <
   0, 0, 0, 0), sub3)
 >;
 
+def : Pat <
+  (i32 (sext (i1 SReg_64:$src0))),
+  (V_CNDMASK_B32_e64 (i32 0), (i32 -1), SReg_64:$src0)
+>;
+
 /** == **/
 /**   VOP3 Patterns**/
 /** == **/
-- 
1.8.1.3
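
What the new pattern computes per lane can be modelled in a few lines. This
is a sketch under the assumption that V_CNDMASK_B32 returns its second
source where the mask bit is set and its first source otherwise; cndmask and
sextI1ToI32 are hypothetical helper names.

```cpp
#include <cstdint>

// One-lane model of a conditional select: returns IfSet when the condition
// bit is set, IfClear otherwise. The operand order is an assumption.
constexpr int32_t cndmask(int32_t IfClear, int32_t IfSet, bool Cond) {
  return Cond ? IfSet : IfClear;
}

// Sign extension of an i1 to i32: a set bit becomes all ones (-1),
// a clear bit becomes 0.
constexpr int32_t sextI1ToI32(bool Bit) {
  return cndmask(0, -1, Bit);
}
```

This matches the usual definition of sext on i1, where true extends to -1
rather than 1.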



[Mesa-dev] [PATCH 2/3] R600/SI: Add pattern for logical or of i1 values.

2013-02-13 Thread Michel Dänzer
From: Michel Dänzer 

24 more little piglits with radeonsi.

NOTE: This is a candidate for the Mesa stable branch.

Signed-off-by: Michel Dänzer 
---
 lib/Target/R600/SIInstructions.td | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index ad1eff7..43908ca 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -982,6 +982,10 @@ def : Pat <
 
 def S_OR_B32 : SOP2_32 <0x0010, "S_OR_B32", []>;
 def S_OR_B64 : SOP2_64 <0x0011, "S_OR_B64", []>;
+def : Pat <
+  (i1 (or SSrc_64:$src0, SSrc_64:$src1)),
+  (S_OR_B64 SSrc_64:$src0, SSrc_64:$src1)
+>;
 def S_XOR_B32 : SOP2_32 <0x0012, "S_XOR_B32", []>;
 def S_XOR_B64 : SOP2_64 <0x0013, "S_XOR_B64", []>;
 def S_ANDN2_B32 : SOP2_32 <0x0014, "S_ANDN2_B32", []>;
-- 
1.8.1.3



Re: [Mesa-dev] [PATCH] radeonsi: Fix up and enable flat shading.

2013-02-13 Thread Tom Stellard
On Wed, Feb 13, 2013 at 04:34:24PM +0100, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> Requires corresponding LLVM R600 backend fix to work correctly, but even
> without that it doesn't hang anymore.
> 
> 13 more little piglits.
> 
> NOTE: This is a candidate for the 9.1 branch.

Since this change depends on a change to the LLVM backend, could you
commit the LLVM change first and then add the revision number of that
commit to this commit message?  Something like:

Depends on LLVM: r123456

This will make it easier if we have to bisect changes.

-Tom

> 
> Signed-off-by: Michel Dänzer 
> ---
>  src/gallium/drivers/radeonsi/radeonsi_shader.c | 11 +--
>  src/gallium/drivers/radeonsi/radeonsi_shader.h |  1 +
>  src/gallium/drivers/radeonsi/si_state.c|  5 ++---
>  src/gallium/drivers/radeonsi/si_state_draw.c   | 10 +-
>  4 files changed, 5 insertions(+), 22 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
> b/src/gallium/drivers/radeonsi/radeonsi_shader.c
> index 7c57f4c..704f859 100644
> --- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
> +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
> @@ -301,14 +301,8 @@ static void declare_input_fs(
>   /* XXX: Handle all possible interpolation modes */
>   switch (decl->Interp.Interpolate) {
>   case TGSI_INTERPOLATE_COLOR:
> - /* XXX: Flat shading hangs the GPU */
> - if (si_shader_ctx->rctx->queued.named.rasterizer &&
> - si_shader_ctx->rctx->queued.named.rasterizer->flatshade) {
> -#if 0
> + if (si_shader_ctx->key.flatshade) {
>   intr_name = "llvm.SI.fs.interp.constant";
> -#else
> - intr_name = "llvm.SI.fs.interp.linear.center";
> -#endif
>   } else {
>   if (decl->Interp.Centroid)
>   intr_name = "llvm.SI.fs.interp.persp.centroid";
> @@ -317,11 +311,8 @@ static void declare_input_fs(
>   }
>   break;
>   case TGSI_INTERPOLATE_CONSTANT:
> - /* XXX: Flat shading hangs the GPU */
> -#if 0
>   intr_name = "llvm.SI.fs.interp.constant";
>   break;
> -#endif
>   case TGSI_INTERPOLATE_LINEAR:
>   if (decl->Interp.Centroid)
>   intr_name = "llvm.SI.fs.interp.linear.centroid";
> diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h 
> b/src/gallium/drivers/radeonsi/radeonsi_shader.h
> index 07b2f9f..f54f67c 100644
> --- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
> +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
> @@ -82,6 +82,7 @@ struct si_shader_key {
>   unsignednr_cbufs:4;
>   unsignedcolor_two_side:1;
>   unsignedalpha_func:3;
> + unsignedflatshade:1;
>   float   alpha_ref;
>  };
>  
> diff --git a/src/gallium/drivers/radeonsi/si_state.c 
> b/src/gallium/drivers/radeonsi/si_state.c
> index 5cca085..6bd1055 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -419,8 +419,7 @@ static void *si_create_rs_state(struct pipe_context *ctx,
>   rs->offset_units = state->offset_units;
>   rs->offset_scale = state->offset_scale * 12.0f;
>  
> - /* XXX: Flat shading hangs the GPU */
> - tmp = S_0286D4_FLAT_SHADE_ENA(0);
> + tmp = S_0286D4_FLAT_SHADE_ENA(rs->flatshade);
>   if (state->sprite_coord_enable) {
>   tmp |= S_0286D4_PNT_SPRITE_ENA(1) |
>   
> S_0286D4_PNT_SPRITE_OVRD_X(V_0286D4_SPI_PNT_SPRITE_SEL_S) |
> @@ -1875,7 +1874,7 @@ static INLINE struct si_shader_key 
> si_shader_selector_key(struct pipe_context *c
>   key.export_16bpc = rctx->export_16bpc;
>   if (rctx->queued.named.rasterizer) {
>   key.color_two_side = 
> rctx->queued.named.rasterizer->two_side;
> - /*key.flatshade = 
> rctx->queued.named.rasterizer->flatshade;*/
> + key.flatshade = 
> rctx->queued.named.rasterizer->flatshade;
>   }
>   if (rctx->queued.named.dsa) {
>   key.alpha_func = rctx->queued.named.dsa->alpha_func;
> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
> b/src/gallium/drivers/radeonsi/si_state_draw.c
> index 3704410..61dea74 100644
> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
> @@ -128,11 +128,6 @@ static void si_pipe_shader_ps(struct pipe_context *ctx, 
> struct si_pipe_shader *s
>   continue;
>   }
>  
> - /* XXX: Flat shading hangs the GPU */
> - if (shader->shader.input[i].interpolate == 
> TGSI_INTERPOLATE_CONSTANT ||
> - (shader->shader.input[i].interpolate == 
> TGSI_INTERPOLATE_COLOR &&
> -  rctx->queued.named.rasterizer->flatshade))
> -

[Mesa-dev] [PATCH 1/2] r600g: Add $(DEFINES) to AM_CXXFLAGS

2013-02-13 Thread Tom Stellard
From: Tom Stellard 

This way llvm_wrapper.cpp is compiled with -DHAVE_LLVM=0x
---
 src/gallium/drivers/r600/Makefile.am | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/Makefile.am 
b/src/gallium/drivers/r600/Makefile.am
index 6f48b56..e1c8574 100644
--- a/src/gallium/drivers/r600/Makefile.am
+++ b/src/gallium/drivers/r600/Makefile.am
@@ -26,7 +26,8 @@ AM_CFLAGS += \
-I$(top_srcdir)/src/gallium/drivers/radeon/
 
 AM_CXXFLAGS= \
-   $(LLVM_CXXFLAGS)
+   $(LLVM_CXXFLAGS) \
+   $(DEFINES)
 endif
 
 if USE_R600_LLVM_COMPILER
-- 
1.7.11.4



[Mesa-dev] [PATCH 2/2] radeon/llvm: Fix build with LLVM 3.3

2013-02-13 Thread Tom Stellard
From: Tom Stellard 

---
 src/gallium/drivers/r600/llvm_wrapper.cpp   | 4 
 src/gallium/drivers/radeon/radeon_llvm_emit.cpp | 9 -
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/llvm_wrapper.cpp 
b/src/gallium/drivers/r600/llvm_wrapper.cpp
index 81f5397..ae2f4d2 100644
--- a/src/gallium/drivers/r600/llvm_wrapper.cpp
+++ b/src/gallium/drivers/r600/llvm_wrapper.cpp
@@ -1,6 +1,10 @@
 #include 
 #include 
+#if HAVE_LLVM < 0x0303
 #include 
+#else
+#include 
+#endif
 #include 
 #include 
 #include 
diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.cpp 
b/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
index 903e102..0491e64 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
@@ -25,8 +25,16 @@
  */
 #include "radeon_llvm_emit.h"
 
+#if HAVE_LLVM < 0x0303
 #include 
 #include 
+#include 
+#else
+#include 
+#include 
+#include 
+#endif
+
 #include 
 #include 
 #include 
@@ -39,7 +47,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
-- 
1.7.11.4
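
The version checks above compare HAVE_LLVM against 0x0303. A minimal sketch
of that comparison follows, assuming the macro keeps the major version in
the high byte and the minor version in the low byte; encodeLLVMVersion is a
hypothetical helper and not part of the build system.

```cpp
// Assumed encoding: major version in the high byte, minor version in the
// low byte, so LLVM 3.3 becomes 0x0303.
constexpr unsigned encodeLLVMVersion(unsigned Major, unsigned Minor) {
  return (Major << 8) | Minor;
}

// Mirrors the "#if HAVE_LLVM < 0x0303" test used to pick the old headers.
constexpr bool usesPre33Headers(unsigned HaveLLVM) {
  return HaveLLVM < 0x0303;
}
```

Under that assumption LLVM 3.2 takes the old include paths, while 3.3 and
later take the new ones.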



Re: [Mesa-dev] [PATCH v2] shaderapi: Fix AttachShader error

2013-02-13 Thread Chad Versace
Thanks, committed.

On 02/11/2013 10:42 PM, Tapani Pälli wrote:
> From: bma 
> 
> Detect a duplicate Shader type as an error instead of silently allowing
> it, restrict to ES2 API.
> 
> v2: Tapani Pälli 
> - make the check run time instead of compile time
> 
> Signed-off-by: bma 
> Signed-off-by: Tapani Pälli 
> ---
>  src/mesa/main/shaderapi.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index 2590abe..64f7ed3 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -207,6 +207,8 @@ attach_shader(struct gl_context *ctx, GLuint program, 
> GLuint shader)
> struct gl_shader *sh;
> GLuint i, n;
>  
> +   const bool same_type_disallowed = _mesa_is_gles(ctx);
> +
> shProg = _mesa_lookup_shader_program_err(ctx, program, "glAttachShader");
> if (!shProg)
>return;
> @@ -227,6 +229,16 @@ attach_shader(struct gl_context *ctx, GLuint program, 
> GLuint shader)
>*/
>   _mesa_error(ctx, GL_INVALID_OPERATION, "glAttachShader");
>   return;
> +  } else if (same_type_disallowed &&
> + shProg->Shaders[i]->Type == sh->Type) {
> +/* Shader with the same type is already attached to this program,
> + * OpenGL ES 2.0 and 3.0 specs say:
> + *
> + *  "Multiple shader objects of the same type may not be attached
> + *  to a single program object."
> + */
> + _mesa_error(ctx, GL_INVALID_OPERATION, "glAttachShader");
> + return;
>}
> }
>  
> 
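
The rule added by the patch can be sketched outside of Mesa. The types and
the function below (ToyShader, AttachResult, toyAttachShader) are
hypothetical and only model the two rejection cases visible in the diff:
the same shader attached twice, and, on ES contexts only, a second shader of
an already attached type.

```cpp
#include <vector>

enum class ToyShaderType { Vertex, Fragment };

struct ToyShader {
  unsigned Name;       // shader object name
  ToyShaderType Type;  // shader stage
};

enum class AttachResult { Ok, InvalidOperation };

// Toy model of the attach check; IsGLES plays the role of the run-time
// same_type_disallowed flag from the patch.
AttachResult toyAttachShader(std::vector<ToyShader> &Attached,
                             const ToyShader &Sh, bool IsGLES) {
  for (const ToyShader &Cur : Attached) {
    if (Cur.Name == Sh.Name)
      return AttachResult::InvalidOperation;  // already attached
    if (IsGLES && Cur.Type == Sh.Type)
      return AttachResult::InvalidOperation;  // same type, ES only
  }
  Attached.push_back(Sh);
  return AttachResult::Ok;
}
```

On a desktop GL context the second check is skipped, so several shaders of
the same type can still be attached to one program.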



Re: [Mesa-dev] [PATCH v2] gles2: a stub implementation for GL_EXT_discard_framebuffer

2013-02-13 Thread Chad Versace
On 02/13/2013 04:38 AM, Tapani Pälli wrote:
> This patch implements a stub for GL_EXT_discard_framebuffer with
> required checks listed by the extension specification. This extension
> is required by GLBenchmark 2.5 when compiled with OpenGL ES 2.0
> as the rendering backend.
> 
> Signed-off-by: Tapani Pälli 

This patch looks good to me.
Reviewed-by: Chad Versace 

Before committing, I'll wait a bit to give others a chance to review.


Re: [Mesa-dev] [PATCH v2] gles2: a stub implementation for GL_EXT_discard_framebuffer

2013-02-13 Thread Chad Versace
On 02/13/2013 02:25 PM, Chad Versace wrote:
> On 02/13/2013 04:38 AM, Tapani Pälli wrote:
>> This patch implements a stub for GL_EXT_discard_framebuffer with
>> required checks listed by the extension specification. This extension
>> is required by GLBenchmark 2.5 when compiled with OpenGL ES 2.0
>> as the rendering backend.
>>
>> Signed-off-by: Tapani Pälli 
> 
> This patch looks good to me.
> Reviewed-by: Chad Versace 

Oops. I withdraw that r-b. I just saw the comments from Ian about
GL_OES_framebuffer_object.



Re: [Mesa-dev] [PATCH] r600g: properly implement S8Z24 depth-stencil format for Evergreen

2013-02-13 Thread Jerome Glisse
On Tue, Feb 12, 2013 at 8:06 PM, Marek Olšák  wrote:
> I should say "fix", but it has never been used until now.
> S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing,
> so we'll start to see it more often with st/mesa now making smart decisions
> about formats.
>
> The DB<->CB copy can change the channel ordering for transfers, other than
> that, the internal DB format doesn't really matter.
>
> R600-R700 support is possible except shadow mapping.
> FMT_24_8 is broken if the SAMPLE_C instruction is used (no idea why).
>
> Also the sampler swizzling was broken in theory and the fact it worked was
> a lucky coincidence.
>
> radeonsi might need to port this.

Reviewed-by: Jerome Glisse 

> ---
>  src/gallium/drivers/r600/evergreen_state.c |   13 +++-
>  src/gallium/drivers/r600/r600_state.c  |8 -
>  src/gallium/drivers/r600/r600_texture.c|   44 
> ++--
>  3 files changed, 47 insertions(+), 18 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 211c218..c6e29db 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -200,6 +200,8 @@ static uint32_t r600_translate_dbformat(enum pipe_format 
> format)
> return V_028040_Z_16;
> case PIPE_FORMAT_Z24X8_UNORM:
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> return V_028040_Z_24;
> case PIPE_FORMAT_Z32_FLOAT:
> case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> @@ -339,7 +341,7 @@ static uint32_t r600_translate_colorswap(enum pipe_format 
> format)
>
> case PIPE_FORMAT_X8Z24_UNORM:
> case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -   return V_028C70_SWAP_STD;
> +   return V_028C70_SWAP_STD_REV;
>
> case PIPE_FORMAT_R10G10B10A2_UNORM:
> case PIPE_FORMAT_R10G10B10X2_SNORM:
> @@ -1106,6 +1108,11 @@ evergreen_create_sampler_view_custom(struct 
> pipe_context *ctx,
> case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> pipe_format = PIPE_FORMAT_Z32_FLOAT;
> break;
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> +   /* Z24 is always stored like this. */
> +   pipe_format = PIPE_FORMAT_Z24X8_UNORM;
> +   break;
> case PIPE_FORMAT_X24S8_UINT:
> case PIPE_FORMAT_S8X24_UINT:
> case PIPE_FORMAT_X32_S8X24_UINT:
> @@ -1603,6 +1610,8 @@ static void evergreen_init_depth_surface(struct 
> r600_context *rctx,
> switch (surf->base.format) {
> case PIPE_FORMAT_Z24X8_UNORM:
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> surf->pa_su_poly_offset_db_fmt_cntl =
> S_028B78_POLY_OFFSET_NEG_NUM_DB_BITS((char)-24);
> break;
> @@ -2179,6 +2188,8 @@ static void evergreen_emit_polygon_offset(struct 
> r600_context *rctx, struct r600
> switch (state->zs_format) {
> case PIPE_FORMAT_Z24X8_UNORM:
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> offset_units *= 2.0f;
> break;
> case PIPE_FORMAT_Z16_UNORM:
> diff --git a/src/gallium/drivers/r600/r600_state.c 
> b/src/gallium/drivers/r600/r600_state.c
> index 5322850..d1f6626 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -270,10 +270,6 @@ static uint32_t r600_translate_colorswap(enum 
> pipe_format format)
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> return V_0280A0_SWAP_STD;
>
> -   case PIPE_FORMAT_X8Z24_UNORM:
> -   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -   return V_0280A0_SWAP_STD;
> -
> case PIPE_FORMAT_R10G10B10A2_UNORM:
> case PIPE_FORMAT_R10G10B10X2_SNORM:
> case PIPE_FORMAT_R10SG10SB10SA2U_NORM:
> @@ -440,10 +436,6 @@ static uint32_t r600_translate_colorformat(enum 
> pipe_format format)
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> return V_0280A0_COLOR_8_24;
>
> -   case PIPE_FORMAT_X8Z24_UNORM:
> -   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -   return V_0280A0_COLOR_24_8;
> -
> case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> return V_0280A0_COLOR_X24_8_32_FLOAT;
>
> diff --git a/src/gallium/drivers/r600/r600_texture.c 
> b/src/gallium/drivers/r600/r600_texture.c
> index 85fc887..7f5752d 100644
> --- a/src/gallium/drivers/r600/r600_texture.c
> +++ b/src/gallium/drivers/r600/r600_texture.c
> @@ -985,11 +985,14 @@ uint32_t r600_translate_texformat(struct pipe_screen 
> *screen,
>   

[Mesa-dev] r600g: status of my work on the shader optimization

2013-02-13 Thread Vadim Girlin

Hi,

Last month I finally found the time to work on the rewrite of my
previous shader optimization branch. It's now mostly done in terms of
the correctness of the produced code and feature support (at least on
evergreen), though it's still a work in progress in terms of the
efficiency of the generated shader code and of the backend itself.


I spent some time last year studying the LLVM infrastructure and the R600
LLVM backend and trying to improve it, but in the end I came to the
conclusion that for me it might be easier to implement everything I wanted
in a custom backend. This allows for a simpler and more efficient
implementation - e.g. I don't have to deal with CFGs because the code is
in fact structured, so it's possible to use simpler and more efficient
algorithms.


Currently the branch has no regressions with piglit's quick-driver.tests 
on evergreen (it doesn't rely on the fallback to unoptimized code for 
the shaders with relative addressing and other cases unlike the previous 
branch), and so far I don't see any rendering issues with the apps that 
I used for testing -  Lightsmark 2008, Unigine Heaven 3.0 and some 
others. There are also some performance improvements with the gpu-bound 
apps.


I tried to keep in mind the differences between chip classes, so I hope 
it should only require minor fixes to make it work on non-evergreen 
chips, but I doubt that it will work out of the box - support for some 
non-evergreen hw-specific features is still missing, e.g. I'm sure that 
indirect addressing currently won't work on R6xx, though basic tests 
might work in theory. Fixing this shouldn't require a lot of work though.


The branch can be found in my freedesktop repo:

http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb

Regarding the differences from the previous branch - there are some 
additional optimizations, e.g. global value numbering with some basic 
support for constant folding (not all instructions are currently 
handled, but it's easy to extend), global code motion that can hoist 
invariant code out of the loops etc. Some optimizations that were 
implemented in the previous branch are not implemented in the new branch 
(yet), e.g. propagation of modifiers (I'm not even sure if it has any 
noticeable effect on performance).


Unlike the previous branch, there is support for indirect addressing on
registers - currently it uses my previously posted patch (that was not
very welcome) for obtaining the information about addressable register
ranges, but it's not required and can be dropped; I just used that patch
for testing. Without that information the opportunities for optimization
are limited though, and perhaps it makes sense not to try to optimize
shaders with indirect gpr addressing at all and to rely on the old backend
until we have a proper solution for passing that information to the
drivers.


There is also initial support for ALU predication, but it's not complete
and currently unused. I'm not sure whether predication support will have
a significant enough effect on performance to justify more complex and
expensive algorithms for the register allocator and scheduler. I'll
probably look into it later; I consider it low priority. In the case of
predicated source code (from the LLVM backend) the predication is
eliminated using speculative execution and conditional moves, the same as
with the simple if-conversion pass that is also implemented.
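To illustrate what if-conversion with speculative execution and conditional moves means, here is a source-level sketch. It is not code from the r600-sb branch, the function names are made up for the example, and a C++ compiler is of course free to lower the ternary either way; the point is only the shape of the transformation.

```cpp
#include <cassert>

// Original, branching form: only the taken arm is evaluated.
float with_branch(bool cond, float a, float b)
{
   if (cond)
      return a * 2.0f;
   else
      return b + 1.0f;
}

// If-converted form: both arms are computed speculatively and a
// conditional move selects the result, so no predicate or jump is needed.
float if_converted(bool cond, float a, float b)
{
   float then_val = a * 2.0f;   // computed even when cond is false
   float else_val = b + 1.0f;   // computed even when cond is true
   return cond ? then_val : else_val;
}
```

The conversion is only valid when both arms are free of side effects, which is why it trades extra ALU work for the removal of control flow.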


The branch currently uses as its source the bytecode built by the old
backend (which may also come from the LLVM backend) plus some additional
information (about inputs etc.); the final bytecode is built by the new
builder in the branch. Building two versions of the bytecode isn't
very efficient, but for now it simplifies debugging. I'm planning
to implement translation from TGSI directly to my representation, which
should simplify the translator and make it possible to get rid of
unnecessary intermediate passes.


Some old and new environment variables can be used to control the 
behavior of this backend:


R600_SB - 0 - disable new backend completely, 1 - enable (default)
R600_SB_USE_NEW_BYTECODE - 0 - disable use of the produced bytecode 
(useful if you only want to look at the dump of the optimized shader 
without passing it to hw), 1 - enable (default)
R600_DUMP_SHADERS - will also dump the disassembly of the optimized
shader after the original bytecode (if the backend is not disabled with
R600_SB=0).


Produced shader code is not ideal - e.g. you may notice unnecessary
MOVs inserted before DOT4 instructions. It's a known issue and
I'm going to look into it - this may require a rework of the
regalloc/scheduler. I had to sacrifice some features to make it work
correctly with Heaven first, so that now I can try to improve it while
being able to test for regressions.


Also probably there are some issues with the cleanness of the code - I 
had to rework some parts a few times while fixing all problems, so there 
is possibly unused co

Re: [Mesa-dev] [PATCH 2/3] i965: Don't record a WAW dependency between implicit MRF operations.

2013-02-13 Thread Eric Anholt
Kenneth Graunke  writes:

> Meta-instructions that implicitly write then read/consume a MRF value
> don't pose write-after-write conflicts with each other, since they're
> actually:
>
> - Write value 1, then consume it.
> - Write value 2, then consume it.

OK, here's the case I'm thinking of:

1: MOV m2, temp1
2: send m2 no_implied_write
3: send m2 implied_write_m2
4: send m2 implied_write_m2
5: MOV m2, temp2
6: send m2 no_implied_write

Before this series we had deps:
2 -> 1 (RAW)
3 -> 2 (WAW)
4 -> 3 (WAW)
5 -> 4 (WAW)
6 -> 5 (RAW)

I'm concerned that now you only have deps:
2 -> 1 (RAW)
3 -> 2 (WAW)
5 -> 4 (WAW)
5 -> 2 (RAW)
6 -> 5 (RAW)

and that now a possible ordering would be:

1: MOV m2, value
2: send m2 no_implied_write
4: send m2 implied_write_m2
5: MOV m2, value
3: send m2 implied_write_m2
6: send m2 no_implied_write

and "6: send" will be broken.

I could definitely have missed something -- this stuff is super twisty.
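The concern can be checked mechanically. The sketch below is not scheduler code from the driver; Dep and respects are names invented for the example. It encodes the two dependency lists above as "a must come after b" pairs and tests whether a given instruction order honours them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A pair (a, b) means "instruction a must be scheduled after instruction b".
using Dep = std::pair<int, int>;

// Returns true if `order` places every instruction after all of the
// instructions it depends on.
bool respects(const std::vector<int> &order, const std::vector<Dep> &deps)
{
   auto pos = [&](int insn) -> int {
      for (std::size_t i = 0; i < order.size(); i++)
         if (order[i] == insn)
            return (int)i;
      return -1;
   };
   for (const Dep &d : deps)
      if (pos(d.first) < pos(d.second))
         return false;
   return true;
}
```

Fed the order 1, 2, 4, 5, 3, 6, the function accepts it under the reduced dependency list and rejects it under the original one (4 -> 3 is violated). In that order, instruction 3's implied write to m2 lands between the MOV in 5 and the send in 6, which is how "6: send" ends up reading the wrong value.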




Re: [Mesa-dev] [PATCH 3/6] i965/fs: Do a general SEND dependency workaround for the original 965.

2013-02-13 Thread Kenneth Graunke

On 02/06/2013 05:29 PM, Eric Anholt wrote:

We'd been ad-hoc inserting instructions in some SEND messages with no
knowledge of when it was required (so extra instructions), but not all SENDs
(so not often enough).  This should do much better than that, though it's
still flow-control-ignorant.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58960
NOTE: Candidate for the stable branches.
---
  src/mesa/drivers/dri/i965/brw_fs.cpp  |  225 +
  src/mesa/drivers/dri/i965/brw_fs.h|4 +
  src/mesa/drivers/dri/i965/brw_fs_emit.cpp |   42 --
  3 files changed, 229 insertions(+), 42 deletions(-)


Presumably we should have one of these for the vertex shader as well. :(


diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index fdccd75..264c8c2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -258,6 +258,26 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg 
surf_index,
 return instructions;
  }

+/**
+ * A helper for MOV generation for fixing up broken hardware SEND dependency
+ * handling.
+ */
+fs_inst *
+fs_visitor::DEP_RESOLVE_MOV(int grf)
+{
+   fs_inst *inst = MOV(brw_null_reg(), fs_reg(GRF, grf, BRW_REGISTER_TYPE_F));
+
+   inst->ir = NULL;
+   inst->annotation = "send dependency resolve";
+
+   /* The caller always wants uncompressed to emit the minimal extra
+* dependencies, and to avoid having to deal with aligning its regs to 2.
+*/
+   inst->force_uncompressed = true;
+
+   return inst;
+}
+
  bool
  fs_inst::equals(fs_inst *inst)
  {
@@ -2228,6 +2248,205 @@ fs_visitor::remove_duplicate_mrf_writes()
 return progress;
  }

+static void
+clear_deps_for_inst_src(fs_inst *inst, int dispatch_width, bool *deps,
+int first_grf, int grf_len)
+{
+   bool inst_16wide = (dispatch_width > 8 &&
+   !inst->force_uncompressed &&
+   !inst->force_sechalf);
+
+   /* Clear the flag for registers that actually got read (as expected). */
+   for (int i = 0; i < 3; i++) {
+  int grf;
+  if (inst->src[i].file == GRF) {
+ grf = inst->src[i].reg;
+  } else if (inst->src[i].file == FIXED_HW_REG &&
+ inst->src[i].fixed_hw_reg.file == BRW_GENERAL_REGISTER_FILE) {
+ grf = inst->src[i].fixed_hw_reg.nr;
+  } else {
+ continue;
+  }
+
+  if (grf >= first_grf &&
+  grf < first_grf + grf_len) {
+ deps[grf - first_grf] = false;
+ if (inst_16wide)
+deps[grf - first_grf + 1] = false;
+  }
+   }
+}
+
+/**
+ * Implements this workaround for the original 965:
+ *
+ * "[DevBW, DevCL] Implementation Restrictions: As the hardware does not
+ *  check for post destination dependencies on this instruction, software
+ *  must ensure that there is no destination hazard for the case of ‘write
+ *  followed by a posted write’ shown in the following example.
+ *
+ *  1. mov r3 0
+ *  2. send r3.xy 
+ *  3. mov r2 r3
+ *
+ *  Due to no post-destination dependency check on the ‘send’, the above
+ *  code sequence could have two instructions (1 and 2) in flight at the
+ *  same time that both consider ‘r3’ as the target of their final writes.
+ */
+void
+fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
+{
+   int write_len = inst->regs_written() * dispatch_width / 8;
+   int first_write_grf = inst->dst.reg;
+   bool needs_dep[16];


Perhaps:
bool needs_dep[BRW_MAX_MRF];

I guess it's fine, though...even if someday we expand BRW_MAX_MRF to 
more than 16 (for Sandybridge), this workaround only applies to Gen4 
which will always have exactly 16.



+   assert(write_len < (int)sizeof(needs_dep) - 1);
+
+   memset(needs_dep, false, sizeof(needs_dep));
+   memset(needs_dep, true, write_len);


The second memset only works if sizeof(bool) == 1.  While that's likely 
true with our toolchain, it's explicitly _not_ guaranteed.


From the C++98 specification, section 5.3.3 Sizeof [expr.sizeof], 
paragraph 1: "in particular, sizeof(bool) and sizeof(wchar_t) are 
implementation-defined." Footnote 65 reiterates: "sizeof(bool) is not 
required to be 1."


I'd be more comfortable with:

memset(needs_dep, true, write_len * sizeof(bool));

or the obvious loop.
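A minimal sketch of the loop-style alternative, written with std::fill and std::fill_n instead of memset. This is a substitution for illustration, not something from the patch, and mark_written is a made-up helper name. Because the standard algorithms assign element by element, nothing depends on sizeof(bool).

```cpp
#include <algorithm>
#include <cassert>

// Marks the first `write_len` entries of a 16-entry flag array as true and
// the rest as false, without assuming anything about sizeof(bool).
void mark_written(bool (&needs_dep)[16], int write_len)
{
   std::fill(needs_dep, needs_dep + 16, false);
   std::fill_n(needs_dep, write_len, true);
}
```

The caller is still responsible for keeping write_len within the array bounds, as the assert in the quoted code does.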


+
+   clear_deps_for_inst_src(inst, dispatch_width,
+   needs_dep, first_write_grf, write_len);
+
+   /* Walk backwards looking for writes to registers we're writing which
+* aren't read since being written.  If we hit the start of the program,
+* we assume that there are no outstanding dependencies on entry to the
+* program.
+*/
+   for (fs_inst *scan_inst = (fs_inst *)inst->prev;
+scan_inst != NULL;
+scan_inst = (fs_inst *)scan_inst->prev) {
+
+  /* If we hit flow control, assume that there *are* outstanding
+   * dependencies, and force their cleanup befor

Re: [Mesa-dev] [PATCH 2/3] i965: Don't record a WAW dependency between implicit MRF operations.

2013-02-13 Thread Kenneth Graunke

On 02/13/2013 06:31 PM, Eric Anholt wrote:

> Kenneth Graunke  writes:
>
>> Meta-instructions that implicitly write then read/consume a MRF value
>> don't pose write-after-write conflicts with each other, since they're
>> actually:
>>
>> - Write value 1, then consume it.
>> - Write value 2, then consume it.
>
> OK, here's the case I'm thinking of:
>
> 1: MOV m2, temp1
> 2: send m2 no_implied_write
> 3: send m2 implied_write_m2
> 4: send m2 implied_write_m2
> 5: MOV m2, temp2
> 6: send m2 no_implied_write
>
> Before this series we had deps:
> 2 -> 1 (RAW)
> 3 -> 2 (WAW)
> 4 -> 3 (WAW)
> 5 -> 4 (WAW)
> 6 -> 5 (RAW)
>
> I'm concerned that now you only have deps:
> 2 -> 1 (RAW)
> 3 -> 2 (WAW)
> 5 -> 4 (WAW)
> 5 -> 2 (RAW)
> 6 -> 5 (RAW)
>
> and that now a possible ordering would be:
>
> 1: MOV m2, value
> 2: send m2 no_implied_write
> 4: send m2 implied_write_m2
> 5: MOV m2, value
> 3: send m2 implied_write_m2
> 6: send m2 no_implied_write
>
> and "6: send" will be broken.
>
> I could definitely have missed something -- this stuff is super twisty.


I double checked your work and got an extra 3 -> 1 (WAW) dependency, but 
that doesn't matter.  The reordering you mentioned is indeed possible. 
Thanks.


Consider patches 1 & 2 NAK'd...I'm working on a new series...


Re: [Mesa-dev] [PATCH 2/3] i965: Don't record a WAW dependency between implicit MRF operations.

2013-02-13 Thread Eric Anholt
Kenneth Graunke  writes:

> On 02/13/2013 06:31 PM, Eric Anholt wrote:
>> Kenneth Graunke  writes:
>>
>>> Meta-instructions that implicitly write then read/consume a MRF value
>>> don't pose write-after-write conflicts with each other, since they're
>>> actually:
>>>
>>> - Write value 1, then consume it.
>>> - Write value 2, then consume it.
>>
>> OK, here's the case I'm thinking of:
>>
>> 1: MOV m2, temp1
>> 2: send m2 no_implied_write
>> 3: send m2 implied_write_m2
>> 4: send m2 implied_write_m2
>> 5: MOV m2, temp2
>> 6: send m2 no_implied_write
>>
>> Before this series we had deps:
>> 2 -> 1 (RAW)
>> 3 -> 2 (WAW)
>> 4 -> 3 (WAW)
>> 5 -> 4 (WAW)
>> 6 -> 5 (RAW)
>>
>> I'm concerned that now you only have deps:
>> 2 -> 1 (RAW)
>> 3 -> 2 (WAW)
>> 5 -> 4 (WAW)
>> 5 -> 2 (RAW)
>> 6 -> 5 (RAW)
>>
>> and that now a possible ordering would be:
>>
>> 1: MOV m2, value
>> 2: send m2 no_implied_write
>> 4: send m2 implied_write_m2
>> 5: MOV m2, value
>> 3: send m2 implied_write_m2
>> 6: send m2 no_implied_write
>>
>> and "6: send" will be broken.
>>
>> I could definitely have missed something -- this stuff is super twisty.
>
> I double checked your work and got an extra 3 -> 1 (WAW) dependency, but 
> that doesn't matter.  The reordering you mentioned is indeed possible. 
> Thanks.
>
> Consider patches 1 & 2 NAK'd...I'm working on a new series...

In related news, we got a report today that ubo performance sucks (not
surprising, right?), and the testcase shows repeated loads from the same
offset.  That makes me think that we want to have a single IR
instruction for "load from this UBO surface at this offset" which can
get CSEd, and then before the post-regalloc scheduling we would lower it
to the sequence of 3 instructions we use now.  That could deal with
avoiding these dependencies for the case we need to deal with here,
right?  Of course, it would need to use a reserved range of MRFs like
spill/unspill do (though it could be the same range), so that it doesn't
interfere with anything when scheduled between some other instruction
sequence's MRF setup and its SEND.
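As a rough sketch of why a single "load from this UBO surface at this offset" instruction is easy to CSE: identical (surface, offset) pairs can simply be looked up in a table. The UboLoad struct and cse_ubo_loads function below are hypothetical and not part of the i965 backend, and the sketch ignores control flow entirely, treating the loads as one straight-line block.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical single-instruction form of a UBO load.
struct UboLoad {
   int surface;
   int offset;
   int dst;   // virtual register that receives the loaded value
};

// For each load, returns the register holding its value after CSE: the
// first load of a given (surface, offset) keeps its own dst, and later
// identical loads are redirected to that register.
std::vector<int> cse_ubo_loads(const std::vector<UboLoad> &loads)
{
   std::map<std::pair<int, int>, int> seen;
   std::vector<int> result;
   for (const UboLoad &ld : loads) {
      auto it = seen.emplace(std::make_pair(ld.surface, ld.offset), ld.dst).first;
      result.push_back(it->second);
   }
   return result;
}
```

Only the loads that survive this step would then need lowering to the multi-instruction MRF setup plus SEND sequence.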





Re: [Mesa-dev] [PATCH] gles2: a stub implementation for GL_EXT_discard_framebuffer

2013-02-13 Thread Tapani Pälli
On 02/13/2013 05:23 PM, Ian Romanick wrote:
> On 02/12/2013 11:04 PM, Tapani Pälli wrote:
>> On 02/12/2013 08:45 PM, Eric Anholt wrote:
>>> Tapani Pälli  writes:
>>>
>>>> On 02/12/2013 12:38 AM, Eric Anholt wrote:
>>>>> Tapani Pälli  writes:
>>>>>> ---
>>>>>> src/mapi/glapi/gen/es_EXT.xml | 13 +
>>>>>> src/mesa/drivers/dri/intel/intel_extensions.c |  1 +
>>>>>> src/mesa/main/dd.h|  4 ++-
>>>>>> src/mesa/main/extensions.c|  1 +
>>>>>> src/mesa/main/fbobject.c  | 40
>>>>>> +++
>>>>>> src/mesa/main/fbobject.h  |  4 +++
>>>>>> src/mesa/main/mtypes.h|  1 +
>>>>>> src/mesa/main/tests/dispatch_sanity.cpp   |  1 +
>>>>>> 8 files changed, 64 insertions(+), 1 deletion(-)
>>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_extensions.c
>>>>>> b/src/mesa/drivers/dri/intel/intel_extensions.c
>>>>>> index bf5e2b5..a182139 100755
>>>>>> --- a/src/mesa/drivers/dri/intel/intel_extensions.c
>>>>>> +++ b/src/mesa/drivers/dri/intel/intel_extensions.c
>>>>>> @@ -65,6 +65,7 @@ intelInitExtensions(struct gl_context *ctx)
>>>>>>   ctx->Extensions.EXT_blend_equation_separate = true;
>>>>>>   ctx->Extensions.EXT_blend_func_separate = true;
>>>>>>   ctx->Extensions.EXT_blend_minmax = true;
>>>>>> +   ctx->Extensions.EXT_discard_framebuffer = true;
>>>>>>   ctx->Extensions.EXT_framebuffer_blit = true;
>>>>>>   ctx->Extensions.EXT_framebuffer_object = true;
>>>>>>   ctx->Extensions.EXT_fog_coord = true;
>>>>>> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
>>>>>> index 04435e0..f7bd6f2 100644
>>>>>> --- a/src/mesa/main/extensions.c
>>>>>> +++ b/src/mesa/main/extensions.c
>>>>>> @@ -168,6 +168,7 @@ static const struct extension extension_table[]
>>>>>> = {
>>>>>>   { "GL_EXT_blend_color",
>>>>>> o(EXT_blend_color), GLL,1995 },
>>>>>>   { "GL_EXT_blend_equation_separate",
>>>>>> o(EXT_blend_equation_separate), GL, 2003 },
>>>>>>   { "GL_EXT_blend_func_separate",
>>>>>> o(EXT_blend_func_separate), GLL,1999 },
>>>>>> +   { "GL_EXT_discard_framebuffer",
>>>>>> o(EXT_discard_framebuffer), ES2, 2009 },
>>>>>>   { "GL_EXT_blend_minmax",
>>>>>> o(EXT_blend_minmax),GLL | ES1 | ES2, 1995 },
>>>>>>   { "GL_EXT_blend_subtract",
>>>>>> o(dummy_true),  GLL,1995 },
>>>>>>   { "GL_EXT_clip_volume_hint",
>>>>>> o(EXT_clip_volume_hint),GL, 1996 },
>>>>> Is there any reason not to just expose this stub extension everywhere,
>>>>> since it requires nothing from the driver author?
>>>> I'm not aware of any other than ES2 users so this is why I enabled it
>>>> only for ES2. I'll add ES1 too as requested by Ian and Chad.
>>> I wasn't meaning gles1, I meant using o(dummy_true) so that every
>>> driver gets it.
>>>
>>
>> OK, now I got it. Yes, it can be enabled for every driver.
> 
> No, it can't.  This extension requires OES_framebuffer_object.  If we're
> going to enable this on ES1, it must, at the very least, use the same
> enable bit as OES_framebuffer_object.
> 
> +   { "GL_EXT_discard_framebuffer",
> o(EXT_framebuffer_object),ES1 | ES2,
> 2009 },
> 
> I said this in my reply on 2/11:
> 
> http://lists.freedesktop.org/archives/mesa-dev/2013-February/034408.html

Sorry, I managed to miss this. I will change it to use the same enable
bit, as some other OES extensions already seem to do.
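A small sketch of what sharing an enable bit means in a table-driven extension list. The types below are simplified, hypothetical stand-ins (Mesa's real table uses an offset macro and a different struct layout); only the entry's name, flag and API set follow Ian's suggested line.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, trimmed-down versions of the extension bookkeeping.
enum Api : unsigned { GLL = 1u << 0, ES1 = 1u << 1, ES2 = 1u << 2 };

struct Extensions {
   bool EXT_framebuffer_object = false;
};

struct Entry {
   const char *name;
   bool Extensions::*enable;   // driver flag that gates this name
   unsigned apis;              // bitmask of APIs that may expose it
};

// The discard extension reuses the framebuffer-object flag, so it can
// never be advertised without FBO support.
static const Entry table[] = {
   { "GL_EXT_discard_framebuffer", &Extensions::EXT_framebuffer_object, ES1 | ES2 },
};

bool is_exposed(const Extensions &ext, const char *name, unsigned api)
{
   for (const Entry &e : table)
      if (std::strcmp(e.name, name) == 0)
         return (ext.*e.enable) && (e.apis & api) != 0;
   return false;
}
```

With a dummy_true style flag the first condition would always hold, which is exactly the case Ian's objection rules out.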

>> // Tapani
>>
> 

// Tapani



