On Don, 2013-08-15 at 09:16 -0700, Tom Stellard wrote:
> On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote:
> > On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote:
> > > On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote:
> > > > On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote:
> > > > > On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote:
> > > > > > 
> > > > > > LLVM revision 187139 ('Allocate local registers in order for optimal
> > > > > > coloring.') broke some derivative related piglit tests with the 
> > > > > > radeonsi
> > > > > > driver. 
> > > > > > 
> > > > > > I'm attaching a diff between the bad and good generated code (as 
> > > > > > printed
> > > > > > with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only
> > > > > > difference I can see is in which registers are used in which order.
> > > > > > 
> > > > > > I wonder if we might be missing S_WAITCNT after DS_READ/WRITE
> > > > > > instructions in some cases, but I haven't spotted any candidates for
> > > > > > that in the bad code which aren't there in the good code as well. 
> > > > > > Can
> > > > > > anyone else spot something I've missed?
> > > > > 
> > > > > Shouldn't we be using the S_BARRIER instruction to keep the threads 
> > > > > in sync?
> > > > 
> > > > Doesn't seem to help unfortunately, but thanks for the good suggestion.
> > > 
> > > I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR
> > > register number instead of the $gds operand for encoding the GDS field
> > > (the asm output from llc even shows the VGPR name). If the VGPR number
> > > happens to be odd (i.e. to have the least significant bit set), the
> > > shader ends up writing to GDS instead of LDS.
> > >
> > 
> > Ouch, that's a pretty bad bug.
> > 
> > > But I have no idea why this is happening, or how to fix it. :(
> > > 
> > > 
> > 
> > I can take a look at it.
> 
> The attached patch should fix the problem, can you test?

Thanks for finding my silly mistake.

However, I'd like to preserve the ability to use these instructions for
GDS access, and the logic in SIInsertWaits::getHwCounts() only really
makes sense for SMRD anyway.

How about this patch instead? It fixes the piglit regressions that
prompted me to start this thread.


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer
>From 4ad4d9bba31b06bbadb83fead3e53d989b80d83e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= <michel.daen...@amd.com>
Date: Thu, 15 Aug 2013 19:43:02 +0200
Subject: [PATCH] R600/SI: Fix broken encoding of DS_WRITE_B32
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.

Undo that damage and only apply the SMRD logic to that.

Fixes some derivates related piglit regressions with radeonsi.

Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
---
 lib/Target/R600/SIInsertWaits.cpp | 22 ++++++++++++++--------
 test/CodeGen/R600/local-memory.ll |  4 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
index ba202e3..200e064 100644
--- a/lib/Target/R600/SIInsertWaits.cpp
+++ b/lib/Target/R600/SIInsertWaits.cpp
@@ -134,14 +134,20 @@ Counters SIInsertWaits::getHwCounts(MachineInstr &MI) {
   // LGKM may uses larger values
   if (TSFlags & SIInstrFlags::LGKM_CNT) {
 
-    MachineOperand &Op = MI.getOperand(0);
-    if (!Op.isReg())
-      Op = MI.getOperand(1);
-    assert(Op.isReg() && "First LGKM operand must be a register!");
-
-    unsigned Reg = Op.getReg();
-    unsigned Size = TRI->getMinimalPhysRegClass(Reg)->getSize();
-    Result.Named.LGKM = Size > 4 ? 2 : 1;
+    if (MI.getNumOperands() == 3) {
+
+      // SMRD
+      MachineOperand &Op = MI.getOperand(0);
+      assert(Op.isReg() && "First LGKM operand must be a register!");
+
+      unsigned Reg = Op.getReg();
+      unsigned Size = TRI->getMinimalPhysRegClass(Reg)->getSize();
+      Result.Named.LGKM = Size > 4 ? 2 : 1;
+
+    } else {
+      // DS
+      Result.Named.LGKM = 1;
+    }
 
   } else {
     Result.Named.LGKM = 0;
diff --git a/test/CodeGen/R600/local-memory.ll b/test/CodeGen/R600/local-memory.ll
index 5458fb9..9ebb769 100644
--- a/test/CodeGen/R600/local-memory.ll
+++ b/test/CodeGen/R600/local-memory.ll
@@ -13,7 +13,7 @@
 ; SI-CHECK-NEXT: .long 32768
 
 ; EG-CHECK: LDS_WRITE
-; SI-CHECK: DS_WRITE_B32
+; SI-CHECK: DS_WRITE_B32 0
 
 ; GROUP_BARRIER must be the last instruction in a clause
 ; EG-CHECK: GROUP_BARRIER
@@ -21,7 +21,7 @@
 ; SI-CHECK: S_BARRIER
 
 ; EG-CHECK: LDS_READ_RET
-; SI-CHECK: DS_READ_B32
+; SI-CHECK: DS_READ_B32 {{VGPR[0-9]+}}, 0
 
 define void @local_memory(i32 addrspace(1)* %out) {
 entry:
-- 
1.8.4.rc2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to