On Don, 2013-08-15 at 09:16 -0700, Tom Stellard wrote: > On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote: > > On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote: > > > On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote: > > > > On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote: > > > > > On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote: > > > > > > > > > > > > LLVM revision 187139 ('Allocate local registers in order for optimal > > > > > > coloring.') broke some derivative related piglit tests with the > > > > > > radeonsi > > > > > > driver. > > > > > > > > > > > > I'm attaching a diff between the bad and good generated code (as > > > > > > printed > > > > > > with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only > > > > > > difference I can see is in which registers are used in which order. > > > > > > > > > > > > I wonder if we might be missing S_WAITCNT after DS_READ/WRITE > > > > > > instructions in some cases, but I haven't spotted any candidates for > > > > > > that in the bad code which aren't there in the good code as well. > > > > > > Can > > > > > > anyone else spot something I've missed? > > > > > > > > > > Shouldn't we be using the S_BARRIER instruction to keep the threads > > > > > in sync? > > > > > > > > Doesn't seem to help unfortunately, but thanks for the good suggestion. > > > > > > I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR > > > register number instead of the $gds operand for encoding the GDS field > > > (the asm output from llc even shows the VGPR name). If the VGPR number > > > happens to be odd (i.e. to have the least significant bit set), the > > > shader ends up writing to GDS instead of LDS. > > > > > > > Ouch, that's a pretty bad bug. > > > > > But I have no idea why this is happening, or how to fix it. :( > > > > > > > > > > I can take a look at it. > > The attached patch should fix the problem, can you test?
Thanks for finding my silly mistake. However, I'd like to preserve the ability to use these instructions for GDS access, and the logic in SIInsertWaits::getHwCounts() only really makes sense for SMRD anyway. How about this patch instead? It fixes the piglit regressions that prompted me to start this thread. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer
>From 4ad4d9bba31b06bbadb83fead3e53d989b80d83e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= <michel.daen...@amd.com> Date: Thu, 15 Aug 2013 19:43:02 +0200 Subject: [PATCH] R600/SI: Fix broken encoding of DS_WRITE_B32 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused it to corrupt the encoding of that by clobbering the first operand with the second one. Undo that damage and only apply the SMRD logic to that. Fixes some derivates related piglit regressions with radeonsi. Signed-off-by: Michel Dänzer <michel.daen...@amd.com> --- lib/Target/R600/SIInsertWaits.cpp | 22 ++++++++++++++-------- test/CodeGen/R600/local-memory.ll | 4 ++-- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp index ba202e3..200e064 100644 --- a/lib/Target/R600/SIInsertWaits.cpp +++ b/lib/Target/R600/SIInsertWaits.cpp @@ -134,14 +134,20 @@ Counters SIInsertWaits::getHwCounts(MachineInstr &MI) { // LGKM may uses larger values if (TSFlags & SIInstrFlags::LGKM_CNT) { - MachineOperand &Op = MI.getOperand(0); - if (!Op.isReg()) - Op = MI.getOperand(1); - assert(Op.isReg() && "First LGKM operand must be a register!"); - - unsigned Reg = Op.getReg(); - unsigned Size = TRI->getMinimalPhysRegClass(Reg)->getSize(); - Result.Named.LGKM = Size > 4 ? 2 : 1; + if (MI.getNumOperands() == 3) { + + // SMRD + MachineOperand &Op = MI.getOperand(0); + assert(Op.isReg() && "First LGKM operand must be a register!"); + + unsigned Reg = Op.getReg(); + unsigned Size = TRI->getMinimalPhysRegClass(Reg)->getSize(); + Result.Named.LGKM = Size > 4 ? 2 : 1; + + } else { + // DS + Result.Named.LGKM = 1; + } } else { Result.Named.LGKM = 0; diff --git a/test/CodeGen/R600/local-memory.ll b/test/CodeGen/R600/local-memory.ll index 5458fb9..9ebb769 100644 --- a/test/CodeGen/R600/local-memory.ll +++ b/test/CodeGen/R600/local-memory.ll @@ -13,7 +13,7 @@ ; SI-CHECK-NEXT: .long 32768 ; EG-CHECK: LDS_WRITE -; SI-CHECK: DS_WRITE_B32 +; SI-CHECK: DS_WRITE_B32 0 ; GROUP_BARRIER must be the last instruction in a clause ; EG-CHECK: GROUP_BARRIER @@ -21,7 +21,7 @@ ; SI-CHECK: S_BARRIER ; EG-CHECK: LDS_READ_RET -; SI-CHECK: DS_READ_B32 +; SI-CHECK: DS_READ_B32 {{VGPR[0-9]+}}, 0 define void @local_memory(i32 addrspace(1)* %out) { entry: -- 1.8.4.rc2
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev