[PATCH] D96906: [AMDGPU] gfx90a support

Stanislav Mekhanoshin via Phabricator via cfe-commits Tue, 23 Feb 2021 13:26:03 -0800

rampitec added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+    if (!MRI->isReserved(R)) {
+      RepReg = R;
+      break;
+    }
+  }
----------------
rampitec wrote:
> arsenm wrote:
> > rampitec wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > rampitec wrote:
> > > > > > arsenm wrote:
> > > > > > > rampitec wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > This is a problem because I've removed forAllLanes.
> > > > > > > > > 
> > > > > > > > > This is a hack, we should be using a different register class 
> > > > > > > > > for cases that don't support a given subregister index not 
> > > > > > > > > scanning for an example non-reserved register
> > > > > > > > This would be massive duplication of all instructions with such 
> > > > > > > > operands, isn't it?
> > > > > > > Ideally yes. We can still use register classes for this, with 
> > > > > > > special care to make sure we never end up with the unaligned 
> > > > > > > virtual registers in the wrong contexts.
> > > > > > > 
> > > > > > >  The less that's tracked by the instruction definitions, the more 
> > > > > > > special case code we have to right. I've been thinking of 
> > > > > > > swapping out the entire MCInstrDesc table per-subtarget to make 
> > > > > > > this easier, although that may be a painful change.
> > > > > > I do not see how it can be less code. You will need to duplicate 
> > > > > > all VALU pseudos, not just real instructions. Which means every 
> > > > > > time you write in the source something like 
> > > > > > AMDGPU::FLAT_LOAD_DWORDX2 you would have to write an if. For every 
> > > > > > VALU instruction.
> > > > > It's less code because the code that's already there is supposed to 
> > > > > rely on the static operand definitions. Every time we want to deviate 
> > > > > from those, we end up writing manual code in the verifier and fixup 
> > > > > things here and there that differ.
> > > > > 
> > > > > The point of swapping out the table would be to eliminate all the 
> > > > > VALU pseudos. We would just have the same enum values referring to 
> > > > > different physical instruction definitions
> > > > This makes sense, although as you said also quite painful and to me 
> > > > also sounds like a hack. There is still a lot of legalization needed 
> > > > even with this approach. Every time you hit an instruction not 
> > > > supported by a target you will need to do something about it. In a 
> > > > worst case expanding. Sounds like another year of work. Especially when 
> > > > you look at highly specialized ASICs which can do this but cannot do 
> > > > that, and you have a lot them.
> > > JBTW it will not help anyway. Not for this problem. You may create an 
> > > operand of a different RC or you may just reserve every other register 
> > > like I did, the net result will be the same, you will end up using 
> > > prohibited register. Imagine you are using an RC where only even tuples 
> > > are added. And then you are using sub1_sub2 subreg of it. RA will happily 
> > > allocate forbidden register just like it does now. To me this is RA bug 
> > > in the first place to allocate a reserved register.
> > > 
> > > The only thing which could help is an another register info without odd 
> > > wide subregs, but that you cannot achieve just by duplication of 
> > > instruction definitions, for that you would need to duplicate register 
> > > info as well. This is almost a new BE.
> > It's not a hack, this is how operand classes are intended to work. You 
> > wouldn't be producing these instructions on targets that don't support them 
> > (ideally we would also have a verifier for this, which is another area 
> > where subtarget handling is weak).
> > 
> > The point is to not reserve them. References to unaligned registers can 
> > exist, they just can't be used in the context of a real machine operand. 
> > D97316 switches to using dedicated classes for alignment (the further 
> > cleanup would be to have this come directly from the instruction 
> > definitions instead of fixing them up after isel).
> > It's not a hack, this is how operand classes are intended to work. You 
> > wouldn't be producing these instructions on targets that don't support them 
> > (ideally we would also have a verifier for this, which is another area 
> > where subtarget handling is weak).
> > 
> > The point is to not reserve them. References to unaligned registers can 
> > exist, they just can't be used in the context of a real machine operand. 
> > D97316 switches to using dedicated classes for alignment (the further 
> > cleanup would be to have this come directly from the instruction 
> > definitions instead of fixing them up after isel).
> 
> I have replied in D97316, but I do not believe it will help as is. It will 
> run into the exactly same issue as with reserved registers approach and their 
> subregs.
I have checked the commit. This code is not even related to vgpr agignment. It 
was about SGPRs in fact:

If stack is present 4 low SGPRs are reserved. The pass was
using just a first register as a representative for RC to
check for reserved subregs. The first register was reserved
and pass failed to find a valid lane split.

The testcase is mem_clause_sreg256_used_stack from memory_clause.mir.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D96906: [AMDGPU] gfx90a support

Reply via email to