jrbyrnes added a comment.

Hey Austin -- I like the removal of canAddMIs. In the original design, I was 
leaving open the possibility for users to pass in a canAddMIs callback rather 
than a mask / SchedGroup name, but it looks like that isn't the direction we're 
going, and having the classification functions defined in a general canAddMI 
makes things easier.

I see this is still a WIP, but I've added some thoughts from reading it over. I 
may have more as I use the design for my patch.



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:199
+  // SchedGroupMask of instructions that should be barred.
+  SchedGroupMask invertSchedBarrierMask(SchedGroupMask Mask) const;
+
----------------
I find it confusing that SchedBarrier uses inversion while SchedGroupBarrier 
doesn't.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:306
+bool SchedGroup::isFull() const {
+  return MaxSize.hasValue() && Collection.size() >= *MaxSize;
+}
----------------
As in the update to IGroupLP.cpp in trunk, it seems we're not supposed to use 
hasValue() anymore.
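
For reference, the trivial change I have in mind (assuming MaxSize stays an 
Optional, as in the snippet above):

  bool SchedGroup::isFull() const {
    // Use has_value() (or the implicit bool conversion) rather than the
    // deprecated hasValue().
    return MaxSize.has_value() && Collection.size() >= *MaxSize;
  }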


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:349
+  add(InitSU);
+  assert(MaxSize.hasValue());
+  (*MaxSize)++;
----------------
Not possible to have unsized groups?
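
If unsized groups are meant to be supported, a sketch of what I'd expect here 
instead of the assert (same names as the snippet above):

  add(InitSU);
  // Only grow the bound if the group actually has one; otherwise leave the
  // group unbounded.
  if (MaxSize)
    ++*MaxSize;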


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:445
+  // initialized all of the SCHED_GROUP_BARRIER SchedGroups.
+  addSchedGroupBarrierEdges();
 }
----------------
If both types of barriers are present, the SchedBarriers are handled first. 
However, if there is a conflict between a SchedBarrier and a SchedGroupBarrier, 
should the SchedBarrier always get priority? Maybe SchedBarrier should only 
handle groups that aren't already present in a SchedGroupBarrier?
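
Purely hypothetical sketch of that last idea (made-up names, and assuming 
SchedGroupMask supports bitmask ops): before adding the SchedBarrier edges, 
clear whatever the SCHED_GROUP_BARRIERs already claim, something like

  // Let SCHED_GROUP_BARRIERs take precedence for the classes they name.
  SchedGroupMask InvertedMask = invertSchedBarrierMask(Mask);
  InvertedMask &= ~ClassesClaimedByGroupBarriers;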


================
Comment at: llvm/test/CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir:104
+    GLOBAL_STORE_DWORD_SADDR %1, %13, %0, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+    ; 1 VMEM_READ
+    SCHED_GROUP_BARRIER 32, 1, 0
----------------
I think you're aware of this issue, but the mutation's ability to match the 
requested pipeline depends on which instructions go into which group (when an 
instruction can be mapped to multiple groups).

Say we have the SchedGroups: 2 VMEM_READ, 1 VALU, 1 MFMA, 2 VMEM_READ,

and the initial schedule: VMEMR, VALU, VMEMR, MFMA, VMEMR, with a dependency 
from the middle VMEMR to the MFMA.

initSchedGroup will add the middle VMEMR to the last VMEM_READ group, but we 
could get a more accurate pipeline by adding it to the first group.
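
Spelling the example out:

  Requested pipeline: {VMEMR, VMEMR}  {VALU}  {MFMA}  {VMEMR, VMEMR}
  Initial schedule:    VMEMR0  VALU  VMEMR1  MFMA  VMEMR2   (VMEMR1 -> MFMA)

If VMEMR1 lands in the last group, the pipeline wants it scheduled after the 
MFMA, which the VMEMR1 -> MFMA dependency forbids, so that part of the pipeline 
can't be realized. If VMEMR1 goes in the first group instead, the dependency is 
already satisfied and the requested pipeline can be matched.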




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128158/new/

https://reviews.llvm.org/D128158
