On 25/07/2025 11:54, Tobias Burnus wrote:
There are still issues with MI300, some of which get resolved by adding s_nop.

One case where it is known exactly where the s_nop fixes a fail is
libgomp.c-c++-common/task-detach-10.c, where libgomp/single.c's
GOMP_single_start() never returns 1, such that 'omp single' is
never executed. Adding an s_nop fixes it; namely it has to be added at

   v_cmp_eq_u64   vcc, v[4:5], v[8:9]    ;, tmp711, _4
   s_nop  0x0  ; ← now added
   v_mov_b32      v0, vcc_lo     ; tmp744, tmp714

The other case is taken from the manual and I have no idea whether it
actually has an effect (both in how often it gets inserted and whether
it fixes any testcase); still, it makes sense to have it.

Looking at the number of fails for check-target-libgomp on an MI300A
system (x86-64 with CDNA3 GPU), the fails are now down to:

# of expected passes            31299
# of unexpected failures        72
# of unexpected successes       1
# of expected failures          706
# of unresolved testcases       5
# of unsupported tests          867

i.e. 0.2% fail. Or - looking only at the 'execution test' lines, 55 fail
with a total of (8104+55) PASS+FAIL tests; that's 0.7%. Some fails are
known, e.g. 9 link fails because I don't have a gfx{900,…} multilib; some
others are also known fails. Still, most execution-test fails shouldn't
be there ...

* * *

Next step: Identify those libgomp tests which start working when inserting
more s_nop (e.g. at least 1 or 2). But there are also other known issues,
which are not fixed by even 5 s_nop - and require another solution.

Tobias

PS: One such fail is for 'indirect' combined with 'target teams'.
It is not quite clear whether any other issue contributes to it, but
there is a known race. See https://gcc.gnu.org/PR114445 (comment 0 for
the known issue, comment 1 for the MI300 issue).


+static bool
+gcn_v_cmp_insn_p (attr_type type)
+{
+  return type == TYPE_VOPC || type == TYPE_VOP3A;
 }

There are many VOP3A-encoded instructions. I don't understand how this uniquely identifies v_cmp instructions.

+         /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane, v_cmp,
+            v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+            - VALU reads SGPR as constant requires 1 wait state
+            - VALU reads SGPR as carry-in requires no wait state
+            - v_readlane/v_writelane reads SGPR as lane select requires 4 wait
+              states.  */
+         if (TARGET_CDNA3_NOPS
+             && (prev_insn->age + nops_rqd) < 4
+             && prev_insn->unit == UNIT_VECTOR
+             && (get_attr_laneselect (prev_insn->insn) == LANESELECT_READ
+                 || gcn_v_cmp_insn_p (prev_insn->type)
+                 || prev_insn->type == TYPE_VOP2
+                 || prev_insn->type == TYPE_VOP3B)
+             && hard_reg_set_intersect_p
+                  (depregs, reg_class_contents[(int) SGPR_REGS]))

Is it necessary to check all those attributes? "prev_insn->unit == UNIT_VECTOR" together with the register dependency check is enough to establish "VALU reads SGPR". Is this attempting to rule out the carry-in instructions?

+             if (get_attr_laneselect (insn) != LANESELECT_NO)
+               nops_rqd = 4 - prev_insn->age;
+             else if ((prev_insn->age + nops_rqd) < 1)
+               nops_rqd = 1 - prev_insn->age;

This is safe, but I don't think it actually determines whether the value is used *as* laneselect (not for write, anyway). I might revisit this stuff at some point, but it's fine for now.

Andrew
