On 28/07/2025 12:06, Tobias Burnus wrote:
Hi Andrew,

Andrew Stubbs wrote:
+static bool
+gcn_v_cmp_insn_p (attr_type type)
+{
+  return type == TYPE_VOPC || type == TYPE_VOP3A;
 }
There are many vop3a encoded instructions. I don't understand how this uniquely identifies v_cmp instructions?

It doesn't - how about an attribute variant?

This part is fine now, thanks.



+      /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane, v_cmp,
+         v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+         - VALU reads SGPR as constant requires 1 waite state
+         - VALU reads SGPR as carry-in requires no waite state
+         - v_readlane/v_writelane reads SGPR as lane select requires 4 wait
+           states.  */
+      if (TARGET_CDNA3_NOPS
+          && (prev_insn->age + nops_rqd) < 4
+          && prev_insn->unit == UNIT_VECTOR
+          && (get_attr_laneselect (prev_insn->insn) == LANESELECT_READ
+          || gcn_v_cmp_insn_p (prev_insn->type)
+          || prev_insn->type == TYPE_VOP2
+          || prev_insn->type == TYPE_VOP3B)
+          && hard_reg_set_intersect_p
+           (depregs, reg_class_contents[(int) SGPR_REGS]))

Is it necessary to check all those attributes? "prev_insn->unit == UNIT_VECTOR" together with the register dependency check is enough to establish "VALU reads SGPR". Is this attempting to rule out the carry- in instructions?

The depregs is to check whether the write followed by read affects the same SGPR register. – I have not tried to handle the kind of register usage so far (carry in vs. constant; lane-select vs other use).

Otherwise, idea is to handle 'VALU writes' by the listed insn, only. I am not sure whether restricting it to the subset of "v_readlane, v_readfirstlane, v_cmp, v_add_*i/u, v_sub_*i/u, v_div_*scale" makes sense or not – as that list covers a large portion of the VALU write insns.

The comment doesn't match and the condition looks wrong though.

+         /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane, v_cmp,
+            v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+            - VALU reads SGPR as constant requires 1 waite state
+            - VALU reads SGPR as carry-in requires no waite state
+            - v_readlane/v_writelane reads SGPR as lane select requires 4 wait
+              states.  */

I think you meant "VALU writes *S*GPR/VCC:".

+         if (TARGET_CDNA3_NOPS
+             && (prev_insn->age + nops_rqd) < 4
+             && prev_insn->unit == UNIT_VECTOR
+             && (get_attr_laneselect (prev_insn->insn) == LANESELECT_READ
+                 || get_attr_vcmp (prev_insn->insn) == VCMP_VCMP
+                 || prev_insn->type == TYPE_VOP2
+                 || prev_insn->type == TYPE_VOP3B)
+             && hard_reg_set_intersect_p
+                  (depregs, reg_class_contents[(int) SGPR_REGS]))

"VALU writes SGPR" is recognized by:

  unit == UNIT_VECTOR
  && hard_reg_set_intersect_p
         (depregs, reg_class_contents[(int) SGPR_REGS]))

Although, to include VCC it should probably be SGPR_SRC_REGS.

The text does continue to list some instructions, but it's pretty much all of the ones that can write to SGPRs. The part I don't understand is "as constant": the ISA manual only really uses the word "constant" for literal values in the instruction encoding, not registers.

And "VALU reads SGPR" is recognized by:

  prev_insn->unit == UNIT_VECTOR
  && hard_reg_set_intersect_p
         (depregs, reg_class_contents[(int) SGPR_REGS]))

So, as far as I can tell, the whole conditional should be:

  iunit == UNIT_VECTOR
  && prev_insn->unit == UNIT_VECTOR
  && hard_reg_set_intersect_p
         (depregs, reg_class_contents[(int) SGPR_SRC_REGS]))

The laneselect attribute is interesting for choosing the number of nops, but I don't see why you're testing VOP2 or VOP3B?



+          if (get_attr_laneselect (insn) != LANESELECT_NO)
+        nops_rqd = 4 - prev_insn->age;
+          else if ((prev_insn->age + nops_rqd) < 1)
+        nops_rqd = 1 - prev_insn->age;

This is safe, but I don't think it actually determines if the value is use *as* laneselect (not for write, anyway).  I might revisit this stuff at some point, but it's fine for now.

True, albeit that's already preexisting for all lane-select checks (one for all CDNA and 3 additional cases for CDNA, one of which is the one above).

Attached is an updated patch (as RFC – or is it by chance already okay?).

Tobias

Reply via email to