On 28/07/2025 13:54, Tobias Burnus wrote:
Hi Andrew,

thanks for all the suggestions and the review!

Andrew Stubbs wrote:
...
+      /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane, v_cmp,
+         v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+         - VALU reads SGPR as constant requires 1 waite state
+         - VALU reads SGPR as carry-in requires no waite state
+         - v_readlane/v_writelane reads SGPR as lane select requires 4 wait +           states.  */
I think you meant "VALU writes *S*GPR/VCC:".
Ups.
+      if (TARGET_CDNA3_NOPS
+          && (prev_insn->age + nops_rqd) < 4
+          && prev_insn->unit == UNIT_VECTOR
...
unit == UNIT_VECTOR

I think I missed it because VALU read wasn't mentioned for 'v_{read,write}lane'

&& hard_reg_set_intersect_p
         (depregs, reg_class_contents[(int) SGPR_REGS]))

Although, to include VCC it should probably be SGPR_SRC_REGS.

(I really need to eventually do a deep-dive at GCC's REGS definitions.)


The text does continue to list some instructions, but it's pretty much all of the ones that can write to SGPRs. The part I don't understand is "as constant": the ISA manual only really uses the word "constant" for literal values in the instruction encoding, not registers.

(Hence, it makes sense to remove all my attempts to narrow it down to the specified ones.)


So, as far as I can tell, the whole conditional should be:

  iunit == UNIT_VECTOR
  && prev_insn->unit == UNIT_VECTOR
  && hard_reg_set_intersect_p
         (depregs, reg_class_contents[(int) SGPR_SRC_REGS]))

The laneselect attribute is interesting for choosing the number of nops, but I don't see why you're testing VOP2 or VOP3B?

That was to detect "v_add_*i/u, v_sub_*i/u, v_div_*scale".

Updated patch attached.

Thanks for all the suggestions and the review! I clearly have to learn more about CDNA*/RNDA* :-)

Tobias

This version looks good to me.

Thanks

Andrew

Reply via email to