On 28/07/2025 13:54, Tobias Burnus wrote:
Hi Andrew,
thanks for all the suggestions and the review!
Andrew Stubbs wrote:
...
+ /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane,
v_cmp,
+ v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+ - VALU reads SGPR as constant requires 1 waite state
+ - VALU reads SGPR as carry-in requires no waite state
+ - v_readlane/v_writelane reads SGPR as lane select requires
4 wait
+ states. */
I think you meant "VALU writes *S*GPR/VCC:".
Ups.
+ if (TARGET_CDNA3_NOPS
+ && (prev_insn->age + nops_rqd) < 4
+ && prev_insn->unit == UNIT_VECTOR
...
unit == UNIT_VECTOR
I think I missed it because VALU read wasn't mentioned for
'v_{read,write}lane'
&& hard_reg_set_intersect_p
(depregs, reg_class_contents[(int) SGPR_REGS]))
Although, to include VCC it should probably be SGPR_SRC_REGS.
(I really need to eventually do a deep-dive at GCC's REGS definitions.)
The text does continue to list some instructions, but it's pretty much
all of the ones that can write to SGPRs. The part I don't understand
is "as constant": the ISA manual only really uses the word "constant"
for literal values in the instruction encoding, not registers.
(Hence, it makes sense to remove all my attempts to narrow it down to
the specified ones.)
So, as far as I can tell, the whole conditional should be:
iunit == UNIT_VECTOR
&& prev_insn->unit == UNIT_VECTOR
&& hard_reg_set_intersect_p
(depregs, reg_class_contents[(int) SGPR_SRC_REGS]))
The laneselect attribute is interesting for choosing the number of
nops, but I don't see why you're testing VOP2 or VOP3B?
That was to detect "v_add_*i/u, v_sub_*i/u, v_div_*scale".
Updated patch attached.
Thanks for all the suggestions and the review! I clearly have to learn
more about CDNA*/RNDA* :-)
Tobias
This version looks good to me.
Thanks
Andrew