On 28/07/2025 12:06, Tobias Burnus wrote:
Hi Andrew,
Andrew Stubbs wrote:
+static bool
+gcn_v_cmp_insn_p (attr_type type)
+{
+ return type == TYPE_VOPC || type == TYPE_VOP3A;
}
There are many vop3a encoded instructions. I don't understand how this
uniquely identifies v_cmp instructions?
It doesn't - how about an attribute variant?
This part is fine now, thanks.
+ /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane,
v_cmp,
+ v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+ - VALU reads SGPR as constant requires 1 waite state
+ - VALU reads SGPR as carry-in requires no waite state
+ - v_readlane/v_writelane reads SGPR as lane select requires
4 wait
+ states. */
+ if (TARGET_CDNA3_NOPS
+ && (prev_insn->age + nops_rqd) < 4
+ && prev_insn->unit == UNIT_VECTOR
+ && (get_attr_laneselect (prev_insn->insn) == LANESELECT_READ
+ || gcn_v_cmp_insn_p (prev_insn->type)
+ || prev_insn->type == TYPE_VOP2
+ || prev_insn->type == TYPE_VOP3B)
+ && hard_reg_set_intersect_p
+ (depregs, reg_class_contents[(int) SGPR_REGS]))
Is it necessary to check all those attributes? "prev_insn->unit ==
UNIT_VECTOR" together with the register dependency check is enough to
establish "VALU reads SGPR". Is this attempting to rule out the carry-
in instructions?
The depregs is to check whether the write followed by read affects the
same SGPR register. – I have not tried to handle the kind of register
usage so far (carry in vs. constant; lane-select vs other use).
Otherwise, idea is to handle 'VALU writes' by the listed insn, only. I
am not sure whether restricting it to the subset of "v_readlane,
v_readfirstlane, v_cmp, v_add_*i/u, v_sub_*i/u, v_div_*scale" makes
sense or not – as that list covers a large portion of the VALU write insns.
The comment doesn't match and the condition looks wrong though.
+ /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane, v_cmp,
+ v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by:
+ - VALU reads SGPR as constant requires 1 waite state
+ - VALU reads SGPR as carry-in requires no waite state
+ - v_readlane/v_writelane reads SGPR as lane select requires 4 wait
+ states. */
I think you meant "VALU writes *S*GPR/VCC:".
+ if (TARGET_CDNA3_NOPS
+ && (prev_insn->age + nops_rqd) < 4
+ && prev_insn->unit == UNIT_VECTOR
+ && (get_attr_laneselect (prev_insn->insn) == LANESELECT_READ
+ || get_attr_vcmp (prev_insn->insn) == VCMP_VCMP
+ || prev_insn->type == TYPE_VOP2
+ || prev_insn->type == TYPE_VOP3B)
+ && hard_reg_set_intersect_p
+ (depregs, reg_class_contents[(int) SGPR_REGS]))
"VALU writes SGPR" is recognized by:
unit == UNIT_VECTOR
&& hard_reg_set_intersect_p
(depregs, reg_class_contents[(int) SGPR_REGS]))
Although, to include VCC it should probably be SGPR_SRC_REGS.
The text does continue to list some instructions, but it's pretty much
all of the ones that can write to SGPRs. The part I don't understand is
"as constant": the ISA manual only really uses the word "constant" for
literal values in the instruction encoding, not registers.
And "VALU reads SGPR" is recognized by:
prev_insn->unit == UNIT_VECTOR
&& hard_reg_set_intersect_p
(depregs, reg_class_contents[(int) SGPR_REGS]))
So, as far as I can tell, the whole conditional should be:
iunit == UNIT_VECTOR
&& prev_insn->unit == UNIT_VECTOR
&& hard_reg_set_intersect_p
(depregs, reg_class_contents[(int) SGPR_SRC_REGS]))
The laneselect attribute is interesting for choosing the number of nops,
but I don't see why you're testing VOP2 or VOP3B?
+ if (get_attr_laneselect (insn) != LANESELECT_NO)
+ nops_rqd = 4 - prev_insn->age;
+ else if ((prev_insn->age + nops_rqd) < 1)
+ nops_rqd = 1 - prev_insn->age;
This is safe, but I don't think it actually determines if the value is
use *as* laneselect (not for write, anyway). I might revisit this
stuff at some point, but it's fine for now.
True, albeit that's already preexisting for all lane-select checks (one
for all CDNA and 3 additional cases for CDNA, one of which is the one
above).
Attached is an updated patch (as RFC – or is it by chance already okay?).
Tobias