On 06/05/25 19:35, Richard Sandiford wrote:
External email: Use caution opening links or attachments
Hi,
Thanks for the update. The patch mostly looks good, but one minor and
one more substantial comment below.
BTW, the patch seems to have been corrupted en route, in that unchanged
lines have too much space. Attaching is fine if that's easier.
Hi,
I have tried using git send-email for the next round of patches. Please let me
know if the formatting is still broken! Thanks.
Dhruv Chawla <dhr...@nvidia.com> writes:
diff --git a/gcc/config/aarch64/aarch64-sve.md
b/gcc/config/aarch64/aarch64-sve.md
index a72ca2a500d..42802bac653 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4149,80 +4149,58 @@
(define_expand "@aarch64_adr<mode>_shift"
[(set (match_operand:SVE_FULL_SDI 0 "register_operand")
(plus:SVE_FULL_SDI
- (unspec:SVE_FULL_SDI
- [(match_dup 4)
- (ashift:SVE_FULL_SDI
- (match_operand:SVE_FULL_SDI 2 "register_operand")
- (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
- UNSPEC_PRED_X)
+ (ashift:SVE_FULL_SDI
+ (match_operand:SVE_FULL_SDI 2 "register_operand")
+ (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
(match_operand:SVE_FULL_SDI 1 "register_operand")))]
"TARGET_SVE && TARGET_NON_STREAMING"
- {
- operands[4] = CONSTM1_RTX (<VPRED>mode);
- }
+ {}
)
The {} can be removed.
[...]
@@ -4803,6 +4781,9 @@
;; Unpredicated shift by a scalar, which expands into one of the vector
;; shifts below.
+;;
+;; The unpredicated form is emitted only when the shift amount is a constant
+;; value that is valid for the shift being carried out.
(define_expand "<ASHIFT:optab><mode>3"
[(set (match_operand:SVE_I 0 "register_operand")
(ASHIFT:SVE_I
@@ -4810,20 +4791,29 @@
(match_operand:<VEL> 2 "general_operand")))]
"TARGET_SVE"
{
- rtx amount;
+ rtx amount = NULL_RTX;
if (CONST_INT_P (operands[2]))
{
- amount = gen_const_vec_duplicate (<MODE>mode, operands[2]);
- if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode))
- amount = force_reg (<MODE>mode, amount);
+ if (aarch64_simd_shift_imm_p (operands[2], <MODE>mode, <optab>_optab ==
ashl_optab))
+ operands[2] = aarch64_simd_gen_const_vector_dup (<MODE>mode, INTVAL
(operands[2]));
+ else
+ {
+ amount = gen_const_vec_duplicate (<MODE>mode, operands[2]);
+ if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode))
+ amount = force_reg (<MODE>mode, amount);
+ }
}
else
{
amount = convert_to_mode (<VEL>mode, operands[2], 0);
amount = expand_vector_broadcast (<MODE>mode, amount);
}
- emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount));
- DONE;
+
+ if (amount)
+ {
+ emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount));
+ DONE;
+ }
}
)
Instead of the two hunks above, I think we should leave <ASHIFT:optab><mode>3
alone and change v<optab><mode>3.
I was not able to move all the changes to v<optab><mode>3: I had to gate the call to
force_reg on aarch64_simd_shift_imm_p as it would otherwise move the immediate into a register and it
would fail to match the pattern later on in v<optab><mode>3.
This would involve changing:
@@ -4867,27 +4857,27 @@
""
)
-;; Unpredicated shift operations by a constant (post-RA only).
+;; Unpredicated shift operations by a constant.
;; These are generated by splitting a predicated instruction whose
;; predicate is unused.
-(define_insn "*post_ra_v_ashl<mode>3"
+(define_insn "*v_ashl<mode>3"
[(set (match_operand:SVE_I 0 "register_operand")
(ashift:SVE_I
(match_operand:SVE_I 1 "register_operand")
(match_operand:SVE_I 2 "aarch64_simd_lshift_imm")))]
- "TARGET_SVE && reload_completed"
+ "TARGET_SVE"
{@ [ cons: =0 , 1 , 2 ]
[ w , w , vs1 ] add\t%0.<Vetype>, %1.<Vetype>, %1.<Vetype>
[ w , w , Dl ] lsl\t%0.<Vetype>, %1.<Vetype>, #%2
}
)
-(define_insn "*post_ra_v_<optab><mode>3"
+(define_insn "*v_<optab><mode>3"
[(set (match_operand:SVE_I 0 "register_operand" "=w")
(SHIFTRT:SVE_I
(match_operand:SVE_I 1 "register_operand" "w")
(match_operand:SVE_I 2 "aarch64_simd_rshift_imm")))]
- "TARGET_SVE && reload_completed"
+ "TARGET_SVE"
"<shift>\t%0.<Vetype>, %1.<Vetype>, #%2"
)
...these instructions to named patterns, e.g. aarch64_vashl<mode>3_const
and aarch64_v<optab><mode>3_const respectively (with no _ after v, for
consistency with the optab name). Then v<optab><mode>3 could
generate aarch64_v<optab><mode>3_const for the constant case.
Thanks,
Richard
--
Regards,
Dhruv