Hi Richard,
On 11/9/19 12:44 PM, Richard Henderson wrote:
On 11/7/19 11:26 AM, Kyrill Tkachov wrote:
-;; The code sequence emitted by this insn pattern uses the Q flag, which GCC
-;; doesn't generally know about, so we don't bother expanding to individual
-;; instructions. It may be better to just use an out-of-line asm libcall for
-;; this.
+;; The code sequence emitted by this insn pattern uses the Q flag, so we need
+;; to bail out when ARM_Q_BIT_READ and resort to a library sequence instead.
+
+(define_expand "ssmulsa3"
+ [(parallel [(set (match_operand:SA 0 "s_register_operand")
+ (ss_mult:SA (match_operand:SA 1 "s_register_operand")
+ (match_operand:SA 2 "s_register_operand")))
+ (clobber (match_scratch:DI 3))
+ (clobber (match_scratch:SI 4))
+ (clobber (reg:CC CC_REGNUM))])]
+ "TARGET_32BIT && arm_arch6"
+ {
+ if (ARM_Q_BIT_READ)
+ FAIL;
+ }
+)
Coming back to this, why would you not just represent the update of the Q bit?
This is not generated by generic pattern matching, but by the __ssmulsa3
builtin function. It seems easy to me to simply describe how this older
builtin operates in conjunction with the new acle builtins.
I recognize that ssadd<mode>3 etc are more difficult, because they can be
generated by arithmetic operations on TYPE_SATURATING. Although again it seems
weird to generate expensive out-of-line code for TYPE_SATURATING when used in
conjunction with acle builtins.
I think it would be better to merely expand the documentation. Even if only so
far as to say "unsupported to mix these".
I'm tempted to agree, as this part of the patch is quite ugly.
Thank you for the comments on these patches, I wasn't aware of some of
the mechanisms.
I guess I should have posted the series as an RFC first...
I'll send patches to fix up the issues.
Thanks,
Kyrill
+(define_expand "maddhisi4"
+ [(set (match_operand:SI 0 "s_register_operand")
+ (plus:SI (mult:SI (sign_extend:SI
+ (match_operand:HI 1 "s_register_operand"))
+ (sign_extend:SI
+ (match_operand:HI 2 "s_register_operand")))
+ (match_operand:SI 3 "s_register_operand")))]
+ "TARGET_DSP_MULTIPLY"
+ {
+ /* If this function reads the Q bit from ACLE intrinsics break up the
+ multiplication and accumulation as an overflow during accumulation will
+ clobber the Q flag. */
+ if (ARM_Q_BIT_READ)
+ {
+ rtx tmp = gen_reg_rtx (SImode);
+ emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2]));
+ emit_insn (gen_addsi3 (operands[0], tmp, operands[3]));
+ DONE;
+ }
+ }
+)
+
+(define_insn "*arm_maddhisi4"
[(set (match_operand:SI 0 "s_register_operand" "=r")
(plus:SI (mult:SI (sign_extend:SI
(match_operand:HI 1 "s_register_operand" "r"))
(sign_extend:SI
(match_operand:HI 2 "s_register_operand" "r")))
(match_operand:SI 3 "s_register_operand" "r")))]
- "TARGET_DSP_MULTIPLY"
+ "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
"smlabb%?\\t%0, %1, %2, %3"
[(set_attr "type" "smlaxy")
(set_attr "predicable" "yes")]
I think this case would be better represented with a single
define_insn_and_split and a peephole2. It is easy to notice during peep2
whether or not the Q bit is actually live at the exact place we want to expand
this operation. If it is live, then use two insns; if it isn't, use one.
r~