On 20/05/16 11:04, Kyrill Tkachov wrote:
Hi all,
The recent -frename-registers change exposed a deficiency in the way we fuse
AESE/AESMC instruction
pairs in arm.
Basically we want to enforce:
AESE Vn, _
AESMC Vn, Vn
to enable the fusion, but regrename comes along and renames the output Vn
register in AESMC to something
else, killing the fusion in the hardware.
The solution in this patch is to add an alternative that ties the input and
output registers in the AESMC pattern
and enable that alternative when the fusion is enabled.
With this patch I've confirmed that the above preferred register sequence is
kept even with -frename-registers
when tuning for a cpu that enables the fusion and that the chain is broken by
regrename otherwise and have
seen the appropriate improvement in a proprietary benchmark (that I cannot
name) that exercises this sequence.
Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?
Following James's feedback on the AArch64 version, this slightly modified
version uses the enum type for the argument of the new function.
Is this ok instead?
Thanks,
Kyrill
2016-05-27 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* config/arm/arm.c (arm_fusion_enabled_p): New function.
* config/arm/arm-protos.h (arm_fusion_enabled_p): Declare prototype.
* config/arm/crypto.md (crypto_<crypto_pattern>, CRYPTO_UNARY):
Add "=w,0" alternative. Enable it when AES/AESMC fusion is enabled.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cf221d6793eaf0959f2713fe0903a5d8602ec2f4..12a781de13f2f7816cc2b16b04835d87c83f7abb 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -320,6 +320,7 @@ extern int vfp3_const_double_for_bits (rtx);
extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
rtx);
+extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
extern bool arm_valid_symbolic_address_p (rtx);
extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
#endif /* RTX_CODE */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5110d9e989d605a9e2c262e6007b89a1c7dc7080..39a24c06c123b86883134368ef39794abf11898b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29704,6 +29704,13 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr)
return false;
}
+/* Return true iff the instruction fusion described by OP is enabled. */
+bool
+arm_fusion_enabled_p (tune_params::fuse_ops op)
+{
+ return current_tune->fusible_ops & op;
+}
+
/* Implement the TARGET_ASAN_SHADOW_OFFSET hook. */
static unsigned HOST_WIDE_INT
diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index c6f17270b1dbaf6dc43eb1e9b8a182dbb0f5a1e1..0f510f069408471fcbf6751f161e984f39929813 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -18,14 +18,27 @@
;; along with GCC; see the file COPYING3. If not see
;; <http://www.gnu.org/licenses/>.
+
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;; AESE Vn, _
+;; AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
(define_insn "crypto_<crypto_pattern>"
- [(set (match_operand:<crypto_mode> 0 "register_operand" "=w")
+ [(set (match_operand:<crypto_mode> 0 "register_operand" "=w,w")
(unspec:<crypto_mode> [(match_operand:<crypto_mode> 1
- "register_operand" "w")]
+ "register_operand" "0,w")]
CRYPTO_UNARY))]
"TARGET_CRYPTO"
"<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q1"
- [(set_attr "type" "<crypto_type>")]
+ [(set_attr "type" "<crypto_type>")
+ (set_attr_alternative "enabled"
+ [(if_then_else (match_test
+ "arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)")
+ (const_string "yes" )
+ (const_string "no"))
+ (const_string "yes")])]
)
(define_insn "crypto_<crypto_pattern>"