> 2013-09-16 Wei Mi <w...@google.com> > > * gcc/config/i386/i386-c.c (ix86_target_macros_internal): Separate > PROCESSOR_COREI7_AVX out from PROCESSOR_COREI7. > * gcc/config/i386/i386.c (ix86_option_override_internal): Ditto. > (ix86_issue_rate): Ditto. > (ia32_multipass_dfa_lookahead): Ditto. > (ix86_sched_init_global): Ditto. > (get_builtin_code_for_version): Ditto. > * gcc/config/i386/i386.h (enum target_cpu_default): Ditto. > (enum processor_type): Ditto. > * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
This patch is OK. > > 2013-09-16 Wei Mi <w...@google.com> > > * gcc/config/i386/i386.c (ix86_macro_fusion_p): New Function. > (ix86_macro_fusion_pair_p): Ditto. > * gcc/config/i386/i386.h: Add new tune features about macro-fusion. > * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto. > * gcc/doc/tm.texi: Generated. > * gcc/doc/tm.texi.in: Ditto. > * gcc/haifa-sched.c (try_group_insn): New function. > (group_insns_for_macro_fusion): Ditto. > (sched_init): Call group_insns_for_macro_fusion. > * gcc/sched-rgn.c (add_branch_dependences): Keep insns in > a SCHED_GROUP at the end of BB to remain their location. > * gcc/target.def: Add two hooks: macro_fusion_p and > macro_fusion_pair_p. I think original plan for fusing was to turn the fused instructions into one within combiner. I guess this would lead to quite a large explossion of insns patterns, so the scheduler approach may be better, if scheduler maintainer agree. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 1fd3f60..85b7aa0 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -24856,6 +24856,90 @@ ia32_multipass_dfa_lookahead (void) > } > } > > +/* Return true if target platform supports macro-fusion. */ > + > +static bool > +ix86_macro_fusion_p () > +{ > + if (TARGET_FUSE_CMP_AND_BRANCH > + && (!TARGET_64BIT || TARGET_FUSE_CMP_AND_BRANCH_64)) You disable fusion for Budozer here sinze you did not add it into TARGET_FUSE_CMP_AND_BRANCH_64. > @@ -364,6 +364,12 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; > ix86_tune_features[X86_TUNE_USE_VECTOR_CONVERTS] > #define TARGET_FUSE_CMP_AND_BRANCH \ > ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH] > +#define TARGET_FUSE_CMP_AND_BRANCH_64 \ > + ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_64] Perhaps we can have TARGET_FUSE_CMP_AND_BRANCH_64 and TARGET_FUSE_CMP_AND_BRANCH_32 plus an macro TARGET_FUSE_CMP_AND_BRANCH that chose corresponding variant based on TARGET_64BIT rather than having to wind down the test in every use. > +#define TARGET_FUSE_CMP_AND_BRANCH_SOFLAGS \ > + ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS] > +#define TARGET_FUSE_ALU_AND_BRANCH \ > + ix86_tune_features[X86_TUNE_FUSE_ALU_AND_BRANCH] > #define TARGET_OPT_AGU ix86_tune_features[X86_TUNE_OPT_AGU] > #define TARGET_VECTORIZE_DOUBLE \ > ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE] > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def > index 4ae5f70..a60d0f4 100644 > --- a/gcc/config/i386/x86-tune.def > +++ b/gcc/config/i386/x86-tune.def > @@ -196,7 +196,22 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, > "use_vector_converts", m_AMDFAM10) > /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction > with a subsequent conditional jump instruction into a single > compare-and-branch uop. */ > -DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch", m_BDVER) > +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch", > + m_CORE_ALL | m_BDVER) > +/* X86_TUNE_FUSE_CMP_AND_BRANCH_64: Fuse compare with a subsequent > + conditional jump instruction for TARGET_64BIT. */ > +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64", > + m_COREI7 | m_COREI7_AVX | m_HASWELL) Add m_BDVER > +/* X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: Fuse compare with a > + subsequent conditional jump instruction when the condition jump > + check sign flag (SF) or overflow flag (OF). */ > +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, > "fuse_cmp_and_branch_soflags", > + m_COREI7 | m_COREI7_AVX | m_HASWELL) This flag is affecting only fuding of ALU and BRANCh or should it also affect X86_TUNE_FUSE_CMP_AND_BRANCH? In current implementation it seems to be the first and in that case it ought to be documented that way and probably called ALT_AND_BRANCH_SOFLAGS to avoid confussion. I am not sure if AMD hardware has any limitations here. It fuses only cmp/test as far as I know, but I do not think it matters what flags you use. The i386 specific part of the change seems resonable to me. Honza