On Wed, Sep 30, 2015 at 12:05 PM, Kumar, Venkataramanan <venkataramanan.ku...@amd.com> wrote: > Hi Maintainers, > > The attached patch enables -march=znver1 (AMD family 17h Zen processor). > > Costs and tunings are copied from bdver4, but we will be adjusting them > later for znver1. > Also a basic scheduler description for znver1 is added and we will update > this as we get more information. > > Testing : > GCC bootstrap and gcc regression passes on x86_64-pc-linux-gnu. > GCC bootstrap passed with "make BOOT_CFLAGS= -O2 -g -march=znver1 -mno-adx > -mno-mwaitx -mno-clzero -mno-sha -mno-clflushopt -mno-rdseed" on > x86_64-pc-linux-gnu . > > Built SPEC2006 benchmarks with -march=znver1 and ran it on bdver4 machine. > > Wrf and Calculix failed to compile but looks like a general register > allocation issue not restricted to -march=znver1. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67717 > > ChangeLog: > * config.gcc (i[34567]86-*-linux* | ...): Add znver1. > (case ${target}): Add znver1. > * config/i386/cpuid.h(bit_CLZERO): Define. > * config/i386/driver-i386.c: (host_detect_local_cpu): Let > -march=native recognize znver1 processors. > * config/i386/i386-c.c (ix86_target_macros_internal): Add > znver1, clzero def_and_undef. > * config/i386/i386.c (struct processor_costs znver1_cost): > New. > (m_znver1): New definition. > (m_AMD_MULTIPLE): Includes m_znver1. > (processor_target_table): Add znver1 entry. > (ix86_target_string) : Add clzero entry. > (static const char *const cpu_names): Add znver1 entry. > (ix86_option_override_internal): Add znver1 instruction sets. > (PTA_CLZERO) : New definition. > (ix86_option_override_internal): Handle new clzerooption. > (ix86_issue_rate): Add znver1. > (ix86_adjust_cost): Add znver1. > (get_builtin_code_for_version): Set priority for > PROCESSOR_ZNVER1. > (ia32_multipass_dfa_lookahead): Add znver1. > (enum processor_model): Add M_AMDFAM17H_znver1. > (struct _arch_names_table): Add M_AMDFAM17H_znver1. > (has_dispatch): Add znver1. > * config/i386/i386.h (TARGET_znver1): New definition. > (TARGET_CLZERO): Define. > (TARGET_CLZERO_P): Define. > (struct ix86_size_cost): Add TARGET_ZNVER1. > (enum processor_type): Add PROCESSOR_znver1. > * config/i386/i386.md (define_attr "cpu"): Add znver1. > (set_attr znver1_decode): New definitions for znver1. > * config/i386/i386.opt (flag_dispatch_scheduler): Add znver1. > (mclzero): New. > * config/i386/mmx.md (set_attr znver1_decode): New definitions > for znver1. > * config/i386/sse.md (set_attr znver1_decode): Likewise. > * config/i386/x86-tune.def: Add znver1 tunings. > * config/i386/znver1.md: Introduce znver1 cpu and include new > md file. > * gcc/doc/extend.texi: Add details about znver1. > * gcc/doc/invoke.texi: Add details about znver1. > > Ok for trunk?
Please remove changes to get_builtin_code_for_version and fold_builtin_cpu. They should be committed in a future patch, together with relevant libgcc changes to libgcc/config/i386/cpuinfo.c. +++ b/gcc/config.gcc @@ -592,7 +592,7 @@ pentium4 pentium4m pentiumpro prescott iamcu" # 64-bit x86 processors supported by --with-arch=. Each processor # MUST be separated by exactly one space. x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \ -bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \ +bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona znver1 \ core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \ Please group the new processor with other AMD processors. +++ b/gcc/config/i386/i386-c.c | PTA_MOVBE | PTA_MWAITX}, - {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, + {"znver1", PROCESSOR_ZNVER1, CPU_ZNVER1, wrong indetation. +++ b/gcc/config/i386/driver-i386.c @@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0; unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0; unsigned int has_pcommit = 0, has_mwaitx = 0; + unsigned int has_clzero = 0; This option should be added to the options string. Please see the end of driver-i386.c on how other options are precessed. +++ b/gcc/config/i386/i386.opt @@ -569,8 +569,8 @@ the function. mdispatch-scheduler Target RejectNegative Var(flag_dispatch_scheduler) -Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 or bdver4 and Haifa scheduling -is selected. +Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 or bdver4 +or znver1 and Haifa scheduling is selected. Please reword the above with commas: ... if processor is bdver1, bdver2, bdver3, bdver4 or znver1 ... +++ b/gcc/config/i386/x86-tune.def @@ -59,7 +59,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency", that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency", m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 - | m_BDVER | m_GENERIC) + | m_BDVER | m_GENERIC | m_ZNVER1) [...] Please leave m_GENERIC as the last entry. Also, please group AMD masks together in some consistent way throughout the x86-tune.def file. +++ b/gcc/config/i386/znver1.md +;; SSE converts +(define_insn_reservation "znver1_ssecvtsf_si_load" 9 "SSE conversions" ? +++ b/gcc/doc/extend.texi @@ -16924,6 +16924,9 @@ AMD Family 15h Bulldozer version 3. @item bdver4 AMD Family 15h Bulldozer version 4. +@item znver1 +AMD Family 17h Zen version 1. + @item btver2 AMD Family 16h CPU. @end table Please remove the not yet implemented change above. Otherwise, the patch looks OK. Please repost it with the above changes for the final approval. Uros.