On Wed, Nov 2, 2016 at 3:54 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
>> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski <apin...@cavium.com> wrote:
>> > Since ThunderX T88 pass 1 (variant 0) is an ARMv8 part while pass 2
>> > (variant 1) is an ARMv8.1 part, I needed to add detecting of the
>> > variant also for this difference. Also I simplified a little bit and
>> > combined the single core and arch detecting cases so it would be
>> > easier to add variant.
>>
>> Actually it is a bit more complex than what I said here, see below for
>> the full table of options and what is enabled/disabled now.
>>
>> > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>> > Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
>> > detecting the two separately.
>>
>>
>> Here is the final patch in this series updated; I changed the cpu name
>> slightly and made sure I updated invoke.texi too.
>>
>> The names are going to match the names in LLVM (worked with our LLVM
>> engineer here at Cavium about the names).
>> Here are the names recorded and
>> -mcpu=thunderx:
>> * Matches part num 0xA0 (reserved for ThunderX 8x series)
>> * T88 Pass 2 scheduling
>> * Hardware prefetching (software prefetching disabled)
>> * LSE enabled
>> * no v8.1
>
> This doesn't match the current LLVM proposal
> ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
> for -mcpu=thunderx.
>
>> -mcpu=thunderxt88:
>> * Matches part num 0xA1
>> * T88 Pass 2 scheduling
>> * software prefetching enabled
>> * LSE enabled
>> * no v8.1
>>
>> -mcpu=thunderxt88p1 (only for GCC):
>> * Matches part num 0xA1, variant 0
>> * T88 Pass 1 scheduling
>> * software prefetching enabled
>> * no LSE enabled
>> * no v8.1
>>
>> -mcpu=thunderxt81 and -mcpu=thunderxt83:
>> * Matches part num 0xA2/0xA3
>> * T88 Pass 2 scheduling
>> * Hardware prefetching (software prefetching disabled)
>> * LSE enabled
>> * v8.1
>
> This looks like what has been added to LLVM as -mcpu=thunderx.

Yes, I know. As I tried to mention, we came up with this set after both
submissions happened; next time my LLVM team and I will agree on names
before posting to both LLVM and GCC.

>
>> I have not hooked up software vs hardware prefetching and the
>> scheduler parts (the next patch will do part of that); both ARMv8.1-a
>> and LSE parts are hooked up as those parts are only in
>> aarch64-cores.def.
>>
>> OK? Bootstrapped and tested on ThunderX T88 and ThunderX T81
>> (aarch64-linux-gnu).
>>
>> Index: common/config/aarch64/aarch64-common.c
>> ===================================================================
>> --- common/config/aarch64/aarch64-common.c (revision 241727)
>> +++ common/config/aarch64/aarch64-common.c (working copy)
>> @@ -145,7 +145,7 @@ struct arch_to_arch_name
>> the default set of architectural feature flags they support. */
>> static const struct processor_name_to_arch all_cores[] =
>> {
>> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
>> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT) \
>> {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>> #include "config/aarch64/aarch64-cores.def"
>> {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
>> Index: config/aarch64/aarch64-cores.def
>> ===================================================================
>> --- config/aarch64/aarch64-cores.def (revision 241727)
>> +++ config/aarch64/aarch64-cores.def (working copy)
>> @@ -21,7 +21,7 @@
>>
>> Before using #include to read this file, define a macro:
>>
>> - AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART)
>> + AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT)
>>
>> The CORE_NAME is the name of the core, represented as a string constant.
>> The CORE_IDENT is the name of the core, represented as an identifier.
>> @@ -39,39 +39,45 @@
>> PART is the part number of the CPU. On a GNU/Linux system it can be
>> found in /proc/cpuinfo. For big.LITTLE systems this should use the
>> macro AARCH64_BIG_LITTLE where the big part number comes as the first
>> - argument to the macro and little is the second. */
>> + argument to the macro and little is the second.
>> + VARIANT is the variant of the CPU. In a GNU/Linux system it can found
>> + in /proc/cpuinfo. If this is -1, this means it can match any variant. */
>>
>> /* V8 Architecture Processors. */
>>
>> /* ARM ('A') cores. */
>> -AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
>> -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
>> -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
>> -AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
>> -AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09)
>> +AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1)
>> +AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03, -1)
>> +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07, -1)
>> +AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08, -1)
>> +AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09, -1)
>>
>> /* Samsung ('S') cores. */
>> -AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001)
>> +AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001, -1)
>>
>> /* Qualcomm ('Q') cores. */
>> -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800)
>> +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800, -1)
>>
>> /* Cavium ('C') cores. */
>> -AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1)
>> +AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a0, -1)
>> +AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1, 0)
>> +AARCH64_CORE("thunderxt88", thunderxt88, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a1, -1)
>
> You probably want a comment somewhere here making it clear that the ordering
> of thunderxt88p1 and thunderxt88 must remain as is, or detection will fail
> (-1 will match before 0). Otherwise someone will come along and helpfully
> put these in alphabetical order and cause you trouble...

I will do that in the next submission.

>
>> +AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8_1A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a2, -1)
>> +AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8_1A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a3, -1)
>>
>> /* APM ('P') cores. */
>> -AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000)
>> +AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
>>
>> /* V8.1 Architecture Processors. */
>>
>> /* Broadcom ('B') cores. */
>> -AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516)
>> +AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516, -1)
>>
>> /* V8 big.LITTLE implementations. */
>>
>> -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03))
>> -AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03))
>> -AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04))
>> -AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03))
>> +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
>> +AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03), -1)
>> +AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04), -1)
>> +AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03), -1)
>
> Why do variants for big.LITTLE get a single variant number, but you track
> two variant numbers in the code below?

You could in theory track only the last variant. But I was thinking
instead that you cannot have a big.LITTLE where the set of big cores
would be the same and the set of LITTLE cores would be the same.
Parsing /proc/cpuinfo is a hard way of getting a good idea of what the
CPU is. Really we should be using readdir on /sys/devices/system/cpu to
get all CPUs (cpuN), and then read regs/identification/midr_el1 for each
one and parse that. Note that this will only work on Linux 4.8 and above
(maybe 4.9; I can't remember exactly when it went in).

Thanks,
Andrew

>
> Thanks,
> James
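
For reference, a minimal sketch of the MIDR_EL1 approach mentioned above, together with the ordering constraint on the thunderxt88p1 and thunderxt88 entries. This is not code from the patch: the names midr_fields, core_entry, example_cores, decode_midr, match_core and read_midr are hypothetical, and the sketch assumes the sysfs file holds a single hexadecimal value. The field positions are the architectural MIDR_EL1 layout (implementer in bits [31:24], variant in [23:20], part number in [15:4]); the table rows copy the implementer, part and variant values of the Cavium entries in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The MIDR_EL1 fields that matter for core detection.  */
struct midr_fields
{
  unsigned implementer; /* bits [31:24] */
  unsigned variant;     /* bits [23:20] */
  unsigned part;        /* bits [15:4]  */
};

/* Split a raw MIDR_EL1 value into its fields.  */
struct midr_fields
decode_midr (uint64_t midr)
{
  struct midr_fields f;
  f.implementer = (midr >> 24) & 0xff;
  f.variant = (midr >> 20) & 0xf;
  f.part = (midr >> 4) & 0xfff;
  return f;
}

/* Hypothetical table row; a variant of -1 matches any variant.  */
struct core_entry
{
  const char *name;
  unsigned implementer;
  unsigned part;
  int variant;
};

/* The first matching row wins, so thunderxt88p1 (variant 0) has to stay
   ahead of thunderxt88 (variant -1); otherwise the wildcard row would
   also swallow pass 1 parts.  */
static const struct core_entry example_cores[] =
{
  { "thunderx",      0x43, 0x0a0, -1 },
  { "thunderxt88p1", 0x43, 0x0a1,  0 },
  { "thunderxt88",   0x43, 0x0a1, -1 },
  { "thunderxt81",   0x43, 0x0a2, -1 },
  { "thunderxt83",   0x43, 0x0a3, -1 },
};

/* Return the name of the first row matching F, or NULL if none does.  */
const char *
match_core (const struct core_entry *table, size_t n, struct midr_fields f)
{
  assert (table != NULL);
  for (size_t i = 0; i < n; i++)
    if (table[i].implementer == f.implementer
        && table[i].part == f.part
        && (table[i].variant == -1
            || (unsigned) table[i].variant == f.variant))
      return table[i].name;
  return NULL;
}

/* Read MIDR_EL1 for cpuN from sysfs.  Returns 0 on success and -1 if the
   file is missing (for example on an older kernel) or cannot be parsed.  */
int
read_midr (unsigned cpu, uint64_t *midr)
{
  char path[128];
  unsigned long long value;

  snprintf (path, sizeof path,
            "/sys/devices/system/cpu/cpu%u/regs/identification/midr_el1",
            cpu);
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return -1;
  int ok = fscanf (fp, "%llx", &value) == 1;
  fclose (fp);
  if (!ok)
    return -1;
  *midr = value;
  return 0;
}
```

With this table, a value of 0x430f0a10 (implementer 0x43, part 0x0a1, variant 0) selects the thunderxt88p1 row, while 0x431f0a10 (variant 1) falls through to thunderxt88. Swapping those two rows would make the wildcard row win for both values.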