Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native" and add thunderxt88p1.

Andrew Pinski Sun, 06 Nov 2016 01:18:08 -0700

On Wed, Nov 2, 2016 at 3:54 AM, James Greenhalgh
<james.greenha...@arm.com> wrote:
> On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
>> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski <apin...@cavium.com> wrote:
>> > Since ThunderX T88 pass 1 (variant 0) is an ARMv8 part while pass 2 (variant 1)
>> > is an ARMv8.1 part, I needed to add detection of the variant to handle this
>> > difference. I also simplified the code a little and combined the single-core and
>> > arch-detection cases so it would be easier to add the variant.
>>
>> Actually it is a bit more complex than what I said here, see below for
>> the full table of options and what are enabled/disabled now.
>>
>> > OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>> > Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
>> > detecting the two separately.
>>
>>
>> Here is the final patch in this series updated; I changed the cpu name
>> slightly and made sure I updated invoke.texi too.
>>
>> The names are going to match the names in LLVM (worked with our LLVM
>> engineer here at Cavium about the names).
>> Here are the names recorded and
>> -mcpu=thunderx:
>> *        Matches part num 0xA0 (reserved for ThunderX 8x series)
>> *        T88 Pass 2 scheduling
>> *        Hardware prefetching (software prefetching disabled)
>> *        LSE enabled
>> *        no v8.1
>
> This doesn't match the current LLVM proposal
> ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
> for -mcpu=thunderx.
>
>> -mcpu=thunderxt88:
>> *        Matches part num 0xA1
>> *        T88 Pass 2 scheduling
>> *        software prefetching enabled
>> *        LSE enabled
>> *        no v8.1
>>
>> -mcpu=thunderxt88p1 (only for GCC):
>> *        Matches part num 0xA1, variant 0
>> *        T88 Pass 1 scheduling
>> *        software prefetching enabled
>> *        LSE disabled
>> *        no v8.1
>>
>> -mcpu=thunderxt81 and -mcpu=thunderxt83:
>> *        Matches part num 0xA2/0xA3
>> *        T88 Pass 2 scheduling
>> *        Hardware prefetching (software prefetching disabled)
>> *        LSE enabled
>> *        v8.1
>
> This looks like what has been added to LLVM as -mcpu=thunderx.


Yes, I know. As I tried to mention, we came up with this set after both
submissions had happened; next time my LLVM team and I will come to an
agreement on names before posting to both LLVM and GCC.

>
>> I have not hooked up software vs hardware prefetching and the
>> scheduler parts (the next patch will do part of that); both ARMv8.1-a
>> and LSE parts are hooked up as those parts are only in
>> aarch64-cores.def.
>>
>> OK?  Bootstrapped and tested on ThunderX T88 and ThunderX T81
>> (aarch64-linux-gnu).
>>
>> Index: common/config/aarch64/aarch64-common.c
>> ===================================================================
>> --- common/config/aarch64/aarch64-common.c    (revision 241727)
>> +++ common/config/aarch64/aarch64-common.c    (working copy)
>> @@ -145,7 +145,7 @@ struct arch_to_arch_name
>>     the default set of architectural feature flags they support.  */
>>  static const struct processor_name_to_arch all_cores[] =
>>  {
>> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
>> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT) \
>>    {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>>  #include "config/aarch64/aarch64-cores.def"
>>    {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
>> Index: config/aarch64/aarch64-cores.def
>> ===================================================================
>> --- config/aarch64/aarch64-cores.def  (revision 241727)
>> +++ config/aarch64/aarch64-cores.def  (working copy)
>> @@ -21,7 +21,7 @@
>>
>>     Before using #include to read this file, define a macro:
>>
>> -      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART)
>> +      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT)
>>
>>     The CORE_NAME is the name of the core, represented as a string constant.
>>     The CORE_IDENT is the name of the core, represented as an identifier.
>> @@ -39,39 +39,45 @@
>>     PART is the part number of the CPU.  On a GNU/Linux system it can be
>>     found in /proc/cpuinfo.  For big.LITTLE systems this should use the
>>     macro AARCH64_BIG_LITTLE where the big part number comes as the first
>> -   argument to the macro and little is the second.  */
>> +   argument to the macro and little is the second.
>> +   VARIANT is the variant of the CPU.  In a GNU/Linux system it can be found
>> +   in /proc/cpuinfo.  If this is -1, this means it can match any variant.  */
>>
>>  /* V8 Architecture Processors.  */
>>
>>  /* ARM ('A') cores. */
>> -AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
>> -AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
>> -AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
>> -AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
>> -AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09)
>> +AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1)
>> +AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03, -1)
>> +AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07, -1)
>> +AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08, -1)
>> +AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09, -1)
>>
>>  /* Samsung ('S') cores. */
>> -AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001)
>> +AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001, -1)
>>
>>  /* Qualcomm ('Q') cores. */
>> -AARCH64_CORE("qdf24xx",     qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0x800)
>> +AARCH64_CORE("qdf24xx",     qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0x800, -1)
>>
>>  /* Cavium ('C') cores. */
>> -AARCH64_CORE("thunderx",    thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a1)
>> +AARCH64_CORE("thunderx",      thunderx,      thunderx,  8A,    AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx,  0x43, 0x0a0, -1)
>> +AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,    AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,               thunderx,  0x43, 0x0a1, 0)
>> +AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,    AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx,  0x43, 0x0a1, -1)
>
> You probably want a comment somewhere here making it clear that the ordering
> of thunderxt88p1 and thunderxt88 must remain as is, or detection will fail
> (-1 will match before 0). Otherwise someone will come along and helpfully
> put these in alphabetical order and cause you trouble...

I will do that in the next submission.
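
To illustrate why the ordering matters, here is a minimal sketch of a
first-match lookup. It assumes detection walks the table in order and
treats -1 as a wildcard; the names core_entry, example_cores and
find_core are hypothetical and are not the code in this patch.

```c
#include <stddef.h>

/* Hypothetical, simplified table entry; not GCC's actual structure.  */
struct core_entry
{
  const char *name;
  unsigned implementer;
  unsigned part;
  int variant;			/* -1 means "match any variant".  */
};

/* The variant-specific entry (thunderxt88p1) must come before the
   wildcard entry (thunderxt88) that shares its part number.  */
const struct core_entry example_cores[] =
{
  { "thunderx",      0x43, 0x0a0, -1 },
  { "thunderxt88p1", 0x43, 0x0a1,  0 },
  { "thunderxt88",   0x43, 0x0a1, -1 },
};

/* Return the first entry matching implementer, part and variant,
   or NULL if none matches.  */
const struct core_entry *
find_core (const struct core_entry *table, size_t n,
	   unsigned implementer, unsigned part, int variant)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].implementer == implementer
	&& table[i].part == part
	&& (table[i].variant == -1 || table[i].variant == variant))
      return &table[i];
  return NULL;
}
```

With this order, part 0x0a1 variant 0 resolves to thunderxt88p1 and any
other variant falls through to thunderxt88. If the two entries were
swapped, the wildcard would always win and thunderxt88p1 would never be
selected.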

>
>> +AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8_1A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx,  0x43, 0x0a2, -1)
>> +AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8_1A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx,  0x43, 0x0a3, -1)
>>
>>  /* APM ('P') cores. */
>> -AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000)
>> +AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
>>
>>  /* V8.1 Architecture Processors.  */
>>
>>  /* Broadcom ('B') cores. */
>> -AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516)
>> +AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516, -1)
>>
>>  /* V8 big.LITTLE implementations.  */
>>
>> -AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03))
>> -AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03))
>> -AARCH64_CORE("cortex-a73.cortex-a35",  cortexa73cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04))
>> -AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03))
>> +AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
>> +AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03), -1)
>> +AARCH64_CORE("cortex-a73.cortex-a35",  cortexa73cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04), -1)
>> +AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03), -1)
>
> Why do variants for big.LITTLE get a single variant number, but you track
> two variant numbers in the code below?

In theory you could track only the last variant.  But my thinking was
rather that you cannot have a big.LITTLE system where the set of big
cores would be the same and the set of LITTLE cores would be the same.
Parsing /proc/cpuinfo is a hard way of getting a good idea of what the
CPU is.
Really, we should use readdir on /sys/devices/system/cpu to get all
CPUs (cpuN), and then read regs/identification/midr_el1 for each one
and parse that.

Note that this will only work on Linux 4.8 and above (maybe 4.9; I
can't remember exactly when it went in).
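
As a rough sketch of that approach (not part of this patch), the code
below reads midr_el1 for a single CPU and splits it into its fields.
The field positions follow the architectural MIDR_EL1 layout; the names
midr_fields, decode_midr and read_midr are hypothetical, and I am
assuming the sysfs file holds one hexadecimal value. A full version
would loop over the cpuN directories with readdir.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of MIDR_EL1 (hypothetical helper struct).  */
struct midr_fields
{
  unsigned implementer;		/* bits [31:24] */
  unsigned variant;		/* bits [23:20] */
  unsigned architecture;	/* bits [19:16] */
  unsigned part;		/* bits [15:4]  */
  unsigned revision;		/* bits [3:0]   */
};

/* Split a raw MIDR_EL1 value into its fields.  */
struct midr_fields
decode_midr (uint64_t midr)
{
  struct midr_fields f;
  f.implementer = (midr >> 24) & 0xff;
  f.variant = (midr >> 20) & 0xf;
  f.architecture = (midr >> 16) & 0xf;
  f.part = (midr >> 4) & 0xfff;
  f.revision = midr & 0xf;
  return f;
}

/* Read MIDR_EL1 for CPU number CPU from sysfs.  Assumes the file
   contains a single hex value.  Returns 0 on success, -1 on failure.  */
int
read_midr (unsigned cpu, uint64_t *midr)
{
  char path[128];
  snprintf (path, sizeof path,
	    "/sys/devices/system/cpu/cpu%u/regs/identification/midr_el1",
	    cpu);
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return -1;
  int ok = fscanf (fp, "%" SCNx64, midr) == 1;
  fclose (fp);
  return ok ? 0 : -1;
}
```

For an illustrative value such as 0x431f0a11, decode_midr gives
implementer 0x43, variant 1, part 0x0a1 and revision 1, which is all the
information the implementer/part/variant matching needs, without any
text parsing of /proc/cpuinfo.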

Thanks,
Andrew

>
> Thanks,
> James
