On Mon, 13 Jul 2020 14:30:29 -0500 Babu Moger <babu.mo...@amd.com> wrote:
> > -----Original Message-----
> > From: Igor Mammedov <imamm...@redhat.com>
> > Sent: Monday, July 13, 2020 12:32 PM
> > To: Moger, Babu <babu.mo...@amd.com>
> > Cc: pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com; qemu-de...@nongnu.org
> > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from CpuInstanceProperties
> >
> > On Mon, 13 Jul 2020 11:43:33 -0500
> > Babu Moger <babu.mo...@amd.com> wrote:
> >
> > > On 7/13/20 11:17 AM, Igor Mammedov wrote:
> > > > On Mon, 13 Jul 2020 10:02:22 -0500
> > > > Babu Moger <babu.mo...@amd.com> wrote:
> > > >
> > > >>> -----Original Message-----
> > > >>> From: Igor Mammedov <imamm...@redhat.com>
> > > >>> Sent: Monday, July 13, 2020 4:08 AM
> > > >>> To: Moger, Babu <babu.mo...@amd.com>
> > > >>> Cc: pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com; qemu-de...@nongnu.org
> > > >>> Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from CpuInstanceProperties
> > > > [...]
> > > >>>> +
> > > >>>> +/*
> > > >>>> + * Initialize topo_ids from CpuInstanceProperties
> > > >>>> + * node_id in CpuInstanceProperties (or in CPU device) is a sequential
> > > >>>> + * number, but while building the topology we need to separate it for
> > > >>>> + * each socket (mod nodes_per_pkg).
> > > >>> could you clarify a bit more on why this is necessary?
> > > >>
> > > >> If you have two sockets and 4 numa nodes, node_id in
> > > >> CpuInstanceProperties will be numbered sequentially as 0, 1, 2, 3.
> > > >> But in EPYC topology, it will be 0, 1, 0, 1 (basically mod % number
> > > >> of nodes per socket).
> > > >
> > > > I'm confused, let's suppose we have 2 EPYC sockets with 2 nodes per
> > > > socket, so APIC id would be composed like:
> > > >
> > > > 1st socket
> > > >   pkg_id(0) | node_id(0)
> > > >   pkg_id(0) | node_id(1)
> > > >
> > > > 2nd socket
> > > >   pkg_id(1) | node_id(0)
> > > >   pkg_id(1) | node_id(1)
> > > >
> > > > if that's the case, then EPYC's node_id here doesn't look like a
> > > > NUMA node in the sense it's usually used (above config would have 4
> > > > different memory controllers => 4 conventional NUMA nodes).
> > >
> > > The EPYC model uses a combination of socket id and node id to identify
> > > the numa nodes. So, it internally uses all the information.
> >
> > well with above values, EPYC's node_id doesn't look like it's specifying
> > a machine numa node, but rather a node index within a single socket. In
> > which case, it doesn't make much sense calling it NUMA node_id, it's
> > rather some index within a socket. (it starts looking like terminology
> > is all mixed up)
> >
> > If you have access to a multi-socket EPYC machine, can you dump and post
> > here its apic ids, pls?
>
> Here is the output from my EPYC machine with 2 sockets and a total of 8
> nodes (SMT disabled). The cpus 0-31 are in socket 0 and cpus 32-63 in
> socket 1.
>
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              64
> On-line CPU(s) list: 0-63
> Thread(s) per core:  1
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        8
> Vendor ID:           AuthenticAMD
> CPU family:          23
> Model:               1
> Model name:          AMD Eng Sample: 1S1901A4VIHF5_30/19_N
> Stepping:            2
> CPU MHz:             2379.233
> CPU max MHz:         1900.0000
> CPU min MHz:         1200.0000
> BogoMIPS:            3792.81
> Virtualization:      AMD-V
> L1d cache:           32K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            8192K
> NUMA node0 CPU(s):   0-7
> NUMA node1 CPU(s):   8-15
> NUMA node2 CPU(s):   16-23
> NUMA node3 CPU(s):   24-31
> NUMA node4 CPU(s):   32-39
> NUMA node5 CPU(s):   40-47
> NUMA node6 CPU(s):   48-55
> NUMA node7 CPU(s):   56-63
>
> Here is the output of # cpuid -l 0x8000001e -r   (1)
> You may want to refer to
> https://www.amd.com/system/files/TechDocs/54945_3.03_ppr_ZP_B2_pub.zip
> (section 2.1.12.2.1.3 ApicId Enumeration Requirements).
> Note that this is a general guideline. We tried to generalize in qemu as
> much as possible. It is a bit complex.
>
> CPU 0:
>    0x8000001e 0x00: eax=0x00000000 ebx=0x00000100 ecx=0x00000300
>    edx=0x00000000
[...]
> CPU 63:
>    0x8000001e 0x00: eax=0x0000007e ebx=0x0000011f ecx=0x00000307
>    edx=0x00000000
>
> > > >
> > > > I wonder if the linux guest actually uses the node_id encoded in the
> > > > apic id for configuring/checking numa structures, or if it just uses
> > > > whatever the ACPI SRAT table provided.
> > > >
> > > >>>> + */
> > > >>>> +static inline void x86_init_topo_ids(X86CPUTopoInfo *topo_info,
> > > >>>> +                                     CpuInstanceProperties props,
> > > >>>> +                                     X86CPUTopoIDs *topo_ids)
> > > >>>> +{
> > > >>>> +    topo_ids->smt_id = props.has_thread_id ? props.thread_id : 0;
> > > >>>> +    topo_ids->core_id = props.has_core_id ? props.core_id : 0;
> > > >>>> +    topo_ids->die_id = props.has_die_id ? props.die_id : 0;
> > > >>>> +    topo_ids->node_id = props.has_node_id ?
> > > >>>> +        props.node_id % MAX(topo_info->nodes_per_pkg, 1) : 0;

It looks like I was wrong pushing the system-wide NUMA node-id into the
APIC ID (the chosen naming is a bit confusing); per [1] mentioned above,
EPYC's node-id is:

 * ApicId[6]   = Socket ID.
 * ApicId[5:4] = Node ID.
 * ApicId[3]   = Logical CCX L3 complex ID
 * ApicId[2:0] = (SMT) ? {LogicalCoreID[1:0],ThreadId}
                       : {1'b0,LogicalCoreID[1:0]}

which can hold only values 0-3, and is defined as:

  "A node, is an integrated circuit device that includes one to 8
   cores (one or two Core Complexes)."

The spec also mentions it indirectly as a die-id if one looks at
CPUID_Fn8000001E_EBX [Core Identifiers] (Core::X86::Cpuid::CoreId)
  ...
  CoreId = ({2'b0, DieId[1:0], LogicalComplexId[0],
             LogicalThreadId[2:0]} >> SMT

and in (2) CPUID_Fn8000001E_ECX [Node Identifiers] (Core::X86::Cpuid::NodeId)
  ...
  {5'b00000, 1'b[SOCKET_ID], 2'b[DIE_ID]}

Question is why we did not reuse topo_ids->die_id instead of adding the
confusing topo_ids->node_id in the first place?

Also, looking at the APIC IDs and SRAT table provided here,
CPUID_Fn8000001E_ECX corresponds to the NUMA node id (i.e. what -numa in
QEMU is used for), and the Node ID embedded into ApicId[5:4] is basically
a die-id.

The difference between the die-id implemented in QEMU and EPYC's die-id
(topo_ids->node_id) is that the former doesn't require a numa config
(maybe it should, but that ship's already sailed) and gets the number of
dies from the '-smp dies=X' CLI option, while for EPYC we calculate
topo_ids->node_id implicitly from the number of numa nodes and sockets,
which implicitly requires that the machine 'must' be configured with
-numa options.

Maybe we should drop this implicit calculation along with
topo_ids->node_id and reuse '-smp dies=X' instead, plus extra checks for
EPYC to ask for -numa if there is more than 1 die; and if we need to be
really strict, add checks for the limit of dies/cores within socket/die
according to spec [2], so that the encoded APIC ID and CPUID_8000001E
match the spec.

> > > >>>> +    topo_ids->pkg_id = props.has_socket_id ?
> > > >>>> +        props.socket_id : 0;
> > > >>>> +}
> > > >>>> +
> > > >>>>  /*
> > > >>>>   * Make APIC ID for the CPU 'cpu_index'
> > > >>>>   *