On Wed, 24 Jun 2020 12:35:59 -0500 Babu Moger <babu.mo...@amd.com> wrote:
> > -----Original Message-----
> > From: Igor Mammedov <imamm...@redhat.com>
> > Sent: Wednesday, June 24, 2020 8:48 AM
> > To: Moger, Babu <babu.mo...@amd.com>
> > Cc: ehabk...@redhat.com; m...@redhat.com; qemu-devel@nongnu.org;
> > pbonz...@redhat.com; r...@twiddle.net
> > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> > hotplug
> >
> > On Tue, 16 Jun 2020 12:18:56 -0500
> > Babu Moger <babu.mo...@amd.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imamm...@redhat.com>
> > > > Sent: Tuesday, June 16, 2020 5:59 AM
> > > > To: Moger, Babu <babu.mo...@amd.com>
> > > > Cc: pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com;
> > > > m...@redhat.com; marcel.apfelb...@gmail.com; qemu-devel@nongnu.org
> > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> > > > hotplug
> > > >
> > > > On Mon, 08 Jun 2020 15:18:50 -0500
> > > > Babu Moger <babu.mo...@amd.com> wrote:
> > > >
> > > > > Noticed the following command failure while testing CPU hotplug.
> > > > >
> > > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > > > >   cores=1,threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > > > >   cpu,core-id=0,socket-id=1,thread-id=0
> > > > >
> > > > > qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
> > > > > thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> > > > > with APIC ID 21855, valid index range 0:1
> > > > >
> > > > > This happens because the APIC ID is calculated using uninitialized memory.
> > > > > This is happening after the addition of the new field node_id in the
> > > > > X86CPUTopoIDs structure. The node_id field is uninitialized while calling
> > > > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20200602171838.GG577771@habkost.net/
> > > > >
> > > > > Fix the problem by initializing the node_id properly.
> > > > >
> > > > > Signed-off-by: Babu Moger <babu.mo...@amd.com>
> > > > > ---
> > > > >  hw/i386/pc.c               |  2 ++
> > > > >  include/hw/i386/topology.h | 11 +++++++++++
> > > > >  2 files changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > > index 2128f3d6fe..974cc30891 100644
> > > > > --- a/hw/i386/pc.c
> > > > > +++ b/hw/i386/pc.c
> > > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> > > > >          topo_ids.die_id = cpu->die_id;
> > > > >          topo_ids.core_id = cpu->core_id;
> > > > >          topo_ids.smt_id = cpu->thread_id;
> > > > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type) ?
> > > > > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
> > > >
> > > > I'd rather not calculate some default value here,
> > > > this is the branch where we check user provided topology info and error out
> > > > asking to provide missing bits.
> > >
> > > Noticed that cpu->node_id is initialized to 0xFF (NUMA_NODE_UNASSIGNED).
> > > We can initialize cpu->node_id to the default node, like how we do it in
> > > x86_get_default_cpu_node_id. We can use it to initialize topo_ids.node_id.
> > > This is consistent with the other fields core_id, die_id etc.
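
(For concreteness, I read that suggestion as something along these lines in
pc_cpu_pre_plug() -- just an untested sketch, the fallback value is a
placeholder:

    if (cpu->node_id == NUMA_NODE_UNASSIGNED) {
        /* placeholder: derive a default the way x86_get_default_cpu_node_id() does */
        cpu->node_id = 0;
    }
    topo_ids.node_id = cpu->node_id;

which still leaves the question of what the default should be, see below.)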
> > >
> > > > I also wonder if we should force user to specify numa nodes on CLI if EPYC
> > > > cpu is used.
> > > > (i.e. I'm assuming that EPYC always requires numa)
> > >
> > > That is not true. Without numa all the cpus will be configured under one
> > > default numa node 0. Like we do it using x86_get_default_cpu_node_id.
> >
> > get_default_cpu_node_id(), which is making things up, is going to be removed
> > eventually in favor of asking the user to provide numa mapping explicitly on CLI.
>
> That will be good going forward.
>
> > now if it's always only node 0, why do we need to calculate it then,
> > why not just assign 0 directly?
> >
> > what if we have several sockets, would all vCPUs still have node-id = 0?
>
> If there are several nodes then socket id becomes node id.

I wonder, if node id == socket id, why bother with node_id at all; probably
node id is there to allow for a design where several sockets are on the same
node.

> > Another question is, if we have CPUs with only 1 numa node set on all of them,
> > does it require all the other machinery like the ACPI SRAT table defined, or is
> > it just an internal CPU impl. detail?
>
> I am not very sure about it. To me it looks like it is just internal cpu
> implementation detail.

I'd think it might confuse the guest OS when it decodes more than 1 numa node
from APIC ID/CPUID but then there are no such nodes described in ACPI.
While it might work for caches, it would miss any relation of memory mapping
to nodes, or get it wrong if one doesn't match the other.

> I think we have two options here.
>
> 1. Update the patch to initialize the node_id the way it is done in
>    get_default_cpu_node_id.

If it were only one node for every CPU (incl. multisocket), I'd go with
enabling autonuma and assigning all CPUs to the default node-id 0, since there
is no ambiguity about where CPUs and RAM are mapped.

Is it possible to use node-id=0 for all EPYC CPUs even in a multisocket config?
(It seems the spec allows only one node per socket, but doesn't say that node
ids must be different.) If not, then making up a node-id is not an option.

> 2. Ask the user to pass the node_id while hot plugging the device. This is
>    a clean solution. Will require some data structure changes.

Here is my brain dump of the current, very non-obvious flow:

1. x86_possible_cpu_arch_ids()
     ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()

2. possibly numa_cpu_set()
     ms->possible_cpus->cpus[i].props.node_id = user input | 0 (in autonuma case)

3. x86_cpus_init()  // generate apic_id AND make up the node_id embedded into it
     ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms, i);
       -> x86_apicid_from_cpu_idx_epyc()
            -> x86_topo_ids_from_idx_epyc()   same as x86_topo_ids_from_idx() + node_id
     or
       -> x86_apicid_from_cpu_idx()
            -> x86_topo_ids_from_idx()

4. pc_cpu_pre_plug()  // basically topo ids, modulo node-id not being set or user provided
     cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);

5. // do it again, with the difference that in the EPYC case it may have a
   // different node-id than cpu_slot
     x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
       // i.e. user input of node-id is ignored
     set socket-id/core-id/... (but not node-id) from topo_info
     numa_cpu_pre_plug(cpu_slot)
       ^^^^^^
       if (node_id == CPU_UNSET_NUMA_NODE_ID) {
           if (slot->props.has_node_id)
               object_property_set_int(... slot->props.node_id, "node-id", ...);
   // this applies to CPUs hotplugged without node-id and to the initial CPUs (-smp X),
   // so we may end up with "node-id" being set to a user-defined value
   // or left unset (no numa enabled),
   // while the APIC ID will have some node-id encoded in it.

That's quite a mess. Maybe we should unify both and make
x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx(), i.e.

x86_apicid_from_cpu_idx_epyc() {
    topoids = x86_apicid_from_cpu_idx() {
                  return ms->possible_cpus->cpus[i].props
              }
    if (ms->possible_cpus->cpus[i].props.has_node_id)
        topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
    else
        error_fatal("EPYC requires use of -numa to define topology if using more than 1 socket")
}

That way QEMU makes up only node[0] by enabling autonuma, or whatever the user
provided explicitly is encoded into the APIC ID, and it will always be
consistent with the cpu *-id properties in possible_cpus and the SRAT table
QEMU generates.
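
A slightly more concrete (and completely untested) sketch of the above, just
to show what I mean; the signature and the use of the existing
x86_apicid_from_topo_ids_epyc() helper are assumptions about how it would be
wired up:

    static apic_id_t x86_apicid_from_cpu_idx_epyc(const MachineState *ms,
                                                  X86CPUTopoInfo *topo_info,
                                                  unsigned cpu_index)
    {
        const CpuInstanceProperties *props =
            &ms->possible_cpus->cpus[cpu_index].props;
        X86CPUTopoIDs topo_ids = {
            .pkg_id  = props->socket_id,
            .die_id  = props->die_id,
            .core_id = props->core_id,
            .smt_id  = props->thread_id,
        };

        if (props->has_node_id) {
            topo_ids.node_id = props->node_id;
        } else if (ms->smp.sockets > 1) {
            /* don't make up node ids for multi-socket EPYC, see above */
            error_report("EPYC requires -numa to define the topology "
                         "when using more than 1 socket");
            exit(1);
        } else {
            /* autonuma case: everything on the single default node */
            topo_ids.node_id = 0;
        }

        /* encode the now consistent ids into the APIC ID */
        return x86_apicid_from_topo_ids_epyc(topo_info, &topo_ids);
    }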
As a cleanup we can get rid of the back-and-forth conversion [5] and use
cpu_slot to set the same ids.

Also maybe we should have a check that node-id is the same within a socket in
case of EPYC, if it's guaranteed that EPYC won't support multiple nodes per
socket.

Hope it makes at least some sense.

> Let me know if you see any other option.
>
> >
> > >
> > > >
> > > > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info, &topo_ids);
> > > > >      }
> > > > >
> > > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > > > index 07239f95f4..ee4deb84c4 100644
> > > > > --- a/include/hw/i386/topology.h
> > > > > +++ b/include/hw/i386/topology.h
> > > > > @@ -140,6 +140,17 @@ static inline unsigned apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> > > > >             apicid_node_width_epyc(topo_info);
> > > > >  }
> > > > >
> > > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo *topo_info,
> > > > > +                                            const X86CPUTopoIDs *topo_ids)
> > > > > +{
> > > > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg *
> > > > > +                                            topo_info->cores_per_die *
> > > > > +                                            topo_info->threads_per_core),
> > > > > +                                           nr_nodes);
> > > > > +
> > > > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> > > >
> > > > what if nr_nodes == 0?
> > > >
> > > > > +}
> > > > >  /*
> > > > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > > > >   *
> > > > >
> > > >
> > >
> >
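
PS: the "same node-id within a socket" check mentioned above could be as
simple as the sketch below (untested; the function name and the place it
would be called from are made up):

    static void x86_validate_epyc_node_ids(const MachineState *ms, Error **errp)
    {
        const CPUArchIdList *cpus = ms->possible_cpus;
        int i, j;

        for (i = 0; i < cpus->len; i++) {
            for (j = 0; j < i; j++) {
                const CpuInstanceProperties *a = &cpus->cpus[i].props;
                const CpuInstanceProperties *b = &cpus->cpus[j].props;

                if (a->has_socket_id && b->has_socket_id &&
                    a->has_node_id && b->has_node_id &&
                    a->socket_id == b->socket_id &&
                    a->node_id != b->node_id) {
                    error_setg(errp, "CPUs in socket %" PRId64
                               " are mapped to different NUMA nodes",
                               a->socket_id);
                    return;
                }
            }
        }
    }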