On Fri, 25 Feb 2022 16:41:43 +0800 Gavin Shan <gs...@redhat.com> wrote:
> Hi Igor,
> 
> On 2/17/22 10:14 AM, Gavin Shan wrote:
> > On 1/26/22 5:14 PM, Igor Mammedov wrote:
> >> On Wed, 26 Jan 2022 13:24:10 +0800
> >> Gavin Shan <gs...@redhat.com> wrote:
> >>
> >>> The default CPU-to-NUMA association is given by
> >>> mc->get_default_cpu_node_id() when it isn't provided explicitly.
> >>> However, the CPU topology isn't fully considered in the default
> >>> association and it causes CPU topology broken warnings on booting
> >>> the Linux guest.
> >>>
> >>> For example, the following warning messages are observed when the
> >>> Linux guest is booted with the following command lines.
> >>>
> >>>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> >>>   -accel kvm -machine virt,gic-version=host \
> >>>   -cpu host \
> >>>   -smp 6,sockets=2,cores=3,threads=1 \
> >>>   -m 1024M,slots=16,maxmem=64G \
> >>>   -object memory-backend-ram,id=mem0,size=128M \
> >>>   -object memory-backend-ram,id=mem1,size=128M \
> >>>   -object memory-backend-ram,id=mem2,size=128M \
> >>>   -object memory-backend-ram,id=mem3,size=128M \
> >>>   -object memory-backend-ram,id=mem4,size=128M \
> >>>   -object memory-backend-ram,id=mem5,size=384M \
> >>>   -numa node,nodeid=0,memdev=mem0 \
> >>>   -numa node,nodeid=1,memdev=mem1 \
> >>>   -numa node,nodeid=2,memdev=mem2 \
> >>>   -numa node,nodeid=3,memdev=mem3 \
> >>>   -numa node,nodeid=4,memdev=mem4 \
> >>>   -numa node,nodeid=5,memdev=mem5
> >>>     :
> >>>   alternatives: patching kernel code
> >>>   BUG: arch topology borken
> >>>   the CLS domain not a subset of the MC domain
> >>>   <the above error log repeats>
> >>>   BUG: arch topology borken
> >>>   the DIE domain not a subset of the NODE domain
> >>>
> >>> With the current implementation of mc->get_default_cpu_node_id(),
> >>> CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately.
> >>> That's incorrect because CPU#0/1/2 should be associated with the
> >>> same NUMA node since they're seated in the same socket.
> >>>
> >>> This fixes the issue by considering the socket when the default
> >>> CPU-to-NUMA association is given. With this applied, no more CPU
> >>> topology broken warnings are seen from the Linux guest. The 6 CPUs
> >>> are associated with NODE#0/1, but there are no CPUs associated
> >>> with NODE#2/3/4/5.
> >>
> >> From migration point of view it looks fine to me, and doesn't need a
> >> compat knob since NUMA data (on virt-arm) is only used to construct
> >> ACPI tables (and we don't version those unless something is broken
> >> by it).
> >>
> >>> Signed-off-by: Gavin Shan <gs...@redhat.com>
> >>> ---
> >>>   hw/arm/virt.c | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>> index 141350bf21..b4a95522d3 100644
> >>> --- a/hw/arm/virt.c
> >>> +++ b/hw/arm/virt.c
> >>> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >>>   static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
> >>>   {
> >>> -    return idx % ms->numa_state->num_nodes;
> >>> +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> >>
> >> I'd like for ARM folks to confirm whether the above is correct
> >> (i.e. socket is the NUMA node boundary, and also whether the above
> >> topo vars could have odd values.
> >> Don't look at horribly complicated x86 as an example, but it showed
> >> that vendors could stash pretty much anything there, so we should
> >> consider it here as well and maybe forbid that in the virt-arm smp
> >> parser)
> >>
> > 
> > After doing some investigation, I don't think the socket is
> > necessarily the NUMA node boundary. Unfortunately, I didn't find it
> > documented like this anywhere after checking the device-tree
> > specification and the Linux CPU topology and NUMA binding documents.
> > 
> > However, there are two options here according to the Linux (guest)
> > kernel code: (A) the socket is the NUMA node boundary; (B) the CPU
> > die is the NUMA node boundary. They are equivalent as the CPU die
> > isn't supported on the arm/virt machine. Besides, the topology of a
> > one-to-one association between socket and NUMA node sounds natural
> > and simple. So I think (A) is the best way to go.
> > 
> > Another thing I want to explain here is how the changes affect the
> > memory allocation in the Linux guest. Taking the command lines
> > included in the commit log as an example, the first two NUMA nodes
> > are bound to CPUs while the other 4 NUMA nodes are regarded as remote
> > NUMA nodes to the CPUs. The remote NUMA nodes won't accommodate
> > memory allocations until the memory in the near (local) NUMA node
> > becomes exhausted. However, it's uncertain how the memory is hosted
> > if memory binding isn't applied.
> > 
> > Besides, I think the code should be improved like below to avoid
> > overflow on ms->numa_state->num_nodes.
> > 
> >   static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
> >   {
> > -     return idx % ms->numa_state->num_nodes;
> > +     int node_idx;
> > +
> > +     node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> > +     return node_idx % ms->numa_state->num_nodes;

Using idx directly to deduce the node looks a bit iffy.
Take x86_get_default_cpu_node_id() as an example: it uses idx to pick
the arch_id (APIC ID), which has the topology encoded into it, and then
translates that to a node boundary (pkg_id -> socket).
Probably the same should happen here; a rough (untested) sketch of what
I mean is at the end of this mail.

PS:
Maybe a little tangential to the topic, but the chunk above mentions
dies/clusters/cores/threads as possible attributes for CPUs, while
virt_possible_cpu_arch_ids() says that only has_thread_id = true is
supported, which looks broken to me.

> >   }
> > 
> > Kindly ping...
> 
> >>>   }
> >>>   static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >>
> 
> Thanks,
> Gavin
> 
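
Below is a rough, untested sketch of the direction I mean. It relies on
props.socket_id being filled in by virt_possible_cpu_arch_ids() (which
it currently isn't, see the PS above), so treat it as an illustration
only, not a tested patch:

  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
  {
      const CPUArchIdList *possible_cpus = ms->possible_cpus;

      assert(idx < possible_cpus->len);
      /* map the CPU's socket to a NUMA node instead of using the raw index */
      return possible_cpus->cpus[idx].props.socket_id %
             ms->numa_state->num_nodes;
  }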
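
And virt_possible_cpu_arch_ids() would have to expose the socket it
derives from the -smp geometry, roughly like below (again only a
sketch; whether the socket really is the right NUMA boundary, and how
dies/clusters should factor in, is for the ARM folks to confirm):

  for (n = 0; n < ms->possible_cpus->len; n++) {
      ...
      /* advertise the socket so get_default_cpu_node_id() can use it */
      ms->possible_cpus->cpus[n].props.has_socket_id = true;
      ms->possible_cpus->cpus[n].props.socket_id =
          n / (ms->smp.dies * ms->smp.clusters *
               ms->smp.cores * ms->smp.threads);
      ms->possible_cpus->cpus[n].props.has_thread_id = true;
      ms->possible_cpus->cpus[n].props.thread_id = n;
  }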