Re: [Qemu-devel] Regression (?) due to c4177479 ('spapr: make sure RMA is in first mode of first memory node')

Nishanth Aravamudan Tue, 29 Apr 2014 10:35:38 -0700

On 26.04.2014 [00:16:38 +1000], Alexey Kardashevskiy wrote:
> On 04/23/2014 05:04 AM, Nishanth Aravamudan wrote:
> > On 22.04.2014 [19:27:51 +1000], Benjamin Herrenschmidt wrote:
> >> On Tue, 2014-04-22 at 19:12 +1000, Alexey Kardashevskiy wrote:
> >>> I already have in plan to fix non-power-of-two sized memory nodes
> >>> so I will fix this too.
> >>>
> >>> What exactly is the point in having NUMA memoryless nodes now? We
> >>> do not support memory hotplug yet and balloon is not memory
> >>> hotplug at all.
> >>
> >> It exists on real HW and it's always causing funny bugs, so being able to
> >> simulate it in qemu would make it easier to test & debug.
> > 
> > Yes, my primary purpose here is to help test & debug issues I see in
> > other environments.
> > 
> >>> And I still fail to see how the patch is wrong. Maybe the idea of
> >>> having DT memory nodes the same thing as NUMA memory nodes is not
> >>> the best one, but the patch is not changing that.
> > 
> > No, but it does make it an error for the RMA not to be in node0.
> > 
> > +    if (spapr->rma_size > node0_size) {
> > +        fprintf(stderr, "Error: Numa node 0 has to span the RMA (%#08"HWADDR_PR
> > +                spapr->rma_size);
> > +        exit(1);
> > +    }
> > 
> > which will never be the case if node0 has no memory?
> 
> 
> Except reproducing "funny" bugs from some real hardware, is there any
> other point in memory-less nodes?


I can't think of one :) But it's a pretty good one and is useful. I can
control the qemu NUMA topology much more easily than I can in other
environments.

> > I'm fine with the change being left in, I guess, I just want to make
> > sure the semantics are intended.
> 
> 
> It is intended. What makes you think that it is not taking into account
> that _now_ memory nodes are equal to NUMA nodes in SPAPR's QEMU?
> 
> I am really confused.

Because nothing in the changelog mentioned that now node 0 can't be
memoryless? That is a functional change in the commit. In fact, the
commit message doesn't even mention NUMA. Like I have said a few times,
it's fine if qemu doesn't support memoryless nodes -- the use-case for
qemu is probably nil.

> > 
> > FWIW, if one instead tries to specify node 1 as memoryless, and nodes 0
> > and 2 as having memory:
> > 
> >     sprintf(mem_name, "memory@" TARGET_FMT_lx, mem_start);
> >     off = fdt_add_subnode(fdt, 0, mem_name);
> >     _FDT(off);
> > 
> > ends up getting a duplicate error from _FDT because we're trying
> > to create memory@<end of node 0> twice, once for node 1 and once for
> > node 2. I'm not actually sure what we're supposed to do in that
> > situation. Looking at a PowerVM LPAR with the following topology [1]:
> > 
> > numactl --hardware
> > available: 3 nodes (1-3)
> > node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7935 MB
> > node 2 free: 7302 MB
> > node 3 cpus:
> > node 3 size: 8396 MB
> > node 3 free: 8338 MB
> > node distances:
> > node   1   2   3 
> >   1:  10  20  20 
> >   2:  20  10  20 
> >   3:  20  20  10 
> > 
> > I only see /proc/device-tree/memory@0. Perhaps the node 2 and node 3 are
> > from ibm,dynamic-reconfiguration-memory?
> 
> 
> It would help if you told how exactly you run QEMU.

The above example is from PowerVM (where the LPAR only has memory@0).

Here is a qemu command-line that fails with the _FDT error:

qemu-system-ppc64 -machine pseries,accel=kvm,usb=off -m 4096 \
    -realtime mlock=off -smp 16,sockets=1,cores=2,threads=8 \
    -numa node,nodeid=0,cpus=0-7,mem=2048 \
    -numa node,nodeid=1,cpus=8-15,mem=0 \
    -numa node,nodeid=2,mem=2048 \
    -hda /var/lib/libvirt/images/nacc/ubuntu1 -nographic -L /usr/share/qemu/
qemu: error creating device tree: off: FDT_ERR_EXISTS

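This is because we are creating two memory@ nodes with the same start
address (the end of node 0, at 2048 MB), one for node1 and one for node2.
Node1 has no memory, so the start address does not advance before node2
is named.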
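
A minimal sketch of the collision, not the actual spapr code. It assumes
each NUMA node gets a "memory@<start address>" name and that the start
address only advances by the node's size, as in the sprintf() quoted
above. The helper name count_duplicate_names(), the skip_empty flag and
the fixed-width hex format are made up for illustration; TARGET_FMT_lx
may format the address differently.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8
#define NAME_LEN  32

/*
 * Hypothetical helper (not QEMU code): build one "memory@<start>" name
 * per NUMA node and return how many names collide with an earlier one.
 * node_mem[] holds per-node sizes in bytes. If skip_empty is non-zero,
 * memoryless nodes get no name at all (names[i] is left empty).
 */
static int count_duplicate_names(const uint64_t *node_mem, int n_nodes,
                                 int skip_empty, char names[][NAME_LEN])
{
    uint64_t mem_start = 0;
    int dups = 0;

    assert(n_nodes <= MAX_NODES);
    for (int i = 0; i < n_nodes; i++) {
        names[i][0] = '\0';
        if (skip_empty && node_mem[i] == 0) {
            continue; /* assumption: no subnode for a memoryless node */
        }
        snprintf(names[i], NAME_LEN, "memory@%016" PRIx64, mem_start);
        for (int j = 0; j < i; j++) {
            if (names[j][0] != '\0' && strcmp(names[i], names[j]) == 0) {
                dups++; /* fdt_add_subnode() would report FDT_ERR_EXISTS */
                break;
            }
        }
        /* A zero-sized node leaves mem_start unchanged for the next node. */
        mem_start += node_mem[i];
    }
    return dups;
}
```

With node sizes of 2048 MB, 0 and 2048 MB, nodes 1 and 2 both start at
the 2048 MB boundary and end up with the same name. Passing skip_empty=1
avoids the duplicate in this sketch, but whether dropping the subnode is
the right way to represent a memoryless node is the open question above.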

Thanks,
Nish
