On 26.04.2014 [00:16:38 +1000], Alexey Kardashevskiy wrote:
> On 04/23/2014 05:04 AM, Nishanth Aravamudan wrote:
> > On 22.04.2014 [19:27:51 +1000], Benjamin Herrenschmidt wrote:
> >> On Tue, 2014-04-22 at 19:12 +1000, Alexey Kardashevskiy wrote:
> >>> I already have in plan to fix non-power-of-two sized memory nodes
> >>> so I will this too.
> >>>
> >>> What exactly is the point in having NUMA memoryless nodes now? We
> >>> do not support memory hotplug yet and balloon is not memory
> >>> hotplug at all.
> >>
> >> It exists on real HW an it's always causing funny bugs, so being able to
> >> simulate it in qemu would make it easier to test & debug.
> >
> > Yes, my primary purpose here is to help test & debug issues I see in
> > other environments.
> >
> >>> And I still fail to see how the patch is wrong. May be the idea of
> >>> having DT memory nodes the same thing as NUMA memory nodes is not
> >>> the best one, but the patch is not changing that.
> >
> > No, but it does make it an error for the RMA not be in node0.
> >
> > +    if (spapr->rma_size > node0_size) {
> > +        fprintf(stderr, "Error: Numa node 0 has to span the RMA
> > (%#08"HWADDR_PR
> > +                spapr->rma_size);
> > +        exit(1);
> > +    }
> >
> > which will never be the case if node0 has no memory?
>
>
> Except reproducing "funny" bugs from some real hardware, is there any
> other point in memory-less nodes?

I can't think of one :) But it's a pretty good one and is useful. I can
control the qemu NUMA topology much more easily than I can in other
environments.

> > I'm fine with the change being left in, I guess, I just want to make
> > sure the semantics are intended.
>
>
> It is intended. What does make you think that it is not taking into account
> that _now_ memory nodes are equal to NUMA nodes in SPAPR's QEMU?
>
> I am really confused.

Because nothing in the changelog mentioned that now node 0 can't be
memoryless? That is a functional change in the commit. In fact, the
commit message doesn't even mention NUMA. Like I have said a few times,
it's fine if qemu doesn't support memoryless nodes -- the use-case for
qemu is probably nil.

> >
> > FWIW, if one instead tries to specify node 1 as memoryless, and nodes 0
> > and 2 as having memory:
> >
> >     sprintf(mem_name, "memory@" TARGET_FMT_lx, mem_start);
> >     off = fdt_add_subnode(fdt, 0, mem_name);
> >     _FDT(off);
> >
> > ends up getting a duplicate error from _FDT because we're trying
> > to create memory@<end of node 0> twice, once for node 1 and once for
> > node 2. I'm not actually sure what we're supposed to do in that
> > situation. Looking at a PowerVM LPAR with the following topology [1]:
> >
> > numactl --hardware
> > available: 3 nodes (1-3)
> > node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7935 MB
> > node 2 free: 7302 MB
> > node 3 cpus:
> > node 3 size: 8396 MB
> > node 3 free: 8338 MB
> > node distances:
> > node   1   2   3
> >   1:  10  20  20
> >   2:  20  10  20
> >   3:  20  20  10
> >
> > I only see /proc/device-tree/memory@0. Perhaps the node 2 and node 3 are
> > from ibm,dynamic-reconfiguration-memory?
>
>
> It would help if you told how exactly you run QEMU.

The above example is from PowerVM (where the LPAR only has memory@0).
I will try and get you a qemu command-line that fails with the _FDT
error:

qemu-system-ppc64 -machine pseries,accel=kvm,usb=off -m 4096 \
    -realtime mlock=off -smp 16,sockets=1,cores=2,threads=8 \
    -numa node,nodeid=0,cpus=0-7,mem=2048 \
    -numa node,nodeid=1,cpus=8-15,mem=0 \
    -numa node,nodeid=2,mem=2048 \
    -hda /var/lib/libvirt/images/nacc/ubuntu1 -nographic \
    -L /usr/share/qemu/
qemu: error creating device tree: off: FDT_ERR_EXISTS

This is because we are creating two memory@2048 nodes, one for node1 and
one for node2.

Thanks,
Nish
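
To make the FDT_ERR_EXISTS failure concrete, here is a minimal standalone
sketch in C. It is not QEMU code and does not use libfdt:
build_memory_names() and first_duplicate() are hypothetical helpers, the
names are formatted as plain hex byte offsets rather than with
TARGET_FMT_lx, and skipping zero-sized nodes is only one possible way to
avoid the clash, not necessarily what QEMU should do.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8   /* assumption: small fixed limit for the sketch */
#define NAME_LEN  32

/*
 * Hypothetical helper: build one "memory@<start>" name per NUMA node,
 * where <start> is the running sum of the preceding node sizes (in bytes).
 * node_mb[] holds per-node sizes in MB, as given to "-numa node,mem=".
 * If skip_empty is nonzero, zero-sized nodes get no name at all.
 * The caller must provide room for at least nb_nodes names.
 * Returns the number of names written.
 */
static int build_memory_names(const uint64_t *node_mb, int nb_nodes,
                              int skip_empty, char names[][NAME_LEN])
{
    uint64_t start = 0;
    int count = 0;

    assert(nb_nodes <= MAX_NODES);
    for (int i = 0; i < nb_nodes; i++) {
        uint64_t size = node_mb[i] << 20;   /* MB -> bytes */

        if (!(skip_empty && size == 0)) {
            snprintf(names[count], NAME_LEN, "memory@%" PRIx64, start);
            count++;
        }
        start += size;   /* an empty node does not advance the address */
    }
    return count;
}

/*
 * Hypothetical helper: return the index of the first name that repeats an
 * earlier one (the case a device tree would reject), or -1 if all names
 * are unique.
 */
static int first_duplicate(char names[][NAME_LEN], int count)
{
    for (int i = 1; i < count; i++) {
        for (int j = 0; j < i; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                return i;
            }
        }
    }
    return -1;
}
```

With sizes {2048, 0, 2048} and skip_empty = 0, the empty node 1 and the
following node 2 both start at the end of node 0, so both are named
memory@80000000 (2048 MB in bytes, hex) and first_duplicate() returns 2.
With skip_empty = 1 only two names are generated and they are unique.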