On 18.04.2014 [08:46:55 +1000], Benjamin Herrenschmidt wrote:
> On Fri, 2014-04-18 at 08:43 +1000, Alexey Kardashevskiy wrote:
> > On 04/18/2014 06:25 AM, Nishanth Aravamudan wrote:
> > > Hi Alexey,
> > >
> > > Prior to the $SUBJECT commit, we could present memoryless node0s to
> > > guests. Now, we indicate that we don't have the requisite 128M for the
> > > RMA if node 0 has no memory. Note that a memoryless node0 is possible
> > > under PowerVM (but not predictably present) so I was hoping to use KVM
> > > to test relevant fixes for memoryless nodes.
> > >
> > > I think this change is a misinterpretation of the PAPR standard, though.
> > > Yes, the RMA must be in the first block of memory, but that isn't
> > > necessarily on node 0. The topology of a PAPR-compliant guest does not
> > > require a node 0 (and in fact, under PowerVM, Linux doesn't actually
> > > require node 0 either, but it would under KVM).
> > >
> > > Thoughts? I suppose it's fine to say that node 0 must be sufficiently
> > > populated under KVM -- there's not really a reason to not have memory on
> > > a given node (except maybe ballooning). I can keep the commit reverted
> > > locally for testing purposes. Just wanted to see if the semantic change
> > > was intentional.
> >
> > PAPR spec 2.7:
> >
> > C.6.6 Memory Node
> > ===
> > This section defines the PAPR modifications to the OF /memory node. In
> > PAPR, the memory allocated to an OS image may be divided into a number
> > of allocation units called "regions" or "Logical Memory Blocks" (LMB).
> > An OS image may be dynamically allocated additional regions or may be
> > asked to release regions. Each LMB is either represented in the device
> > tree by its own /memory node or by an entry in
> > /ibm,dynamic-reconfiguration-memory nodes (see Section C.6.6.2,
> > "ibm,dynamic-reconfiguration-memory" on page 1089).
> > The /memory node that refers to the storage starting at real address
> > zero ("reg" property starting at the value zero) always remains
> > allocated to an OS image.
> >
> > The client program is initially loaded into this storage, called the RMA,
> > that is represented by the first value of the "reg" property of this
> > first /memory node.
> > ===
> >
> > The last sentence is why the change was made. It does not say "first
> > populated" node. I am adding Ben as he had very strong opinion about this
> > thing.
>
> You are confusing device-tree node with NUMA nodes.
>
> Yes, it must be the LMB at address 0, which is the /memory node, but
> that doesn't have to be NUMA node 0.

Yeah, so I think the check that was added:

-    if (spapr->rma_size > node0_size) {
-        fprintf(stderr, "Error: Numa node 0 has to span the RMA (%#08"HWADDR_PR
-                spapr->rma_size);
-        exit(1);
-    }

is incorrectly checking against node0_size? It should be checking against
the first LMB instead, right?

Thanks,
Nish
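
P.S. To make the question concrete, here is a rough, untested sketch of the kind of check I mean. It is not QEMU code: first_populated_node_size() and rma_fits() are hypothetical names, and it assumes guest RAM is laid out node by node in ascending address order, so that the memory at real address 0 belongs to the first node with a non-zero size. If that assumption does not hold, the comparison would need to use whatever actually describes the first LMB.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from QEMU.
 * Assumption: nodes are placed in ascending address order, so the memory
 * starting at real address 0 belongs to the first node whose size is
 * non-zero, which need not be NUMA node 0.
 */
uint64_t first_populated_node_size(const uint64_t *node_mem, size_t nb_nodes)
{
    for (size_t i = 0; i < nb_nodes; i++) {
        if (node_mem[i] != 0) {
            return node_mem[i];
        }
    }
    return 0; /* no node has any memory at all */
}

/*
 * Returns 1 if an RMA of rma_size bytes fits in the memory that starts at
 * real address 0, and 0 otherwise. A memoryless node 0 is accepted as long
 * as a later node is large enough.
 */
int rma_fits(uint64_t rma_size, const uint64_t *node_mem, size_t nb_nodes)
{
    return rma_size <= first_populated_node_size(node_mem, nb_nodes);
}
```

With node sizes of {0, 256M, 512M} and a 128M RMA, this accepts the configuration, whereas a comparison against node 0 alone would reject it.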