Re: [Qemu-devel] Regression (?) due to c4177479 ('spapr: make sure RMA is in first mode of first memory node')

Alexey Kardashevskiy Fri, 18 Apr 2014 06:31:36 -0700

On 04/18/2014 09:03 AM, Nishanth Aravamudan wrote:
> On 18.04.2014 [08:46:55 +1000], Benjamin Herrenschmidt wrote:
>> On Fri, 2014-04-18 at 08:43 +1000, Alexey Kardashevskiy wrote:
>>> On 04/18/2014 06:25 AM, Nishanth Aravamudan wrote:
>>>> Hi Alexey,
>>>>
>>>> Prior to the $SUBJECT commit, we could present memoryless node0s to
>>>> guests. Now, we indicate that we don't have the requisite 128M for the
>>>> RMA if node 0 has no memory. Note that a memoryless node0 is possible
>>>> under PowerVM (but not predictably present) so I was hoping to use KVM
>>>> to test relevant fixes for memoryless nodes.
>>>>
>>>> I think this change is a misinterpretation of the PAPR standard, though.
>>>> Yes, the RMA must be in the first block of memory, but that isn't
>>>> necessarily on node 0. The topology of a PAPR-compliant guest does not
>>>> require a node 0 (and in fact, under PowerVM, Linux doesn't actually
>>>> require node 0 either, but it would under KVM).
>>>>
>>>> Thoughts? I suppose it's fine to say that node 0 must be sufficiently
>>>> populated under KVM -- there's not really a reason to not have memory on
>>>> a given node (except maybe ballooning). I can keep the commit reverted
>>>> locally for testing purposes. Just wanted to see if the semantic change
>>>> was intentional.
>>>
>>>
>>> PAPR spec 2.7:
>>> C.6.6 Memory Node
>>> ===
>>> This section defines the PAPR modifications to the OF /memory node. In
>>> PAPR, the memory allocated to an OS image
>>> may be divided into a number of allocation units called ???regions??? or
>>> ???Logical Memory Blocks (LMB). An OS image
>>> may be dynamically allocated additional regions or may be asked to release
>>> regions. Each LMB is either represented in
>>> the device tree by its own /memory node or by an entry in
>>> /ibm,dynamic-reconfiguration-memory nodes
>>> (see Section C.6.6.2??? ???ibm,dynamic-reconfiguration-memory?????? on page 
>>> 1089).
>>> The /memory node that refers to the
>>> storage starting at real address zero (???reg??? property starting at the 
>>> value
>>> zero) always remains allocated to an OS image.
>>>
>>> The client program is initially loaded into this storage, called the RMA,
>>> that is represented by the first value of the
>>> ???reg??? property of this first /memory node.
>>> ===
>>>
>>> The last sentence is why the change was made. It does not say "first
>>> populated" node. I am adding Ben as he had very strong opinion about this
>>> thing.
>>
>> You are confusing device-tree node with NUMA nodes.
>>
>> Yes, it must be the LMB at address 0, which is the /memory node, but
>> that doesn't have to be NUMA node 0.
> 
> Yeah, so I think the check that was added:
> 
> -    if (spapr->rma_size > node0_size) {
> -        fprintf(stderr, "Error: Numa node 0 has to span the RMA 
> (%#08"HWADDR_PR
> -                spapr->rma_size);
> -        exit(1);
> -    }
> 
> incorrectly is checking against node0_size? It should be checking
> against the first LMB instead, right?


Which would be what in the current QEMU?

If I read the current QEMU code correctly, NUMA nodes and device tree nodes
are the same thing for SPAPR now, see spapr_populate_memory() function.



-- 
Alexey

Re: [Qemu-devel] Regression (?) due to c4177479 ('spapr: make sure RMA is in first mode of first memory node')

Reply via email to