Re: [Qemu-devel] [RFC PATCH v2] spapr: Enable use of huge pages

Alexey Kardashevskiy Thu, 10 Jul 2014 05:01:03 -0700

On 07/10/2014 08:29 PM, Alexander Graf wrote:
> 
> On 09.07.14 15:59, Alexey Kardashevskiy wrote:
>> On 07/09/2014 05:46 PM, Paolo Bonzini wrote:
>>> On 09/07/2014 07:57, Alexey Kardashevskiy wrote:
>>>> 0b183fc87 "memory: move mem_path handling to
>>>> memory_region_allocate_system_memory" disabled -mem-path use for all
>>>> machines that do not use memory_region_allocate_system_memory() to
>>>> register RAM. Since SPAPR uses memory_region_init_ram(), the huge pages
>>>> support was disabled for it.
>>>>
>>>> This replaces memory_region_init_ram()+vmstate_register_ram_global() with
>>>> memory_region_allocate_system_memory() to get huge pages back.
>>>>
>>>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>>>> Cc: Hu Tao <hu...@cn.fujitsu.com>
>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>> ---
>>>>   hw/ppc/spapr.c | 4 ++--
>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index a23c0f0..8fa9f7e 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1337,8 +1337,8 @@ static void ppc_spapr_init(MachineState *machine)
>>>>           ram_addr_t nonrma_base = rma_alloc_size;
>>>>           ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
>>>>
>>>> -        memory_region_init_ram(ram, NULL, "ppc_spapr.ram", nonrma_size);
>>>> -        vmstate_register_ram_global(ram);
>>>> +        memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
>>>> +                                             nonrma_size);
>>> The reason why I didn't do this in the simple way is that depending on the
>>> value of nonrma_base you may get smaller hugepages than you wanted.
>>>
>>> For example, if the hugepage size is 1G but nonrma_base is 32M, you will
>>> not be able to get a page size larger than 32M.
>>>
>>> Depending on the value of nonrma_base, it may be better to allocate the
>>> whole spapr->ram_limit to ppc_spapr.ram, and just ignore the first part
>>> of it.
>>>
>>> I see in target-ppc/kvm.c that rma_alloc_size is capped to 256M, and in
>>> practice it is 128M (arch/powerpc/kvm/book3s_hv_builtin.c).  Considering
>>> that Linux overcommits so the memory isn't lost in the non-hugepage case, I
>>> think it's better to just waste the 128M of address space.
>>>
>>> Paolo
>>>
>>>>           memory_region_add_subregion(sysmem, nonrma_base, ram);
>>>>       }
>> Did you mean something like below? If so, I have to change MR tree and
>> place RMA under RAM, I guess.
>> I'll give it a try tomorrow on bare PPC970.
>>
>>
>>
>>
>> ---
>>   hw/ppc/spapr.c       | 19 ++++++++++++-------
>>   target-ppc/kvm.c     |  9 +--------
>>   target-ppc/kvm_ppc.h |  2 +-
>>   3 files changed, 14 insertions(+), 16 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index a23c0f0..47ae6c1 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1223,6 +1223,7 @@ static void ppc_spapr_init(MachineState *machine)
>>       int i;
>>       MemoryRegion *sysmem = get_system_memory();
>>       MemoryRegion *ram = g_new(MemoryRegion, 1);
>> +    MemoryRegion *rma_region;
>>       hwaddr rma_alloc_size;
>>       hwaddr node0_size = (nb_numa_nodes > 1) ? numa_info[0].node_mem : ram_size;
>>       uint32_t initrd_base = 0;
>> @@ -1230,6 +1231,7 @@ static void ppc_spapr_init(MachineState *machine)
>>       long load_limit, rtas_limit, fw_size;
>>       bool kernel_le = false;
>>       char *filename;
>> +    void *rma = NULL;
>>
>>       msi_supported = true;
>>
>> @@ -1239,7 +1241,7 @@ static void ppc_spapr_init(MachineState *machine)
>>       cpu_ppc_hypercall = emulate_spapr_hypercall;
>>
>>       /* Allocate RMA if necessary */
>> -    rma_alloc_size = kvmppc_alloc_rma("ppc_spapr.rma", sysmem);
>> +    rma_alloc_size = kvmppc_alloc_rma(&rma);
>>
>>       if (rma_alloc_size == -1) {
>>           hw_error("qemu: Unable to create RMA\n");
>> @@ -1333,13 +1335,16 @@ static void ppc_spapr_init(MachineState *machine)
>>
>>       /* allocate RAM */
>>       spapr->ram_limit = ram_size;
>> -    if (spapr->ram_limit > rma_alloc_size) {
>> -        ram_addr_t nonrma_base = rma_alloc_size;
>> -        ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
>> +    memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
>> +                                         spapr->ram_limit);
>> +    memory_region_add_subregion(sysmem, 0, ram);
>>
>> -        memory_region_init_ram(ram, NULL, "ppc_spapr.ram", nonrma_size);
>> -        vmstate_register_ram_global(ram);
>> -        memory_region_add_subregion(sysmem, nonrma_base, ram);
>> +    if (rma_alloc_size && rma) {
>> +        rma_region = g_new(MemoryRegion, 1);
>> +        memory_region_init_ram_ptr(rma_region, NULL, "ppc_spapr.rma",
>> +                                   rma_alloc_size, rma);
>> +        vmstate_register_ram_global(rma_region);
>> +        memory_region_add_subregion(sysmem, 0, rma_region);
>>       }
>>
>>       filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index 995706a..9ca14d2 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -1582,13 +1582,11 @@ int kvmppc_smt_threads(void)
>>   }
>>
>>   #ifdef TARGET_PPC64
>> -off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
>> +off_t kvmppc_alloc_rma(void **rma)
>>   {
>> -    void *rma;
>>       off_t size;
>>       int fd;
>>       struct kvm_allocate_rma ret;
>> -    MemoryRegion *rma_region;
>>
>>       /* If cap_ppc_rma == 0, contiguous RMA allocation is not supported
>>        * if cap_ppc_rma == 1, contiguous RMA allocation is supported, but
>> @@ -1617,11 +1615,6 @@ off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
>>           return -1;
>>       };
>>
>> -    rma_region = g_new(MemoryRegion, 1);
>> -    memory_region_init_ram_ptr(rma_region, NULL, name, size, rma);
>> -    vmstate_register_ram_global(rma_region);
>> -    memory_region_add_subregion(sysmem, 0, rma_region);
>> -
> 
> I don't see where you set *rma here.


That patch was an RFC, this is why :)

> 
> Apart from that, while I think that with hugetlbfs we might actually waste
> a few MB of RAM, I don't think it's a real problem for systems that require
> an RMA. So semantically the change works well for me. Please verify it
> works though :).

I managed to verify that it still works on PPC970 and will repost the
patch(es) soon.
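
For reference, here is a rough sketch of the alignment arithmetic behind
Paolo's 32M/1G example earlier in the thread. It is not QEMU code: the helper
name max_usable_page_size() is hypothetical, and it assumes the only limit on
the page size is the natural alignment of the base address at which the RAM
region is mapped.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: returns the largest page size,
 * no bigger than hugepage_size, that a region mapped at guest physical
 * address `base` could be backed with, assuming the base must be aligned
 * to the page size. hugepage_size is assumed to be a power of two.
 */
static uint64_t max_usable_page_size(uint64_t base, uint64_t hugepage_size)
{
    uint64_t base_align;

    if (base == 0) {
        /* Offset 0 is aligned to every page size. */
        return hugepage_size;
    }

    /* Lowest set bit of base, i.e. its natural alignment. */
    base_align = base & (~base + 1);

    return base_align < hugepage_size ? base_align : hugepage_size;
}
```

With base = 32M and a 1G huge page size this yields 32M, which is the case
Paolo described; mapping ppc_spapr.ram at offset 0, as the second patch does,
yields the full 1G.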



-- 
Alexey
