virt: Add memory hotplugging and virtio-md-pci support

Daniel Henrique Barboza Fri, 24 May 2024 08:30:10 -0700



On 5/24/24 12:02, David Hildenbrand wrote:

On 24.05.24 15:14, Daniel Henrique Barboza wrote:



On 5/21/24 07:56, Björn Töpel wrote:

From: Björn Töpel <bj...@rivosinc.com>

Virtio-based memory devices (virtio-mem/virtio-pmem) allows for
dynamic resizing of virtual machine memory, and requires proper
hotplugging (add/remove) support to work.

Add device memory support for RISC-V "virt" machine, and enable
virtio-md-pci with the corresponding missing hotplugging callbacks.

Signed-off-by: Björn Töpel <bj...@rivosinc.com>
---
   hw/riscv/Kconfig       |  2 +
   hw/riscv/virt.c        | 83 +++++++++++++++++++++++++++++++++++++++++-
   hw/virtio/virtio-mem.c |  5 ++-
   3 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index a2030e3a6ff0..08f82dbb681a 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -56,6 +56,8 @@ config RISCV_VIRT
       select PLATFORM_BUS
       select ACPI
       select ACPI_PCI
+    select VIRTIO_MEM_SUPPORTED
+    select VIRTIO_PMEM_SUPPORTED
   config SHAKTI_C
       bool
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4fdb66052587..443902f919d2 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -53,6 +53,8 @@
   #include "hw/pci-host/gpex.h"
   #include "hw/display/ramfb.h"
   #include "hw/acpi/aml-build.h"
+#include "hw/mem/memory-device.h"
+#include "hw/virtio/virtio-mem-pci.h"
   #include "qapi/qapi-visit-common.h"
   #include "hw/virtio/virtio-iommu.h"
@@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
       DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
       int i, base_hartid, hart_count;
       int socket_count = riscv_socket_count(machine);
+    hwaddr device_memory_base, device_memory_size;
       /* Check socket count limit */
       if (VIRT_SOCKETS_MAX < socket_count) {
@@ -1420,6 +1423,12 @@ static void virt_machine_init(MachineState *machine)
           exit(1);
       }
+    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported amount of memory slots: %"PRIu64,
+                     machine->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
       /* Initialize sockets */
       mmio_irqchip = virtio_irqchip = pcie_irqchip = NULL;
       for (i = 0; i < socket_count; i++) {
@@ -1553,6 +1562,37 @@ static void virt_machine_init(MachineState *machine)
       memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
                                   mask_rom);
+    /* device memory */
+    device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + 
machine->ram_size,
+                                  GiB);
+    device_memory_size = machine->maxram_size - machine->ram_size;
+    if (device_memory_size > 0) {
+        /*
+         * Each DIMM is aligned based on the backend's alignment value.
+         * Assume max 1G hugepage alignment per slot.
+         */
+        device_memory_size += machine->ram_slots * GiB;


We don't need to align to 1GiB. This calc can use 2MiB instead (or 4MiB if we're
running 32 bits).

+
+        if (riscv_is_32bit(&s->soc[0])) {
+            hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size,
+                                                          GiB);


Same here - alignment is 2/4 MiB.

+
+            if (memtop > UINT32_MAX) {
+                error_report("memory exceeds 32-bit limit by %lu bytes",
+                             memtop - UINT32_MAX);
+                exit(EXIT_FAILURE);
+            }
+        }
+
+        if (device_memory_base + device_memory_size < device_memory_size) {
+            error_report("unsupported amount of device memory");
+            exit(EXIT_FAILURE);
+        }


Took another look and found this a bit strange. These are all unsigned vars, so
if (unsigned a + unsigned b < unsigned b) will always be 'false'. The compiler 
is
probably cropping this out.


No. Unsigned interger overflow is defined behavior and this is a common check 
to detect such overflow. tI's consistent with what we do for other 
architectures.


Oh, ok. We're so far away from UINT64_MAX that it didn't occur to me doing an 
overflow
check here. Fair enough.


The calc we need to do is to ensure that the extra ram_slots * alignment will 
fit into
the VIRT_DRAM block, i.e. maxram_size + (ram_slots * alignment) < 
memmap[VIRT_DRAM].size.


TBH I'm starting to have second thoughts about letting users hotplug whatever 
they want.
It seems cleaner to just force the 2/4 Mb alignment in pre_plug() and be done 
with it,
no need to allocate ram_slots * alignment and doing all these extra checks.


It's worth noting that if user space decides to specify addresses manually, it 
can mess up everything already. There are other events that can result in 
fragmentation of the memory device area (repeated hot(un)plug of differing 
DIMMs).

Assume you have 1 GiB range and hotplug a 512 MiB DIMM at offset 256 MiB. You 
won't be able to hotplug another 512 MiB DIMM even though we reserved space.

My take so far is: if the user wants to do such stuff it should size the area 
(maxmem) much larger or deal with the concequences (not being able to hotplug 
memory).

It usually does not happen in practice ...


I wonder if we should forbid users from removing memory that is 'out of place', 
i.e. users
should always remove pc-dimms in LIFO order. Usually this kind of control is 
done by
management, e.g. libvirt, but if we're not sure we'll keep consistency during 
memory
unplugs ...


As I sent in an earlier email, users must already comply to the alignment of 
the host
memory when plugging pc-dimms, so I'm not sure our value/proposition with all 
this
extra code is worth it - the alignment will most likely be forced by the host 
memory
backend, so might as well force ourselves in pre_plug().


At least on x86-64, the 2 MiB alignment requirement is handled automatically. 
QEMU_VMALLOC_ALIGN implicitly enforces that.

Maybe RISCV also wants to set QEMU_VMALLOC_ALIGN?


Might be a good idea to experiment.



Thanks,


Daniel

Re: [PATCH v2 1/3] hw/riscv/virt: Add memory hotplugging and virtio-md-pci support

Reply via email to