On Wed, Apr 26, 2017 at 12:07:01PM +0200, Laurent Vivier wrote:
> When there is more nodes than memory available to put the minimum
> allowed memory by node, all the memory is put on the last node.
>
> This is because we put (ram_size / nb_numa_nodes) &
> ~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
> case the value is 0. This is particularly true with pseries,
> as the memory must be aligned to 256MB.
>
> To avoid this problem, this patch uses an error diffusion algorithm [1]
> to distribute equally the memory on nodes.
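
For reference, the split described above can be sketched as a standalone C function. This is only an illustration of the arithmetic, not the QEMU code: distribute_mem() and its parameters are hypothetical names, and the alignment is passed in directly instead of being read from the machine class.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the error-diffusion split. distribute_mem() and its
 * parameters are hypothetical names used for this example only.
 * node_mem must have room for nb_nodes entries.
 */
void distribute_mem(uint64_t ram_size, int nb_nodes,
                    unsigned align_shift, uint64_t *node_mem)
{
    assert(nb_nodes > 0);

    uint64_t mask = ~((UINT64_C(1) << align_shift) - 1);
    uint64_t granularity = ram_size / nb_nodes; /* ideal share per node */
    uint64_t propagate = 0;                     /* share not handed out yet */
    uint64_t usedmem = 0;
    int i;

    for (i = 0; i < nb_nodes - 1; i++) {
        /* Round the ideal share plus the carried remainder down
         * to the alignment boundary. */
        uint64_t mem = (granularity + propagate) & mask;

        /* Carry whatever the rounding dropped over to the next node. */
        propagate = granularity + propagate - mem;
        node_mem[i] = mem;
        usedmem += mem;
    }
    /* The last node takes whatever is left. */
    node_mem[i] = ram_size - usedmem;
}
```

Called with 1 GiB, 6 nodes and a 256 MB alignment (align_shift = 28), this yields 0, 256, 0, 256, 256 and 256 MB, which is the distribution shown in the "After" output of the patch description.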

Nice. But we need compat code to keep the previous behavior on
older machine-types. We can use either a new boolean MachineClass
field, or a MachineClass method (mc->auto_assign_ram(), maybe?)
that 2.9 machine-types could override.

>
> Example:
>
> qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \
>                   -numa node -numa node -numa node \
>                   -numa node -numa node -numa node
>
> Before:
>
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 6
> node 0 size: 0 MB
> node 1 cpus: 1 7
> node 1 size: 0 MB
> node 2 cpus: 2
> node 2 size: 0 MB
> node 3 cpus: 3
> node 3 size: 0 MB
> node 4 cpus: 4
> node 4 size: 0 MB
> node 5 cpus: 5
> node 5 size: 1024 MB
>
> After:
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 6
> node 0 size: 0 MB
> node 1 cpus: 1 7
> node 1 size: 256 MB
> node 2 cpus: 2
> node 2 size: 0 MB
> node 3 cpus: 3
> node 3 size: 256 MB
> node 4 cpus: 4
> node 4 size: 256 MB
> node 5 cpus: 5
> node 5 size: 256 MB
>
> [1] https://en.wikipedia.org/wiki/Error_diffusion
>
> Signed-off-by: Laurent Vivier <lviv...@redhat.com>
> ---
>  numa.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/numa.c b/numa.c
> index 6fc2393..bcf1c54 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -336,15 +336,19 @@ void parse_numa_opts(MachineClass *mc)
>              }
>          }
>          if (i == nb_numa_nodes) {
> -            uint64_t usedmem = 0;
> +            uint64_t usedmem = 0, node_mem;
> +            uint64_t granularity = ram_size / nb_numa_nodes;
> +            uint64_t propagate = 0;
> 
>              /* Align each node according to the alignment
>               * requirements of the machine class
>               */
>              for (i = 0; i < nb_numa_nodes - 1; i++) {
> -                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
> +                node_mem = (granularity + propagate) &
>                                          ~((1 << mc->numa_mem_align_shift) - 1);
> -                usedmem += numa_info[i].node_mem;
> +                propagate = granularity + propagate - node_mem;
> +                numa_info[i].node_mem = node_mem;
> +                usedmem += node_mem;
>              }
>              numa_info[i].node_mem = ram_size - usedmem;
>          }
> -- 
> 2.9.3
>

-- 
Eduardo
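
To make the compat suggestion above more concrete, here is a rough standalone sketch of the method variant. Everything in it is an assumption for illustration: MachineClassSketch is a stand-in, not QEMU's MachineClass, the auto_assign_ram() signature is invented, and machine_2_9 / machine_newer are placeholder instances. The only point is that an older machine-type keeps the pre-patch split while newer ones get the error-diffusion split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of MachineClass. */
typedef struct MachineClassSketch MachineClassSketch;
struct MachineClassSketch {
    unsigned numa_mem_align_shift;
    /* Hook suggested in the review; this signature is an assumption. */
    void (*auto_assign_ram)(const MachineClassSketch *mc, uint64_t *nodes,
                            int nb_nodes, uint64_t size);
};

/* Shared loop: with diffuse == false the carry stays 0, which is
 * exactly the pre-patch behavior. */
void assign_ram(const MachineClassSketch *mc, uint64_t *nodes,
                int nb_nodes, uint64_t size, bool diffuse)
{
    assert(nb_nodes > 0);

    uint64_t mask = ~((UINT64_C(1) << mc->numa_mem_align_shift) - 1);
    uint64_t share = size / nb_nodes;
    uint64_t carry = 0, usedmem = 0;
    int i;

    for (i = 0; i < nb_nodes - 1; i++) {
        uint64_t mem = (share + carry) & mask;

        if (diffuse) {
            carry = share + carry - mem; /* remainder goes to the next node */
        }
        nodes[i] = mem;
        usedmem += mem;
    }
    nodes[i] = size - usedmem; /* last node takes the rest */
}

/* Previous behavior, kept for older machine-types. */
void legacy_auto_assign_ram(const MachineClassSketch *mc, uint64_t *nodes,
                            int nb_nodes, uint64_t size)
{
    assign_ram(mc, nodes, nb_nodes, size, false);
}

/* New behavior from the patch. */
void diffuse_auto_assign_ram(const MachineClassSketch *mc, uint64_t *nodes,
                             int nb_nodes, uint64_t size)
{
    assign_ram(mc, nodes, nb_nodes, size, true);
}

/* Placeholder machine classes, both with 256 MB alignment. */
const MachineClassSketch machine_2_9   = { 28, legacy_auto_assign_ram };
const MachineClassSketch machine_newer = { 28, diffuse_auto_assign_ram };
```

The caller would then only do mc->auto_assign_ram(mc, nodes, nb_nodes, size) and never needs to know which machine-type it is running. A boolean field would work as well; the method just leaves room for a machine to plug in something different later.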