On 20.12.22 at 10:50, DERUMIER, Alexandre wrote:
>>
>>> +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
>>> +    #round next power of 2
>>> +    $blocksize = 2**(int(log($blocksize)/log(2))+1);
>>
>> What if log($blocksize)/log(2) is exactly an integer? Do we still want
>> to add 1 then? If not, please use ceil(...) instead of int(...)+1. Well,
>> I guess it can't happen in practice given what values are possible for
>> $MAX_MEM and $static_memory, but still.
>>
> The math seems to be good to get a power of 2 as blocksize
>
> log(1)/log(2):0 blocksize:2
> log(2)/log(2):1 blocksize:4
Yes, but here you'd use 4, when 2 would also be good. If that doesn't
matter, feel free to keep it as-is; I was just wondering. It's likely not
even relevant in practice, because with the values $MAX_MEM and
$static_memory can take, I'm not sure we can even get integers for the
argument to the first log()?

> with a ceil, the blocksize doesn't start at 2
>
> log(1)/log(2):0 blocksize:1
> log(2)/log(2):1 blocksize:2

Isn't ($MAX_MEM - $static_memory) / 32000 always strictly greater than 1?
And if it could get smaller than 1, we also might have issues with the
int()+1 approach, because the result of the first log() would become
negative. To be on the safe side, we could just move the minimum check up
(see the sketch at the end of this mail):

my $blocksize = ($MAX_MEM - $static_memory) / 32000;
$blocksize = 2 if $blocksize < 2;
$blocksize = 2**(ceil(log($blocksize)/log(2)));

>>> +    #2MB is the minimum to be aligned with THP
>>> +    $blocksize = 2 if $blocksize < 2;
>>
>> Nit: $blocksize is at least 2**1 after the previous calculation, so this
>> isn't really needed.
>
> yes, indeed.
>
>>
>>> +    my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i$pciaddr";
>>> +    $mem_device .= ",prealloc=on" if $conf->{hugepages};
>>
>> So prealloc=on for the device, but not prealloc=yes for the object
>> below[0]. Would you mind explaining it to me? I just found the part
>> mentioned for v7.0 here
>>
> oh, yes, sorry.
> Just look here for details:
> https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00902.html
> "A common use case for hugetlb will be using "reserve=off,prealloc=off" for
> the memory backend and "prealloc=on" for the virtio-mem device. This
> way, no huge pages will be reserved for the process, but we can recover
> if there are no actual huge pages when plugging memory. Libvirt is
> already prepared for this.
> "

Thanks! Could you please add it to the commit message?

>>> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
>>> index a18b974..0187c74 100644
>>> --- a/PVE/QemuServer/PCI.pm
>>> +++ b/PVE/QemuServer/PCI.pm
>>> @@ -249,6 +249,14 @@ sub get_pci_addr_map {
>>>          'scsihw2' => { bus => 4, addr => 1 },
>>>          'scsihw3' => { bus => 4, addr => 2 },
>>>          'scsihw4' => { bus => 4, addr => 3 },
>>> +        'virtiomem0' => { bus => 4, addr => 4 },
>>> +        'virtiomem1' => { bus => 4, addr => 5 },
>>> +        'virtiomem2' => { bus => 4, addr => 6 },
>>> +        'virtiomem3' => { bus => 4, addr => 7 },
>>> +        'virtiomem4' => { bus => 4, addr => 8 },
>>> +        'virtiomem5' => { bus => 4, addr => 9 },
>>> +        'virtiomem6' => { bus => 4, addr => 10 },
>>> +        'virtiomem7' => { bus => 4, addr => 11 },
>>
>> What if $conf->{sockets} > 8? Maybe mention the limitation in the
>> description of the 'virtio' property in the 'memory' string. Is the plan
>> to just add on more virtiomem PCI devices in the future?
>>
> Well, do we really support more than 8 sockets?
> I'm not sure the current numa implementation works with more than
> 8 sockets, and in the real world, I have never seen servers with more than
> 8 sockets. (I think super-expensive hardware with 16 - 32 sockets is the max)
> AFAIK, the main use case of sockets/numa nodes is auto numa balancing,
> to try to map the qemu numa nodes to physical numa nodes.
>
> If needed, we could add a dedicated pci bridge (like for virtioscsi), with
> 32 devices,
> and use something like virtiomem0 = addr=1, virtiomem1 = addr=2.

Okay, makes sense.
You never know what the future brings, but if it'll take some time until
we need to add more (and not that much more) I guess it's fine.
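
A minimal standalone sketch comparing the two rounding variants discussed
above; the input values are hypothetical, chosen only for illustration, and
not taken from the patch:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use POSIX qw(ceil);

    # Hypothetical ($MAX_MEM - $static_memory) values in MiB, for illustration.
    for my $diff_mb (32000, 64000, 100000) {
        my $blocksize = $diff_mb / 32000;

        # int()+1 variant from the patch: always rounds up, even when
        # $blocksize is already an exact power of 2 (1 -> 2M, 2 -> 4M, ...).
        my $int_plus_one = 2**(int(log($blocksize)/log(2)) + 1);

        # ceil() variant with the minimum check moved up, as suggested above:
        # exact powers of 2 are kept, and log() never sees an argument < 2.
        my $safe = $blocksize < 2 ? 2 : $blocksize;
        my $ceil_based = 2**(ceil(log($safe)/log(2)));

        printf "diff=%6d MiB: int()+1 -> %dM, ceil -> %dM\n",
            $diff_mb, $int_plus_one, $ceil_based;
    }

For an exact power of 2 (e.g. a difference of 64000 MiB, i.e. blocksize 2),
the int()+1 variant yields 4M while the ceil variant keeps 2M; for values
that are not powers of 2 the two agree.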