On Wed, Jan 31, 2024 at 10:47:29AM +0530, Ani Sinha wrote:
> Date: Wed, 31 Jan 2024 10:47:29 +0530
> From: Ani Sinha <anisi...@redhat.com>
> Subject: Re: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus
> 
> On Wed, Jan 31, 2024 at 9:27 AM Zhao Liu <zhao1....@intel.com> wrote:
> >
> > Hi Ani,
> >
> > On Wed, Jan 31, 2024 at 08:19:06AM +0530, Ani Sinha wrote:
> > > Date: Wed, 31 Jan 2024 08:19:06 +0530
> > > From: Ani Sinha <anisi...@redhat.com>
> > > Subject: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus
> > > X-Mailer: git-send-email 2.42.0
> > >
> > > Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to
> > > allow up to 4096 vCPUs"), the Linux kernel can support up to 4096 vCPUs
> > > when MAXSMP is enabled in the kernel. At present, QEMU has been tested to
> > > correctly boot a Linux guest with 1856 vcpus and no more, with both edk2
> > > and SeaBIOS firmware.
> >
> > About background, could I ask if there will be Host machines with so
> > many CPUs? What are the benefits of vCPUs that far exceed the number
> > of Host CPUs?
> 
> Yes HPE has SAP HANA host machines with large numbers of physical
> cores and memory. For example QEMU was tested on a system with 3840
> cores.

Thanks! For such a large system, do the vCPUs need CPU affinity, or are
they just left to run freely on the host's physical cores?

> 
> >
> > Thanks,
> > Zhao
> >
> > > If an additional vcpu is added, that is with 1857 vcpus, edk2 currently
> > > fails with the following error messages:
> > >
> > > AllocatePages failed: No 0x400 Pages is available.
> > > There is only left 0x2BF pages memory resource to be allocated.
> > > ERROR: Out of aligned pages
> > > ASSERT /builddir/build/BUILD/edk2-ba91d0292e/MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c(814): BigPageAddress != 0
> > >
> > > This error exists only with edk2. SeaBIOS can currently boot a Linux
> > > guest fine with 4096 vcpus. Since the lowest common denominator for a
> > > working VM for both edk2 and SeaBIOS is 1856 vcpus, bump up the value of
> > > max_cpus to 1856 for q35 machine versions 9 and newer. Q35 machine
> > > versions 8.2 and older continue to support a maximum of 1024 vcpus as
> > > before, for compatibility reasons.
> > >
> > > If KVM is not able to support the specified number of vcpus, QEMU would
> > > return the following error messages:
> > >
> > > $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728

In practice, do users need to set the socket-level topology and NUMA
configuration to be consistent with the host on such a large system?

NUMA settings are also related to topology, so it would be better if NUMA
were covered as well.

Thanks,
Zhao

> > > qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> > > qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> > > Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)
> > >
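
For reference, the messages above reflect two separate limits: a
recommended (soft) vCPU count that only produces a warning, and a maximum
(hard) count that rejects the configuration. As far as I know, KVM reports
these through the KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS capabilities. A
minimal sketch of that two-level check follows. The function name, the enum
and the message formatting are hypothetical and only modeled on the log
quoted above; this is not QEMU's actual code.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch; not QEMU identifiers. */
enum vcpu_check {
    VCPU_OK,     /* at or below the recommended count */
    VCPU_WARN,   /* above the recommended count, within the maximum */
    VCPU_REJECT  /* above the hard maximum */
};

/*
 * Compare a requested vCPU count against a soft (recommended) and a hard
 * (maximum) limit. The soft limit is checked first so that, as in the log
 * above, a request exceeding both limits yields a warning and then an error.
 */
enum vcpu_check check_vcpu_count(int requested, int recommended, int max)
{
    enum vcpu_check result = VCPU_OK;

    if (requested > recommended) {
        fprintf(stderr,
                "warning: Number of SMP cpus requested (%d) exceeds the "
                "recommended cpus supported by KVM (%d)\n",
                requested, recommended);
        result = VCPU_WARN;
    }
    if (requested > max) {
        fprintf(stderr,
                "Number of SMP cpus requested (%d) exceeds the maximum "
                "cpus supported by KVM (%d)\n",
                requested, max);
        result = VCPU_REJECT;
    }
    return result;
}
```

With the values from the log, check_vcpu_count(1728, 12, 1024) prints the
warning followed by the error and returns VCPU_REJECT.
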
> > > Cc: Daniel P. Berrangé <berra...@redhat.com>
> > > Cc: Igor Mammedov <imamm...@redhat.com>
> > > Cc: Michael S. Tsirkin <m...@redhat.com>
> > > Cc: Julia Suvorova <jus...@redhat.com>
> > > Cc: kra...@redhat.com
> > > Signed-off-by: Ani Sinha <anisi...@redhat.com>
> > > ---
> > >  hw/i386/pc_q35.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > Changelog:
> > > v2: bump up the vcpu number to 1856. Add failure messages from edk2 in
> > > the commit description.
> > > See also RH Jira https://issues.redhat.com/browse/RHEL-22202
> > >
> > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> > > index f43d5142b8..f9c4b6594d 100644
> > > --- a/hw/i386/pc_q35.c
> > > +++ b/hw/i386/pc_q35.c
> > > @@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
> > >      m->default_nic = "e1000e";
> > >      m->default_kernel_irqchip_split = false;
> > >      m->no_floppy = 1;
> > > -    m->max_cpus = 1024;
> > > +    m->max_cpus = 1856;
> > >      m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
> > >      machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
> > >      machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
> > > @@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
> > >  {
> > >      pc_q35_9_0_machine_options(m);
> > >      m->alias = NULL;
> > > +    m->max_cpus = 1024;
> > >      compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
> > >      compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
> > >  }
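
To spell out how I read the two hunks: the newest machine version inherits
the new default of 1856, while the 8.2 options first apply the 9.0 options
and then restore the old limit. A simplified, standalone sketch of that
chaining is below. The struct and the function names are stand-ins invented
for illustration (the real code uses MachineClass and the pc_q35_* functions
shown in the diff), and the "q35" alias on the newest version is my
assumption about the surrounding code.

```c
#include <stddef.h>

/* Simplified stand-in for MachineClass; only the fields used in this sketch. */
struct machine_class_sketch {
    const char *alias;
    int max_cpus;
};

/* Base options shared by all versions: carries the new default limit. */
void q35_base_options(struct machine_class_sketch *m)
{
    m->max_cpus = 1856;
}

/* Newest version: takes the base options unchanged (alias is an assumption). */
void q35_9_0_options(struct machine_class_sketch *m)
{
    q35_base_options(m);
    m->alias = "q35";
}

/* Older version: start from 9.0, then restore the previous behaviour. */
void q35_8_2_options(struct machine_class_sketch *m)
{
    q35_9_0_options(m);
    m->alias = NULL;
    m->max_cpus = 1024;  /* keep the pre-9.0 limit for compatibility */
}
```

Because each older version calls the next newer one before applying its own
overrides, any version older than 8.2 that chains through the 8.2 options
would keep 1024 as well.
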
> > > --
> > > 2.42.0
> > >
> > >
> >
> 
> 
