On Mon, Mar 14, 2022 at 12:59:38PM +0000, David Woodhouse wrote:
> On Mon, 2022-03-14 at 11:35 +0100, Igor Mammedov wrote:
> > On Fri, 11 Mar 2022 14:58:41 +0000
> > David Woodhouse <dw...@infradead.org> wrote:
> >
> > > On Fri, 2022-03-11 at 09:39 -0500, Igor Mammedov wrote:
> > > > if VM is started with:
> > > >
> > > >    -enable-kvm -smp 256
> > > >
> > > > without specifying 'split' irqchip, VM might eventually boot
> > > > but no more than 255 CPUs will be operational and following
> > > > error messages in guest could be observed:
> > > >    ...
> > > >    smpboot: native_cpu_up: bad cpu 256
> > > >    ...
> > > > It's a regression introduced by [1], which removed dependency
> > > > on intremap=on that were implicitly requiring 'split' irqchip
> > > > and forgot to check for 'split' irqchip.
> > > > Instead of letting VM boot a broken VM, error out and tell
> > > > user how to fix CLI.
> > >
> > > Hm, wasn't that already fixed in the patches I posted in December?
> >
> > It might be, could you point to the commit/series that fixed it.
>
> https://lore.kernel.org/all/20211209220840.14889-1-dw...@infradead.org/
> is the patch I was thinking of, but although that moves the check to a
> more useful place and fixes the X2APIC check, it *doesn't* include the
> fix you're making; it's still using kvm_irqchip_in_kernel().
>
> I can change that and repost the series, which is still sitting (with
> fixed Reviewed-By/Acked-By attributions that I screwed up last time) in
> https://git.infradead.org/users/dwmw2/qemu.git
>
> > Regardless of that, fixing it in recent kernels doesn't help
> > as still supported kernels are still affected by it.
> >
> > If there is a way to detect that fix, I can add to q35 a compat
> > property and an extra logic to enable kernel-irqchip if fix is present.
> > Otherwise the fix does not exist until minimum supported kernel
> > version reaches version where it was fixed.
>
> Hm, I'm not sure I follow here.
> Do you mean recent versions of *qemu*
> when you say 'kernels'?
>
> I'm not even sure I agree with the observation that qemu should error
> out here. The guest boots fine and the guest can even *use* all the
> CPUs. IPIs etc. will all work fine. The only thing that doesn't work is
> delivering *external* interrupts to CPUs above 254.
>
> Ultimately, this is the *guest's* problem. Some operating systems can
> cope; some can't.
>
> The fact that *Linux* has a fundamental assumption that *all* CPUs can
> receive all interrupts and that affinity can't be limited in hardware,
> is a Linux problem. I tried to fix it once but it was distinctly
> non-trivial and eventually I gave up and took a different approach.
> https://lore.kernel.org/linux-iommu/87lfgj59mp....@nanos.tec.linutronix.de/T/
>
> But even if we 'fix' the check as you suggest to bail out and refuse to
> boot a certain configuration because Linux guest wouldn't be able to
> fully utilize it... Even if we boot with the split IRQ chip and the
> 15-bit MSI enlightenment, we're still in the same position. Some guests
> will be able to use it; some won't.
>
> In fact, there are operating systems that don't even know about X2APIC.
>
> Why should qemu refuse to even start up?

We've generally said QEMU should not reject / block startup of valid
hardware configurations based on the existence of bugs in certain guest
OSes, if the config would be valid for other guests.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|