On 11/29/21 5:57 PM, Claudio Fontana wrote: > On 11/29/21 4:11 PM, David Woodhouse wrote: >> On Mon, 2021-11-29 at 15:14 +0100, Claudio Fontana wrote: >>> On 11/29/21 12:39 PM, Woodhouse, David wrote: >>>> On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote: >>>>> static void kvm_cpu_instance_init(CPUState *cs) >>>>> { >>>>> X86CPU *cpu = X86_CPU(cs); >>>>> + X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu); >>>>> >>>>> host_cpu_instance_init(cpu); >>>>> >>>>> - if (!kvm_irqchip_in_kernel()) { >>>>> - x86_cpu_change_kvm_default("x2apic", "off"); >>>>> - } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) { >>>>> - x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on"); >>>>> - } >>>>> - >>>>> - /* Special cases not set in the X86CPUDefinition structs: */ >>>>> + if (xcc->model) { >>>>> + /* only applies to builtin_x86_defs cpus */ >>>>> + if (!kvm_irqchip_in_kernel()) { >>>>> + x86_cpu_change_kvm_default("x2apic", "off"); >>>>> + } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) { >>>>> + x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on"); >>>>> + } >>>>> >>>>> - x86_cpu_apply_props(cpu, kvm_default_props); >>>>> + /* Special cases not set in the X86CPUDefinition structs: */ >>>>> + x86_cpu_apply_props(cpu, kvm_default_props); >>>>> + } >>>>> >>>> >>>> I think this causes a regression in x2apic and kvm-msi-ext-dest-id >>>> support. If you start qemu thus: >>> >>> If I recall correctly, this change just tries to restore the behavior prior >>> to >>> commit f5cc5a5c168674f84bf061cdb307c2d25fba5448 , >>> >>> fixing the issue introduced with the refactoring at that time. >>> >>> Can you try bisecting prior to >>> f5cc5a5c168674f84bf061cdb307c2d25fba5448 , to see if the actual >>> breakage comes from somewhere else? >> >> Hm, so it looks like it never worked for '-cpu host' *until* commit >> f5cc5a5c16. > > Right, so here we are talking about properly supporting this for the first > time. > > The fact that it works with f5cc5a5c16 is more an accident than anything > else, that commit was clearly broken > (exemplified by reports of failed boots). > > So we need to find the proper solution, ie, exactly which features should be > enabled for which cpu classes and models. > >> >> It didn't matter before c1bb5418e3 because you couldn't enable that >> many vCPUs without an IOMMU, and the *IOMMU* setup would call >> kvm_enable_x2apic(). >> >> But after that, nothing ever called kvm_enable_x2apic() in the '-cpu >> host' case until commit f5cc5a5c16, which fixed it... until you >> restored the previous behaviour :) >> >> This "works" to fix this case, but presumably isn't correct: > > Right, we cannot just enable all this code, or the original refactor would > have been right. > > These kvm default properties have been as far as I know intended for the cpu > actual models (builtin_x86_defs), > and not for the special cpu classes max, host and base. This is what the > revert addresses. > > I suspect what we actually need here is to review exactly in which specific > cases kvm_enable_x2apic() should be called in the end. > > The code there is mixing changes to the kvm_default_props that are then > applied using x86_cpu_apply_props (and that part should be only for > xcc->model != NULL), > with the actual enablement of the kvm x2apic using kvm_vm_enable_cap(s, > KVM_CAP_X2APIC_API, 0, flags) via kvm_enable_x2apic(). > > One way is to ignore this detail and just move out those checks, since > changes to kvm_default_props are harmless once we skip the > x86_cpu_apply_props call, > as such: > > ------ > > static void kvm_cpu_instance_init(CPUState *cs) > { > X86CPU *cpu = X86_CPU(cs); > X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu); > > host_cpu_instance_init(cpu); > > /* only applies to builtin_x86_defs cpus */ > if (!kvm_irqchip_in_kernel()) { > x86_cpu_change_kvm_default("x2apic", "off"); > } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) { > x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on"); > } > > if (xcc->model) { > /* Special cases not set in the X86CPUDefinition structs: */ > x86_cpu_apply_props(cpu, kvm_default_props); > } > > if (cpu->max_features) { > kvm_cpu_max_instance_init(cpu); > } > > kvm_cpu_xsave_init(); > } > > ------ > > this might however cause further confusion later on, and I wonder if this is > actually correct, should we _always_ enable x2apic when > kvm_irqchip_is_split() returns true?
... and only when kvm_irqchip_is_split() ? > Even for cpu class "base"? I am not too sure. > > Another option that comes to mind is to add a call to enable x2apic for max > features cpus only ("host", "max") and not for base. > > Thoughts? Paolo, Edoardo, anything comes to mind from your side? > > Ciao, > > Claudio > > >> >> --- a/target/i386/kvm/kvm-cpu.c >> +++ b/target/i386/kvm/kvm-cpu.c >> @@ -161,7 +161,7 @@ static void kvm_cpu_instance_init(CPUState *cs) >> >> host_cpu_instance_init(cpu); >> >> - if (xcc->model) { >> + if (1 || xcc->model) { >> /* only applies to builtin_x86_defs cpus */ >> if (!kvm_irqchip_in_kernel()) { >> x86_cpu_change_kvm_default("x2apic", "off"); >> >> >>>> Any image to specifically test out? Would an actual 9 sockets machine be >>>> required to reproduce this? >> >> No, but the more CPUs you have in the host the less you have to wait >> for 288 vCPUs to spin up :) >> >> My test is: >> >> ./qemu-system-x86_64 -machine q35,accel=kvm,usb=off,kernel_irqchip=split >> -cpu host -m 2G -smp sockets=9,cores=16,threads=2 -drive >> file=/var/lib/libvirt/images/fedora.qcow2,if=virtio -serial mon:stdio >> -display none -kernel ~/git/linux/arch/x86/boot/bzImage -append >> "console=ttyS0,115200 root=/dev/vda1" >> >> >> I then play with the affinity of the AHCI MSI. Pointing it at CPU 255 >> should show the problem. >> >> [root@localhost ~]# cd /proc/irq/313 >> [root@localhost 313]# echo 255 > smp_affinity_list >> [root@localhost 313]# >> [ 65.365821] Composed MSI for APIC 255 vector 0x22: 0/feeff000 22 >> [root@localhost 313]# grep ahci /proc/interrupts >> >> >> I also added some debugging into host and guest kernels to be a little >> more explicit: >> >> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c >> index b70344bf6600..53191db5145d 100644 >> --- a/arch/x86/kernel/apic/apic.c >> +++ b/arch/x86/kernel/apic/apic.c >> @@ -1866,6 +1866,7 @@ static __init void try_to_enable_x2apic(int remap_mode) >> * used for non-remapped IRQ domains. >> */ >> if (x86_init.hyper.msi_ext_dest_id()) { >> + pr_info("x2apic: support extended destination ID\n"); >> virt_ext_dest_id = 1; >> apic_limit = 32767; >> } >> @@ -2539,6 +2540,7 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct >> msi_msg *msg, >> msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8; >> else >> WARN_ON_ONCE(cfg->dest_apicid > 0xFF); >> + printk("Composed MSI for APIC %d vector 0x%x: %x/%x %x\n", >> cfg->dest_apicid, cfg->vector, msg->address_hi, msg->address_lo, msg->data); >> } >> >> u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid) >> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c >> index 59abbdad7729..f0a7715763a2 100644 >> --- a/arch/x86/kernel/kvm.c >> +++ b/arch/x86/kernel/kvm.c >> @@ -856,6 +856,8 @@ static void __init kvm_apic_init(void) >> >> static bool __init kvm_msi_ext_dest_id(void) >> { >> + printk("dest id? %d (%x)\n", >> kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID), >> + kvm_arch_para_features()); >> return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID); >> } >> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index 759952dd1222..defe6a843780 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -894,15 +894,21 @@ static bool kvm_apic_is_broadcast_dest(struct kvm >> *kvm, struct kvm_lapic **src, >> { >> if (kvm->arch.x2apic_broadcast_quirk_disabled) { >> if ((irq->dest_id == APIC_BROADCAST && >> - map->mode != KVM_APIC_MODE_X2APIC)) >> + map->mode != KVM_APIC_MODE_X2APIC)) { >> + printk("dest %d mode %d makes bcast\n", irq->dest_id, >> map->mode); >> return true; >> - if (irq->dest_id == X2APIC_BROADCAST) >> + } >> + if (irq->dest_id == X2APIC_BROADCAST) { >> + printk("Sent to X2APIC bcast\n"); >> return true; >> + } >> } else { >> bool x2apic_ipi = src && *src && apic_x2apic_mode(*src); >> if (irq->dest_id == (x2apic_ipi ? >> - X2APIC_BROADCAST : APIC_BROADCAST)) >> + X2APIC_BROADCAST : APIC_BROADCAST)) { >> + printk("no quirk dest %x\n", irq->dest_id); >> return true; >> + } >> } >> >> return false; >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index d8f1d2169b45..5b0fd6d37a7e 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -5714,6 +5714,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, >> if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK) >> kvm->arch.x2apic_broadcast_quirk_disabled = true; >> >> + printk("X2APIC API: %x\n", cap->args[0]); >> r = 0; >> break; >> case KVM_CAP_X86_DISABLE_EXITS: >> >