From: lantianyu1...@gmail.com <lantianyu1...@gmail.com> Sent: Saturday, February 2, 2019 5:15 AM
I have a couple more comments ....

> +config HYPERV_IOMMU
> +	bool "Hyper-V IRQ Remapping Support"
> +	depends on HYPERV
> +	select IOMMU_API
> +	help
> +	    Hyper-V stub IOMMU driver provides IRQ Remapping capability
> +	    to run Linux guest with X2APIC mode on Hyper-V.
> +
> +

I'm a little concerned about the terminology here. The comments and
commit messages for these patches all say that Hyper-V guests don't
have interrupt remapping support. And we don't really *need* interrupt
remapping support because all the interrupts that should be nicely
spread out across all vCPUs (i.e., the MSI interrupts for PCI pass-thru
devices) are handled via a hypercall in pci-hyperv.c, and not via the
virtual IOAPIC.

So we have this stub IOMMU driver that doesn't actually do interrupt
remapping. It just prevents assigning the very small number of
non-performance sensitive IOAPIC interrupts to a CPU with an APIC ID
above 255.

With that background, describing this feature as "Hyper-V IRQ Remapping
Support" seems incorrect, and similarly in the "help" description.
Finding good terminology for this situation is hard. But how about
narrowing the focus to x2APIC handling:

	bool "Hyper-V x2APIC IRQ Handling"
	...
	help
	  Stub IOMMU driver to handle IRQs so as to allow Hyper-V Linux
	  guests to run with x2APIC mode enabled

> +static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
> +				     unsigned int virq, unsigned int nr_irqs,
> +				     void *arg)
> +{
> +	struct irq_alloc_info *info = arg;
> +	struct irq_data *irq_data;
> +	struct irq_desc *desc;
> +	int ret = 0;
> +
> +	if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> +		return -EINVAL;
> +
> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> +	if (ret < 0)
> +		return ret;
> +
> +	irq_data = irq_domain_get_irq_data(domain, virq);
> +	if (!irq_data) {
> +		irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +		return -EINVAL;
> +	}
> +
> +	irq_data->chip = &hyperv_ir_chip;
> +
> +	/*
> +	 * IOAPIC entry pointer is saved in chip_data to allow
> +	 * hyperv_irq_remappng_activate()/hyperv_ir_set_affinity() to set
> +	 * vector and dest_apicid. cfg->vector and cfg->dest_apicid are
> +	 * ignorred when IRQ remapping is enabled. See ioapic_configure_entry().
> +	 */
> +	irq_data->chip_data = info->ioapic_entry;
> +
> +	/*
> +	 * Hypver-V IO APIC irq affinity should be in the scope of
> +	 * ioapic_max_cpumask because no irq remapping support.
> +	 */
> +	desc = irq_data_to_desc(irq_data);
> +	cpumask_and(desc->irq_common_data.affinity,
> +			desc->irq_common_data.affinity,
> +			&ioapic_max_cpumask);

The intent of this cpumask_and() call is to ensure that IOAPIC
interrupts are initially assigned to a CPU with APIC ID < 256. But do
we know that the initial value of desc->irq_common_data.affinity is
such that the result of the cpumask_and() will not be the empty set?
My impression is that these local IOAPIC interrupts are assigned an
initial affinity mask with all CPUs set, in which case this will work
just fine. But I'm not sure if that is guaranteed.

An alternative would be to set the affinity to ioapic_max_cpumask and
overwrite whatever might have been previously specified. But I don't
know if that's really better.

> +
> +	return 0;
> +}
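
To make the empty-set concern above concrete, here is a small
user-space sketch, not kernel code. It models a cpumask as a single
64-bit word (bit n stands for CPU n); the type cpumask_model_t and the
helper restrict_affinity() are hypothetical names invented for this
illustration. It intersects the requested affinity with the allowed
mask and, only if the intersection is empty, falls back to the allowed
mask, which is a middle ground between the patch's cpumask_and() and
the unconditional overwrite mentioned above.

```c
/* User-space model only: a 64-bit word stands in for struct cpumask. */
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is included. */
typedef uint64_t cpumask_model_t;

/*
 * Restrict the requested affinity to the CPUs the IOAPIC can target.
 * If the intersection is empty, fall back to the full allowed mask
 * rather than leaving the interrupt with no valid target CPU.
 */
static cpumask_model_t restrict_affinity(cpumask_model_t requested,
                                         cpumask_model_t allowed)
{
    cpumask_model_t result = requested & allowed;

    /* The fallback is only meaningful if some CPU is reachable at all. */
    assert(allowed != 0);

    return result ? result : allowed;
}
```

For example, with allowed = 0x0F, a requested mask of 0xF0 has no
overlap and yields 0x0F, while a requested mask of 0x05 is kept as is.
In the driver itself the same idea would presumably be built from
cpumask_and(), whose return value reports whether the result is
non-empty, plus cpumask_copy() for the fallback, but I have not tried
that against this patch.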